Compare commits: 10 commits (833ba094fc...main)
| Author | SHA1 | Date |
|---|---|---|
| | c3e5a7135a | |
| | 54a44733aa | |
| | 9809fe7926 | |
| | 6063b35336 | |
| | f14197b6b9 | |
| | 47250b5781 | |
| | ca19976ff7 | |
| | c1135eb8a0 | |
| | 2c3cbf1155 | |
| | 3e1a35690d | |
.gitignore (vendored): 23 additions
@@ -44,3 +44,26 @@ htmlcov/
# Data files (cache)
/data/
*.log

# AI/Tool config
.claude/
.serena/
.cursor/
.copilot/
.codex/
.aider*
.continue/
.codeium/
.tabnine/
CLAUDE.md
GEMINI.md
AGENTS.md
.cursorrules
.windsurfrules
copilot-instructions.md

# Local session files
*.zip
*.txt
C--Users-btoll-claude-dispatcharr-Stream-Mapparr/
docs/superpowers/
README.md: 245 changes
@@ -1,4 +1,4 @@
# Stream Mapparr
# Stream-Mapparr
[](https://github.com/Dispatcharr/Dispatcharr)
[](https://deepwiki.com/PiratesIRC/Stream-Mapparr)

@@ -10,187 +10,146 @@



A Dispatcharr plugin that automatically matches and assigns streams to channels based on advanced fuzzy matching, quality prioritization, and OTA callsign recognition.
A Dispatcharr plugin that automatically matches and assigns streams to channels using fuzzy matching, quality prioritization, and OTA callsign recognition.

## ⚠️ Important: Backup Your Database
Before installing or using this plugin, it is **highly recommended** that you create a backup of your Dispatcharr database. This plugin makes significant changes to your channel and stream assignments.
## Backup Your Database

**[Click here for instructions on how to back up your database.](https://dispatcharr.github.io/Dispatcharr-Docs/user-guide/?h=backup#backup-restore)**
Before installing or using this plugin, create a backup of your Dispatcharr database. This plugin modifies channel and stream assignments.

**[Backup instructions](https://dispatcharr.github.io/Dispatcharr-Docs/user-guide/?h=backup#backup-restore)**

---

## 🚨 Important: Background Operations & Monitoring
**Please Read Carefully:** This plugin uses **Background Threading** to prevent browser timeouts during long operations. This changes how you interact with the plugin:
## Background Operations

1. **Instant Notification:** The frontend will show a green "✅ Started in background" notification immediately. This does **NOT** mean the task is finished.
2. **Immediate Re-enabling:** Buttons re-enable instantly so the UI remains responsive. Do not click the button again; the task is active in the background.
3. **Real-Time Monitoring:** To see progress, ETAs, and completion status, you **must** check your Docker logs.
This plugin uses background threading to prevent browser timeouts during long operations.

**To monitor operations, run this in your terminal:**
`docker logs -n 20 -f Dispatcharr | grep plugins`
or
`docker logs -f dispatcharr | grep Stream-Mapparr`
- The frontend shows a green "Started in background" notification immediately — this does **not** mean the task is finished
- Buttons re-enable instantly. Do not click again; the task is running
- Monitor progress and completion in Docker logs:

Look for "**✅ [ACTION] COMPLETED**" or "**📄 CSV EXPORT CREATED**" to know when the process is finished.
```bash
docker logs -f dispatcharr | grep Stream-Mapparr
```

Look for `COMPLETED` or `CSV EXPORT CREATED` to know when the process is finished.

---

## Features

### Automation & Scheduling
* **Background Processing Engine**: Operations run in threads to prevent HTTP timeouts and "broken pipe" errors on large datasets.
* **Integrated Scheduler**: Configure automated daily runs directly in settings with custom Timezone and HHMM time support.
* **Smart Rate Limiting**: Configurable API throttling with exponential backoff to prevent server overload and 429/5xx errors.
* **Operation Lock System**: Prevents concurrent tasks from running and allows manual clearing of stuck locks after system restarts.
### Matching
- **Multi-stage fuzzy matching**: Exact, substring, and token-sort matching with configurable sensitivity (Relaxed/Normal/Strict/Exact)
- **US OTA callsign matching**: Dedicated action for matching US broadcast channels by callsign against authoritative database (5,900+ callsigns)
- **Multi-country channel databases**: US, UK, CA, AU, BR, DE, ES, FR, IN, MX, NL
- **Normalization cache**: Pre-normalizes stream names once for batch matching performance
- **rapidfuzz acceleration**: Uses C-accelerated Levenshtein when available (20-50x faster), with pure Python early-termination fallback

### Matching & Accuracy
* **Advanced Fuzzy Matching**: Automatically finds and assigns streams using an advanced matching engine.
* **Strict Numeric Matching**: Prevents false positives between numbered channels (e.g., ensuring "Sports 1" does not match "Sports 2").
* **US OTA Specialized Matching**: Dedicated action to match US broadcast channels by callsign using authoritative databases.
* **Multi-Country Support**: Support for multiple regional databases (US, UK, CA, NL, etc.) to refine matching accuracy.
* **Customizable Ignore Tags**: Filter out unwanted keywords like `[Backup]` or `West` during the matching process.
### Quality & Streams
- **Quality-based stream sorting**: 4K > UHD > FHD > HD > SD, using probed resolution (from IPTV Checker) or name-based detection
- **M3U source prioritization**: Prefer streams from specific M3U providers
- **Dead stream filtering**: Skip streams with 0x0 resolution (requires IPTV Checker)
- **Auto-deduplication**: Removes duplicate stream names during assignment

### Quality & Stream Health
* **IPTV Checker Integration**: Automatically filters out "dead" streams with 0x0 resolution using metadata from the IPTV Checker plugin.
* **M3U Source Prioritization**: Prioritize streams from specific M3U providers (e.g., premium sources over backup sources) regardless of quality metrics.
* **Resolution & FPS Ranking**: Automatically sorts alternate streams by physical quality (Resolution/FPS) to ensure the best source is primary.
* **Auto-Deduplication**: Automatically removes duplicate stream names during assignment to keep channel lists clean.
### Automation
- **Built-in scheduler**: Configure daily runs with timezone and HHMM time support
- **Rate limiting**: Configurable throttling (None/Low/Medium/High)
- **Operation lock**: Prevents concurrent tasks; auto-expires after 10 minutes
- **Dry run mode**: Preview results with CSV export without making changes

### Reporting & Visibility
* **Live Progress Tracking**: Real-time progress engine providing accurate ETAs and minute-by-minute reporting in the logs.
* **Intelligent CSV Analysis**: Reports analyze why channels didn't match and provide specific recommendations (e.g., "Add 'UK' to ignore tags").
* **Dry Run Mode**: Global toggle to preview match results and export reports without making any database changes.
* **Channel Visibility Management**: Automatically enables channels with valid streams and disables those without assignments.
### Reporting
- **CSV exports**: Detailed reports with threshold recommendations and token mismatch analysis
- **Channel visibility management**: Auto-enable channels with streams, disable those without

## Requirements
* Active Dispatcharr installation
* Admin username and password for API access
* **A Channels Profile other than "All"**
* Multiple streams of the same channel are available in your setup

- Dispatcharr v0.20.0+
- A Channel Profile (other than "All")

## Installation
1. Log in to Dispatcharr's web UI.
2. Navigate to **Plugins**.
3. Click **Import Plugin** and upload the plugin zip file.
4. Enable the plugin after installation.

## Updating the Plugin
To update Stream-Mapparr from a previous version:
1. Navigate to **Plugins** in Dispatcharr
2. Click **Import Plugin** and upload the plugin zip
3. Enable the plugin

### 1. Remove Old Version
1. Navigate to **Plugins** in Dispatcharr.
2. Click the trash icon next to the old Stream Mapparr plugin.
3. Confirm deletion.

### 2. Restart Dispatcharr
1. Log out of Dispatcharr.
2. Restart the Docker container:
```bash
docker restart dispatcharr
```

### 3. Install New Version
1. Log back into Dispatcharr.
2. Navigate to **Plugins**.
3. Click **Import Plugin** and upload the new plugin zip file.
4. Enable the plugin after installation.

### 4. Verify Installation
1. Check that the new version number appears in the plugin list.
2. Reconfigure your settings if needed.
3. Run **Load/Process Channels** to test the update.

## Operations & Monitoring Guide

Because this plugin processes potentially thousands of streams, operations can take 5-15+ minutes.

### How to Run an Action
1. Open a terminal connected to your server.
2. Run `docker logs -f dispatcharr` to watch the logs.
3. In the browser, click an action button (e.g., "Add Stream(s) to Channels").
4. You will see a notification: **"✅ Operation started in background"**.
5. **Wait.** Do not close the browser or restart the container.
6. Watch the terminal logs for updates like:
   * `[Stream-Mapparr] Progress: 20%...`
   * `[Stream-Mapparr] Progress: 50%...`
7. The operation is finished when you see:
   * `[Stream-Mapparr] ✅ ADD STREAMS TO CHANNELS COMPLETED`

### Why can't I see progress in the UI?
Dispatcharr's plugin system currently supports synchronous HTTP responses. Because we moved to asynchronous background threads (to fix timeouts), the frontend receives the "Started" signal but cannot listen for the "Completed" signal without core modifications to Dispatcharr. We prioritize **successful completion** over UI feedback.

## Scheduling (New in v0.6.0)

You can now schedule Stream-Mapparr to run automatically without setting up external Celery tasks.

1. Go to **Plugin Settings**.
2. **Timezone**: Select your local timezone (e.g., `US/Central`, `Europe/London`).
3. **Scheduled Run Times**: Enter times in 24-hour format, comma-separated (e.g., `0400, 1600`).
   * Example: `0400` runs at 4:00 AM.
   * Example: `0400,1600` runs at 4:00 AM and 4:00 PM.
4. **Enable CSV Export**: Check this to generate a report every time the scheduler runs.
5. Click **Update Schedule**.

*Note: The scheduler runs in a background thread. If you restart the Dispatcharr container, the scheduler restarts automatically.*
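The HHMM schedule format above can be sketched as a small parser; `parse_run_times` is a hypothetical helper for illustration, not part of the plugin's API:

```python
from datetime import time


def parse_run_times(raw):
    """Parse comma-separated HHMM strings (e.g. '0400,1600') into time objects.

    Hypothetical helper illustrating the schedule format; not the plugin's code.
    """
    times = []
    for token in raw.split(","):
        token = token.strip()
        if len(token) != 4 or not token.isdigit():
            raise ValueError(f"Expected HHMM, got {token!r}")
        # First two digits are the hour, last two the minute.
        times.append(time(hour=int(token[:2]), minute=int(token[2:])))
    return times
```

Both `0400,1600` and `0400, 1600` (with a space) parse the same way, matching the examples above.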

## Settings Reference
## Settings

| Setting | Type | Default | Description |
|:---|:---|:---|:---|
| **Overwrite Existing Streams** | `boolean` | True | If enabled, removes all existing streams and replaces with matched streams. |
| **Fuzzy Match Threshold** | `number` | 85 | Minimum similarity score (0-100). Higher = stricter matching. |
| **Dispatcharr URL/Creds** | `string` | - | Connection details for the API. |
| **Profile Name** | `string` | - | Name of the Channel Profile to process. |
| **Channel Groups** | `string` | - | Specific groups to process (empty = all). |
| **Rate Limiting** *(v0.6.0)* | `select` | Medium | Controls API speed. Use 'High' if experiencing timeouts/errors. |
| **Timezone** *(v0.6.0)* | `select` | US/Central | Timezone for scheduled runs. |
| **Scheduled Run Times** | `string` | - | Times to run automatically (HHMM format). |
| **Visible Channel Limit** | `number` | 1 | How many duplicate channels to enable per group. |
| **Ignore Tags** | `string` | - | Tags to strip before matching (e.g., `[Dead], (Backup)`). |
| **Overwrite Existing Streams** | boolean | True | Replace existing streams vs append-only |
| **Match Sensitivity** | select | Normal (80) | Relaxed (70), Normal (80), Strict (90), Exact (95) |
| **Channel Profile** | select | - | Profile to process channels from (dropdown from DB) |
| **Channel Groups** | string | (all) | Specific groups to process, comma-separated |
| **Stream Groups** | string | (all) | Specific stream groups to use, comma-separated |
| **M3U Sources** | string | (all) | Specific M3U sources, comma-separated (order = priority) |
| **Prioritize Quality** | boolean | False | Sort by quality first, then M3U source priority |
| **Custom Ignore Tags** | string | (none) | Tags to strip before matching (e.g., `[Dead], (Backup)`) |
| **Tag Handling** | select | Strip All | Strip All / Keep Regional / Keep All |
| **Channel Database** | select | US | Channel database for callsign and name matching |
| **Visible Channel Limit** | number | 1 | Channels per group to enable and assign streams |
| **Rate Limiting** | select | None | None / Low / Medium / High |
| **Timezone** | select | US/Central | Timezone for scheduled runs |
| **Filter Dead Streams** | boolean | False | Skip 0x0 resolution streams (requires IPTV Checker) |
| **Scheduled Run Times** | string | (none) | HHMM times, comma-separated (e.g., `0400,1600`) |
| **Dry Run Mode** | boolean | False | Preview without making database changes |

## Channel Databases
## Actions

Stream-Mapparr uses `*_channels.json` files to improve OTA and cable channel matching.

| Action | Description |
|:---|:---|
| **Validate Settings** | Check configuration, profiles, groups, databases |
| **Load/Process Channels** | Load channel and stream data from database |
| **Preview Changes** | Dry-run with CSV export |
| **Match & Assign Streams** | Fuzzy match and assign streams to channels |
| **Match US OTA Only** | Match US broadcast channels by callsign |
| **Sort Alternate Streams** | Re-sort existing streams by quality |
| **Manage Channel Visibility** | Enable/disable channels based on stream count |
| **Clear CSV Exports** | Delete all plugin CSV files |

1. Navigate to **Settings**.
2. Scroll to **Channel Databases**.
3. Check the boxes for the countries you want to enable (e.g., `Enable US`, `Enable UK`).
4. If you need a custom database, create a JSON file (e.g., `CA_channels.json`) and place it in `/data/plugins/stream_mapparr/`.
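The expected file shape can be inferred from the loader shown in this compare's fuzzy_matcher changes: a top-level `channels` list whose entries carry `type`, `callsign` (for broadcast/OTA entries), and `channel_name` (for premium entries). A minimal sketch of a custom database, with illustrative values only:

```json
{
  "channels": [
    {
      "type": "Broadcast (OTA)",
      "callsign": "CBLT-DT",
      "channel_name": "CBC Toronto"
    },
    {
      "type": "Premium",
      "channel_name": "TSN 1"
    }
  ]
}
```

The callsigns and names above are hypothetical examples, not entries from the shipped databases.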

## Scheduling

## CSV Reports & Recommendations
1. Set **Timezone** (e.g., `US/Central`)
2. Set **Scheduled Run Times** in 24-hour format (e.g., `0400,1600` for 4 AM and 4 PM)
3. Enable **CSV Export** if desired
4. Click **Update Schedule**

When running **Preview Changes** or **Add Streams**, the plugin generates a CSV file in `/data/exports/`.
The scheduler runs in a background thread and restarts automatically with the container.

**New in v0.6.0**: Open the CSV file in a text editor or spreadsheet. The header contains detailed analysis:
* **Threshold Recommendations**: "3 additional streams available at lower thresholds. Consider lowering to 75."
* **Token Mismatches**: "Channel 'BBC One' vs Stream 'UK BBC One'. Mismatched token: 'UK'. Add 'UK' to Ignore Tags."
## CSV Reports

Preview and scheduled exports are saved to `/data/exports/`. Reports include:
- Threshold recommendations ("3 additional streams available at lower thresholds")
- Token mismatch analysis ("Add 'UK' to Ignore Tags")
- Match type breakdown (exact, substring, fuzzy)

## Troubleshooting

**Buttons are clickable but nothing happens?**
Check the Docker logs. If an operation is already running, the logs will say: `❌ Cannot start... Another operation is already running`. Wait for the current operation to finish (lock expires after 10 mins).
**Operation seems stuck?**
Check Docker logs. If another operation is running, wait for completion (lock auto-expires after 10 min) or click **Clear Operation Lock**.

**"API token expired" in logs**
The plugin now automatically attempts to refresh tokens. If it fails, check your Username/Password in settings.
**No matches found?**
- Lower Match Sensitivity from Strict to Normal or Relaxed
- For US OTA channels, use **Match US OTA Only** instead of fuzzy matching
- Check that the correct Channel Database is selected

**System/API is slow during scanning**
Change the **Rate Limiting** setting to "High (Slow)". This adds a delay between API calls to reduce load on your server.
**System slow during scanning?**
Set Rate Limiting to Medium or High.

**How to stop a running operation?**
Restart the Dispatcharr container: `docker restart dispatcharr`.

**Cleaning up old tasks**
If you upgraded from an older version, run the **"Cleanup Orphaned Tasks"** action to remove old Celery schedules that might conflict with the new internal scheduler.

## Debugging Commands
```bash
# Monitor plugin activity (The most important command!)
docker logs -f dispatcharr | grep "Stream-Mapparr"
# Monitor plugin activity
docker logs -f dispatcharr | grep Stream-Mapparr

# Check generated CSVs
# Check CSV exports
docker exec dispatcharr ls -lh /data/exports/

# Check plugin files
docker exec dispatcharr ls -la /data/plugins/stream_mapparr/
```

```bash
docker logs -f dispatcharr | grep "Stream-Mapparr"
docker exec dispatcharr ls -la /data/plugins/stream-mapparr/
```

## Changelog

See [CHANGELOG.md](Stream-Mapparr/CHANGELOG.md) for full version history.

## License

MIT

Stream-Mapparr/CHANGELOG.md (new file): 223 additions
@@ -0,0 +1,223 @@

# Stream-Mapparr CHANGELOG

## v0.9.0 (April 4, 2026)
**Type**: Performance & UI Enhancement Release

### Performance Optimizations (ported from Linearr plugin)

**Levenshtein Acceleration**:
- Uses `rapidfuzz` C extension when available (20-50x faster)
- Pure Python fallback with early termination via new `threshold` parameter
- Combined effect: matching 94 channels x 3,362 streams in ~2s (was ~5 minutes)
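The pure-Python early-termination fallback described above can be sketched as a row-wise dynamic program that stops as soon as every entry in the current row exceeds the threshold; this is an illustrative implementation, not the plugin's `fuzzy_matcher.py` code:

```python
def levenshtein(a, b, threshold=None):
    """Levenshtein distance with optional early termination.

    Each DP row holds distances that can only grow or shrink by 1 per step,
    so if every entry in the current row exceeds `threshold`, the final
    distance must too; we stop early and report threshold + 1.
    Sketch of the approach, not the plugin's exact code.
    """
    if a == b:
        return 0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),    # substitution
            ))
        if threshold is not None and min(curr) > threshold:
            return threshold + 1  # distance provably exceeds threshold
        prev = curr
    return prev[-1]
```

With `threshold` set, hopeless comparisons bail out without completing the full matrix, which is where the batch-matching speedup comes from.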

**Normalization Cache**:
- Added `precompute_normalizations()` to cache stream name normalization once before matching loops
- `fuzzy_match()` and `find_best_match()` use cached results via `_get_cached_norm()` / `_get_cached_processed()`
- Eliminates redundant `normalize_name()` calls across all 3 matching stages
- Cache fallback uses stored ignore flags for consistency
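The precompute-then-lookup pattern behind `precompute_normalizations()` can be illustrated with a minimal cache; the class and method names here are hypothetical stand-ins for the plugin's internals:

```python
class NormCache:
    """Memoize an expensive normalize step across repeated matching passes.

    Sketch of the precompute-then-lookup pattern; the real plugin's
    normalize_name() and cache keys differ.
    """

    def __init__(self, normalize):
        self._normalize = normalize
        self._cache = {}

    def precompute(self, names):
        # Normalize every name once, before any matching loop runs.
        for name in names:
            self._cache[name] = self._normalize(name)

    def get(self, name):
        # Fall back to computing on demand for names not precomputed.
        if name not in self._cache:
            self._cache[name] = self._normalize(name)
        return self._cache[name]
```

With N channels matched against M streams in three stages, this turns roughly 3·N·M normalize calls into N + M.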

**ETA Calculation**:
- Updated `ESTIMATED_SECONDS_PER_ITEM` from 7.73s to 0.1s to reflect actual performance

### UI Simplification

**Profile Name**: Free-text input replaced with dynamic dropdown populated from database

**Match Sensitivity**: Numeric threshold (0-100) replaced with named presets:
- Relaxed (70), Normal (80), Strict (90), Exact (95)

**Tag Handling**: Four separate boolean toggles consolidated into single dropdown:
- Strip All Tags (default), Keep Regional Tags, Keep All Tags

**Channel Database**: Per-country boolean toggles consolidated into single dropdown:
- None, individual country, or All databases

All changes are backward compatible — legacy field IDs still work as fallbacks.

### Files Modified
- `fuzzy_matcher.py` v26.095.0100: Cache system, rapidfuzz support, early termination
- `plugin.py` v0.9.0: Precompute calls, UI fields, settings resolvers
- `plugin.json` v0.9.0: Updated field definitions

### Version Compatibility
| Plugin Version | Required fuzzy_matcher |
|---------------|------------------------|
| 0.9.0 | 26.095.0100+ |
| 0.8.0b | 26.018.0100+ |
| 0.7.4a | 26.018.0100+ |

---

## v0.8.0b (March 11, 2026)
**Type**: Bugfix Release
**Severity**: HIGH (ORM Migration)

### Bug Fixed: Invalid `group_title` Field on Stream Model

**Issue**: After migrating from HTTP API to Django ORM in v0.8.0a, the plugin used `group_title` as a field name on the Stream model. This field does not exist — the correct field is `channel_group` (a ForeignKey to `ChannelGroup`). Any action that loads streams (Add Streams, Preview Changes, Load/Process Channels) would fail with:
```
Cannot resolve keyword 'group_title' into field.
```

**Root Cause**: During the ORM migration, the old API response field name `group_title` was carried over into ORM `.values()` queries, but the Django model uses `channel_group` (FK) instead.

**Fix**: Replaced `group_title` with `channel_group__name` (Django FK traversal) in two locations:
- `_get_all_streams()`: Stream data query
- `_get_stream_groups()`: Distinct stream group name query

**Files Modified**:
- `plugin.py` v0.8.0b: Fixed ORM field references
- `plugin.json` v0.8.0b: Version bump

---

## v0.7.4a (January 18, 2026)
**Type**: Critical Bugfix Release
**Severity**: HIGH (Stream Matching)

### Bug Fixed: 4K/8K Quality Tags Not Removed During Normalization

**Issue**: Streams with "4K" or "8K" quality suffixes were not matching correctly because the space normalization step was splitting "4K" into "4 K" before quality patterns could remove it.

**Example**:
- Stream: `┃NL┃ RTL 4 4K`
- Expected: Tag removed → "RTL 4" → matches channel
- Actual: "4K" split to "4 K" → patterns fail → "RTL 4 4 K" → no match

**Root Cause**: The digit-to-letter space normalization (`re.sub(r'(\d)([a-zA-Z])', r'\1 \2', name)`) transformed "4K" into "4 K" before quality patterns could match and remove "4K".

**Pattern Observed**:
| Quality Suffix | Affected? | Reason |
|---------------|-----------|--------|
| HD, SD, FHD, UHD | No | All letters, not split |
| 4K, 8K | **Yes** | Digit+letter split to "4 K", "8 K" |

**Fix**: Quality patterns are now applied BEFORE space normalization to prevent "4K"/"8K" from being broken.
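The ordering fix can be illustrated with a simplified pattern (this is not the plugin's full `QUALITY_PATTERNS` list):

```python
import re

# Simplified trailing quality tag pattern for illustration only.
QUALITY = re.compile(r'\s*\b(4K|8K|UHD|FHD|HD|SD)\b\s*$', re.IGNORECASE)


def normalize(name):
    """Strip a trailing quality tag BEFORE digit/letter spacing.

    Running the quality pattern first means "RTL 4 4K" becomes "RTL 4"
    intact; if the spacing step ran first, "4K" would become "4 K" and
    the quality pattern would no longer match.
    """
    name = QUALITY.sub('', name)                       # "RTL 4 4K" -> "RTL 4"
    name = re.sub(r'(\d)([a-zA-Z])', r'\1 \2', name)   # digit-to-letter spacing
    return name.strip()
```

Swapping the two `re.sub` calls reproduces the bug described above.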

**Files Modified**:
- `fuzzy_matcher.py` v26.018.0100: Moved quality pattern removal before space normalization
- `plugin.py` v0.7.4a: Updated version and minimum fuzzy_matcher requirement

---

## v0.7.3c (December 23, 2025)
**Type**: Critical Bugfix Release
**Severity**: HIGH (Unicode tag users)

### Bug Fixed: Custom Ignore Tags with Unicode Characters Not Working

**Issue**: Custom ignore tags containing Unicode or special characters (like `┃NLZIET┃`) were completely ignored during normalization, causing all channels to fail matching.

**Root Cause**: Code used regex word boundaries (`\b`) for all custom tags. Word boundaries only work with alphanumeric characters. Unicode characters like `┃` (U+2503) are not word characters.

**Fix**: Smart tag detection - only use word boundaries for pure alphanumeric tags, use literal matching for Unicode/special character tags.
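A sketch of the smart tag detection, using simplified helpers rather than the plugin's exact code:

```python
import re


def tag_pattern(tag):
    """Build a removal pattern for a custom ignore tag.

    Pure alphanumeric tags get word boundaries (so "West" won't eat
    "Western"); tags with Unicode or special characters use literal
    matching, since \\b only anchors next to word characters.
    Sketch of the fix described above.
    """
    escaped = re.escape(tag)
    if tag.isalnum():
        return re.compile(rf'\b{escaped}\b', re.IGNORECASE)
    return re.compile(escaped, re.IGNORECASE)


def strip_tags(name, tags):
    for tag in tags:
        name = tag_pattern(tag).sub('', name)
    return ' '.join(name.split())  # collapse leftover whitespace
```

A tag like `┃NLZIET┃` fails `isalnum()` and falls through to literal matching, so it is now removed correctly.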

---

## v0.7.4 (December 22, 2025)
**Type**: Critical Bugfix Release
**Severity**: HIGH (Matching Accuracy)

### Bug #1 Fixed: Substring Matching Too Permissive

**Issue**: "Story" matched "HISTORY" at threshold 80 because substring matching didn't validate semantic similarity.

**Fix**: Added 75% length ratio requirement for substring matches.
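A sketch of the length-ratio guard, applied to names that have already been normalized (quality tags stripped); `substring_match_ok` is a hypothetical helper, not the plugin's function name:

```python
def substring_match_ok(channel, stream, min_ratio=0.75):
    """Accept a substring match only if the two names are comparable in length.

    "story" is inside "history", but 5/7 is about 0.71, below the 0.75
    cutoff, so the match is rejected. Sketch of the guard described above.
    """
    a, b = channel.lower(), stream.lower()
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    return shorter in longer and len(shorter) / len(longer) >= min_ratio
```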

### Bug #2 Fixed: Regional Tags Stripped Despite Setting

**Issue**: `Ignore Regional Tags: False` didn't work - "(WEST)" was still being removed by MISC_PATTERNS and callsign patterns.

**Fix**: Conditional MISC_PATTERNS application and callsign pattern with negative lookahead for regional indicators.

---

## v0.7.3 (December 21, 2025)
**Type**: Enhancement Release

### Added FuzzyMatcher Version to CSV Headers

CSV exports now show both plugin and fuzzy_matcher versions for better troubleshooting:
```csv
# Stream-Mapparr Export v0.7.3
# FuzzyMatcher Version: 25.354.1835
```

---

## v0.7.2 (December 20, 2025)
**Type**: Bugfix Release

### Fixed: Incomplete Regional Patterns

Updated fuzzy_matcher dependency to include all 6 US timezone regional indicators (East, West, Pacific, Central, Mountain, Atlantic) instead of just "East".

---

## v0.6.x Series

### v0.6.17 - M3U Source Prioritization
Added M3U source priority ordering for stream sorting.

### v0.6.16 - Channel Loading Fix
Fixed channel loading issues with profile filtering.

### v0.6.15 - Smart Stream Sorting
Implemented quality-based stream sorting using stream_stats (resolution + FPS).

### v0.6.14 - CSV Headers Enhancement
Added comprehensive CSV headers with action name, execution mode, and settings.

### v0.6.13 - Channel Groups Filter Fix
Fixed Sort Alternate Streams ignoring channel groups filter setting.

### v0.6.12 - Sort Streams Fix
Critical fix for Sort Alternate Streams action using wrong API endpoint.

### v0.6.11 - Dry Run Mode & Sort Streams
Added dry run mode toggle, Sort Alternate Streams action, flexible scheduled task configuration.

### v0.6.10 - Lock Detection Enhancement
Added Stream model import, enhanced lock detection, manual lock clear action.

### v0.6.9 - IPTV Checker Integration
Filter dead streams (0x0 resolution) and optional scheduler coordination.

### v0.6.8 - Quality-Based Stream Ordering
Automatic quality-based stream ordering when assigning streams.

### v0.6.7 - Deduplication & Decade Fix
Stream deduplication, decade number preservation ("70s" not matching "90s"), plus sign handling.

### v0.6.3 - Numbered Channel Fix
Fixed false positive matches for numbered channels (Premier Sports 1 vs 2).

### v0.6.2 - Token Matching Fix
Fixed Sky Cinema channels matching incorrect streams.

### v0.6.0 - Major Refactor
Replaced Celery Beat with background threading scheduler, operation lock system, WebSocket notifications, centralized configuration.

---

## Upgrade Instructions

**For v0.7.4a**:
1. Replace `plugin.py` with v0.7.4a
2. Replace `fuzzy_matcher.py` with v26.018.0100
3. Restart Dispatcharr container
4. Re-run "Match & Assign Streams"

**IMPORTANT**: Both files must be updated together!

---

## Version Compatibility

| Plugin Version | Required fuzzy_matcher |
|---------------|------------------------|
| 0.7.4a | 26.018.0100+ |
| 0.7.3c | 25.358.0200+ |
| 0.7.4 | 25.356.0230+ |
| 0.7.3 | 25.354.1835+ |
| 0.7.2 | 25.354.1835+ |
@@ -11,8 +11,15 @@ import logging
import unicodedata
from glob import glob

# Optional C-accelerated Levenshtein (20-50x faster when available)
try:
    from rapidfuzz.distance import Levenshtein as _rf_lev
    _USE_RAPIDFUZZ = True
except ImportError:
    _USE_RAPIDFUZZ = False

# Version: YY.DDD.HHMM (Julian date format: Year.DayOfYear.Time)
__version__ = "25.358.0200"
__version__ = "26.095.0100"

# Setup logging
LOGGER = logging.getLogger("plugins.fuzzy_matcher")
@@ -44,31 +51,20 @@ QUALITY_PATTERNS = [
]

# Regional indicator patterns: East, West, Pacific, Central, Mountain, Atlantic
# All patterns are applied with re.IGNORECASE, so no need to spell out both cases.
REGIONAL_PATTERNS = [
    # Regional: " East" or " east" (word with space prefix)
    r'\s[Ee][Aa][Ss][Tt]',
    # Regional: " West" or " west" (word with space prefix)
    r'\s[Ww][Ee][Ss][Tt]',
    # Regional: " Pacific" or " pacific" (word with space prefix)
    r'\s[Pp][Aa][Cc][Ii][Ff][Ii][Cc]',
    # Regional: " Central" or " central" (word with space prefix)
    r'\s[Cc][Ee][Nn][Tt][Rr][Aa][Ll]',
    # Regional: " Mountain" or " mountain" (word with space prefix)
    r'\s[Mm][Oo][Uu][Nn][Tt][Aa][Ii][Nn]',
    # Regional: " Atlantic" or " atlantic" (word with space prefix)
    r'\s[Aa][Tt][Ll][Aa][Nn][Tt][Ii][Cc]',
    # Regional: (East) or (EAST) (parenthesized format)
    r'\s*\([Ee][Aa][Ss][Tt]\)\s*',
    # Regional: (West) or (WEST) (parenthesized format)
    r'\s*\([Ww][Ee][Ss][Tt]\)\s*',
    # Regional: (Pacific) or (PACIFIC) (parenthesized format)
    r'\s*\([Pp][Aa][Cc][Ii][Ff][Ii][Cc]\)\s*',
    # Regional: (Central) or (CENTRAL) (parenthesized format)
    r'\s*\([Cc][Ee][Nn][Tt][Rr][Aa][Ll]\)\s*',
    # Regional: (Mountain) or (MOUNTAIN) (parenthesized format)
    r'\s*\([Mm][Oo][Uu][Nn][Tt][Aa][Ii][Nn]\)\s*',
    # Regional: (Atlantic) or (ATLANTIC) (parenthesized format)
    r'\s*\([Aa][Tt][Ll][Aa][Nn][Tt][Ii][Cc]\)\s*',
    r'\sEast',
    r'\sWest',
    r'\sPacific',
    r'\sCentral',
    r'\sMountain',
    r'\sAtlantic',
    r'\s*\(East\)\s*',
    r'\s*\(West\)\s*',
    r'\s*\(Pacific\)\s*',
    r'\s*\(Central\)\s*',
    r'\s*\(Mountain\)\s*',
    r'\s*\(Atlantic\)\s*',
]

# Geographic prefix patterns: US:, USA:, etc.
@@ -124,10 +120,61 @@ class FuzzyMatcher:
         self.channel_lookup = {}  # Callsign -> channel data mapping
         self.country_codes = None  # Track which country databases are currently loaded
 
+        # Normalization cache for performance (avoids redundant normalize_name calls)
+        self._norm_cache = {}  # raw_name -> normalized_lower
+        self._norm_nospace_cache = {}  # raw_name -> normalized with spaces/&/- removed
+        self._processed_cache = {}  # raw_name -> process_string_for_matching result
+        self._cached_ignore_tags = None  # user_ignored_tags used during precompute
+        self._cached_flags = {}  # ignore_quality/regional/geographic/misc used during precompute
+
         # Load all channel databases if plugin_dir is provided
         if self.plugin_dir:
             self._load_channel_databases()
 
+    def _parse_channel_file(self, channel_file):
+        """Parse a single *_channels.json file and append entries to instance collections.
+
+        Returns:
+            Tuple of (broadcast_count, premium_count) for the file, or (0, 0) on error.
+        """
+        try:
+            with open(channel_file, 'r', encoding='utf-8') as f:
+                data = json.load(f)
+            channels_list = data.get('channels', []) if isinstance(data, dict) else data
+
+            broadcast_count = 0
+            premium_count = 0
+
+            for channel in channels_list:
+                channel_type = channel.get('type', '').lower()
+
+                if 'broadcast' in channel_type or channel_type == 'broadcast (ota)':
+                    self.broadcast_channels.append(channel)
+                    broadcast_count += 1
+
+                    callsign = channel.get('callsign', '').strip()
+                    if callsign:
+                        self.channel_lookup[callsign] = channel
+                        base_callsign = re.sub(r'-(?:TV|CD|LP|DT|LD)$', '', callsign)
+                        if base_callsign != callsign:
+                            self.channel_lookup[base_callsign] = channel
+                else:
+                    channel_name = channel.get('channel_name', '').strip()
+                    if channel_name:
+                        self.premium_channels.append(channel_name)
+                        self.premium_channels_full.append(channel)
+                        premium_count += 1
+
+            self.logger.info(
+                f"Loaded from {os.path.basename(channel_file)}: "
+                f"{broadcast_count} broadcast, {premium_count} premium channels"
+            )
+            return broadcast_count, premium_count
+
+        except Exception as e:
+            self.logger.error(f"Error loading {channel_file}: {e}")
+            return 0, 0
+
     def _load_channel_databases(self):
         """Load all *_channels.json files from the plugin directory."""
         pattern = os.path.join(self.plugin_dir, "*_channels.json")
@@ -141,49 +188,10 @@ class FuzzyMatcher:
 
         total_broadcast = 0
         total_premium = 0
 
         for channel_file in channel_files:
-            try:
-                with open(channel_file, 'r', encoding='utf-8') as f:
-                    data = json.load(f)
-                # Extract the channels array from the JSON structure
-                channels_list = data.get('channels', []) if isinstance(data, dict) else data
-
-                file_broadcast = 0
-                file_premium = 0
-
-                for channel in channels_list:
-                    channel_type = channel.get('type', '').lower()
-
-                    if 'broadcast' in channel_type or channel_type == 'broadcast (ota)':
-                        # Broadcast channel with callsign
-                        self.broadcast_channels.append(channel)
-                        file_broadcast += 1
-
-                        # Create lookup by callsign
-                        callsign = channel.get('callsign', '').strip()
-                        if callsign:
-                            self.channel_lookup[callsign] = channel
-
-                            # Also store base callsign without suffix for easier matching
-                            base_callsign = re.sub(r'-(?:TV|CD|LP|DT|LD)$', '', callsign)
-                            if base_callsign != callsign:
-                                self.channel_lookup[base_callsign] = channel
-                    else:
-                        # Premium/cable/national channel
-                        channel_name = channel.get('channel_name', '').strip()
-                        if channel_name:
-                            self.premium_channels.append(channel_name)
-                            self.premium_channels_full.append(channel)
-                            file_premium += 1
-
-                total_broadcast += file_broadcast
-                total_premium += file_premium
-
-                self.logger.info(f"Loaded from {os.path.basename(channel_file)}: {file_broadcast} broadcast, {file_premium} premium channels")
-
-            except Exception as e:
-                self.logger.error(f"Error loading {channel_file}: {e}")
+            b, p = self._parse_channel_file(channel_file)
+            total_broadcast += b
+            total_premium += p
 
         self.logger.info(f"Total channels loaded: {total_broadcast} broadcast, {total_premium} premium")
         return True
@@ -204,13 +212,9 @@ class FuzzyMatcher:
         self.premium_channels = []
         self.premium_channels_full = []
         self.channel_lookup = {}
 
         # Update country_codes tracking
         self.country_codes = country_codes
 
         # Determine which files to load
         if country_codes:
             # Load only specified country databases
             channel_files = []
             for code in country_codes:
                 file_path = os.path.join(self.plugin_dir, f"{code}_channels.json")
@@ -219,7 +223,6 @@ class FuzzyMatcher:
                 else:
                     self.logger.warning(f"Channel database not found: {code}_channels.json")
         else:
             # Load all available databases
             pattern = os.path.join(self.plugin_dir, "*_channels.json")
             channel_files = glob(pattern)
 
@@ -231,49 +234,10 @@ class FuzzyMatcher:
 
         total_broadcast = 0
         total_premium = 0
 
         for channel_file in channel_files:
-            try:
-                with open(channel_file, 'r', encoding='utf-8') as f:
-                    data = json.load(f)
-                # Extract the channels array from the JSON structure
-                channels_list = data.get('channels', []) if isinstance(data, dict) else data
-
-                file_broadcast = 0
-                file_premium = 0
-
-                for channel in channels_list:
-                    channel_type = channel.get('type', '').lower()
-
-                    if 'broadcast' in channel_type or channel_type == 'broadcast (ota)':
-                        # Broadcast channel with callsign
-                        self.broadcast_channels.append(channel)
-                        file_broadcast += 1
-
-                        # Create lookup by callsign
-                        callsign = channel.get('callsign', '').strip()
-                        if callsign:
-                            self.channel_lookup[callsign] = channel
-
-                            # Also store base callsign without suffix for easier matching
-                            base_callsign = re.sub(r'-(?:TV|CD|LP|DT|LD)$', '', callsign)
-                            if base_callsign != callsign:
-                                self.channel_lookup[base_callsign] = channel
-                    else:
-                        # Premium/cable/national channel
-                        channel_name = channel.get('channel_name', '').strip()
-                        if channel_name:
-                            self.premium_channels.append(channel_name)
-                            self.premium_channels_full.append(channel)
-                            file_premium += 1
-
-                total_broadcast += file_broadcast
-                total_premium += file_premium
-
-                self.logger.info(f"Loaded from {os.path.basename(channel_file)}: {file_broadcast} broadcast, {file_premium} premium channels")
-
-            except Exception as e:
-                self.logger.error(f"Error loading {channel_file}: {e}")
+            b, p = self._parse_channel_file(channel_file)
+            total_broadcast += b
+            total_premium += p
 
         self.logger.info(f"Total channels loaded: {total_broadcast} broadcast, {total_premium} premium")
         return True
@@ -322,6 +286,61 @@ class FuzzyMatcher:
         callsign = re.sub(r'-(?:TV|CD|LP|DT|LD)$', '', callsign)
         return callsign
 
+    def precompute_normalizations(self, names, user_ignored_tags=None,
+                                  ignore_quality=True, ignore_regional=True,
+                                  ignore_geographic=True, ignore_misc=True):
+        """
+        Pre-normalize a list of names and cache the results.
+        Call this once before matching loops to avoid redundant normalization
+        when matching many channels against the same stream list.
+        Flags must match the flags passed to fuzzy_match() for correct results.
+        """
+        self._norm_cache.clear()
+        self._norm_nospace_cache.clear()
+        self._processed_cache.clear()
+        self._cached_ignore_tags = user_ignored_tags
+        self._cached_flags = {
+            'ignore_quality': ignore_quality,
+            'ignore_regional': ignore_regional,
+            'ignore_geographic': ignore_geographic,
+            'ignore_misc': ignore_misc,
+        }
+
+        for name in names:
+            norm = self.normalize_name(name, user_ignored_tags,
+                                       ignore_quality=ignore_quality,
+                                       ignore_regional=ignore_regional,
+                                       ignore_geographic=ignore_geographic,
+                                       ignore_misc=ignore_misc)
+            if norm and len(norm) >= 2:
+                norm_lower = norm.lower()
+                self._norm_cache[name] = norm_lower
+                self._norm_nospace_cache[name] = re.sub(r'[\s&\-]+', '', norm_lower)
+                self._processed_cache[name] = self.process_string_for_matching(norm)
+
+        self.logger.info(f"Pre-normalized {len(self._norm_cache)} stream names (from {len(names)} total)")
+
+    def _get_cached_norm(self, name, user_ignored_tags=None):
+        """Get cached normalization or compute on the fly using stored flags."""
+        if name in self._norm_cache:
+            return self._norm_cache[name], self._norm_nospace_cache[name]
+        tags = user_ignored_tags if user_ignored_tags is not None else self._cached_ignore_tags
+        norm = self.normalize_name(name, tags, **self._cached_flags)
+        if not norm or len(norm) < 2:
+            return None, None
+        norm_lower = norm.lower()
+        return norm_lower, re.sub(r'[\s&\-]+', '', norm_lower)
+
+    def _get_cached_processed(self, name, user_ignored_tags=None):
+        """Get cached processed string or compute on the fly using stored flags."""
+        if name in self._processed_cache:
+            return self._processed_cache[name]
+        tags = user_ignored_tags if user_ignored_tags is not None else self._cached_ignore_tags
+        norm = self.normalize_name(name, tags, **self._cached_flags)
+        if not norm or len(norm) < 2:
+            return None
+        return self.process_string_for_matching(norm)
+
     def normalize_name(self, name, user_ignored_tags=None, ignore_quality=True, ignore_regional=True,
                        ignore_geographic=True, ignore_misc=True, remove_cinemax=False, remove_country_prefix=False):
         """
@@ -346,17 +365,26 @@ class FuzzyMatcher:
         # Store original for logging
         original_name = name
 
-        # CRITICAL FIX: Normalize spacing around numbers FIRST, before any other processing
+        # CRITICAL FIX (v25.019.0100): Apply quality patterns FIRST, before space normalization
+        # This prevents space normalization from breaking quality tags like "4K" -> "4 K"
+        # which would then fail to match quality patterns looking for "4K"
+        # Bug: Streams with "4K" suffix were not matching because "4K" was split to "4 K"
+        # by the space normalization step, then quality patterns couldn't find "4K" at end
+        if ignore_quality:
+            for pattern in QUALITY_PATTERNS:
+                name = re.sub(pattern, '', name, flags=re.IGNORECASE)
+
+        # Normalize spacing around numbers (AFTER quality patterns are removed)
         # This ensures "ITV1" and "ITV 1" are treated identically during matching
         # Pattern: Insert space before number if preceded by letter, and after number if followed by letter
         # Examples: "ITV1" -> "ITV 1", "BBC2" -> "BBC 2", "E4" -> "E 4"
         name = re.sub(r'([a-zA-Z])(\d)', r'\1 \2', name)  # Letter followed by digit
         name = re.sub(r'(\d)([a-zA-Z])', r'\1 \2', name)  # Digit followed by letter
 
-        # CRITICAL FIX: Normalize hyphens to spaces for better token matching
+        # Normalize hyphens to spaces for better token matching
         # This ensures "UK-ITV" becomes "UK ITV" and matches properly
         # Common patterns: "UK-ITV 1", "US-CNN", etc.
-        name = re.sub(r'-', ' ', name)  # Digit followed by letter
+        name = re.sub(r'-', ' ', name)
 
         # Remove ALL leading parenthetical prefixes like (US) (PRIME2), (SP2), (D1), etc.
         # Loop until no more leading parentheses are found
@@ -386,11 +414,10 @@ class FuzzyMatcher:
             name = re.sub(r'\bCinemax\b\s*', '', name, flags=re.IGNORECASE)
 
         # Build list of patterns to apply based on category flags
+        # NOTE: Quality patterns are now applied earlier (before space normalization)
+        # to prevent "4K" from being split to "4 K" before removal
        patterns_to_apply = []
 
-        if ignore_quality:
-            patterns_to_apply.extend(QUALITY_PATTERNS)
-
         if ignore_regional:
             patterns_to_apply.extend(REGIONAL_PATTERNS)
@@ -531,20 +558,40 @@ class FuzzyMatcher:
 
         return regional, extra_tags, quality_tags
 
-    def calculate_similarity(self, str1, str2):
+    def calculate_similarity(self, str1, str2, threshold=None):
         """
         Calculate Levenshtein distance-based similarity ratio between two strings.
 
         Args:
             str1: First string
             str2: Second string
+            threshold: Optional minimum similarity (0.0-1.0). When set, returns 0.0
+                early if the score cannot possibly meet this threshold.
+                Used with rapidfuzz's score_cutoff and for pure-Python early termination.
 
         Returns:
             Similarity ratio between 0.0 and 1.0
         """
+        if len(str1) == 0 or len(str2) == 0:
+            return 0.0
+
+        # Fast path: use C-accelerated rapidfuzz when available
+        if _USE_RAPIDFUZZ:
+            cutoff = threshold if threshold is not None else 0.0
+            return _rf_lev.normalized_similarity(str1, str2, score_cutoff=cutoff)
+
+        # Pure Python fallback with optional early termination
         if len(str1) < len(str2):
             str1, str2 = str2, str1
 
-        # Empty strings should not match anything (including other empty strings)
-        # This prevents false positives when normalization strips everything
-        if len(str2) == 0 or len(str1) == 0:
-            return 0.0
+        total_len = len(str1) + len(str2)
+
+        # Early rejection: if strings differ in length too much, max possible
+        # similarity is bounded. Check before doing the full DP.
+        if threshold is not None:
+            max_possible = (total_len - abs(len(str1) - len(str2))) / total_len
+            if max_possible < threshold:
+                return 0.0
 
         previous_row = list(range(len(str2) + 1))
@@ -555,16 +602,22 @@ class FuzzyMatcher:
                 deletions = current_row[j] + 1
                 substitutions = previous_row[j] + (c1 != c2)
                 current_row.append(min(insertions, deletions, substitutions))
 
+            # Early termination: check if minimum possible distance in this row
+            # already makes it impossible to meet the threshold
+            if threshold is not None:
+                min_distance_so_far = min(current_row)
+                # Best case: remaining chars all match perfectly
+                remaining = len(str1) - i - 1
+                best_possible_distance = max(0, min_distance_so_far - remaining)
+                best_possible_ratio = (total_len - best_possible_distance) / total_len
+                if best_possible_ratio < threshold:
+                    return 0.0
+
             previous_row = current_row
 
         distance = previous_row[-1]
-        total_len = len(str1) + len(str2)
-
-        if total_len == 0:
-            return 1.0
-
-        ratio = (total_len - distance) / total_len
-        return ratio
+        return (total_len - distance) / total_len
 
     def process_string_for_matching(self, s):
         """
@@ -643,20 +696,22 @@ class FuzzyMatcher:
         best_match = None
 
         for candidate in candidate_names:
-            # Normalize candidate (stream name) with Cinemax removal if requested
-            candidate_normalized = self.normalize_name(candidate, user_ignored_tags,
-                                                       ignore_quality=ignore_quality,
-                                                       ignore_regional=ignore_regional,
-                                                       ignore_geographic=ignore_geographic,
-                                                       ignore_misc=ignore_misc,
-                                                       remove_cinemax=remove_cinemax)
+            # Use cached processed string when available
+            processed_candidate = self._get_cached_processed(candidate, user_ignored_tags)
+            if not processed_candidate:
+                # Fallback: normalize and process on the fly
+                candidate_normalized = self.normalize_name(candidate, user_ignored_tags,
+                                                           ignore_quality=ignore_quality,
+                                                           ignore_regional=ignore_regional,
+                                                           ignore_geographic=ignore_geographic,
+                                                           ignore_misc=ignore_misc,
+                                                           remove_cinemax=remove_cinemax)
+                if not candidate_normalized or len(candidate_normalized) < 2:
+                    continue
+                processed_candidate = self.process_string_for_matching(candidate_normalized)
 
-            # Skip candidates that normalize to empty or very short strings
-            if not candidate_normalized or len(candidate_normalized) < 2:
-                continue
-
-            processed_candidate = self.process_string_for_matching(candidate_normalized)
-            score = self.calculate_similarity(processed_query, processed_candidate)
+            score = self.calculate_similarity(processed_query, processed_candidate,
+                                              threshold=self.match_threshold / 100.0)
 
             if score > best_score:
                 best_score = score
@@ -714,28 +769,17 @@ class FuzzyMatcher:
         normalized_query_nospace = re.sub(r'[\s&\-]+', '', normalized_query_lower)
 
         for candidate in candidate_names:
-            # Normalize candidate (stream name) with Cinemax removal if requested
-            candidate_normalized = self.normalize_name(candidate, user_ignored_tags,
-                                                       ignore_quality=ignore_quality,
-                                                       ignore_regional=ignore_regional,
-                                                       ignore_geographic=ignore_geographic,
-                                                       ignore_misc=ignore_misc,
-                                                       remove_cinemax=remove_cinemax)
-
-            # Skip candidates that normalize to empty or very short strings (< 2 chars)
-            # This prevents false positives where multiple streams all normalize to ""
-            if not candidate_normalized or len(candidate_normalized) < 2:
+            # Use cached normalization when available
+            candidate_lower, candidate_nospace = self._get_cached_norm(candidate, user_ignored_tags)
+            if not candidate_lower:
                 continue
 
-            candidate_lower = candidate_normalized.lower()
-            candidate_nospace = re.sub(r'[\s&\-]+', '', candidate_lower)
-
-            # Exact match
+            # Exact match (space/punctuation insensitive)
             if normalized_query_nospace == candidate_nospace:
                 return candidate, 100, "exact"
 
             # Very high similarity (97%+)
-            ratio = self.calculate_similarity(normalized_query_lower, candidate_lower)
+            ratio = self.calculate_similarity(normalized_query_lower, candidate_lower, threshold=0.97)
             if ratio >= 0.97 and ratio > best_ratio:
                 best_match = candidate
                 best_ratio = ratio
@@ -746,30 +790,17 @@ class FuzzyMatcher:
 
         # Stage 2: Substring matching
         for candidate in candidate_names:
-            # Normalize candidate (stream name) with Cinemax removal if requested
-            candidate_normalized = self.normalize_name(candidate, user_ignored_tags,
-                                                       ignore_quality=ignore_quality,
-                                                       ignore_regional=ignore_regional,
-                                                       ignore_geographic=ignore_geographic,
-                                                       ignore_misc=ignore_misc,
-                                                       remove_cinemax=remove_cinemax)
-
-            # Skip candidates that normalize to empty or very short strings
-            if not candidate_normalized or len(candidate_normalized) < 2:
+            # Use cached normalization when available
+            candidate_lower, _ = self._get_cached_norm(candidate, user_ignored_tags)
+            if not candidate_lower:
                 continue
 
-            candidate_lower = candidate_normalized.lower()
-
             # Check if one is a substring of the other
             if normalized_query_lower in candidate_lower or candidate_lower in normalized_query_lower:
                 # CRITICAL FIX: Add length ratio requirement to prevent false positives
                 # like "story" matching "history" (story is 5 chars, history is 7 chars)
                 # Require strings to be within 75% of same length for substring match
                 # This ensures substring matches are semantically meaningful
                 length_ratio = min(len(normalized_query_lower), len(candidate_lower)) / max(len(normalized_query_lower), len(candidate_lower))
                 if length_ratio >= 0.75:
                     # Calculate similarity score
-                    ratio = self.calculate_similarity(normalized_query_lower, candidate_lower)
+                    ratio = self.calculate_similarity(normalized_query_lower, candidate_lower,
+                                                      threshold=self.match_threshold / 100.0)
                     if ratio > best_ratio:
                         best_match = candidate
                         best_ratio = ratio
@@ -779,14 +810,26 @@ class FuzzyMatcher:
             return best_match, int(best_ratio * 100), match_type
 
         # Stage 3: Fuzzy matching with token sorting
-        fuzzy_match, score = self.find_best_match(query_name, candidate_names, user_ignored_tags,
-                                                  remove_cinemax=remove_cinemax,
-                                                  ignore_quality=ignore_quality,
-                                                  ignore_regional=ignore_regional,
-                                                  ignore_geographic=ignore_geographic,
-                                                  ignore_misc=ignore_misc)
-        if fuzzy_match:
-            return fuzzy_match, score, f"fuzzy ({score})"
+        processed_query = self.process_string_for_matching(normalized_query)
+        best_score = -1.0
+        best_fuzzy = None
+        threshold_ratio = self.match_threshold / 100.0
+
+        for candidate in candidate_names:
+            # Use cached processed string when available
+            processed_candidate = self._get_cached_processed(candidate, user_ignored_tags)
+            if not processed_candidate:
+                continue
+
+            score = self.calculate_similarity(processed_query, processed_candidate,
+                                              threshold=threshold_ratio)
+            if score > best_score:
+                best_score = score
+                best_fuzzy = candidate
+
+        percentage_score = int(best_score * 100)
+        if percentage_score >= self.match_threshold and best_fuzzy:
+            return best_fuzzy, percentage_score, f"fuzzy ({percentage_score})"
 
         return None, 0, None
@@ -1,16 +1,45 @@
 {
     "name": "Stream-Mapparr",
     "key": "stream-mapparr",
    "module": "stream_mapparr.plugin",
    "class": "Plugin",
    "version": "0.9.0",
    "description": "Automatically add matching streams to channels based on name similarity and quality precedence. Supports unlimited stream matching, channel visibility management, and CSV export cleanup.",
-    "author": "community",
-    "homepage": "https://github.com/PiratesIRC/Stream-Mapparr",
+    "author": "PiratesIRC",
+    "license": "MIT",
+    "repo_url": "https://github.com/PiratesIRC/Stream-Mapparr",
+    "min_dispatcharr_version": "v0.20.0",
+    "help_url": "https://github.com/PiratesIRC/Stream-Mapparr",
+    "fields": [
+        {"id": "overwrite_streams", "label": "Overwrite Existing Streams", "type": "boolean", "default": true},
+        {"id": "match_sensitivity", "label": "Match Sensitivity", "type": "select", "default": "normal"},
+        {"id": "profile_name", "label": "Channel Profile", "type": "select", "default": ""},
+        {"id": "selected_groups", "label": "Channel Groups", "type": "string", "default": ""},
+        {"id": "selected_stream_groups", "label": "Stream Groups", "type": "string", "default": ""},
+        {"id": "selected_m3us", "label": "M3U Sources", "type": "string", "default": ""},
+        {"id": "prioritize_quality", "label": "Prioritize Quality Before Source", "type": "boolean", "default": false},
+        {"id": "ignore_tags", "label": "Custom Ignore Tags", "type": "string", "default": ""},
+        {"id": "tag_handling", "label": "Tag Handling", "type": "select", "default": "strip_all"},
+        {"id": "channel_database", "label": "Channel Database", "type": "select", "default": "US"},
+        {"id": "visible_channel_limit", "label": "Visible Channel Limit", "type": "number", "default": 1},
+        {"id": "rate_limiting", "label": "Rate Limiting", "type": "select", "default": "none"},
+        {"id": "timezone", "label": "Timezone", "type": "select", "default": "US/Central"},
+        {"id": "filter_dead_streams", "label": "Filter Dead Streams", "type": "boolean", "default": false},
+        {"id": "wait_for_iptv_checker", "label": "Wait for IPTV Checker", "type": "boolean", "default": false},
+        {"id": "iptv_checker_max_wait_hours", "label": "IPTV Checker Max Wait", "type": "number", "default": 6},
+        {"id": "dry_run_mode", "label": "Dry Run Mode", "type": "boolean", "default": false},
+        {"id": "scheduled_times", "label": "Scheduled Run Times", "type": "string", "default": ""},
+        {"id": "scheduled_sort_streams", "label": "Schedule Sort Streams", "type": "boolean", "default": false},
+        {"id": "scheduled_match_streams", "label": "Schedule Match Streams", "type": "boolean", "default": true},
+        {"id": "enable_scheduled_csv_export", "label": "Enable CSV Export", "type": "boolean", "default": true}
+    ],
     "actions": [
-        "load_process_channels",
-        "preview_changes",
-        "add_streams_to_channels",
-        "manage_channel_visibility",
-        "clear_csv_exports"
+        {"id": "validate_settings", "label": "Validate Settings", "description": "Check database connectivity, profiles, groups, and channel databases"},
+        {"id": "update_schedule", "label": "Update Schedule", "description": "Save settings and restart background scheduler"},
+        {"id": "load_process_channels", "label": "Load/Process Channels", "description": "Load channel and stream data from database"},
+        {"id": "preview_changes", "label": "Preview Changes", "description": "Generate CSV preview without making changes"},
+        {"id": "add_streams_to_channels", "label": "Match & Assign Streams", "description": "Match and assign streams to channels"},
+        {"id": "match_us_ota_only", "label": "Match US OTA Only", "description": "Match US OTA channels by callsign"},
+        {"id": "sort_streams", "label": "Sort Alternate Streams", "description": "Sort existing alternate streams by quality"},
+        {"id": "manage_channel_visibility", "label": "Manage Channel Visibility", "description": "Enable/disable channels based on stream count"},
+        {"id": "clear_csv_exports", "label": "Clear CSV Exports", "description": "Delete all plugin CSV export files"},
+        {"id": "clear_operation_lock", "label": "Clear Operation Lock", "description": "Manually clear stuck operation lock"}
     ]
 }
File diff suppressed because it is too large