Ensure consistent quality tag matching across all pattern types

Fix an inconsistency where some quality tags were matched only in bracketed
form ([TAG]) and not in unbracketed or parenthesized form, so streams carrying
those tags without brackets were not normalized and were excluded from matching.

Previous issue:
- [4K], [Unknown], [Unk], [Slow], [Dead] were matched (bracketed)
- But "4K", "Unknown", "Unk", "Slow", "Dead" were NOT matched (unbracketed)

This caused streams like:
- "BBC One 4K" (no brackets) - NOT normalized, NOT matched
- "BBC One Unknown" - NOT normalized, NOT matched
- "ESPN (Slow)" - NOT normalized, NOT matched
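
The gap can be reproduced with the pre-change quality patterns from the diff. This is a minimal sketch: `strip_tags` is a hypothetical helper, and replacing each match with a space before collapsing whitespace is an assumption, since the body of normalize_name() is not shown in this commit.

```python
import re

# Quality-tag patterns as they stood before this change (copied from the diff).
OLD_QUALITY_PATTERNS = [
    r'\[(4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead)\]',
    r'\s(?:UHD|FHD|SD|HD|FD)\s',
    r'\s(?:UHD|FHD|SD|HD|FD)$',
    r'\b(?:UHD|FHD|SD|HD|FD):?\s',
    r'\s\((UHD|FHD|SD|HD|FD|Backup)\)',
]


def strip_tags(name, patterns):
    """Hypothetical helper: blank out every match, then collapse whitespace."""
    for pattern in patterns:
        # Assumption: matches are replaced by a space, case-insensitively.
        name = re.sub(pattern, ' ', name, flags=re.IGNORECASE)
    return ' '.join(name.split())


for stream in ["BBC One [4K]", "BBC One 4K", "BBC One Unknown", "ESPN (Slow)"]:
    print(f"{stream!r} -> {strip_tags(stream, OLD_QUALITY_PATTERNS)!r}")
```

Only the bracketed name is reduced to "BBC One"; the other three come back unchanged.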

Changes to HARDCODED_IGNORE_PATTERNS:
- Added 4K, Unknown, Unk, Slow, Dead to middle pattern: " TAG "
- Added 4K, Unknown, Unk, Slow, Dead to end pattern: " TAG"
- Added 4K, Unknown, Unk, Slow, Dead to colon pattern: "TAG: " (colon optional)
- Added 4K, Unknown, Unk, Slow, Dead to parentheses pattern: (TAG)
- Added inline documentation for each pattern type
- Note: All patterns already use re.IGNORECASE for case-insensitive matching
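
A rough sketch of how the four extended patterns behave once compiled with re.IGNORECASE. The pattern strings are copied from the diff; `normalize_sketch` is a hypothetical stand-in for normalize_name(), and the space-replacement step is an assumption.

```python
import re

# The four patterns extended by this commit, copied from the diff.
UPDATED_PATTERNS = [
    r'\s(?:4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead|FD)\s',         # middle
    r'\s(?:4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead|FD)$',          # end
    r'\b(?:4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead|FD):?\s',       # optional colon
    r'\s\((4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead|FD|Backup)\)',  # parentheses
]

# Compile once with IGNORECASE so "4k", "SLOW", and "Dead" are all caught.
COMPILED = [re.compile(p, re.IGNORECASE) for p in UPDATED_PATTERNS]


def normalize_sketch(name):
    """Hypothetical stand-in for normalize_name(): strip tags, tidy spaces."""
    for regex in COMPILED:
        # Assumption: a match becomes a single space so neighbours stay apart.
        name = regex.sub(' ', name)
    return ' '.join(name.split())


for stream in ["BBC One 4K", "bbc one unknown", "ESPN (Slow)",
               "Dead: Sky News", "Sky slow News"]:
    print(f"{stream!r} -> {normalize_sketch(stream)!r}")
```

Replacing with a space rather than an empty string keeps "Sky slow News" from collapsing into "SkyNews" when the middle pattern consumes the whitespace on both sides of the tag.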

Now ALL quality tags are handled consistently across:
- Bracketed: [TAG] ✓
- Unbracketed at end: " TAG" ✓
- Unbracketed in middle: " TAG " ✓
- Parenthesized: (TAG) ✓
- With optional colon: "TAG: " ✓
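
One way to check the coverage listed above is to run each newly added tag through each format. The sketch below uses the five quality patterns from the diff; the sample names, `FORMATS`, and `strip_quality_tags` are illustrative assumptions, not part of the plugin.

```python
import re

# Quality patterns after this commit (copied from the diff).
QUALITY_PATTERNS = [
    r'\[(4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead)\]',
    r'\s(?:4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead|FD)\s',
    r'\s(?:4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead|FD)$',
    r'\b(?:4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead|FD):?\s',
    r'\s\((4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead|FD|Backup)\)',
]

TAGS = ["4K", "Unknown", "Unk", "Slow", "Dead"]

# Illustrative templates, one per format named in the commit message.
FORMATS = {
    "bracketed": "BBC One [{tag}]",
    "end": "BBC One {tag}",
    "middle": "BBC {tag} One",
    "parenthesized": "BBC One ({tag})",
    "colon": "{tag}: BBC One",
}


def strip_quality_tags(name):
    """Hypothetical helper: replace each match with a space, then tidy up."""
    for pattern in QUALITY_PATTERNS:
        name = re.sub(pattern, ' ', name, flags=re.IGNORECASE)
    return ' '.join(name.split())


# Map every (tag, format) pair to its stripped name.
results = {
    (tag, label): strip_quality_tags(template.format(tag=tag))
    for tag in TAGS
    for label, template in FORMATS.items()
}

for (tag, label), cleaned in results.items():
    print(f"{tag:8} {label:14} -> {cleaned!r}")
```

Under these assumptions all 25 combinations reduce to "BBC One".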

This ensures no quality tags are missed regardless of format.
Claude
2025-11-06 18:04:33 +00:00
parent 0c0b045e68
commit 95df38905b


@@ -14,18 +14,38 @@ from glob import glob
 LOGGER = logging.getLogger("plugins.fuzzy_matcher")
 
 # Hardcoded regex patterns to ignore during fuzzy matching
+# Note: All patterns are applied with re.IGNORECASE flag in normalize_name()
 HARDCODED_IGNORE_PATTERNS = [
+    # Bracketed quality tags: [4K], [UHD], [FHD], [HD], [SD], [Unknown], [Unk], [Slow], [Dead]
     r'\[(4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead)\]',
     r'\[(?:4k|uhd|fhd|hd|sd|unknown|unk|slow|dead)\]',
+
+    # Single letter tags in parentheses: (A), (B), (C), etc.
     r'\([A-Z]\)',
+
+    # Regional: " East" or " east"
     r'\s[Ee][Aa][Ss][Tt]',
-    r'\s(?:UHD|FHD|SD|HD|FD)\s',
-    r'\s(?:UHD|FHD|SD|HD|FD)$',
-    r'\b(?:UHD|FHD|SD|HD|FD):?\s',
-    r'\s\(CX\)',
-    r'\s\((UHD|FHD|SD|HD|FD|Backup)\)',
-    r'\bUSA?:\s',
-    r'\bUS\s',
+
+    # Unbracketed quality tags in middle: " 4K ", " UHD ", " FHD ", " HD ", " SD ", etc.
+    r'\s(?:4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead|FD)\s',
+
+    # Unbracketed quality tags at end: " 4K", " UHD", " FHD", " HD", " SD", etc.
+    r'\s(?:4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead|FD)$',
+
+    # Word boundary quality tags with optional colon: "4K:", "UHD:", "FHD:", "HD:", etc.
+    r'\b(?:4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead|FD):?\s',
+
+    # Special tags
+    r'\s\(CX\)', # Cinemax tag
+
+    # Parenthesized quality tags: (4K), (UHD), (FHD), (HD), (SD), (Unknown), (Unk), (Slow), (Dead), (Backup)
+    r'\s\((4K|UHD|FHD|HD|SD|Unknown|Unk|Slow|Dead|FD|Backup)\)',
+
+    # Geographic prefixes
+    r'\bUSA?:\s', # "US:" or "USA:"
+    r'\bUS\s', # "US " at word boundary
+
+    # Backup tags
     r'\([bB]ackup\)',
 ]
 