Root cause: Aggressive pattern stripping normalized some stream names to empty strings. As a result, multiple unrelated channels (GNT, MTV, TLC, BIS, TNT, etc.) matched each other with 100% similarity scores.

Changes made:
1. Fixed calculate_similarity() to return 0.0 instead of 1.0 when comparing empty strings, preventing false positives.
2. Added validation in normalize_name() that logs a warning when normalization produces an empty string.
3. Added checks that reject empty or too-short names (fewer than 2 characters) in every matching stage:
   - fuzzy_match() Stage 1 (exact match)
   - fuzzy_match() Stage 2 (substring match)
   - find_best_match() (token-sort matching)
4. Added validation in plugin.py _match_streams_to_channel() to skip streams whose cleaned names are empty or too short in:
   - Fuzzy matcher result collection
   - JSON exact match section
   - Basic substring matching fallback
5. Fixed the country prefix regex from [:|\s] to [:\s], removing the stray pipe character. Inside a character class, | matches a literal pipe rather than acting as alternation.

Testing: Added a test suite (test_fuzzy_matcher_fix.py) that verifies empty strings don't match and that valid matches still work. All tests pass.
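A minimal sketch of the similarity fix, assuming calculate_similarity() wraps a ratio like difflib's. The real implementation may use a different scorer, and `is_matchable()` and `MIN_NAME_LENGTH` are hypothetical names. It shows why the empty-string guard matters: `SequenceMatcher("", "").ratio()` is 1.0 by definition, which is exactly the 100% score described above.

```python
from difflib import SequenceMatcher

# Hypothetical constant mirroring the "fewer than 2 characters" rule.
MIN_NAME_LENGTH = 2


def calculate_similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity ratio; empty input never matches."""
    # difflib reports 1.0 for two empty sequences, so two names that both
    # normalized to "" would otherwise look like a perfect match.
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def is_matchable(name: str) -> bool:
    """Hypothetical helper: reject cleaned names too short to compare."""
    return len(name.strip()) >= MIN_NAME_LENGTH
```

Returning 0.0 makes an empty name fall below any reasonable match threshold, so callers don't each need a special case for it.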
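The normalize_name() warning and the minimum-length guard in the first two fuzzy_match() stages can be sketched roughly as follows. The stripping regex (quality tags such as HD/FHD/4K plus punctuation), the candidate list shape, and the return type are all hypothetical stand-ins; the plugin's actual rules are not shown. The key point is that `"" in "gnt"` is `True` in Python, so without the guard an empty name wins the substring stage against every candidate.

```python
import logging
import re
from typing import List, Optional

logger = logging.getLogger(__name__)
MIN_NAME_LENGTH = 2  # hypothetical; matches the "< 2 chars" rule

# Hypothetical stripping rules: quality tags and non-alphanumeric characters.
_NOISE = re.compile(r"\b(hd|fhd|uhd|sd|4k)\b|[^a-z0-9 ]")


def normalize_name(name: str) -> str:
    cleaned = " ".join(_NOISE.sub(" ", name.lower()).split())
    if not cleaned:
        # Surface over-aggressive stripping instead of failing silently.
        logger.warning("Normalization of %r produced an empty string", name)
    return cleaned


def fuzzy_match(query: str, candidates: List[str]) -> Optional[str]:
    q = normalize_name(query)
    if len(q) < MIN_NAME_LENGTH:
        return None  # an empty/short query must not match anything
    # Drop candidates whose cleaned names are empty or too short.
    usable = [(c, normalize_name(c)) for c in candidates]
    usable = [(c, n) for c, n in usable if len(n) >= MIN_NAME_LENGTH]
    # Stage 1: exact match on normalized names.
    for original, norm in usable:
        if norm == q:
            return original
    # Stage 2: substring match in either direction.
    for original, norm in usable:
        if q in norm or norm in q:
            return original
    return None
```

For example, `fuzzy_match("HD", ["GNT HD", "MTV HD"])` returns `None` because the query normalizes to `""`, while `fuzzy_match("TNT", ["GNT HD", "TNT HD"])` still resolves to `"TNT HD"` in Stage 1.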
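To illustrate the regex change in item 5, here is a sketch with a hypothetical list of country codes (the plugin's actual prefix pattern is not shown). Because `|` inside `[...]` is a literal character, the only behavioral difference between the two classes is whether a pipe separator such as `UK| ` is consumed along with colons and whitespace.

```python
import re

# Hypothetical country codes; the plugin's real prefix list is not shown.
OLD_PREFIX = re.compile(r"^(?:US|UK|CA|DE|FR)[:|\s]+")  # [:|\s] also eats "|"
NEW_PREFIX = re.compile(r"^(?:US|UK|CA|DE|FR)[:\s]+")   # colon or whitespace only


def strip_country_prefix(name: str, pattern: re.Pattern = NEW_PREFIX) -> str:
    """Remove one leading country prefix such as 'US: ' or 'UK '."""
    return pattern.sub("", name, count=1)
```

With the new pattern, `strip_country_prefix("US: CNN")` gives `"CNN"`, while `"UK| BBC One"` is left untouched. The old pattern would have reduced it to `"BBC One"`.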