Stream-Mapparr/test_fuzzy_matcher_fix.py
Claude 55d493658e Fix fuzzy matcher producing false positive matches with 100% scores
Root cause: Aggressive pattern stripping reduced some stream names
to empty strings, so multiple unrelated channels (GNT, MTV, TLC,
BIS, TNT, etc.) incorrectly matched each other with 100%
similarity scores.
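A minimal sketch of this failure mode follows. It is an illustration under assumptions, not Stream-Mapparr's code: it assumes a `difflib.SequenceMatcher` ratio as the similarity measure, and `OVERLY_AGGRESSIVE`, `naive_normalize` and `naive_similarity` are hypothetical names. Once two unrelated names both collapse to `""`, the ratio of two empty strings is 1.0, i.e. a 100% match.

```python
import re
from difflib import SequenceMatcher

# Hypothetical stripping rule for illustration only: it removes every
# 2-4 letter all-caps token (plus an optional colon), intending to drop
# tags like "US:" or "HD", but it also erases names such as "GNT" or "MTV".
OVERLY_AGGRESSIVE = re.compile(r"\b[A-Z]{2,4}\b:?")


def naive_normalize(name: str) -> str:
    """Strip tags, then collapse whitespace and lowercase."""
    stripped = OVERLY_AGGRESSIVE.sub("", name)
    return " ".join(stripped.split()).lower()


def naive_similarity(a: str, b: str) -> float:
    # SequenceMatcher.ratio() returns 1.0 when both inputs are empty.
    return SequenceMatcher(None, a, b).ratio()


gnt = naive_normalize("US: GNT HD")
mtv = naive_normalize("UK: MTV")
print(repr(gnt), repr(mtv), naive_similarity(gnt, mtv))  # prints: '' '' 1.0
```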

Changes made:

1. Fixed calculate_similarity() to return 0.0 instead of 1.0 for
   empty-string comparisons, so two names that both normalize to an
   empty string no longer score as a perfect match

2. Added validation in normalize_name() to log warnings when
   normalization results in empty strings

3. Added checks that reject empty or too-short names (fewer than
   2 characters) in all matching stages:
   - fuzzy_match() Stage 1 (exact match)
   - fuzzy_match() Stage 2 (substring match)
   - find_best_match() (token-sort matching)

4. Added validation in plugin.py _match_streams_to_channel() to
   skip streams with empty/short cleaned names in:
   - Fuzzy matcher result collection
   - JSON exact match section
   - Basic substring matching fallback

5. Fixed country prefix regex pattern from [:|\s] to [:\s]
   (removed the stray pipe: inside a character class, | matches a
   literal pipe character rather than acting as alternation)
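The guards in items 1 to 4 can be sketched as follows. This is a hedged illustration, not the plugin's implementation: it again assumes `difflib.SequenceMatcher` for scoring, `MIN_NAME_LENGTH = 2` mirrors the "fewer than 2 characters" rule, `TAG_PATTERN` is an invented tag pattern, and `normalize_name`, `calculate_similarity`, plus the hypothetical helpers `exact_or_substring_match` and `usable_streams` are simplified stand-ins for the real methods.

```python
import logging
import re
from difflib import SequenceMatcher
from typing import List

logger = logging.getLogger("fuzzy_matcher_sketch")

MIN_NAME_LENGTH = 2  # item 3: names shorter than this never match

# Hypothetical tag pattern (not the plugin's real one): a "US:"-style
# country prefix at the start, or a standalone quality tag anywhere.
TAG_PATTERN = re.compile(r"^[A-Z]{2}:\s*|\b(?:HD|FHD|UHD|4K)\b")


def normalize_name(name: str) -> str:
    """Strip tags, collapse whitespace, lowercase."""
    cleaned = " ".join(TAG_PATTERN.sub(" ", name).split()).lower()
    if not cleaned:
        # Item 2: make names that normalize to nothing visible in the logs.
        logger.warning("Normalizing %r produced an empty string", name)
    return cleaned


def calculate_similarity(a: str, b: str) -> float:
    # Item 1: SequenceMatcher scores "" vs "" as 1.0, so an empty
    # side is rejected explicitly with a score of 0.0.
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def exact_or_substring_match(stream: str, channel: str) -> bool:
    """Hypothetical stand-in for Stage 1 (exact) and Stage 2 (substring)."""
    s, c = normalize_name(stream), normalize_name(channel)
    # Item 3: "" is a substring of every string, so short names are
    # rejected before either stage runs.
    if len(s) < MIN_NAME_LENGTH or len(c) < MIN_NAME_LENGTH:
        return False
    return s == c or s in c or c in s


def usable_streams(streams: List[str]) -> List[str]:
    """Item 4 (hypothetical helper): drop streams whose cleaned name is too short."""
    return [s for s in streams if len(normalize_name(s)) >= MIN_NAME_LENGTH]
```

For example, `usable_streams(["US: CNN HD", "UK: HD", "US: GNT"])` keeps the first and third entries and drops `"UK: HD"`, which normalizes to an empty string and triggers the warning from item 2.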
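Inside a regex character class, `|` has no alternation meaning; it is just one more literal character the class accepts. The sketch below compares the old and new class using Python's `re` module. The surrounding `^[A-Z]{2}` prefix shape is an assumption, since the full country-prefix pattern is not shown in this commit message.

```python
import re

# Assumed prefix shape; only the character class comes from the commit.
OLD = re.compile(r"^[A-Z]{2}[:|\s]")  # class accepts ':', '|' or whitespace
NEW = re.compile(r"^[A-Z]{2}[:\s]")   # class accepts ':' or whitespace only

for name in ["US: CNN", "US CNN", "US|CNN"]:
    print(f"{name!r}: old={bool(OLD.match(name))} new={bool(NEW.match(name))}")
# 'US: CNN': old=True new=True
# 'US CNN': old=True new=True
# 'US|CNN': old=True new=False
```

Only a name separated by a pipe behaves differently, which is the stray case the fix removes.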

Testing: Added a test suite (test_fuzzy_matcher_fix.py) that
verifies empty strings do not match and that valid matches still
work. All tests pass.
2025-11-13 18:11:41 +00:00