Root cause: Aggressive pattern stripping during normalization
reduced some stream names to empty strings. Empty strings compared
as identical, so multiple unrelated channels (GNT, MTV, TLC, BIS,
TNT, etc.) incorrectly matched each other with 100% similarity
scores.
Changes made:
1. Fixed calculate_similarity() to return 0.0 for empty string
comparisons instead of 1.0, preventing false positives
2. Added validation in normalize_name() to log warnings when
normalization results in empty strings
3. Added checks that reject empty or too-short names (< 2 chars)
in all matching stages:
- fuzzy_match() Stage 1 (exact match)
- fuzzy_match() Stage 2 (substring match)
- find_best_match() (token-sort matching)
4. Added validation in plugin.py _match_streams_to_channel() to
skip streams with empty/short cleaned names in:
- Fuzzy matcher result collection
- JSON exact match section
- Basic substring matching fallback
5. Fixed country prefix regex pattern from [:|\s] to [:\s]
(removed the incorrect pipe and backslash characters; inside a
character class, | is a literal pipe, not alternation)
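
For reference, a minimal sketch of the two core ideas above: the
empty-string guard from change 1 and the character-class change from
change 5. It is not the actual code. The function bodies, the
MIN_NAME_LEN constant, the is_usable() helper, and the ^[A-Z]{2}
prefix are assumptions for illustration, and difflib stands in for
whatever scorer the matcher really uses.

```python
import re
from difflib import SequenceMatcher

# Assumption: mirrors the "< 2 chars" threshold described in change 3.
MIN_NAME_LEN = 2


def calculate_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in: similarity in [0.0, 1.0]."""
    if not a or not b:
        # SequenceMatcher(None, "", "").ratio() is 1.0, which is exactly
        # the false positive described above, so return early instead.
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def is_usable(name: str) -> bool:
    """Hypothetical helper: reject empty or too-short cleaned names."""
    return len(name.strip()) >= MIN_NAME_LEN


# Assumption: a two-letter country code followed by a separator.
# Inside [...] the pipe is a literal character, so the old class
# also accepts "|" as a separator.
OLD_PREFIX = re.compile(r"^[A-Z]{2}[:|\s]")
NEW_PREFIX = re.compile(r"^[A-Z]{2}[:\s]")

if __name__ == "__main__":
    print(calculate_similarity("", ""))
    print(is_usable("a"), is_usable("tnt"))
    print(bool(OLD_PREFIX.match("BR|GNT")), bool(NEW_PREFIX.match("BR|GNT")))
```

With a guard like this, two names that both normalize to "" score
0.0 instead of 1.0, and the length check keeps them out of the
candidate list in the first place.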
Testing: Added comprehensive test suite (test_fuzzy_matcher_fix.py)
that verifies empty strings don't match and valid matches still work.
All tests pass.