Stream-Mapparr

Joren/Stream-Mapparr


Commit Graph


Author

SHA1

Message

Date

Claude

55d493658e

Fix fuzzy matcher producing false positive matches with 100% scores

Root cause: Stream names were normalizing to empty strings through
aggressive pattern stripping, causing multiple unrelated channels
(GNT, MTV, TLC, BIS, TNT, etc.) to incorrectly match each other
with 100% similarity scores.
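
The failure mode described above can be reproduced with a minimal sketch. The stripping rule and helper names here are hypothetical stand-ins, not the plugin's actual patterns, and `difflib.SequenceMatcher` is assumed only for illustration; the commit does not say which similarity measure the matcher uses.

```python
import re
from difflib import SequenceMatcher

# Hypothetical stripping rule (NOT the plugin's real pattern): remove
# 2-3 letter uppercase tokens, e.g. to drop country codes or tags.
HYPOTHETICAL_STRIP = re.compile(r"\b[A-Z]{2,3}\b")


def naive_normalize(name: str) -> str:
    """Strip the pattern with no guard against an empty result."""
    return HYPOTHETICAL_STRIP.sub("", name).strip().lower()


def naive_similarity(a: str, b: str) -> float:
    """Unguarded similarity: two empty strings count as identical."""
    return SequenceMatcher(None, a, b).ratio()


names = ["GNT", "MTV", "TLC"]
cleaned = [naive_normalize(n) for n in names]  # every name collapses to ""
score = naive_similarity(cleaned[0], cleaned[1])  # 1.0 for "" vs ""
```

Because every short name collapses to the same empty string, unrelated channels compare as perfect matches.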

Changes made:

1. Fixed calculate_similarity() to return 0.0 for empty string
   comparisons instead of 1.0, preventing false positives

2. Added validation in normalize_name() to log warnings when
   normalization results in empty strings

3. Added empty string checks (< 2 chars) in all matching stages:
   - fuzzy_match() Stage 1 (exact match)
   - fuzzy_match() Stage 2 (substring match)
   - find_best_match() (token-sort matching)

4. Added validation in plugin.py _match_streams_to_channel() to
   skip streams with empty/short cleaned names in:
   - Fuzzy matcher result collection
   - JSON exact match section
   - Basic substring matching fallback

5. Fixed country prefix regex pattern from [:\|\s] to [:\s]
   (removed incorrect pipe and backslash characters)
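
The guards listed in the changes above might look roughly like the following sketch. Only the function name `calculate_similarity`, the 2-character threshold, and the `[:\s]` character class come from the commit message; `is_matchable`, `strip_country_prefix`, `MIN_NAME_LENGTH`, the rest of the regex, and the use of `difflib` are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Threshold taken from the "< 2 chars" check in the commit message.
MIN_NAME_LENGTH = 2

# Only the [:\s] class is from the commit; the two-letter prefix around it
# is an assumed shape for a country prefix such as "US: " or "BR ".
COUNTRY_PREFIX = re.compile(r"^[A-Z]{2}[:\s]\s*")


def calculate_similarity(a: str, b: str) -> float:
    """Return 0.0 when either side is empty instead of a spurious 1.0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def is_matchable(cleaned: str) -> bool:
    """Hypothetical helper: reject names too short to match on."""
    return len(cleaned.strip()) >= MIN_NAME_LENGTH


def strip_country_prefix(name: str) -> str:
    """Hypothetical helper: drop a leading two-letter country prefix."""
    return COUNTRY_PREFIX.sub("", name)
```

Each matching stage would call a check like `is_matchable()` on both the stream name and the channel name before scoring, so an empty or one-character name is skipped rather than compared.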

Testing: Added comprehensive test suite (test_fuzzy_matcher_fix.py)
that verifies empty strings don't match and valid matches still work.
All tests pass.
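
The contents of test_fuzzy_matcher_fix.py are not shown in the commit, so the following is only a sketch of the kind of regression test the message describes. The `similarity` function is a stand-in for the fixed `calculate_similarity()`, and its `difflib`-based body is an assumption.

```python
import unittest
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in for calculate_similarity() after the fix (assumed behaviour).
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


class EmptyNameRegressionTest(unittest.TestCase):
    def test_empty_strings_do_not_match(self):
        # Names that normalized to "" must never score as a match.
        self.assertEqual(similarity("", ""), 0.0)
        self.assertEqual(similarity("", "tnt"), 0.0)

    def test_valid_names_still_match(self):
        # The guard must not break ordinary exact or near-exact matches.
        self.assertEqual(similarity("tnt", "tnt"), 1.0)
        self.assertGreater(similarity("tnt sports", "tnt sport"), 0.9)
```

Saved as a module, a test case like this can be run with `python -m unittest`.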

2025-11-13 18:11:41 +00:00

1 Commit