What regex (or other technique) would help calculate or 'best-guess' the artist in a fairly unpredictable full song title?

e.g. find Dr Dre in the following song titles (examples from YouTube):

Xxplosive - Dr. Dre
Dr Dre - Xxplosive
Dr Dre- Xxplosive (lyrics)
Dr. Dre - 05 - The Chronic - Nuthin' But AG Thang

My aim is to find the two or three most likely matches, which I intend to send to an existing API that should determine the correct artist.

asked Feb 13, 2012 at 23:57 by sgb
  • So, if you get 2 outputs from those 5 inputs, that would be fine? For example, "Xxplosive" and "Dr. Dre" as outputs, where we don't know which is the artist. – Brigand Commented Feb 14, 2012 at 0:00
  • The only way you could make that work with the data you've provided is if you always know the artist string, in which case you can just use the name itself as the regex. – Wayne Koorts Commented Feb 14, 2012 at 0:01
  • I don't think there is any regex in the world that can help you guess what word represents a name in a sentence... not even with standard names, let alone all the random pseudonyms people come up with for music artists these days... – Matthew Abbott Commented Feb 14, 2012 at 0:02
  • @MatthewAbbott I'm not looking for interpretation, I'm looking to strip out words/phrases – sgb Commented Feb 14, 2012 at 0:03
  • @samb: Regex can only help you split the track names into substrings that are likely to contain artist names, album names, song names and track numbers. For deciding which of the substrings is the artist's name you can only guess, or apply some kind of heuristic, such as: "most of the time the pattern is <artist> - <album> - <title>" (or similar), hence assume the first substring to be the artist (this would have a success rate of 75% for your sample data). Simple string comparison won't help much here though, as the YouTube folks usually aren't among the more literate. Ergo: fuzzy matching. – Regexident Commented Feb 14, 2012 at 0:18

2 Answers


Split the song title up using a regex like /\s*-\s*/, which would turn "Dr Dre - Xxplosive" into an array like ["Dr Dre", "Xxplosive"].
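A minimal sketch of that split in JavaScript (the question's tag). The function name `splitTitle` is made up for illustration, and the extra trim/filter step is an assumption to cope with stray whitespace or empty pieces:

```javascript
// Split a raw title on dashes, tolerating missing or extra whitespace.
// Caveat: a hyphen inside a name (e.g. "Jay-Z") would also be split here.
function splitTitle(title) {
  return title
    .split(/\s*-\s*/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

console.log(splitTitle("Dr Dre- Xxplosive (lyrics)"));
// [ 'Dr Dre', 'Xxplosive (lyrics)' ]
```

Note that the "(lyrics)" suffix stays attached to its segment, so it would still need to be stripped or tolerated by the fuzzy match.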

Then match the search term Dr. Dre against your split segments using either:

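  • Levenshtein distance (O(m·n) per comparison for strings of lengths m and n; probably the best fit for you)
  • Metaphone (roughly O(1) per comparison once the codes are precomputed; probably a good fit, moderate potential for false positives)
  • Soundex (roughly O(1) per comparison once the codes are precomputed; probably a good fit, high potential for false positives)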
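As a rough illustration of the Levenshtein option, here is a sketch that picks the segment closest to the search term. The names `levenshtein`, `normalize` and `closestSegment` are hypothetical, and the normalization step (lowercasing, dropping punctuation) is an assumption added so that "Dr. Dre" and "Dr Dre" compare as equal:

```javascript
// Classic dynamic-programming Levenshtein distance, keeping two rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: lowercase and keep only letters, digits, spaces.
function normalize(s) {
  return s.toLowerCase().replace(/[^a-z0-9 ]/g, "").trim();
}

// Return the segment with the smallest distance to the search term.
function closestSegment(segments, term) {
  const target = normalize(term);
  let best = null;
  for (const segment of segments) {
    const distance = levenshtein(normalize(segment), target);
    if (best === null || distance < best.distance) {
      best = { segment, distance };
    }
  }
  return best;
}

console.log(closestSegment(["Xxplosive", "Dr. Dre"], "Dr Dre"));
// { segment: 'Dr. Dre', distance: 0 }
```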

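If your list of tracks is huge, use a BK-tree, which indexes strings by edit distance so that a fuzzy lookup does not have to compare against every entry.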
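For reference, a BK-tree can be sketched in a few dozen lines. This is a bare-bones illustration, not a tuned implementation; the class name `BKTree`, its methods and the sample names are all assumptions made for the example:

```javascript
// Two-row Levenshtein distance, used as the tree's metric.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Each child edge is keyed by the distance between child and parent word.
class BKTree {
  constructor(distanceFn) {
    this.distanceFn = distanceFn;
    this.root = null;
  }

  add(word) {
    if (this.root === null) {
      this.root = { word, children: new Map() };
      return;
    }
    let node = this.root;
    while (true) {
      const d = this.distanceFn(word, node.word);
      if (d === 0) return; // already stored
      const child = node.children.get(d);
      if (child === undefined) {
        node.children.set(d, { word, children: new Map() });
        return;
      }
      node = child;
    }
  }

  // Collect all stored words within maxDistance of the query.
  search(query, maxDistance) {
    const results = [];
    const stack = this.root ? [this.root] : [];
    while (stack.length > 0) {
      const node = stack.pop();
      const d = this.distanceFn(query, node.word);
      if (d <= maxDistance) results.push({ word: node.word, distance: d });
      // Triangle inequality: only edges in [d - max, d + max] can match.
      for (const [edge, child] of node.children) {
        if (edge >= d - maxDistance && edge <= d + maxDistance) stack.push(child);
      }
    }
    return results;
  }
}

const tree = new BKTree(levenshtein);
["dr dre", "snoop dogg", "eminem", "dr dree"].forEach((name) => tree.add(name));
console.log(tree.search("dr. dre", 1));
```

The pruning step is what pays off on large lists: subtrees whose edge label falls outside the allowed band are never visited.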

In other words, use either fuzzy/approximate string matching or phonetic string matching.

Pro tip: use a Levenshtein limit relative to the length of your search term (the longer the string, the higher the limit).
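One way to express such a length-relative limit, as a sketch: the helper names `maxEditsFor` and `isCloseEnough` are hypothetical, and the 0.25 ratio is purely an illustrative assumption that would need tuning against real titles.

```javascript
// Allowed edit distance grows with the length of the search term.
// The default ratio of 0.25 is an illustrative assumption, not a tuned value.
function maxEditsFor(term, ratio = 0.25) {
  return Math.max(1, Math.floor(term.length * ratio));
}

// Decide whether an already-computed Levenshtein distance is close enough.
function isCloseEnough(term, distance, ratio = 0.25) {
  return distance <= maxEditsFor(term, ratio);
}

console.log(maxEditsFor("Dr. Dre"));              // 1
console.log(maxEditsFor("Nuthin' But AG Thang")); // 5
```

With these numbers a 7-character term tolerates a single edit, while a 20-character string tolerates five.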

Why must you use a regex?

Wouldn't simple string splitting work? You could split the string by the dash, trim each piece and send each one to the API. You could then use a distance-based string proximity algorithm to see which piece of the song title is most likely the artist.
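A sketch of that plain-split approach, with two extra filters that are assumptions rather than part of the suggestion: parenthetical notes such as "(lyrics)" are stripped, and digit-only pieces (likely track numbers) are dropped. The function name `candidateSegments` is hypothetical, and the API call itself is left out because the question does not say which API is used.

```javascript
// Split on dashes, strip parenthetical notes, and drop digit-only pieces.
function candidateSegments(title) {
  return title
    .split("-")
    .map((part) => part.replace(/\([^)]*\)/g, "").trim())
    .filter((part) => part.length > 0 && !/^\d+$/.test(part));
}

const candidates = candidateSegments(
  "Dr. Dre - 05 - The Chronic - Nuthin' But AG Thang"
);
console.log(candidates.join(" | "));
// Dr. Dre | The Chronic | Nuthin' But AG Thang
```

For the longest sample title this leaves three candidates, in line with the "2 or 3 matches" the question wants to pass along.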

Tags: javascript, Regex (or techniques) to guess artist from full song title, Stack Overflow