(Ir)Regular Expression

I spent almost half of yesterday trying to figure out how to capture a substring from each line in a list of text. The text is the result of querying products from the database. The lines contain a product name, a color code, and in many cases the color name (in parentheses).

So what I wanted was everything before the number. The first thing I thought of was a regular expression, so I started crafting the pattern.

First I came up with this one.

It should just work, I thought: capture one or more non-digit characters. Well, it did for most of the lines, except for line 11, where it returned the whole string, parentheses included.
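The pattern itself is not shown above, so here is a minimal Ruby sketch of the behavior being described. The pattern `^\D+` is an assumption reconstructed from the phrase "one or more non-digit characters", and the sample rows are hypothetical stand-ins for the real query results.

```ruby
# Hypothetical sample rows; the real product data is not shown in this post.
rows = [
  "Classic Tee 1234 (Navy Blue)", # name, color code, color name
  "Canvas Tote (Natural)"         # no color code, like the troublesome line
]

# Assumed first attempt: one or more non-digit characters from the line start.
first_try = /^\D+/

# String#[] with a regex returns the matched text (or nil if nothing matches).
first_results = rows.map { |row| row[first_try] }
puts first_results.inspect
```

The row with a color code stops at the first digit, while the row without one has nothing to stop the match, so the parentheses come along.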

Huh? Why? Then I thought maybe it was because that line has no digit to separate the string. So I modified the pattern to look for optional digits.

Still not working (same result as above).
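A sketch of why optional digits change nothing, again in Ruby. The pattern `^(\D+)\d*` is an assumed reconstruction of "look for optional digits", and the rows are hypothetical.

```ruby
# Hypothetical sample rows; the real product data is not shown in this post.
rows = [
  "Classic Tee 1234 (Navy Blue)",
  "Canvas Tote (Natural)"
]

# Assumed second attempt: capture non-digits, then allow zero or more digits.
second_try = /^(\D+)\d*/

# The second argument to String#[] selects capture group 1.
second_results = rows.map { |row| row[second_try, 1] }
puts second_results.inspect
```

Because `\D+` is greedy and parentheses are non-digits, the capture group has already consumed the whole second row before `\d*` gets a chance to match anything.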

After many tries, I RTFM and found out that \D will match any non-digit (meaning it includes symbols and spaces, so it happily swallows the parentheses) but \w will match only word characters (meaning a-z, A-Z, 0-9 and _ but not symbols). So I changed the pattern.
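The difference between the two classes can be checked directly. This is a small Ruby sketch; the list of test characters is arbitrary and chosen only for illustration.

```ruby
# Arbitrary test characters: a letter, a digit, an underscore, a space, a symbol.
chars = ["a", "7", "_", " ", "("]

# \D accepts anything that is not a digit, including spaces and symbols.
non_digit = chars.select { |ch| ch.match?(/\D/) }

# \w accepts only letters, digits, and the underscore.
word_char = chars.select { |ch| ch.match?(/\w/) }

puts non_digit.inspect
puts word_char.inspect
```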

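That’s \w+ (one or more word characters) followed by [^\d|(]* (zero or more characters that are not a digit or an opening parenthesis). Strictly speaking, the | inside the brackets is a literal pipe character rather than "or", so [^\d(]* does the same job as long as the names contain no pipes. And BOOM!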

It worked.
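Putting the pieces together, here is a Ruby sketch of the final pattern as described, `^\w+[^\d|(]*`, run against hypothetical rows. The trailing `strip` is an addition for tidiness and not part of the original pattern.

```ruby
# Hypothetical sample rows; the real product data is not shown in this post.
rows = [
  "Classic Tee 1234 (Navy Blue)",
  "Canvas Tote (Natural)"
]

# Word characters first, then anything that is not a digit, a pipe, or "(".
final_pattern = /^\w+[^\d|(]*/

# to_s guards against nil when a row does not match; strip drops the trailing space.
names = rows.map { |row| row[final_pattern].to_s.strip }
puts names.inspect

# Inside a character class, | is a literal pipe, not alternation.
with_pipe    = "A|B 12"[/^\w+[^\d|(]*/] # the class excludes the pipe
without_pipe = "A|B 12"[/^\w+[^\d(]*/]  # the class allows the pipe
puts with_pipe.inspect
puts without_pipe.inspect
```

Both rows now yield just the product name, and the last two lines show that the `|` only matters when a name actually contains a pipe.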

Note:

  1. The first ^ (at the start of the pattern) means beginning of line, but the second ^ (right after the opening bracket) means "not".
  2. I used Rubular to help form the pattern visually.
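The two meanings of ^ from the first note can be seen side by side in a short Ruby sketch. The sample string is hypothetical.

```ruby
# Hypothetical sample string, chosen only to show the two meanings of ^.
sample = "x1"

# Outside brackets, ^ anchors the match to the start of a line.
starts_with_digit = sample.match?(/^\d/)

# Directly after [, ^ negates the character class.
first_non_digit = sample[/[^\d]/]

puts starts_with_digit.inspect
puts first_non_digit.inspect
```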