0
votes

I am attempting to match a string formatted as [integer][colon][alphanum][colon][integer]. For example, 42100:ZBA01:20. I need to split these by colon...

I'd like to learn regex, so if you could, tell me what I'm doing wrong: This is what I've been able to come up with...

^(\d):([A-Za-z0-9_]):(\d)+$
^(\d+)$ 
^[a-zA-Z0-9_](:)+$
^(:)(\d+)$

At first I tried matching parts of the string, then matching the entire string. As you can tell, I'm not very familiar with regular expressions.

EDIT: The regex is for input into a desktop application. I was not certain what 'language' or 'type' of regex to use, so I assumed .NET. I need to be able to identify each of those grouped characters, split by colon. So Group #1 should be the first integer, Group #2 should be the alphanumeric group, Group #3 should be an integer (ranging 1-4).

Thank you in advance,

Darius

2
Sorry about the semi-colons, that was a change I had made. Post updated. - Darius
And you need to specify the language you are using. Regex implementations differ across languages. - Anirudha

2 Answers

7
votes

I assume the semicolons (;) are meant to be colons (:)? All right, a bit of the basics.

  • ^ matches the beginning of the input. That is, the regular expression will only match if it finds a match at the start of the input.
  • Similarly, $ matches the end of the input.

^(\d+)$ will match a string consisting only of one or more digits. This is because the match needs to start at the beginning of the input and stop at the end of the input. In other words, the whole input needs to match (not just a part of it). The + denotes one or more matches.
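To see the effect of the anchors, here is a minimal sketch in Python's re module. Python is used only for illustration (the question assumes .NET), and the input strings "abc123" and "123" are made up for the example.

```python
import re

# Unanchored: the pattern may match anywhere inside the input.
unanchored = re.search(r"\d+", "abc123")

# Anchored: ^ and $ force the whole input to consist of digits.
anchored_fail = re.search(r"^(\d+)$", "abc123")
anchored_ok = re.search(r"^(\d+)$", "123")

print(unanchored.group(0))   # the digit run found inside the input
print(anchored_fail)         # None, because "abc" precedes the digits
print(anchored_ok.group(1))  # the full input, captured by group 1
```

Without the anchors, the digits are found even though the input starts with letters; with them, the same input is rejected.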

With this knowledge, you'll notice that ^(\d):([A-Za-z0-9_]):(\d)+$ was actually very close to being right. This expression indicates that the whole input needs to match:

  1. one digit;
  2. a colon;
  3. one word character (a letter, digit, or underscore; slightly broader than the alphanumeric character you describe);
  4. a colon;
  5. one or more digits.
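The five steps above can be checked directly. This is a small sketch using Python's re module as a stand-in for the .NET engine the question assumes; the short input "4:Z:20" is a made-up example that fits the steps exactly.

```python
import re

# The asker's first attempt, copied from the question.
original = re.compile(r"^(\d):([A-Za-z0-9_]):(\d)+$")

# One digit, one word character, then one or more digits.
short_input = original.match("4:Z:20")

# The example from the question: several digits before the first colon.
sample_input = original.match("42100:ZBA01:20")

print(short_input is not None)
print(sample_input is not None)
```

The pattern accepts the short input but rejects the sample from the question, because after the first digit it expects a colon and finds another digit instead.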

The problem is in items 1 and 3: each matches exactly one character. You need to add a + quantifier there to match one or more times instead of just once. Also, place the quantifiers inside the capturing groups, including the final (\d)+, so that each group captures the whole run of characters rather than a single character per repetition.
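The difference between a quantifier outside and inside a capturing group can be sketched as follows. This uses Python's re module, where a repeated group keeps only its last repetition; .NET additionally records every repetition in the group's Captures collection, so the exact behaviour of the "outside" form depends on the engine.

```python
import re

outside = re.match(r"^(\d)+$", "123")   # the quantifier repeats the group
inside = re.match(r"^(\d+)$", "123")    # the quantifier repeats \d inside the group

print(outside.group(1))  # only the last repetition is kept by Python's re
print(inside.group(1))   # the whole digit run
```

Both patterns accept the same inputs; they differ only in what ends up in group 1.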

^(\d+):([A-Za-z0-9_]+):(\d+)$
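As a quick check of the corrected expression, the sketch below runs it against a few inputs using Python's re module (chosen for illustration, not the .NET engine from the question). Only the first input comes from the question; the others are hypothetical near-misses.

```python
import re

PATTERN = re.compile(r"^(\d+):([A-Za-z0-9_]+):(\d+)$")

# Hypothetical inputs for illustration; only the first is from the question.
candidates = [
    "42100:ZBA01:20",   # the sample from the question
    "42100;ZBA01;20",   # semicolons instead of colons
    "42100:ZBA01",      # third part missing
    "x42100:ZBA01:20",  # first part is not purely digits
]

results = {text: PATTERN.match(text) is not None for text in candidates}
for text, ok in results.items():
    print(f"{text!r}: {'match' if ok else 'no match'}")
```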
3
votes

You need to use quantifiers:

^(\d+):([A-Za-z0-9_]+):(\d+)$
    ^               ^     ^

+ is a quantifier that matches the preceding pattern one or more times.

Now you can read each value from its capturing group.
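A minimal sketch of reading the groups, again in Python's re module rather than .NET (where the equivalent access would be along the lines of match.Groups[1].Value). The variable names first, code, and last are made up for this example.

```python
import re

match = re.match(r"^(\d+):([A-Za-z0-9_]+):(\d+)$", "42100:ZBA01:20")

if match:
    first = int(match.group(1))  # Group #1: the first integer
    code = match.group(2)        # Group #2: the alphanumeric part
    last = int(match.group(3))   # Group #3: the trailing integer
    print(first, code, last)
```

Group 0 holds the entire match; the numbered groups start at 1 and follow the order of the opening parentheses.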