
I have the following input:

-key1:"val1" -key2: "val2" -key3:(val3) -key4: "(val4)" -key5: val5 -key6: "val-6" -key-7: val7 -key-eight: "val 8"

With only the following assumption about the pattern:

  • Keys always start with a - and are separated from their values by a :

How can I match and extract each key and its corresponding value?

I have so far come up with the following regex:

-(?<key>\S*):\s?(?<val>\S*)

However, it does not capture the complete value of the last argument, because that value contains a space, and I cannot figure out how to match it.
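The question does not name the language, but the (?<key>...) syntax suggests .NET. Assuming C# and System.Text.RegularExpressions, a minimal sketch like the one below reproduces the problem: \S* cannot cross whitespace, so the value captured for key-eight is only the opening quote plus val, and the trailing 8" is never matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample input from the question.
var input = "-key1:\"val1\" -key2: \"val2\" -key3:(val3) -key4: \"(val4)\" -key5: val5 " +
            "-key6: \"val-6\" -key-7: val7 -key-eight: \"val 8\"";

// The regex from the question. \S* stops at the first whitespace character,
// so a value containing a space is cut short.
var pattern = @"-(?<key>\S*):\s?(?<val>\S*)";

var pairs = new List<(string Key, string Val)>();
foreach (Match m in Regex.Matches(input, pattern))
{
    pairs.Add((m.Groups["key"].Value, m.Groups["val"].Value));
}

foreach (var (key, val) in pairs)
{
    Console.WriteLine($"{key} {val}");
}
// The last pair printed is: key-eight "val
```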

The expected output should be:

  • key1 "val1"
  • key2 "val2"
  • key3 (val3)
  • key4 "(val4)"
  • key5 val5
  • key6 "val-6"
  • key-7 val7
  • key-eight val 8

Any help is much appreciated.

Are you passing all key/value pairs in a single string, or iterating over them? - CodeGuru
What about simply removing that last space after this regex? No big deal - Matěj Štágl
What is your desired output/outcome? - Tim Biegeleisen
So is the space part of the value or part of the key? Can you give an example of input and output, as Tim suggested? - Stefan
@popeye, yes, all key/value pairs are on a single line, which will be matched against the regex to extract each pair. - MaYaN

5 Answers


Assuming you only want to allow whitespace inside a value (not at its beginning or end), change your regex to:

-(?<key>\S*):\s?(?<val>\S+(\s*[^-\s])*)

This assumes that a - preceded by whitespace always marks the start of a new key; it can never be part of a value.

For this example:

-key: value -key2: value with whitespace -key3: value-with-hyphens -key4: v

The matches are: -key: value, -key2: value with whitespace, -key3: value-with-hyphens, -key4: v.

It also works perfectly well on your provided example.
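A minimal C# sketch (assuming .NET regex, as in the question) that runs this answer's pattern over both the question's input and the example above. Extract is a hypothetical helper name used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Pattern from this answer: after the first non-space run, the value keeps
// absorbing "optional whitespace + one character that is neither '-' nor
// whitespace", so a space followed by '-' ends the value.
var pattern = @"-(?<key>\S*):\s?(?<val>\S+(\s*[^-\s])*)";

// Hypothetical helper: returns (key, value) pairs in input order.
List<(string Key, string Val)> Extract(string text)
{
    var result = new List<(string Key, string Val)>();
    foreach (Match m in Regex.Matches(text, pattern))
    {
        result.Add((m.Groups["key"].Value, m.Groups["val"].Value));
    }
    return result;
}

var question = "-key1:\"val1\" -key2: \"val2\" -key3:(val3) -key4: \"(val4)\" -key5: val5 " +
               "-key6: \"val-6\" -key-7: val7 -key-eight: \"val 8\"";
var example = "-key: value -key2: value with whitespace -key3: value-with-hyphens -key4: v";

var fromQuestion = Extract(question);
var fromExample = Extract(example);

foreach (var (key, val) in fromQuestion) Console.WriteLine($"{key} -> {val}");
foreach (var (key, val) in fromExample) Console.WriteLine($"{key} -> {val}");
```

On the question's input, key-eight comes back as "val 8" with its quotes; on the example, key2 comes back as value with whitespace and key3 as value-with-hyphens.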


A low-tech (non-regex) solution, just as an alternative. Trim the leftover whitespace, and call ToDictionary if you need a dictionary.

var results = input.Split(new[] { " -" }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(x => x.TrimStart('-').Split(new[] { ':' }, 2));


Output

key1 -> "val1"
key2 ->  "val2"
key3 -> (val3)
key4 ->  "(val4)"
key5 ->  val5
key6 ->  "val-6"
key-7 ->  val7
key8 ->  "val 8"
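Following the answer's own suggestion to trim and call ToDictionary, here is a C# sketch of the complete idea. It assumes no value contains a space followed by a dash. It uses TrimStart('-') and a two-part split on ':' so that a dash at the end of a value and a colon inside a value both survive; these are choices made for this sketch.

```csharp
using System;
using System.Linq;

var input = "-key1:\"val1\" -key2: \"val2\" -key3:(val3) -key4: \"(val4)\" -key5: val5 " +
            "-key6: \"val-6\" -key-7: val7 -key-eight: \"val 8\"";

var results = input
    // A space followed by '-' is treated as the start of the next key,
    // so a value containing " -" would be split incorrectly.
    .Split(new[] { " -" }, StringSplitOptions.RemoveEmptyEntries)
    // TrimStart removes only the leading '-' of the first entry.
    // Splitting with a count of 2 keeps any ':' inside the value intact.
    .Select(x => x.TrimStart('-').Split(new[] { ':' }, 2))
    // Trim the leftover whitespace and build a dictionary.
    .ToDictionary(kv => kv[0].Trim(), kv => kv[1].Trim());

foreach (var pair in results)
{
    Console.WriteLine($"{pair.Key} -> {pair.Value}");
}
```

ToDictionary throws an ArgumentException if the same key appears twice, and an entry without a ':' would make kv[1] throw an IndexOutOfRangeException, so a real parser may want to guard against both.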

Try this regex with the Replace function:

(?:^|(?!\S)\s*)-|\s*:\s*

and replace each match with "\n". Each key and each value should then end up on its own line, alternating key, value.
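A C# sketch of this Replace-based approach, assuming .NET's Regex.Replace. After the replacement the string alternates key, value, key, value on separate lines (with an empty first line caused by the leading dash), so the sketch splits on newlines, drops empty entries, and pairs the lines up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var input = "-key1:\"val1\" -key2: \"val2\" -key3:(val3) -key4: \"(val4)\" -key5: val5 " +
            "-key6: \"val-6\" -key-7: val7 -key-eight: \"val 8\"";

// Pattern from this answer:
//   (?:^|(?!\S)\s*)-  a dash at the start of the string, or whitespace followed by a dash
//   \s*:\s*           a colon together with any surrounding whitespace
var pattern = @"(?:^|(?!\S)\s*)-|\s*:\s*";

// Every separator becomes a newline.
var replaced = Regex.Replace(input, pattern, "\n");
var lines = replaced.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);

// Pair up the alternating lines: key, value, key, value, ...
var pairs = new List<(string Key, string Val)>();
for (int i = 0; i + 1 < lines.Length; i += 2)
{
    pairs.Add((lines[i], lines[i + 1]));
}

foreach (var (key, val) in pairs)
{
    Console.WriteLine($"{key} -> {val}");
}
```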


I presume you want to keep the parentheses and quotation marks, as in the example output you gave. If so, the following should work:

-(?<key>\S+):+\s?(?<val>\S+\s?\d+\)?\"?)

This does assume that every value ends with a number, though.

EDIT: Given that the value doesn't always end with a number, but (I'm guessing) always starts with val, this is what I have:

-(?<key>\S+):+\s?(?<val>\"?\(?(val)+\s?\S+)

This seems to work on the sample input.
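A C# sketch, assuming .NET regex, that runs the EDIT pattern on the question's input and also on one hypothetical input whose value does not start with val, to show the assumption this pattern depends on.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var input = "-key1:\"val1\" -key2: \"val2\" -key3:(val3) -key4: \"(val4)\" -key5: val5 " +
            "-key6: \"val-6\" -key-7: val7 -key-eight: \"val 8\"";

// The EDIT pattern from this answer: an optional quote, an optional '(',
// the literal text "val", an optional space, then one non-space run.
// (\"" inside the C# verbatim string is the regex \" , a literal quote.)
var pattern = @"-(?<key>\S+):+\s?(?<val>\""?\(?(val)+\s?\S+)";

var pairs = new List<(string Key, string Val)>();
foreach (Match m in Regex.Matches(input, pattern))
{
    pairs.Add((m.Groups["key"].Value, m.Groups["val"].Value));
}

foreach (var (key, val) in pairs)
{
    Console.WriteLine($"{key} -> {val}");
}

// Hypothetical input: a value that does not start with "val" is not matched at all.
bool otherValueMatches = Regex.IsMatch("-key: abc", pattern);
Console.WriteLine(otherValueMatches); // False
```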


This should do the trick

-(?<key>\S*):\s*(?<value>(?(?=")(")(?:(?=(\\?))\2.)*?\1|\S*))

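Basically it does an if/then/else to detect whether the value starts with ", using the conditional construct (?(?=")true regex|false regex). The false regex is your \S*, while the true regex, (")(?:(?=(\\?))\2.)*?\1, matches from the opening quote to the closing quote and skips over backslash-escaped quotes. In .NET, unnamed groups are numbered before named ones, so \1 is the opening quote and \2 is the optional backslash.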
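A minimal C# sketch of the same if/then/else idea, assuming .NET's regex engine. Here the quoted branch and the \S* fallback are separated by |, as in (?(condition)yes|no), and \1 and \2 point at the quote and backslash groups under .NET's group numbering. Extract is a hypothetical helper name, and the extra input with escaped quotes is hypothetical too; it only exercises the quoted branch.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// (?(?=")yes|no): if the value starts with a quote use "yes", otherwise fall back to \S*.
//   yes: (")(?:(?=(\\?))\2.)*?\1  opening quote, then characters (a backslash plus the
//        character after it counts as one step), up to the matching closing quote
//   no:  \S*                      unquoted value, up to the next whitespace
// .NET numbers unnamed groups before named ones, so \1 is the quote and \2 the
// optional backslash. Quotes are doubled inside the C# verbatim string.
var pattern = @"-(?<key>\S*):\s*(?<value>(?(?="")("")(?:(?=(\\?))\2.)*?\1|\S*))";

// Hypothetical helper: returns (key, value) pairs in input order.
List<(string Key, string Value)> Extract(string text)
{
    var result = new List<(string Key, string Value)>();
    foreach (Match m in Regex.Matches(text, pattern))
    {
        result.Add((m.Groups["key"].Value, m.Groups["value"].Value));
    }
    return result;
}

var question = "-key1:\"val1\" -key2: \"val2\" -key3:(val3) -key4: \"(val4)\" -key5: val5 " +
               "-key6: \"val-6\" -key-7: val7 -key-eight: \"val 8\"";
// Hypothetical extra input: a quoted value containing backslash-escaped quotes.
var escaped = "-a: \"say \\\"hi\\\" now\" -b: plain";

var fromQuestion = Extract(question);
var fromEscaped = Extract(escaped);

foreach (var (key, value) in fromQuestion) Console.WriteLine($"{key} -> {value}");
foreach (var (key, value) in fromEscaped) Console.WriteLine($"{key} -> {value}");
```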