0
votes

Here's a sample string:

Lorem ipsum dolor sit amet, ad eam option suscipit invidunt, ius propriae detracto cu. Nec te wisi lo{"firstName":"John", "lastName":"Doe"}rem, in quo vocent erroribus {"firstName":"Anna", "lastName":"Smith"}dissentias. At omittam pertinax senserit est, pri nihil alterum omittam ad, vix aperiam sententiae an. Ferri accusam an eos, an facete tractatos moderatius sea{"firstName":"Peter", "lastName":"Jones"}. Mel ad sale utamur, qui ut oportere omittantur, eos in facer ludus dicant.

Assume the following data model exists:

public class Person
{
   public string firstName;
   public string lastName;
}

How could I use regex to extract JSON out of this text and create a List<Person> with:

{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}

Objects can be buried anywhere in the string, so their position relative to words, letters, punctuation, whitespace, etc. does not matter. If an object deviates from the JSON notation above, simply ignore it. The following would be invalid:

{"firstName":"John", "middleName":"", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith", "age":""},
{"firstName":"Peter", "lastName":"Jones" some text}

In other words, the pattern search must strictly match the following form:

{"firstName":"[val]", "lastName":"[val]"}
2
Does it have to be a Regex? It might be easier to just do this with procedural code. - Matthew Haugen
No, it doesn't have to be Regex. Just need to get it done. - Mark13426
You shouldn't use Regex with hierarchical data like JSON for the same reasons as XML - Micky
To Micky's point, is it legal to have nested JSON objects? I'd assume so, depending on where this is coming from. Or, equally frustrating, I assume you could have a string that contains a curly brace? Nobody would have a name with one, I assume, but it'd be totally valid JSON. - Matthew Haugen
Nested objects aren't a concern at the moment. Ah, never thought about curly braces in the middle. I could instead use another symbol or substring and disallow the user from inputting it. - Mark13426

2 Answers

0
votes

Use this code snippet:

    // Requires: using System.Text.RegularExpressions;

    // Capture every first name (the text between the quotes)
    string strRegex1 = @"firstName"":""([^""]*)"",";
    // Capture every last name
    string strRegex2 = @"lastName"":""([^""]*)""";
    Regex myRegex = new Regex(strRegex1, RegexOptions.None);
    Regex myRegex2 = new Regex(strRegex2, RegexOptions.None);
    string strTargetString = @"{""firstName"":""John"", ""middleName"":"""", ""lastName"":""Doe""}," + "\n" + @"{""firstName"":""Anna"", ""lastName"":""Smith"", ""age"":""""}," + "\n" + @"{""firstName"":""Peter"", ""lastName"":""Jones"" some text}";

    foreach (Match myMatch in myRegex.Matches(strTargetString))
    {
        // myMatch.Groups[1].Value holds the first name; add your code here
    }

    foreach (Match myMatch in myRegex2.Matches(strTargetString))
    {
        // myMatch.Groups[1].Value holds the last name; add your code here
    }
0
votes

Here's a regex that you could use to extract the values:

({\s*"firstName":\s*"[^"]+",\s*"lastName":\s*"[^"]+"\s*})

After this, I'd suggest just using Json.NET to deserialize the objects.
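
To make the two steps concrete, here is a minimal sketch that combines the regex with Json.NET. It assumes the Newtonsoft.Json NuGet package is referenced; the PersonExtractor class and its Extract method are hypothetical names used only for illustration. The pattern is the one above with the braces escaped, and like the original it requires non-empty values with no quote characters inside them.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using Newtonsoft.Json; // Json.NET (assumed to be installed via NuGet)

// Data model from the question
public class Person
{
    public string firstName;
    public string lastName;
}

// Hypothetical helper, not part of any library
public static class PersonExtractor
{
    // Strict form: firstName followed by lastName, with no other members
    private static readonly Regex PersonJson = new Regex(
        @"\{\s*""firstName"":\s*""[^""]+"",\s*""lastName"":\s*""[^""]+""\s*\}");

    public static List<Person> Extract(string text)
    {
        var people = new List<Person>();
        foreach (Match match in PersonJson.Matches(text))
        {
            // Each match is a complete JSON object, so it can be deserialized as-is
            people.Add(JsonConvert.DeserializeObject<Person>(match.Value));
        }
        return people;
    }
}
```

Calling PersonExtractor.Extract on the sample string from the question should give a list of three Person objects, while objects with extra members or stray text before the closing brace are skipped because the pattern never matches them.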