
I want to parse a JSON response in a NiFi processor. I have JSON data like this:

{
  "squadName": "Super hero squad",
  "homeTown": "Metro City",
  "formed": 2016,
  "secretBase": "Super tower",
  "active": true,
  "Data":{"row": [
    {
      "name": "Molecule Man",
      "age": 29,
      "secretIdentity": "Dan Jukes",
      "powers": [
        "Radiation resistance",
        "Turning tiny",
        "Radiation blast"
      ]
    },
    {
      "name": "Madame Uppercut",
      "age": 39,
      "secretIdentity": "Jane Wilson",
      "powers": [
        "Million tonne punch",
        "Damage resistance",
        "Superhuman reflexes"
      ]
    },
    {
      "name": "Eternal Flame",
      "age": 1000000,
      "secretIdentity": "Unknown",
      "powers": [
        "Immortality",
        "Heat Immunity",
        "Inferno",
        "Teleportation",
        "Interdimensional travel"
      ]
    }
  ]}
}

and I want to transform it into this format:

{"name": "Molecule Man", "age": 29, "secretIdentity": "Dan Jukes", "powers": ["Radiation resistance", "Turning tiny", "Radiation blast"]}
{"name": "Madame Uppercut", "age": 39, "secretIdentity": "Jane Wilson", "powers": ["Million tonne punch", "Damage resistance", "Superhuman reflexes"]}
{"name": "Eternal Flame", "age": 1000000, "secretIdentity": "Unknown", "powers": ["Immortality", "Heat Immunity", "Inferno", "Teleportation", "Interdimensional travel"]}

I have already used the expression $.Data['row'] in an EvaluateJsonPath processor, and that gives me the row data. Then I used the expression [] in a ReplaceText processor to get rid of the '[]', but I can't replace ',' with a new line. How can I do this?

What are you replacing the comma with? \n? Also, shouldn't you (and wouldn't it be easier to) parse the JSON using an existing framework? Also, I don't understand how you want to get rid of newlines by replacing commas with them (unless there is a problem with the formatting of your question). - Asunez
I want to replace the commas after { } with a new line. - Sagitarius
Then simply capture }, and replace it with }\n. This will replace every comma that directly follows } with a newline. This isn't exactly what you showed under "and I want to transform it into this format", as it will add newlines where you already have them... - Asunez
How can I do it with a regex? Can you recommend an online regex simulator? I have found one, but it is useless for me. - Sagitarius
Please check my answer; if you have further questions about my solution, post them as a comment there. As for a regex simulator, I recommend regex101.com (demos are linked in my answer). - Asunez
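Asunez's suggestion to parse the JSON with an existing library, instead of editing it as text, can be sketched like this. It is a minimal Python sketch that runs outside NiFi, on a trimmed, hypothetical copy of the question's input (two rows, fewer fields); the function name rows_as_json_lines is made up for illustration. It selects the same data as $.Data['row'] and writes one compact JSON object per line.

```python
import json

# Trimmed, hypothetical copy of the question's input (two rows, fewer fields).
raw = """{
  "squadName": "Super hero squad",
  "active": true,
  "Data": {"row": [
    {"name": "Molecule Man", "age": 29, "powers": ["Turning tiny"]},
    {"name": "Madame Uppercut", "age": 39, "powers": ["Million tonne punch"]}
  ]}
}"""


def rows_as_json_lines(document):
    """Parse the document and emit each element of Data.row on its own line."""
    data = json.loads(document)
    # Same selection as the JsonPath $.Data['row'] used in EvaluateJsonPath.
    rows = data["Data"]["row"]
    # json.dumps without indent writes each object on a single line.
    return "\n".join(json.dumps(row) for row in rows)


print(rows_as_json_lines(raw))
```

Because the rows are re-serialized rather than edited with regexes, commas inside strings or nested arrays cannot be broken by the transformation.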

1 Answer


Solution

If you simply want each row on a single line, you can delete all newlines that are not preceded by },. Say that after the steps described in your last paragraph you ended up with something like this:

{
  "name": "Molecule Man",
  "age": 29,
  "secretIdentity": "Dan Jukes",
  "powers": [
    "Radiation resistance",
    "Turning tiny",
    "Radiation blast"
  ]
},
{
  "name": "Madame Uppercut",
  "age": 39,
  "secretIdentity": "Jane Wilson",
  "powers": [
    "Million tonne punch",
    "Damage resistance",
    "Superhuman reflexes"
  ]
},
{
  "name": "Eternal Flame",
  "age": 1000000,
  "secretIdentity": "Unknown",
  "powers": [
    "Immortality",
    "Heat Immunity",
    "Inferno",
    "Teleportation",
    "Interdimensional travel"
  ]
}

Now substitute (?<!},)\n with nothing (leave the replacement empty; it's not a space). You can see this change here: Link to Regex101.com
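Phase 1 can be tried outside NiFi with a short script. This is a minimal Python sketch on a shortened, hypothetical two-row sample of the text above; Python's re module is used only for illustration. NiFi's ReplaceText evaluates Java regular expressions, which, as far as I know, handle this fixed-width negative lookbehind the same way.

```python
import re

# Shortened, hypothetical sample in the same layout as the text above.
text = """{
  "name": "Molecule Man",
  "age": 29,
  "powers": [
    "Radiation resistance",
    "Turning tiny"
  ]
},
{
  "name": "Madame Uppercut",
  "age": 39
}"""

# Phase 1: delete every newline that is NOT directly preceded by "},".
# Only the newline after "}," survives, so each row ends up on one line.
step1 = re.sub(r"(?<!},)\n", "", text)
print(step1)
```

The result still contains the indentation spaces, which is what the next substitution cleans up.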

You can also collapse the leftover runs of spaces into a single space with this replacement: substitute (?<!},)\s+ with _ (a single space, not an underscore, of course) (demo here).
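Phase 2 can be sketched the same way. The input below is written out literally as the phase 1 result for a shortened, hypothetical two-row sample, so the block runs on its own; again Python's re stands in for the Java regex engine used by ReplaceText.

```python
import re

# Phase 1 result for a shortened, hypothetical two-row sample.
step1 = ('{  "name": "Molecule Man",  "age": 29,  "powers": [    '
         '"Radiation resistance",    "Turning tiny"  ]},\n'
         '{  "name": "Madame Uppercut",  "age": 39}')

# Phase 2: replace every whitespace run with one space, except a run that
# starts right after "}," (the newline that separates the rows).
step2 = re.sub(r"(?<!},)\s+", " ", step1)
print(step2)
```

Single spaces are also matched by \s+, but replacing them with one space leaves them unchanged, so only the indentation runs shrink.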


How does it work?

I have divided the work into two phases (you could do this with one regex, but for simplicity's sake I split it). First, I look for all newlines in the text that are not preceded by }, and remove them; the newlines right after }, are the ones we want to keep, because they separate the rows.

After removing these we almost get what we want, but the result is ugly because of the indentation spaces left behind. So I search again for all whitespace characters (again excluding the one right after },, since a newline is also a whitespace character) and replace each run of them with a single space.

End result:

{ "name": "Molecule Man", "age": 29, "secretIdentity": "Dan Jukes", "powers": [ "Radiation resistance", "Turning tiny", "Radiation blast" ]},
{ "name": "Madame Uppercut", "age": 39, "secretIdentity": "Jane Wilson", "powers": [ "Million tonne punch", "Damage resistance", "Superhuman reflexes" ]},
{ "name": "Eternal Flame", "age": 1000000, "secretIdentity": "Unknown", "powers": [ "Immortality", "Heat Immunity", "Inferno", "Teleportation", "Interdimensional travel" ]}
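Putting both phases together on all three rows reproduces the end result above. This Python sketch also adds one step that the answer does not include: removing the comma that ends each row (the comments touch on those commas), so that every line is a standalone JSON object like the format asked for in the question. The helper names two_phase and strip_row_commas are hypothetical, and the pretty-printed input is regenerated with json.dumps rather than copied from NiFi.

```python
import json
import re

# The three rows from the question, re-created so this block runs on its own.
rows = [
    {"name": "Molecule Man", "age": 29, "secretIdentity": "Dan Jukes",
     "powers": ["Radiation resistance", "Turning tiny", "Radiation blast"]},
    {"name": "Madame Uppercut", "age": 39, "secretIdentity": "Jane Wilson",
     "powers": ["Million tonne punch", "Damage resistance",
                "Superhuman reflexes"]},
    {"name": "Eternal Flame", "age": 1000000, "secretIdentity": "Unknown",
     "powers": ["Immortality", "Heat Immunity", "Inferno", "Teleportation",
                "Interdimensional travel"]},
]

# json.dumps(..., indent=2) produces the same two-space layout as the text
# in the answer; joining with ",\n" recreates the "},\n{" row separators.
pretty = ",\n".join(json.dumps(row, indent=2) for row in rows)


def two_phase(text):
    """The answer's two substitutions, applied in order."""
    step1 = re.sub(r"(?<!},)\n", "", text)     # drop newlines not after "},"
    return re.sub(r"(?<!},)\s+", " ", step1)   # collapse whitespace runs


def strip_row_commas(text):
    """Extra step, not part of the answer: remove the comma ending each row."""
    return re.sub(r"},$", "}", text, flags=re.MULTILINE)


for line in strip_row_commas(two_phase(pretty)).splitlines():
    print(line)
```

Without strip_row_commas the first two lines end in "]},", exactly as in the end result; with it, each line can be parsed on its own by a JSON reader.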