I am using Splunk to extract a number of fields from XML data that is contained in a log file. To limit the search to mostly the events that contain the XML, I start the search with this: sourcetype="name of type here" "RULE"
This returns:
0123459 TripMessage.createMessage MsgSource <?xml version="1.0" encoding="UTF-8"?>
<tmsTrip xmlns="http://ground.fedex.com/schemas/linehaul/trip" xmlns:ns2="http://ground.fedex.com/schemas/linehaul/TMSCommon">
...
The file is very large. This is part of it.
<?xml version="1.0" encoding="UTF-8"?>
<tmsTrip xmlns="http://ground.fedex.com/schemas/linehaul/trip" xmlns:ns2="http://ground.fedex.com/schemas/linehaul/TMSCommon">
<recordType>PURCHASEDLINEHAUL</recordType>
<eventType>APPROVE</eventType>
<tripId>116029927</tripId>
<legId>104257037</legId>
<tripNumber>104257037</tripNumber>
<tripLegNumber>1</tripLegNumber>
<updatedDateGMT>2020-02-20T21:53:39.000Z</updatedDateGMT>
.... more lines here that are not important
<purchasedCost>
<purchasedCostTripSegment>
<purchCostReference>1587040</purchCostReference>
<carrier>FXTR</carrier>
<vendorType>DRAY</vendorType>
<billingMethod>RULE</billingMethod>
<carrierTrailerType>PZ1</carrierTrailerType>
<origin>
<ns2:numberCode>923</ns2:numberCode>
<ns2:locAbbr>RLTO</ns2:locAbbr>
<ns2:address1>330 RESOURCE DRIVE</ns2:address1>
<ns2:address2>LH PHONE 877-851-3543</ns2:address2>
<ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
</origin>
This query selects the XML portion of the log text, extracts some of the fields, and lets me add them to a table (the source and sourcetype part of the search is omitted here):
| xmlkv | table purchCostReference, eventType, carrier, billingMethod
But I need more fields that are child elements nested deeper in the XML data. One of them is numberCode. I am trying to use xpath to extract these additional fields:
| xmlkv | xpath "//tmsTrip/purchasedCost/purchasedCostTripSegment/origin/ns2:numberCode" outfield=Origin | table purchCostReference, eventType, carrier, billingMethod, Origin
But no Origin data is returned when I add the field to the table. There is no error; the Origin column is simply empty.
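One possible cause, sketched outside Splunk: the root element declares a default namespace (xmlns="http://ground.fedex.com/schemas/linehaul/trip"), and in XPath 1.0 an unprefixed step such as tmsTrip or origin only matches elements that are in no namespace. The Python snippet below runs the standard library's xml.etree.ElementTree against a trimmed copy of the sample. It illustrates the namespace rule only and is not a description of how Splunk's xpath command is implemented; the prefixes t and c are arbitrary names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<tmsTrip xmlns="http://ground.fedex.com/schemas/linehaul/trip" xmlns:ns2="http://ground.fedex.com/schemas/linehaul/TMSCommon">
<purchasedCost>
<purchasedCostTripSegment>
<carrier>FXTR</carrier>
<origin>
<ns2:numberCode>923</ns2:numberCode>
</origin>
</purchasedCostTripSegment>
</purchasedCost>
</tmsTrip>"""

root = ET.fromstring(SAMPLE.encode("utf-8"))

# Hypothetical prefixes for this example; only the URIs come from the document.
NS = {
    "t": "http://ground.fedex.com/schemas/linehaul/trip",
    "c": "http://ground.fedex.com/schemas/linehaul/TMSCommon",
}

# Unprefixed names look for elements in no namespace, so nothing matches.
unqualified = root.find("purchasedCost/purchasedCostTripSegment/origin/numberCode")

# Namespace-qualified names match the elements as they are actually declared.
qualified = root.find(
    "t:purchasedCost/t:purchasedCostTripSegment/t:origin/c:numberCode", NS
)

print(unqualified)     # None
print(qualified.text)  # 923
```

If the same rule applies inside Splunk, an empty column with no error is exactly what an unqualified path would produce: the expression is valid, it just selects nothing.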
UPDATE I think the problem is that I need to add the field parameter. The XML is embedded in a text log file. My search narrows the results down to the events that contain the XML, but each event still contains other text besides the XML. So I think xpath is struggling with the text that is not XML.
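To illustrate that suspicion outside Splunk: a strict XML parser rejects a document that has log text in front of it, and accepts it once everything before the <?xml declaration is cut away. The Python sketch below uses a shortened, illustrative event. Whether Splunk's xpath command is equally strict is an assumption here, not something confirmed above.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative event: log prefix followed by the XML payload.
EVENT = (
    '0123459 TripMessage.createMessage MsgSource '
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<tmsTrip xmlns="http://ground.fedex.com/schemas/linehaul/trip">'
    '<recordType>PURCHASEDLINEHAUL</recordType>'
    '</tmsTrip>'
)


def parses(text: str) -> bool:
    """Return True if text is a well-formed XML document."""
    try:
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


raw_ok = parses(EVENT)                    # log prefix still attached
xml_only = EVENT[EVENT.index("<?xml"):]   # keep only the XML payload
xml_ok = parses(xml_only)

print(raw_ok, xml_ok)
```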
UPDATE I tried using the field extraction wizard to create an extracted field for the XML inside the logging statement. The XML is huge and I can only select about 30% of it. If anyone is good at regex, maybe they can give me some pointers on how to complete the regex so that it captures all of the XML. (I tried updating the props.conf file but do not have permission to add TRUNCATE = 0.) This is a sample of the XML:
<?xml version="1.0" encoding="UTF-8"?>
<tmsTrip xmlns="http://ground.fedex.com/schemas/linehaul/trip" xmlns:ns2="http://ground.fedex.com/schemas/linehaul/TMSCommon">
<recordType>PURCHASEDLINEHAUL</recordType>
<eventType>APPROVE</eventType>
<tripId>143642990</tripId>
<legId>129014817</legId>
<tripNumber>129014817</tripNumber>
<tripLegNumber>1</tripLegNumber>
<updatedDateGMT>2020-05-22T00:53:21.000Z</updatedDateGMT>
<origin>
<ns2:numberCode>928</ns2:numberCode>
<ns2:locAbbr>ANAH</ns2:locAbbr>
<ns2:address1>590 E ORANGE THORPE AVENUE</ns2:address1>
<ns2:city>ANAHEIM</ns2:city>
<ns2:stateProvince>CA</ns2:stateProvince>
<ns2:postalCode>92801</ns2:postalCode>
<ns2:locType>FDEG</ns2:locType>
<ns2:numberType>1</ns2:numberType>
<ns2:timeZoneAbbr>PST</ns2:timeZoneAbbr>
<ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
</origin>
<destination>
<ns2:numberCode>89</ns2:numberCode>
<ns2:locAbbr>WOOD</ns2:locAbbr>
<ns2:address1>6000 RIVERSIDE DR</ns2:address1>
<ns2:address2>LH PHONE 732-512-5579</ns2:address2>
<ns2:city>KEASBEY</ns2:city>
<ns2:stateProvince>NJ</ns2:stateProvince>
<ns2:postalCode>08832</ns2:postalCode>
<ns2:locType>FDEG</ns2:locType>
<ns2:numberType>2</ns2:numberType>
<ns2:timeZoneAbbr>EST</ns2:timeZoneAbbr>
<ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
</destination>
<schedDispatchDateGMT>2020-05-22T13:00:00.000Z</schedDispatchDateGMT>
<estimatedArrivalDateGMT>2020-05-26T06:00:00.000Z</estimatedArrivalDateGMT>
<drop/>
<hook/>
<actualRoute>
<routeNumber>308229</routeNumber>
<routeOrderNumber>0</routeOrderNumber>
<totalMiles>2787</totalMiles>
<runTime>54.6</runTime>
</actualRoute>
<standardRoute>
<routeNumber>308229</routeNumber>
<routeOrderNumber>0</routeOrderNumber>
<totalMiles>2787</totalMiles>
<runTime>54.6</runTime>
</standardRoute>
<paidRoute>
<routeNumber>308229</routeNumber>
<routeOrderNumber>0</routeOrderNumber>
<totalMiles>2787</totalMiles>
<runTime>54.6</runTime>
</paidRoute>
<settlement>
<dispatchSettlementEligibility>false</dispatchSettlementEligibility>
</settlement>
<livePkgCount>0.0</livePkgCount>
<tripTollAmount>0.0</tripTollAmount>
<trailers>
<ns2:trailer>
<ns2:trailerNbr>531823</ns2:trailerNbr>
<ns2:trailerPrefix>FDXU</ns2:trailerPrefix>
<ns2:configOrderNbr>1</ns2:configOrderNbr>
<ns2:sealNbr>60606220</ns2:sealNbr>
<ns2:packageWeight>9931.59</ns2:packageWeight>
<ns2:unladenWeight>13870.0</ns2:unladenWeight>
<ns2:totalWeight>23801.59</ns2:totalWeight>
<ns2:packageNumber>703</ns2:packageNumber>
<ns2:percentCube>1</ns2:percentCube>
<ns2:hazmatFlag>false</ns2:hazmatFlag>
<ns2:originPlanned>
<ns2:numberCode>928</ns2:numberCode>
<ns2:locAbbr>ANAH</ns2:locAbbr>
<ns2:address1>590 E ORANGE THORPE AVENUE</ns2:address1>
<ns2:city>ANAHEIM</ns2:city>
<ns2:stateProvince>CA</ns2:stateProvince>
<ns2:postalCode>92801</ns2:postalCode>
<ns2:locType>FDEG</ns2:locType>
<ns2:numberType>1</ns2:numberType>
<ns2:timeZoneAbbr>PST</ns2:timeZoneAbbr>
<ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
</ns2:originPlanned>
<ns2:nextSortLocation>
<ns2:numberCode>89</ns2:numberCode>
<ns2:locAbbr>WOOD</ns2:locAbbr>
<ns2:address1>6000 RIVERSIDE DR</ns2:address1>
<ns2:address2>LH PHONE 732-512-5579</ns2:address2>
<ns2:city>KEASBEY</ns2:city>
<ns2:stateProvince>NJ</ns2:stateProvince>
<ns2:postalCode>08832</ns2:postalCode>
<ns2:locType>FDEG</ns2:locType>
<ns2:numberType>2</ns2:numberType>
<ns2:timeZoneAbbr>EST</ns2:timeZoneAbbr>
<ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
</ns2:nextSortLocation>
<ns2:destinationPlanned>
<ns2:numberCode>89</ns2:numberCode>
<ns2:locAbbr>WOOD</ns2:locAbbr>
<ns2:address1>6000 RIVERSIDE DR</ns2:address1>
<ns2:address2>LH PHONE 732-512-5579</ns2:address2>
<ns2:city>KEASBEY</ns2:city>
<ns2:stateProvince>NJ</ns2:stateProvince>
<ns2:postalCode>08832</ns2:postalCode>
<ns2:locType>FDEG</ns2:locType>
<ns2:numberType>2</ns2:numberType>
<ns2:timeZoneAbbr>EST</ns2:timeZoneAbbr>
<ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
</ns2:destinationPlanned>
<ns2:loads>
<ns2:load>
<ns2:loadId>103718801</ns2:loadId>
<ns2:loadNumber>1</ns2:loadNumber>
<ns2:origin>
<ns2:numberCode>928</ns2:numberCode>
<ns2:locAbbr>ANAH</ns2:locAbbr>
<ns2:numberType>1</ns2:numberType>
</ns2:origin>
<ns2:destination>
<ns2:numberCode>89</ns2:numberCode>
<ns2:locAbbr>WOOD</ns2:locAbbr>
<ns2:address2>LH PHONE 732-512-5579</ns2:address2>
<ns2:numberType>2</ns2:numberType>
</ns2:destination>
<ns2:openDateGMT>2020-05-21T19:53:46.000Z</ns2:openDateGMT>
<ns2:dueOverrideFlag>false</ns2:dueOverrideFlag>
<ns2:hazmatFlag>false</ns2:hazmatFlag>
</ns2:load>
</ns2:loads>
</ns2:trailer>
</trailers>
<dollys/>
<purchasedCost>
<purchasedCostTripSegment>
<purchCostReference>2625998</purchCostReference>
<carrier>BNSF</carrier>
<vendorType>RAIL</vendorType>
<carrierTrailerType>53PC</carrierTrailerType>
<origin>
<ns2:numberCode>4022</ns2:numberCode>
<ns2:locAbbr>BNSF</ns2:locAbbr>
<ns2:address1>3770 EAST WASHINGTON AVENUE</ns2:address1>
<ns2:city>LOS ANGELES</ns2:city>
<ns2:stateProvince>CA</ns2:stateProvince>
<ns2:postalCode>90040</ns2:postalCode>
<ns2:locType>FDEG</ns2:locType>
<ns2:numberType>8</ns2:numberType>
<ns2:timeZoneAbbr>PST</ns2:timeZoneAbbr>
<ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
</origin>
<destination>
<ns2:numberCode>4040</ns2:numberCode>
<ns2:locAbbr>CROX</ns2:locAbbr>
<ns2:address1>NORFOLK SOUTHERN RAILROAD</ns2:address1>
<ns2:address2>125 COUNTY ROAD</ns2:address2>
<ns2:city>CROXTON</ns2:city>
<ns2:stateProvince>NJ</ns2:stateProvince>
<ns2:postalCode>07307</ns2:postalCode>
<ns2:locType>FDEG</ns2:locType>
<ns2:numberType>8</ns2:numberType>
<ns2:timeZoneAbbr>EST</ns2:timeZoneAbbr>
<ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
</destination>
<stopOff>
<ns2:stopOffLocation>
<ns2:numberCode>9996</ns2:numberCode>
<ns2:stateProvince>DU</ns2:stateProvince>
<ns2:postalCode>00000</ns2:postalCode>
<ns2:locType>FDEG</ns2:locType>
<ns2:numberType>1</ns2:numberType>
</ns2:stopOffLocation>
</stopOff>
<schedDispatchDate>2020-05-22T05:00:00.000Z</schedDispatchDate>
<estimatedArrivalDate>2020-05-26T00:59:00.000Z</estimatedArrivalDate>
<billingMethod>RULE</billingMethod>
<STCCCode>4711110</STCCCode>
<planNumber>065</planNumber>
<powerType>1X</powerType>
<powerOnlyFlag>false</powerOnlyFlag>
</purchasedCostTripSegment>
<purchasedCostTripSegment>
<purchCostReference>2625998</purchCostReference>
<carrier>NS</carrier>
<vendorType>RAIL</vendorType>
<carrierTrailerType>53PC</carrierTrailerType>
<origin>
<ns2:numberCode>4061</ns2:numberCode>
<ns2:locAbbr>NSAU</ns2:locAbbr>
<ns2:address1>6300 SOUTH INDIANA AVENUE</ns2:address1>
<ns2:city>CHICAGO</ns2:city>
<ns2:stateProvince>IL</ns2:stateProvince>
<ns2:postalCode>60637</ns2:postalCode>
<ns2:locType>FDEG</ns2:locType>
<ns2:numberType>8</ns2:numberType>
<ns2:timeZoneAbbr>CST</ns2:timeZoneAbbr>
<ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
</origin>
<destination>
<ns2:numberCode>4040</ns2:numberCode>
<ns2:locAbbr>CROX</ns2:locAbbr>
<ns2:address1>NORFOLK SOUTHERN RAILROAD</ns2:address1>
<ns2:address2>125 COUNTY ROAD</ns2:address2>
<ns2:city>CROXTON</ns2:city>
<ns2:stateProvince>NJ</ns2:stateProvince>
<ns2:postalCode>07307</ns2:postalCode>
<ns2:locType>FDEG</ns2:locType>
<ns2:numberType>8</ns2:numberType>
<ns2:timeZoneAbbr>EST</ns2:timeZoneAbbr>
<ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
</destination>
<stopOff>
<ns2:stopOffLocation>
<ns2:numberCode>4040</ns2:numberCode>
<ns2:locAbbr>CROX</ns2:locAbbr>
<ns2:address1>NORFOLK SOUTHERN RAILROAD</ns2:address1>
<ns2:address2>125 COUNTY ROAD</ns2:address2>
<ns2:city>CROXTON</ns2:city>
<ns2:stateProvince>NJ</ns2:stateProvince>
<ns2:postalCode>07307</ns2:postalCode>
<ns2:locType>FDEG</ns2:locType>
<ns2:numberType>8</ns2:numberType>
<ns2:timeZoneAbbr>EST</ns2:timeZoneAbbr>
<ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
</ns2:stopOffLocation>
</stopOff>
<schedDispatchDate>2020-05-22T05:00:00.000Z</schedDispatchDate>
<estimatedArrivalDate>2020-05-26T01:00:00.000Z</estimatedArrivalDate>
<billingMethod>LOCAL</billingMethod>
<STCCCode>4711110</STCCCode>
<planNumber>045</planNumber>
<powerType>1X</powerType>
<powerOnlyFlag>false</powerOnlyFlag>
</purchasedCostTripSegment>
</purchasedCost>
<drivers/>
</tmsTrip>
This is as much as the extracted field lets me select: http://ground.fedex.com/schemas/linehaul/trip\" xmlns:ns2=\"http://ground.fedex.com/schemas/linehaul/TMSCommon\"> PURCHASEDLINEHAUL APPROVE 143642990 129014817 129014817 1 2020-05-22T00:53:21.000Z 928 ANAH 590 E ORANGE THORPE AVENUE ANAHEIM CA 92801 FDEG 1 PST true
This is the regex that Splunk creates to select the above XML:
^[^\$\n]*\$\d+\.\w+\s+\w+\s+(?P<xmlMessage><\?\w+\s+\w+="\d+\.\d+"\s+\w+="\w+\-\d+"\?>\s+<\w+\s+\w+="\w+://\w+\.\w+\.\w+/\w+/\w+/\w+"\s+\w+:\w+="\w+://\w+\.\w+\.\w+/\w+/\w+/\w+">\s+<\w+>\w+</\w+>\s+<\w+>\w+</\w+>\s+<\w+>\d+</\w+>\s+<\w+>\d+</\w+>\s+<\w+>\d+</\w+>\s+<\w+>\d+</\w+>\s+<\w+>\d+\-\d+\-\d+\w+:\d+:\d+\.\d+\w+</\w+>\s+<\w+>\s+<\w+:\w+>\d+</\w+:\w+>\s+<\w+:\w+>\w+</\w+:\w+>\s+<\w+:\w+>\d+\s+\w+\s+\w+\s+\w+\s+\w+</\w+:\w+>\s+<\w+:\w+>\w+</\w+:\w+>\s+<\w+:\w+>\w+</\w+:\w+>\s+<\w+:\w+>\d+</\w+:\w+>\s+<\w+:\w+>\w+</\w+:\w+>\s+<\w+:\w+>\d+</\w+:\w+>\s+<\w+:\w+>\w+</\w+:\w+>\s+<\w+:\w+>\w+</\w+:\w+>)
So can I change the above regex to include the entire XML?
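A much shorter pattern may be enough for this. Rather than describing every element, a regex can anchor on the XML declaration and on the closing </tmsTrip> tag and let . span the newlines in between. The Python sketch below shows the idea on a shortened, illustrative event. The (?s) flag and the (?P<xmlMessage>...) named group are also valid PCRE syntax, which is what Splunk's regex support is based on, but this pattern has not been tested in Splunk. It can only match text that was actually indexed, so it will not help if the event was truncated before the closing tag.

```python
import re

# Anchor on the declaration and the closing root tag.
# (?s) lets "." match newlines; ".*?" stops at the first </tmsTrip>.
XML_PATTERN = re.compile(r"(?s)(?P<xmlMessage><\?xml\b.*?</tmsTrip>)")

# Shortened, illustrative event with text before and after the XML.
EVENT = """0123459 TripMessage.createMessage MsgSource <?xml version="1.0" encoding="UTF-8"?>
<tmsTrip xmlns="http://ground.fedex.com/schemas/linehaul/trip">
<recordType>PURCHASEDLINEHAUL</recordType>
<drivers/>
</tmsTrip> trailing log text"""

match = XML_PATTERN.search(EVENT)
xml_message = match.group("xmlMessage") if match else None

print(xml_message)
```

Because the pattern does not depend on the number or order of the elements, it keeps working when the XML contains optional or repeated sections such as a second purchasedCostTripSegment.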
UPDATE I tried extracting a field from the xmlMessage extracted field shown above. I used the xpath command to extract recordType and put the result in a table. This is the command:
| xmlkv | xpath field=xmlMessage "//tmsTrip/recordType" outfield=Origin | table Origin
It returned no results. The xpath command does not work even for the simplest of queries. What am I doing wrong?
[...] (ns2:numberCode) is what's tripping it up. You could try this ugly workaround: //tmsTrip/purchasedCost/purchasedCostTripSegment/origin/*[local-name() = 'numberCode'] – Trevor Lawrence
[...] xmlns="http://ground.fedex.com/schemas/linehaul/trip" does), so it's still possible that the xpath expression is evaluating correctly, it's just not selecting what you want. Anyways, my knowledge only extends to XPath, so I'll just offer the hideous: /*[local-name() = 'tmsTrip']/*[local-name() = 'recordType']. If that still doesn't return anything, then I'm out of my depth and must bow out. – Trevor Lawrence
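The local-name() workaround ignores namespaces entirely. Outside Splunk, the same idea can be sketched with Python's xml.etree.ElementTree, whose {*} wildcard (Python 3.8 and later) matches a tag in any namespace. The document below is a trimmed copy of the second sample. Note that it contains two purchasedCostTripSegment elements, so the path yields two origin codes rather than one.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the second sample document from the question.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<tmsTrip xmlns="http://ground.fedex.com/schemas/linehaul/trip" xmlns:ns2="http://ground.fedex.com/schemas/linehaul/TMSCommon">
<recordType>PURCHASEDLINEHAUL</recordType>
<purchasedCost>
<purchasedCostTripSegment>
<carrier>BNSF</carrier>
<origin><ns2:numberCode>4022</ns2:numberCode></origin>
</purchasedCostTripSegment>
<purchasedCostTripSegment>
<carrier>NS</carrier>
<origin><ns2:numberCode>4061</ns2:numberCode></origin>
</purchasedCostTripSegment>
</purchasedCost>
</tmsTrip>"""

root = ET.fromstring(SAMPLE.encode("utf-8"))

# "{*}name" matches "name" in any namespace, much like local-name() = 'name'.
record_type = root.findtext("{*}recordType")
origin_codes = [
    el.text
    for el in root.findall(
        "{*}purchasedCost/{*}purchasedCostTripSegment/{*}origin/{*}numberCode"
    )
]

print(record_type)   # PURCHASEDLINEHAUL
print(origin_codes)  # ['4022', '4061']
```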