
I am using Splunk to extract a number of fields from XML data that is contained in a log file. To limit the search to mostly the XML, I start the search with this: sourcetype="name of type here" "RULE"

This returns:

0123459 TripMessage.createMessage MsgSource <?xml version="1.0" encoding="UTF-8"?>
<tmsTrip xmlns="http://ground.fedex.com/schemas/linehaul/trip" xmlns:ns2="http://ground.fedex.com/schemas/linehaul/TMSCommon">

...

The file is very large. This is part of it.

<?xml version="1.0" encoding="UTF-8"?>
<tmsTrip xmlns="http://ground.fedex.com/schemas/linehaul/trip" xmlns:ns2="http://ground.fedex.com/schemas/linehaul/TMSCommon">
   <recordType>PURCHASEDLINEHAUL</recordType>
   <eventType>APPROVE</eventType>
   <tripId>116029927</tripId>
   <legId>104257037</legId>
   <tripNumber>104257037</tripNumber>
   <tripLegNumber>1</tripLegNumber>
   <updatedDateGMT>2020-02-20T21:53:39.000Z</updatedDateGMT>
.... more lines here that are not important
     <purchasedCost>
      <purchasedCostTripSegment>
         <purchCostReference>1587040</purchCostReference>
         <carrier>FXTR</carrier>
         <vendorType>DRAY</vendorType>
         <billingMethod>RULE</billingMethod>
         <carrierTrailerType>PZ1</carrierTrailerType>
         <origin>
            <ns2:numberCode>923</ns2:numberCode>
            <ns2:locAbbr>RLTO</ns2:locAbbr>
            <ns2:address1>330 RESOURCE DRIVE</ns2:address1>
            <ns2:address2>LH PHONE 877-851-3543</ns2:address2>
            <ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
         </origin>

This query selects the XML text in the log file and extracts some of the fields, which I can then add to a table (the source and sourcetype parts of the search are omitted here):

| xmlkv | table purchCostReference, eventType, carrier, billingMethod

But I need more fields that are child elements nested deeper within the XML data. One of them is numberCode. I am trying to use xpath to extract these additional fields:

| xmlkv | xpath "//tmsTrip/purchasedCost/purchasedCostTripSegment/origin/ns2:numberCode" outfield=Origin | table purchCostReference, eventType, carrier, billingMethod, Origin

But no Origin data is returned when I add the field to the table. There is no error. The Origin column is empty.

UPDATE: I think the problem is that I need to add the field parameter. The XML is embedded in a text log file. My search narrows the events down to those that contain the XML, but each event still includes non-XML text, so I think xpath is struggling with the text that is not XML.

UPDATE: I tried using the field extraction wizard to create an extracted field for the XML inside the logging statement. The XML is huge and I can only select about 30% of it. If anyone is good at regex, maybe they can give me some pointers on how to complete the regex so that it captures all of the XML. (I tried updating the props.conf file but do not have permission to add TRUNCATE = 0.) This is a sample of the XML:

<?xml version="1.0" encoding="UTF-8"?>
<tmsTrip xmlns="http://ground.fedex.com/schemas/linehaul/trip" xmlns:ns2="http://ground.fedex.com/schemas/linehaul/TMSCommon">
   <recordType>PURCHASEDLINEHAUL</recordType>
   <eventType>APPROVE</eventType>
   <tripId>143642990</tripId>
   <legId>129014817</legId>
   <tripNumber>129014817</tripNumber>
   <tripLegNumber>1</tripLegNumber>
   <updatedDateGMT>2020-05-22T00:53:21.000Z</updatedDateGMT>
   <origin>
      <ns2:numberCode>928</ns2:numberCode>
      <ns2:locAbbr>ANAH</ns2:locAbbr>
      <ns2:address1>590 E ORANGE THORPE AVENUE</ns2:address1>
      <ns2:city>ANAHEIM</ns2:city>
      <ns2:stateProvince>CA</ns2:stateProvince>
      <ns2:postalCode>92801</ns2:postalCode>
      <ns2:locType>FDEG</ns2:locType>
      <ns2:numberType>1</ns2:numberType>
      <ns2:timeZoneAbbr>PST</ns2:timeZoneAbbr>
      <ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
   </origin>
   <destination>
      <ns2:numberCode>89</ns2:numberCode>
      <ns2:locAbbr>WOOD</ns2:locAbbr>
      <ns2:address1>6000 RIVERSIDE DR</ns2:address1>
      <ns2:address2>LH PHONE 732-512-5579</ns2:address2>
      <ns2:city>KEASBEY</ns2:city>
      <ns2:stateProvince>NJ</ns2:stateProvince>
      <ns2:postalCode>08832</ns2:postalCode>
      <ns2:locType>FDEG</ns2:locType>
      <ns2:numberType>2</ns2:numberType>
      <ns2:timeZoneAbbr>EST</ns2:timeZoneAbbr>
      <ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
   </destination>
   <schedDispatchDateGMT>2020-05-22T13:00:00.000Z</schedDispatchDateGMT>
   <estimatedArrivalDateGMT>2020-05-26T06:00:00.000Z</estimatedArrivalDateGMT>
   <drop/>
   <hook/>
   <actualRoute>
      <routeNumber>308229</routeNumber>
      <routeOrderNumber>0</routeOrderNumber>
      <totalMiles>2787</totalMiles>
      <runTime>54.6</runTime>
   </actualRoute>
   <standardRoute>
      <routeNumber>308229</routeNumber>
      <routeOrderNumber>0</routeOrderNumber>
      <totalMiles>2787</totalMiles>
      <runTime>54.6</runTime>
   </standardRoute>
   <paidRoute>
      <routeNumber>308229</routeNumber>
      <routeOrderNumber>0</routeOrderNumber>
      <totalMiles>2787</totalMiles>
      <runTime>54.6</runTime>
   </paidRoute>
   <settlement>
      <dispatchSettlementEligibility>false</dispatchSettlementEligibility>
   </settlement>
   <livePkgCount>0.0</livePkgCount>
   <tripTollAmount>0.0</tripTollAmount>
   <trailers>
      <ns2:trailer>
         <ns2:trailerNbr>531823</ns2:trailerNbr>
         <ns2:trailerPrefix>FDXU</ns2:trailerPrefix>
         <ns2:configOrderNbr>1</ns2:configOrderNbr>
         <ns2:sealNbr>60606220</ns2:sealNbr>
         <ns2:packageWeight>9931.59</ns2:packageWeight>
         <ns2:unladenWeight>13870.0</ns2:unladenWeight>
         <ns2:totalWeight>23801.59</ns2:totalWeight>
         <ns2:packageNumber>703</ns2:packageNumber>
         <ns2:percentCube>1</ns2:percentCube>
         <ns2:hazmatFlag>false</ns2:hazmatFlag>
         <ns2:originPlanned>
            <ns2:numberCode>928</ns2:numberCode>
            <ns2:locAbbr>ANAH</ns2:locAbbr>
            <ns2:address1>590 E ORANGE THORPE AVENUE</ns2:address1>
            <ns2:city>ANAHEIM</ns2:city>
            <ns2:stateProvince>CA</ns2:stateProvince>
            <ns2:postalCode>92801</ns2:postalCode>
            <ns2:locType>FDEG</ns2:locType>
            <ns2:numberType>1</ns2:numberType>
            <ns2:timeZoneAbbr>PST</ns2:timeZoneAbbr>
            <ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
         </ns2:originPlanned>
         <ns2:nextSortLocation>
            <ns2:numberCode>89</ns2:numberCode>
            <ns2:locAbbr>WOOD</ns2:locAbbr>
            <ns2:address1>6000 RIVERSIDE DR</ns2:address1>
            <ns2:address2>LH PHONE 732-512-5579</ns2:address2>
            <ns2:city>KEASBEY</ns2:city>
            <ns2:stateProvince>NJ</ns2:stateProvince>
            <ns2:postalCode>08832</ns2:postalCode>
            <ns2:locType>FDEG</ns2:locType>
            <ns2:numberType>2</ns2:numberType>
            <ns2:timeZoneAbbr>EST</ns2:timeZoneAbbr>
            <ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
         </ns2:nextSortLocation>
         <ns2:destinationPlanned>
            <ns2:numberCode>89</ns2:numberCode>
            <ns2:locAbbr>WOOD</ns2:locAbbr>
            <ns2:address1>6000 RIVERSIDE DR</ns2:address1>
            <ns2:address2>LH PHONE 732-512-5579</ns2:address2>
            <ns2:city>KEASBEY</ns2:city>
            <ns2:stateProvince>NJ</ns2:stateProvince>
            <ns2:postalCode>08832</ns2:postalCode>
            <ns2:locType>FDEG</ns2:locType>
            <ns2:numberType>2</ns2:numberType>
            <ns2:timeZoneAbbr>EST</ns2:timeZoneAbbr>
            <ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
         </ns2:destinationPlanned>
         <ns2:loads>
            <ns2:load>
               <ns2:loadId>103718801</ns2:loadId>
               <ns2:loadNumber>1</ns2:loadNumber>
               <ns2:origin>
                  <ns2:numberCode>928</ns2:numberCode>
                  <ns2:locAbbr>ANAH</ns2:locAbbr>
                  <ns2:numberType>1</ns2:numberType>
               </ns2:origin>
               <ns2:destination>
                  <ns2:numberCode>89</ns2:numberCode>
                  <ns2:locAbbr>WOOD</ns2:locAbbr>
                  <ns2:address2>LH PHONE 732-512-5579</ns2:address2>
                  <ns2:numberType>2</ns2:numberType>
               </ns2:destination>
               <ns2:openDateGMT>2020-05-21T19:53:46.000Z</ns2:openDateGMT>
               <ns2:dueOverrideFlag>false</ns2:dueOverrideFlag>
               <ns2:hazmatFlag>false</ns2:hazmatFlag>
            </ns2:load>
         </ns2:loads>
      </ns2:trailer>
   </trailers>
   <dollys/>
   <purchasedCost>
      <purchasedCostTripSegment>
         <purchCostReference>2625998</purchCostReference>
         <carrier>BNSF</carrier>
         <vendorType>RAIL</vendorType>
         <carrierTrailerType>53PC</carrierTrailerType>
         <origin>
            <ns2:numberCode>4022</ns2:numberCode>
            <ns2:locAbbr>BNSF</ns2:locAbbr>
            <ns2:address1>3770 EAST WASHINGTON AVENUE</ns2:address1>
            <ns2:city>LOS ANGELES</ns2:city>
            <ns2:stateProvince>CA</ns2:stateProvince>
            <ns2:postalCode>90040</ns2:postalCode>
            <ns2:locType>FDEG</ns2:locType>
            <ns2:numberType>8</ns2:numberType>
            <ns2:timeZoneAbbr>PST</ns2:timeZoneAbbr>
            <ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
         </origin>
         <destination>
            <ns2:numberCode>4040</ns2:numberCode>
            <ns2:locAbbr>CROX</ns2:locAbbr>
            <ns2:address1>NORFOLK SOUTHERN RAILROAD</ns2:address1>
            <ns2:address2>125 COUNTY ROAD</ns2:address2>
            <ns2:city>CROXTON</ns2:city>
            <ns2:stateProvince>NJ</ns2:stateProvince>
            <ns2:postalCode>07307</ns2:postalCode>
            <ns2:locType>FDEG</ns2:locType>
            <ns2:numberType>8</ns2:numberType>
            <ns2:timeZoneAbbr>EST</ns2:timeZoneAbbr>
            <ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
         </destination>
         <stopOff>
            <ns2:stopOffLocation>
               <ns2:numberCode>9996</ns2:numberCode>
               <ns2:stateProvince>DU</ns2:stateProvince>
               <ns2:postalCode>00000</ns2:postalCode>
               <ns2:locType>FDEG</ns2:locType>
               <ns2:numberType>1</ns2:numberType>
            </ns2:stopOffLocation>
         </stopOff>
         <schedDispatchDate>2020-05-22T05:00:00.000Z</schedDispatchDate>
         <estimatedArrivalDate>2020-05-26T00:59:00.000Z</estimatedArrivalDate>
         <billingMethod>RULE</billingMethod>
         <STCCCode>4711110</STCCCode>
         <planNumber>065</planNumber>
         <powerType>1X</powerType>
         <powerOnlyFlag>false</powerOnlyFlag>
      </purchasedCostTripSegment>
      <purchasedCostTripSegment>
         <purchCostReference>2625998</purchCostReference>
         <carrier>NS</carrier>
         <vendorType>RAIL</vendorType>
         <carrierTrailerType>53PC</carrierTrailerType>
         <origin>
            <ns2:numberCode>4061</ns2:numberCode>
            <ns2:locAbbr>NSAU</ns2:locAbbr>
            <ns2:address1>6300 SOUTH INDIANA AVENUE</ns2:address1>
            <ns2:city>CHICAGO</ns2:city>
            <ns2:stateProvince>IL</ns2:stateProvince>
            <ns2:postalCode>60637</ns2:postalCode>
            <ns2:locType>FDEG</ns2:locType>
            <ns2:numberType>8</ns2:numberType>
            <ns2:timeZoneAbbr>CST</ns2:timeZoneAbbr>
            <ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
         </origin>
         <destination>
            <ns2:numberCode>4040</ns2:numberCode>
            <ns2:locAbbr>CROX</ns2:locAbbr>
            <ns2:address1>NORFOLK SOUTHERN RAILROAD</ns2:address1>
            <ns2:address2>125 COUNTY ROAD</ns2:address2>
            <ns2:city>CROXTON</ns2:city>
            <ns2:stateProvince>NJ</ns2:stateProvince>
            <ns2:postalCode>07307</ns2:postalCode>
            <ns2:locType>FDEG</ns2:locType>
            <ns2:numberType>8</ns2:numberType>
            <ns2:timeZoneAbbr>EST</ns2:timeZoneAbbr>
            <ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
         </destination>
         <stopOff>
            <ns2:stopOffLocation>
               <ns2:numberCode>4040</ns2:numberCode>
               <ns2:locAbbr>CROX</ns2:locAbbr>
               <ns2:address1>NORFOLK SOUTHERN RAILROAD</ns2:address1>
               <ns2:address2>125 COUNTY ROAD</ns2:address2>
               <ns2:city>CROXTON</ns2:city>
               <ns2:stateProvince>NJ</ns2:stateProvince>
               <ns2:postalCode>07307</ns2:postalCode>
               <ns2:locType>FDEG</ns2:locType>
               <ns2:numberType>8</ns2:numberType>
               <ns2:timeZoneAbbr>EST</ns2:timeZoneAbbr>
               <ns2:daylightSavingsFlag>true</ns2:daylightSavingsFlag>
            </ns2:stopOffLocation>
         </stopOff>
         <schedDispatchDate>2020-05-22T05:00:00.000Z</schedDispatchDate>
         <estimatedArrivalDate>2020-05-26T01:00:00.000Z</estimatedArrivalDate>
         <billingMethod>LOCAL</billingMethod>
         <STCCCode>4711110</STCCCode>
         <planNumber>045</planNumber>
         <powerType>1X</powerType>
         <powerOnlyFlag>false</powerOnlyFlag>
      </purchasedCostTripSegment>
   </purchasedCost>
   <drivers/>
</tmsTrip>

This is how much of the XML the extracted field selects: http://ground.fedex.com/schemas/linehaul/trip\" xmlns:ns2=\"http://ground.fedex.com/schemas/linehaul/TMSCommon\"> PURCHASEDLINEHAUL APPROVE 143642990 129014817 129014817 1 2020-05-22T00:53:21.000Z 928 ANAH 590 E ORANGE THORPE AVENUE ANAHEIM CA 92801 FDEG 1 PST true

This is the regex that Splunk creates to select the above XML:

^[^\$\n]*\$\d+\.\w+\s+\w+\s+(?P<xmlMessage><\?\w+\s+\w+="\d+\.\d+"\s+\w+="\w+\-\d+"\?>\s+<\w+\s+\w+="\w+://\w+\.\w+\.\w+/\w+/\w+/\w+"\s+\w+:\w+="\w+://\w+\.\w+\.\w+/\w+/\w+/\w+">\s+<\w+>\w+</\w+>\s+<\w+>\w+</\w+>\s+<\w+>\d+</\w+>\s+<\w+>\d+</\w+>\s+<\w+>\d+</\w+>\s+<\w+>\d+</\w+>\s+<\w+>\d+\-\d+\-\d+\w+:\d+:\d+\.\d+\w+</\w+>\s+<\w+>\s+<\w+:\w+>\d+</\w+:\w+>\s+<\w+:\w+>\w+</\w+:\w+>\s+<\w+:\w+>\d+\s+\w+\s+\w+\s+\w+\s+\w+</\w+:\w+>\s+<\w+:\w+>\w+</\w+:\w+>\s+<\w+:\w+>\w+</\w+:\w+>\s+<\w+:\w+>\d+</\w+:\w+>\s+<\w+:\w+>\w+</\w+:\w+>\s+<\w+:\w+>\d+</\w+:\w+>\s+<\w+:\w+>\w+</\w+:\w+>\s+<\w+:\w+>\w+</\w+:\w+>)

So how can I change the regex above so that it captures the entire XML?
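As a rough illustration outside Splunk, the generated regex mirrors the XML element by element, which is why it stops partway through. A looser approach is to capture everything from the XML declaration through the closing root tag. The Python sketch below shows that idea. The log prefix, the trailing text, and the shortened payload are made up for the example, and `extract_xml` is a hypothetical helper name.

```python
import re

# Hypothetical log line: a text prefix followed by a shortened XML payload.
raw_event = (
    '0123459 TripMessage.createMessage MsgSource <?xml version="1.0" encoding="UTF-8"?>\n'
    '<tmsTrip xmlns="http://ground.fedex.com/schemas/linehaul/trip">\n'
    '   <recordType>PURCHASEDLINEHAUL</recordType>\n'
    '   <eventType>APPROVE</eventType>\n'
    '</tmsTrip> trailing text'
)

# DOTALL lets "." cross newlines; the lazy ".*?" stops at the first closing root tag.
XML_PATTERN = re.compile(r"<\?xml.*?</tmsTrip>", re.DOTALL)


def extract_xml(event):
    """Return the embedded XML document, or None if no complete one is found."""
    match = XML_PATTERN.search(event)
    return match.group(0) if match else None


xml_message = extract_xml(raw_event)
print(xml_message)
```

The analogous Splunk extraction would presumably be something like `(?s)(?<xmlMessage><\?xml.*?</tmsTrip>)`, though that is untested here. It can only help if the event itself was not truncated at index time.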

UPDATE: I tried extracting a field from the xmlMessage extracted field shown above. I used the xpath command to extract recordType and put the result in a table. This is the command:

| xmlkv | xpath field=xmlMessage "//tmsTrip/recordType" outfield=Origin | table Origin

It returned no results. This xpath command does not work for the simplest of queries. What am I doing wrong?

I'm not familiar with Splunk, but it's possible the namespaced element (ns2:numberCode) is what's tripping it up. You could try this ugly workaround: //tmsTrip/purchasedCost/purchasedCostTripSegment/origin/*[local-name() = 'numberCode'] - Trevor Lawrence

That still did not return any results. I tried selecting one of the parent elements, like "//tmsTrip/recordType", and it also does not return anything. I don't know whether Splunk also has to be configured to use the xpath command. - Gloria Santin

Technically, even those elements are in a namespace (that's what xmlns="http://ground.fedex.com/schemas/linehaul/trip" does), so it's still possible that the xpath expression is evaluating correctly and just not selecting what you want. Anyway, my knowledge only extends to XPath, so I'll just offer the hideous: /*[local-name() = 'tmsTrip']/*[local-name() = 'recordType']. If that still doesn't return anything, then I'm out of my depth and must bow out. - Trevor Lawrence

Thanks for trying... This is an XML file within a text file, so I think this is specific to Splunk. - Gloria Santin
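To illustrate the namespace point from the comments outside Splunk, here is a small Python sketch using the standard library's ElementTree on a shortened copy of the sample. It shows that an unqualified path finds nothing in a document with a default namespace, while a namespace-qualified path does. The prefix `trip` is an arbitrary choice made for this example, and none of this says how Splunk's xpath command handles namespaces.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document, keeping both namespace declarations.
XML_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<tmsTrip xmlns="http://ground.fedex.com/schemas/linehaul/trip" xmlns:ns2="http://ground.fedex.com/schemas/linehaul/TMSCommon">
   <recordType>PURCHASEDLINEHAUL</recordType>
   <purchasedCost>
      <purchasedCostTripSegment>
         <origin>
            <ns2:numberCode>4022</ns2:numberCode>
         </origin>
      </purchasedCostTripSegment>
   </purchasedCost>
</tmsTrip>"""

root = ET.fromstring(XML_DOC.encode("utf-8"))

# Prefixes here are local to this script; only the URIs must match the document.
NS = {
    "trip": "http://ground.fedex.com/schemas/linehaul/trip",
    "ns2": "http://ground.fedex.com/schemas/linehaul/TMSCommon",
}

# An unqualified path looks for elements in no namespace, so it finds nothing.
unqualified = root.find("recordType")

# Qualifying every step with its namespace selects the intended elements.
record_type = root.find("trip:recordType", NS)
origin_code = root.find(
    "trip:purchasedCost/trip:purchasedCostTripSegment/trip:origin/ns2:numberCode", NS
)

print(unqualified)        # None
print(record_type.text)   # PURCHASEDLINEHAUL
print(origin_code.text)   # 4022
```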

3 Answers

0
votes

Without seeing the rest of the event data, I can't say why the xpath command isn't working.

However, as a workaround, try the following instead of the xmlkv and xpath commands:

| rex field=_raw "numberCode>(?<Origin>\d+)</"

This should work fine with events that mix plain text and XML.
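One caveat worth checking: if rex stops at the first match (which I believe is its default behavior), this pattern returns the first numberCode in the event, which may not be the one under purchasedCostTripSegment. The Python sketch below reproduces that with the same pattern on a made-up, shortened event. The prefix and the flattened payload are assumptions for the example.

```python
import re

# Hypothetical mixed event: log prefix plus a shortened XML payload with
# a trip-level origin and a purchased-cost segment origin.
raw_event = (
    "0123459 TripMessage.createMessage MsgSource "
    "<origin><ns2:numberCode>928</ns2:numberCode></origin>"
    "<purchasedCost><purchasedCostTripSegment>"
    "<origin><ns2:numberCode>4022</ns2:numberCode></origin>"
    "</purchasedCostTripSegment></purchasedCost>"
)

# Same pattern as the rex above, written with Python's named-group syntax.
pattern = re.compile(r"numberCode>(?P<Origin>\d+)</")

# search() returns only the first match in the string.
first = pattern.search(raw_event)
print(first.group("Origin"))  # 928: the trip-level origin, not the segment origin
```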

0
votes
| xmlkv | spath path="tmsTrip.purchasedCost.purchasedCostTripSegment.origin.ns2:numberCode" output=Origin

Try spath

0
votes

I was able to extract the data from the XML using rex, and to pick out each instance of numberCode using max_match and mvindex. Here is an example for anyone who has this problem:

 rex max_match=0 "\<ns2\:numberCode\>(?P<location>[^\<]+)" | eval Segment1_Origin = mvindex(location, 7)

The XML element is ns2:numberCode. Its values are captured into a multivalue field named location. max_match=0 means an unlimited number of matches. mvindex is zero-based, so the 8th instance of location is assigned to the field Segment1_Origin.
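The same collect-all-matches-then-index logic can be sketched in Python to check which value index 7 picks. The event below is a made-up stand-in that contains only the numberCode elements, in the order they appear in the second sample from the question (928, 89, 928, 89, 89, 928, 89, 4022, 4040).

```python
import re

# Shortened stand-in for the event: numberCode values in document order.
codes_in_order = ["928", "89", "928", "89", "89", "928", "89", "4022", "4040"]
raw_event = "log prefix " + "".join(
    f"<ns2:numberCode>{code}</ns2:numberCode>" for code in codes_in_order
)

# findall plays the role of max_match=0: it returns every capture as a list.
locations = re.findall(r"<ns2:numberCode>([^<]+)", raw_event)

# Zero-based indexing, like mvindex(location, 7): the 8th match.
segment1_origin = locations[7]
print(segment1_origin)  # 4022
```

Note that a fixed index assumes every event has the same number of numberCode elements before the first purchased-cost segment. An event with more trailers or loads would presumably shift the position.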