0
votes

I have a big XML file and I want to save each id, source and target in a string list. Once the import into the string lists has succeeded, I want to build a MySQL query from them.

Here's a snippet of my XML:

<xliff version="1.1">
 <file original="Xliff Demo" source-language="EN" target-language="DE" datatype="html">
 <header>
 <skl>
 <external-file uid="017dbcf0-c82c-11e2-ba2b-005056c00008" href="skl\simple.htm.skl"/>
 </skl>
 </header>
 <body>
 <trans-unit id="00ffmnpB5wBV5KFqBxuHLi4fwJvvuB">
 <source xml:lang="EN">1lnRUfBBeHtbS96uULSht42VNMN7XE4qt9JrOcWhtoTuhnbAQ9</source>
 <target xml:lang="DE">zZvOLJfLCy9oP5GQYfEqw5LAeC2ESAxRmVe1JyQdmJ1eG2jz1N</target>
 <note/></trans-unit>
 <trans-unit id="00kjUwy1rJ54bEGYp7XZvtBiY32pmj">
 <source xml:lang="EN">HXOQLUWkfJg206vRw8lyWhCWChOacVxbMukfQ0HUdNHSI18GG4</source>
 <target xml:lang="DE">8dsX38mezeZ0w0w37LI66CDRuI8gBD23zT5KR4iqYNv3IGUgH0</target>
 <note/></trans-unit>
 <trans-unit id="00kk3Af8SFpHyelAaYrgK58b9GbIDj">
 <source xml:lang="EN">wQFxZiCiRsSNWs20G4WXAmDBRdRL6fcrrJnCgtbiXGSfHzpYrT</source>
 <target xml:lang="DE">oFVTUdPkExOhISYofIImLsnVKd3NSZg32tyeP5iRxRZdmuYQDy</target>
 <note/></trans-unit>
 <trans-unit id="00Ky2dmDU9wGTWBnJxeL9b9gkts5UQ">
 <source xml:lang="EN">nHQcjAW02lWe0SyOhqGtyqUhpwQ8qgWX3rUynMRf4BDHfVdHOC</source>
 <target xml:lang="DE">0CURp1dcZydB1V2rEZ1lnOhmYufOYbrLbh84e1ZnALlzZPVq4F</target>
 <note/></trans-unit>
 <trans-unit id="00pMSFlBfA3bJ8Xy9I78wz6XisPYcV">
 <source xml:lang="EN">IuhtaVnZtF67nxKz5dbmuy8BEMTs2X1120FzDtIplKF2Me5AsQ</source>
 <target xml:lang="DE">1BGSJQDZBm4UW974pucnX3XHuYOQYpC7nTcIH01rbKlOkVi9bo</target>
 <note/></trans-unit>
 <trans-unit id="012w2kb2d1Lo6NbJLE0BawThzsSuCJ">
 <source xml:lang="EN">0RoniOGZ7V7WTF1YQg59B8jBhRxnLVXscC1LOGPzKPYRs76oIz</source>
 <target xml:lang="DE">gyw15fkHTni2aUGWI5qiPHEz8vsJJJsW4OOqKwGYL1qzfUVfLO</target>
 <note/></trans-unit>
...
..
..

So I am trying to save each trans-unit id, source xml:lang="EN" and target xml:lang="DE" in a separate string list, but only the values.

This is my code:

{ -----------  Import Procedure ------------ }
procedure TForm2.Button2Click(Sender: TObject);
var
  xmlFile, idList, sourceList, targetList: TStringList; // xmlFile is the string list the XML file is read into
  i: Integer;
  id, source, target: String;
  idTmp, idTmp2, sourceTmp, sourceTmp2, targetTmp, targetTmp2: Integer;
begin
  try
    xmlFile := TStringList.Create;
    idList := TStringList.Create;
    sourceList := TStringList.Create;
    targetList := TStringList.Create;

    if OpenDialog1.Execute then
      xmlFile.LoadFromFile(OpenDialog1.FileName);

      {Debug}
        //ShowMessage(IntToStr(XmlFile.Count));   // output the line count
        //ShowMessage(XmlFile[8]);                // output line 8
      {/Debug}

      for i := 0 to xmlFile.Count-1 do // go over every line of the string list and do the following:
        begin // code per line

          {id}
          idTmp  := Pos('<trans-unit id="', xmlFile.Strings[i])+16;  // search for trans-unit id (16 is the length of the search string '<trans-unit id="')
          if idTmp > 5 then // check whether something was found (not equal to 0)
          begin
            idTmp2 := Pos('"', xmlFile.Strings[i], idTmp); // find the position where the value ends (")
            idList.Add(Copy(xmlFile.Strings[i], idTmp, idTmp2-idTmp));
          end;

          {source}
          sourceTmp  := Pos('<source xml:lang="EN">', xmlFile.Strings[i])+22;
          if sourceTmp > 5 then // check whether something was found (not equal to 0)
          begin
            sourceTmp2 := Pos('<', xmlFile.Strings[i], sourceTmp); // find the position where the value ends (next <)
            sourceList.Add(Copy(xmlFile.Strings[i], sourceTmp, sourceTmp2-sourceTmp));
          end;

          {target}
          targetTmp  := Pos('<target xml:lang="DE">', xmlFile.Strings[i])+22;
          if targetTmp > 5 then // check whether something was found (not equal to 0)
          begin
            targetTmp2 := Pos('<', xmlFile.Strings[i], targetTmp); // find the position where the value ends (next <)
            targetList.Add(Copy(xmlFile.Strings[i], targetTmp, targetTmp2-targetTmp));
          end;
        end;

      StartPerformance;
      UniConnection1.Open;
  finally
    ListBox1.items.assign(idList);
    ListBox2.items.assign(sourceList);
    ListBox3.items.assign(targetList);
    ShowMessage('Import in StringListen fertiggestellt.');
    xmlFile.Free;
    idList.Free;
    sourceList.Free;
    targetList.Free;
  end;
end;

But it's not working the way I want. My problem is that it also saves empty lines and other garbage in the string lists. I can't find my error, and it's the first time I'm using the Copy/Pos functions.

What should I change to fix my problem and save only the correct strings in my three string lists?

2
Already the first if idTmp > 5, and the similar checks after it, will always be true. - bummi
No, that's not how to parse XML. - David Heffernan
Never ever try to parse XML yourself. See here why: stackoverflow.com/questions/1732348/… - Jeroen Wiert Pluimers
@J... The approach in the code is no better than regex. Looking at the code, I cannot see a tokenizer, for a start. The sentiment of bobince's seminal answer applies equally here. - David Heffernan
@JeroenWiertPluimers - I agree, of course. I suppose the best analogy is that of toying with petrol vs playing with semtex. While both are ill-advised, the lesser of the two ad-hoc approaches is generally, by its own unwieldiness, at least more self-limiting in the degree of monster it can become before its author realizes the futility of the approach. I fully admit, however, that I may be woefully underestimating the resolve with which a determined madman might grow the above into a million line fiend of a program. - J...

2 Answers

6
votes

Maybe you should think about using the IXMLDocument interface to load the XML file into a data structure, and fill your string lists from that afterwards.

An example has been posted here: https://stackoverflow.com/a/8651934/2207071
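
A minimal sketch of that approach, assuming the file follows the layout of the snippet in the question (xliff > file > body > trans-unit) and that source and target contain plain text only. FindChild and LoadTransUnits are hypothetical names introduced for illustration; LoadXMLDocument and the IXMLNode members used (ChildNodes, NodeName, Attributes, Text) come from Delphi's Xml.XMLDoc and Xml.XMLIntf units.

```pascal
uses
  System.Classes, System.Variants, Xml.XMLIntf, Xml.XMLDoc;

// Hypothetical helper: returns the first child node named Name, or nil.
function FindChild(const Parent: IXMLNode; const Name: string): IXMLNode;
var
  i: Integer;
begin
  Result := nil;
  if Parent = nil then
    Exit;
  for i := 0 to Parent.ChildNodes.Count - 1 do
    if Parent.ChildNodes[i].NodeName = Name then
    begin
      Result := Parent.ChildNodes[i];
      Exit;
    end;
end;

// Hypothetical helper: text of the named child, or '' if it is missing.
// Assumption: the child holds plain text; reading Text on an element that
// contains inline markup raises an exception.
function ChildText(const Parent: IXMLNode; const Name: string): string;
var
  Node: IXMLNode;
begin
  Node := FindChild(Parent, Name);
  if Node <> nil then
    Result := Node.Text
  else
    Result := '';
end;

// Fills the three lists so that index n in each list belongs to the same
// trans-unit. Assumption: COM is already initialized (true in a VCL
// application) when the default MSXML vendor is used.
procedure LoadTransUnits(const FileName: string;
  IdList, SourceList, TargetList: TStrings);
var
  Doc: IXMLDocument;
  Body, TransUnit: IXMLNode;
  i: Integer;
begin
  Doc := LoadXMLDocument(FileName);
  Body := FindChild(FindChild(Doc.DocumentElement, 'file'), 'body');
  if Body = nil then
    Exit;
  for i := 0 to Body.ChildNodes.Count - 1 do
  begin
    TransUnit := Body.ChildNodes[i];
    if TransUnit.NodeName <> 'trans-unit' then
      Continue; // skip anything that is not a trans-unit element
    IdList.Add(VarToStr(TransUnit.Attributes['id']));
    SourceList.Add(ChildText(TransUnit, 'source'));
    TargetList.Add(ChildText(TransUnit, 'target'));
  end;
end;
```

In the button handler from the question, the whole line-by-line loop could then be replaced by LoadTransUnits(OpenDialog1.FileName, idList, sourceList, targetList). Adding an empty string for a missing source or target keeps the three lists aligned by index, which matters when they are later combined into one query.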

3
votes

Here:

idTmp  := Pos('<trans-unit id="', xmlFile.Strings[i])+16;  
if idTmp > 5 then 
  ...

idTmp will always be greater than 5: Pos returns a positive index on a match and zero when there is no match, and you add 16 to that result no matter what, so idTmp is never less than 16.

The simplest change here would be:

 idTmp  := Pos('<trans-unit id="', xmlFile.Strings[i]);  
 if idTmp > 0 then begin //Pos returns 0 if no match found
   idTmp := idTmp + 16;
   idTmp2 := PosEx('"', xmlFile.Strings[i], idTmp); 
   idList.Add(Copy(xmlFile.Strings[i], idTmp, idTmp2-idTmp));
 end;

The change for the other two blocks would follow in a similar way.
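
Since the three blocks differ only in the search string and the closing delimiter, the same fix can be written once. The following is a sketch only; TryExtractBetween is a hypothetical helper name, not something from the RTL. It relies on Pos, Copy and StrUtils.PosEx, and it assumes, like the original code, that the value starts and ends on the same line.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  StrUtils;

// Hypothetical helper: copies the text that follows StartTag up to the next
// EndDelim on the same line. Returns False when either marker is missing,
// so the caller never adds empty or garbage entries to a list.
function TryExtractBetween(const Line, StartTag, EndDelim: string;
  out Value: string): Boolean;
var
  StartPos, EndPos: Integer;
begin
  Value := '';
  Result := False;
  StartPos := Pos(StartTag, Line);            // 0 means "not found"
  if StartPos = 0 then
    Exit;
  StartPos := StartPos + Length(StartTag);    // first character of the value
  EndPos := PosEx(EndDelim, Line, StartPos);  // search only after the tag
  if EndPos = 0 then
    Exit;                                     // closing delimiter not on this line
  Value := Copy(Line, StartPos, EndPos - StartPos);
  Result := True;
end;
```

Inside the loop, each block then shrinks to a call such as: if TryExtractBetween(xmlFile[i], '<trans-unit id="', '"', id) then idList.Add(id); with '<source xml:lang="EN">' and '<' for the source values, and '<target xml:lang="DE">' and '<' for the targets. Using Length(StartTag) instead of hard-coded offsets like 16 and 22 avoids miscounting the search string.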

You'll notice that I used StrUtils.PosEx here for idTmp2 - I don't know how your code compiled using Pos for the second function...

Edit

OK, it looks like Pos was changed in XE3 to include overloads that take an offset. If performance is your objective here (as it seems from the comments), you should probably read this:

http://qc.embarcadero.com/wc/qcmain.aspx?d=111103

Additionally, and I think this is probably quite important, this really is a terrible way to parse XML. I highly suggest you read through some source code from projects that already do this, to get a better understanding of how you should approach the problem. Some examples might be: