0
votes

I have spent the last two days trying every solution suggested in other posts, without success. Our company developed software that converts .XML files into .TXT while filtering for the fields we need.

Recently we received over 500 files from a client, and we have been able neither to run our program on them nor to open them correctly in a browser. Two ways around the problem are manually removing the special characters such as ã, ç, è, ô, or changing the encoding from UTF-8 to ISO-8859-1.

Since it seemed easier to write a command that changes the encoding of all the files, I arrived at the following:

iconv -c -f UTF-8 -t ISO-8859-1 test.xml > test1.xml

With this command I can open the file in a browser and convert it correctly into .TXT with our own program. My challenge is applying it to all 500 files. I have tried these suggestions, without result:

for %a in (*.xml) do iconv -c -f UTF-8 -t ISO-8859-1 %a

and

find . -name ".xml" -exec iconv -c -f UTF-8 -t ISO-8859-1

I also tried several variations of these two, with no results so far. Any idea or advice is welcome. Thank you in advance!

UPDATE:

I decided to give it a try with recode using:

recode UTF-8..ISO-8859-1 *.xml

but it returns:

failed: Invalid input in step 'UTF-8..ISO-8859-1'

UPDATE 2:

I have found a solution by forcing recode with the -f option. This is what the command looked like:

recode -f UTF-8..ISO-8859-1 *xml

I must say that all the special characters such as ã, ç, ê were lost in the process, but since I only need access to the numbers, this solution works fine for me. I'm sure there is a cleaner way of doing it without losing information, but this worked for me.

2
Did you run the first on the command line (%a is needed) or in a batch file (%%a is needed)? What do you mean by "no result"? No error message? No file? Incorrect files? As it stands, we can only guess. We need more information to solve this. If the single command worked, there's no reason a loop wouldn't do, unless there's something wrong with the loop. - Thomas Weller
Which technology is it? Linux or Windows/DOS? - Thomas Weller
I am running it through Windows/DOS. With the first example the code does run, but the files remain in UTF-8 encoding. When I use the second example it says "Access denied". - P-theMoser
Hmm, that's very important. Have you tried running as admin? Can you remove read-only flags, ... - Thomas Weller
@P-theMoser Instead of changing the title to "solved", post the solution as an answer and mark that one as the correct answer. - DarkBee

2 Answers

0
votes

If you were using Linux, the equivalent loop in bash syntax would be:

for a in *.xml; do iconv -c -f UTF-8 -t ISO-8859-1 "$a"; done

The loop above only prints the converted text to standard output. To write each result to a new file, redirect the output (the target name is the source name with .suffix, or whatever you choose, appended):

for a in *.xml; do iconv -c -f UTF-8 -t ISO-8859-1 "$a" > "$a.suffix"; done
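If the converted files should keep their original names, one option is to write them into a separate directory instead of appending a suffix. The following is only a sketch, assuming bash and an iconv binary on the PATH; the function name convert_all and both directory arguments are made up for illustration and are not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch: convert every .xml file in a source directory from UTF-8 to
# ISO-8859-1 and write the results, under the same file names, into a
# separate output directory so the originals stay untouched.
# convert_all, src_dir and out_dir are hypothetical names.
convert_all() {
    local src_dir=$1 out_dir=$2 f
    mkdir -p "$out_dir"
    for f in "$src_dir"/*.xml; do
        # Skip the unexpanded pattern when no .xml file exists
        [ -e "$f" ] || continue
        # -c drops characters that cannot be converted to the target encoding
        iconv -c -f UTF-8 -t ISO-8859-1 "$f" > "$out_dir/$(basename "$f")"
    done
}
```

It could be called as `convert_all . converted` from the directory holding the files. Writing to a second directory avoids redirecting iconv's output onto the file it is still reading, which would truncate the input.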

For a Windows environment this answer will not be applicable (see comments).

0
votes

I have found one simple answer to this question. By using recode I was able to easily batch-convert all of the files I needed. This solution does remove all the special characters, but since I only needed access to the numbers in the files, I'm OK with it.

Here is the code I used:

CD file-location-path
recode -f UTF-8..ISO-8859-1 *.xml

Like I said, I'm sure this is not the cleanest or best way of doing it, but it worked for me. Maybe it will help someone else out there too.
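The "Invalid input" error from recode may mean that some of the files contain bytes that are not valid UTF-8, although that is a guess rather than something confirmed here. One way to check before forcing the conversion is to ask iconv to convert each file from UTF-8 to UTF-8 and see which ones it rejects. The sketch below assumes a bash shell with iconv available (on Windows that would mean something like Cygwin or WSL, which is an assumption about the setup); list_invalid_utf8 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the .xml files in a directory that are not valid UTF-8.
# list_invalid_utf8 and dir are hypothetical names.
list_invalid_utf8() {
    local dir=$1 f
    for f in "$dir"/*.xml; do
        # Skip the unexpanded pattern when no .xml file exists
        [ -e "$f" ] || continue
        # A UTF-8 to UTF-8 conversion exits non-zero on malformed input
        if ! iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}
```

Files reported by `list_invalid_utf8 .` would be the ones to inspect by hand, for example to see whether they are already in ISO-8859-1, before deciding whether dropping characters with -f is acceptable.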