
I'm trying to unzip a file in a job. Everything works until a filename inside the zip contains special characters such as "á, é, í, ó, ú". When it does, I get an error and this log:

Unzip file - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : Could not unzip file [file:///C:/pentaho/data/example.zip]. Exception : [MALFORMED]
Unzip file - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : java.lang.IllegalArgumentException: MALFORMED
Unzip file -    at java.util.zip.ZipCoder.toString(Unknown Source)
Unzip file -    at java.util.zip.ZipFile.getZipEntry(Unknown Source)
Unzip file -    at java.util.zip.ZipFile.access$900(Unknown Source)
Unzip file -    at java.util.zip.ZipFile$ZipEntryIterator.next(Unknown Source)
Unzip file -    at java.util.zip.ZipFile$ZipEntryIterator.nextElement(Unknown Source)
Unzip file -    at java.util.zip.ZipFile$ZipEntryIterator.nextElement(Unknown Source)
Unzip file -    at org.apache.commons.vfs2.provider.zip.ZipFileSystem.init(ZipFileSystem.java:83)
Unzip file -    at org.apache.commons.vfs2.provider.AbstractVfsContainer.addComponent(AbstractVfsContainer.java:49)
Unzip file -    at org.apache.commons.vfs2.provider.AbstractFileProvider.addFileSystem(AbstractFileProvider.java:96)
Unzip file -    at org.apache.commons.vfs2.provider.AbstractLayeredFileProvider.createFileSystem(AbstractLayeredFileProvider.java:80)
Unzip file -    at org.apache.commons.vfs2.provider.AbstractLayeredFileProvider.findFile(AbstractLayeredFileProvider.java:56)
Unzip file -    at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:711)
Unzip file -    at org.pentaho.di.core.vfs.ConcurrentFileSystemManager.resolveFile(ConcurrentFileSystemManager.java:91)
Unzip file -    at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:648)
Unzip file -    at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:152)
Unzip file -    at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:109)
Unzip file -    at org.pentaho.di.job.entries.unzip.JobEntryUnZip.unzipFile(JobEntryUnZip.java:626)
Unzip file -    at org.pentaho.di.job.entries.unzip.JobEntryUnZip.processOneFile(JobEntryUnZip.java:525)
Unzip file -    at org.pentaho.di.job.entries.unzip.JobEntryUnZip.execute(JobEntryUnZip.java:470)
Unzip file -    at org.pentaho.di.job.Job.execute(Job.java:676)
Unzip file -    at org.pentaho.di.job.Job.execute(Job.java:817)
Unzip file -    at org.pentaho.di.job.Job.execute(Job.java:493)
Unzip file -    at org.pentaho.di.job.Job.run(Job.java:380)

How may I fix this issue?

I'm attaching an image of the job: Unzip File Job

P.S. I already looked here and in other forums. Thanks.

Which OS are you using, which filesystem are you writing to, and, if known, on which OS and filesystem was the zip file created? - Cyrus
I'm using Windows 10, and it should be NTFS. Let me check the other information. When the filenames do not have those characters it works fine. Any suggestions? Thanks - Roger Nieto

1 Answer


I found the solution; I'm posting it in case it helps someone.

  1. Step: Get Variables, to get the parameters I needed.
  2. Step: User Defined Java Class. Here I unzip the file while specifying the encoding used for the entry names. This is the code:

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;
import org.apache.commons.io.IOUtils;


public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }

    // VARIABLE_NAME is a placeholder: use the variable holding the zip path
    // for fname and the variable holding the target directory for outDir.
    String fname = getVariable("VARIABLE_NAME", null);
    String outDir = getVariable("VARIABLE_NAME", null);
    logBasic("Unzipping " + fname + " into " + outDir);

    ZipFile zipFile = null;
    try {
        // Second argument: encoding of the entry names inside the zip. Adjust it to
        // the code page the archive was created with (cp850 and cp437 are common
        // for Western European names). Third argument: ignore Unicode extra fields.
        zipFile = new ZipFile(new File(fname), "cp866", false);
        Enumeration enumEntry = zipFile.getEntries();
        while (enumEntry.hasMoreElements()) {
            ZipArchiveEntry entry = (ZipArchiveEntry) enumEntry.nextElement();
            String entryName = entry.getName();
            File target = new File(outDir, entryName);

            // Directory entries have no content; just create them.
            if (entry.isDirectory()) {
                target.mkdirs();
                continue;
            }
            if (target.getParentFile() != null) {
                target.getParentFile().mkdirs();
            }

            InputStream is = null;
            OutputStream os = null;
            try {
                is = zipFile.getInputStream(entry);
                os = new FileOutputStream(target);
                IOUtils.copy(is, os);
            } finally {
                // Close both streams even if the copy fails
                IOUtils.closeQuietly(is);
                IOUtils.closeQuietly(os);
            }

            // Emit one output row per extracted file
            Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());
            get(Fields.Out, "FNAME").setValue(outputRow, fname);
            get(Fields.Out, "FileNameUnzipped").setValue(outputRow, entryName);
            putRow(data.outputRowMeta, outputRow);
        }
    } catch (Exception exc) {
        // Fail the step instead of silently continuing
        throw new KettleException("Failed to unzip " + fname, exc);
    } finally {
        ZipFile.closeQuietly(zipFile);
    }

    return true;
}
  3. Step: Set Variables.
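
For anyone who wants to try the same idea outside PDI, here is a minimal sketch that uses only the JDK. It relies on the `java.util.zip.ZipFile(File, Charset)` constructor (Java 7 and later), which decodes entry names with the given charset instead of the default UTF-8. The class name `CharsetUnzip` and its two methods are hypothetical names made up for this illustration, and the right charset is an assumption you have to verify against how the archive was created. This is a sketch, not what the Unzip file job entry does internally.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, not part of PDI or any library.
public class CharsetUnzip {

    /** Lists the entry names of zipPath, decoding them with nameCharset. */
    public static List<String> listNames(File zipPath, Charset nameCharset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath, nameCharset)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    /** Extracts zipPath into outDir and returns the names of the extracted files. */
    public static List<String> unzip(File zipPath, File outDir, Charset nameCharset) throws IOException {
        List<String> extracted = new ArrayList<>();
        Path root = outDir.toPath().toAbsolutePath().normalize();
        try (ZipFile zip = new ZipFile(zipPath, nameCharset)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Path target = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside outDir.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes output directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted.add(entry.getName());
            }
        }
        return extracted;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical invocation: java CharsetUnzip example.zip outDir Cp850
        for (String name : unzip(new File(args[0]), new File(args[1]), Charset.forName(args[2]))) {
            System.out.println(name);
        }
    }
}
```

Calling `listNames` first with a few candidate charsets is a cheap way to see which one renders the accented names correctly before extracting anything. Note that single-byte code pages never fail to decode, so a wrong choice shows up as wrong characters in the names rather than as an exception.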