3 votes

I'm building something on Google App Engine in Java (JDO). I'm programmatically compressing a large byte[] with Deflater and then storing the compressed byte[] in the blobstore. This works great:

 public class Functions {

    public static byte[] compress(byte[] input) throws IOException
    {
        Deflater df = new Deflater();
        df.setLevel(Deflater.BEST_COMPRESSION);
        df.setInput(input);
        df.finish();

        // Collect the compressed output in an expandable byte array
        ByteArrayOutputStream baos = new ByteArrayOutputStream(input.length);
        byte[] buff = new byte[1024];   // 1 KB working buffer
        while (!df.finished())
        {
            int count = df.deflate(buff);   // number of compressed bytes written to buff
            baos.write(buff, 0, count);
        }
        baos.close();
        df.end();   // release the native zlib resources

        return baos.toByteArray();
    }

    public static byte[] decompress(byte[] input) throws IOException, DataFormatException
    {
        Inflater decompressor = new Inflater();
        decompressor.setInput(input);

        // Create an expandable byte array to hold the decompressed data
        ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);

        // Decompress the data. Let DataFormatException propagate: swallowing it
        // (as before) leaves finished() forever false and the loop spinning.
        byte[] buf = new byte[1024];
        while (!decompressor.finished()) {
            int count = decompressor.inflate(buf);
            if (count == 0 && decompressor.needsInput()) {
                throw new DataFormatException("Truncated deflate stream");
            }
            bos.write(buf, 0, count);
        }
        bos.close();
        decompressor.end();

        // Get the decompressed data
        return bos.toByteArray();
    }

    public static BlobKey putInBlobStore(String contentType, byte[] filebytes) throws IOException {

        // Get a file service
        FileService fileService = FileServiceFactory.getFileService();

        AppEngineFile file = fileService.createNewBlobFile(contentType);

        // Open a channel to write to it, holding a lock so we can finalize it
        boolean lock = true;
        FileWriteChannel writeChannel = fileService.openWriteChannel(file, lock);

        // This time we write to the channel using standard Java
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(filebytes));
        int defaultBufferSize = 524288; // 0.5 MB chunks, or smaller if the data is smaller
        byte[] buffer = new byte[Math.min(filebytes.length, defaultBufferSize)];

        int read;
        while ((read = in.read(buffer)) > 0) { // -1 means end of stream
            // Wrap only the bytes actually read. Reallocating the buffer here
            // (as the original code did) wraps a fresh, zero-filled array and
            // writes zeros to the blob instead of the data.
            ByteBuffer bb = ByteBuffer.wrap(buffer, 0, read);
            writeChannel.write(bb);
        }
        writeChannel.closeFinally();

        return fileService.getBlobKey(file);
    }
 }

Using the static compress() and putInBlobStore() functions in my Functions class, I can compress and store a byte[] like so:

BlobKey dataBlobKey = Functions.putInBlobStore("MULTIPART_FORM_DATA", Functions.compress(originalDataByteArray));

Pretty sweet. I'm really digging GAE.
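For anyone wanting to check the compress/decompress pair outside of GAE first, a plain java.util.zip round trip can be verified locally. This is a minimal sketch (class name, buffer size, and sample data are arbitrary), mirroring the Deflater/Inflater loops above:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class RoundTrip {

    static byte[] deflate(byte[] input) {
        Deflater df = new Deflater(Deflater.BEST_COMPRESSION);
        df.setInput(input);
        df.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!df.finished()) {
            out.write(buf, 0, df.deflate(buf));
        }
        df.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] input) throws DataFormatException {
        Inflater inf = new Inflater();
        inf.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!inf.finished()) {
            out.write(buf, 0, inf.inflate(buf));
        }
        inf.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Highly repetitive sample data compresses well
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) sb.append("<p>hello world</p>");
        byte[] original = sb.toString().getBytes("UTF-8");

        byte[] compressed = deflate(original);
        byte[] restored = inflate(compressed);

        System.out.println(new String(restored, "UTF-8").equals(sb.toString()));
        System.out.println(compressed.length < original.length);
    }
}
```

Running this prints `true` twice: the round trip is lossless and the output really is smaller.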

But now, the problem:

I'm storing compressed HTML which I'd like to retrieve and decompress on the fly to display in an iframe within a JSP page. Compression is quick, but decompression takes FOREVER! Even when the compressed HTML is 15k, sometimes the decompression just dies.

Here's my decompression approach:

URL file = new URL("/blobserve?key=" + htmlBlobKey);
URLConnection conn = file.openConnection();
conn.setReadTimeout(30000);
conn.setConnectTimeout(30000);
InputStream inputStream = conn.getInputStream();
byte[] data = IOUtils.toByteArray(inputStream);
return new String(Functions.decompress(data));

Any thoughts on how best to take compressed HTML from the blobstore, decompress it and display it? Even if I need to pass it to a task queue and poll for completion while showing a progress bar - that would be fine. I truly don't care, so long as it's efficient and ultimately WORKS. I'd appreciate any guidance you can share with me here.
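One aside on "decompression just dies": if the compressed bytes arrive corrupted or truncated, a manual inflate loop that swallows DataFormatException never reaches finished() and spins forever, which looks exactly like a hang. An alternative that sidesteps the manual loop entirely is java.util.zip.InflaterInputStream, which decompresses lazily as the stream is read and surfaces corruption as an IOException. A sketch, with a ByteArrayInputStream standing in for the fetched response body:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class StreamingInflate {

    public static void main(String[] args) throws IOException {
        byte[] original = "compressed HTML would go here".getBytes("UTF-8");

        // Produce a deflate stream, as Functions.compress() does
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        DeflaterOutputStream dos = new DeflaterOutputStream(
                compressed, new Deflater(Deflater.BEST_COMPRESSION));
        dos.write(original);
        dos.close();

        // Decompress on the fly while reading; a corrupt stream fails fast
        // with an IOException instead of an endless loop
        InflaterInputStream iis = new InflaterInputStream(
                new ByteArrayInputStream(compressed.toByteArray()));
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = iis.read(buf)) > 0) {
            restored.write(buf, 0, n);
        }
        iis.close();

        System.out.println(new String(restored.toByteArray(), "UTF-8"));
    }
}
```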

Thanks for your help.

Is the delay definitely in the decompression? Have you checked just outputting the compressed retrieved data to see if it is equally slow? – Sanjay Manohar
Why are you fetching the blob from yourself? Why not just use the blob reading API? – Nick Johnson
Also, why are you storing the data in the blobstore compressed, given the additional latency it imposes? – Nick Johnson
Still working on implementing Sasha's recommendation below, but to Nick's question: it's an archive that would become massive (20 TB per client or more), and a client might only access 5-10 items per month for legal depositions. So I'm happy to sacrifice speed for storage size. Nick, I see the blobReader object in the Python docs, but what's the Java equivalent? Thanks – Bob
...looks like BlobstoreInputStream is the equivalent. I'll take a look at this as well. – Bob

2 Answers

0 votes

You could look into RequestBuilder, which runs asynchronously:

RequestBuilder requestBuilder = new RequestBuilder(RequestBuilder.GET, "/blobserve?key=" + htmlBlobKey);
try {
    requestBuilder.sendRequest(null, new RequestCallback() {
        public void onError(Request request, Throwable exception) {
            GWT.log(exception.getMessage());
        }

        public void onResponseReceived(Request request, Response response) {
            doSomething(response.getText()); // here update your iframe and stop the progress indicator
        }
    });
} catch (RequestException ex) {
    GWT.log(ex.getMessage());
}
0 votes

I took Nick Johnson's idea and read from the Blobstore directly instead of serving the blobs. Now it's lightning fast! Here's the code:

try {
    ChainedBlobstoreInputStream inputStream = new ChainedBlobstoreInputStream(this.getHtmlBlobKey());
    byte[] data = IOUtils.toByteArray(inputStream);
    return new String(Functions.decompress(Encrypt.AESDecrypt(data)));
}
catch (Exception e) {
    return "No HTML Version";
}

I got the ChainedBlobstoreInputStream class from here: Reading a BlobstoreInputStream >= 1MB in size
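Since this still materializes the whole compressed blob with IOUtils.toByteArray before decompressing, one possible refinement is to wrap the blob stream in an InflaterInputStream so the bytes are decompressed as they are read, without holding a separate compressed copy in memory. A sketch under stated assumptions: a ByteArrayInputStream stands in for ChainedBlobstoreInputStream, and the AES decryption step is omitted (it would not compose with plain stream wrapping unless done with something like CipherInputStream):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class BlobDecompress {

    // Drain a stream that decompresses while it is read
    static String readHtml(InputStream blobStream) throws IOException {
        InflaterInputStream in = new InflaterInputStream(blobStream);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) > 0) {
            out.write(buf, 0, n);
        }
        in.close();
        return new String(out.toByteArray(), "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        // Build a deflate-compressed blob standing in for the stored HTML
        ByteArrayOutputStream blob = new ByteArrayOutputStream();
        DeflaterOutputStream dos = new DeflaterOutputStream(blob);
        dos.write("<html><body>hi</body></html>".getBytes("UTF-8"));
        dos.close();

        System.out.println(readHtml(new ByteArrayInputStream(blob.toByteArray())));
    }
}
```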