1
votes

I'm writing a program that modifies only the metadata (standard and custom) in DOC, XLS, PPT and VSD files. The program works correctly, but I wonder if there is a way to do this without loading the entire file into memory:

POIFSFileSystem poifs = new POIFSFileSystem(new FileInputStream("file.xls"));

NPOIFSFileSystem is faster and consumes less memory, but it is read-only.

I'm using Apache POI 3.9

2
NPOIFS is nearly ready for write support. Would you be able to contribute some fixes to it? - Gagravarr
Of course, how can I help? - user2671914
If you send an email to the dev@ list I'll give you advice there, it's more than can fit in a comment block! Basically there are a couple of disabled failing unit tests that need their underlying logic fixing, and a couple of stub unit tests that need writing. - Gagravarr

2 Answers

0
votes

You could map the desired part of the file into memory and then work on it using java.nio.channels.FileChannel.

In addition to the familiar read, write, and close operations of byte channels, this class defines the following file-specific operations:

  • Bytes may be read or written at an absolute position in a file in a way that does not affect the channel's current position.

  • A region of a file may be mapped directly into memory; for large files this is often much more efficient than invoking the usual read or write methods.
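As a rough illustration of the two operations quoted above, here is a minimal sketch that overwrites a byte range through a mapped region and reads it back at an absolute position. The class and method names (RegionPatch, patchRegion, readRegion) are made up for this example. It also assumes you already know the byte offset you want to change; in an OLE2 file the metadata streams are not at a fixed offset, so you would still need to locate them yourself.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical helper, not part of Apache POI or the JDK.
final class RegionPatch {

    /** Overwrites bytes in place starting at offset, mapping only that region of the file. */
    static void patchRegion(Path file, long offset, byte[] replacement) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Refuse to write past the end, so the file never grows by accident.
            if (offset < 0 || offset + replacement.length > channel.size()) {
                throw new IllegalArgumentException("Region lies outside the file");
            }
            // Only replacement.length bytes are mapped, not the whole file.
            MappedByteBuffer region =
                    channel.map(FileChannel.MapMode.READ_WRITE, offset, replacement.length);
            region.put(replacement);
            region.force(); // ask the OS to flush the mapped changes to disk
        }
    }

    /** Reads up to length bytes at an absolute position; the channel's own position is untouched. */
    static byte[] readRegion(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            while (buffer.hasRemaining()) {
                // Absolute read: the second argument is the file position to read from.
                if (channel.read(buffer, offset + buffer.position()) < 0) {
                    break; // reached end of file
                }
            }
            return Arrays.copyOf(buffer.array(), buffer.position());
        }
    }
}
```

For instance, RegionPatch.patchRegion(path, 3, newBytes) would replace newBytes.length bytes starting at offset 3 and leave the rest of the file untouched. This only works for same-length replacements, so it is not a substitute for a library that understands the file format.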

0
votes

At the time of your question, there sadly wasn't a very low memory way to do it. The good news is that as of 2014-04-28 it is possible! (This code should be in 3.11 when that's released, but for now it's too new)

Now that NPOIFS supports writing, including in-place write, what you'll want to do is something like:

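import java.io.File;
import org.apache.poi.hpsf.DocumentSummaryInformation;
import org.apache.poi.hpsf.PropertySetFactory;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.poifs.filesystem.DocumentNode;
import org.apache.poi.poifs.filesystem.NDocumentInputStream;
import org.apache.poi.poifs.filesystem.NDocumentOutputStream;
import org.apache.poi.poifs.filesystem.NPOIFSFileSystem;

// Open the file read-write (readOnly = false), and grab the entries for the summary streams
// "file" is the java.io.File you want to modify
NPOIFSFileSystem poifs = new NPOIFSFileSystem(file, false);
DirectoryNode root = poifs.getRoot();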
DocumentNode sinfDoc = 
     (DocumentNode)root.getEntry(SummaryInformation.DEFAULT_STREAM_NAME);
DocumentNode dinfDoc = 
     (DocumentNode)root.getEntry(DocumentSummaryInformation.DEFAULT_STREAM_NAME);

// Open and parse the metadata
SummaryInformation sinf = (SummaryInformation)PropertySetFactory.create(
     new NDocumentInputStream(sinfDoc));
DocumentSummaryInformation dinf = (DocumentSummaryInformation)PropertySetFactory.create(
     new NDocumentInputStream(dinfDoc));

// Make some metadata changes
sinf.setAuthor("Changed Author");
sinf.setTitle("Le titre \u00e9tait chang\u00e9");
dinf.setManager("Changed Manager");

// Update the metadata streams in the file
sinf.write(new NDocumentOutputStream(sinfDoc));
dinf.write(new NDocumentOutputStream(dinfDoc));

// Write out our changes in place, then release the file
poifs.writeFilesystem();
poifs.close();

You ought to be able to do all of that using memory equal to less than 20% of the size of your file, and quite possibly less than that for larger files!

(If you want to see more on this, look at the ModifyDocumentSummaryInformation example and the HPSF TestWrite unit test)