END GOAL: Efficiently (in one pass) read all CellRecords on a huge (30,000+ row), protected Worksheet.
Problem:
Using the HSSF.EventUserModel, how can I read all Records (including CellRecords) for an XLS file with both Workbook and Worksheet protection?
Create Input Spreadsheet (in Excel 2010):
- Create new Blank workbook.
- Set value of A1 to number: 50
- Set value of A2 to string: fifty
- Set value of A3 to formula: =25*2
- Review (ribbon) -> Protect Sheet -> Password: pass1
- Review (ribbon) -> Protect Workbook -> Password: pass1
- File (ribbon) ->Save As... -> Save as type: Excel 97-2003 Workbook
Progress thus far:
- The XLS file opens without a password in Excel. Therefore, you shouldn't need the password to open it in POI.
- The XLS file opens successfully with
new HSSFWorkbook(Stream fs). However, I need the efficiency ofEventUserModelfor my actual spreadsheet. - Setting
NPOI.HSSF.Record.Crypto.Biff8EncryptionKey.CurrentUserPassword = "pass1";did not work. - The
ProcessRecord( )function catches aPasswordRecord, but I can't find any documentation on how to properly handle it. - Perhaps, the
EncryptionInfoorDecryptorclasses may be of some use.
Note:
I'm using NPOI. However, I can translate any java examples to C#.
Code:
I use the following code to capture Record events. My Book1-unprotected.xls (without protection) shows all Record events (including cell values). My Book1-protected.xls displays some records and throws an exception.
I just view processedEvents in the debugger.
using System;
using System.Collections.Generic;
using System.IO;
using NPOI.HSSF.Record;
using NPOI.HSSF.Model;
using NPOI.HSSF.UserModel;
using NPOI.HSSF.EventUserModel;
using NPOI.POIFS;
using NPOI.POIFS.FileSystem;
namespace NPOI_small {
class myListener : IHSSFListener {
List<Record> processedRecords;
private Stream fs;
public myListener(Stream fs) {
processedRecords = new List<Record>();
this.fs = fs;
HSSFEventFactory factory = new HSSFEventFactory();
HSSFRequest request = new HSSFRequest();
MissingRecordAwareHSSFListener mraListener;
FormatTrackingHSSFListener fmtListener;
EventWorkbookBuilder.SheetRecordCollectingListener recListener;
mraListener = new MissingRecordAwareHSSFListener(this);
fmtListener = new FormatTrackingHSSFListener(mraListener);
recListener = new EventWorkbookBuilder.SheetRecordCollectingListener(fmtListener);
request.AddListenerForAllRecords(recListener);
POIFSFileSystem poifs = new POIFSFileSystem(this.fs);
factory.ProcessWorkbookEvents(request, poifs);
}
public void ProcessRecord(Record record) {
processedRecords.Add(record);
}
}
class Program {
static void Main(string[] args) {
Stream fs = File.OpenRead(@"c:\users\me\desktop\xx\Book1-protected.xls");
myListener testListener = new myListener(fs); // Use EventModel
//HSSFWorkbook book = new HSSFWorkbook(fs); // Use UserModel
Console.Read();
}
}
}
UPDATE (for Juan Mellado):
Below is the exception. My best guess right now (in the answer by Victor Petrykin) is that the HSSFEventFactory uses RecordInputStream which cannot natively decrypt protected records. Upon receiving the exception, processedRecords contains 22 records including the following potentially significant ones:
- processedRecords[5] is a
WriteAccessRecordwith a garbled (probably encrypted) value for.name - processedRecords[22] is a
RefreshAllRecordand is the lastRecordin the list
Exception:
NPOI.Util.RecordFormatException was unhandled
HResult=-2146233088
Message=Unable to construct record instance
Source=NPOI
StackTrace:
at NPOI.HSSF.Record.RecordFactory.ReflectionConstructorRecordCreator.Create(RecordInputStream in1)
at NPOI.HSSF.Record.RecordFactory.CreateSingleRecord(RecordInputStream in1)
at NPOI.HSSF.Record.RecordFactory.CreateRecord(RecordInputStream in1)
at NPOI.HSSF.EventUserModel.HSSFRecordStream.GetNextRecord()
at NPOI.HSSF.EventUserModel.HSSFRecordStream.NextRecord()
at NPOI.HSSF.EventUserModel.HSSFEventFactory.GenericProcessEvents(HSSFRequest req, RecordInputStream in1)
at NPOI.HSSF.EventUserModel.HSSFEventFactory.ProcessEvents(HSSFRequest req, Stream in1)
at NPOI.HSSF.EventUserModel.HSSFEventFactory.ProcessWorkbookEvents(HSSFRequest req, POIFSFileSystem fs)
at NPOI_small.myListener..ctor(Stream fs) in c:\Users\me\Documents\Visual Studio 2012\Projects\myTest\NPOI_small\Program.cs:line 35
at NPOI_small.Program.Main(String[] args) in c:\Users\me\Documents\Visual Studio 2012\Projects\myTest\NPOI_small\Program.cs:line 80
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException: NPOI.Util.RecordFormatException
HResult=-2146233088
Message=Expected to find a ContinueRecord in order to read remaining 137 of 144 chars
Source=NPOI
StackTrace:
at NPOI.HSSF.Record.RecordInputStream.ReadStringCommon(Int32 requestedLength, Boolean pIsCompressedEncoding)
at NPOI.HSSF.Record.RecordInputStream.ReadUnicodeLEString(Int32 requestedLength)
at NPOI.HSSF.Record.FontRecord..ctor(RecordInputStream in1)
processRecordcallback method receive all theCellRecordobjects, such asNumberRecordorFormulaRecord, so I could detect them and access their attributes (Ex:((NumberRecord) record).getValue()). AProtectRecordobject just has an flag attribute to say if the current record is protected. Can you add some more information about what means "how to properly handle it"? And, of course, the stack trace of the exception you got. - Juan Mellado