My goal is to extract embedded documents from a OneNote notebook programmatically. The embedded documents are likely to be Office documents, PDFs, and other arbitrary files. I do not have any difficulty getting a Base64 string for inline images, but I do have a problem getting a Base64 string for other file types.
I am using VS 2008 C#, OneNote 2007, Windows XP SP3.
I am using a sample .ONE file, which consists of a small amount of text, a PDF file, and one inline image. I am able to identify the ID of the containing page and the ID of the PDF. I have hard-coded both IDs into the following example.
// Open the section file; strID receives the ID of the opened section.
string strID;
Microsoft.Office.Interop.OneNote.Application onApplication = new Microsoft.Office.Interop.OneNote.Application();
onApplication.OpenHierarchy(@"D:\Projects\OneNote\test.one",
    System.String.Empty, out strID, Microsoft.Office.Interop.OneNote.CreateFileType.cftSection);

// Get page content for a hard-coded page ID.
string strXML1;
onApplication.GetPageContent("{460ABC12-855F-09E4-3724-85E8DE17BD57}{1}{B0}", out strXML1, Microsoft.Office.Interop.OneNote.PageInfo.piAll);

// Get the XML of the page that contains the embedded objects.
string strXML2;
onApplication.GetPageContent("{4AA5B6DF-1C90-0B3D-3FFD-687B0AF4A632}{1}{B0}", out strXML2, Microsoft.Office.Interop.OneNote.PageInfo.piAll);

// Get a hyperlink to the embedded object (page ID, then object ID).
string strHyperlink;
onApplication.GetHyperlinkToObject("{4AA5B6DF-1C90-0B3D-3FFD-687B0AF4A632}{1}{B0}", "{23A17F23-F743-0C9B-082A-BC6BD5D9CA6E}{13}{B0}", out strHyperlink);

// A non-empty hyperlink indicates that the object ID is good.
if (!string.IsNullOrEmpty(strHyperlink))
{
    // Get the Base64 string for the same object.
    string strBase64;
    onApplication.GetBinaryPageContent("{4AA5B6DF-1C90-0B3D-3FFD-687B0AF4A632}{1}{B0}", "{23A17F23-F743-0C9B-082A-BC6BD5D9CA6E}{13}{B0}", out strBase64);
}
The application returns a valid hyperlink whether I reference the PDF or the inline image. It also returns a valid Base64 string for the inline image. For the PDF, however, GetBinaryPageContent fails with error 0x8004200f ("The binary object does not exist."). The same is true if I try a version containing an embedded Word document.
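To tell this particular failure apart from other COM errors while testing, the HRESULT can be compared against the value quoted in the message. This is only a minimal sketch, assuming the interop layer surfaces the failure as a COMException; the class and method names (OneNoteErrors, IsMissingBinaryObject) are hypothetical and not part of the OneNote API.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the OneNote interop assembly.
public static class OneNoteErrors
{
    // HRESULT taken from the error message quoted in the question.
    public const int BinaryObjectDoesNotExist = unchecked((int)0x8004200F);

    // Returns true only for a COMException carrying that HRESULT.
    public static bool IsMissingBinaryObject(Exception ex)
    {
        COMException comEx = ex as COMException;
        return comEx != null && comEx.ErrorCode == BinaryObjectDoesNotExist;
    }
}
```

The call to GetBinaryPageContent could then be wrapped in a try/catch that checks OneNoteErrors.IsMissingBinaryObject(ex) and rethrows anything else.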
How can I get a Base64 string for the PDF?
I am open to using http://onom.codeplex.com/, but I have not found a solution there.
By the way, I am aware that IDs may not be the same from one OneNote session to another. In my tests, I make sure the IDs are correct by manually viewing the XML in debug mode.
Here is a snippet of the XML written to strXML2.
The inline image
<![CDATA[Attachment_Test_01]]>
</one:T>
</one:OE>
</one:Title>
<one:Image format=\"jpg\" originalPageNumber=\"0\" lastModifiedTime=\"2013-06-10T18:39:46.000Z\" objectID=\"{1A32E30F-091E-4F03-8147-D00D0D16C6FD}{20}{B0}\">
<one:Position x=\"90.0\" y=\"104.400001525879\" z=\"3\"/>
<one:Size width=\"767.9999389648437\" height=\"576.0\"/>
<one:Data>/9j/4AAQSkZJRgABAQAAAQABAAD//gA7Q1JFQVRPUjogZ2QtanBlZyB2MS4wICh1c2luZyBJ (SNIP)
The embedded PDF
<![CDATA[4\r\n‘4]]>
</one:OCRText>
<one:OCRToken startPos=\"0\" region=\"0\" line=\"0\" x=\"564.631591796875\" y=\"250.1052703857422\" width=\"6.063148498535156\" height=\"5.30526351928711\"/>
<one:OCRToken startPos=\"3\" region=\"1\" line=\"1\" x=\"684.3789672851562\" y=\"462.3157653808594\" width=\"5.305229187011718\" height=\"6.821067810058594\"/>
</one:OCRData>
</one:Image>
<one:InsertedFile pathCache=\"C:\\TEST\\D62228.pdf\" pathSource=\"C:\\C++_Neural_Networks_And_Fuzzy_Logic.pdf\" preferredName=\"C++_Neural_Networks_And_Fuzzy_Logic.pdf\" lastModifiedTime=\"2013-06-10T18:39:43.000Z\" objectID=\"{23A17F23-F743-0C9B-082A-BC6BD5D9CA6E}{13}{B0}\">
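Because the IDs can change between sessions, they could be read from the page XML instead of being hard-coded. The following is a rough sketch, assuming the XML returned by GetPageContent is well formed and that embedded files appear as InsertedFile elements with the attributes shown in the snippet above. InsertedFileInfo and PageXmlHelper are hypothetical names, and elements are matched by local name so the sketch does not depend on the exact schema namespace URI.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical holder for the attributes of one InsertedFile element.
public class InsertedFileInfo
{
    public string ObjectId { get; set; }
    public string PreferredName { get; set; }
    public string PathCache { get; set; }
}

// Hypothetical helper, not part of the OneNote interop assembly.
public static class PageXmlHelper
{
    // Returns one entry per InsertedFile element found in the page XML.
    // A missing attribute yields null for the corresponding property.
    public static List<InsertedFileInfo> FindInsertedFiles(string pageXml)
    {
        XDocument doc = XDocument.Parse(pageXml);
        return doc.Descendants()
            .Where(e => e.Name.LocalName == "InsertedFile")
            .Select(e => new InsertedFileInfo
            {
                ObjectId = (string)e.Attribute("objectID"),
                PreferredName = (string)e.Attribute("preferredName"),
                PathCache = (string)e.Attribute("pathCache")
            })
            .ToList();
    }
}
```

Calling PageXmlHelper.FindInsertedFiles(strXML2) would give the object IDs to pass to GetHyperlinkToObject and GetBinaryPageContent in place of the hard-coded values.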
Thank you.