27
votes

Following on from my last question here

OpenXML looks like it probably does exactly what I want, but the documentation is terrible. An hour of googling hasn't got me any closer to figuring out what I need to do.

I have a word document. I want to add an image to that word document (using word) in such a way that I can then open the document in OpenXML and replace that image. Should be simple enough, yes?

I'm assuming I should be able to give my image 'placeholder' an id of some sort and then use GetPartById to locate the image and replace it. Would this be the correct method? What is this Id? How do you add it using Word?

Every example I can find which does anything remotely similar starts by building the whole word document from scratch in ML, which really isn't a lot of use.

EDIT: it occurred to me that it would be easier to just replace the image in the media folder with the new image, but again I can't find any indication of how to do this.

9 Answers

37
votes

Although the documentation for OpenXML isn't great, there is an excellent tool that you can use to see how existing Word documents are built. If you install the OpenXml SDK it comes with the DocumentReflector.exe tool under the Open XML Format SDK\V2.0\tools directory.

Images in Word documents consist of the image data and an ID that is assigned to it that is referenced in the body of the document. It seems like your problem can be broken down into two parts: finding the ID of the image in the document, and then re-writing the image data for it.

To find the ID of the image, you'll need to parse the MainDocumentPart. Images are stored in Runs as a Drawing element:

<w:p>
  <w:r>
    <w:drawing>
      <wp:inline>
        <wp:extent cx="3200400" cy="704850" /> <!-- describes the size of the image -->
        <wp:docPr id="2" name="Picture 1" descr="filename.JPG" />
        <a:graphic>
          <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
            <pic:pic>
              <pic:nvPicPr>
                <pic:cNvPr id="0" name="filename.JPG" />
                <pic:cNvPicPr />
              </pic:nvPicPr>
              <pic:blipFill>
                <a:blip r:embed="rId5" /> <!-- this is the ID you need to find -->
                <a:stretch>
                  <a:fillRect />
                </a:stretch>
              </pic:blipFill>
              <pic:spPr>
                <a:xfrm>
                  <a:ext cx="3200400" cy="704850" />
                </a:xfrm>
                <a:prstGeom prst="rect" />
              </pic:spPr>
            </pic:pic>
          </a:graphicData>
        </a:graphic>
      </wp:inline>
    </w:drawing>
  </w:r>
</w:p>

In the above example, you need to find the ID of the image stored in the blip element. How you go about finding that depends on your problem, but if you know the filename of the original image you can look at the descr attribute of the docPr element:

// namespaces used: System.Linq, DocumentFormat.OpenXml.Packaging,
// DocumentFormat.OpenXml.Drawing (Blip), DocumentFormat.OpenXml.Drawing.Wordprocessing (Inline)
using (WordprocessingDocument document = WordprocessingDocument.Open("docfilename.docx", true)) {

  // go through the document and pull out the inline image elements
  IEnumerable<Inline> imageElements = document.MainDocumentPart.Document.Descendants<Inline>();

  // select the image that has the correct filename (chooses the first if there are many)
  Inline selectedImage = imageElements.FirstOrDefault(image =>
      image.DocProperties != null &&
      image.DocProperties.Description != null &&
      image.DocProperties.Description.Value == "image filename");

  // get the ID from the blip inside the inline element (stays null if nothing matched)
  string imageId = null;
  Blip blipElement = selectedImage == null ? null : selectedImage.Descendants<Blip>().FirstOrDefault();
  if (blipElement != null && blipElement.Embed != null) {
      imageId = blipElement.Embed.Value;
  }
}

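Then when you have the image ID, you can use that to rewrite the image data while the document is still open (that is, inside the using block above). I think this is how you would do it:

ImagePart imagePart = (ImagePart)document.MainDocumentPart.GetPartById(imageId);
byte[] imageBytes = File.ReadAllBytes("new_image.jpg");
// FileMode.Create truncates the part first, so a smaller new image leaves no stale bytes behind
using (BinaryWriter writer = new BinaryWriter(imagePart.GetStream(FileMode.Create, FileAccess.Write))) {
    writer.Write(imageBytes);
}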
18
votes

I'd like to update this thread and add to Adam's answer above for the benefit of others.

I actually managed to hack some working code together the other day (before Adam posted his answer), but it was pretty difficult. The documentation really is poor and there isn't a lot of info out there.

I didn't know about the Inline and Run elements which Adam uses in his answer, but the trick seems to be in getting to the Descendants<T>() method, after which you can parse pretty much any element as you would ordinary XML.

byte[] docBytes = File.ReadAllBytes(_myFilePath);
using (MemoryStream ms = new MemoryStream())
{
    ms.Write(docBytes, 0, docBytes.Length);

    using (WordprocessingDocument wpdoc = WordprocessingDocument.Open(ms, true))
    {
        MainDocumentPart mainPart = wpdoc.MainDocumentPart;
        Document doc = mainPart.Document;

        // now you can use doc.Descendants<T>()
    }
}
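
One caveat with the in-memory approach above: edits made through wpdoc end up in the MemoryStream, not in the file on disk, so they still have to be written back. A minimal sketch of the full round trip, assuming the Open XML SDK is referenced; the DocxRoundTrip class and the EditInMemory name are made up for illustration:

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

static class DocxRoundTrip
{
    // Hypothetical helper: loads the .docx into memory, lets the caller edit it,
    // then writes the edited package back to the same path.
    public static void EditInMemory(string path, Action<WordprocessingDocument> edit)
    {
        byte[] docBytes = File.ReadAllBytes(path);

        // A parameterless MemoryStream is resizable, so the package can grow.
        using (var ms = new MemoryStream())
        {
            ms.Write(docBytes, 0, docBytes.Length);

            using (WordprocessingDocument wpdoc = WordprocessingDocument.Open(ms, true))
            {
                edit(wpdoc);
            } // disposing the document flushes its changes into ms

            File.WriteAllBytes(path, ms.ToArray());
        }
    }
}
```

Usage would be something like DocxRoundTrip.EditInMemory(_myFilePath, d => { /* swap the image here */ });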

Once you've got this it's fairly easy to search for things, although you have to work out what everything is called. For example, the <pic:nvPicPr> is Picture.NonVisualPictureProperties, etc.

As Adam correctly says, the element you need to find to replace the image is the Blip element. But you need to find the correct blip which corresponds to the image you're trying to replace.

Adam shows a way using the Inline element. I just dived straight in and looked for all the picture elements. I'm not sure which is the better or more robust way (I don't know how consistent the XML structure is between documents, or whether differences could break this code).

Blip GetBlipForPicture(string picName, Document document)
{
    return document.Descendants<Picture>()
         .Where(p => picName == p.NonVisualPictureProperties.NonVisualDrawingProperties.Name)
         .Select(p => p.BlipFill.Blip)
         .Single(); // return First or ToList or whatever here, there can be more than one
}

See Adam's XML example to make sense of the different elements here and see what I'm searching for.

The blip has an ID in its Embed property, e.g. <a:blip r:embed="rId4" cstate="print" />. This maps the Blip to an image in the media folder (you can see all these folders and files if you rename your .docx to a .zip and unzip it). You can find the mapping in _rels\document.xml.rels (inside the word folder):

<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png" />
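
If you want to see that mapping from code without unzipping by hand, something like the following should do it. This is only a sketch using the plain .NET zip and XML classes rather than the SDK; RelsReader is an invented name, and it assumes the main document's relationships live at word/_rels/document.xml.rels, which is where Word normally puts them:

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

static class RelsReader
{
    static readonly XNamespace Rel =
        "http://schemas.openxmlformats.org/package/2006/relationships";
    const string ImageType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image";

    // Returns relationship Id -> Target (e.g. "rId4" -> "media/image1.png")
    // for every image relationship of the main document.
    public static Dictionary<string, string> ReadImageTargets(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumed location; a package is allowed to put the main part elsewhere.
            ZipArchiveEntry entry = zip.GetEntry("word/_rels/document.xml.rels");
            if (entry == null)
            {
                return new Dictionary<string, string>();
            }

            using (Stream s = entry.Open())
            {
                return XDocument.Load(s).Root
                    .Elements(Rel + "Relationship")
                    .Where(r => (string)r.Attribute("Type") == ImageType)
                    .ToDictionary(r => (string)r.Attribute("Id"),
                                  r => (string)r.Attribute("Target"));
            }
        }
    }
}
```

With the SDK itself, mainPart.GetPartById("rId4").Uri gives you the same target.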

So what you need to do is add a new image, and then point this blip at the id of your newly created image:

// add new ImagePart
ImagePart newImg = mainPart.AddImagePart(ImagePartType.Png);
// Put image data into the ImagePart (from a filestream, closed once the data is copied)
using (FileStream imgStream = File.Open(_myImgPath, FileMode.Open, FileAccess.Read))
{
    newImg.FeedData(imgStream);
}
// Get the blip
Blip blip = GetBlipForPicture("MyPlaceholder.png", doc);
// Point blip at new image
blip.Embed = mainPart.GetIdOfPart(newImg);

I presume this just orphans the old image in the Media folder which isn't ideal, although maybe it's clever enough to garbage collect it so to speak. There may be a better way to do it, but I couldn't find it.
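
On the orphaned image: as far as I can tell the SDK does not clean up an image part just because no blip points at it any more, so the old part has to be deleted explicitly. A rough sketch, assuming the old image is only ever referenced through r:embed in the main document body; ImageSwap and Repoint are names invented here:

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Drawing;
using DocumentFormat.OpenXml.Packaging;

static class ImageSwap
{
    // Points the blip at newPart, then deletes the previously referenced part
    // unless some other blip in the main document still uses it.
    public static void Repoint(MainDocumentPart mainPart, Blip blip, ImagePart newPart)
    {
        string oldId = blip.Embed != null ? blip.Embed.Value : null;
        blip.Embed = mainPart.GetIdOfPart(newPart);

        if (oldId == null)
        {
            return;
        }

        // Assumption: only r:embed references matter (linked images are ignored).
        bool stillUsed = mainPart.Document.Descendants<Blip>()
            .Any(b => b.Embed != null && b.Embed.Value == oldId);

        if (!stillUsed)
        {
            mainPart.DeletePart(oldId);
        }
    }
}
```

In the snippet above you would call ImageSwap.Repoint(mainPart, blip, newImg) in place of the last line.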

Anyway, there you have it. This thread is now the most complete documentation on how to swap an image anywhere on the web (I know this, I spent hours searching). So hopefully some people will find it useful.

9
votes

I had the same fun trying to work out how to do this until I saw this thread. Excellent helpful answers guys.

A simple way to select the ImagePart, if you know the name of the image in the package, is to check the Uri:


ImagePart GetImagePart(WordprocessingDocument document, string imageName)
{
    return document.MainDocumentPart.ImageParts
        .Where(p => p.Uri.ToString().Contains(imageName)) // or EndsWith
        .First();
}

You can then do


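var imagePart = GetImagePart(document, imageName);
var newImageBytes = GetNewImageBytes(); // however the image is generated or obtained

// FileMode.Create truncates the part first, so a smaller new image leaves no stale bytes behind
using(var writer = new BinaryWriter(imagePart.GetStream(FileMode.Create, FileAccess.Write)))
{
    writer.Write(newImageBytes);
}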
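
As for the question's EDIT (swapping the file in the media folder directly): a .docx is a zip package, so this can also be done without the SDK at all. A bare-bones sketch using System.IO.Compression; MediaReplacer is an invented name and word/media/image1.png is only an example entry. The new image should be the same format as the one it replaces, because the part's content type stays as it was:

```csharp
using System.IO;
using System.IO.Compression; // on .NET Framework, ZipFile needs a reference to System.IO.Compression.FileSystem

static class MediaReplacer
{
    // Replaces one entry of the package (e.g. "word/media/image1.png") with newBytes.
    public static void ReplaceMediaEntry(string docxPath, string entryName, byte[] newBytes)
    {
        using (ZipArchive zip = ZipFile.Open(docxPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry old = zip.GetEntry(entryName);
            if (old == null)
            {
                throw new FileNotFoundException("No such entry in the package", entryName);
            }

            // Delete and recreate so nothing of the old image survives.
            old.Delete();
            using (Stream s = zip.CreateEntry(entryName).Open())
            {
                s.Write(newBytes, 0, newBytes.Length);
            }
        }
    }
}
```

For example: MediaReplacer.ReplaceMediaEntry("report.docx", "word/media/image1.png", File.ReadAllBytes("new.png")); where both file names are placeholders.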

4
votes

The following code will retrieve the images from the specified document (filename) and save them to a D:\TestArea folder using the internal filenames. The answers on this page helped me come up with my solution.

Note: this solution does not help anyone replace an image in a Word document. However, in all of my searching for how to retrieve an image from a Word document, this was the closest link I could find, so I'm posting my solution here in case someone else is in the same boat.

private void ProcessImages(string filename)
{
    var xpic = "";
    var xr = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    using (WordprocessingDocument document = WordprocessingDocument.Open(filename, true)) 
    {
        var imageParts = 
            from paragraph in document.MainDocumentPart.Document.Body
                from graphic in paragraph.Descendants<Graphic>()
                    let graphicData = graphic.Descendants<GraphicData>().FirstOrDefault()
                        let pic = graphicData.ElementAt(0)
                            let nvPicPrt = pic.ElementAt(0).FirstOrDefault()
                            let blip = pic.Descendants<Blip>().FirstOrDefault()
                            select new 
                            {
                                Id = blip.GetAttribute("embed",xr).Value,
                                Filename = nvPicPrt.GetAttribute("name",xpic).Value
                            };

        foreach(var image in imageParts)
        {
            var outputFilename = string.Format(@"d:\TestArea\{0}",image.Filename);
            Debug.WriteLine(string.Format("Creating file: {0}",outputFilename));

            // Get image from document
            var imageData = document.MainDocumentPart.GetPartById(image.Id);

            // Copy the image data to disk; FileMode.Create truncates any existing file
            using (var stream = imageData.GetStream())
            using (var fileStream = new FileStream(outputFilename, FileMode.Create))
            {
                stream.CopyTo(fileStream);
            }
        }
    }
}
4
votes

I love this thread, because there is so much bad documentation on this subject. After many hours of trying to make the above answers work, I came up with my own solution.

How I give the image a tag name:

[Screenshot not preserved: naming the image in Word]

First I select the image I want to replace in Word and give it a name (for instance "toReplace"). Afterwards I loop through the Drawings, select the image with the correct tagName, and write my own image in its place.

private void ReplaceImage(string tagName, string imagePath)
{
    this.wordDoc = WordprocessingDocument.Open(this.stream, true);
    IEnumerable<Drawing> drawings = this.wordDoc.MainDocumentPart.Document.Descendants<Drawing>().ToList();
    foreach (Drawing drawing in drawings)
    {
        DocProperties dpr = drawing.Descendants<DocProperties>().FirstOrDefault();
        if (dpr != null && dpr.Name == tagName)
        {
            foreach (DocumentFormat.OpenXml.Drawing.Blip b in drawing.Descendants<DocumentFormat.OpenXml.Drawing.Blip>().ToList())
            {
                OpenXmlPart imagePart = wordDoc.MainDocumentPart.GetPartById(b.Embed);
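                // FileMode.Create truncates the part so a smaller image leaves no stale bytes
                using (var writer = new BinaryWriter(imagePart.GetStream(FileMode.Create, FileAccess.Write)))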
                {
                    writer.Write(File.ReadAllBytes(imagePath));
                }
            }
        }
    }
}
1
votes

In order to get the images and copy them to a folder, you can use a simpler method:

        System.Collections.Generic.IEnumerable<ImagePart> imageParts = doc.MainDocumentPart.ImageParts;

        foreach (ImagePart img in imageParts)
        {
          var uri = img.Uri;
          var fileName = uri.ToString().Split('/').Last();
          string imgPath = Path.Combine(mediaPath, fileName); // mediaPath is the target folder

          // using blocks close both streams even if the copy fails
          using (Stream fileWordMedia = img.GetStream(FileMode.Open))
          using (FileStream fileHtmlMedia = new FileStream(imgPath, FileMode.Create))
          {
              fileWordMedia.CopyTo(fileHtmlMedia);
          }
        }
1
votes

The Open XML documentation is very thin, and working things out takes far too much time. I was doing a specific task and want to share the solution; I hope it helps people and saves them time. I had to get a picture from a particular place in the text, specifically when it sits inside a Run object.

 static string RunToHTML(Run r)
       {
            string exit = "";
            OpenXmlElementList list = r.ChildElements;
            foreach (OpenXmlElement element in list)
            {
                if (element is DocumentFormat.OpenXml.Wordprocessing.Picture)
                {
                    exit += AddPictureToHtml((DocumentFormat.OpenXml.Wordprocessing.Picture)element);
                    return exit;
                }
            }
            // handling of other run content was not shown in the original post
            return exit;
       }

More specifically, I needed to convert a paragraph of the document to HTML.

 static string AddPictureToHtml(DocumentFormat.OpenXml.Wordprocessing.Picture pic)
        {
            string exit = "";
            DocumentFormat.OpenXml.Vml.Shape shape = pic.Descendants<DocumentFormat.OpenXml.Vml.Shape>().First();
            DocumentFormat.OpenXml.Vml.ImageData imageData = shape.Descendants<DocumentFormat.OpenXml.Vml.ImageData>().First();                 
            //style image
            string style = shape.Style;
            style = style.Replace("width:", "");
            style = style.Replace("height:", "");
            style = style.Replace('.', ',');
            style = style.Replace("pt", "");
            string[] arr = style.Split(';');
            float styleW = float.Parse(arr[0]);//width picture
            float styleH = float.Parse(arr[1]);//height picture
            string relationId = imageData.RelationshipId;
            var img = doc.MainDocumentPart.GetPartById(relationId);
            var uri = img.Uri;//path in file
            var fileName = uri.ToString().Split('/').Last();//name picture
            // the original opened the part stream here without using or closing it; "heigth" is corrected to "height"
            exit = String.Format("<img src=\"{0}{1}\" width=\"{2}\" height=\"{3}\" > ", docPath, uri, styleW, styleH);
            return exit;
        }

uri is the path to the picture inside the .docx file, for example "test.docx/media/image.bmp". Using this information you can extract the picture:

static void SavePictures(ImagePart img, string savePath)
        {
                var uri = img.Uri;
                var fileName = uri.ToString().Split('/').Last();
                string imgPath = Path.Combine(savePath, fileName);

                // using blocks close both streams even if the copy fails
                using (Stream fileWordMedia = img.GetStream(FileMode.Open))
                using (FileStream fileHtmlMedia = new FileStream(imgPath, FileMode.Create))
                {
                    fileWordMedia.CopyTo(fileHtmlMedia);
                }
        }
1
votes

@Ludisposed's excellent answer worked perfectly for me, but it took me a bit of digging to work out how to actually set the image name in Word in the first place. For anyone else who doesn't speak German, this is how to do it:

In MS Word, click on the image, then in the Home ribbon choose Select -> Selection Pane to show the list of images in the right-hand navigation:

[Screenshot: MS Word Selection Pane]

You can then click on an image's name/tag in the Selection Pane to change its name:

[Screenshot: changing an image name in the Selection Pane in MS Word]

Once you've done that you can then see how that text was incorporated into the Open XML file by using the Open XML SDK 2.5 Productivity Tool:

[Screenshot not preserved: the image name as shown in the Open XML SDK 2.5 Productivity Tool]

Having done that I extended @Ludisposed's solution slightly into a reusable method, and tweaked the code so that passing in a null byte array would trigger the removal of the image from the document:

/// <summary>
/// Replaces the image in a document with the new file bytes, or removes the image if the newImageBytes parameter is null.
/// Relies on the image having had its name set via the 'Selection Pane' in Word
/// </summary>
/// <param name="document">The OpenXML document</param>
/// <param name="oldImagesPlaceholderText">The placeholder name for the image set via Selection in Word</param>
/// <param name="newImageBytes">The new file. Pass null to remove the selected image from the document instead</param>
public void ReplaceInternalImage(WordprocessingDocument document, string oldImagesPlaceholderText, byte[] newImageBytes)
{
    var imagesToRemove = new List<Drawing>();

    IEnumerable<Drawing> drawings = document.MainDocumentPart.Document.Descendants<Drawing>().ToList();
    foreach (Drawing drawing in drawings)
    {
        DocProperties dpr = drawing.Descendants<DocProperties>().FirstOrDefault();
        if (dpr != null && dpr.Name == oldImagesPlaceholderText)
        {
            foreach (Blip b in drawing.Descendants<Blip>().ToList())
            {
                OpenXmlPart imagePart = document.MainDocumentPart.GetPartById(b.Embed);

                if (newImageBytes == null)
                {
                    imagesToRemove.Add(drawing);
                }
                else
                {
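                    // FileMode.Create truncates the part so a smaller image leaves no stale bytes
                    using (var writer = new BinaryWriter(imagePart.GetStream(FileMode.Create, FileAccess.Write)))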
                    {
                        writer.Write(newImageBytes);
                    }
                }
            }
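        }
    }

    // remove after the search loop has finished, and only once per drawing
    foreach (var image in imagesToRemove)
    {
        if (image.Parent != null)
        {
            image.Remove();
        }
    }
}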
0
votes

Okay, thank you to everyone who helped me out on this. My goal was simpler than replacing an image, namely to pull out all images in a Word document. I found this code did the job for me, INCLUDING the needed extension.

Feel free to use:

var inlineImages = from paragraph in wordprocessingDocument.MainDocumentPart.Document.Body
  from graphic in paragraph.Descendants<DocumentFormat.OpenXml.Drawing.Graphic>()
  let graphicData = graphic.Descendants<DocumentFormat.OpenXml.Drawing.GraphicData>().FirstOrDefault()
  let pic = graphicData.ElementAt(0).Descendants<DocumentFormat.OpenXml.Drawing.Blip>().FirstOrDefault()
  let imgPID = pic.GetAttribute("embed", "http://schemas.openxmlformats.org/officeDocument/2006/relationships").Value
  select new { Id = imgPID,
               Extension = ((ImagePart)wordprocessingDocument.MainDocumentPart.GetPartById(imgPID)).ContentType.Split('/')[1]
};
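
To actually turn an Id and Extension pair into files on disk, something along these lines should work. It is only a sketch: for brevity it walks MainDocumentPart.ImageParts instead of the query above, and ImageExporter and SaveAll are names I made up:

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

static class ImageExporter
{
    // Writes every image part of the main document to folder as
    // <relationship id>.<extension> and returns how many files were written.
    public static int SaveAll(WordprocessingDocument doc, string folder)
    {
        int count = 0;
        MainDocumentPart mainPart = doc.MainDocumentPart;

        foreach (ImagePart part in mainPart.ImageParts)
        {
            string id = mainPart.GetIdOfPart(part);
            string extension = part.ContentType.Split('/')[1]; // "image/png" -> "png"
            string target = Path.Combine(folder, id + "." + extension);

            using (Stream src = part.GetStream())
            using (FileStream dst = File.Create(target))
            {
                src.CopyTo(dst);
            }
            count++;
        }
        return count;
    }
}
```

Note that the extension comes straight from the content type, so some formats give names like "x-emf" rather than a usual file extension.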