I am using Apache POI to generate HTML from MS Word files (.doc). I want to add the images from the .doc file to the HTML, but I am unable to do that. I found a solution for .docx; the method for .docx is as follows.

private void processImage(Element wrap, List<XWPFPicture> pics)
        throws IOException {

    // Images are written to a folder named after the output file, minus its extension.
    int pos = output.lastIndexOf('.');
    File folder = new File(output.substring(0, pos));
    if (!folder.exists()) {
        folder.mkdirs();
    }

    for (XWPFPicture pic : pics) {
        XWPFPictureData data = pic.getPictureData();
        try {
            // Decode the image only to learn its width; ImageIO.read returns
            // null for formats it cannot decode.
            BufferedImage image = ImageIO.read(new ByteArrayInputStream(data.getData()));

            // TODO image type convert like .tif etc.
            File imgFile = new File(folder, data.getFileName());

            // Extract the picture to disk.
            try (FileOutputStream fos = new FileOutputStream(imgFile)) {
                fos.write(data.getData());
            }

            // Add the picture to the HTML page.
            // TODO get img relative path for showing html page when on server
            //      && get the picture style, scaling in the docx file (with description style?)
            String imgSrc = "./" + folder.getName() + "/" + imgFile.getName();
            Element img = htmlDocumentFacade.createImage(imgSrc);
            if (!StringUtil.isEmpty(pic.getDescription())) {
                img.setAttribute("title", pic.getDescription());
            }
            if (image != null && image.getWidth() > 600) {
                img.setAttribute("width", "600px");
            }
            img.setAttribute("align", "center");
            wrap.appendChild(img);
        } catch (Exception ex) {
            // Skip pictures that cannot be read or written, but report why.
            ex.printStackTrace();
        }
    }
}

There is not much documentation and there are few tutorials available for this, and the Javadoc does not contain much helpful information either. Based on the above code I tried to add images from a .doc file, but it is not working. :/

1 Answer

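According to the Apache POI documentation, HWPF is for ".doc" files, whereas XWPF is for ".docx" files. The XWPF classes in your method (XWPFPicture, XWPFPictureData) therefore cannot read a .doc file.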
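If you need to decide at runtime which of the two APIs to use, one option is to look at the first bytes of the file rather than trusting the extension. The following is only a minimal sketch using the plain JDK: the class and method names (WordFormatSniffer, detect, readHeader) are made up for illustration and are not part of POI. It relies on the fact that binary .doc files are OLE2 compound files and .docx files are ZIP packages. Note that other formats share these containers (.xls is also OLE2, any ZIP archive starts the same way), so this only tells the two container types apart.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not a POI class: picks HWPF or XWPF from the file header.
class WordFormatSniffer {

    enum WordFormat { DOC_HWPF, DOCX_XWPF, UNKNOWN }

    // OLE2 compound file signature (binary .doc, handled by HWPF).
    private static final int[] OLE2 = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header signature (.docx package, handled by XWPF).
    private static final int[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies a file from its leading bytes. */
    static WordFormat detect(byte[] header) {
        if (startsWith(header, OLE2)) {
            return WordFormat.DOC_HWPF;
        }
        if (startsWith(header, ZIP)) {
            return WordFormat.DOCX_XWPF;
        }
        return WordFormat.UNKNOWN;
    }

    /** Reads up to 8 bytes from the stream; returns fewer if the stream is shorter. */
    static byte[] readHeader(InputStream in) throws IOException {
        byte[] buf = new byte[8];
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        byte[] header = new byte[total];
        System.arraycopy(buf, 0, header, 0, total);
        return header;
    }

    private static boolean startsWith(byte[] data, int[] signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

You would call `WordFormatSniffer.detect(WordFormatSniffer.readHeader(stream))` on a freshly opened stream and then reopen the file with the matching POI class.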

According to the Javadoc, the getPicturesTable() method of HWPFDocument should help you extract the required images.
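As a rough, untested sketch of how that could look, assuming the POI HWPF classes (poi-scratchpad) are on the classpath: the class DocImageExtractor and its methods are names invented here for illustration, and the "./folder/file" form of the src value simply mirrors what your .docx method produces. Only HWPFDocument, getPicturesTable(), getAllPictures(), and the Picture methods come from POI.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Picture;

// Hypothetical helper: saves the pictures of a .doc file and returns img src values.
class DocImageExtractor {

    static List<String> extractImages(File docFile, File imageDir) throws IOException {
        if (!imageDir.isDirectory() && !imageDir.mkdirs()) {
            throw new IOException("Cannot create folder " + imageDir);
        }
        List<String> sources = new ArrayList<>();
        try (InputStream in = new FileInputStream(docFile)) {
            HWPFDocument doc = new HWPFDocument(in);
            for (Picture pic : doc.getPicturesTable().getAllPictures()) {
                // POI suggests a file name with a matching extension (jpg, png, ...).
                String name = pic.suggestFullFileName();
                try (OutputStream out = new FileOutputStream(new File(imageDir, name))) {
                    pic.writeImageContent(out);
                }
                sources.add(imageSrc(imageDir.getName(), name));
            }
        }
        return sources;
    }

    // Builds a relative src such as "./report/1a2b.png" (assumed layout, as in the question).
    static String imageSrc(String folderName, String fileName) {
        return "./" + folderName + "/" + fileName;
    }
}
```

Each returned string can then be passed to htmlDocumentFacade.createImage(...) in the same way as in your .docx method. This extracts every picture in the document in one pass; it does not tell you where in the text each picture belongs.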

Hope this helps.