1
votes

This question is already present but doesn't provide the answer using PDFsharp but iTextPDF.

Now coming back to question, I know a way to read and extract the String. But I'm having trouble REPLACING the text.

My Code:

        var content = ContentReader.ReadContent(page);      
        var text = content.ExtractText();
        text = text.Replace("Replace This", "With This");
        XFont font = new XFont("Times New Roman", 11, XFontStyle.BoldItalic);

        gfx.DrawString(text, font, XBrushes.Black, new XRect(0, 0, page.Width, page.Height), XStringFormats.Left);

        // Save the document...
        const string filename = "New Doc.pdf";
        document.Save(filename);
    }   

    public static IEnumerable<string> ExtractText(this CObject cObject)
    {   
        if (cObject is COperator)
        {
            var cOperator = cObject as COperator;
            if (cOperator.OpCode.Name== OpCodeName.Tj.ToString() ||
                cOperator.OpCode.Name == OpCodeName.TJ.ToString())
            {
                foreach (var cOperand in cOperator.Operands)
                    foreach (var txt in ExtractText(cOperand))
                        yield return txt;   
            }
        }
        else if (cObject is CSequence)
        {
            var cSequence = cObject as CSequence;
            foreach (var element in cSequence)
                foreach (var txt in ExtractText(element))
                    yield return txt;
        }
        else if (cObject is CString)
        {
            var cString = cObject as CString;
            yield return cString.Value;
        }
    }

This is a sample code and this one would ignore the graphics and images. And end up writing only text in the output file. Is there way I can replace the text without touching Graphics and Images in the content?

1

1 Answers

2
votes

The sample seems to be a wrong approach: it returns text only, but ignores graphics, images, and even text positions and text attributes.

You can try to locate the text instructions (TJ, Tj) in the content and replace them with new instructions (also TJ or Tj) without touching anything else in the stream. Such a simple approach would lead to overlapping text or large gaps if the new text has a different lengths.

PDFsharp was not designed to parse the content streams. You have to write your own code to extract text, you have to write your own code to modify text (or use a third-party library that was built on PDFsharp).

To answer your question: yes, there is a way (as outlined above), but you will have to write a whole lot of code to achieve this (or find suitable code written by a third party).