0
votes

My goal is to draw a rectangle over the text I searched for.

I already implemented a LocationTextExtractionStrategy class that joins text chunks into sentences (one per line) and returns the starting location, X and Y, of each.

I based it on the solution from "Getting Coordinates of string using ITextExtractionStrategy and LocationTextExtractionStrategy in Itextsharp", and here is what I have so far (below is the code for organizing chunks):

  public override void RenderText(TextRenderInfo renderInfo)
    {
        LineSegment segment = renderInfo.GetBaseline();
        if (renderInfo.GetRise() != 0)
        { // remove the rise from the baseline - we do this because the text from a super/subscript render operations should probably be considered as part of the baseline of the text the super/sub is relative to 
            Matrix riseOffsetTransform = new Matrix(0, -renderInfo.GetRise());
            segment = segment.TransformBy(riseOffsetTransform);
        }
        TextChunk tc = new TextChunk(renderInfo.GetText(), tclStrat.CreateLocation(renderInfo, segment));
        locationalResult.Add(tc);
    }

  public IList<TextLocation> GetLocations()
    {

        var filteredTextChunks = filterTextChunks(locationalResult, null);
        filteredTextChunks.Sort();

        TextChunk lastChunk = null;

        var textLocations = new List<TextLocation>();

        foreach (var chunk in filteredTextChunks)
        {

            if (lastChunk == null)
            {
                //initial
                textLocations.Add(new TextLocation
                {
                    Text = chunk.Text,
                    X = chunk.Location.StartLocation[0],
                    Y = chunk.Location.StartLocation[1]
                });

            }
            else
            {
                if (chunk.SameLine(lastChunk))
                {
                    var text = "";
                    // we only insert a blank space if the trailing character of the previous string wasn't a space, and the leading character of the current string isn't a space
                    if (IsChunkAtWordBoundary(chunk, lastChunk) && !StartsWithSpace(chunk.Text) && !EndsWithSpace(lastChunk.Text))
                        text += ' ';

                    text += chunk.Text;

                    textLocations[textLocations.Count - 1].Text += text;

                }
                else
                {

                    textLocations.Add(new TextLocation
                    {
                        Text = chunk.Text,

                        X = chunk.Location.StartLocation[0],
                        Y = chunk.Location.StartLocation[1]
                    });
                }
            }
            lastChunk = chunk;
        }

        //now find the location(s) with the given texts
        return textLocations;

    }

When I try to draw a rectangle at the coordinates of the text, it isn't even close to it. I'm drawing the rectangle like this:

PdfContentByte content = pdfStamper.GetOverContent(pageNumber);
iTextSharp.text.Rectangle rectangle = new iTextSharp.text.Rectangle(leftLowerX, leftLowerY, upperRightX, upperRightY);//pdfReader.GetPageSizeWithRotation(x);
rectangle.BackgroundColor = color;
content.Rectangle(rectangle);
Please share an example PDF you experience the issue with. - mkl
PDF Example. Let's look at page 21. - Bartosz Olchowik
Please set pdfStamper.RotateContents = false after instantiating the stamper. Your sample PDF has rotated pages. In this case iText tries to help you by using a different coordinate system when drawing. As the text extraction coordinate system remains unchanged, though, using extracted coordinates to draw something fails for rotated pages. The above setting disables this behavior. - mkl
Your knowledge is awesome, it works. Thank you for the simple and good solution! - Bartosz Olchowik
I'll make that an actual answer you can accept. - mkl

2 Answers

1
votes

If you were to use iText7 and pdfSweep, there is a function that does exactly this.

RegexBasedCleanupStrategy st = new RegexBasedCleanupStrategy("the_word_to_highlight");

PdfAutoSweep sweep = new PdfAutoSweep(st);

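// A PdfWriter is needed so the highlighted result is saved; "outputfile" is a placeholder path.
PdfDocument pdfDocument = new PdfDocument(new PdfReader(inputfile), new PdfWriter(outputfile));
sweep.highlight(pdfDocument);
pdfDocument.close();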

That will highlight the words you're looking for. Of course you can do much more, with some minor configuration.

0
votes

Please set

pdfStamper.RotateContents = false;

after instantiating the stamper.

Your sample PDF has rotated pages. In this case iText 5.x by default tries to assist you by interpreting coordinates you give in drawing instructions in a different, rotated coordinate system. As the text extraction coordinate system remains unchanged, though, using extracted coordinates to draw something fails for rotated pages. The above setting disables this assistance.
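
For illustration, here is a minimal sketch of how the setting could fit into the drawing code from the question, assuming iTextSharp 5.x. The class name HighlightSketch, the method DrawBox, and all of its parameters are hypothetical names chosen for this example; the coordinates are assumed to come from your text extraction strategy.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class HighlightSketch
{
    // Hypothetical helper: x, y, width and height are assumed to be
    // coordinates obtained from text extraction (unrotated page space).
    public static void DrawBox(string inputPath, string outputPath, int pageNumber,
                               float x, float y, float width, float height)
    {
        PdfReader reader = new PdfReader(inputPath);
        using (FileStream output = new FileStream(outputPath, FileMode.Create))
        {
            PdfStamper stamper = new PdfStamper(reader, output);

            // Disable the coordinate system adjustment for rotated pages,
            // so drawing uses the same coordinates as text extraction.
            stamper.RotateContents = false;

            PdfContentByte content = stamper.GetOverContent(pageNumber);
            content.SaveState();
            content.SetColorStroke(BaseColor.RED);
            content.Rectangle(x, y, width, height); // outline only
            content.Stroke();
            content.RestoreState();

            stamper.Close();
        }
        reader.Close();
    }
}
```

This sketch only strokes an outline; the filled rectangle from the question (using BackgroundColor) should work the same way once RotateContents is set to false.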