I am studying marked content in PDF.

I came across a PDF file that has marked content, but a few objects inside the marked content are hidden. A single BDC-EMC block contains both visible and hidden objects. I don't see an OCGs array in the document. How does this work, and how can I tell which objects (graphics/text) are visible and which are hidden?

I do not see an option to attach a PDF file here, so I am sharing the content stream. Only one BT-ET block, in "/PlacedPDF /MC0 BDC", is visible; all the others are hidden.

Any help is highly appreciated. Thanks!! Chetan

PDF Content Stream

/Span <</Lang (en)/MCID 1597 >>BDC 
    /Span <</ActualText (þÿ    )>>BDC 
    EMC 
EMC 
/Span <</Lang (en)/MCID 1598 >>BDC 
EMC 
/Span <</Lang (en)/MCID 1599 >>BDC 
    /Span <</ActualText (þÿ    )>>BDC 
   EMC 
EMC

q
    /Perceptual ri
    /GS0 gs
    /T1_1 1 Tf
   /Fm0 Do
Q
/Figure <</MCID 1602 >>BDC 
/PlacedPDF /MC0 BDC 

<------------------------------- START -------------------------------->

BT
0 0 0 1 k
/Perceptual ri
/GS0 gs
/T1_0 1 Tf
6.7092 0 0 6.7092 91.8006 408.647 Tm
[(St)-20(andard)]TJ
ET

<------------------------------- END -------------------------------->


q
67.107 261.154 77 188.188 re
W n
BT
-0.12 Tw 6.7092 0 0 6.7092 332.5724 347.7748 Tm
[(Mec)50(hanical T)115(ee)]TJ
0 Tw 17.697 9.073 Td
[(A)40(WW)40(A Ductile Iron Pipe)]TJ
-34.941 -1.057 Td
[(R)20(educing)]TJ
-1.399 -20.545 Td
(Outlet Coupling)Tj
ET
Q
q
67.107 261.154 77 188.188 re
W n
BT
6.7092 0 0 6.7092 339.3285 306.0237 Tm
[(Saddle-L)20(et)]TJ
-18.251 6.751 Td
[(R)20(educing)]TJ
-4.096 -1.2 Td
(\(2" x 1\275", 2\275" x 2", 3" x 2\275"\))Tj
-0.025 Tw 20.744 8.715 Td
[(Flange A)20(dapter)]TJ
0 Tw 2.279 -20.578 Td
[(W)-20(ildcat)]TJ
19.004 0 Td
(HDPE Pipe)Tj
ET
Q
q
67.107 261.154 77 188.188 re
W n
BT
-0.025 Tw 6.7092 0 0 6.7092 467.048 359.0001 Tm
[(IPS )-25(to A)40(WW)40(A)]TJ
ET
EMC 
EMC 
/Figure <</MCID 1603 >>BDC 
/PlacedPDF /MC1 BDC 
Q
q
170.527 255.484 83.892 188.189 re
W n
BT
6.7092 0 0 6.7092 73.8793 402.9777 Tm
[(St)-20(andard)]TJ
0.205 -7.706 Td
(GapSeal)Tj
-0.12 Tw 35.682 -1.367 Td
[(Mec)50(hanical T)115(ee)]TJ
0 Tw 17.697 9.073 Td
[(A)40(WW)40(A Ductile Iron Pipe)]TJ
ET
Q
q
170.527 255.484 83.892 188.189 re
W n
BT
6.7092 0 0 6.7092 65.2513 303.5076 Tm
[(End P)20(rotection)]TJ
38.18 -0.47 Td
[(Saddle-L)20(et)]TJ
ET
Q
q
170.527 255.484 83.892 188.189 re
W n
BT
-0.025 Tw 6.7092 0 0 6.7092 310.6531 396.0676 Tm
[(Flange A)20(dapter)]TJ
0 Tw 2.279 -20.578 Td
(W)Tj
6.7092 0 0 6.7092 171.4775 337.5969 Tm
24.043 -11.863 Td
(ildcat)Tj
17.984 0 Td
(HDPE Pipe)Tj
-56.203 -0.017 Td
[(F)20(astFit)]TJ
4.1287 0 0 4.1287 96.3529 259.9574 Tm
(\256)Tj
-0.025 Tw 6.7092 0 0 6.7092 449.1268 353.3308 Tm
[(IPS )-25(to A)40(WW)40(A)]TJ
ET
EMC 
EMC 
/Figure <</MCID 1604 >>BDC 
/PlacedPDF /MC2 BDC 
Q
q
62.748 59.87 83.953 188.188 re
W n
BT
6.7092 0 0 6.7092 -157.3332 207.3635 Tm
[(St)-20(andard)]TJ
0.205 -7.706 Td
(GapSeal)Tj
ET
Q
q
62.748 59.87 83.953 188.188 re
W n
BT
6.7092 0 0 6.7092 202.1706 207.3635 Tm
[(A)40(WW)40(A Ductile Iron Pipe)]TJ
-34.941 -1.057 Td
[(R)20(educing)]TJ
-1.399 -20.545 Td
(Outlet Coupling)Tj
-18.53 6.776 Td
[(End P)20(rotection)]TJ
ET
Q
q
62.748 59.87 83.953 188.188 re
W n
BT
6.7092 0 0 6.7092 -32.2543 150.0337 Tm
[(R)20(educing)]TJ
-4.096 -1.2 Td
(\(2" x 1\275", 2\275" x 2", 3" x 2\275"\))Tj
ET
Q
q
62.748 59.87 83.953 188.188 re
W n
BT
6.7092 0 0 6.7092 222.231 62.3919 Tm
(HDPE Pipe)Tj
-56.203 -0.017 Td
[(F)20(astFit)]TJ
4.1287 0 0 4.1287 -134.8597 64.3432 Tm
(\256)Tj
-0.025 Tw 6.7092 0 0 6.7092 217.9142 157.7166 Tm
[(IPS )-25(to A)40(WW)40(A)]TJ
ET
EMC 
EMC 
/Figure <</MCID 1605 >>BDC 
/PlacedPDF /MC3 BDC 
Q
q
169.441 59.898 85.291 183.362 re
W n
BT
6.7092 0 0 6.7092 -181.845 207.3911 Tm
[(St)-20(andard)]TJ
0.205 -7.706 Td
(GapSeal)Tj
-0.12 Tw 35.682 -1.367 Td
[(Mec)50(hanical T)115(ee)]TJ
ET
Q
q
169.441 59.898 85.291 183.362 re
W n
BT
6.7092 0 0 6.7092 -56.7661 200.2995 Tm
[(R)20(educing)]TJ
-1.399 -20.545 Td
(Outlet Coupling)Tj
-18.53 6.776 Td
[(End P)20(rotection)]TJ
38.18 -0.47 Td
[(Saddle-L)20(et)]TJ
-18.251 6.751 Td
[(R)20(educing)]TJ
-4.096 -1.2 Td
(\(2" x 1\275", 2\275" x 2", 3" x 2\275"\))Tj
-0.025 Tw 20.744 8.715 Td
[(Flange A)20(dapter)]TJ
0 Tw 2.279 -20.578 Td
[(W)-20(ildcat)]TJ
ET
Q
q
169.441 59.898 85.291 183.362 re
W n
BT
6.7092 0 0 6.7092 -179.356 62.3054 Tm
[(F)20(astFit)]TJ
4.1287 0 0 4.1287 -159.3715 64.3708 Tm
(\256)Tj
ET
EMC 
EMC 
Q

"I do not see an option to attach a PDF file here, so I am sharing the content stream." - Stack Overflow itself does not provide a means for file sharing. Therefore, one usually uses a third-party file sharing service (Dropbox, Google Drive, ...), creates a public share there, and posts the URL of that share here. Please share the PDF that way. Furthermore, please describe more clearly which contents you think are unexpectedly hidden. - mkl

Thanks for your reply. Here is the PDF file that I am trying to parse. In this file only the word "Standard" is visible, whereas there is more text which is hidden. I am able to parse all the text, but I am not able to tell which text is visible and which is invisible. drive.google.com/file/d/1WP8y0GQBZujdKMNhE7MWSAFvGxJhy3VZ/… - user3500523

1 Answer

Each text object (except the first one, which draws "Standard") is preceded by a clip path definition, and its text is drawn outside that clip path. Thus, those text pieces are not visible.

For example:

q
67.107 261.154 77 188.188 re
W n
BT
-0.12 Tw 6.7092 0 0 6.7092 332.5724 347.7748 Tm
[(Mec)50(hanical T)115(ee)]TJ
0 Tw 17.697 9.073 Td
[(A)40(WW)40(A Ductile Iron Pipe)]TJ
-34.941 -1.057 Td
[(R)20(educing)]TJ
-1.399 -20.545 Td
(Outlet Coupling)Tj
ET
Q 

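At the beginning of this block the current clip path is reduced to a rectangle with its lower left corner at (67.107, 261.154) and a size of 77×188.188, so it spans x from 67.107 to 144.107 and y from 261.154 to 449.342. The Td operands are expressed in text space and are therefore scaled by the factor 6.7092 set by Tm. Assuming the current transformation matrix is the identity at this point, the text pieces thereafter are drawn rightwards with approximate baseline starts at

  • (333, 348)
  • (451, 409)
  • (217, 402)
  • (207, 264)

These baseline starts are all clearly to the right of that clip path rectangle, so the text pieces drawn rightwards from them are, too. Thus, they are hidden.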
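
The check above can be reproduced with a small Python sketch. It is only an illustration under stated assumptions: the current transformation matrix is taken to be the identity, only the Tm and Td operators of the example block are tracked, and glyph widths are ignored. The names `Rect` and `baseline_starts` are made up for this example and do not come from any PDF library. Td offsets are given in text space, so the sketch multiplies them by the text line matrix set by Tm.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned rectangle as passed to the `re` operator: x y width height."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px <= self.x + self.w
                and self.y <= py <= self.y + self.h)


def baseline_starts(ops):
    """Return the line origin after each Tm/Td operator.

    `ops` is a list of (operator, operands) pairs; only Tm and Td are handled.
    Assumption: the CTM is the identity, so these are user-space coordinates.
    """
    tlm = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)  # text line matrix [a b c d e f]
    starts = []
    for op, args in ops:
        if op == "Tm":
            # Tm replaces the text line matrix outright.
            tlm = tuple(float(v) for v in args)
        elif op == "Td":
            # Td: new Tlm = [1 0 0 1 tx ty] x old Tlm (offset in text space).
            tx, ty = args
            a, b, c, d, e, f = tlm
            tlm = (a, b, c, d, tx * a + ty * c + e, tx * b + ty * d + f)
        else:
            raise ValueError(f"unsupported operator: {op}")
        starts.append((tlm[4], tlm[5]))
    return starts


# Clip rectangle and positioning operators copied from the example block.
clip = Rect(67.107, 261.154, 77, 188.188)
ops = [
    ("Tm", (6.7092, 0, 0, 6.7092, 332.5724, 347.7748)),
    ("Td", (17.697, 9.073)),
    ("Td", (-34.941, -1.057)),
    ("Td", (-1.399, -20.545)),
]

starts = baseline_starts(ops)
for x, y in starts:
    print(f"({x:.1f}, {y:.1f}) inside clip rectangle: {clip.contains(x, y)}")
```

Testing only the baseline start is enough here because the text runs left to right and already starts to the right of the rectangle. A general visibility test would have to intersect the full glyph bounding boxes with the accumulated clip path and take the actual CTM into account.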