[R] Tools to modify highlighted areas in pdf documents?

Leo Mada |eo@m@d@ @end|ng |rom @yon|c@eu
Sat Jun 1 18:16:23 CEST 2024


Dear R-Users,

Are there any packages that enable modifying highlighted areas / annotations in pdf documents?

It seems feasible: I have explored some R code (see below). However, I would rather avoid reinventing the wheel.

The problem:
When highlighting pdf documents with Microsoft Edge, the bounding box is sometimes misplaced, and the result looks quite ugly. Edge also lacks the ability to draw lines or arrows.

On the other hand, I never got used to Acrobat Reader: adding specific highlights usually takes much more effort. Lines can be drawn, but they are NOT straight!

Are there tools to change the size/position of highlights?
Or to add highlights and underline words?
Changing the position/size manually by editing the data in the pdf document is possible. Changing the color is trickier (it is somehow possible in Microsoft Edge, though the direct approach of rewriting the actual stream is better). Maybe there are some tools to do it?
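
A minimal sketch of the "rewrite the actual stream" approach for the color, using only base R. The helper name set.fill.color is hypothetical, and the content string is the decoded example stream from this post. It assumes that memCompress(type = "gzip") writes a zlib-wrapped deflate stream, which, as far as I understand, is what /FlateDecode expects; this should be checked against a real pdf viewer.

```r
# Decoded content stream of the highlight (example taken from this post)
stream = paste0("/GS gs .56078434 .87058824 .97647059 rg\n",
    "337.298 183.836 m 364.322 183.836 l 364.322 171.83 l 337.298 171.83 l h f\n")

# Hypothetical helper: replace the 3 operands of the first "rg" (fill color) operator
set.fill.color = function(s, rgb) {
    stopifnot(length(rgb) == 3, all(rgb >= 0 & rgb <= 1))
    num = "[0-9]*\\.?[0-9]+"
    pat = paste0(num, "[[:space:]]+", num, "[[:space:]]+", num, "[[:space:]]+rg")
    sub(pat, paste(paste(sprintf("%.4f", rgb), collapse = " "), "rg"), s)
}

new = set.fill.color(stream, c(1, 1, 0))   # yellow instead of light blue

# Re-compress; the length of z would be the new value of /Length
z = memCompress(charToRaw(new), type = "gzip")
length(z)

# Round trip as a sanity check
back = rawToChar(memDecompress(z, type = "gzip"))
```

The /C entry of the annotation object holds the same color and would presumably need the same change. Since the stream length usually changes, the byte offsets stored elsewhere in the pdf file may also need updating; this sketch does not handle that.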

Some R code is below.

Sincerely,

Leonard
#########

library(zip)

con = file("_some_pdf_.pdf", "rb")

NL = 0
# - very dirty hack;
# - assumes Annotations are in the last fragment/chunk;
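# - x and isNL keep only the last chunk that was read;
while(TRUE) {
    tmp = readBin(con, "raw", 1024*128 + 515);
    if(length(tmp) == 0) break;
    x = tmp;
    # isNL = (x == 10) | (x == 13);
    # CR immediately followed by LF: compare with x shifted by one byte
    # (a CR as the very last byte of a chunk is not counted);
    isNL = (x == 13) & (c(x[-1], as.raw(0)) == 10);
    NL = NL + sum(isNL);
}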

close(con)

idP = which(isNL)

idS = 935; # will vary with pdf and Annotations and ...;
nLast = 4; # usually 2 chunks
idx = idP[seq(idS, length.out = nLast)]

# Check: Right position?
# tmp = x[seq(idx[1] + 2, idx[1 + 2] - 1)]
# intToUtf8(tmp)

tmp = inflate(x[seq(idx[1] + 2, idx[nLast] - 1)])
intToUtf8(tmp$output)

# Output of inflate: an Example
# "/GS gs .56078434 .87058824 .97647059 rg\n
# 337.298 183.836 m 364.322 183.836 l 364.322 171.83 l 337.298 171.83 l h f\n"

# Note: /BBox[ 337.298 171.83 364.322 183.836]
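
The relation between the path in the decoded stream and the /BBox can be checked with a few lines of base R. This is only a sketch: the helper name path.bbox is hypothetical, and it assumes that the path consists solely of "m" (moveto) and "l" (lineto) operators, as in the example above.

```r
stream = paste0("/GS gs .56078434 .87058824 .97647059 rg\n",
    "337.298 183.836 m 364.322 183.836 l 364.322 171.83 l 337.298 171.83 l h f\n")

# Hypothetical helper: bounding box (xmin, ymin, xmax, ymax) of all m/l points
path.bbox = function(s) {
    num = "-?[0-9]*\\.?[0-9]+"
    # "x y m" or "x y l", followed by whitespace or the end of the string
    pat = paste0(num, "[[:space:]]+", num, "[[:space:]]+[ml]([[:space:]]|$)")
    m = regmatches(s, gregexpr(pat, s))[[1]]
    if(length(m) == 0) return(NULL)
    xy = lapply(strsplit(trimws(m), "[[:space:]]+"),
        function(p) as.numeric(p[1:2]))
    xy = do.call(rbind, xy)
    c(min(xy[,1]), min(xy[,2]), max(xy[,1]), max(xy[,2]))
}

bb = path.bbox(stream)
# Same 4 values as in /BBox[ 337.298 171.83 364.322 183.836]
isTRUE(all.equal(bb, c(337.298, 171.83, 364.322, 183.836)))
```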

The raw pdf data:

1948 0 obj
<</AP<</N 1949 0 R >>/C[ 0.560784 0.870588 0.976471]/CA 1/F 4/PDFIUM_HasGeneratedAP true/QuadPoints[ 337.298 186 364.322 186 337.298 174.6 364.322 174.6]/Rect[ 337.298 174.6 364.322 186]/Subtype/Highlight/Type/Annot>>
endobj
1949 0 obj
<</BBox[ 337.298 171.83 364.322 183.836]/Filter/FlateDecode/FormType 1/Length 86/Matrix[ 1 0 0 1 0 0]/Resources<</ExtGState<</GS<</AIS false/BM/Multiply/CA 1/Type/ExtGState/ca 1>>>>>>/Subtype/Form/Type/XObject>>stream
[86 bytes of Flate-compressed binary data; not reproducible as text]
endstream
endobj
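
The two objects can be compared directly: in this example, the /Rect of the annotation and the /BBox of its appearance stream have the same x-range but different y-ranges. A minimal sketch to extract and compare them (the helper name pdf.array is hypothetical; it assumes a flat array of numbers directly after the key, and the /Resources part of the second dictionary is omitted here):

```r
annot = paste0("<</AP<</N 1949 0 R >>/C[ 0.560784 0.870588 0.976471]/CA 1/F 4",
    "/PDFIUM_HasGeneratedAP true",
    "/QuadPoints[ 337.298 186 364.322 186 337.298 174.6 364.322 174.6]",
    "/Rect[ 337.298 174.6 364.322 186]/Subtype/Highlight/Type/Annot>>")
xobj = paste0("<</BBox[ 337.298 171.83 364.322 183.836]/Filter/FlateDecode",
    "/FormType 1/Length 86/Matrix[ 1 0 0 1 0 0]/Subtype/Form/Type/XObject>>")

# Hypothetical helper: numeric array stored under a key, e.g. /Rect[ ... ]
pdf.array = function(dict, key) {
    pat = paste0("/", key, "[[:space:]]*\\[([^]]*)\\]")
    m = regmatches(dict, regexec(pat, dict))[[1]]
    if(length(m) < 2) return(NULL)
    as.numeric(strsplit(trimws(m[2]), "[[:space:]]+")[[1]])
}

rect = pdf.array(annot, "Rect")
bbox = pdf.array(xobj, "BBox")

# Offsets of /Rect relative to /BBox: (xmin, ymin, xmax, ymax)
d = round(rect - bbox, 3)
d
```

Here d is 0 for both x-values, 2.77 for ymin and 2.164 for ymax. Whether this difference is related to the misplaced highlights, I cannot say; the sketch only makes the comparison easy to repeat for other annotations.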


More information about the R-help mailing list