[R] Tools to modify highlighted areas in pdf documents?
Leo Mada
|eo@m@d@ @end|ng |rom @yon|c@eu
Sat Jun 1 18:16:23 CEST 2024
Dear R-Users,
Are there any packages that allow modifying highlighted areas / annotations in pdf documents?
It seems feasible - I have explored some R code (see below). However, I would rather avoid reinventing the wheel.
The problem:
When highlighting pdf documents with Microsoft Edge, the bounding box is sometimes misplaced, and the result looks quite ugly. Edge also lacks the ability to draw lines or arrows.
On the other hand, I never got used to Acrobat Reader: adding specific highlights usually takes much more effort. Lines can be drawn, but they are NOT straight!
Are there tools to change the size/position of highlights?
Or to add highlights and underline words?
Changing the position/size manually by editing the data in the pdf document is possible. Changing the color is trickier (it is somehow possible in Microsoft Edge, though the direct approach of rewriting the actual stream is better). Maybe there are some tools to do it?
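A minimal sketch of what such a direct rewrite of the color could look like, assuming the content stream has already been decompressed to text and sets its fill color with a single "r g b rg" operator (as in the example stream shown further down in this message). The helper set_fill_rgb() is hypothetical and not part of any package; base R's memCompress(type = "gzip") is used here because it writes the zlib format that /FlateDecode expects.

```r
# Hypothetical helper: replace the first fill colour ("r g b rg")
# in a decompressed content stream.
set_fill_rgb = function(stream, rgb) {
  stopifnot(length(rgb) == 3, all(rgb >= 0 & rgb <= 1))
  num = "[0-9]*\\.?[0-9]+"
  pattern = paste0(num, "\\s+", num, "\\s+", num, "\\s+rg\\b")
  new = paste(paste(sprintf("%.6g", rgb), collapse = " "), "rg")
  sub(pattern, new, stream, perl = TRUE)
}

# Decompressed stream taken from the example in this message
stream = paste0(
  "/GS gs .56078434 .87058824 .97647059 rg\n",
  "337.298 183.836 m 364.322 183.836 l 364.322 171.83 l ",
  "337.298 171.83 l h f\n")

out = set_fill_rgb(stream, c(1, 0.9, 0.2))

# Recompress (zlib format); length(z) would be the new /Length value
z = memCompress(charToRaw(out), type = "gzip")
```

If the recompressed stream is written back, the /Length entry of the stream object has to be set to length(z). The /C entry of the annotation dictionary would presumably need the same color for consistency.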
Some R code is below.
Sincerely,
Leonard
#########
library(zip)
con = file("_some_pdf_.pdf", "rb")
NL = 0
# - very dirty hack;
# - assumes Annotations are in the last fragment/chunk;
# - a CR LF pair split across 2 chunks is not counted;
while(TRUE) {
  tmp = readBin(con, "raw", 1024*128 + 515);
  if(length(tmp) == 0) break;
  x = tmp;
  # Line break = CR (13) immediately followed by LF (10);
  # isNL = (x == 10) | (x == 13);
  # Note: compare each byte with its successor (same length as x);
  isNL = (x == as.raw(13)) & (c(x[-1], as.raw(0)) == as.raw(10));
  NL = NL + sum(isNL);
}
close(con)
# Positions of the CR LF pairs in the last chunk
idP = which(isNL)
idS = 935; # will vary with pdf and Annotations and ...;
nLast = 4; # usually 2 chunks
idx = idP[seq(idS, length.out = nLast)]
# Check: Right position?
# tmp = x[seq(idx[1] + 2, idx[1 + 2] - 1)]
# intToUtf8(tmp)
# Compressed stream: starts after the 1st CR LF, ends before the last CR
tmp = inflate(x[seq(idx[1] + 2, idx[nLast] - 1)])
intToUtf8(tmp$output)
# Output of inflate: an Example
# "/GS gs .56078434 .87058824 .97647059 rg\n
# 337.298 183.836 m 364.322 183.836 l 364.322 171.83 l 337.298 171.83 l h f\n"
# Note: /BBox[ 337.298 171.83 364.322 183.836]
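The hard-coded line index (idS = 935) could probably be avoided by searching for the "stream" keyword and reading as many bytes as /Length states. The following is only a sketch under several assumptions: the raw vector starts at the object of interest, /Length is a direct integer (not an indirect reference like "12 0 R"), and the dictionary contains no NUL bytes. find_bytes() and extract_stream() are hypothetical helpers; base R's memDecompress() is used in place of zip::inflate(), and the test object is built in memory rather than read from a real pdf.

```r
# Hypothetical helper: start positions of the byte pattern 'pat' in 'x'
find_bytes = function(x, pat) {
  n = length(pat)
  if (length(x) < n) return(integer(0))
  cand = which(x[seq_len(length(x) - n + 1)] == pat[1])
  for (i in seq_len(n)[-1]) {
    cand = cand[x[cand + i - 1] == pat[i]]
  }
  cand
}

# Hypothetical helper: raw bytes of the first stream found in 'x'
extract_stream = function(x) {
  kw = charToRaw("stream")
  pos = find_bytes(x, kw)[1]
  if (is.na(pos)) stop("no stream keyword found")
  # Dictionary text before the keyword; read the direct /Length value
  dict = rawToChar(x[seq_len(pos - 1)])
  m = regmatches(dict, regexpr("/Length\\s+[0-9]+", dict))
  len = as.integer(sub("/Length\\s+", "", m))
  if (length(len) != 1) stop("no direct /Length entry found")
  # Skip the end-of-line after the keyword (CR LF or LF)
  start = pos + length(kw)
  if (x[start] == as.raw(13)) start = start + 1
  if (x[start] == as.raw(10)) start = start + 1
  x[seq(start, length.out = len)]
}

# Build a small test object in memory (not a complete pdf)
content = paste0(
  "/GS gs .56078434 .87058824 .97647059 rg\n",
  "337.298 183.836 m 364.322 183.836 l 364.322 171.83 l ",
  "337.298 171.83 l h f\n")
z = memCompress(charToRaw(content), type = "gzip")
obj = c(
  charToRaw(sprintf(
    "1949 0 obj\r\n<</Filter/FlateDecode/Length %d>>stream\r\n", length(z))),
  z,
  charToRaw("\r\nendstream\r\nendobj\r\n"))

txt = rawToChar(memDecompress(extract_stream(obj), type = "gzip"))
```

Relying on /Length rather than on the position of "endstream" avoids guessing whether a trailing CR or LF belongs to the compressed data.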
The raw pdf data:
1948 0 obj
<</AP<</N 1949 0 R >>/C[ 0.560784 0.870588 0.976471]/CA 1/F 4/PDFIUM_HasGeneratedAP true/QuadPoints[ 337.298 186 364.322 186 337.298 174.6 364.322 174.6]/Rect[ 337.298 174.6 364.322 186]/Subtype/Highlight/Type/Annot>>
endobj
1949 0 obj
<</BBox[ 337.298 171.83 364.322 183.836]/Filter/FlateDecode/FormType 1/Length 86/Matrix[ 1 0 0 1 0 0]/Resources<</ExtGState<</GS<</AIS false/BM/Multiply/CA 1/Type/ExtGState/ca 1>>>>>>/Subtype/Form/Type/XObject>>stream
[86 bytes of Flate-compressed binary data; not printable as text]
endstream
endobj
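In the data above, /Rect spans y = 174.6 to 186, while the /BBox of the appearance stream spans y = 171.83 to 183.836. Moving a highlight by editing the annotation dictionary could be sketched as follows, assuming the dictionary is available as a single text string and that the arrays hold alternating x, y values. shift_annot() is a hypothetical helper; the shift of -2.164 (= 183.836 - 186) is chosen only for illustration, to align the top edge of /Rect with the top of /BBox.

```r
# Hypothetical helper: shift the coordinates stored in /Rect and /QuadPoints
shift_annot = function(dict, dx = 0, dy = 0,
                       keys = c("Rect", "QuadPoints")) {
  for (key in keys) {
    pattern = paste0("/", key, "\\s*\\[[^]]*\\]")
    m = regexpr(pattern, dict, perl = TRUE)
    if (m < 0) next
    old = regmatches(dict, m)
    # Keep only the text between the square brackets
    inner = sub("\\].*", "", sub(".*\\[", "", old))
    vals = as.numeric(strsplit(trimws(inner), "\\s+")[[1]])
    # Values alternate x, y
    vals = vals + rep(c(dx, dy), length.out = length(vals))
    regmatches(dict, m) = paste0(
      "/", key, "[ ",
      paste(as.character(round(vals, 3)), collapse = " "), "]")
  }
  dict
}

# Annotation dictionary copied from object 1948 above
annot = paste0(
  "<</AP<</N 1949 0 R >>/C[ 0.560784 0.870588 0.976471]/CA 1/F 4",
  "/PDFIUM_HasGeneratedAP true",
  "/QuadPoints[ 337.298 186 364.322 186 337.298 174.6 364.322 174.6]",
  "/Rect[ 337.298 174.6 364.322 186]/Subtype/Highlight/Type/Annot>>")

moved = shift_annot(annot, dy = -2.164)
```

This only touches the annotation dictionary. The appearance stream (object 1949) carries its own coordinates, and a change in the byte length of an object may invalidate the offsets in the cross-reference table, so a real tool would have to handle both.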