[R] Tools to modify highlighted areas in pdf documents?
Leo Mada
leo.mada using syonic.eu
Sun Jun 2 01:19:56 CEST 2024
Dear Bert,
Thank you very much for the response.
I was aware of pdftools, but did not recall any such functionality. I have checked again (pdftools, qpdf and the third one): unfortunately, none of them implements such functionality. There might be other packages which I missed.
However, the functionality is feasible. I will add a few more details - maybe someone picks up the task.
It is possible to edit the pdf-file manually, though it is quite cumbersome to find the right annotation.
1. One needs to edit the values in both /QuadPoints and /Rect of the annotation object (the one holding the /AP entry).
2. Modifying the color is trickier:
/C[...] encodes the color and /CA the alpha channel (= 1), but neither Acrobat nor Microsoft Edge picks up a modified /C: the color encoded in the stream is used instead.
It is possible to "trick" Edge: modify the /C color and set "/ca 1" (in the stream block) to a lower value (e.g. "/ca 0.99"). MS Edge will then accept the modified color (but Acrobat ignores it). Changing the value in the stream itself is the actual solution.
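A rough sketch of the stream edit in base R, in case it helps someone pick this up. It rests on several assumptions: the decoded stream contains a single fill-color operator of the form "r g b rg" (as in the example output quoted further down), set_fill_color() is a hypothetical helper, and memCompress(type = "gzip") is assumed to emit the zlib format that /FlateDecode expects (please verify). The /Length entry and the byte offsets in the xref table must be updated afterwards; that part is not shown.

```r
# Hypothetical helper: replace the fill color ("r g b rg") in a decoded stream.
set_fill_color = function(stream, col) {
  rgb01 = as.vector(grDevices::col2rgb(col)) / 255
  op = paste(paste(sprintf("%.4f", rgb01), collapse = " "), "rg")
  # three numbers followed by the "rg" operator
  pattern = "([0-9]*\\.?[0-9]+[ ]+){3}rg"
  sub(pattern, op, stream)
}

# Decoded stream, copied from the example in the original post
txt = paste0("/GS gs .56078434 .87058824 .97647059 rg\n",
  "337.298 183.836 m 364.322 183.836 l 364.322 171.83 l 337.298 171.83 l h f\n")
new = set_fill_color(txt, "yellow")

# Re-compress for writing back; assumed to be compatible with /FlateDecode
z = memCompress(charToRaw(new), type = "gzip")
# Sanity check: the round trip gives back the edited text
chk = rawToChar(memDecompress(z, type = "gzip"))
```

length(z) would then be the new value of /Length.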
Note: non-rectangular shapes can be specified as well.
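The geometry edit could be sketched along these lines. All helper names (get_array, set_array, shift_highlight) are hypothetical, and the code assumes that the annotation dictionary is available as plain text and writes its arrays as "/Key[ ... ]" without a space before the bracket, as in the raw data quoted further down. Changing the length of the text shifts the byte offsets, so the xref table needs fixing afterwards; this is not covered here.

```r
# Hypothetical helpers; "obj" is the text of one annotation dictionary.
get_array = function(obj, key) {
  pattern = paste0("/", key, "\\[[^]]*\\]")
  m = regmatches(obj, regexpr(pattern, obj))
  if(length(m) == 0) stop("Key not found: ", key)
  val = gsub("^.*\\[|\\]$", "", m)  # drop "/Key[" and the closing "]"
  as.numeric(strsplit(trimws(val), "[ ]+")[[1]])
}
set_array = function(obj, key, val) {
  pattern = paste0("/", key, "\\[[^]]*\\]")
  new = paste0("/", key, "[ ", paste(round(val, 3), collapse = " "), "]")
  sub(pattern, new, obj)
}
# Shift a highlight vertically by dy (in pdf units)
shift_highlight = function(obj, dy) {
  for(key in c("QuadPoints", "Rect")) {
    val = get_array(obj, key)
    idY = seq(2, length(val), by = 2)  # y values sit at the even positions
    val[idY] = val[idY] + dy
    obj = set_array(obj, key, val)
  }
  obj
}

obj = paste0("<</AP<</N 1949 0 R >>/C[ 0.560784 0.870588 0.976471]/CA 1/F 4",
  "/QuadPoints[ 337.298 186 364.322 186 337.298 174.6 364.322 174.6]",
  "/Rect[ 337.298 174.6 364.322 186]/Subtype/Highlight/Type/Annot>>")
res = shift_highlight(obj, -2)
get_array(res, "Rect")
```

Both arrays are moved by the same amount, so the bounding box and the quadrilateral stay consistent with each other.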
I hope that some of the referenced packages pick up this task.
Sincerely,
Leonard
________________________________
From: Bert Gunter <bgunter.4567 using gmail.com>
Sent: Saturday, June 1, 2024 9:23 PM
To: Leo Mada <leo.mada using syonic.eu>
Cc: r-help using r-project.org <r-help using r-project.org>
Subject: Re: [R] Tools to modify highlighted areas in pdf documents?
Search!
On rseek.org, the query "modify pdf documents in R" brought up the staplr package. A quick web search with the same query brought up the pdftools package.
These were cursory efforts, so you may well find more. You will have to determine whether and to what degree any meet your needs.
-- Bert
On Sat, Jun 1, 2024 at 9:16 AM Leo Mada via R-help <r-help using r-project.org> wrote:
Dear R-Users,
Are there any packages that enable the modifications of highlighted areas / annotations in pdf documents?
It seems feasible - I have explored some R code (see below). However, I would rather avoid reinventing the wheel.
The problem:
When highlighting pdf-documents with Microsoft Edge, the bounding box is sometimes misplaced, and quite ugly so. It also lacks the ability to draw lines or arrows.
On the other hand, I did not get used to Acrobat Reader: it usually involves much more effort to add specific highlights. Lines can be drawn, but are NOT straight!
Are there tools to change the size/position of highlights?
Or to add highlights and underline words?
Changing position/size manually by editing the data in the pdf-document is possible. Changing the color is trickier (somehow possible in Microsoft Edge, though the direct approach of rewriting the actual stream is better). Maybe there are some tools to do it?
Some R code is below.
Sincerely,
Leonard
#########
library(zip)  # provides inflate()
con = file("_some_pdf_.pdf", "rb")
NL = 0
# - very dirty hack;
# - assumes Annotations are in the last fragment/chunk;
while(TRUE) {
  tmp = readBin(con, "raw", 1024*128 + 515);
  if(length(tmp) == 0) break;
  x = tmp; # keeps the last non-empty chunk
  # isNL = (x == 10) | (x == 13);
  # line break = CR (13) immediately followed by LF (10);
  isNL = (x == 13) & (c(x[-1], as.raw(0)) == 10);
  NL = NL + sum(isNL);
}
close(con)
idP = which(isNL) # positions of the CR LF pairs in the last chunk
idS = 935; # will vary with pdf and Annotations and ...;
nLast = 4; # usually 2 chunks
idx = idP[seq(idS, length.out = nLast)]
# Check: Right position?
# tmp = x[seq(idx[1] + 2, idx[1 + 2] - 1)]
# intToUtf8(tmp)
# bytes between the first and the last selected line break: the compressed stream
tmp = inflate(x[seq(idx[1] + 2, idx[nLast] - 1)])
intToUtf8(tmp$output)
# Output of inflate: an Example
# "/GS gs .56078434 .87058824 .97647059 rg\n
# 337.298 183.836 m 364.322 183.836 l 364.322 171.83 l 337.298 171.83 l h f\n"
# Note: /BBox[ 337.298 171.83 364.322 183.836]
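The decoded stream can be checked against the /BBox with a few lines of base R. This is only a sketch: it assumes a stream made of one "rg" operator and straight segments ("m" and "l" operators), as in the example output above; the variable names are mine.

```r
# Decoded stream, copied from the example output above
txt = paste0("/GS gs .56078434 .87058824 .97647059 rg\n",
  "337.298 183.836 m 364.322 183.836 l 364.322 171.83 l 337.298 171.83 l h f\n")
num = "[0-9]*\\.?[0-9]+"

# Fill color: the three numbers in front of the "rg" operator
m = regmatches(txt, regexpr(paste0("(", num, "[ ]+){3}rg"), txt))
col = as.numeric(strsplit(trimws(sub("rg$", "", m)), "[ ]+")[[1]])
hex = rgb(col[1], col[2], col[3])

# Path points: "x y m" (moveto) and "x y l" (lineto)
pt = regmatches(txt, gregexpr(paste0(num, "[ ]+", num, "[ ]+[ml]"), txt))[[1]]
xy = do.call(rbind, lapply(strsplit(pt, "[ ]+"), function(p) as.numeric(p[1:2])))

# Bounding box of the path: xmin, ymin, xmax, ymax
bbox = c(min(xy[, 1]), min(xy[, 2]), max(xy[, 1]), max(xy[, 2]))
```

For the example, bbox reproduces the four values of the /BBox entry, which suggests that the highlight actually drawn follows the stream and not the /Rect of the annotation.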
The raw pdf data:
1948 0 obj
<</AP<</N 1949 0 R >>/C[ 0.560784 0.870588 0.976471]/CA 1/F 4/PDFIUM_HasGeneratedAP true/QuadPoints[ 337.298 186 364.322 186 337.298 174.6 364.322 174.6]/Rect[ 337.298 174.6 364.322 186]/Subtype/Highlight/Type/Annot>>
endobj
1949 0 obj
<</BBox[ 337.298 171.83 364.322 183.836]/Filter/FlateDecode/FormType 1/Length 86/Matrix[ 1 0 0 1 0 0]/Resources<</ExtGState<</GS<</AIS false/BM/Multiply/CA 1/Type/ExtGState/ca 1>>>>>>/Subtype/Form/Type/XObject>>stream
[86 bytes of Flate-compressed binary data; not printable]
endstream
endobj
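Locating the highlight annotations and their appearance streams can be partly automated. The sketch below uses a hypothetical helper find_highlights() and assumes that the objects are stored uncompressed (not inside object streams), that every highlight carries an /AP<</N ... R>> entry, and that the pdf is already available as one character string. Reading a real file as text needs extra care, since rawToChar() fails on embedded NUL bytes.

```r
# Hypothetical helper: object ids of highlight annotations and of their
# normal appearance streams (/AP /N).
find_highlights = function(txt) {
  # uncompressed objects: "<id> <gen> obj ... endobj"
  objs = regmatches(txt,
    gregexpr("(?s)[0-9]+ [0-9]+ obj.*?endobj", txt, perl = TRUE))[[1]]
  objs = objs[grepl("/Subtype/Highlight", objs, fixed = TRUE)]
  id = as.integer(regmatches(objs, regexpr("^[0-9]+", objs)))
  ap = regmatches(objs, regexpr("/N [0-9]+ [0-9]+ R", objs))
  ap = as.integer(sub("^/N ([0-9]+) .*$", "\\1", ap))
  data.frame(id = id, ap = ap)
}

# Shortened version of the raw data above
txt = paste("1948 0 obj",
  "<</AP<</N 1949 0 R >>/C[ 0.560784 0.870588 0.976471]/Subtype/Highlight/Type/Annot>>",
  "endobj",
  "1949 0 obj",
  "<</BBox[ 337.298 171.83 364.322 183.836]/Subtype/Form/Type/XObject>>",
  "endobj", sep = "\n")
res = find_highlights(txt)
```

Each row of res pairs an annotation object with the object that holds its appearance stream.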
______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.