[R] Tools to modify highlighted areas in pdf documents?
Bert Gunter
bgunter@4567 @end|ng |rom gm@||@com
Sat Jun 1 20:23:28 CEST 2024
Search!
On rseek.org, the query "modify pdf documents in R" brought up the staplr
package. A quick web search with the same query brought up the pdftools
package.
These were cursory efforts, so you may well find more. You will have to
determine whether and to what degree any meet your needs.
-- Bert
On Sat, Jun 1, 2024 at 9:16 AM Leo Mada via R-help <r-help using r-project.org>
wrote:
> Dear R-Users,
>
> Are there any packages that enable modifying highlighted areas/annotations
> in PDF documents?
>
> It seems feasible: I have explored some R code (see below). However, I
> would rather not reinvent the wheel.
>
> The problem:
> When highlighting PDF documents with Microsoft Edge, the bounding box is
> sometimes misplaced, which looks quite ugly. Edge also lacks the ability
> to draw lines or arrows.
>
> On the other hand, I never got used to Acrobat Reader: adding specific
> highlights usually takes much more effort. Lines can be drawn, but they
> are NOT straight!
>
> Are there tools to change the size/position of highlights?
> Or to add highlights and underline words?
> Changing the position/size manually by editing the data in the PDF document
> is possible. Changing the color is trickier (it is somehow possible in
> Microsoft Edge, though the direct approach of rewriting the actual stream
> is better). Maybe there are some tools to do it?
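Moving or resizing a highlight by editing the file amounts to rewriting the numbers in its /Rect and /QuadPoints arrays (both visible in the raw annotation object 1948 quoted further down). The following is a minimal R sketch, assuming the annotation dictionary has already been extracted as a single string; `shift_pdf_array()` and `shift_highlight()` are hypothetical helper names, not functions from any package. Depending on the viewer, the appearance stream referenced by /AP may also need to be checked or updated after such an edit.

```r
# Hypothetical helper (not from any package): add (dx, dy) to every
# coordinate pair stored in a named PDF array such as /Rect[...] or
# /QuadPoints[...]. Entries are assumed to alternate x, y, x, y, ...
shift_pdf_array <- function(dict, key, dx = 0, dy = 0) {
  pattern <- paste0("/", key, "\\[([^]]*)\\]")
  m <- regmatches(dict, regexec(pattern, dict))[[1]]
  if (length(m) == 0) stop("array not found: /", key)
  nums <- as.numeric(strsplit(trimws(m[2]), "[[:space:]]+")[[1]])
  nums <- nums + rep(c(dx, dy), length.out = length(nums))
  # keep at most 3 decimals, the precision of the coordinates Edge wrote
  new <- paste0("/", key, "[ ", paste(round(nums, 3), collapse = " "), "]")
  sub(m[1], new, dict, fixed = TRUE)
}

# Hypothetical helper: move a text markup annotation (e.g. /Highlight)
# by shifting both its /Rect and its /QuadPoints by the same offset.
shift_highlight <- function(dict, dx = 0, dy = 0) {
  for (key in c("Rect", "QuadPoints")) {
    dict <- shift_pdf_array(dict, key, dx, dy)
  }
  dict
}
```

For example, `shift_highlight(annot, dy = -2)` would move the highlight 2 points down, PDF user space having its origin at the bottom left of the page. Resizing works the same way, by changing only some of the coordinates.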
>
> Some R code is below.
>
> Sincerely,
>
> Leonard
> #########
>
> library(zip)
>
> con = file("_some_pdf_.pdf", "rb")
>
> NL = 0
> # - very dirty hack;
> # - assumes Annotations are in the last fragment/chunk;
> while(TRUE) {
> tmp = readBin(con, "raw", 1024*128 + 515);
> if(length(tmp) == 0) break;
> x = tmp;
> # isNL = (x == 10) | (x == 13);
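> # TRUE where byte i is CR (13) and byte i + 1 is LF (10); comparing two
> # shifted copies of x keeps both sides the same length as x
> isNL = c((x[-length(x)] == 13) & (x[-1] == 10), FALSE);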
> NL = NL + sum(isNL);
> }
>
> close(con)
>
> idP = which(isNL)
>
> idS = 935; # will vary with pdf and Annotations and ...;
> nLast = 4; # usually 2 chunks
> idx = idP[seq(idS, length.out = nLast)]
>
> # Check: Right position?
> # tmp = x[seq(idx[1] + 2, idx[1 + 2] - 1)]
> # intToUtf8(tmp)
>
> tmp = inflate(x[seq(idx[1] + 2, idx[nLast] - 1)])
> intToUtf8(tmp$output)
>
> # Output of inflate: an example
> # "/GS gs .56078434 .87058824 .97647059 rg\n
> # 337.298 183.836 m 364.322 183.836 l 364.322 171.83 l 337.298 171.83 l h f\n"
>
> # Note: /BBox[ 337.298 171.83 364.322 183.836]
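The decoded stream above is an ordinary PDF content stream: `/GS gs` applies the graphics state named GS (here with /BM/Multiply), `rg` sets the RGB fill color, `m` starts a path, each `l` adds a straight segment, `h` closes the path and `f` fills it. Its extent can be compared with /BBox and with the annotation's /Rect: in y, /Rect spans 174.6 to 186 while the drawn path spans 171.83 to 183.836, a shift of roughly 2 to 3 points. A minimal R sketch of that check; `path_bbox()` is a hypothetical helper that only reads the `m` and `l` operators, so curves (`c`, `v`, `y`) and `re` rectangles are ignored.

```r
# Hypothetical helper: extent of the straight path segments in a decoded
# PDF content stream. Only moveto (m) and lineto (l) are read; each is
# preceded by its x and y operands.
path_bbox <- function(content) {
  tokens <- strsplit(trimws(content), "[[:space:]]+")[[1]]
  ops <- which(tokens %in% c("m", "l"))
  x <- as.numeric(tokens[ops - 2])
  y <- as.numeric(tokens[ops - 1])
  c(llx = min(x), lly = min(y), urx = max(x), ury = max(y))
}

content <- paste0("/GS gs .56078434 .87058824 .97647059 rg\n",
                  "337.298 183.836 m 364.322 183.836 l ",
                  "364.322 171.83 l 337.298 171.83 l h f\n")
path_bbox(content)  # same numbers as /BBox[ 337.298 171.83 364.322 183.836]
```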
>
> The raw pdf data:
>
> 1948 0 obj
> <</AP<</N 1949 0 R >>/C[ 0.560784 0.870588 0.976471]/CA 1/F 4
> /PDFIUM_HasGeneratedAP true/QuadPoints[ 337.298 186 364.322 186 337.298 174.6 364.322 174.6]
> /Rect[ 337.298 174.6 364.322 186]/Subtype/Highlight/Type/Annot>>
> endobj
> 1949 0 obj
> <</BBox[ 337.298 171.83 364.322 183.836]/Filter/FlateDecode/FormType 1
> /Length 86/Matrix[ 1 0 0 1 0 0]/Resources<</ExtGState<</GS<</AIS false
> /BM/Multiply/CA 1/Type/ExtGState/ca 1>>>>>>/Subtype/Form/Type/XObject>>stream
> [86 bytes of FlateDecode-compressed binary data]
> endstream
> endobj
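Rewriting the stream directly, as suggested above for changing the color, means: locate the object, read /Length bytes after the `stream` keyword, inflate them, edit the text, deflate it again and patch /Length. Below is a minimal base-R sketch of that round trip, working on the whole file as a raw vector rather than on the last chunk. It uses `memDecompress()`/`memCompress()` with `type = "gzip"` in place of `zip::inflate()`, assuming these handle the zlib format used by FlateDecode (the format `memCompress(type = "gzip")` itself produces). `locate_stream()`, `pdf_stream_text()` and `set_fill_color()` are hypothetical helpers; they do not handle indirect /Length values or objects stored inside object streams.

```r
# Hypothetical helper: find object `obj` (generation 0) in the raw vector x
# and return where its stream data starts and how long it is.
locate_stream <- function(x, obj) {
  # plain byte search: "1949 0 obj" would also match "11949 0 obj"
  start <- grepRaw(paste(obj, "0 obj"), x, fixed = TRUE)[1]
  if (is.na(start)) stop("object not found: ", obj)
  kw <- grepRaw("stream", x, offset = start, fixed = TRUE)[1]
  if (is.na(kw)) stop("no stream after object ", obj)
  dict <- rawToChar(x[start:(kw - 1L)])
  flat <- gsub("[\r\n]", " ", dict, useBytes = TRUE)
  if (grepl("/Length[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]+R",
            flat, useBytes = TRUE))
    stop("indirect /Length is not handled in this sketch")
  len <- as.integer(sub(".*/Length[[:space:]]+([0-9]+).*", "\\1", flat,
                        useBytes = TRUE))
  # the keyword "stream" is followed by CR LF or by LF alone
  eol <- if (x[kw + 6L] == as.raw(13)) 2L else 1L
  list(start = start, kw = kw, dict = dict, data_start = kw + 6L + eol, len = len)
}

# Decode the FlateDecode stream of object `obj` to text.
pdf_stream_text <- function(x, obj) {
  loc <- locate_stream(x, obj)
  data <- x[loc$data_start + seq_len(loc$len) - 1L]
  rawToChar(memDecompress(data, type = "gzip"))
}

# Replace the first "r g b rg" (set RGB fill color) operator, recompress
# the stream and patch /Length to the new byte count.
set_fill_color <- function(x, obj, rgb) {
  loc <- locate_stream(x, obj)
  txt <- sub("[-.0-9]+[[:space:]]+[-.0-9]+[[:space:]]+[-.0-9]+[[:space:]]+rg",
             paste(paste(rgb, collapse = " "), "rg"), pdf_stream_text(x, obj))
  z <- memCompress(charToRaw(txt), type = "gzip")
  dict <- sub("/Length[[:space:]]+[0-9]+", paste("/Length", length(z)),
              loc$dict, useBytes = TRUE)
  c(x[seq_len(loc$start - 1L)], charToRaw(dict),
    x[loc$kw:(loc$data_start - 1L)], z,
    x[(loc$data_start + loc$len):length(x)])
}
```

For a real file, something like `x <- readBin(path, "raw", n = file.size(path))` followed by `writeBin(set_fill_color(x, 1949, c(1, 1, 0)), "out.pdf")` would apply it. Because the object changes length, the byte offsets in the cross-reference table are no longer exact afterwards; many viewers tolerate this, and a tool such as qpdf may be able to rewrite the file with a corrected table. If the annotation's /C entry should match the new appearance, it can be edited in the same way as /Rect.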
>
>
> [[alternative HTML version deleted]]
>