[R] reading data from a pdf
    Jean Eid 
    jeaneid at chass.utoronto.ca
       
    Mon Oct 24 17:04:07 CEST 2005
    
    
  
Hi,
In my experience pdftotext did not do a very good job at this because it 
garbles the formatting of tables, although how badly depends on the 
program that originally produced the PDF. What I have found works best 
is to cut and paste the text into XEmacs or Emacs and then run 
M-x canonically-space-region on it, which removes the extra spaces 
between words. However, if the PDF was produced by scanning and the text 
comes from a character recognition program, all bets are off and the 
tables have to be reformatted by hand.
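
For anyone who would rather script the space clean-up than do it in an
editor, here is a rough sketch of the same idea. It is written in Python
purely for illustration and only approximates canonically-space-region:
it collapses every run of spaces or tabs to a single space and does not
keep the double space that Emacs can leave after a sentence. The function
names and the sample table are made up for the example.

```python
import re


def collapse_spaces(line):
    """Reduce each run of spaces or tabs to one space and trim the ends."""
    return re.sub(r"[ \t]+", " ", line).strip()


def parse_table(text):
    """Split pasted table text into rows of fields, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        cleaned = collapse_spaces(line)
        if cleaned:
            rows.append(cleaned.split(" "))
    return rows


# Hypothetical sample standing in for text pasted from a PDF table.
pasted = """
year     sales      cost
2003      120.5     80
2004   131.0        92.25
"""

rows = parse_table(pasted)
for row in rows:
    # Comma-separated lines are easy to save and read back elsewhere.
    print(",".join(row))
```

This only works when no cell contains a space of its own. If the
comma-separated lines are saved to a file, read.csv in R should be able
to read them.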
Jean
rambam at bigpond.net.au wrote:
>>Hi, I'm trying to read data from a PDF file. Is it possible to do it
>>with R? Thanks,  Marco
>>    
>>
>
>If cut and paste to a text file fails, try this:
>
>pdftotext (from the xpdf project)
>
>or
>
>http://pdftohtml.sourceforge.net
>pdftohtml is a utility which converts PDF files into HTML and
>XML formats
>
>In addition, pdftk, the command line pdf toolkit may be useful
>http://www.accesspdf.com/pdftk/
>
>  
>
    
    