[R] Import fixed-format ascii file with mixed record types

trece por ciento el13porciento at yahoo.com
Tue Feb 2 19:33:50 CET 2010


Thanks again, David.
I think this could work.
Two final questions:
1. I have read that read.fwf can be slow for big tables (my files have approx. 160,000 records with a record length of 176 characters, roughly 28 MB). Could that be a problem?
2. If read.fwf is not a problem, wouldn't it be better to read all the records with read.fwf into a data frame with the Type 1 structure, and then process the Type 2 records in that data frame, adding new fields for them (left as NA for the Type 1 records)?
Hug
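A possible way to handle both questions, offered only as a sketch: read the file once with readLines(), split the lines on the record-type character (column 4, as in the example later in this thread), cut the fields with the vectorised substring(), and stack the two record types into one data frame, filling the fields a type lacks with NA (a data frame column cannot hold NULL entries, so NA plays that role). The layouts layout1 and layout2, their field names (hh, type, age, income, school) and the file name are hypothetical placeholders for the real record description. As far as I know, read.fwf() rewrites the fields to a temporary file and then calls read.table(), which is part of why it can be slow; cutting the lines with substring() skips that step, though I have not timed either approach on files of this size.

```r
# Hypothetical field layouts: name plus first and last character position.
# Replace these with the positions from the survey's record description.
layout1 <- data.frame(name  = c("hh", "type", "age", "income"),
                      start = c(1, 4, 5, 7),
                      end   = c(3, 4, 6, 9),
                      stringsAsFactors = FALSE)
layout2 <- data.frame(name  = c("hh", "type", "age", "school"),
                      start = c(1, 4, 5, 7),
                      end   = c(3, 4, 6, 7),
                      stringsAsFactors = FALSE)

# Cut each field from all lines at once; substring() is vectorised over lines.
# Blank fields (e.g. filler) become NA when converted with as.numeric().
parse_fixed <- function(recs, layout) {
  cols <- lapply(seq_len(nrow(layout)), function(i)
    as.numeric(substring(recs, layout$start[i], layout$end[i])))
  names(cols) <- layout$name
  as.data.frame(cols)
}

read_mixed <- function(recs, type_col = 4) {
  type <- substr(recs, type_col, type_col)
  d1 <- parse_fixed(recs[type == "1"], layout1)
  d2 <- parse_fixed(recs[type == "2"], layout2)
  # Remember the original line numbers so the file order can be restored.
  d1$line <- which(type == "1")
  d2$line <- which(type == "2")
  # Give each part the columns it lacks, filled with NA, then stack them.
  all_names <- union(names(d1), names(d2))
  for (nm in setdiff(all_names, names(d1))) d1[[nm]] <- rep(NA_real_, nrow(d1))
  for (nm in setdiff(all_names, names(d2))) d2[[nm]] <- rep(NA_real_, nrow(d2))
  res <- rbind(d1[all_names], d2[all_names])
  res <- res[order(res$line), ]
  rownames(res) <- NULL
  res
}

recs <- c("011125678", "011136779", "011124943",
          "011286711", "011234872", "011135628")
# For the real data: recs <- readLines("lfs_file.dat")  # hypothetical file name
res <- read_mixed(recs)
```

Keeping the original line number in the result preserves the order of the file, which may matter if child records need to be linked back to the household records around them.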

--- On Mon, 2/1/10, David Winsemius <dwinsemius at comcast.net> wrote:

> From: David Winsemius <dwinsemius at comcast.net>
> Subject: Re: [R] Import fixed-format ascii file with mixed record types
> To: "trece por ciento" <el13porciento at yahoo.com>
> Cc: r-help at r-project.org
> Date: Monday, February 1, 2010, 2:23 PM
> 
> On Feb 1, 2010, at 2:33 PM, trece por ciento wrote:
> 
> > Thanks David, but can read.fwf cope with different record types?
> > For example, if the record type is the 4th character, I could have:
> > 
> > 011125678 ---> This is record Type 1
> > 011136779 ---> This is record Type 1
> > 011124943 ---> This is record Type 1
> > 011286711 ---> This is record Type 2
> > 011234872 ---> This is record Type 2
> > 011135628 ---> This is record Type 1
> > 
> > So, how can I tell read.fwf to take the correct record type into account?
> 
> You may need to separate the line types first. If the number of lines is not too large, this illustrates one strategy:
> 
> > txt <- "011125678
> + 011136779
> + 011124943
> + 011286711
> + 011234872
> + 011135628"
> 
> > recs <- readLines(textConnection(txt))
> > substr(recs, 4, 4)
> [1] "1" "1" "1" "2" "2" "1"
> > file1 <- recs[substr(recs, 4, 4) == "1"]
> > file2 <- recs[substr(recs, 4, 4) == "2"]
> > file1
> [1] "011125678" "011136779" "011124943" "011135628"
> > file2
> [1] "011286711" "011234872"
> 
> Then these text objects could be processed with read.fwf(textConnection(file1), widths = ...), and likewise for file2.
> 
> --David.
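Carried through to the read.fwf() step, the strategy above might look like the following sketch. The widths and column names are hypothetical stand-ins for the real record layout. A negative width tells read.fwf() to skip those characters, which is one way to drop the blank filler at the end of the shorter record type; here the last two characters of the Type 2 lines stand in for that filler.

```r
txt <- c("011125678", "011136779", "011124943",
         "011286711", "011234872", "011135628")
# For a real file: txt <- readLines("lfs_file.dat")  # hypothetical file name
type <- substr(txt, 4, 4)   # record type is the 4th character

# Hypothetical layout: hh (3), type (1), age (2), then type-specific fields.
con1 <- textConnection(txt[type == "1"])
type1 <- read.fwf(con1, widths = c(3, 1, 2, 3),
                  col.names = c("hh", "type", "age", "income"))
close(con1)

# In the Type 2 lines the last two characters play the role of the filler
# and are skipped with a negative width.
con2 <- textConnection(txt[type == "2"])
type2 <- read.fwf(con2, widths = c(3, 1, 2, 1, -2),
                  col.names = c("hh", "type", "age", "school"))
close(con2)
```

Because read.fwf() leaves an already-open connection open, the sketch closes each text connection explicitly after reading.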
> 
> > Thanks again,
> > Hug
> > 
> > --- On Mon, 2/1/10, David Winsemius <dwinsemius at comcast.net> wrote:
> > 
> > From: David Winsemius <dwinsemius at comcast.net>
> > Subject: Re: [R] Import fixed-format ascii file with mixed record types
> > To: "trece por ciento" <el13porciento at yahoo.com>
> > Cc: r-help at r-project.org
> > Date: Monday, February 1, 2010, 12:01 PM
> > 
> > 
> > On Feb 1, 2010, at 11:40 AM, trece por ciento wrote:
> > 
> >> I need to import several ASCII files in fixed format with two different record types. The data come from the European Labor Force Survey, which is a household survey. The first record type is for people aged 16 and over; the second, much shorter one is for people aged 15 or under (this record is padded with blanks to reach the same record length).
> >> The files typically have 160,000 records of 176 characters each. The data are numeric, corresponding to 102 variables, mostly integers (seven variables have two decimals). My operating system is Windows XP.
> >> My questions:
> >> 1. Which do you think is the best way to import the files into R?
> > 
> > 
> > ?read.fwf
> > 
> >> 2. Could you give me any references or examples?
> > 
> > There are examples in the help page.
> > 
> >> Thanking you in advance,
> >> Hug
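On the seven two-decimal variables: fixed-format files sometimes store such values without a written decimal point (an implied decimal) and sometimes with the point included. Which applies to these files is an assumption to check against the survey's codebook. If the point is implied, the field can be read as a number and divided by 100 afterwards, as in this sketch; the helper name implied_decimals() is hypothetical. Blank fields, such as the filler in the shorter record, come out as NA.

```r
# Convert fixed-width text fields with an implied decimal point
# (e.g. "001250" meaning 12.50) to numbers. Blank fields become NA.
implied_decimals <- function(x, digits = 2) {
  as.numeric(x) / 10^digits
}

fields <- c("001250", "000005", "123456", "      ")
vals <- implied_decimals(fields)
vals
```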
> >> 
> >> 
> >> 
> >> 
> >>     [[alternative HTML version
> deleted]]
> >> 
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> > 
> > David Winsemius, MD
> > Heritage Laboratories
> > West Hartford, CT
> > 
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
> 
> 
