[R] Reading a tab delimited file of varying length using read.table
Ben Tupper
btupper at bigelow.org
Sun Jan 17 22:46:36 CET 2016
Hi Pradeep,
Any software would be challenged to determine the boundaries between your columns.
ff <- 'http://data.princeton.edu/wws509/datasets/divorce.dat'
txt <- readLines(ff)
head(txt)
# [1] " id heduc heblack mixed years div " " 9 12-15 years No No 10.546 No "
# [3] " 11 < 12 years No No 34.943 No " " 13 < 12 years No No 2.834 Yes "
# [5] " 15 < 12 years No No 17.532 Yes " " 33 12-15 years No No 1.418 No "
You don't have tab delimiters; the fields are separated by spaces instead (well, sort of). The values in your second column contain either one ("12-15 years") or two ("< 12 years") embedded spaces, so any scheme that uses spaces to delineate the columns will split that column into a varying number of pieces.
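To make that concrete, here is a small sketch that splits two rows on whitespace and counts the pieces. The rows are retyped by hand from the output above, so their exact spacing is an assumption, and the names rows, fields and n_fields are just for illustration.

```r
# Two sample rows in the style of divorce.dat (spacing is illustrative only)
rows <- c("  9  12-15 years  No  No  10.546  No",
          " 11  < 12 years   No  No  34.943  No")

# Trim the ends, then split each row on runs of whitespace
fields <- strsplit(trimws(rows), "[[:space:]]+")

# Count the pieces per row; the counts differ because the second
# column contributes two pieces in one row and three in the other
n_fields <- lengths(fields)
print(n_fields)
```

Since the rows do not yield the same number of fields, a whitespace separator cannot line the columns up without extra work.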
Perhaps you can read this as fixed width - see ?read.fwf - but you'll have to fiddle with the width specifications.
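A minimal sketch of the read.fwf idea, using a small sample written to a temporary file rather than the real data. The widths c(4, 13, 4, 4, 8, 4) and the sprintf layout are assumptions made up for this example, not measured from divorce.dat, so you would need to work out the real widths from the file itself.

```r
# Assumed column widths for the illustration (NOT taken from divorce.dat)
widths <- c(4, 13, 4, 4, 8, 4)

# Build three fixed-width lines that match those widths exactly
fmt <- "%4d %-12s%-4s%-4s%8.3f %-3s"
sample_lines <- sprintf(fmt,
                        c(9L, 11L, 13L),
                        c("12-15 years", "< 12 years", "< 12 years"),
                        "No", "No",
                        c(10.546, 34.943, 2.834),
                        c("No", "No", "Yes"))

# Write the sample to a temporary file and read it back by position
tf <- tempfile(fileext = ".dat")
writeLines(sample_lines, tf)

df <- read.fwf(tf, widths = widths,
               col.names = c("id", "heduc", "heblack", "mixed", "years", "div"),
               strip.white = TRUE,        # drop the padding around each field
               stringsAsFactors = FALSE)  # keep text columns as character
unlink(tf)

str(df)
```

Because the columns are cut by position, the embedded spaces in heduc no longer matter. For the real file you would probably also want skip = 1 to pass over the header line and supply the names through col.names as above.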
Cheers,
Ben
> On Jan 17, 2016, at 10:31 AM, Pradeep Bisht <pradeep.bisht0303 at gmail.com> wrote:
>
> Hello Experts ,
>
> Being a SAS developer, I am finding it difficult to perform some data
> cleaning tasks in R that are quite easy to perform in SAS.
>
> I have been trying to read a .dat file and, after a lot of attempts, have
> failed to find a solution. Maybe R doesn't have the functionality right
> now, or I am not looking in the right place. Here is my code.
>
> f5=read.table("http://data.princeton.edu/wws509/datasets/divorce.dat",
> header=T,
> sep="\t",
> colClasses = c("numeric", "character", "character","character", "double",
> "character" ) )
> The error I get is this:
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
> scan() expected 'a real', got '912-15yearsNoNo10.546No'
>
> Also, does read.table always call scan in the background to do its job? If so,
> why use read.table in the first place?
>
> Pradeep
Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org