[R] read.table()

arun smartpink111 at yahoo.com
Sun Dec 9 00:11:12 CET 2012


Hi Pradip,

Try this:
source("Muhuri.txt")
#Muhuri.txt
Lines<-  "race    age   percent  sepercent  flag_var
         Mexican 12-17  5.7926   0.64195      any-------------------------------------------------------------
--------------------------------------------------------
"
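# Split the pasted text into one string per line
Lines1<-readLines(textConnection(Lines))

# Col1new: the race label (everything before the age group), with all spaces removed
Col1new<-gsub(" ","",gsub("\\s+(\\D+)[[:digit:]]+\\+.*","\\1",gsub("\\s+(\\D+)[[:digit:]]+\\-.*","\\1",Lines1[-1])))
# Col2: the rest of each line, starting at the age group ("12-17", "18-25" or "26+")
Col2<-gsub("\\s+\\D+([[:digit:]]+\\+.*)","\\1",gsub("\\s+\\D+([[:digit:]]+\\-.*)","\\1",Lines1[-1]))
# Parse the remaining whitespace-separated fields and prepend the race column
dat1<-data.frame(Col1new,read.table(text=Col2,stringsAsFactors=FALSE,sep=""),stringsAsFactors=FALSE)

# Take the column names from the header line
heading<-unlist(strsplit(Lines1[1]," "))
colnames(dat1)<-heading[heading!=""]
 head(dat1,6)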
#            race   age percent sepercent flag_var
#1        Mexican 12-17  5.7926   0.64195      any
#2    PuertoRican 12-17  5.1975   0.24929      any
#3          Cuban 12-17  3.7977   1.00487      any
#4    C-SAmerican 12-17  4.3665   0.55329      any
#5      Dominican 12-17  1.8149   0.46677      any
#6 Spanish(Spain) 12-17  6.1971   0.98386      any



 str(dat1)
'data.frame':    195 obs. of  5 variables:
 $ race     : chr  "Mexican" "PuertoRican" "Cuban" "C-SAmerican" ...
 $ age      : chr  "12-17" "12-17" "12-17" "12-17" ...
 $ percent  : num  5.79 5.2 3.8 4.37 1.81 ...
 $ sepercent: num  0.642 0.249 1.005 0.553 0.467 ...
 $ flag_var : chr  "any" "any" "any" "any" ...
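
The approach above removes the spaces inside the race labels ("PuertoRican", "C-SAmerican"). If the labels should be kept as written, one possible variant is to parse each line from the right, on the assumption that only the first column contains embedded spaces. This is a sketch on three rows of the posted data, not a tested replacement for the code above; the names demo_lines, pat, tabbed and dat2 are made up for illustration.

```r
# Three data rows plus the header, copied from the posted example
demo_lines <- c("race    age   percent  sepercent  flag_var",
                "         Mexican 12-17  5.7926   0.64195      any",
                "    Puerto Rican 12-17  5.1975   0.24929      any",
                " Spanish (Spain)   26+  2.5976   0.86230      any")

# Assumption: the last four fields never contain whitespace, so whatever
# precedes them (lazily matched) is the race label, spaces included.
pat <- "^\\s*(.*?)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s*$"

# Rewrite every line as five tab-separated fields
tabbed <- sub(pat, "\\1\t\\2\t\\3\t\\4\t\\5", demo_lines, perl = TRUE)

# Now the separator is unambiguous
dat2 <- read.table(text = tabbed, sep = "\t", header = TRUE,
                   quote = "", stringsAsFactors = FALSE)
str(dat2)
```

The header line goes through the same substitution, so the column names come out as race, age, percent, sepercent and flag_var without a separate step.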

A.K.



----- Original Message -----
From: "Muhuri, Pradip (SAMHSA/CBHSQ)" <Pradip.Muhuri at samhsa.hhs.gov>
To: 'arun' <smartpink111 at yahoo.com>
Cc: David L Carlson <dcarlson at tamu.edu>; R help <r-help at r-project.org>
Sent: Saturday, December 8, 2012 5:20 PM
Subject: RE: [R] read. table()

Dear Arun,

The issue is that the column names are incorrect.  I will also look into the comment by Prof Ripley.

Thanks for your continued support and help.

Pradip

> str(read.delim(textConnection(xd1),header=TRUE,sep="\t"))
'data.frame':   195 obs. of  1 variable:
$ race....age...percent..sepercent..flag_var: Factor w/ 195 levels "            Cuban   26+  0.6653   0.31239      mrj",..: 27 148 13 140 108 193 169 100 85 67 ...
> names(agerace)
[1] "race....age...percent..sepercent..flag_var"
> head(agerace)
         race....age...percent..sepercent..flag_var
1          Mexican 12-17  5.7926   0.64195      any
2     Puerto Rican 12-17  5.1975   0.24929      any
3            Cuban 12-17  3.7977   1.00487      any
4     C-S American 12-17  4.3665   0.55329      any
5        Dominican 12-17  1.8149   0.46677      any
6  Spanish (Spain) 12-17  6.1971   0.98386      any
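
The single long column name is what one would expect when the header line contains no tab at all: with sep="\t" the whole line is one field, and the name is then made syntactically valid (check.names=TRUE is the default), which turns every space into a dot. A minimal illustration of that step, assuming the header arrived with spaces only; hdr, has_tab and mangled are made-up names.

```r
# Header line as it appears in the pasted example (spaces, no tabs)
hdr <- "race    age   percent  sepercent  flag_var"

# No tab present, so sep = "\t" cannot split the line
has_tab <- grepl("\t", hdr, fixed = TRUE)
has_tab

# make.names() replaces each invalid character (here: each space) with "."
mangled <- make.names(hdr)
mangled
```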

Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: Pradip.Muhuri at samhsa.hhs.gov



-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: Saturday, December 08, 2012 5:13 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: David L Carlson; R help
Subject: Re: [R] read. table()



Hi,

You can check the result with str().
I assume it will look like this:
str(read.delim(textConnection(Lines),header=TRUE,sep="\t"))
#'data.frame':    195 obs. of  1 variable:
# $ race....age...percent..sepercent..flag_var: Factor w/ 195 levels "    C-S American 12-17  0.2399   0.15804      coc",..: 50 170 20 5 35 185 65 155 110 80 ...

A.K.




----- Original Message -----
From: "Muhuri, Pradip (SAMHSA/CBHSQ)" <Pradip.Muhuri at samhsa.hhs.gov>
To: 'Prof Brian Ripley' <ripley at stats.ox.ac.uk>; "r-help at r-project.org" <r-help at r-project.org>
Cc:
Sent: Saturday, December 8, 2012 5:05 PM
Subject: Re: [R] read. table()

Dear Prof Ripley,

Your hint is helpful, and I see considerable improvements in the results.

The only issue is that the column names do not seem to be correct.  I did not understand the part of your comment that says "fortunes::fortune(14) applies", although I have read about the double-colon operator (ns-dblcolon {base}).

Could you please give me one more hint so that I can resolve the issue?
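
On the pkg::name notation: fortunes::fortune(14) calls the function fortune() from the contributed package fortunes without attaching the package via library(). A hedged sketch, assuming the package may or may not be installed on the machine (hence the guard); have_fortunes is a made-up name.

```r
# TRUE if the contributed package "fortunes" can be loaded, FALSE otherwise
have_fortunes <- requireNamespace("fortunes", quietly = TRUE)

if (have_fortunes) {
  # pkg::fun calls fun from pkg without library(pkg)
  print(fortunes::fortune(14))
} else {
  message("Package 'fortunes' is not installed; install.packages('fortunes') would fetch it from CRAN.")
}
```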

Thanks and regards,

######### new result ########
> agerace <- read.delim(textConnection(xd1), sep="\t",  header=TRUE, as.is=TRUE)
> names(agerace)
[1] "race....age...percent..sepercent..flag_var"
> head(agerace)
         race....age...percent..sepercent..flag_var
1          Mexican 12-17  5.7926   0.64195      any
2     Puerto Rican 12-17  5.1975   0.24929      any
3            Cuban 12-17  3.7977   1.00487      any
4     C-S American 12-17  4.3665   0.55329      any
5        Dominican 12-17  1.8149   0.46677      any
6  Spanish (Spain) 12-17  6.1971   0.98386      any




-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Prof Brian Ripley
Sent: Saturday, December 08, 2012 2:29 PM
To: r-help at r-project.org
Subject: Re: [R] read.table()

On 08/12/2012 19:10, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
>
> Hi List,
>
> I have spent more than 30 minutes, but failed to read in this file using the read.table() function. I could not figure out how to fix the following error.

Well, we have a whole manual on this, mentioned on ?read.table (see See
Also).  Have you read it?  fortunes::fortune(14) applies.

The issue is what the separator is.  You have specified whitespace, and
that is not correct.  The original might have had tabs (see ?read.delim)
but as pasted into this email only a human can disentangle this file.
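
The point about the separator can be checked with count.fields(), which reports how many whitespace-separated fields each line has. A small sketch using the header and three rows of the posted data; demo and n_fields are made-up names.

```r
# Header plus three rows from the posted example
demo <- c("race    age   percent  sepercent  flag_var",
          "         Mexican 12-17  5.7926   0.64195      any",
          "    Puerto Rican 12-17  5.1975   0.24929      any",
          " Spanish (Spain)   26+  2.5976   0.86230      any")

# sep = "" means "any run of whitespace", as in the default read.table() call
n_fields <- count.fields(textConnection(demo), sep = "")
n_fields
```

Rows whose race label contains a space yield six fields, while the header and single-word labels yield five, which is consistent with the "line 1 did not have 6 elements" error.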

> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :   line 1 did not have 6 elements
>
> Any help would be appreciated.
>
> Thanks,
>
> Pradip Muhuri
>
>
> ####### below is the  reproducible example
> xd1 <-  "race    age   percent  sepercent  flag_var
>           Mexican 12-17  5.7926   0.64195      any
>      Puerto Rican 12-17  5.1975   0.24929      any
>             Cuban 12-17  3.7977   1.00487      any
>      C-S American 12-17  4.3665   0.55329      any
>         Dominican 12-17  1.8149   0.46677      any
>   Spanish (Spain) 12-17  6.1971   0.98386      any
>    Multi Hisp Eth 12-17  6.7006   1.12464      any
>          NH White 12-17  4.8442   0.08660      any
>          NH Black 12-17  3.6943   0.16045      any
>          NH AM-AK 12-17  9.6325   1.06100      any
>         NH HI-OPI 12-17  3.9189   1.08047      any
>          NH Asian 12-17  1.9115   0.28432      any
>    NH Multiracial 12-17  6.4255   0.51434      any
>            Mexican 18-25  8.9284   0.73022      any
>       Puerto Rican 18-25  6.1364   0.28394      any
>              Cuban 18-25  8.6782   1.45543      any
>       C-S American 18-25  5.9360   0.59899      any
>          Dominican 18-25  7.7642   1.64553      any
>    Spanish (Spain) 18-25  9.2632   1.15652      any
>     Multi Hisp Eth 18-25 11.3566   1.79282      any
>           NH White 18-25  8.6484   0.11866      any
>           NH Black 18-25  7.5972   0.24926      any
>           NH AM-AK 18-25 13.5041   1.57275      any
>          NH HI-OPI 18-25  8.0227   1.41348      any
>           NH Asian 18-25  3.2701   0.32414      any
>     NH Multiracial 18-25 10.6489   0.85105      any
>            Mexican   26+  3.2110   0.51683      any
>       Puerto Rican   26+  1.6273   0.15033      any
>              Cuban   26+  1.4419   0.44118      any
>       C-S American   26+  1.0187   0.26594      any
>          Dominican   26+  0.9554   0.50275      any
>    Spanish (Spain)   26+  2.5976   0.86230      any
>     Multi Hisp Eth   26+  1.1345   0.66375      any
>           NH White   26+  1.5510   0.04156      any
>           NH Black   26+  2.8763   0.15133      any
>           NH AM-AK   26+  3.9674   0.76611      any
>          NH HI-OPI   26+  1.2919   0.66205      any
>           NH Asian   26+  0.7207   0.13870      any
>     NH Multiracial   26+  3.0668   0.52334      any
>            Mexican 12-17  4.3152   0.53235      mrj
>       Puerto Rican 12-17  3.7237   0.20969      mrj
>              Cuban 12-17  2.0616   0.67248      mrj
>       C-S American 12-17  3.3282   0.47392      mrj
>          Dominican 12-17  1.3797   0.40435      mrj
>    Spanish (Spain) 12-17  5.1810   0.93979      mrj
>     Multi Hisp Eth 12-17  4.8915   0.94816      mrj
>           NH White 12-17  3.6190   0.07379      mrj
>           NH Black 12-17  2.8196   0.14042      mrj
>           NH AM-AK 12-17  6.5091   0.85124      mrj
>          NH HI-OPI 12-17  3.6267   1.06724      mrj
>           NH Asian 12-17  1.3162   0.23575      mrj
>     NH Multiracial 12-17  5.0657   0.49614      mrj
>            Mexican 18-25  7.3802   0.67992      mrj
>       Puerto Rican 18-25  4.3260   0.24191      mrj
>              Cuban 18-25  6.1433   1.19242      mrj
>       C-S American 18-25  3.9166   0.51272      mrj
>          Dominican 18-25  5.8000   1.24097      mrj
>    Spanish (Spain) 18-25  6.8646   1.01387      mrj
>     Multi Hisp Eth 18-25 10.1134   1.75013      mrj
>           NH White 18-25  5.8656   0.10100      mrj
>           NH Black 18-25  6.6869   0.23643      mrj
>           NH AM-AK 18-25 11.2989   1.51687      mrj
>          NH HI-OPI 18-25  5.6302   1.14561      mrj
>           NH Asian 18-25  2.3418   0.28309      mrj
>     NH Multiracial 18-25  8.2696   0.77139      mrj
>            Mexican   26+  1.1658   0.33967      mrj
>       Puerto Rican   26+  0.6757   0.09329      mrj
>              Cuban   26+  0.6653   0.31239      mrj
>       C-S American   26+  0.3177   0.17604      mrj
>          Dominican   26+  0.5616   0.39780      mrj
>    Spanish (Spain)   26+  1.8078   0.82590      mrj
>     Multi Hisp Eth   26+  0.8468   0.63529      mrj
>           NH White   26+  0.6915   0.02791      mrj
>           NH Black   26+  1.5675   0.12031      mrj
>           NH AM-AK   26+  1.7273   0.37673      mrj
>          NH HI-OPI   26+  0.0356   0.03535      mrj
>           NH Asian   26+  0.2687   0.07564      mrj
>     NH Multiracial   26+  1.3419   0.30074      mrj
>            Mexican 12-17  1.2074   0.36082      anl
>       Puerto Rican 12-17  1.0772   0.11547      anl
>              Cuban 12-17  1.2569   0.67109      anl
>       C-S American 12-17  0.6213   0.22726      anl
>          Dominican 12-17  0.1412   0.08552      anl
>    Spanish (Spain) 12-17  0.9625   0.25453      anl
>     Multi Hisp Eth 12-17  1.2863   0.43909      anl
>           NH White 12-17  1.1490   0.04289      anl
>           NH Black 12-17  0.5932   0.06220      anl
>           NH AM-AK 12-17  1.9117   0.50122      anl
>          NH HI-OPI 12-17  0.3833   0.20240      anl
>           NH Asian 12-17  0.4782   0.14706      anl
>     NH Multiracial 12-17  1.5369   0.25321      anl
>            Mexican 18-25  1.1836   0.24209      anl
>       Puerto Rican 18-25  1.0337   0.11015      anl
>              Cuban 18-25  1.2738   0.45891      anl
>       C-S American 18-25  0.5598   0.15047      anl
>          Dominican 18-25  0.4720   0.31559      anl
>    Spanish (Spain) 18-25  1.7871   0.64048      anl
>     Multi Hisp Eth 18-25  1.2764   0.48779      anl
>           NH White 18-25  2.0818   0.05831      anl
>          NH Black 18-25  0.7851   0.07803      anl
>          NH AM-AK 18-25  1.8964   0.46240      anl
>         NH HI-OPI 18-25  1.9397   0.73301      anl
>          NH Asian 18-25  0.4858   0.13528      anl
>    NH Multiracial 18-25  1.7864   0.30651      anl
>           Mexican   26+  0.4014   0.08306      anl
>      Puerto Rican   26+  0.4536   0.07721      anl
>             Cuban   26+  0.2164   0.17096      anl
>      C-S American   26+  0.2233   0.09101      anl
>         Dominican   26+  0.0000   0.00000      anl
>   Spanish (Spain)   26+  1.1527   0.74125      anl
>    Multi Hisp Eth   26+  0.0303   0.03045      anl
>          NH White   26+  0.4970   0.02275      anl
>          NH Black   26+  0.3748   0.06124      anl
>          NH AM-AK   26+  1.4842   0.52284      anl
>         NH HI-OPI   26+  0.3898   0.34827      anl
>          NH Asian   26+  0.2536   0.07643      anl
>    NH Multiracial   26+  0.5120   0.18326      anl
>           Mexican 12-17  0.2453   0.15761      coc
>      Puerto Rican 12-17  0.4351   0.06999      coc
>             Cuban 12-17  0.2472   0.24698      coc
>      C-S American 12-17  0.2399   0.15804      coc
>         Dominican 12-17  0.0000   0.00000      coc
>   Spanish (Spain) 12-17  0.5315   0.30907      coc
>    Multi Hisp Eth 12-17  0.9797   0.53981      coc
>          NH White 12-17  0.3559   0.02305      coc
>          NH Black 12-17  0.0220   0.01235      coc
>          NH AM-AK 12-17  0.3588   0.23956      coc
>         NH HI-OPI 12-17  0.0000   0.00000      coc
>          NH Asian 12-17  0.1171   0.07887      coc
>    NH Multiracial 12-17  0.4702   0.14823      coc
>           Mexican 18-25  1.1540   0.26424      coc
>      Puerto Rican 18-25  1.3422   0.12707      coc
>             Cuban 18-25  1.6312   0.69363      coc
>      C-S American 18-25  0.8669   0.23394      coc
>         Dominican 18-25  0.6003   0.43959      coc
>   Spanish (Spain) 18-25  1.9886   0.59004      coc
>    Multi Hisp Eth 18-25  1.8588   0.86984      coc
>          NH White 18-25  1.3990   0.04700      coc
>          NH Black 18-25  0.3640   0.04961      coc
>          NH AM-AK 18-25  2.2718   0.77117      coc
>         NH HI-OPI 18-25  0.8386   0.47913      coc
>          NH Asian 18-25  0.1947   0.05994      coc
>    NH Multiracial 18-25  1.5209   0.30649      coc
>           Mexican   26+  1.4155   0.39542      coc
>      Puerto Rican   26+  0.5618   0.09323      coc
>             Cuban   26+  0.7766   0.31905      coc
>      C-S American   26+  0.3364   0.15414      coc
>         Dominican   26+  0.2632   0.26477      coc
>   Spanish (Spain)   26+  0.1596   0.07740      coc
>    Multi Hisp Eth   26+  0.2521   0.18020      coc
>          NH White   26+  0.3928   0.02073      coc
>          NH Black   26+  1.1867   0.09546      coc
>          NH AM-AK   26+  0.6865   0.24570      coc
>         NH HI-OPI   26+  0.5155   0.49176      coc
>          NH Asian   26+  0.0787   0.04558      coc
>    NH Multiracial   26+  1.1320   0.37928      coc
>         Mexican 12-17  0.6556   0.23195      inh
>      Puerto Rican 12-17  0.6060   0.08943      inh
>             Cuban 12-17  0.4765   0.36661      inh
>      C-S American 12-17  0.3629   0.12994      inh
>         Dominican 12-17  0.0300   0.03006      inh
>   Spanish (Spain) 12-17  0.2020   0.11445      inh
>    Multi Hisp Eth 12-17  0.7095   0.32063      inh
>          NH White 12-17  0.4161   0.02587      inh
>          NH Black 12-17  0.2608   0.04218      inh
>          NH AM-AK 12-17  1.3372   0.40763      inh
>         NH HI-OPI 12-17  0.1116   0.06566      inh
>          NH Asian 12-17  0.1580   0.08034      inh
>    NH Multiracial 12-17  0.5472   0.13080      inh
>           Mexican 18-25  0.0160   0.01601      inh
>      Puerto Rican 18-25  0.2163   0.06270      inh
>             Cuban 18-25  0.3252   0.32468      inh
>      C-S American 18-25  0.2238   0.12254      inh
>         Dominican 18-25  0.9445   0.94734      inh
>   Spanish (Spain) 18-25  0.0443   0.03141      inh
>    Multi Hisp Eth 18-25  0.6523   0.57082      inh
>          NH White 18-25  0.1016   0.01257      inh
>          NH Black 18-25  0.0617   0.02371      inh
>          NH AM-AK 18-25  0.2387   0.14246      inh
>         NH HI-OPI 18-25  0.0000   0.00000      inh
>          NH Asian 18-25  0.1894   0.06962      inh
>    NH Multiracial 18-25  0.0562   0.03261      inh
>           Mexican   26+  0.0160   0.01600      inh
>      Puerto Rican   26+  0.0185   0.01276      inh
>             Cuban   26+  0.0000   0.00000      inh
>      C-S American   26+  0.0696   0.06954      inh
>         Dominican   26+  0.0000   0.00000      inh
>   Spanish (Spain)   26+  0.1571   0.11467      inh
>    Multi Hisp Eth   26+  0.0000   0.00000      inh
>          NH White   26+  0.0174   0.00456      inh
>          NH Black   26+  0.0131   0.00757      inh
>          NH AM-AK   26+  0.2587   0.24381      inh
>         NH HI-OPI   26+  0.0000   0.00000      inh
>          NH Asian   26+  0.0607   0.03372      inh
>    NH Multiracial   26+  0.0433   0.02960      inh"
>
> agerace <- read.table(textConnection(xd1), header=TRUE, as.is=TRUE)
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
