[R] Splitting strings in data files R
Mark Sharp
msharp at txbiomed.org
Wed Jan 20 22:31:59 CET 2016
Looks like homework.
R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msharp at TxBiomed.org
> On Jan 20, 2016, at 2:53 PM, R. Help <r-help at r-project.org> wrote:
>
> Please, I need help processing files with strings in R. The files follow two patterns, which should be handled separately:
> Pattern 1 (see file1 below): Delete lines 1, 2 and 4 in file1. Line 3 contains the column names. Then find any character flags and delete them, but do not delete the values themselves (e.g. delete the T in 0.21T and keep 0.21). Also find -999.99M and -999.99 and replace them with NA.
>
> File1 output format should be: Year Month Day_1 Day_2 ... Day_31 ## so all months should have 31 days. Months with <31 days should have NA where appropriate (e.g. Feb 30=NA, 31=NA)
>
> Pattern 2 (see file2 below): Delete line 1 in file2. Then find any character flags and delete them, but do not delete the values themselves (e.g. delete the T in 0.21T and keep 0.21). Also find -999.99M and -999.99 and replace them with NA. File2 has no column names. Please do not include any.
> File2 output format: Year Month Day_1 Day_2 ... Day_31, but with no column names
>
> Here is a simple reproducible example for both files/cases:
>
>
> file1=list(df1,df1)df1=list(structure(list(X7011982.....DONNACONA........QC..station.joined......Homogenized.daily.maximum.temperature..........Deg.Celcius...........Updated.to.December.2014 =structure(c(20L,19L,21L,1L,2L,3L,4L,5L,6L,7L,8L,9L,10L,11L,12L,13L,14L,15L,16L,17L,18L),.Label =c(" 1918 7 -9999.9M-9999.9M-9999.9M-9999.9M-9999.9M-9999.9M-9999.9M-9999.9M-9999.9M-9999.9M-9999.9M-9999.9M-9999.9M-9999.9M-9999.9M-9999.9M 23.6a 25.9a 25.8a 24.9a 24.9a 29.6a 27.4a 24.5a 28.5a 28.5a 30.1a 25.3a 28.5a 19.6a 24.1a"," 1918 8 23.7a 18.6a 17.6a 19.0a 23.7a 24.7a 18.6a 22.6a 20.1a 21.4a 22.6a 24.9a 24.1a 23.2a 22.0a 17.6a 19.0a 19.0a 23.7a 24.1a 24.9a 27.9a 26.2a 22.6a 24.0a 25.4a 21.4a 24.4a 19.0a 22.6a 23.7a"," 1918 9 22.0a 22.0a 24.0a 19.0a 14.4a 11.2a 17.1a 18.1a 19.0a 12.0a 13.5a 9.6a 11.2a 10.7a 18.1a 18.1a 16.3a 14.4a 14.3a 15.9a 10.1a 9.8a 11.3a 11.4a 13.6a 14.4a 9.3a 9.6a 9.2a 8.4a-9999!
> .9M"," 1918 10 9.3a 9.5a 11.3a 10.2a 9.9a-9999.9M 4.6a-9999.9M 9.8a 13.6a 17.0a 15.2a 15.1a 15.9a 8.1a 9.3a 8.8a 6.0a 8.7a 9.8a 9.8a 10.7a 11.3a 9.5a 10.7a-9999.9M 10.7a 16.9a 17.1a 10.7a 12.7a"," 1918 11 8.8a-9999.9M 4.0a 3.4a 4.0a 6.6a 4.0a 7.3a 8.1a 7.3a 2.5a 3.4a 7.7a 6.1a 2.2a 4.0a 4.6a 2.5a 2.2a 1.6a 2.2a 3.0a -3.4a 2.5a 1.6a -3.4a 2.1a 0.0a 2.6a 0.6a-9999.9M"," 1918 12 -10.1a -8.3a -6.3a -5.5a -5.1a -7.2a-9999.9M -3.4a -2.2a -5.5a -6.0a -3.4a 0.6a 3.0a 4.7a 0.6a -5.0a -6.4a -5.9a -2.2a 1.2a 4.0a 5.3a-9999.9M -2.2a -5.5a -7.5a -9.6a -7.3a -6.6a-9999.9M"," 1919 1 2.5a 0.0a -7.3a -6.7a -6.6a -9.2a -5.9a -0.7a -2.9a -13.2a -8.0a -17.1a -7.4a -4.0a -5.5a 0.6a -7.1a -5.5a -2.2a -7.6a -7.0a -3.4a -2.2a -6.7a -8.0a -2.9a -1.5a -5.9a -5.5a -5.8a -3.4a"," 1919 2 -9999.9M 0.0a -4.0a -3.4a -1.5a -2.1a -3!
> .4a -7.2a -2.8a -5.5a -6.7a -5.1a -2.1a -2.1a 1.2a -2.1a -5
> .9a -2.8a -4.5a -4.5a -3.4a 2.1a 0.0a 0.0a 1.2a -2.1a -8.3a -6.6a-9999.9M-9999.9M-9999.9M"," 1919 3 -9999.9M 0.0a 1.6a 1.7a -5.1a -6.7a -5.1a -3.4a -2.0a 1.2a 3.4a 1.2a -8.3a -7.9a -3.4a -2.0a 1.2a 3.4a 6.6a 1.2a 6.6a 1.2a 6.6a 6.6a 3.4a 10.7a 6.6a 6.6a 0.6a 0.6a -6.8a"," 1919 4 -5.9a -3.4a 2.1a 3.4a 3.0a 3.0a 8.1a 8.1a 6.0a 2.1a 6.6a 8.5a 6.0a 1.7a 4.7a 2.4a 2.1a 8.5a 1.2a 1.7a 9.6a 8.6a 12.8a 9.5a -2.8a 9.8a 4.7a 10.7a 6.6a 11.2a-9999.9M"," 1919 5 16.4a 8.5a 9.4a 6.0a 10.7a 9.8a 8.5a 13.6a 14.4a-9999.9M 16.4a 19.0a 23.2a 16.9a 17.0a 19.6a 11.3a 9.4a 12.1a 17.2a 15.2a 17.0a 15.2a 17.5a 10.2a 22.6a 14.5a 22.0a 24.9a 23.8a 19.0a"," 1919 6 17.7a 25.4a 31.2a 25.3a 26.8a 22.0a 15.8a 19.0a 12.7a 19.6a 19.0a 24.5a 25.1a 27.4a 26.8a 19.0a 20.8a 26.8a 27.9a 25.8a 20.1a 17.7a 19.0a 32.4a 30.7a 22.6a !
> 19.0a 13.6a 17.5a 24.1a-9999.9M"," 1919 7 23.7a 24.4a 27.9a 29.6a 23.7a 21.3a 23.7a 20.1a 23.7a 21.3a 17.0a 17.8a 23.7a 27.4a 18.2a 23.2a 24.5a 26.2a 25.8a 27.9a 29.0a 25.3a 25.1a 23.9a 22.6a 23.9a 20.8a 25.8a 20.1a 23.2a 23.7a"," 1919 8 20.8a 18.2a 20.1a 20.1a 25.1a 20.8a 24.6a 18.5a 17.6a 22.0a 24.0a 23.2a 24.0a 24.0a 20.8a 24.0a 23.7a 23.8a 17.1a 23.8a 24.6a 23.8a 19.6a 24.0a 24.0a 16.9a 18.2a 18.6a 18.6a 23.2a 20.8a"," 1919 9 24.0a 21.3a 24.4a 18.1a 19.0a 19.0a 17.7a 11.4a 10.7a 12.7a 15.2a 15.2a 18.6a 12.7a 15.2a 10.1a 12.0a 12.7a 19.6a 18.5a 28.5a 28.5a 10.7a 14.5a 15.8a 11.3a 11.3a 20.8a 23.2a 11.3a-9999.9M"," 1919 10 11.3a 8.2a 8.2a 16.4a 10.7a 17.5a 7.7a 6.0a 11.3a 7.3a 12.1a 7.7a 10.2a 15.9a 18.2a 9.0a 10.7a 9.8a 8.2a 7.3a 7.7a 8.9a 9.5a 12.1a 10.2a 10.2a 4.0a 10.7a 2.9a 5.3a 3.0a"," 1919 11 8.2a 2.2a 1.2a !
> 2.6a 1.7a 2.6a 6.1a 8.2a 7.7a 5.3a 4.7a 8.9a 4.7a
> 1.7a -4.0a 2.2a 7.7a 7.7a 0.6a -4.5a 2.6a 3.4a 2.5a -3.4a -5.9a -5.1a -5.5a -6.4a 8.9a 3.0a-9999.9M"," 1919 12 -4.0a -9.2a -10.5a -5.1a -4.5a -6.9a -4.0a -4.0a 3.0a 2.2a -9.2a -3.4a 5.3a -6.4a -6.9a -20.4a -20.4a -17.6a -10.5a -13.8a -8.7a -3.4a -2.9a -4.5a -5.5a -5.5a -2.9a -0.8a -10.1a -6.9a -5.9a"," Year Mo Day 01 Day 02 Day 03 Day 04 Day 05 Day 06 Day 07 Day 08 Day 09 Day 10 Day 11 Day 12 Day 13 Day 14 Day 15 Day 16 Day 17 Day 18 Day 19 Day 20 Day 21 Day 22 Day 23 Day 24 Day 25 Day 26 Day 27 Day 28 Day 29 Day 30 Day 31","7011982, DONNACONA , QC, station jointe , Temperature quotidienne maximale homogeneisee, Deg Celcius, Mise a jour jusqu a decembre 2014","Annee Mo Jour 01 Jour 02 Jour 03 Jour 04 Jour 05 Jour 06 Jour 07 Jour 08 Jour 09 Jour 10 Jour 11 Jour 12 Jour 13 Jour 14 Jour 15 Jour 16 Jour 17 Jour 18 Jour 19 Jour 20 Jour 21 Jour 22 Jour 23 Jour 24 Jour 25 Jour 26 Jour !
> 27 Jour 28 Jour 29 Jour 30 Jour 31"),class ="factor")),.Names ="X7011982.....DONNACONA........QC..station.joined......Homogenized.daily.maximum.temperature..........Deg.Celcius...........Updated.to.December.2014",class ="data.frame",row.names =c(NA,-21L)))
>
> file2=list(df2,df2)df2=list(structure(list(X250M001.MOULD.BAY.................NT.station.joined.....Daily.adjusted.precipitation..mm..Updated.to.December.2014 =structure(1:24,.Label =c("1948 1 -9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M","1948 2 -9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M","1948 3 -9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99!
> M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M","1948 4 -9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M","1948 5 -9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M-9999.99M 0.00 0.00 0.21T 0.21T 1.69 0.21T 0.21T 0.21T 0.21T 0.00 0.21T 0.00 0.00 0.00 0.00 1.39 0.00 0.21T","1948 6 0.00 0.00 0.30T 3.34T 0.21T 0.00 0.00 7.19T 0.21T 1.04 0.00 4.29 1.69 0.21T 0.00 0.00 0.21T 0.65 0.21T 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.21T 0.00 0.21T-9999.99M","1948 7 0.21T 0.51T 0.00 2.74 0.00 0.00 0.00 0.00 0.00 1.05 0.00 !
> 0.00 1.57 1.57 2.30 0.74T 0.00 0.30T 0.74 0.53
> 0.00 0.21T 0.00 0.00 0.00 0.00 0.00 0.53 3.34 2.30 13.43 ","1948 8 0.30T 2.61 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.53 0.21T 0.65 3.65 3.25 0.21T 3.90 0.21T 0.21T 0.21T 0.30T 0.21T 0.21T 0.21T 0.00 0.21T 0.65 1.95 0.21T 0.21T","1948 9 0.00 0.21T 0.21T 0.21T 0.21T 0.69T 7.54 0.00 0.00 0.00 0.21T 0.21T 0.00 0.21T 0.21T 0.21T 0.00 0.21T 1.04 0.00 0.00 0.00 0.00 7.28 4.68 2.34 1.95 3.90 1.30 0.21T-9999.99M","1948 10 1.04 0.00 0.00 1.69 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.65 0.21T 0.21T 0.00 0.21T 0.21T 0.21T 0.00 0.21T 0.21T 0.21T 0.21T 0.21T 0.21T","1948 11 0.00 0.00 0.00 0.21T 0.21T 0.00 0.21T 1.04 0.21T 0.00 0.00 1.69 0.21T 0.21T !
> 0.00 0.00 0.21T 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -9999.99M","1948 12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.21T 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.21T 0.21T 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ","1949 1 0.00 0.00 0.00 0.00 0.00 0.39 0.65 0.00 0.21T 0.21T 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.21T 0.21T 0.65 0.39 0.21T 0.00 0.00 0.00 0.21T 0.00 0.00 ","1949 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.21T 0.00 0.00 0.00 0.00 0.00 0.00 0.21T 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.21T 0.21T 0.00 0.00 0.00 -9999.99M-9999.99M-9999.99M","1949 3 0.00 0.00 0.00 0.00 0.00 0.00 0.21T 0.!
> 21T 0.00 1.69 0.00 1.04 1.69 0.65 0.21T 0.51T
> 0.21T 0.21T 0.21T 0.21T 0.21T 0.00 0.00 0.21T 0.00 0.00 0.00 0.00 0.21T 0.21T 0.00 ","1949 4 0.00 0.00 0.39 0.21T 0.00 0.00 0.39 0.21T 0.00 0.00 0.00 0.21T 0.00 0.21T 0.00 0.00 0.00 0.21T 0.21T 0.00 0.00 0.65 0.21T 0.21T 0.00 0.00 0.00 0.00 0.00 0.00 -9999.99M","1949 5 0.00 0.00 0.00 0.00 0.00 0.21T 0.39 0.21T 0.00 0.00 0.21T 0.00 0.00 0.00 0.00 0.00 0.21T 0.39 0.39 0.39 0.00 0.00 0.21T 0.21T 0.21T 0.39 0.21T 0.39 0.39 0.21T 1.04 ","1949 6 0.39 0.21T 0.00 0.21T 0.00 0.21T 0.21T 0.65 0.00 0.00 0.00 0.21T 0.21T 0.00 0.00 0.51T 0.21T 0.00 0.39 0.21T 0.00 0.21T 0.21T 0.39 0.39 0.21T 0.00 0.00 0.00 0.00 -9999.99M","1949 7 0.00 0.00 0.53 0.51T 0.51T 0.21T 0.51T 0.30T 0.00 0.00 !
> 0.00 0.00 0.00 0.00 0.30T 0.30T 0.00 0.00 0.00 0.00 0.51T 0.51T 6.25 22.63 0.00 0.51T 0.21T 0.30T 0.30T 0.00 0.00 ","1949 8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.51T 0.51T 0.00 0.30T 0.30T 0.00 0.00 6.56 0.00 0.00 0.00 0.00 1.05 0.21T 0.21T 0.21T 0.00 0.21T 0.21T 0.51T 0.00 0.30T 0.21T 0.21T","1949 9 0.30T 0.30T 0.00 0.39 0.39 0.21T 0.00 0.00 0.21T 0.21T 0.00 0.00 0.00 0.21T 0.00 0.21T 0.00 0.00 0.00 0.00 0.00 0.21T 0.00 0.00 0.00 0.00 0.00 0.21T 0.21T 0.21T-9999.99M","1949 10 0.21T 0.00 0.00 0.00 1.04 0.39 0.65 0.21T 0.00 0.00 0.21T 0.21T 0.21T 0.39 0.21T 0.65 0.65 0.21T 0.65 0.00 0.00 0.21T 0.00 0.21T 0.00 0.21T 0.21T 0.00 0.00 0.00 0.00 ","1949 11 0.00 0.00 0.00 0.00 !
> 1.04 0.21T 0.00 0.00 0.21T 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.21T-9999.99M","1949 12 0.21T 0.21T 0.21T 0.00 0.00 0.00 0.00 0.00 0.21T 0.21T 0.00 0.00 0.21T 0.39 0.00 0.00 0.00 0.00 0.21T 0.21T 0.21T 0.21T 0.21T 0.21T 0.00 0.00 0.00 0.00 0.21T 0.00 0.00 "),class ="factor")),.Names ="X250M001.MOULD.BAY.................NT.station.joined.....Daily.adjusted.precipitation..mm..Updated.to.December.2014",class ="data.frame",row.names =c(NA,-24L)))
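
For anyone reading this thread in the archive, the two patterns described in the quoted message could be approached roughly as sketched below. This is only a sketch under stated assumptions, not a tested solution for the real files. The helper names parse_line and read_station are hypothetical. The sketch assumes that every data line starts with a four-digit year and a month, that daily values always contain a decimal point, and that anything at or below -999 is a missing-value code (the sample data show -9999.9M and -9999.99M rather than the -999.99M mentioned in the question). It uses a regular expression instead of a fixed-width reader such as read.fwf, which may be the more robust choice if the original column widths are known.

```r
# Parse one data line such as "1948 5 -9999.99M 0.00 0.21T 1.69".
# Hypothetical helper; the name and the -999 threshold are assumptions.
parse_line <- function(line) {
  line <- trimws(line)
  # Year and month are the first two whitespace-separated fields.
  ym <- as.integer(strsplit(line, "[[:space:]]+")[[1]][1:2])
  # Daily values are the decimal numbers. Flags such as "a", "T" or "M" are
  # not matched, and values glued together ("0.21T-9999.99M") still split.
  vals <- as.numeric(regmatches(line, gregexpr("-?[0-9]+\\.[0-9]+", line))[[1]])
  # Assumption: anything at or below -999 is a missing-value code.
  vals[vals <= -999] <- NA
  # Pad with NA (or truncate) so that every month has exactly 31 day columns.
  length(vals) <- 31
  c(ym, vals)
}

# Keep only the data lines, so header lines are dropped without counting them.
# Hypothetical helper; col_names = FALSE leaves the default V1, V2, ... names.
read_station <- function(lines, col_names = TRUE) {
  is_data <- grepl("^[[:space:]]*[0-9]{4}[[:space:]]+[0-9]{1,2}[[:space:]]", lines)
  out <- as.data.frame(do.call(rbind, lapply(lines[is_data], parse_line)))
  if (col_names) names(out) <- c("Year", "Month", paste0("Day_", 1:31))
  out
}

# Shortened, made-up lines in the style of file2 (not the real data).
example_lines <- c(
  "250M001 MOULD BAY , NT station joined",
  "1948 5 -9999.99M-9999.99M 0.00 0.21T 1.69",
  "1948 9 0.00 0.21T 1.30 0.21T-9999.99M"
)
res <- read_station(example_lines)
print(res[, 1:7])
```

For a real file, the lines could come from readLines(), and for the second pattern the result could be written with write.table(..., col.names = FALSE) so that no column names appear in the output.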