[R] Help with parsing a data file

Henrique Dallazuanna wwwhsd at gmail.com
Thu Mar 6 20:49:24 CET 2008


Try this:

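lines <- readLines('yourfile')
# Line 3 holds the column names
coln <- scan(text = lines[3], what = "", quiet = TRUE)
# Skip the 3 header lines and the 13-line summary block that follows
newLines <- lines[-(1:(3 + 13))]
# Year lines contain only a 4-digit number, possibly padded with blanks
yearIdx <- grep("^[[:space:]]*[0-9]{4}[[:space:]]*$", newLines)
blocks <- lapply(yearIdx, function(x)
    read.table(text = newLines[seq(x + 1, x + 13)], col.names = coln))
# Name each block by its year, so that blocks[["1991"]] works
names(blocks) <- trimws(newLines[yearIdx])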
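
If the number of rows per year is not always 13, one possible variant is to group the lines by the year marker that precedes them rather than counting rows. The sketch below is only an illustration: it builds a small synthetic stand-in for the file (made-up values, only 3 of the columns), and the names rows, isYear, grp and byYear are hypothetical helpers, not anything from the original data. It assumes each record sits on a single line in the real file.

```r
# Synthetic stand-in for the file: made-up values, only 3 of the columns
rows <- function(base) sprintf("%3d %5d K5", 1:13, base + 1:13)
lines <- c("000000 EXAMPLE STATION", "  1991-1992", "  MO AVGLO FL",
           rows(1000L), "  1991", rows(2000L), "  1992", rows(3000L))

coln <- scan(text = lines[3], what = "", quiet = TRUE)
body <- lines[-(1:3)]                  # drop only the 3 header lines

# TRUE on lines that hold nothing but a 4-digit year
isYear <- grepl("^[[:space:]]*[0-9]{4}[[:space:]]*$", body)
grp <- cumsum(isYear)                  # 0 = leading summary block, 1.. = year blocks

# Keep the data lines that follow a year marker, grouped by that marker
keep <- grp > 0 & !isYear
byYear <- split(body[keep], trimws(body[isYear])[grp[keep]])
blocks <- lapply(byYear, function(x) read.table(text = x, col.names = coln))

# Look up one year's block by name
head(blocks[["1992"]])
```

Because the grouping comes from cumsum() over the year markers, the leading summary block (group 0) is dropped without hard-coding its length.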


On 06/03/2008, sean <smachin1000 at gmail.com> wrote:
> Hi All,
>
>  I need to parse data from a file, example shown below.  The first two lines
>  can be skipped, the third line contains the column names.  The next 13 lines
>  can be skipped.  The next line "1991" is a year value, with the following 13
>  values data for that year.  The file then repeats this format with (year, 13
>  lines of data for that year).  I would ideally like to end up with an
>  array/list/vector of the block of 13 values, indexed by year, each block
>  using the column names given on the third line.
>
>  If anyone has any good ideas on how to do this in R, pls. let me know.
>
>  Thanks,
>  Sean
>
>  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  725280 BUFFALO NIAGARA INTL A NY  -5  N42 56  W078 44   215   988
>   1991-2005
>   MO AVGLO FL SDGLO AVDIR FL SDDIR AVDIF FL SDDIF AVETR AETRN  TOT  OPQ
>  H2O   TAU  MAX_T  MIN_T  AVG_T  AVGDT  RH  HTDD  CLDD AVWS
>   1  1336 K5   222  1534 K7   676   837 K5    72  3806 13256  8.4  8.1  0.83
>  0.09  -0.52  -7.40  -3.86  -3.36  75   691     0  5.5
>   2  2261 K5   400  2691 K7  1026  1129 K5    74  5330 14714  7.6  7.1  0.79
>  0.10   0.97  -6.67  -2.76  -1.85  73   599     0  5.1
>   3  3249 K5   413  3207 K7   852  1578 K5   118  7428 16443  7.2  6.7  0.98
>  0.12   4.96  -3.21   0.93   2.04  71   541     0  4.9
>   4  4460 K5   570  4051 K6  1045  1951 K5   130  9509 18140  6.6  6.1  1.33
>  0.13  12.18   2.68   7.39   8.54  67   328     1  4.7
>   5  5484 K5   518  4529 K6   801  2408 K5   142 10999 19523  6.1  5.4  1.87
>  0.15  18.77   8.68  13.83  15.07  69   154    12  4.5
>   6  6046 K5   383  5011 K6   671  2567 K5   166 11616 20177  5.7  4.9  2.64
>  0.16  24.05  14.52  19.47  20.66  71    34    63  4.1
>   7  5793 K5   529  4734 K6   884  2537 K5   127 11250 19734  5.6  4.9  2.97
>  0.16  26.10  16.92  21.70  22.90  71     5   104  4.1
>   8  5057 K5   417  4390 K6   693  2245 K5    94  9974 18430  5.6  4.9  2.92
>  0.15  25.61  16.37  21.10  22.56  73     9    91  3.6
>   9  4001 K5   458  3864 K6   826  1797 K5   105  8078 16803  5.6  5.0  2.36
>  0.13  21.73  12.20  17.06  18.68  73    71    30  3.9
>   10  2502 K5   254  2564 K7   584  1306 K5    88  5948 15098  6.3  5.8  1.67
>  0.11  14.89   6.40  10.71  12.16  72   241     3  4.3
>   11  1395 K5   198  1394 K7   492   887 K5    47  4170 13545  7.9  7.5  1.30
>  0.10   8.37   1.42   4.91   5.82  73   403     0  5.0
>   12  1120 K5   173  1391 K7   475   701 K5    52  3351 12733  7.9  7.7  0.94
>  0.09   2.41  -3.92  -0.70  -0.03  75   592     0  5.0
>   13  3559 K5   201  3280 K7   383  1662 K5    50  7622 16550  6.7  6.2  1.72
>  0.12  13.29   4.83   9.15  10.27  72  3668   304  4.5
>   1991
>   1  1313 I5   637  1374 I6  1636   832 I5   169  3800 13249  8.2  7.8  0.75
>  0.07  -0.09  -6.67  -3.46  -2.94  73   673     0  5.9
>   2  1875 I5   887  1767 I6  2080  1137 I5   263  5310 14694  8.3  7.6  0.85
>  0.08   2.44  -3.84  -0.61   0.15  73   533     0  5.9
>   3  3205 I5  1520  3371 I6  3133  1458 I5   392  7395 16417  6.7  6.1  1.12
>  0.10   7.23  -1.17   2.75   3.75  70   474     0  5.3
>   4  3999 I5  1911  3451 I6  3501  1918 I5   521  9482 18116  6.9  5.9  1.60
>  0.12  14.46   5.65   9.91  11.04  68   250     2  5.4
>   5  5968 I5  1854  5369 I6  2936  2296 I5   437 10983 19506  6.1  4.5  2.46
>  0.14  23.15  12.56  17.85  19.12  68    81    66  4.8
>   6  6988 I5  1577  6761 I6  2983  2288 I5   604 11614 20176  4.8  3.0  2.42
>  0.15  26.09  14.95  20.80  22.28  64    14    80  4.3
>   7  6364 I5  1538  5779 I6  2799  2404 I5   568 11262 19749  5.0  3.7  2.89
>  0.16  27.17  16.96  22.43  23.77  66     1   116  4.4
>   8  5407 I5  1478  5114 I6  2693  2106 I5   527  9999 18451  4.8  4.0  2.91
>  0.18  26.64  16.49  21.44  23.08  73     2   102  4.2
>   9  4482 I5  1010  4126 I6  1953  2033 I5   415  8109 16830  5.8  4.6  2.24
>  0.19  22.05  10.98  16.67  18.53  66    97    42  4.3
>   10  2534 I5   864  2419 I6  1859  1396 I5   289  5978 15123  6.1  5.3  1.83
>  0.20  16.44   6.92  11.72  13.21  72   213     7  4.4
>   11  1264 I5   716  1059 I6  1733   851 I5   206  4190 13565  8.3  8.0  1.33
>  0.21   7.63   0.44   3.94   4.82  77   429     0  5.1
>   12   976 I5   423   826 I6  1172   714 I5   156  3354 12738  7.6  7.2  0.98
>  0.22   3.40  -4.20  -0.34   0.21  78   581     0  5.6
>   13  3698 I5  2146  3451 I6  2002  1619 I5   629  7623 16551  6.6  5.6  1.78
>  0.15  14.72   5.76  10.26  11.42  71  3347   415  5.0
>   1992
>   1  1149 I5   496   701 I6   919   896 I5   231  3791 13236  8.5  8.1  0.84
>  0.24   0.68  -6.20  -2.60  -1.69  79   654     0  5.4
>   2  1580 I5   708   898 I6  1469  1198 I5   255  5328 14708  8.2  7.7  0.86
>  0.27   1.26  -6.19  -2.40  -1.56  78   603     0  4.7
>   3  2968 I5  1429  2145 I6  2037  1760 I5   452  7449 16457  7.3  6.7  0.97
>  0.29   3.82  -4.35  -0.11   1.01  70   577     0  4.8
>   4  4050 I5  1812  2937 I6  2634  2146 I5   404  9527 18154  7.3  6.4  1.41
>  0.29  10.64   2.33   6.40   7.50  71   356     1  4.1
>   5  5654 I5  1935  4311 I6  2843  2695 I5   557 11009 19528  5.4  4.3  1.74
>  0.29  19.79   8.17  14.13  15.66  66   148    13  3.8
>   6  6170 I5  2120  4608 I6  3029  2877 I5   695 11617 20176  5.3  4.1  2.17
>  0.28  22.76  11.91  17.63  19.06  65    56    26  4.0
>   7  4879 I5  1816  2795 I6  1915  2835 I5   595 11242 19729  7.2  6.3  2.99
>  0.27  23.05  15.33  19.23  20.19  75    18    44  4.4
>   8  5168 I5  1720  4473 I6  2922  2256 I5   444  9959 18415  5.8  4.9  2.64
>  0.24  23.28  14.62  19.05  20.37  72    26    46  4.4
>   9  4094 I5  1361  3741 I6  2382  1893 I5   375  8058 16789  5.8  4.5  2.50
>  0.21  21.25  11.59  16.56  18.13  72    84    27  4.5
>   10  2499 I5  1177  2228 I6  1904  1393 I5   311  5928 15081  6.1  5.6  1.45
>  0.18  13.37   4.21   8.81  10.50  70   296     0  4.7
>   11  1134 I5   680   731 I6  1287   849 I5   249  4156 13533  8.5  8.1  1.39
>  0.15   7.58   1.22   4.38   5.30  77   418     0  4.8
>   12  1048 I5   508  1136 I6  1428   687 I5   146  3348 12729  7.8  7.3  0.96
>  0.14   3.30  -3.43  -0.06   0.83  69   570     0  5.1
>   13  3366 I5  1883  2559 I6  1489  1790 I5   790  7618 16545  6.9  6.1  1.66
>  0.24  12.56   4.10   8.42   9.61  72  3806   157  4.6
>  ...


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


