[R] How to read a file with two data sets in text format

jim holtman jholtman at gmail.com
Mon Jan 21 14:31:14 CET 2013


Here is one way to read the data.  I changed the row counts in your
sample headers (to 40 and 81) so they match the number of data lines
actually shown:


x <- readLines(textConnection("40 Terry Cove-Model
300 .300110459327698
300.041656494141 .289277672767639
300.083343505859 .276237487792969
300.125 .258902788162231
300.166656494141 .236579895019531
300.208343505859 .221315026283264
300.25 .214318037033081
300.291656494141 .190926909446716
300.333343505859 .158144593238831
300.375 .113302707672119
300.416656494141 .103684902191162
300.458343505859 9.72903966903687E-02
300.5 8.76948833465576E-02
300.541656494141 8.42459201812744E-02
300.583343505859 .078397274017334
300.625 8.44632387161255E-02
300.666656494141 9.32939052581787E-02
300.708343505859 .113663911819458
300.75 .123064398765564
300.791656494141 .157548069953918
300.833343505859 .148393034934998
300.875 .135645747184753
300.916656494141 .137590646743774
300.958343505859 .133154153823853
301 .131152510643005
301.041656494141 .114152908325195
301.083343505859 8.04083347320557E-02
301.125 5.53587675094604E-02
301.166656494141 3.17397117614746E-02
301.208343505859 4.07266616821289E-03
301.25 -2.15455293655396E-02
301.291656494141 -4.07489538192749E-02
301.333343505859 -5.85414171218872E-02
301.375 -7.53517150878906E-02
301.416656494141 -8.49723815917969E-02
301.458343505859 -7.91778564453125E-02
301.5 -7.02846050262451E-02
301.541656494141 -7.24701881408691E-02
301.583343505859 -7.76907205581665E-02
301.625 -6.82642459869385E-02
81 Terry Cove-Data
300 .216407993
300.0042 .204216005
300.0083 .210311999
300.0125 .195071996
300.0167 .192023999
300.0208 .179831992
300.025 .188976001
300.0292 .185928004
300.0333 .195071996
300.0375 .219456009
300.0417 .210311999
300.0458 .204216005
300.05 .195071996
300.0542 .188976001
300.0583 .195071996
300.0625 .195071996
300.0667 .185928004
300.0708 .173735998
300.075 .170688001
300.0792 .167640004
300.0833 .167640004
300.0875 .167640004
300.0917 .167640004
300.0958 .161543991
300.1 .1524
300.1042 .158495994
300.1083 .149352003
300.1125 .158495994
300.1167 .1524
300.1208 .1524
300.125 .149352003
300.1292 .143256
300.1333 .146303997
300.1375 .149352003
300.1417 .146303997
300.1458 .137159996
300.15 .131064002
300.1542 .124967999
300.1583 .128015996
300.1625 .124967999
300.1667 .131064002
300.1708 .124967999
300.175 .124967999
300.1792 .134111999
300.1833 .118871996
300.1875 .128015996
300.1917 .131064002
300.1958 .128015996
300.2 .131064002
300.2042 .128015996
300.2083 .121920002
300.2125 .115823999
300.2167 .112776001
300.2208 .103632001
300.225 .097535998
300.2292 .103632001
300.2333 .094488001
300.2375 .082296003
300.2417 .0762
300.2458 .079247997
300.25 .067056
300.2542 .064007998
300.2583 .045720002
300.2625 .033528
300.2667 .036575999
300.2708 .036575999
300.275 .036575999
300.2792 .027432001
300.2833 .027432001
300.2875 .021336
300.2917 .012192
300.2958 .009144
300.3 .009144
300.3042 .003048
300.3083 0
300.3125 -.003048
300.3167 -.006096
300.3208 0
300.325 .006096
300.3292 -.003048
300.3333 .006096"))
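# find the header lines (a row count followed by a series name);
# allow leading blanks, as in the second header of the original sample
indx <- grep("^\\s*[0-9]+\\s+[[:alpha:]]", x)

# read each data set into a list of data frames
result <- lapply(indx, function(.start){
    # extract the row count from the header line
    n <- as.integer(sub("^\\s*([0-9]+).*", "\\1", x[.start]))
    # read the n data lines that follow the header
    read.table(text = x[seq(.start + 1L, length.out = n)])
})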
str(result)

> str(result)
List of 2
 $ :'data.frame':       40 obs. of  2 variables:
  ..$ V1: num [1:40] 300 300 300 300 300 ...
  ..$ V2: num [1:40] 0.3 0.289 0.276 0.259 0.237 ...
 $ :'data.frame':       81 obs. of  2 variables:
  ..$ V1: num [1:81] 300 300 300 300 300 ...
  ..$ V2: num [1:81] 0.216 0.204 0.21 0.195 0.192 ...
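
If it helps to keep track of which data frame is which, the text after the row count in each header can be used to name the list elements. This is only a sketch on a shortened copy of the sample; the column names "time" and "value" are my assumption about what the two columns hold, and for the real file you would replace the vector below with readLines() on your local copy.

```r
# Shortened copy of the sample, in the same layout as the post.
# For the real file: x <- readLines("Practicedata.Dat")  # assumed local path
x <- c("3 Terry Cove-Model",
       "300 .300110459327698",
       "300.041656494141 .289277672767639",
       "300.083343505859 .276237487792969",
       " 2 Terry Cove-Data",
       "300 .216407993",
       "300.0042 .204216005")

# header lines: optional blanks, a row count, then a label starting with a letter
indx <- grep("^\\s*[0-9]+\\s+[[:alpha:]]", x)

result <- lapply(indx, function(.start) {
    n <- as.integer(sub("^\\s*([0-9]+).*", "\\1", x[.start]))
    # "time" and "value" are assumed column names, not from the file
    read.table(text = x[seq(.start + 1L, length.out = n)],
               col.names = c("time", "value"))
})

# name each data frame with the label that follows the row count
names(result) <- sub("^\\s*[0-9]+\\s+", "", x[indx])
str(result)
```

After this, result[["Terry Cove-Model"]] and result[["Terry Cove-Data"]] pick out the two series by name.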



On Mon, Jan 21, 2013 at 2:19 AM, Jd Devkota <janesh.devkota at gmail.com> wrote:
> Hello All,
>
> I have a data file in text format that contains two data sets. The data
> sets are continuous.
> For each data set there is a header which has the number of data rows and
> the name of the data series.
> For example, the first data set has "6240 Terry Cove-Model". Then the data for
> that series follows for up to 6240 rows. Then another data set starts, and it
> will have a header such as "5200 Terry-Observed".
>
> The sample data would look like:
>
> 6240 Terry Cove-Model
> 300 .300110459327698
> 300.041656494141 .289277672767639
> 300.083343505859 .276237487792969
> 300.125 .258902788162231
> 300.166656494141 .236579895019531
> 300.208343505859 .221315026283264
> 300.25 .214318037033081
> 300.291656494141 .190926909446716
> 300.333343505859 .158144593238831
> 300.375 .113302707672119
> 300.416656494141 .103684902191162
> 300.458343505859 9.72903966903687E-02
> 300.5 8.76948833465576E-02
> 300.541656494141 8.42459201812744E-02
> 300.583343505859 .078397274017334
> 300.625 8.44632387161255E-02
> 300.666656494141 9.32939052581787E-02
> 300.708343505859 .113663911819458
> 300.75 .123064398765564
> 300.791656494141 .157548069953918
> 300.833343505859 .148393034934998
> 300.875 .135645747184753
> 300.916656494141 .137590646743774
> 300.958343505859 .133154153823853
> 301 .131152510643005
> 301.041656494141 .114152908325195
> 301.083343505859 8.04083347320557E-02
> 301.125 5.53587675094604E-02
> 301.166656494141 3.17397117614746E-02
> 301.208343505859 4.07266616821289E-03
> 301.25 -2.15455293655396E-02
> 301.291656494141 -4.07489538192749E-02
> 301.333343505859 -5.85414171218872E-02
> 301.375 -7.53517150878906E-02
> 301.416656494141 -8.49723815917969E-02
> 301.458343505859 -7.91778564453125E-02
> 301.5 -7.02846050262451E-02
> 301.541656494141 -7.24701881408691E-02
> 301.583343505859 -7.76907205581665E-02
> 301.625 -6.82642459869385E-02
>  62401 Terry Cove-Data
> 300 .216407993
> 300.0042 .204216005
> 300.0083 .210311999
> 300.0125 .195071996
> 300.0167 .192023999
> 300.0208 .179831992
> 300.025 .188976001
> 300.0292 .185928004
> 300.0333 .195071996
> 300.0375 .219456009
> 300.0417 .210311999
> 300.0458 .204216005
> 300.05 .195071996
> 300.0542 .188976001
> 300.0583 .195071996
> 300.0625 .195071996
> 300.0667 .185928004
> 300.0708 .173735998
> 300.075 .170688001
> 300.0792 .167640004
> 300.0833 .167640004
> 300.0875 .167640004
> 300.0917 .167640004
> 300.0958 .161543991
> 300.1 .1524
> 300.1042 .158495994
> 300.1083 .149352003
> 300.1125 .158495994
> 300.1167 .1524
> 300.1208 .1524
> 300.125 .149352003
> 300.1292 .143256
> 300.1333 .146303997
> 300.1375 .149352003
> 300.1417 .146303997
> 300.1458 .137159996
> 300.15 .131064002
> 300.1542 .124967999
> 300.1583 .128015996
> 300.1625 .124967999
> 300.1667 .131064002
> 300.1708 .124967999
> 300.175 .124967999
> 300.1792 .134111999
> 300.1833 .118871996
> 300.1875 .128015996
> 300.1917 .131064002
> 300.1958 .128015996
> 300.2 .131064002
> 300.2042 .128015996
> 300.2083 .121920002
> 300.2125 .115823999
> 300.2167 .112776001
> 300.2208 .103632001
> 300.225 .097535998
> 300.2292 .103632001
> 300.2333 .094488001
> 300.2375 .082296003
> 300.2417 .0762
> 300.2458 .079247997
> 300.25 .067056
> 300.2542 .064007998
> 300.2583 .045720002
> 300.2625 .033528
> 300.2667 .036575999
> 300.2708 .036575999
> 300.275 .036575999
> 300.2792 .027432001
> 300.2833 .027432001
> 300.2875 .021336
> 300.2917 .012192
> 300.2958 .009144
> 300.3 .009144
> 300.3042 .003048
> 300.3083 0
> 300.3125 -.003048
> 300.3167 -.006096
> 300.3208 0
> 300.325 .006096
> 300.3292 -.003048
> 300.3333 .006096
>
> The full data set can be downloaded from
> https://www.dropbox.com/s/chhw3vz6ru1godk/Practicedata.Dat
>
> I want to make a comparison graph between modeled and observed. Once I am
> able to read two data sets as two sets of data or combined in one I would
> be able to create the time series graph.
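
Once the two series are in separate data frames, one possible way to draw the comparison is to plot the first series and add the second with lines(). The sketch below uses the first four rows of each series from your sample; the data frame names, column names, colours and axis labels are my own choices for illustration.

```r
# First four rows of each series from the sample in the post.
# The names "model", "obs", "time" and "value" are assumptions.
model <- data.frame(
    time  = c(300, 300.041656494141, 300.083343505859, 300.125),
    value = c(.300110459327698, .289277672767639,
              .276237487792969, .258902788162231))
obs <- data.frame(
    time  = c(300, 300.0042, 300.0083, 300.0125),
    value = c(.216407993, .204216005, .210311999, .195071996))

# axis limits that cover both series
xlim <- range(model$time, obs$time)
ylim <- range(model$value, obs$value)

plot(model$time, model$value, type = "l", col = "blue",
     xlim = xlim, ylim = ylim, xlab = "Time", ylab = "Value")
lines(obs$time, obs$value, col = "red")
legend("topright", legend = c("Model", "Data"),
       col = c("blue", "red"), lty = 1)
```

Computing the limits from both series first keeps the second series from being clipped by the axes set up for the first.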
>
> Another thing I need to do is create another sub data set where both the
> series have common data. One data might have more intervals than another.
> After I find two data sets of same interval then I want to plot a
> correlation graph.
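
For the common subset, note that the time stamps in the two series are stored with different precision (300.041656494141 versus 300.0417), so an exact match on time will probably miss most rows. One alternative, which is a swapped-in technique and not something your file requires, is to interpolate the denser series onto the times of the coarser one with approx() and keep only the times covered by both. The sketch uses a subset of rows from your sample; the object and column names are assumptions.

```r
# Subset of rows from the sample in the post; names are assumptions.
model <- data.frame(
    time  = c(300, 300.041656494141, 300.083343505859, 300.125,
              300.166656494141, 300.208343505859),
    value = c(.300110459327698, .289277672767639, .276237487792969,
              .258902788162231, .236579895019531, .221315026283264))
obs <- data.frame(
    time  = c(300, 300.0208, 300.0417, 300.0625, 300.0833,
              300.1042, 300.125, 300.1458, 300.1667, 300.1875),
    value = c(.216407993, .179831992, .210311999, .195071996, .167640004,
              .158495994, .149352003, .137159996, .131064002, .128015996))

# linear interpolation of the observed series at the model times;
# approx() returns NA for times outside the observed range
obs_at_model <- approx(obs$time, obs$value, xout = model$time)$y

# keep only the times covered by both series
keep <- !is.na(obs_at_model)
common <- data.frame(time  = model$time[keep],
                     model = model$value[keep],
                     data  = obs_at_model[keep])

plot(common$model, common$data, xlab = "Model", ylab = "Data")
cor(common$model, common$data)
```

If the time stamps are really meant to be identical once rounded, rounding both time columns and using merge() would be another option, but comparing rounded floating-point values for equality can be fragile.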
>
> I hope I made it clear what I want to do.
>
> Thank you so much.
>
> Best Regards,
> Janesh



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


