[R] data format
arun
smartpink111 at yahoo.com
Tue Feb 19 16:22:20 CET 2013
Hi,
Try this:
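# read the tab-delimited export; column 1 holds the station name
el<- read.csv("el.csv",header=TRUE,sep="\t",stringsAsFactors=FALSE)
# one data frame per station
elsplit<- split(el,el[,1])
# complete daily calendar from 1930-01-01 to 2010-12-31
datetrial<-data.frame(date1=seq.Date(as.Date("1930.1.1",format="%Y.%m.%d"),as.Date("2010.12.31",format="%Y.%m.%d"),by="day"))
# build a Date from year, month, day (columns 2-4); discharge is column 5
elsplit1<- lapply(elsplit,function(x) data.frame(date1=as.Date(paste(x[,2],x[,3],x[,4],sep="-"),format="%Y-%m-%d"),discharge=x[,5]))
elsplit2<-lapply(elsplit1,function(x) x[order(x[,1]),])
library(plyr)
# left join keeps every calendar day (and only those); days without a record get NA
elsplit3<-lapply(elsplit2,function(x) join(datetrial,x,by="date1",type="left"))
# NA -> "-9999.000"; this turns the whole discharge column into character
elsplit4<-lapply(elsplit3,function(x) {x[,2][is.na(x[,2])]<- "-9999.000";x})
# Date -> "1930.01.01"
elsplit5<-lapply(elsplit4,function(x) {x[,1]<-format(x[,1],"%Y.%m.%d");x})
# a leading 0 in the month (character 6) or day (character 9) becomes a space: "1930. 1. 1"
elsplit6<-lapply(elsplit5,function(x){substr(x[,1],6,6)<-ifelse(substr(x[,1],6,6)==0," ",substr(x[,1],6,6));substr(x[,1],9,9)<- ifelse(substr(x[,1],9,9)==0," ",substr(x[,1],9,9));x})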
elsplit6[[1]][1:4,]
# date1 discharge
#1 1930. 1. 1 -9999.000
#2 1930. 1. 2 -9999.000
#3 1930. 1. 3 -9999.000
#4 1930. 1. 4 -9999.000
length(elsplit6)
#[1] 124
tail(elsplit6[[124]],25)
# date1 discharge
#29561 2010.12. 7 -9999.000
#29562 2010.12. 8 -9999.000
#29563 2010.12. 9 -9999.000
#29564 2010.12.10 -9999.000
#29565 2010.12.11 -9999.000
#29566 2010.12.12 -9999.000
#29567 2010.12.13 -9999.000
#29568 2010.12.14 -9999.000
#29569 2010.12.15 -9999.000
#29570 2010.12.16 -9999.000
#29571 2010.12.17 -9999.000
#29572 2010.12.18 -9999.000
#29573 2010.12.19 -9999.000
#29574 2010.12.20 -9999.000
#29575 2010.12.21 -9999.000
#29576 2010.12.22 -9999.000
#29577 2010.12.23 -9999.000
#29578 2010.12.24 -9999.000
#29579 2010.12.25 -9999.000
#29580 2010.12.26 -9999.000
#29581 2010.12.27 -9999.000
#29582 2010.12.28 -9999.000
#29583 2010.12.29 -9999.000
#29584 2010.12.30 -9999.000
#29585 2010.12.31 -9999.000
str(head(elsplit6,3))
#List of 3
# $ AGOMO:'data.frame': 29585 obs. of 2 variables:
#  ..$ date1    : chr [1:29585] "1930. 1. 1" "1930. 1. 2" "1930. 1. 3" "1930. 1. 4" ...
#  ..$ discharge: chr [1:29585] "-9999.000" "-9999.000" "-9999.000" "-9999.000" ...
# $ AGONO:'data.frame': 29585 obs. of 2 variables:
#  ..$ date1    : chr [1:29585] "1930. 1. 1" "1930. 1. 2" "1930. 1. 3" "1930. 1. 4" ...
#  ..$ discharge: chr [1:29585] "-9999.000" "-9999.000" "-9999.000" "-9999.000" ...
# $ ANZMA:'data.frame': 29585 obs. of 2 variables:
#  ..$ date1    : chr [1:29585] "1930. 1. 1" "1930. 1. 2" "1930. 1. 3" "1930. 1. 4" ...
#  ..$ discharge: chr [1:29585] "-9999.000" "-9999.000" "-9999.000" "-9999.000" ...
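For comparison, here is a minimal base-R sketch of the same padding without plyr, assuming the five columns are named stationname, year, month, day and discharge (the names, the toy values and the helper pad_station() are hypothetical). merge(..., all.x = TRUE) plays the role of join(type = "left"), and sprintf("%d.%2d.%2d", ...) pads month and day with a space instead of the substr() replacement.

```r
# Hypothetical toy input with the five columns described in the thread
# (stationname, year, month, day, discharge); the values are made up.
el <- data.frame(stationname = c("A", "A", "B"),
                 year = c(2004, 2004, 2007),
                 month = c(1, 1, 1),
                 day = c(1, 2, 1),
                 discharge = c(232, 334, 23),
                 stringsAsFactors = FALSE)

# Complete daily calendar, 1930-01-01 to 2010-12-31
alldays <- data.frame(date1 = seq(as.Date("1930-01-01"),
                                  as.Date("2010-12-31"), by = "day"))

# Hypothetical helper: pad one station's records to the full calendar
pad_station <- function(x) {
  # Build a Date from the year/month/day columns
  x$date1 <- as.Date(sprintf("%d-%02d-%02d", x$year, x$month, x$day))
  # Left join: every calendar day is kept, days without a record get NA
  out <- merge(alldays, x[, c("date1", "discharge")],
               by = "date1", all.x = TRUE)
  # Missing days become the character code "-9999.000"; others get 3 decimals
  out$discharge <- ifelse(is.na(out$discharge), "-9999.000",
                          sprintf("%.3f", out$discharge))
  # "1930. 1. 1" style: month and day padded with a space instead of a 0
  lt <- as.POSIXlt(out$date1)
  out$date1 <- sprintf("%d.%2d.%2d", lt$year + 1900L, lt$mon + 1L, lt$mday)
  out
}

res <- lapply(split(el, el$stationname), pad_station)
head(res[["A"]], 3)
```

The "%.3f" formatting of observed values is an assumption; the requested layout in the thread shows mixed precision (232.0, 113.56), so adjust it as needed.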
Regarding the space between date1 and discharge, I haven't dealt with it, since you didn't mention whether it is needed in the data.frame or not.
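If the single space between the two columns is only needed in a written text file, a sketch with write.table() could look like this; the one-station data frame and the file name AGOMO.txt are illustrative assumptions.

```r
# Hypothetical one-station result in the same two-column layout
st <- data.frame(date1 = c("1930. 1. 1", "1930. 1. 2"),
                 discharge = c("-9999.000", "-9999.000"),
                 stringsAsFactors = FALSE)

# Illustrative file name; one space between columns, no quotes or headers
outfile <- file.path(tempdir(), "AGOMO.txt")
write.table(st, outfile, sep = " ", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(outfile)
```

For the whole list, the same call could be looped over names(elsplit6), writing one file per station.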
A.K.
________________________________
From: eliza botto <eliza_botto at hotmail.com>
To: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com>
Sent: Tuesday, February 19, 2013 10:01 AM
Subject: RE:
THANKS ARUN.
IT'S A CHARACTER.
SORRY FOR NOT TELLING YOU IN ADVANCE.
ELISA
> Date: Tue, 19 Feb 2013 07:00:03 -0800
> From: smartpink111 at yahoo.com
> Subject: Re:
> To: eliza_botto at hotmail.com
>
> Hi,
> One more question.
> You mentioned -9999.000. Should it be a number or a character string like "-9999.000"? If it is a number, R will print it as -9999.
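A short illustration of the difference Arun is asking about: stored as a number, -9999.000 prints as -9999, while sprintf() or formatC() produce the character string with three decimals.

```r
x <- -9999.000
print(x)                                  # a number prints without trailing zeros
code <- sprintf("%.3f", x)                # character string with three decimals
code2 <- formatC(x, format = "f", digits = 3)
code
```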
> Arun
>
> ________________________________
> From: eliza botto <eliza_botto at hotmail.com>
> To: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com>
> Sent: Tuesday, February 19, 2013 9:16 AM
> Subject: RE:
>
> How can you be wrong, Arun? You are right.
>
> elisa
>
> > Date: Tue, 19 Feb 2013 06:15:31 -0800
> > From: smartpink111 at yahoo.com
> > Subject: Re:
> > To: eliza_botto at hotmail.com
> >
> > Hi Elisa,
> >
> > Just a question about the date format: is it the same as before, i.e., a leading 0 replaced by one space when the month or day is less than 10? Also, if I understand correctly, the list elements correspond to the different station names, right?
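The date layout described here (a leading 0 in the month or day replaced by a space) can be sketched two ways: with the %2d width specifier of sprintf(), or by replacing a zero that directly follows a dot in the "%Y.%m.%d" string. The sample dates are arbitrary.

```r
d <- as.Date(c("1930-01-01", "1930-01-10", "2010-12-31"))
lt <- as.POSIXlt(d)
# %2d pads a single-digit month/day with a space instead of a zero
a <- sprintf("%d.%2d.%2d", lt$year + 1900L, lt$mon + 1L, lt$mday)
# Same result: replace a "0" that directly follows a "." with a space
b <- gsub(".0", ". ", format(d, "%Y.%m.%d"), fixed = TRUE)
a
identical(a, b)
```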
> > Arun
> >
> > ________________________________
> > From: eliza botto <eliza_botto at hotmail.com>
> > To: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com>
> > Sent: Tuesday, February 19, 2013 8:35 AM
> > Subject:
> >
> > Dear Arun,
> > [A text file is also attached in case the formatting gets changed; el is the data file.]
> > Attached to this email is the Excel file that contains the data. The data are in the following form:
> >
> > col1         col2  col3   col4  col5
> > stationname  year  month  day   discharge
> > A            2004  1      1     232
> > A            2004  1      2     334
> > ...
> > B            2009  1      1     323
> > B            2009  1      2     332
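A minimal sketch of reading this five-column layout into R. Tab separation is assumed from the sep="\t" in Arun's read.csv() call, the inline text stands in for el.csv, and its rows simply mirror the sample table above.

```r
# Made-up rows in the five-column, tab-separated layout
txt <- "stationname\tyear\tmonth\tday\tdischarge
A\t2004\t1\t1\t232
A\t2004\t1\t2\t334
B\t2009\t1\t1\t323"
el <- read.delim(text = txt, stringsAsFactors = FALSE)
# Combine year, month and day into a proper Date column
el$date1 <- as.Date(sprintf("%d-%02d-%02d", el$year, el$month, el$day))
split(el$date1, el$stationname)
```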
> >
> >
> > The stations' records start and end in different years, but I want every station's series to start in 1930 and end in 2010, with -9999.000 on days when data are missing. I want to make a list that looks like the following:
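As a sanity check on the requested length: a daily series from 1930-01-01 to 2010-12-31 has 81 years of 365 days plus 20 leap days (1932 through 2008; 2000 is a leap year and no other century year falls in the range), which matches the 29585 rows in Arun's str() output.

```r
days <- seq(as.Date("1930-01-01"), as.Date("2010-12-31"), by = "day")
n_years <- 2010 - 1930 + 1
# The plain divisible-by-4 rule is valid here: 2000 is the only century year
n_leap <- sum(seq(1930, 2010) %% 4 == 0)
length(days)
n_years * 365 + n_leap
```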
> >
> > [[A]]
> > 1930. 1. 1 -9999.000
> > 1930. 1. 2 -9999.000
> > 1930. 1. 3 -9999.000
> > 1930. 1. 4 -9999.000
> > 1930. 1. 5 -9999.000
> > 1930. 1. 6 -9999.000
> > 1930. 1. 7 -9999.000
> > 1930. 1. 8 -9999.000
> > 1930. 1. 9 -9999.000
> > 1930. 1.10 -9999.000
> > 1930. 1.11 -9999.000
> > 1930. 1.12 -9999.000
> > 1930. 1.13 -9999.000
> > ....................
> > ....................
> > ....................
> > 2004. 1. 1 232.0
> > 2004. 1. 2 334.0
> > ..................
> > ..................
> > 2004.12. 1 113.56
> > ....
> > ...
> > 2004.12.31 114.56
> >
> > [[B]]
> > 1930. 1. 1 -9999.000
> > 1930. 1. 2 -9999.000
> > 1930. 1. 3 -9999.000
> > 1930. 1. 4 -9999.000
> > 1930. 1. 5 -9999.000
> > 1930. 1. 6 -9999.000
> > 1930. 1. 7 -9999.000
> > 1930. 1. 8 -9999.000
> > 1930. 1. 9 -9999.000
> > 1930. 1.10 -9999.000
> > 1930. 1.11 -9999.000
> > 1930. 1.12 -9999.000
> > 1930. 1.13 -9999.000
> > ....................
> > ....................
> > ....................
> > 2007. 1. 1 23.0
> > 2007. 1. 2 33.0
> > ..................
> > ..................
> > 2007.12. 1 13.56
> > ....
> > ...
> > 2007.12.31 4.56
> >
> >
> > In addition to the different start and end years, some stations, such as "BRRSD", have gaps: data exist only for 2001, 2002, 2009 and 2010, and I want -9999.000 inserted for every day of 2003, 2004, 2005, 2006, 2007 and 2008, since no data are available for those years.
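Joining against the full calendar already fills such inner gaps, but a quick check of which years are missing, and how many days will receive -9999.000, could look like this. The vector of years present is taken from the "BRRSD" description; it is not read from the data.

```r
# Years with data for a gappy station such as "BRRSD" (from the description)
years_present <- c(2001, 2002, 2009, 2010)
missing_years <- setdiff(seq(min(years_present), max(years_present)), years_present)
missing_years
# Number of calendar days in the missing years that will get -9999.000
alldays <- seq(as.Date("1930-01-01"), as.Date("2010-12-31"), by = "day")
n_fill <- sum(as.integer(format(alldays, "%Y")) %in% missing_years)
n_fill
```

This only covers gaps between the first and last year with data; years before 2001 are filled by the same calendar join.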
> > The date format should be as written above. One request: please do not share my data file on the R forum.
> >
> > Thank you so very much in advance.
> >
> > elisa
More information about the R-help mailing list