[R] How to extract information from the following dataset?

Mike Marchywka marchywka at hotmail.com
Thu May 12 11:35:48 CEST 2011

----------------------------------------
> Date: Thu, 12 May 2011 10:43:59 +0200
> From: Jose-Marcio.Martins at mines-paristech.fr
> To: xzhan011 at ucr.edu
> CC: r-help at r-project.org
> Subject: Re: [R] How to extract information from the following dataset?
>
> Xin Zhang wrote:
> > Hi all,
> >
> > I have never worked with this kind of data before, so Please help me out
> > with it.
> > I have the following data set, in a csv file, looks like the following:
> >
> > Jan 27, 2010 16:01:24,000 125 - - -
> > Jan 27, 2010 16:06:24,000 125 - - -
> > Jan 27, 2010 16:11:24,000 176 - - -
> > Jan 27, 2010 16:16:25,000 159 - - -
> > Jan 27, 2010 16:21:25,000 142 - - -
> > Jan 27, 2010 16:26:24,000 142 - - -
> > Jan 27, 2010 16:31:24,000 125 - - -
> > Jan 27, 2010 16:36:24,000 125 - - -
> > Jan 27, 2010 16:41:24,000 125 - - -
> > Jan 27, 2010 16:46:24,000 125 - - -
> > Jan 27, 2010 16:51:24,000 125 - - -
> > Jan 27, 2010 16:56:24,000 125 - - -
> > Jan 27, 2010 17:01:24,000 157 - - -
> > Jan 27, 2010 17:06:24,000 172 - - -
> > Jan 27, 2010 17:11:25,000 142 - - -
> > Jan 27, 2010 17:16:24,000 125 - - -
> > Jan 27, 2010 17:21:24,000 125 - - -
> > Jan 27, 2010 17:26:24,000 125 - - -
> > Jan 27, 2010 17:31:24,000 125 - - -
> > Jan 27, 2010 17:36:24,000 125 - - -
> > Jan 27, 2010 17:41:24,000 125 - - -
> > Jan 27, 2010 17:46:24,000 125 - - -
> > Jan 27, 2010 17:51:24,000 125 - - -
> > ......
> >
> > The first few columns are month, day, year, time with OS3 accuracy. And the
> > last number is the measurement I need to extract.
> > I wonder if there is a easy way to just take out the measurements only from
> > a specific day and hour, i.e. if I want measurements from Jan 27 2010
> > 16:--:--
> > then I get 125,125,176,159,142,142,125,125,125,125,125,125.
> > Many thanks!!
>
> The easiest is in the shell, if you're using some flavour of unix :
>
> grep "Jan 27, 2010 16" filein.txt | awk '{print $5}' > fileout.txt
>
> and use fileout which will contain only the column of data you want.
>
Normally that is what I do, but the R POSIXct features work pretty easily.
I guess I'd use bash text-processing commands to put the data into a
form you like, perhaps "y-mo-day time", and then read it in as a data frame.
Usually I convert everything to "time since epoch began" because I like integers,
but there are some facilities here, like "round", that work well with date-times.

> dx<-as.POSIXct("2011-04-03 13:14:15")
> dx
[1] "2011-04-03 13:14:15 CDT"
> round(dx,"hour")
[1] "2011-04-03 13:00:00 CDT"
> as.integer(dx)
[1] 1301854455
>
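
For what it's worth, the whole thing can also be done inside R. This is only a rough sketch: it assumes the file is whitespace-separated exactly as in the sample lines quoted above, and the names (lines, fields, d, hour16, by_hour) are made up for illustration. With a real file you would presumably start from readLines() instead of the inline vector.

```r
# A few sample lines copied from the original question.
lines <- c(
  "Jan 27, 2010 16:01:24,000 125 - - -",
  "Jan 27, 2010 16:06:24,000 125 - - -",
  "Jan 27, 2010 16:11:24,000 176 - - -",
  "Jan 27, 2010 17:01:24,000 157 - - -",
  "Jan 27, 2010 17:06:24,000 172 - - -"
)

# Split on whitespace; the fields are: month, "day,", year, "time,ms", value, ...
fields <- do.call(rbind, strsplit(lines, "[[:space:]]+"))

# Rebuild a "y-mo-day time" string. month.abb is R's built-in vector of
# English month abbreviations, so this does not depend on the locale.
month <- match(fields[, 1], month.abb)
day   <- as.integer(sub(",", "", fields[, 2]))
time  <- sub(",.*$", "", fields[, 4])   # drop the trailing ",000"
stamp <- sprintf("%s-%02d-%02d %s", fields[, 3], month, day, time)

# The time zone is an assumption; the original data does not state one.
d <- data.frame(when  = as.POSIXct(stamp, tz = "UTC"),
                value = as.integer(fields[, 5]))

# Measurements from one specific day and hour.
hour16 <- d$value[format(d$when, "%Y-%m-%d %H") == "2010-01-27 16"]
print(hour16)

# Integer alternative: seconds since the epoch, floored to the start of the hour.
hour_start <- (as.integer(d$when) %/% 3600L) * 3600L
by_hour <- split(d$value, hour_start)
print(by_hour)
```

The first selection compares formatted date-hour strings, which is easy to read; the second uses plain integers, which is handy if you want every hour at once rather than a single one.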
