[R] Importing Time Series Data for an R Beginner
Cedrick W. Johnson (CJ)
cedrick at cedrickjohnson.com
Thu Mar 11 21:34:37 CET 2010
Hi Clay-
You may want to look at the xts package, as well as the 'strptime' and
'as.POSIXct' functions.
When I get datasets in Excel, I normally change the format of the date
column to YYYY-mm-dd, but that's due to my own shortcomings with date
formatting in R.
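That reformatting step is optional, though: strptime can read Clay's original m/d/yy dates directly if the format string matches them. Below is a minimal sketch, assuming the Date and Time columns arrive as character strings. The three sample rows are copied from Clay's message, and tz = "UTC" is my assumption, used only to keep daylight-saving shifts out of the picture.

```r
# Three rows in Clay's original m/d/yy layout (values copied from his message)
sample_rows <- data.frame(
  Subject = c(1, 1, 2),
  Date    = c("7/23/03", "9/25/04", "3/02/04"),
  Time    = c("13:05:00", "14:34:00", "16:34:00"),
  stringsAsFactors = FALSE
)

# %m/%d/%y matches month/day/two-digit year; %H:%M:%S matches the time.
# tz = "UTC" is an assumption, not something taken from Clay's data.
stamps <- as.POSIXct(strptime(paste(sample_rows$Date, sample_rows$Time),
                              format = "%m/%d/%y %H:%M:%S", tz = "UTC"))

print(stamps)
```

With the full file, the same format string would be applied to the Date and Time columns returned by read.csv() (or read.delim() for the tab-delimited export).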
Here's a quick example:
> x = read.csv('TestData.csv')
> x
  Subject       Date     Time Value
1       1 2003-07-23 13:05:00    84
2       1 2003-07-23 13:10:00    87
3       1 2003-07-23 13:15:00    95
4       2 2004-09-25 14:34:00    95
5       2 2004-09-25 14:39:00    81
6       2 2004-09-25 14:44:00    93
7       3 2004-03-02 16:34:00    72
8       3 2004-03-02 16:39:00    67
9       3 2004-03-02 16:44:00    83
> dates = as.POSIXct(strptime(paste(x[,2], x[,3], sep=" "),
+                             format="%Y-%m-%d %H:%M:%S"))
> dates
[1] "2003-07-23 13:05:00 EDT" "2003-07-23 13:10:00 EDT" "2003-07-23 13:15:00 EDT"
[4] "2004-09-25 14:34:00 EDT" "2004-09-25 14:39:00 EDT" "2004-09-25 14:44:00 EDT"
[7] "2004-03-02 16:34:00 EST" "2004-03-02 16:39:00 EST" "2004-03-02 16:44:00 EST"
> library(xts)
> data = xts(x[,c(1,4)], order.by=dates)
> data
                    Subject Value
2003-07-23 13:05:00       1    84
2003-07-23 13:10:00       1    87
2003-07-23 13:15:00       1    95
2004-03-02 16:34:00       3    72
2004-03-02 16:39:00       3    67
2004-03-02 16:44:00       3    83
2004-09-25 14:34:00       2    95
2004-09-25 14:39:00       2    81
2004-09-25 14:44:00       2    93
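For splitting the data into the traces Clay describes (same subject, same day, successive readings 5 minutes apart), here is one base-R way to do it; it is a sketch, not the only approach, and it does not need xts. It sorts by subject and time, starts a new trace number whenever the subject or calendar day changes or the gap exceeds 5 minutes, and then summarises per trace and per subject. The rows are taken from Clay's sample; tz = "UTC" and the column names are assumptions.

```r
# A few rows from Clay's sample, times already parsed (tz = "UTC" is an
# assumption that keeps daylight-saving changes out of the gap arithmetic)
x <- data.frame(
  Subject = c(1, 1, 1, 1, 1, 2, 2, 2),
  Time = as.POSIXct(c("2003-07-23 13:05:00", "2003-07-23 13:10:00",
                      "2003-07-23 13:15:00", "2004-09-25 14:34:00",
                      "2004-09-25 14:39:00", "2004-03-02 16:34:00",
                      "2004-03-02 16:39:00", "2004-03-02 16:44:00"),
                    tz = "UTC"),
  Value = c(84, 87, 95, 95, 81, 72, 67, 83)
)

# Sort by subject, then time, so each row can be compared with the previous one
x <- x[order(x$Subject, x$Time), ]
n <- nrow(x)

# A new trace starts at the first row, when the subject changes, when the
# calendar day changes, or when more than 5 minutes have passed
new_subject <- c(TRUE, x$Subject[-1] != x$Subject[-n])
new_day     <- c(TRUE, as.Date(x$Time[-1]) != as.Date(x$Time[-n]))
gap_mins    <- c(Inf, as.numeric(difftime(x$Time[-1], x$Time[-n], units = "mins")))
x$Trace     <- cumsum(new_subject | new_day | gap_mins > 5)

# One data frame per trace; each can be summarised or plotted on its own
traces <- split(x, x$Trace)

# Summary statistics for each trace
trace_stats <- do.call(rbind, lapply(traces, function(d) data.frame(
  Subject = d$Subject[1],
  Start   = d$Time[1],
  N       = nrow(d),
  Mean    = mean(d$Value),
  SD      = sd(d$Value)
)))

# Summary statistics for each subject, pooling all of that subject's traces
subject_stats <- aggregate(Value ~ Subject, data = x, FUN = mean)

print(trace_stats)
print(subject_stats)
```

With the full file, x would come from read.csv() plus the strptime() step above, and something like `for (d in traces) plot(d$Time, d$Value, type = "l")` would draw each trace in turn.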
HTH
-cedrick
=============================
Cedrick Johnson
aolim) cedrickjcvgr
www.cedrickjohnson.com
New York - Chicago
On 3/11/2010 3:13 PM, Clay Heaton wrote:
> Hi, I'm trying to learn R for a project I'm working on. I know several programming languages, so I'm comfortable with the syntax. What I can't figure out is how to import the file of time series data that I have and parse it into individual series. The data was given to me in Excel, but I can export it as tab-delimited text or CSV. I've been able to pull in the entire table with read.table(), but I can't figure out how to parse it into distinct groups.
>
> It looks like this:
>
> Subject Date Time Value
> 1 7/23/03 13:05:00 84
> 1 7/23/03 13:10:00 87
> 1 7/23/03 13:15:00 95
> ....
> 1 9/25/04 14:34:00 95
> 1 9/25/04 14:39:00 81
> 1 9/25/04 14:44:00 93
> ...
> 2 3/02/04 16:34:00 72
> 2 3/02/04 16:39:00 67
> 2 3/02/04 16:44:00 83
> ...
> 2 3/21/05 11:15:00 121
> 2 3/21/05 11:20:00 125
> 2 3/21/05 11:25:00 120
> ...
>
> There are ~100,000 rows of data. There are 86 subjects, and each of them has one or more traces. For each trace, the times are in uniform increments of 5 minutes. Some subjects have multiple traces, some have a single trace. Some traces include up to 500 values and others only 40.
>
> For now, what I'm looking to do is generate summary statistics for each trace, and then for each subject. Hence, I need a way to aggregate by trace or by subject, where values belong to the same trace if they were collected on the same day and each is within 5 minutes of the previous one. I would like to be able to iterate through the data to plot each trace independently.
>
> Any suggestions to help me get started would be appreciated. I'm looking to learn, so I'd appreciate pointers to good tutorials or code examples of dealing with time series data.
>
> Thanks!
> Clay