[R] [newbie] aggregating table() results and simplifying code with loop
John Kane
jrkrideau at inbox.com
Sun Sep 16 20:24:40 CEST 2012
Hi Davide,
I had some time this afternoon and I wonder whether this approach is likely to get the results you want. As before, it is not complete, but I think it holds promise.
On the other hand, Rui is a much better programmer than I am, so he may have a much cleaner solution. My way still looks labour-intensive at the moment.
I am using the plyr package, which you will probably have to install.
install.packages("plyr") should do it.
==========================================================
# load the plyr package
library(plyr)
# sample data
T80<- read.csv("/home/john/rdata/sample.csv", header = TRUE, sep = ";")
# Davide's actual read statement
# T80<-read.table(file="C:/sample.txt", header=T, sep=";")
# Looking for Maize
pattern <- c("2Ma", "2Ma","2Ma", "2Ma","2Ma")
# one row examples to see what is happening
T80[1,3:7]
T80[1, 3:7] == pattern
T80[405, 3:7]
T80[405, 3:7] == pattern
T80[55, 3:7] == pattern
# now we apply the patterns to the entire data set.
pp1 <- T80[, 3:7] == pattern
# paste the TRUEs and FALSEs together to form a single variable
concatdat <- paste(pp1[, 1], pp1[, 2], pp1[, 3], pp1[, 4], pp1[, 5], sep = "+")
# Assemble new data frame.
maizedata <- data.frame(T80$WS, concatdat)
names(maizedata) <- c("WS", "crop_pattern")
mzcount <- ddply(maizedata, .(WS, crop_pattern), summarize, count = length(crop_pattern))
mzcount # This is all the data, not just the relevant maize patterns
# This seems to be getting us somewhere, though we are not there yet
# Does this subset look like we are going in the right direction?
m51 <- subset(mzcount,
              crop_pattern %in% c("FALSE+FALSE+FALSE+FALSE+TRUE",
                                  "TRUE+FALSE+FALSE+FALSE+FALSE"))
m51 <- ddply(m51, .(WS), summarize, count = sum(count))
m51
=================================================================
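For anyone without plyr, the same counting step can be sketched in base R. This is only an illustration on made-up data: the toy data frame, its values, and the names year_cols and return5 are assumptions, not part of Davide's file.

```r
# Toy stand-in for T80; values are invented, column names are assumed.
toy <- data.frame(
  WS  = c("WS1", "WS1", "WS1", "WS2", "WS2"),
  y01 = c("2Ma", "2Co", "2Ma", "2We", "2Ma"),
  y02 = c("2Co", "2Co", "2Ma", "2We", "2We"),
  y03 = c("2We", "2We", "2Co", "2Co", "2Co"),
  y04 = c("2Co", "2Co", "2We", "2We", "2We"),
  y05 = c("2We", "2Ma", "2Co", "2Ma", "2Ma"),
  stringsAsFactors = FALSE
)
year_cols <- c("y01", "y02", "y03", "y04", "y05")

# Logical matrix: TRUE where the point carried maize in that year
is_maize <- toy[, year_cols] == "2Ma"

# One string per row, e.g. "TRUE+FALSE+FALSE+FALSE+FALSE"
crop_pattern <- apply(is_maize, 1, paste, collapse = "+")

# The two sequences that make up a 5-year return period
return5 <- c("TRUE+FALSE+FALSE+FALSE+FALSE",
             "FALSE+FALSE+FALSE+FALSE+TRUE")

# Number of matching points per WS
hits <- crop_pattern %in% return5
m5 <- tapply(hits, factor(toy$WS), sum)
m5
```

Summing a logical vector per WS keeps every WS in the result, including those with zero matches, which may help with the "different number of WS per table" problem.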
John Kane
Kingston ON Canada
> -----Original Message-----
> From: ridavide at gmail.com
> Sent: Sat, 15 Sep 2012 19:00:29 +0200
> To: jrkrideau at inbox.com, ruipbarradas at sapo.pt
> Subject: Re: [R] [newbie] aggregating table() results and simplifying
> code with loop
>
> Thanks Rui, thanks John for your very different solutions.
>
> I'll try to break my questions into smaller steps following your tips.
> However, not everything is clear to me yet, so before giving you any
> feedback I need to study your answers further. For the moment I can
> specify that I'm looking for the following 19 patterns:
>
> 1. True, False, False, False, False # return period of 5 years (1/2)
> 2. False, False, False, False, True # return period of 5 years (2/2)
> 3. True, False, False, False, True # return period of 4 years (1/3)
> 4. False, True, False, False, False # return period of 4 years (2/3)
> 5. False, False, False, True, False # return period of 4 years (3/3)
> 6. True, False, False, True, False # return period of 3 years (1/3)
> 7. False, True, False, False, True # return period of 3 years (2/3)
> 8. False, False, True, False, False # return period of 3 years (3/3)
> 9. False, True, False, True, False # return period of 2 years (1/2)
> 10. True, False, True, False, True # return period of 2 years (2/2)
> 11. True, True, True, True, True # mono-succession of 5 years
> 12. False, True, True, True, True # mono-succession of 4 years (1/2)
> 13. True, True, True, True, False # mono-succession of 4 years (2/2)
> 14. True, False, True, True, True # mono-succession of 3 years (1/5)
> 15. True, True, True, False, True # mono-succession of 3 years (2/5)
> 16. False, False, True, True, True # mono-succession of 3 years (3/5)
> 17. True, True, True, False, False # mono-succession of 3 years (4/5)
> 18. False, True, True, True, False # mono-succession of 3 years (5/5)
> 19. True, True, False, True, True # crops repeated two years
>
> In particular, I want to apply all 19 patterns to 7 (out of 11)
> land covers: 2BC, 2Co, 2Ma, 2We, 2MG, 2ML, 2PG. The patterns are
> structured as follows: True means presence of a given land cover
> (iteratively, one of the seven listed above), False means any other
> land cover (among the remaining 10).
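One way to handle all 19 patterns at once, as a rough sketch: encode each year as 1 (land cover present) or 0 (any other land cover) and look the resulting string up in a named vector. The class labels (return5, mono3, ...), the helper classify_window() and the toy data are hypothetical; only the 19 sequences, the seven land-cover codes and the WS and y01... column names come from the messages in this thread.

```r
# The 19 sequences from the list above; labels are invented for illustration.
pattern_class <- c(
  "10000" = "return5", "00001" = "return5",
  "10001" = "return4", "01000" = "return4", "00010" = "return4",
  "10010" = "return3", "01001" = "return3", "00100" = "return3",
  "01010" = "return2", "10101" = "return2",
  "11111" = "mono5",
  "01111" = "mono4", "11110" = "mono4",
  "10111" = "mono3", "11101" = "mono3", "00111" = "mono3",
  "11100" = "mono3", "01110" = "mono3",
  "11011" = "repeat2"
)
covers <- c("2BC", "2Co", "2Ma", "2We", "2MG", "2ML", "2PG")

# Hypothetical helper: classify every row of one 5-year window for one cover.
classify_window <- function(dat, year_cols, cover) {
  present <- dat[, year_cols] == cover            # logical matrix
  key <- apply(present, 1, function(r) paste(as.integer(r), collapse = ""))
  data.frame(WS = dat$WS, cover = cover,
             class = unname(pattern_class[key]),  # NA if not one of the 19
             stringsAsFactors = FALSE)
}

# Toy stand-in for the real data; values are invented.
toy <- data.frame(
  WS  = c("WS1", "WS1", "WS1", "WS2", "WS2"),
  y01 = c("2Ma", "2Co", "2Ma", "2We", "2Ma"),
  y02 = c("2Co", "2Co", "2Ma", "2We", "2We"),
  y03 = c("2We", "2We", "2Co", "2Co", "2Co"),
  y04 = c("2Co", "2Co", "2We", "2We", "2We"),
  y05 = c("2We", "2Ma", "2Co", "2Ma", "2Ma"),
  stringsAsFactors = FALSE
)
year_cols <- c("y01", "y02", "y03", "y04", "y05")

# Loop over the seven land covers and stack the results
res <- do.call(rbind, lapply(covers, function(cv) {
  classify_window(toy, year_cols, cv)
}))
res <- res[!is.na(res$class), ]

# One row per cover/class, one column per WS
wide <- table(crop_class = paste(res$cover, res$class, sep = "_"),
              WS = factor(res$WS, levels = unique(toy$WS)))
wide
```

For the next time window one would pass the next five year columns (y02 to y06) as year_cols and repeat.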
>
> Thanks again for any further help.
> Greetings,
> Dd
>
> ***********************************************************
> Davide Rizzo
> website :: http://sites.google.com/site/ridavide/
>
>
> On Sat, Sep 15, 2012 at 5:51 PM, John Kane <jrkrideau at inbox.com> wrote:
>> I have not seen any replies to your questions so I will suggest an
>> approach that may work if I can get a function to work.
>>
>> If I understand what you want, you have a pattern something like this:
>> pattern1 <- c("2Ma", "no2Ma","no2Ma", "no2Ma","no2Ma")
>> pattern2 <- c("no2Ma", 'no2Ma', "no2Ma", "no2Ma", "2Ma")
>>
>> for each five-year period, where 2Ma stands for maize, one of 11 different
>> grains
>> 1AU 2BC 2Co 2Ma 2MG 2ML 2oc 2PG 2SA 2We 3sN
>>
>> and what you want to know is if each year gives a pattern like
>>
>> check1 <- c(TRUE, FALSE, FALSE, FALSE, FALSE)
>> check2 <- c(FALSE, FALSE, FALSE, FALSE, TRUE)
>>
>> If I understand the patterns you only care for the two above, is that
>> correct?
>>
>> I am running out of time today but I think that this approach will get
>> you started
>> ===========================================================
>>
>> T80<-read.table(file="C:/sample.txt", header=T, sep=";")
>>
>> # Reminder of just what we want to get as a final result.
>> check1 <- c(TRUE, FALSE, FALSE, FALSE, FALSE)
>> check2 <- c(FALSE, FALSE, FALSE, FALSE, TRUE)
>>
>> pattern1 <- c("2Ma", "2Ma","2Ma", "2Ma","2Ma")
>>
>> # one row examples to see what is happening
>> T80[1,3:7]
>> T80[1, 3:7] == pattern1
>>
>> T80[405, 3:7]
>> T80[405, 3:7] == pattern1
>>
>> # now we apply the patterns to the entire data set.
>> pp1 <- T80[, 3:7] == pattern1
>> pp2 <- T80[, 3:7] == pattern2
>>
>> # reassign the WS values so we know where the data is from
>> WSnames <- rep(T80$WS, 2)
>>
>> # Assemble new data frame.
>> maizedata <- data.frame(WSnames, rbind(pp1,pp2))
>> ========================================================
>>
>> Now, assuming this runs for you and I have not made a serious mistake in
>> logic, you should be able to do some subsetting (?subset) to extract
>> only the check1 and check2 patterns above.
>>
>> This is where I ran into trouble, as I don't have the time this morning
>> to work out the subsetting conditions. It looks tricky and you
>> probably need a couple of subsetting moves.
>>
>> It's not a pretty solution and, in particular, I expect someone could
>> clean it up to make the subsetting easier or even unnecessary, but I hope
>> it helps.
>>
>> Once you have extracted what you want use apply() or perhaps the plyr
>> package to aggregate the results.
>>
>> Repeat for all grains. Actually look into setting the whole thing up as
>> a function. You should be able to write the program once as a function
>> and do a loop or an apply() to do all 11 grains in one go.
>>
>> Best of luck.
>>
>> John Kane
>> Kingston ON Canada
>>
>>
>>> -----Original Message-----
>>> From: ridavide at gmail.com
>>> Sent: Thu, 13 Sep 2012 15:36:28 +0200
>>> To: r-help at r-project.org
>>> Subject: [R] [newbie] aggregating table() results and simplifying code
>>> with loop
>>>
>>> Dear all,
>>> I'm looking for basic help with aggregating table() results and with
>>> writing a loop (if useful).
>>>
>>> My dataset ( http://goo.gl/gEPKW ) is composed of 23k rows, each one
>>> representing a point in space for which we know the land cover over
>>> 10 years (columns y01 to y10).
>>>
>>> I need to analyse it with a temporal sliding window of 5 years (y01 to
>>> y05, y02 to y06 and so forth)
>>> For each period I'm looking for specific sequences (e.g., Maize,
>>> -noMaize, -noMaize, -noMaize, -noMaize) to calculate the "return time"
>>> of principal land covers: barley (2BC), colza (2Co), maize (2Ma), etc.
>>> I define the "return time" as the presence of a given land cover
>>> according to a given sequence. Hence, each return time could require
>>> the sum of different sequences (e.g., a return time of 5 years derives
>>> from the sum of [2Ma,no2Ma,no2Ma,no2Ma,no2Ma] +
>>> [no2Ma,no2Ma,no2Ma,no2Ma,2Ma]).
>>> I need to repeat the calculation for each land cover for each time
>>> window. In addition, I need to repeat the process over three datasets
>>> (the one I give is the first one, the second one is from year 12 to
>>> year 24, the third one from year 27 to year 31; the breaks in the
>>> monitoring of land cover prevent me from creating a continuous
>>> dataset). At the end I expect to aggregate the sum for each spatial
>>> entity (column WS).
>>>
>>> I've started writing the code for the first crop in the first 5yrs
>>> period (http://goo.gl/FhZNx) then copying and pasting it for each crop
>>> then for each time window...
>>> Moreover I do not know how to aggregate the results of table(). (NB
>>> sometimes I have a different number of WS per table because a given
>>> sequence could be absent in a given spatial entity... so I have the
>>> following warning msg: number of columns of result is not a multiple
>>> of vector length (arg 1)). Therefore, I'm "obliged" to copy&paste the
>>> table corresponding to each sequence....
>>>
>>> FIRST QUEST. How to aggregate the results of table() when the number
>>> of columns is different?
>>> Or the other way around: Is there a way to have a table where each row
>>> reports the number of points per time return per WS? something like
>>>
>>> WS1 WS2 WS3 WS4 ... WS16 crop period
>>> 23 15 18 43 ... 52 Ma5 01
>>> 18 11 25 84 ... 105 Ma2 01
>>> ... ... ... ... ... ... ... ...
>>> ... ... ... ... ... ... Co5 01
>>> ... ... ... ... ... ... ... ...
>>> ... ... ... ... ... ... Ma5 02
>>> ... ... ... ... ... ... ... ...
>>> In this table each row should represent a return time for a given land
>>> cover in a given period (one of the 6 time windows of 5 years).
>>>
>>> SECOND QUEST. Could a loop (instead of a modular copy/paste code)
>>> improve the time/reliability of the calculation? If yes, could you
>>> please point me to some entry-level references for writing it?
>>>
>>> I am aware these are newbie questions, but I have not been able to
>>> solve them using manuals and available sources.
>>> Thank you in advance for your help.
>>>
>>> Greetings,
>>> Dd
>>>
>>> PS
>>> R: version 2.14.2 (2012-02-29)
>>> OS: MS Windows XP Home 32-bit SP3
>>>
>>> *****************************
>>> Davide Rizzo
>>> post-doc researcher
>>> INRA UR055 SAD-ASTER
>>> website :: http://sites.google.com/site/ridavide/
>>