[R] Data reshaping with conditions
Jim Lemon
drjimlemon at gmail.com
Thu Apr 21 06:53:03 CEST 2016
Hi sri,
I think that I see what you mean. Your statements:
x = Count_A_less_than_max of (Count type B)
y = Count_A_higher_than_max of (Count type B).
I took to mean that you wanted a logical value for x and y. Looking
more closely at your initial message, I see that you wanted _all_
values of A with respect to maxB in x and y. The error with maximum
values was due to a typo. Perhaps this will do what you want:
svdat<-read.table(text="Count id name type
117 335 sally A
19 335 sally A
167 335 sally B
18 340 susan A
56 340 susan A
22 340 susan B
53 340 susan B
135 351 lee A
114 351 lee A
84 351 lee A
80 351 lee A
19 351 lee A
8 351 lee A
21 351 lee A
88 351 lee B
111 351 lee B
46 351 lee B
108 351 lee B",header=TRUE)
# you can also do this with other reshape functions
library(prettyR)
svdatstr<-stretch_df(svdat,"id",c("Count","type"))
count_ind<-grep("Count",names(svdatstr))
type_ind<-grep("type",names(svdatstr))
svdatstr$maxA<-NA
svdatstr$maxB<-NA
svdatstr$x<-NA
svdatstr$y<-NA
for(row in 1:nrow(svdatstr)) {
indicesA<-count_ind[as.logical(match(svdatstr[row,type_ind],"A",0))]
svdatstr[row,"maxA"]<-max(svdatstr[row,indicesA])
indicesB<-count_ind[as.logical(match(svdatstr[row,type_ind],"B",0))]
svdatstr[row,"maxB"]<-max(svdatstr[row,indicesB])
AltB<-svdatstr[row,indicesA][svdatstr[row,indicesA]<svdatstr[row,"maxB"]]
svdatstr[row,"x"]<-paste(AltB,collapse=",")
AgeB<-svdatstr[row,indicesA][svdatstr[row,indicesA]>=svdatstr[row,"maxB"]]
svdatstr[row,"y"]<-paste(AgeB,collapse=",")
}
svdatstr[,c("id","name","maxB","x","y")]
Jim
On Thu, Apr 21, 2016 at 2:23 PM, sri vathsan <srivibish at gmail.com> wrote:
> Hi Jim,
>
> Thanks for your time. But somehow this code did not help me to achieve my
> expected output.
> Problems: 1) x, y are coming as logical rather than values as I mentioned in
> my post
> 2) The values that I get for Max A and Max B not correct
> 3) It looks like a pretty big data, but I just need to
> concatenate the values with a comma, the final output will be a character
> variable.
>
> Regards,
> Sri
>
> On Thu, Apr 21, 2016 at 4:52 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
>>
>> Hi sri,
>> As your problem involves a few logical steps, I found it easier to
>> approach it in a stepwise way. Perhaps there are more elegant ways to
>> accomplish this.
>>
>> svdat<-read.table(text="Count id name type
>> 117 335 sally A
>> 19 335 sally A
>> 167 335 sally B
>> 18 340 susan A
>> 56 340 susan A
>> 22 340 susan B
>> 53 340 susan B
>> 135 351 lee A
>> 114 351 lee A
>> 84 351 lee A
>> 80 351 lee A
>> 19 351 lee A
>> 8 351 lee A
>> 21 351 lee A
>> 88 351 lee B
>> 111 351 lee B
>> 46 351 lee B
>> 108 351 lee B",header=TRUE)
>> # you can also do this with other reshape functions
>> library(prettyR)
>> svdatstr<-stretch_df(svdat,"id",c("Count","type"))
>> count_ind<-grep("Count",names(svdatstr))
>> type_ind<-grep("type",names(svdatstr))
>> svdatstr$maxA<-NA
>> svdatstr$maxB<-NA
>> svdatstr$x<-NA
>> svdatstr$y<-NA
>> for(row in 1:nrow(svdatstr)) {
>> svdatstr[row,"maxA"]<-
>>
>> max(svdatstr[row,count_ind[as.logical(match(svdatstr[1,type_ind],"A",0))]])
>> svdatstr[row,"maxB"]<-
>>
>> max(svdatstr[row,count_ind[as.logical(match(svdatstr[1,type_ind],"B",0))]])
>> svdatstr[row,"x"]<-svdatstr[row,"maxA"] < svdatstr[row,"maxB"]
>> svdatstr[row,"y"]<-!svdatstr[row,"x"]
>> }
>> svdatstr
>>
>> You can then just extract the columns that you need.
>>
>> Jim
>>
>>
>> On Wed, Apr 20, 2016 at 3:03 PM, sri vathsan <srivibish at gmail.com> wrote:
>> > Dear All,
>> >
>> > I am trying to reshape the data with some conditions. A small part of
>> > the
>> > data looks like below. Like this there will be more data with repeating
>> > ID.
>> >
>> > Count id name type
>> > 117 335 sally A
>> > 19 335 sally A
>> > 167 335 sally B
>> > 18 340 susan A
>> > 56 340 susan A
>> > 22 340 susan B
>> > 53 340 susan B
>> > 135 351 lee A
>> > 114 351 lee A
>> > 84 351 lee A
>> > 80 351 lee A
>> > 19 351 lee A
>> > 8 351 lee A
>> > 21 351 lee A
>> > 88 351 lee B
>> > 111 351 lee B
>> > 46 351 lee B
>> > 108 351 lee B
>> >
>> > >From the above data I am expecting an output like below.
>> >
>> > id name type count_of_B Max of count B x y
>> > 335 sally B 167 167 117,19 NA
>> > 340 susan B 22,53 53 18 56
>> > 351 lee B 88,111,46,108 111 84,80,19,8,2 135,114
>> >
>> > Where, the column x and column y are:
>> >
>> > x = Count_A_less_than_max of (Count type B)
>> > y = Count_A_higher_than_max of (Count type B).
>> >
>> > *1)* I tried with dplyr with the following code for the initial step to
>> > get
>> > the values for each column.
>> > *2)* I thought to transpose the columns which has the unique ID alone.
>> >
>> > I tried with the following code and I am struck with the intial step
>> > itself. The code is executed but higher and lower value of A is not
>> > coming.
>> >
>> > Expected_output= data %>%
>> > group_by(id, Type) %>%
>> > mutate(Count_of_B = paste(unlist(count[Type=="B"]), collapse =
>> > ","))%>%
>> > mutate(Max_of_count_B = ifelse(Type == "B", max(count[Type ==
>> > "B"]),max(count[Type == "A"]))) %>%
>> > mutate(count_type_A_lesser = ifelse
>> > (Type=="B",(paste(unlist(count[Type=="A"]) < Max_of_count_B[Type=="B"],
>> > collapse = ",")), "NA"))%>%
>> > mutate(count_type_A_higher =
>> > ifelse(Type=="B",(paste(unlist(count[Type=="A"]) >
>> > Max_of_count_B[Type=="B"], collapse = ",")), "NA"))
>> >
>> > I hope I make my point clear. Please bare with the code, as I am new to
>> > this.
>> >
>> > Regards,
>> > sri
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> --
>
> Regards,
> Srivathsan.K
> Phone : 9600165206
More information about the R-help
mailing list