[R] Create new data frame with conditional sums
Jeff Newmiller
jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@
Sat Oct 14 19:01:30 CEST 2023
Pre-compute the per-interval answers and use findInterval to look up the per-row answers...
dat <- read.table( text=
"Tract Pct Totpop
1 0.05 4000
2 0.03 3500
3 0.01 4500
4 0.12 4100
5 0.21 3900
6 0.04 4250
7 0.07 5100
8 0.09 4700
9 0.06 4950
10 0.03 4800
", header=TRUE )
dat2 <- aggregate( Totpop ~ Pct, dat, FUN = sum )
dat2$TotpopSum <- rev( cumsum( rev( dat2$Totpop ) ) )
Cutoff <- seq( 0, .15, .01 )
ans <- data.frame(
  Cutoff = Cutoff
, Pop = dat2$TotpopSum[
    findInterval(
      Cutoff
    , c( -Inf, dat2$Pct )
    , left.open = TRUE
    )
  ]
)
ans
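
A minimal sketch of the same lookup wrapped in a function and checked against the brute-force sapply() approach quoted below. The helper name pop_at_or_above and the extra 0.25 cutoff are assumptions added for illustration and are not from the thread. With left.open = TRUE each cutoff falls in an interval (lower, upper], so a cutoff equal to a Pct value still counts that Pct. One caveat: a cutoff above max(Pct) indexes one past the end of the cumulative sums and returns NA in the code above, so this sketch appends a trailing 0 to cover that case.

```r
# Hypothetical helper, not part of the original reply.
pop_at_or_above <- function(pct, pop, cutoff) {
  # Total population for each distinct pct value, sorted by pct.
  grp <- aggregate(pop ~ pct, data.frame(pct = pct, pop = pop), FUN = sum)
  # Reverse cumulative sum: population with pct >= each distinct value.
  # The trailing 0 is the answer for cutoffs above max(pct).
  tail_sum <- c(rev(cumsum(rev(grp$pop))), 0)
  # Intervals are (lower, upper], so a cutoff equal to a pct value
  # maps to the sum that still includes that pct.
  tail_sum[findInterval(cutoff, c(-Inf, grp$pct), left.open = TRUE)]
}

# Same toy data as in the original question.
dat <- data.frame(
  Tract  = 1:10,
  Pct    = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
  Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)
)

# The 0.25 cutoff is an added case that exceeds max(Pct).
Cutoff <- c(seq(0, 0.15, 0.01), 0.25)

fast <- pop_at_or_above(dat$Pct, dat$Totpop, Cutoff)
slow <- sapply(Cutoff, function(p) sum(dat$Totpop[dat$Pct >= p]))

# Stops with an error if the two approaches disagree.
stopifnot(isTRUE(all.equal(fast, slow)))
```

Both approaches compare the same double-precision values with >=, so they should agree even where seq() produces cutoffs that are not exactly the decimal values printed.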
On October 14, 2023 8:10:56 AM PDT, Bert Gunter <bgunter.4567 using gmail.com> wrote:
>Well, here's one way to do it:
>(dat is your example data frame)
>
>Cutoff <- seq(0, .15, .01)
>Pop <- with(dat, sapply(Cutoff, \(p)sum(Totpop[Pct >= p])))
>
>I think there must be a more efficient way to do it with cumsum(), though.
>
>Cheers,
>Bert
>
>On Sat, Oct 14, 2023 at 12:53 AM Jason Stout, M.D. <jason.stout using duke.edu> wrote:
>>
>> This seems like it should be simple but I can't get it to work properly. I'm starting with a data frame like this:
>>
>> Tract Pct Totpop
>> 1 0.05 4000
>> 2 0.03 3500
>> 3 0.01 4500
>> 4 0.12 4100
>> 5 0.21 3900
>> 6 0.04 4250
>> 7 0.07 5100
>> 8 0.09 4700
>> 9 0.06 4950
>> 10 0.03 4800
>>
>> And I want to end up with a data frame with two columns, a "Cutoff" column that is a simple sequence of equally spaced cutoffs (let's say in this case from 0-0.15 by 0.01) and a "Pop" column which equals the sum of "Totpop" in the prior data frame in which "Pct" is greater than or equal to "cutoff." So in this toy example, this is what I want for a result:
>>
>> Cutoff Pop
>> 1 0.00 43800
>> 2 0.01 43800
>> 3 0.02 39300
>> 4 0.03 39300
>> 5 0.04 31000
>> 6 0.05 26750
>> 7 0.06 22750
>> 8 0.07 17800
>> 9 0.08 12700
>> 10 0.09 12700
>> 11 0.10 8000
>> 12 0.11 8000
>> 13 0.12 8000
>> 14 0.13 3900
>> 15 0.14 3900
>> 16 0.15 3900
>>
>> I can do this with a for loop but it seems there should be an easier, vectorized way that would be more efficient. Here is a reproducible example:
>>
>> dummydata<-data.frame(Tract=seq(1,10,by=1),Pct=c(0.05,0.03,0.01,0.12,0.21,0.04,0.07,0.09,0.06,0.03),Totpop=c(4000,3500,4500,4100,
>> 3900,4250,5100,4700,
>> 4950,4800))
>> dfrm<-data.frame(matrix(ncol=2,nrow=0,dimnames=list(NULL,c("Cutoff","Pop"))))
>> for (i in seq(0,0.15,by=0.01)) {
>> temp<-sum(dummydata[dummydata$Pct>=i,"Totpop"])
>> dfrm[nrow(dfrm)+1,]<-c(i,temp)
>> }
>>
>> Jason Stout, MD, MHS
>> Division of Infectious Diseases
>> Dept of Medicine
>> Duke University
>> Box 102359-DUMC
>> Durham, NC 27710
>> FAX 919-681-7494
>>
>>
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
--
Sent from my phone. Please excuse my brevity.