[R] Fastest way to repeatedly subset a data frame?
Iestyn Lewis
ilewis at pharm.emory.edu
Fri Apr 20 22:01:04 CEST 2007
Good tip. An Rprof trace over my real data set produced a file
filled with:
pmatch [.data.frame [ FUN lapply
pmatch [.data.frame [ FUN lapply
pmatch [.data.frame [ FUN lapply
pmatch [.data.frame [ FUN lapply
pmatch [.data.frame [ FUN lapply
...
with very few other calls in there. pmatch appears to be R's partial
string-matching function, so I'm guessing there's either no hashing
going on or not very effective hashing.
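In `[.data.frame`, a character row index is resolved against the row names with pmatch() on every call, which fits the trace above. Here is a minimal sketch, assuming the same row names are requested many times: translate the names to integer positions once with match() (which uses hash tables internally), then subset by integer so the per-call pmatch step is skipped. The data frame `df`, the lookup vector `keys` and the helper functions are made up for illustration.

```r
# Hypothetical example data: 10,000 rows with character row names.
set.seed(1)
n <- 10000
df <- data.frame(value = runif(n), group = sample(letters, n, replace = TRUE))
rownames(df) <- paste0("id", seq_len(n))

# Hypothetical lookup: the same 50 row names requested repeatedly.
keys <- sample(rownames(df), 50)

# Character indexing: `[.data.frame` calls pmatch() on every subset.
by_name <- function(reps) {
  for (r in seq_len(reps)) res <- df[keys, ]
  res
}

# Resolve names to row positions once (match() is hash-based),
# then subset by integer so pmatch() is never called.
idx <- match(keys, rownames(df))
by_position <- function(reps) {
  for (r in seq_len(reps)) res <- df[idx, ]
  res
}

print(system.time(a <- by_name(500)))
print(system.time(b <- by_position(500)))

# Both approaches return the same rows with the same row names.
stopifnot(identical(a, b))
```

This only removes the name lookup. Each call still builds a new data frame, so it will not help much if that construction turns out to dominate.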
I'll let you know how the environment-based approach works out. The
Bioconductor project seems to make extensive use of environments, so
I'm guessing that's the way to go.
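One way the environment idea could look, as a minimal sketch assuming rows are fetched by a grouping key (the `group` column, the data, and `get_group()` are all hypothetical): build every subset once with split(), store the pieces in a hashed environment, and afterwards each lookup is a hash-table fetch rather than a fresh `[.data.frame` call. This is not necessarily how Bioconductor packages do it.

```r
# Hypothetical data: 10,000 rows, each belonging to one of 26 groups.
set.seed(1)
n <- 10000
df <- data.frame(value = runif(n),
                 group = sample(letters, n, replace = TRUE),
                 stringsAsFactors = FALSE)

# Build all subsets in one pass; split() returns a named list of data frames.
pieces <- split(df, df$group)

# Copy the pieces into a hashed environment keyed by group name.
lookup <- new.env(hash = TRUE)
for (key in names(pieces)) assign(key, pieces[[key]], envir = lookup)

# Hypothetical helper: each lookup is a hash-table fetch,
# not a new call to `[.data.frame`.
get_group <- function(key) get(key, envir = lookup, inherits = FALSE)

sub_a <- get_group("a")
print(nrow(sub_a))
```

The design choice is to pay the data-frame construction cost once up front in split(), which also addresses the concern that creating many new data frames is the real bottleneck.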
Iestyn
hadley wickham wrote:
>> But... it's not any faster, which is worrisome to me because it seems
>> like your code uses rownames and would take advantage of the hashing
>> potential of named items.
>
> I'm pretty sure it will use a hash to access the specified rows.
> Before you pursue an environment based solution, you might want to
> profile the code to check that the hashing is actually the slowest
> part. I suspect that creating all the new data.frames is taking the most time.
>
> Hadley
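Hadley's suggestion to profile first can be followed with the sampling profiler in base R's utils package. A minimal sketch, with a made-up data frame standing in for the real data set: Rprof() writes call-stack samples to a file (one line per sample, innermost call first, as in the trace above), and summaryRprof() tabulates them so you can see whether the time goes to pmatch or to building the new data frames.

```r
# Hypothetical workload: repeatedly subset a data frame by character row names.
set.seed(1)
n <- 10000
df <- data.frame(value = runif(n), other = runif(n))
rownames(df) <- paste0("id", seq_len(n))
keys <- sample(rownames(df), 100)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                       # start the sampling profiler
for (r in seq_len(5000)) res <- df[keys, ]
Rprof(NULL)                            # stop profiling

prof <- summaryRprof(prof_file)
# Functions ranked by time spent in the function itself.
print(head(prof$by.self))
# Functions ranked by time including their callees (e.g. `[.data.frame`).
print(head(prof$by.total))
```

Comparing the self time of pmatch with the total time of `[.data.frame` gives a rough split between name lookup and data-frame construction.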
More information about the R-help mailing list