[R] About size of data frames
Duncan Murdoch
murdoch.duncan sending from gmail.com
Thu Aug 14 14:01:41 CEST 2025
On 2025-08-14 7:27 a.m., Stefano Sofia via R-help wrote:
> Dear R-list users,
>
> let me ask you a very general question about performance of big data frames.
>
> I deal with semi-hourly meteorological data of about 70 sensors during 28 winter seasons.
>
>
> It means that for each sensor I have 48 values per day and 181 days per winter season (182 in a leap year): 48 * 181 * 28 = 243,264
>
> 243,264 * 70 = 17,028,480
>
>
> From the computational point of view, is it better to deal with a single data frame of approximately 17 M rows and 3 columns (one for the date, one for the sensor code and one for the value), with a single data frame of approximately 243,000 rows and 141 columns, or with 70 different data frames of approximately 243,000 rows and 3 columns each? Or does it make no difference?
>
> I personally would prefer the first choice, because it would be easier for me to deal with a single data frame with few columns.
>
It really depends on what computations you're doing. As a general rule,
column operations are faster than row operations. (Also as a general
rule, arrays are faster than data frames, but they are much more limited
in what they can hold: all entries must be of the same type, which
probably won't work for your data.)
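The type restriction is easy to demonstrate; here is a minimal sketch
(the sensor codes and values are made up for illustration):

```r
# Mixing a character sensor code with numeric values in a matrix
# coerces everything to character, losing numeric operations:
m <- cbind(sensor = c("S001", "S002"), value = c(1.5, 2.5))
typeof(m)  # "character"

# Purely numeric readings, by contrast, fit a fast numeric matrix
# (48 half-hourly readings for 2 sensors):
readings <- matrix(rnorm(48 * 2), nrow = 48,
                   dimnames = list(NULL, c("S001", "S002")))
colMeans(readings)  # one vectorized column operation per sensor
```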
So I'd guess your 3-column solution would likely be best.
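For what it's worth, the long 3-column layout lends itself directly to
vectorized column operations. A small sketch with invented column names
(date, sensor, value) and toy data:

```r
set.seed(1)
# Two sensors, one day of half-hourly readings each, in long format:
dat <- data.frame(
  date   = rep(seq(as.POSIXct("2025-01-01 00:00", tz = "UTC"),
                   by = "30 min", length.out = 48), times = 2),
  sensor = rep(c("S001", "S002"), each = 48),
  value  = rnorm(96)
)

# Per-sensor summaries are a single column operation over the
# whole frame, with no need to touch rows individually:
means <- tapply(dat$value, dat$sensor, mean)
```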
Duncan Murdoch