[R] About size of data frames

Duncan Murdoch murdoch.duncan at gmail.com
Thu Aug 14 14:01:41 CEST 2025


On 2025-08-14 7:27 a.m., Stefano Sofia via R-help wrote:
> Dear R-list users,
> 
> let me ask you a very general question about performance of big data frames.
> 
> I deal with half-hourly meteorological data from about 70 sensors over 28 winter seasons.
> 
> 
> It means that for each sensor I have 48 values per day and 181 days per winter season (182 in a leap year): 48 * 181 * 28 = 243,264
> 
> 243,264 * 70 = 17,028,480
> 
> 
>  From the computational point of view, is it better to deal with a single data frame of approximately 17 M rows and 3 columns (one for the date, one for the sensor code and one for the value), with a single data frame of approximately 243,000 rows and 141 columns, or with 70 different data frames of approximately 243,000 rows and 3 columns each? Or does it make any difference?
> 
> I personally would prefer the first option, because it would be easier for me to deal with a single data frame with a few columns.
> 

It really depends on what computations you're doing.  As a general rule, 
column operations are faster than row operations.  (Also as a general 
rule, arrays are faster than data frames, but are much more limited in 
what they can hold:  all entries must be the same type, which probably 
won't work for your data.)
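
For instance, a quick sketch with made-up sizes shows the difference: 
extracting a column touches a single vector, while extracting a row has 
to visit every column of the data frame.

    df <- data.frame(matrix(rnorm(1e5 * 10), ncol = 10))

    system.time(for (j in 1:10) x <- df[[j]])    # whole-column extractions
    system.time(for (i in 1:1000) x <- df[i, ])  # row extractions: far slower per cell

    m <- as.matrix(df)                           # same values, matrix storage
    system.time(for (i in 1:1000) x <- m[i, ])   # matrix rows are cheap by comparison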

So I'd guess your 3-column solution would likely be best.
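
For example, here's a toy version of that 3-column (long) layout, with 
invented sensor codes and dates, where a per-sensor summary is just a 
column operation; you can still reshape to the wide layout if some 
computation ever wants it.

    long <- data.frame(
      date   = rep(seq(as.POSIXct("1997-11-01 00:00", tz = "UTC"),
                       by = "30 min", length.out = 48), times = 2),
      sensor = rep(c("S001", "S002"), each = 48),
      value  = rnorm(96)
    )

    aggregate(value ~ sensor, data = long, FUN = mean)  # per-sensor mean

    wide <- reshape(long, idvar = "date", timevar = "sensor",
                    direction = "wide")                 # one value column per sensor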

Duncan Murdoch


