[R] Right input mechanism to R for high amount of data

Sawhney, Prerna (Nokia - IN/Bangalore) prerna.sawhney at nokia.com
Mon Jun 27 07:48:16 CEST 2016


Hi All,

I am currently loading 3 billion events (20 GB) into my algorithm for processing. I am reading this data from a Postgres-XL DB cluster (1 coordinator + 4 datanodes; each machine has 8 CPUs, 61 GB of memory and 200 GB of disk), with 1 TB of space in total.

Loading the whole data set takes far too long: almost 5 days pass before I can start running my algorithms.
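For scale, here is a rough back-of-the-envelope calculation from the figures above. It is only a sketch: it assumes decimal gigabytes (20 GB = 20e9 bytes) and that the load runs continuously for the full 5 days, and the variable names are purely illustrative.

```r
# Figures taken from this message; decimal GB is an assumption.
n_events  <- 3e9                  # 3 billion events
n_bytes   <- 20e9                 # 20 GB
load_secs <- 5 * 24 * 60 * 60     # 5 days expressed in seconds

# Effective load rate seen from the R side.
events_per_sec <- n_events / load_secs
bytes_per_sec  <- n_bytes / load_secs

cat(sprintf("%.0f events/s, %.1f KB/s\n",
            events_per_sec, bytes_per_sec / 1e3))
```

That works out to roughly 7,000 events per second, or under 50 KB per second.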

Could you please help by suggesting the right technology for getting this data into R? The DB is clearly the bottleneck right now.

Should I move away from Postgres-XL? Which option is most suitable for loading data into R efficiently: a DB, plain files, or Parquet files?

I look forward to your responses.

Thanks
Prerna



