[R] Exceptional slowness with read.csv
Dave Dixon
Mon Apr 8 07:47:52 CEST 2024
Greetings,
I have a CSV file with 76 fields and about 4 million records. I know that
some of the records have errors, specifically unmatched quotes.
Reading the file with readLines() and parsing the lines with
read.csv(text = ...) is really slow. I know that the first 2459465
records are good.
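For reference, the readLines approach I mean is roughly this kind of thing
(a sketch, not my exact script, parsing each record separately so one bad
line doesn't poison the rest):

all_lines <- readLines(file_name)
records <- lapply(all_lines[-1], function(ln)   # drop the header line
    try(read.csv(text = ln, header = FALSE), silent = TRUE))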
So I try this:
> startTime <- Sys.time()
> first_records <- read.csv(file_name, nrows = 2459465)
> endTime <- Sys.time()
> cat("elapsed time = ", endTime - startTime, "\n")
elapsed time = 24.12598
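(An aside on my timing code: subtracting two Sys.time() values gives a
difftime whose units are picked automatically, and cat() drops the units
label, so a safer version forces seconds explicitly, e.g.

cat("elapsed time =", difftime(endTime, startTime, units = "secs"), "seconds\n")

The 24.12598 above is seconds.)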
> startTime <- Sys.time()
> second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
> endTime <- Sys.time()
> cat("elapsed time = ", endTime - startTime, "\n")
This appears never to finish; I have been waiting over 20 minutes.
So why would read.csv(skip = 2459465, nrows = 5) take orders of magnitude
longer than read.csv(nrows = 2459465)?
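In the meantime, a workaround I'm going to try (a sketch, assuming a single
header row; read.csv is a wrapper around read.table, which reads an
already-open connection from its current position, so nothing before the
bad region should be parsed twice):

con <- file(file_name, open = "r")
junk <- readLines(con, n = 2459466)   # header plus the 2459465 good records
second_records <- read.csv(con, header = FALSE, nrows = 5,
                           col.names = names(first_records))
close(con)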
Thanks!
-dave
PS: readLines(file_name, n = 2459470) takes 10.42731 seconds.
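PPS: and since readLines is quick, another option is to grab just the lines
past the good region and parse only those (a sketch; assumes no quoted
field contains an embedded newline, which, given the unmatched quotes, may
not hold):

lines <- readLines(file_name, n = 2459471)   # header + good records + 5 more
tail5 <- read.csv(text = paste(tail(lines, 5), collapse = "\n"),
                  header = FALSE, col.names = names(first_records))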