[R] strange behavior when reading csv - line wraps
Martin Tomko
martin.tomko at geo.uzh.ch
Sat May 30 10:32:11 CEST 2009
Jim,
the two lines I posted are the actual problematic input lines.
In these examples there are no quotes or # signs, although I have no
means of making sure they do not occur in the inputs (any hints on how I
could deal with that?).
I am trying to avoid pre-processing outside R as much as possible, and I
have to process about 500 files with up to 3000 records each, so I need
a more or less automated batch solution; any string substitution will
therefore have to happen in R. But for the moment I do not see a reason
for substitution, and the wrapping still occurs.
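Checking a whole batch for such characters can be done without leaving R. The following is a minimal sketch, assuming all input files sit in one directory and end in `.csv`; `find_special_chars` is a hypothetical helper name, not an existing function. It only reports which lines contain a double quote, a single quote or a `#`, so they can be inspected before deciding whether any substitution is needed at all.

```r
# Hypothetical helper: for every .csv file in `dir`, list the line numbers
# that contain a double quote, a single quote or a "#".
find_special_chars <- function(dir, pattern = "\\.csv$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  hits <- lapply(files, function(f) {
    x <- readLines(f, warn = FALSE)
    which(grepl("[\"'#]", x))  # indices of suspicious lines
  })
  names(hits) <- basename(files)
  hits[lengths(hits) > 0]  # keep only files with at least one hit
}

# Usage (the directory name is an assumption):
# find_special_chars("path/to/input")
```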
Cheers
Martin
jim holtman wrote:
> You need to supply the actual input line so we can see what is
> happening. Are you sure you do not have unbalanced quotes in your
> input (try quote='') or do you have comment characters ("#") in your
> input?
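For reference, the defaults of the two readers differ: `read.csv` treats only the double quote as a quote character and uses no comment character (`comment.char = ""`), whereas `read.table` treats both quote types as quotes and `#` as the start of a comment. This can be confirmed on any R installation by inspecting the function defaults:

```r
# Default quote and comment characters of the two readers.
formals(read.csv)[c("quote", "comment.char")]    # "\"" and ""
formals(read.table)[c("quote", "comment.char")]  # "\"'" and "#"
```

So with `read.csv`, an apostrophe or a `#` inside a tag should not by itself cause trouble; passing `quote = ""` additionally disables the double quote.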
>
> On Fri, May 29, 2009 at 3:15 PM, Martin Tomko <martin.tomko at geo.uzh.ch>
> wrote:
>
> Dear All,
> I am observing a strange behavior and searching the archives and
> help pages didn't help much.
> I have a csv with a variable number of fields in each line.
>
> I use
> dataPoints <- read.csv(inputFile, header = FALSE, sep = ";", fill = TRUE)
>
> to read it in, and it works. But some lines are long and 'wrap',
> i.e. they are split and continue on the next line. So when I check
> the dim of the data frame, it is not correct, and when I print it I
> can see that such a line has been split into two rows. I checked the
> input file and it is fine.
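A likely explanation is documented in ?read.table: the number of data columns is determined by looking at the first five lines of input (or from `col.names`, if it is given and longer). With `fill = TRUE`, a later record with more fields is not widened; its surplus fields are read as the start of a new row. A small reproduction with made-up data:

```r
# Synthetic input: five short records followed by one longer record.
lines <- c("1;a;b", "2;c;d", "3;e;f", "4;g;h", "5;i;j", "6;k;l;m;n")
df <- read.csv(text = paste(lines, collapse = "\n"),
               header = FALSE, sep = ";", fill = TRUE)
dim(df)   # 7 3: only 3 columns were allocated, so "m;n" became a 7th row
df[6:7, ] # the wrapped record
```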
>
> an example of the input is:
> 37;2175168475;13;8.522729;47.19537;16366682@N00;30;sculpture;bird;tourism;animal;statue;canon;eos;rebel;schweiz;switzerland;eagle;swiss;adler;skulptur;zug;1750;28;tamron;f28;canton;tourismus;vogel;baar;kanton;xti;tamron1750;1750mm;tamron1750mm;400d;rabbitriotnet;
>
> where the last value ends up on the next row of the data frame.
>
> It does not have to be the last value; in the following example,
> the word "kempten" starts the next row:
> 39;167757703;12;10.309295;47.724545;21903142@N00;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavaria;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kuppel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europaeischeunion;germanio;
>
> What could be the reason?
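A common remedy, sketched here under the assumption that every record sits on a single physical line, is to count the fields on every line with `count.fields()` and pass matching `col.names`, so the data frame is as wide as the longest record in the file. `read_ragged` is a hypothetical helper name; `quote = ""` and `comment.char = ""` are set so that quotes or `#` in the tags are read as ordinary characters.

```r
# Hypothetical helper: read a ragged semicolon-separated file, sizing the
# data frame from the longest record rather than from the first five lines.
read_ragged <- function(file, sep = ";") {
  n <- count.fields(file, sep = sep, quote = "", comment.char = "")
  read.csv(file, header = FALSE, sep = sep, fill = TRUE,
           quote = "", comment.char = "",
           col.names = paste0("V", seq_len(max(n, na.rm = TRUE))))
}

# Demo on a temporary file with five short records and one long one.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1;a;b", "2;c;d", "3;e;f", "4;g;h", "5;i;j", "6;k;l;m;n"), tmp)
df <- read_ragged(tmp)
dim(df)  # 6 5
```

Because it takes a file name, the same helper can be applied to a whole batch, e.g. `lapply(list.files(dir, full.names = TRUE), read_ragged)`, where `dir` stands for the input directory. Note that a trailing ";" on a line counts as one extra (empty) field.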
>
> I was thinking about solving the issue by using a different
> separator for the first 7 fields and concatenating all of the
> remaining values into a single string value, but could not figure
> out how to do such a substitution in R. Unfortunately, on my system
> I cannot specify a range for sed...
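The idea above can be done in R without sed: split each line on ";", keep the first seven fields as columns and paste everything after them back into one string. This is a minimal sketch, assuming the first seven fields are the fixed ones and the rest are tags; `split_fixed_and_tags` and the comma used to join the tags are illustrative choices, not part of the original post.

```r
# Hypothetical helper: first `n_fixed` fields as columns, remaining fields
# joined into a single "tags" string.
split_fixed_and_tags <- function(lines, n_fixed = 7, sep = ";", tag_sep = ",") {
  parts <- strsplit(lines, sep, fixed = TRUE)
  fixed <- t(vapply(parts, function(p) {
    # pad short records with NA so every row has n_fixed fields
    p <- c(p, rep(NA_character_, max(0, n_fixed - length(p))))
    p[seq_len(n_fixed)]
  }, character(n_fixed)))
  colnames(fixed) <- paste0("V", seq_len(n_fixed))
  tags <- vapply(parts, function(p) {
    rest <- p[-seq_len(n_fixed)]
    rest <- rest[nzchar(rest)]  # drop empty fields
    paste(rest, collapse = tag_sep)
  }, character(1))
  data.frame(fixed, tags = tags, stringsAsFactors = FALSE)
}

# Usage on a real file (inputFile as in the original post):
# dataPoints <- split_fixed_and_tags(readLines(inputFile))
```

The fixed columns come back as character; they can be converted afterwards, e.g. with `as.numeric()` on the coordinate columns.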
>
> Thanks for any help/pointers
> Martin
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?