[R] problems with function read.table
Petr PIKAL
petr.pikal at precheza.cz
Fri Sep 9 09:23:14 CEST 2011
Hi
>
> Hi,
>
> If you read carefully the help pages for read.table you get this:
>
>
> na.strings: a character vector of strings which are to be interpreted as
> NA values.
> Blank fields are also considered to be missing values in logical, integer,
> numeric and complex fields.
>
> So both "NA" entries and blank fields are treated as NA directly by
> read.table.
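A minimal sketch of the help-page behaviour quoted above. It uses a small hypothetical data set passed through read.table()'s text argument instead of a file. The empty field in the integer column b becomes NA, while the empty field in the character column lab (a hypothetical name) is kept as an empty string "", because the rule only covers logical, integer, numeric and complex fields.

```r
# Hypothetical inline data: row 2 has an empty field in numeric column b,
# row 3 has an empty field in character column lab
txt <- "a,b,lab
1,3,x
5,,y
9,7,"

d <- read.table(text = txt, header = TRUE, sep = ",")
str(d)   # b is integer with NA in row 2; lab keeps "" in row 3
```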
>
> Once you have imported your data, you can modify it with any of the string
> manipulation functions (sub() or gsub()) to change your "#DIV/0!" to the
> string "NA". Another option is to handle the division by zero in your Excel
> file with an IF formula that returns NA when it occurs.
The only problem is that in that case every column containing "#DIV/0!" is
converted to a factor, and you need to convert it back to numeric.
The read.* functions accept for na.strings not only a single value but
also a vector of values, so you can get rid of all non-numeric and other
odd Excel values by listing them in na.strings in the read.table call.
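Converting such a column back to numeric has a well-known pitfall: as.numeric() applied to a factor returns its internal level codes, not the printed values. A minimal sketch, using hypothetical values like those in column r below, of the gsub() route suggested above and of a direct conversion through as.character(). Whether the column arrives as a factor depends on stringsAsFactors, whose default changed to FALSE in R 4.0.0.

```r
# Hypothetical column r as read without "#DIV/0!" in na.strings
r <- factor(c("0.333333333", "1", "1.285714286", "#DIV/0!", "#DIV/0!", "0.5"))

# Pitfall: on a factor, as.numeric() gives the level codes, not the values
codes <- as.numeric(r)

# Route 1: replace the Excel string with "NA" via gsub(), then convert.
# fixed = TRUE matches "#DIV/0!" literally rather than as a regular expression.
r1 <- as.numeric(gsub("#DIV/0!", "NA", as.character(r), fixed = TRUE))

# Route 2: convert directly; strings that are not numbers become NA
# (R warns "NAs introduced by coercion", silenced here)
r2 <- suppressWarnings(as.numeric(as.character(r)))
```

Both routes give the same numeric vector. Listing "#DIV/0!" in na.strings at import time makes either step unnecessary.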
> x <- read.delim("clipboard")
> str(x)
'data.frame': 6 obs. of 3 variables:
$ a: int 1 5 9 8 6 3
$ b: int 3 5 7 0 NA 6
$ r: Factor w/ 5 levels "#DIV/0!","0.333333333",..: 2 4 5 1 1 3
> y<-read.delim("clipboard", na.strings=c("NA", "#DIV/0!"))
> str(y)
'data.frame': 6 obs. of 3 variables:
$ a: int 1 5 9 8 6 3
$ b: int 3 5 7 0 NA 6
$ r: num 0.333 1 1.286 NA NA ...
>
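The session above reads from the clipboard, so it cannot be re-run as-is. A self-contained sketch of the same idea, with a hypothetical tab-separated data set rebuilt from the str() output above (the last r value, 0.5, is assumed to be 3/6). Note that read.delim() defaults to comment.char = "", whereas read.table() defaults to comment.char = "#", which would treat "#DIV/0!" as the start of a comment rather than as a value.

```r
# Hypothetical tab-separated data: b is blank in row 5, r holds Excel errors
txt <- "a\tb\tr
1\t3\t0.333333333
5\t5\t1
9\t7\t1.285714286
8\t0\t#DIV/0!
6\t\t#DIV/0!
3\t6\t0.5"

# Both "NA" and "#DIV/0!" are read as missing, so r stays numeric
y <- read.delim(text = txt, na.strings = c("NA", "#DIV/0!"))
str(y)
```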
Regards
Petr
>
> And finally, instead of using na.omit, use the option na.rm = TRUE to do
> your calculations:
>
> > mean(c(12,23,24,45,67,NA), na.rm=TRUE)
> [1] 34.2
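Samir's third question (quoted below) asks how to skip NAs without dropping whole rows. A minimal sketch on a hypothetical data frame matching Petr's str() output, contrasting per-column na.rm with na.omit(), plus pairwise correlations and a regression, since regression is the stated goal.

```r
# Hypothetical data matching the str() output in Petr's example
y <- data.frame(a = c(1, 5, 9, 8, 6, 3),
                b = c(3, 5, 7, 0, NA, 6),
                r = c(1/3, 1, 9/7, NA, NA, 0.5))

# Column means, each computed from that column's non-missing values only
m <- colMeans(y, na.rm = TRUE)

# By contrast, na.omit() removes every row with an NA in any column
complete <- na.omit(y)

# Each pair of columns uses the rows where both values are present
cc <- cor(y, use = "pairwise.complete.obs")

# lm() with the default na.action (normally na.omit) drops only rows with
# NAs in the variables used in the formula, here r and a
fit <- lm(r ~ a, data = y)
```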
>
>
>
> Regards,
> Carlos Ortega
> www.qualityexcellence.es
>
> On Thu, Sep 8, 2011 at 4:23 PM, Samir Benzerfa <benzerfa at gmx.ch> wrote:
>
> > Hello everyone
> >
> > I have a couple of questions about the usage of the R function
> > read.table(). My point of departure is that I want to import a matrix
> > (consisting of time and daily stock returns of many stocks) into R. Most
> > of the data is numeric; however, some values are missing (blanks) and in
> > other cases I have the string "#DIV/0!" (from Excel). My goal is to do
> > some regression analysis with this matrix. My questions are the
> > following:
> >
> > 1. How can I, in general, tell R to automatically replace specific
> > numbers or characters in tables with others? (For example, to replace
> > every "#DIV/0!" with the number 0 or simply NA.)
> >
> > 2. How can I tell R to fill blanks with 0 or NA?
> >
> > 3. How can I tell R to omit the NA fields in calculations but not the
> > whole row or column? (I realized that the function na.omit() omits the
> > whole row.)
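Questions 1 and 2 can be sketched together: let na.strings, as suggested in the replies, turn "#DIV/0!" and blanks into NA at import time, and only afterwards replace NA with 0 if that is really wanted. The comma-separated data below are hypothetical. Whether 0 is a sensible stand-in for a failed division is a modelling choice, since zeros enter means and regressions as real values.

```r
# Hypothetical comma-separated Excel export: a blank in b, "#DIV/0!" in r
txt <- "a,b,r
1,3,0.333333333
8,0,#DIV/0!
6,,#DIV/0!
3,6,0.5"

# read.csv() defaults to comment.char = "", so "#DIV/0!" is read as data;
# listing it in na.strings makes it NA, and the blank in b becomes NA as well
z <- read.csv(text = txt, na.strings = c("NA", "#DIV/0!"))

# If zeros are wanted instead of NA, overwrite every NA in one step
z0 <- z
z0[is.na(z0)] <- 0
```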
> >
> >
> >
> > Many thanks for your help!
> >
> >
> >
> > Sincerely,
> >
> > Samir
> >
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>