[R] managing data and removing lines
Christopher W. Ryan
cryan at binghamton.edu
Sat Apr 17 04:45:29 CEST 2010
Tara--
Welcome to R!
Your questions could be answered from a variety of angles. I'd start by
asking where the n/a's came from. I assume they were in your text file
to represent missing data. If so, when you imported your data from that
text file into R, those n/a's were (rightly) treated as character
strings, not numbers, so the columns (variables) containing them ended
up being what R calls factors rather than numeric variables. (Since R
4.0.0, read.csv() no longer converts text to factors by default, so such
columns come in as character vectors instead. Either way, they are not
numeric.)
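If you would rather keep your original file with all its n/a's, read.csv() can be told that n/a is a missing-value code through its na.strings argument. A minimal sketch, assuming the first five rows and first six columns you posted are inlined as CSV text (the text= argument stands in for a real file name such as "yourfile.csv"):

```r
# First five rows / six columns from the post, inlined so the example
# needs no file on disk.
csv_text <- "gdist,gair,gsub,m6dist,m6air,m6sub
20,8,14,-0.5,24,19
4,13,15,-0.1,24.5,24.5
30,12.6,16.4,-3,25,26
40,12.6,17.9,1,n/a,n/a
40,2,1.8,1,n/a,n/a"

# Without na.strings, "n/a" is just text, so m6air and m6sub are not numeric.
raw <- read.csv(text = csv_text, header = TRUE)
str(raw)

# Declare both the default "NA" and "n/a" as missing-value codes;
# the affected columns are then read as numeric with NA in place of n/a.
habitat <- read.csv(text = csv_text, header = TRUE,
                    na.strings = c("NA", "n/a"))
str(habitat)
```

With this, there is no need to edit the text file by hand before importing it.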
I used the data you provided and saved it in a CSV file called
untitled.csv, with the values in each record separated by commas instead
of spaces. Then I did the following (# is the comment symbol):
habitat <- read.csv("untitled.csv", header=TRUE)
head(habitat) # useful command to look at the first few lines of data
str(habitat) # a very useful command to examine structure of an R object
# see that all columns are factors.
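If the data are already in R as factors, they can still be converted, but not with a plain as.numeric(), which returns the factor's internal level codes. A sketch, assuming a single hypothetical column built from the m6air values you posted:

```r
# Hypothetical: one column as it might look after importing without na.strings.
m6air <- factor(c("24", "24.5", "25", "n/a", "n/a"))

# Pitfall: as.numeric() on a factor gives the level codes (1, 2, ...),
# not the numbers printed on screen.
as.numeric(m6air)

# Convert through the level labels instead. "n/a" cannot be parsed as a
# number, so it becomes NA; suppressWarnings() hides the coercion warning.
m6air_num <- suppressWarnings(as.numeric(as.character(m6air)))
m6air_num
```

The same as.numeric(as.character(...)) idiom can be applied to each factor column of a data frame, for example with lapply().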
R knows what to do with empty fields in a CSV file: in numeric columns,
read.csv() turns them into NA, which is *not* the same as the string
n/a. In R, NA means "not available." Sometimes you will also run into
NaN, which means "not a number" (for example, the result of 0/0).
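A quick way to see the difference between NA, NaN, and the text "n/a" at the R prompt. This is a small illustrative sketch; the vector x is made up for the purpose:

```r
x <- c(1, NA, 0/0)      # 0/0 gives NaN, "not a number"
is.na(x)                # NA and NaN both count as missing
is.nan(x)               # only the third element is NaN
is.na("n/a")            # just a character string, so not missing
mean(x)                 # missing values propagate into the result
mean(x, na.rm = TRUE)   # drop them explicitly
```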
For example, I replaced all the n/a's in the text file with . . .
nothingness (that is, wherever a value is missing there are just two
commas in a row) and saved the file as untitled2.csv. Now:
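The effect of blank fields can be checked without creating untitled2.csv. A sketch, assuming a few of your posted rows and columns with the n/a's replaced by nothing, inlined as text:

```r
# Four columns from the posted rows; missing values are now empty fields.
csv_blank <- "gdist,m6dist,m6air,m6sub
20,-0.5,24,19
4,-0.1,24.5,24.5
40,1,,
40,1,,"

blank_demo <- read.csv(text = csv_blank, header = TRUE)
str(blank_demo)               # all columns numeric
colSums(is.na(blank_demo))    # number of missing values in each column
```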
habitat2 <- read.csv("untitled2.csv", header=TRUE)
head(habitat2)
str(habitat2) # all columns are now num, i.e. numeric
model <- lm(gdist ~ gair, data=habitat2) # rows with NA in gdist or gair are dropped by default
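To see that lm() simply leaves out incomplete rows, and to look at many pairwise linear relationships at once, here is a sketch assuming a hypothetical data frame built from four of your columns for the five posted rows (n/a already turned into NA):

```r
# Hypothetical subset of the posted data, with n/a converted to NA.
habitat_head <- data.frame(
  gdist  = c(20, 4, 30, 40, 40),
  gair   = c(8, 13, 12.6, 12.6, 2),
  m6dist = c(-0.5, -0.1, -3, 1, 1),
  m6air  = c(24, 24.5, 25, NA, NA)
)

# In a default R session, lm() uses na.action = na.omit, so rows with NA
# in any variable of the formula are dropped; this fit uses 3 rows.
model <- lm(m6dist ~ m6air, data = habitat_head)
nobs(model)
summary(model)

# cor() must be told how to handle NA: by default any pair involving
# m6air is NA, while "pairwise.complete.obs" uses every row that is
# complete for each pair of columns.
cor(habitat_head)
cor(habitat_head, use = "pairwise.complete.obs")
```

Keep in mind that with pairwise deletion, each correlation may be based on a different set of rows.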
Hope this helps get you started.
There's an excellent book by Phil Spector called Data Manipulation with
R. I'd recommend it very highly.
--Chris Ryan
Tara Imlay wrote:
> Hi,
>
> I am very new to R and I've been trying to work through the R book to gain a
> better idea of the code (which is also completely new to me).
>
> Initially I imported my data from a text file and that seemed to work ok, but
> I'm trying to examine linear relationships between gdist and gair, gdist and
> gsub, m6dist and m6air, etc.
>
> This didn't work and I think it might have something to do with the n/a's in
> my dataset.
>> habitat
> gdist gair gsub m6dist m6air m6sub m7dist m7air m7sub m8dist m8air m8sub
> 1 20 8 14 -0.5 24 19 7 12.1 16.1 2.5 12 12
> 2 4 13 15 -0.1 24.5 24.5 0.1 11.4 15.1 2 14 16
> 3 30 12.6 16.4 -3 25 26 2.5 9.7 12.8 0.1 11.5 14
> 4 40 12.6 17.9 1 n/a n/a 0.1 8.1 15.2 2 16 20
> 5 40 2 1.8 1 n/a n/a 0.7 10.2 24.1 2 16 19
. . . . . .
>
> Is there any way to use my old data set with all the n/a's to look at
> relationships between the variables? Ideally I want to add in more habitat
> variables to this analysis, that will include some categorical data and more
> n/a's since the data collection was not complete with every observation.
>
> Any help is appreciated.
>
> Tara
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.