[R] managing data and removing lines

Christopher W. Ryan cryan at binghamton.edu
Sat Apr 17 04:45:29 CEST 2010


Tara--

Welcome to R!

Your questions could be answered from a variety of angles. I'd start by 
asking where the n/a's came from. They were in your text file, I 
assume, to represent missing data?  If so, when you imported your data 
from that text file into R, those n/a's were considered (rightfully) to 
be character strings, not numbers, so your columns (variables) ended up 
being what R calls factors, not numeric variables. (From R 4.0.0 on, 
read.csv leaves such columns as character vectors by default rather 
than factors; either way, they are not numeric.)
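
To see the effect on a small scale, here is a minimal sketch. The 
three-row extract and the name csv_text are made up for illustration; 
only the column names come from your data.

```r
# Hypothetical three-row extract; column names follow the original post.
csv_text <- "gdist,m6air
20,24
4,24.5
40,n/a"

d <- read.csv(text = csv_text, header = TRUE)
str(d)               # gdist is read as a number, m6air is not
is.numeric(d$gdist)  # TRUE
is.numeric(d$m6air)  # FALSE: factor (R < 4.0.0) or character (R >= 4.0.0)
```

A single n/a is enough to make the whole column non-numeric, which is 
why a model fitted to such columns does not behave as expected.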

I used the data you provided and saved it in a csv file called 
untitled.csv, with the values in each record separated by commas instead 
of spaces.  Then I did the following (# is the comment symbol):

habitat <- read.csv("untitled.csv", header=TRUE)
head(habitat) # useful command to look at the first few lines of data
str(habitat) # a very useful command to examine structure of an R object
# see that all columns are factors.

R knows what to do with missing data in a csv file: it turns empty 
fields in numeric columns into NA, which is *not* the same as n/a. In R, 
NA means "not available." Sometimes you will also run into NaN, which 
means "not a number."
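
A short sketch of how NA and NaN behave, using made-up values (the 
vector x and the scalar y are not from your data):

```r
x <- c(12.1, NA, 9.7)      # a numeric vector with one missing value
is.na(x)                   # FALSE  TRUE FALSE
mean(x)                    # NA, because one value is unknown
mean(x, na.rm = TRUE)      # 10.9, the mean of the two known values

y <- 0 / 0                 # an undefined arithmetic result
is.nan(y)                  # TRUE
is.na(y)                   # TRUE: NaN also counts as missing
is.nan(NA)                 # FALSE: NA is missing, but it is not NaN
```

Many summary functions take na.rm = TRUE to skip missing values.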

For example, I replaced all the n/a's in the text file with . . . 
nothingness (that is to say, when a value is missing, there will just be 
2 commas in a row), and saved the file as untitled2.csv.  Now:

habitat2 <- read.csv("untitled2.csv", header=TRUE)
head(habitat2)
str(habitat2) # all columns are num for numeric
model <- lm(gdist ~ gair, data=habitat2) # rows with NA are dropped by default
summary(model) # look at the fitted coefficients
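
If you would rather keep the original file with its n/a's, another 
option is the na.strings argument of read.csv, which tells R which 
strings to treat as missing. A minimal sketch, using three columns of 
the five rows you posted, typed inline so it runs on its own (the names 
csv_text, habitat3 and model3 are mine); with the real file you would 
pass the file name instead of text=.

```r
# First five rows of three columns from the posted data, comma separated.
csv_text <- "gdist,gair,m6air
20,8,24
4,13,24.5
30,12.6,25
40,12.6,n/a
40,2,n/a"

# Treat both "n/a" and the usual "NA" as missing values.
habitat3 <- read.csv(text = csv_text, header = TRUE,
                     na.strings = c("n/a", "NA"))
str(habitat3)                # m6air is now numeric
colSums(is.na(habitat3))     # count of missing values in each column

# lm() drops incomplete rows by default (na.action = na.omit).
model3 <- lm(gdist ~ m6air, data = habitat3)
nobs(model3)                 # 3: only the complete rows are used
```

Each model then uses whichever rows are complete for the variables in 
its formula, so models on different pairs of variables may be fitted to 
different numbers of observations.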

Hope this helps get you started.

There's an excellent book by Phil Spector, called Data Manipulation with 
R. I'd recommend it very highly.

--Chris Ryan


Tara Imlay wrote:
> Hi,
> 
> I am very new to R and I've been trying to work through the R book to gain a
> better idea of the code (which is also completely new to me).
> 
> Initially I imported my data from a text file and that seemed to work ok, but
> I'm trying to examine linear relationships between gdist and gair, gdist and
> gsub, m6dist and m6air, etc.
> 
> This didn't work and I think it might have something to do with the n/a's in
> my dataset.
>> habitat
>     gdist gair gsub m6dist m6air m6sub m7dist m7air m7sub m8dist m8air m8sub
> 1      20    8   14   -0.5    24    19      7  12.1  16.1    2.5    12    12
> 2       4   13   15   -0.1  24.5  24.5    0.1  11.4  15.1      2    14    16
> 3      30 12.6 16.4     -3    25    26    2.5   9.7  12.8    0.1  11.5    14
> 4      40 12.6 17.9      1   n/a   n/a    0.1   8.1  15.2      2    16    20
> 5      40    2  1.8      1   n/a   n/a    0.7  10.2  24.1      2    16    19
. . . . . .
> 
> Is there any way to use my old data set with all the n/a's to look at
> relationships between the variables?  Ideally I want to add in more habitat
> variables to this analysis, that will include some categorical data and more
> n/a's since the data collection was not complete with every observation.
> 
> Any help is appreciated.
> 
> Tara