[R] re placing <NA> in character column
David Winsemius
dwinsemius at comcast.net
Thu Oct 29 00:06:24 CET 2009
On Oct 28, 2009, at 6:22 PM, Mark W. Miller wrote:
>
>
>
>
> I am guessing that your <NA> is not really an NA_character, but
> rather a factor with a level of "<NA>".
>
> See if str(s1) confirms my suspicions.
>
>> df1 <- read.table(textConnection(" Firstname Lastname Age
> + 1 Bob Smith 20
> + 2 John Clark NA
> + 3 Andy <NA> 40"), header=T)
>> levels(df1$Lastname)
> [1] "<NA>" "Clark" "Smith"
>> levels(df1$Lastname)[1] <- "qqqq"
>> levels(df1$Lastname)
> [1] "qqqq" "Clark" "Smith"
>> df1
> Firstname Lastname Age
> 1 Bob Smith 20
> 2 John Clark NA
> 3 Andy qqqq 40
>
> --
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>
>
>
>
> Dr. Winsemius, I just tried your suggestion of typing str(s1).
> Below is the
> code and the comments that R returned:
>
>> channel <- odbcConnectExcel('u:/test.xls')
>> sqlTables(channel)
>
> TABLE_CAT TABLE_SCHEM TABLE_NAME TABLE_TYPE REMARKS
> 1 u:\\test <NA> Sheet1$ SYSTEM TABLE <NA>
> 2 u:\\test <NA> Sheet2$ SYSTEM TABLE <NA>
> 3 u:\\test <NA> Sheet3$ SYSTEM TABLE <NA>
>
>>
>> s1 <- sqlFetch(channel, "Sheet1")
>>
>> odbcClose(channel)
>>
>> s1
>
> Firstname Lastname Age
> 1 Bob Smith 20
> 2 John Clark NA
> 3 Andy <NA> 40
>
>
>> str(s1)
>
> 'data.frame': 3 obs. of 3 variables:
> $ Firstname: Factor w/ 3 levels "Andy","Bob","John": 2 3 1
> $ Lastname : Factor w/ 2 levels "Clark","Smith": 2 1 NA
My guess was wrong.
> $ Age : num 20 NA 40
>
>> levels(s1$Lastname)
>
> [1] "Clark" "Smith"
>
>>
>
>
>
> When I used the code:
>
> levels(s1$Lastname)[1] <- "qqqq"
>
> R replaced Clark with qqqq and kept <NA> I guess because <NA> is not
> considered a level here:
>
>
>> levels(s1$Lastname)[1] <- "qqqq"
When I did that on the dataset I already had examined "df1". I knew
that <NA> was a level. In your case is was not a named level. So you
should not have expected the same outcome as with my efforts.
I am unable to explain why your methods did not work with that
structure:
This seems to work:
> df1$Lastname <- ifelse(is.na(df1$Lastname), "qqqq", df1$Lastname)
> df1
Firstname Lastname Age
1 Bob 3 20
2 John 2 NA
3 Andy qqqq 40
>> s1
> Firstname Lastname Age
> 1 Bob Smith 20
> 2 John qqqq NA
> 3 Andy <NA> 40
>>
>
>
>
>
> --
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
More information about the R-help
mailing list