[R] Convert list to data frame while controlling column types

Alexander Shenkin ashenkin at ufl.edu
Mon Aug 24 15:59:40 CEST 2009


On 8/24/2009 2:06 AM, Petr PIKAL wrote:
> Hi
> 
> r-help-bounces at r-project.org wrote on 23.08.2009 17:29:48:
> 
>> On 8/23/2009 9:58 AM, David Winsemius wrote:
>>> I still have problems with this statement. As I understand R, this
>>> should be impossible. I have looked at both your postings and neither of
>>> them clarifies the issue. How can you have blanks or spaces in an R
>>> numeric vector?
>>
>>
>> Just because I search numeric columns doesn't mean that my regex matches
>> them!  I posted some info on my data frame in an earlier email:
>>
>>     str(final_dataf)
>>     'data.frame':   1127 obs. of  43 variables:
>>      $ block      : Factor w/ 1 level "2": 1 1 1 1 1 1 1 1 1 1 ...
>>      $ treatment  : Factor w/ 4 levels "I","M","N","T": 1 1 1 1 1 1 ...
>>      $ transect   : Factor w/ 1 level "4": 1 1 1 1 1 1 1 1 1 1 ...
>>      $ tag        : chr  NA "121AL" "122AL" "123AL" ...
>>     ...
>>      $ h1         : num  NA NA NA NA NA NA NA NA NA NA ...
>>     ...
>>
>> You can see that I do indeed have some numeric columns.  And while I
> 
> Well, AFAICS you have a data frame with 3 columns which are factors and 1 
> which is character. I do not see any numeric column. If you want to change 
> block and transect to numeric you can use
> 
> df$block <- as.numeric(as.character(df$block))
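
For anyone reading this in the archive: the as.character() step matters because as.numeric() applied directly to a factor returns the internal level codes rather than the labels. A minimal sketch with a made-up factor (not the poster's data):

```r
# Hypothetical factor whose labels look like numbers.
f <- factor(c("10", "20", "10"))

codes  <- as.numeric(f)                # internal level codes: 1 2 1
values <- as.numeric(as.character(f))  # the labels as numbers: 10 20 10

print(codes)
print(values)
```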

If you take a closer look at my data frame listing, you'll see that it
is "1127 obs. of  43 variables".  I edited the column listing for
readability, and you'll see that even in my edited listing I do indeed
have one numeric column - "h1".  And as I mentioned earlier, I use
colClasses, so there is no need to change anything to numeric here.
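
To make the colClasses point concrete, here is a small sketch of reading a file with the column types fixed up front. The inline CSV text and its three columns are invented stand-ins, not the real 43-column file:

```r
# Hypothetical inline CSV standing in for the real data file.
csv_text <- "block,tag,h1\n2,121AL,1.5\n2,,\n2,123AL,3.25\n"

# colClasses fixes each column's type at read time, in column order.
dat <- read.csv(text = csv_text,
                colClasses = c("factor", "character", "numeric"))

str(dat)
# The empty h1 field is read as NA, while the empty tag field stays ""
# because blank character fields are not treated as missing by default.
```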

> 
>> search them for spaces, I only do so because my dataset isn't so large
>> as to require me to exclude them from the search.  If my dataset grows
>> too big at some point, I will exclude numeric columns, and other columns
>> which cannot hold blanks or spaces.
>>
>> To clarify further with an example:
>>
>>> df = data.frame(a=c(1,2,3,4,5),b=c("a","","c","d"," "))
>>> df = as.data.frame(lapply(df, function(x){ is.na(x) <-
>> + grep('^\\s*$',x); return(x) }), stringsAsFactors = FALSE)
>>> df
>>   a    b
>> 1 1    a
>> 2 2 <NA>
>> 3 3    c
>> 4 4    d
>> 5 5 <NA>
> 
> which can be done also by
> levels(df[,2])[1:2] <- NA
> 
> but maybe with less generalization

Yes - my point was to show how I looped through an entire data frame
looking for \\s*, even when some of the columns were numeric.  I gave
this simple example with a 2-column data frame to illustrate that point.
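
A compact variant of that loop, as a sketch only: it assigns back with df[] <- instead of wrapping the result in as.data.frame(), which is a different route from the one I posted but should keep the column types in the same way. The helper name blank_to_na is made up for illustration.

```r
# Same toy data as above, with strings kept as character.
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c("a", "", "c", "d", " "),
                 stringsAsFactors = FALSE)

# Hypothetical helper: mark empty or whitespace-only entries as NA.
# grep() coerces a numeric column to text only for matching, so the
# column itself is returned untouched when nothing matches.
blank_to_na <- function(x) {
  is.na(x) <- grep("^\\s*$", x)
  x
}

# df[] <- keeps df a data frame and replaces the columns in place.
df[] <- lapply(df, blank_to_na)
str(df)
```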

>>> str(df)
>> 'data.frame':   5 obs. of  2 variables:
>>  $ a: num  1 2 3 4 5
>>  $ b: Factor w/ 5 levels ""," ","a","c",..: 3 NA 4 5 NA
>>
>> And one final clarification: I left out "as.data.frame" in my previous
>> solution.  So it now becomes:
>>
>>> final_dataf = as.data.frame(lapply(final_dataf, function(x){ is.na(x)
>> + <- grep('^\\s*$',x); return(x) }), stringsAsFactors = FALSE)
> 
> Again, not much of a clarification: in your first data frame the second 
> column is a factor with some levels you want to convert to NA, which can 
> be done by different approaches.

This clarification was to show the code that worked (for posterity), as
my previous post left out an argument.  It seems that perhaps you missed
the previous emails.

> Your final_dataf is same as df.

Yes, that is the point.  As I mentioned in the first email of this
thread, I was trying to get around as.data.frame's automatic conversion
routines, in order to retain the original column types.  And it turned
out that gsub() was more of the problem than as.data.frame() was.
Please refer to the earlier emails for more information on that.
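
For the record, a short sketch of the gsub() behaviour I mean, using a made-up numeric vector: gsub() always returns a character vector, even when nothing matches, whereas is.na() <- leaves the type alone.

```r
# Hypothetical numeric column.
x <- c(1.5, 2, 3.25)

# gsub() coerces its input to character and returns character.
y <- gsub("^\\s*$", NA_character_, x)
class(y)

# is.na() <- only marks the matched positions, so the type survives.
z <- x
is.na(z) <- grep("^\\s*$", z)
class(z)
```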

> Columns which should be numeric but are read as factor/character by 
> read.table likely contain some values which, strictly speaking, cannot be 
> considered numeric. You see them quite often in Excel-like programs; some 
> examples are
> 
> 1..2, o.5, 12.o5, and/or spaces, "-", etc.
> 
> and you usually need to handle them by hand.
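
One way to locate such entries before fixing them by hand, sketched with an invented vector built from Petr's examples: convert, then list the values that failed to convert.

```r
# Hypothetical raw column as read in character form.
raw <- c("1.2", "1..2", "o.5", "12.o5", " ", "-", "3")

# Entries that are not valid numbers become NA; the coercion warning
# is suppressed because those failures are exactly what we want to find.
num <- suppressWarnings(as.numeric(raw))

# Values that were present in the raw data but failed to convert.
bad <- raw[is.na(num) & !is.na(raw)]
print(bad)
```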
> 
> Regards
> Petr
> 
>> Hope that clarifies things, and thanks for your help.
>>
>> Thanks,
>> Allie
>>
>>
>> On 8/23/2009 9:58 AM, David Winsemius wrote:
>>> On Aug 23, 2009, at 2:47 AM, Alexander Shenkin wrote:
>>>
>>>> On 8/21/2009 3:04 PM, David Winsemius wrote:
>>>>> On Aug 21, 2009, at 3:41 PM, Alexander Shenkin wrote:
>>>>>
>>>>>> Thanks everyone for their replies, both on- and off-list.  I should
>>>>>> clarify, since I left out some important information.  My original
>>>>>> dataframe has some numeric columns, which get changed to character by
>>>>>> gsub when I replace spaces with NAs.
>>>>> If you used is.na() <-  that would not happen to a true _numeric_ vector
>>>>> (but, of course, a numeric vector in a data.frame could not have spaces,
>>>>> so you are probably not using precise terminology).
>>>> I do have true numeric columns, but I loop through my entire dataframe
>>>> looking for blanks and spaces for convenience.
>>> I still have problems with this statement. As I understand R, this
>>> should be impossible. I have looked at both your postings and neither of
>>> them clarifies the issue. How can you have blanks or spaces in an R
>>> numeric vector?
>>>
>>>
>>>>> You would be well
>>>>> advised to include the actual code rather than applying loose
>>>>> terminology subject to your and our misinterpretation.
>>>> I did include code in my previous email.  Perhaps you were looking for
>>>> different parts.
>>>>
>>>>> ?is.na
>>>>>
>>>>>
>>>>> I am guessing that you were using read.table() on the original data, in
>>>>> which case you should look at the colClasses parameter.
>>>>>
>>>> yep - I use read.csv, and I do use colClasses.  But as I mentioned
>>>> earlier, gsub converts those columns to characters.  Thanks for the tip
>>>> about is.na() <-.  I'm now using the following, thus side-stepping the
>>>> whole "controlling as.data.frame's column conversion" issue:
>>>>
>>>> final_dataf = lapply(final_dataf, function(x){ is.na(x) <-
>>>> + grep('^\\s*$',x); return(x) })
>>>
>>> Good that you have a solution.
>>>
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>>