[R] Creating a data frame with data.frame() transforms "integers" into "factors"
António Brito Camacho
toinobc at gmail.com
Sun May 26 17:34:02 CEST 2013
Hello Bert.
I didn't reply to the list because I forgot: I hit reply instead of reply all.
Thanks for your example.
I understand now that I was trying to do something that didn't make sense, and that is why it failed.
I should have used a histogram to graph the frequency of each number of 'posts', instead of going the convoluted way around and trying to do a scatterplot.
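A minimal sketch of the histogram approach, assuming the 'posts' values from the CSV in the original message are typed in as a vector (the names posts and h are only for illustration):

```r
# 'posts' values copied from the CSV in the original message
posts <- c(581L, 281L, 196L, 150L, 282L, 184L, 90L,
           74L, 45L, 20L, 3L, 1L, 345L, 123L)

# plot = FALSE returns the bins and counts without opening a graphics device
h <- hist(posts, plot = FALSE)
h$breaks   # bin edges chosen by hist()
h$counts   # number of users whose post count falls in each bin

# to draw the histogram in an interactive session:
# hist(posts, main = "Posts per user", xlab = "posts")
```

hist() bins the raw integers directly, so no intermediate table() or data frame is needed.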
I now understand that table() turns each distinct value of the variable into a level of a "factor" and counts how many times it shows up. It makes sense that these values are kept as "factor" labels in the data frame, because they are not a quantity but a representation of the number.
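If the labels are needed as numbers again, one possible way is to convert them back through as.character(). This is only a sketch, assuming the same 'posts' values as in the original CSV; the names w and posts_int are made up for illustration:

```r
# 'posts' values copied from the CSV in the original message
posts <- c(581L, 281L, 196L, 150L, 282L, 184L, 90L,
           74L, 45L, 20L, 3L, 1L, 345L, 123L)

# table() counts each distinct value; data.frame() keeps the values as factor labels
w <- data.frame(table(posts))
class(w$posts)       # "factor"

# convert via the labels to recover the numbers;
# as.integer(w$posts) alone would return the internal level codes 1, 2, 3, ...
w$posts_int <- as.integer(as.character(w$posts))
class(w$posts_int)   # "integer"
```

The original factor column is left in place, so both the labels and the recovered integers stay available.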
Thanks for the help. Problem solved.
António Brito Camacho
On 26/05/2013, at 15:00, Bert Gunter <gunter.berton at gene.com> wrote:
> 1. Please always cc. the list; do not reply just to me.
>
> 2. OK, I see. I ERRED. Had you cc'ed the list, someone might have
> pointed this out. The correct example reproduces what you saw.
>
> z<- sample(1:10,30,rep=TRUE)
> table(z)
> w <- data.frame(table(z))
> w
>
>     z Freq
> 1   1    2
> 2   2    3
> 3   3    1
> 4   4    3
> 5   5    5
> 6   6    3
> 7   7    5
> 8   8    4
> 9   9    1
> 10 10    3
>
>> sapply(w,class)
>         z      Freq
>  "factor" "integer"
>
> This is exactly what is expected and documented. See ?table. So the
> question is: What do you expect? table() produces an array whose
> cross-classifying factors are the dimensions. data.frame converts this
> into a data frame. Perhaps the following will help clarify:
>
>> z <- data.frame(fac1= sample(LETTERS[1:3],10,rep=TRUE),
> fac2 = sample(c("j","k"),10,rep=TRUE))
>> z
>    fac1 fac2
> 1     A    k
> 2     B    k
> 3     C    k
> 4     C    k
> 5     B    k
> 6     C    k
> 7     C    k
> 8     A    j
> 9     A    j
> 10    C    j
>
>> table(z)
>
>     fac2
> fac1 j k
>    A 2 1
>    B 0 2
>    C 1 4
>
>> data.frame(table(z))
>
>   fac1 fac2 Freq
> 1    A    j    2
> 2    B    j    0
> 3    C    j    1
> 4    A    k    1
> 5    B    k    2
> 6    C    k    4
>
>> table(z['fac1'])
>
> A B C
> 3 2 5
>
>> data.frame(table(z['fac1']))
>   Var1 Freq
> 1    A    3
> 2    B    2
> 3    C    5
>
> Cheers,
> Bert
>
> On Sat, May 25, 2013 at 6:54 PM, António Camacho <toinobc at gmail.com> wrote:
>> Hello Bert
>> Thanks for your prompt reply.
>> I tried your example and it worked without a problem.
>>
>> But what I want is to create a data frame from the output of the function
>> table(), so in your example I tried "sapply(data.frame(tbl),class)" and the
>> output was z --> factor and Freq --> integer.
>> What is happening in the table() function that transforms the integers
>> in z into values with labels? When I do "names(tbl)" it returns each
>> value of z as a name.
>>
>> I read the manual for "[" but I didn't understand it completely. I have to
>> read the introduction to R more carefully.
>>
>> I also tried using "[,", "[[" and "$" to extract the values from
>> the 'posts' column, but the problem persisted.
>>
>> Like I said, this code was taken from an example on a webpage. I contacted
>> the author and he confirmed that the code worked on his machine, which was
>> running R 2.15.1.
>> Maybe something changed in data.frame() between versions?
>>
>> I really don't understand what I am doing wrong.
>>
>> António
>>
>> On 2013/05/26, at 01:44, Bert Gunter wrote:
>>
>>> Huh?
>>>
>>>> z <- sample(1:10,30,rep=TRUE)
>>>> tbl <- table(z)
>>>> tbl
>>>
>>> z
>>>  1  2  3  4  5  6  7  8  9 10
>>>  4  3  2  6  3  3  2  2  2  3
>>>>
>>>> data.frame(z)
>>>
>>>     z
>>> 1   5
>>> 2   2
>>> 3   4
>>> 4   1
>>> 5   6
>>> 6   4
>>> 7  10
>>> 8   4
>>> 9   3
>>> 10  8
>>> 11 10
>>> 12  4
>>> 13  3
>>> 14  9
>>> 15  2
>>> 16  2
>>> 17  6
>>> 18  1
>>> 19  4
>>> 20  7
>>> 21  9
>>> 22 10
>>> 23  7
>>> 24  5
>>> 25  5
>>> 26  6
>>> 27  8
>>> 28  1
>>> 29  1
>>> 30  4
>>>>
>>>> sapply(data.frame(z),class)
>>>
>>> z
>>> "integer"
>>>
>>> Your error: you used df['posts'] . You should have used df[,'posts'] .
>>>
>>> The former is a data frame. The latter is a vector. Read the
>>> "Introduction to R tutorial" or ?"[" if you don't understand why.
>>>
>>> -- Bert
>>>
>>> On Sat, May 25, 2013 at 12:36 PM, António Camacho <toinobc at gmail.com>
>>> wrote:
>>>>
>>>> Hello
>>>>
>>>>
>>>> I am a novice at R and I was learning how to do a scatter plot with R using
>>>> an example from a website.
>>>>
>>>> My setup is an iMac with Mac OS X 10.8.3, with R 3.0.1, default install,
>>>> without additional packages loaded.
>>>>
>>>> I created a .csv file in vim with the following content:
>>>> userID,user,posts
>>>> 1,user1,581
>>>> 2,user2,281
>>>> 3,user3,196
>>>> 4,user4,150
>>>> 5,user5,282
>>>> 6,user6,184
>>>> 7,user7,90
>>>> 8,user8,74
>>>> 9,user9,45
>>>> 10,user10,20
>>>> 11,user11,3
>>>> 12,user12,1
>>>> 13,user13,345
>>>> 14,user14,123
>>>>
>>>> I imported the file into R using: 'df <- read.csv("file.csv")'
>>>> To confirm the data types I did: 'sapply(df, class)'
>>>> That returns "userID" --> "integer"; "user" --> "factor"; "posts" -->
>>>> "integer".
>>>> Then I tried to create another data frame with the number of posts and its
>>>> frequencies,
>>>> so I did: 'postFreqCount <- data.frame(table(df["posts"]))'
>>>> This gives me the postFreqCount data frame with two columns, one called
>>>> 'Var1' that has the number of posts each user made, and another column,
>>>> 'Freq', with the frequency of each number of posts.
>>>> The problem is that if I do: 'sapply(postFreqCount["Var1"], class)' it
>>>> returns "factor".
>>>> So the data.frame() function transformed a variable that was "integer"
>>>> (posts) into a variable (Var1) that has the same values but is "factor".
>>>> I want to know how to prevent this from happening. How do I keep the
>>>> values from being transformed from "integer" to "factor"?
>>>>
>>>> Thank you for your help
>>>>
>>>> António
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>>
>>> Internal Contact Info:
>>> Phone: 467-7374
>>> Website:
>>>
>>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>>
>>
>