[R] "haven" - read_spss: How to avoid extracting value labels instead of long labels?
Ista Zahn
istazahn at gmail.com
Fri Nov 13 16:00:58 CET 2015
Why do you think this is a bug in haven? To the contrary, I don't think
this has anything to do with haven at all. The problem seems to be
that attr() does partial matching on attribute names by default. Check it out:
> x <- 1:3
> attr(x, "labels") <- c("foo", "bar", "baz")
> attr(x, "label")
[1] "foo" "bar" "baz"
and see ?attr for details.
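To make the difference concrete, here is a minimal sketch. The vector x below is a made-up stand-in for a haven column that carries value labels but no long label; it is not taken from your file.

```r
# Hypothetical column: value labels only, no "label" attribute
x <- c(1, 2, 1)
attr(x, "labels") <- c("VERY/SOMEWHAT FAMILIAR" = 1, "NOT AT ALL FAMILIAR" = 2)

# Default lookup: "label" partially matches "labels", so the value labels come back
partial <- attr(x, "label")

# Exact lookup: there is no attribute named exactly "label", so the result is NULL
exact <- attr(x, "label", exact = TRUE)

print(partial)
print(exact)
```

When a column has both "label" and "labels", the exact match wins, which is why the problem only shows up for columns without a long label.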
The answer, I think, is to ask for an exact match:
fix_labels <- function(x, TextIfMissing) {
  # exact = TRUE stops "label" from partially matching "labels"
  val <- attr(x, "label", exact = TRUE)
  if (is.null(val)) TextIfMissing else val
}
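As a rough usage sketch, assuming a small made-up data frame in place of the result of read_spss() (the column names are borrowed from your output, the values are invented), you could also swap sapply for vapply so that the result is guaranteed to be a character vector rather than a list:

```r
fix_labels <- function(x, TextIfMissing) {
  val <- attr(x, "label", exact = TRUE)
  if (is.null(val)) TextIfMissing else val
}

# Hypothetical stand-in for a data frame imported with read_spss()
df <- data.frame(VEHRATED = c(1, 2), AWARE2 = c(1, 2))
attr(df$VEHRATED, "label") <- "VEHICLE RATED"
attr(df$AWARE2, "labels") <- c("VERY/SOMEWHAT FAMILIAR" = 1,
                               "NOT AT ALL FAMILIAR" = 2)

# vapply errors out if any column yields something other than one string,
# so a stray value-label vector cannot silently turn the result into a list
longlabels <- vapply(df, fix_labels, character(1),
                     TextIfMissing = "NO LABEL IN SPSS")
print(longlabels)
```

The choice of vapply here is my suggestion, not something required for the fix; sapply works too once the exact match is in place.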
Finally, note that the development version of rio
(https://github.com/leeper/rio) has a (non-exported) function for
cleaning up metadata from haven imports. See
https://github.com/leeper/rio/blob/master/R/utils.R#L86
Best,
Ista
On Thu, Nov 12, 2015 at 8:37 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> I have to rephrase my question again - it's clearly a small bug in
> haven. Here is what it is about:
>
> If I have a column in SPSS that has BOTH a long label and value
> labels, then everything works fine - I access one with 'label' and
> another with 'labels':
>
> attr(spss1$MYVAR, "label")
> [1] "LONG LABEL"
> attr(spss1$MYVAR, "labels")
>     DEFINITELY CONSIDER       PROBABLY CONSIDER   PROBABLY NOT CONSIDER DEFINITELY NOT CONSIDER
>                       1                       2                       3                       4
>
> However, if I have a column that has no long label and ONLY value
> labels, then it's not working properly:
>
>> attr(spss1$MYVAR, "label")
> VERY/SOMEWHAT FAMILIAR NOT AT ALL FAMILIAR
> 1 2
>> attr(spss1$MYVAR, "labels")
> VERY/SOMEWHAT FAMILIAR NOT AT ALL FAMILIAR
> 1 2
>
> And I actually need to be able to identify if label is empty.
> Thank you for looking into it!
>
> Dimitri
>
>
> On Thu, Nov 12, 2015 at 5:55 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> Looks like a little bug in 'haven':
>>
>> When I actually look at the attributes of one variable that has no
>> long label in SPSS but has Value Labels, I am getting:
>> attr(spss1$WAVE, "label")
>> NULL
>>
>> But when I sapply my function longlabels to my data frame and ask it
>> to print the long labels for each column, for the same column "WAVE" I
>> am getting - instead of NULL:
>> NULL
>> VERY/SOMEWHAT FAMILIAR NOT AT ALL FAMILIAR
>> 1 2
>>
>> This is, of course, incorrect, because it grabs the next attribute
>> (which one? And replaces NULL with it).
>> Any suggestions?
>> Thanks!
>>
>>
>>
>>
>> On Thu, Nov 12, 2015 at 11:56 AM, Dimitri Liakhovitski
>> <dimitri.liakhovitski at gmail.com> wrote:
>>> Hello!
>>>
>>> I don't have an example file, but I think my question should be clear
>>> without it.
>>> I have an SPSS file. I read it in using 'haven':
>>>
>>> library(haven)
>>> spss1 <- read_spss("SPSS_Example.sav")
>>>
>>> I created a function that extracts the long labels (in SPSS - "Label"):
>>>
>>> fix_labels <- function(x, TextIfMissing) {
>>> val <- attr(x, "label")
>>> if (is.null(val)) TextIfMissing else val
>>> }
>>> longlabels <- sapply(spss1, fix_labels, TextIfMissing = "NO LABEL IN SPSS")
>>>
>>> This function is supposed to create a vector of long labels and
>>> usually it does, e.g.:
>>>
>>> str(longlabels)
>>> Named chr [1:64] "Serial number" ...
>>> - attr(*, "names")= chr [1:64] "Respondent_Serial" "weight" "r7_1" "r7_2" ...
>>>
>>> However, I just got an SPSS file with 92 columns and ran exactly the
>>> same function on it. Now, I am getting not a vector, but a list
>>>
>>> str(longlabels)
>>> List of 92
>>> $ VEHRATED : chr "VEHICLE RATED"
>>> $ RESPID : chr "RESPONDENT ID"
>>> $ RESPID8 : chr "8 DIGIT RESPONDENT NUMBER"
>>>
>>> An observation about the structure of longlabels here: for those columns
>>> that do NOT have a long label in SPSS but DO have Values (value
>>> labels), my function grabs their value labels, so that now
>>> my long label is recorded as a numeric vector with names, e.g.:
>>>
>>> $ AWARE2 : Named num [1:2] 1 2
>>> ..- attr(*, "names")= chr [1:2] "VERY/SOMEWHAT FAMILIAR" "NOT AT ALL FAMILIAR"
>>>
>>> Question: How could I avoid the extraction of the Value Labels for the
>>> columns that have no long labels?
>>>
>>> Thank you very much!
>>> --
>>> Dimitri Liakhovitski
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>
>
>
> --
> Dimitri Liakhovitski
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.