[R] spss, string factors, selecting
James Reilly
reilly at stat.auckland.ac.nz
Tue Nov 27 09:52:47 CET 2007
It does sound like there could be a problem with the merging process. I
have two questions about your merge command:
chaffmerge2<-merge(chaff, chafffat, by.x=c("RINGNO", "FAT",
"FATMTD"), by.y=c("RINGNO", "FAT", "FATMTD"), all=T)
1. What is the reason for matching on "FAT" and "FATMTD"? From your
description of the data, I assume that "RINGNO" is the individual
identifier. I'd have thought matching on that alone would be appropriate.
2. What happens if you omit the "all=T" argument? In particular, how
does the size of the merged dataset compare to the inputs?
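The sketch below illustrates the concern behind both questions. It uses hypothetical toy data built from the three example rows quoted below, and it assumes (a guess, not stated in the thread) that `chaff` also contains FAT and FATMTD columns that are empty, since merge() would otherwise reject those key names. If no row agrees on all three keys, all = TRUE keeps every input row unmatched and pads the missing side with NA. The result then has as many rows as both inputs combined.

```r
## Hypothetical toy data built from the three example rows in the thread.
## Assumption: 'chaff' also carries FAT and FATMTD columns that are empty (NA);
## otherwise merging on them would raise an error.
chaff <- data.frame(RINGNO  = c(120, 121, 132),
                    HABITAT = c(3, 4, 3),
                    WEIGHT  = c(20.2, 20.0, 21.2),
                    FAT     = NA_real_,
                    FATMTD  = NA_character_,
                    stringsAsFactors = FALSE)
chafffat <- data.frame(RINGNO = c(120, 121, 132),
                       FAT    = c(4, 5, 4),
                       FATMTD = c(" E", " B", ""),
                       stringsAsFactors = FALSE)

## Matching on all three keys: no row agrees on FAT/FATMTD, so with all = TRUE
## every input row survives unmatched and is padded with NA.
m3 <- merge(chaff, chafffat, by = c("RINGNO", "FAT", "FATMTD"), all = TRUE)
nrow(m3)   # twice the number of birds

## Matching on the individual identifier alone, after dropping the empty
## columns from 'chaff' so they do not clash with the real ones.
m1 <- merge(chaff[, c("RINGNO", "HABITAT", "WEIGHT")], chafffat,
            by = "RINGNO", all = TRUE)
nrow(m1)   # one row per bird
```

On the real data, comparing nrow() of the merged result with nrow(chaff) and nrow(chafffat) is a quick way to answer question 2.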
--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand
On 27/11/07 2:31 PM, Katherine Jones wrote:
> Hi,
>
> This is probably a case where someone has to see what is happening on
> my computer, and it is complicated by my data being from SPSS (not my
> choice). It is hard to share my data because it is such a large
> dataset. I have analysed 9 other datasets that work fine, but this
> particular dataset was entered incorrectly, so it requires merging two
> datasets. This may be the problem.
>
> Example of data:-
> File 1.
> [1] Individual [2] Habitat type [3] Weight
> File 2.
> [1] Individual [2] Fat [3] Fat method.
>
> I merge the two files to create:-
> [1] Individual [2] Habitat type [3] Weight [4] Fat [5] Fat method
>
> My merging appears to work in the sense that I can plot Weight versus
> Fat and I get data, but when I view the data frame I see a sea of
> NAs. So I don't understand how there can be data to plot, list
> levels for, and tabulate when I can't see it in the data frame. I
> do get the plot I want.
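A plot can look fine even when the data frame is full of NAs, because plot() silently skips any point whose x or y value is NA. A few complete rows are therefore enough to draw something. Below is a minimal sketch for counting the NAs and the usable rows. It runs on a hypothetical four-row merged data set, and the column names WEIGHT and FAT are assumptions.

```r
## Hypothetical merged data: two birds matched, one bird split across two
## unmatched rows (weight only, fat only).
merged <- data.frame(RINGNO = c(120, 121, 132, 132),
                     WEIGHT = c(20.2, 20.0, 21.2, NA),
                     FAT    = c(4, 5, NA, 4))

colSums(is.na(merged))                   # number of NAs in each column
usable <- complete.cases(merged[, c("WEIGHT", "FAT")])
sum(usable)   # rows that plot(FAT ~ WEIGHT, data = merged) can actually draw
```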
>
> Fat method contains either blank cells, " B" or " E".
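Because the codes carry a leading space, every comparison has to include it (" E", not "E"). The sketch below assumes the column is a factor, as the use of levels() in this thread suggests, and the values are hypothetical. It strips the spaces from the level labels once with trimws() (R 3.2.0 or later), so that later code can compare against "E".

```r
## Hypothetical Fat method column with the padded codes described in the thread.
fatmtd <- factor(c(" E", " B", ""))

## Trim the level labels once; the values follow because they map to levels.
levels(fatmtd) <- trimws(levels(fatmtd))

isE <- fatmtd == "E"   # no leading space needed any more
```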
>
> I want to select columns 1-4 of all the rows that contain " E" in
> Fat method.
>
> e.g.
> 120, 3, 20.2, 4, E
> 121, 4, 20.0, 5, B
> 132, 3, 21.2, 4,
>
> I want to select only the rows containing " E", so I can plot Fat vs.
> Habitat and Weight vs. Fat.
>
> I have been doing this by using
>
> selectE <- Data[Fatmethod == " E", ]
>
> However, this does not work. It turns all of my data in the other
> columns into NA, and I am left with only the fat method and fat scores.
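A likely source of the all-NA rows is a logical index that contains NA. When `[` is applied to a data frame, each NA in a logical row index becomes a row of NAs, and a merge with all = TRUE leaves the fat-method column as NA for unmatched rows. The sketch below uses a hypothetical version of Data: the thread's three example rows plus an invented fourth bird with no fat record. It references the column as Data$FATMTD, because a bare Fatmethod is looked up outside Data (for example in an attached pre-merge file) and need not line up with Data's rows. Both which() and subset() treat NA as "not selected".

```r
## Hypothetical 'Data': the three example rows from the thread plus an
## invented fourth bird (RINGNO 140) with no fat record, as all = TRUE leaves it.
Data <- data.frame(RINGNO  = c(120, 121, 132, 140),
                   HABITAT = c(3, 4, 3, 2),
                   WEIGHT  = c(20.2, 20.0, 21.2, 19.8),
                   FAT     = c(4, 5, 4, NA),
                   FATMTD  = c(" E", " B", "", NA),
                   stringsAsFactors = FALSE)

idx <- Data$FATMTD == " E"   # TRUE FALSE FALSE NA
bad <- Data[idx, ]           # the NA element produces an extra all-NA row

## Two NA-safe alternatives, keeping columns 1-4 as in the question:
good  <- Data[which(idx), 1:4]                       # which() drops NA positions
good2 <- subset(Data, FATMTD == " E", select = 1:4)  # subset() treats NA as FALSE
```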
>
> It is odd that it works with other datasets but not this one.
> Although with my other datasets, when I select " E" I can still see
> " B" using levels(Fat method), there is no data there, so my plots
> are correct.
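The leftover " B" level is expected behaviour. Subsetting a factor keeps all of its original levels, even those with no remaining values, so levels() and table() still report them. The sketch below uses a hypothetical Fat method factor. droplevels() (available since R 2.12.0) removes the unused levels, and it can also be applied to a whole data frame.

```r
## Hypothetical Fat method values with the leading spaces seen in the thread.
fatmtd <- factor(c(" E", " B", " E"))

keepE <- fatmtd[fatmtd == " E"]
levels(keepE)   # still lists " B", although no " B" values remain
table(keepE)    # " B" appears with a count of 0

keepE <- droplevels(keepE)
levels(keepE)   # now only " E"
```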
>
> Sorry this is long. I'm having difficulty explaining it.
>
> Katherine
>
>
> On 26-Nov-07, at 5:09 PM, jim holtman wrote:
>
>> That should give you back a subset of 'data' (with all its columns)
>> for the rows with " E" in 'column'. Can you show an example of your
>> data and what the desired output would be? The posting guide asks you
>> to "provide commented, minimal, self-contained, reproducible code" so
>> we don't have to speculate on what you want.
>>
>> On Nov 26, 2007 5:04 PM, Katherine Jones
>> <kajones at connect.carleton.ca> wrote:
>>> This sort of works. It does select the E data, but unfortunately it
>>> doesn't select the data from the other columns; I want to select data
>>> across about 5 columns by the factor " E" in one of the columns. It
>>> should be easy, but for some reason it is not working. The spaces
>>> being added don't help.
>>>
>>> It seems to work on my non-merged data files, but the merged file is
>>> the one that contains all the data I need.
>>>
>>> Thanks for the subset command, though. I hadn't thought of using that.
>>>
>>>
>>>
>>> On 26-Nov-07, at 4:46 PM, jim holtman wrote:
>>> ?subset
>>>
>>>
>>> subset(data, column == " E")
>>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem you are trying to solve?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.