[R] combining data.frames with is.na & match(), two questions
Drake Gossi
dr@ke@go@@| @end|ng |rom gm@||@com
Thu Apr 18 01:24:13 CEST 2019
Hello everyone,
I'm working through the book *Humanities Data in R* (Arnold & Tilton),
and I'm having trouble understanding one maneuver in it.
In short, I'm trying to combine data from two different data.frames.
This data.frame is called fruitNutr:
   Fruit Calories
1 banana      100
2   pear      100
3  mango      200
And this data.frame is called fruitData:
   Fruit  Color  Shape Juice
1  apple    red  round     1
2 banana yellow oblong     0
3   pear  green   pear   0.5
4 orange orange  round     1
5   kiwi  green  round     0
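For anyone who wants to run the steps that follow, the two data.frames can be rebuilt from the printouts. This is a sketch, not the book's own code: the column types (Juice as numeric, the text columns as character via `stringsAsFactors = FALSE`) are assumptions.

```r
# Rebuild the two example data.frames from the printed output.
# stringsAsFactors = FALSE keeps the text columns as character vectors
# (an assumption; the book may create them differently).
fruitNutr <- data.frame(
  Fruit    = c("banana", "pear", "mango"),
  Calories = c(100, 100, 200),
  stringsAsFactors = FALSE
)

fruitData <- data.frame(
  Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
  Color = c("red", "yellow", "green", "orange", "green"),
  Shape = c("round", "oblong", "pear", "round", "round"),
  Juice = c(1, 0, 0.5, 1, 0),
  stringsAsFactors = FALSE
)

print(fruitNutr)
print(fruitData)
```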
As you can see, these two data.frames overlap: both contain banana and
pear. Next, the book suggests this:
fruitData$Calories <- NA
This creates a new Calories column in the fruitData data.frame, filled with NA:
   Fruit  Color  Shape Juice Calories
1  apple    red  round     1       NA
2 banana yellow oblong     0       NA
3   pear  green   pear   0.5       NA
4 orange orange  round     1       NA
5   kiwi  green  round     0       NA
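A small sketch of what that assignment does, on a cut-down stand-in for fruitData that keeps only the Fruit column (a simplification for brevity). A single value assigned to a new column is recycled down every row, and column names are case-sensitive, so `fruitData$calories` and `fruitData$Calories` are two different columns.

```r
# Cut-down stand-in for fruitData: only the Fruit column is needed here.
fruitData <- data.frame(
  Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
  stringsAsFactors = FALSE
)

# A length-one value is recycled to every row, so this adds a new
# column holding five (logical) NA values.
fruitData$Calories <- NA
print(fruitData)

# Names are case-sensitive: there is no column called "calories",
# so this lookup returns NULL rather than the Calories column.
print(is.null(fruitData$calories))  # TRUE
```

Because the new column starts out as logical NA, R quietly converts it to numeric later, when calorie numbers are stored into some of its rows.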
Then:
> index <- match(x = fruitData$Fruit, table = fruitNutr$Fruit)
> index
[1] NA  1  2 NA NA
> is.na(index)
[1]  TRUE FALSE FALSE  TRUE  TRUE
> fruitData$Calories[!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
> fruitData
   Fruit  Color  Shape Juice Calories
1  apple    red  round     1       NA
2 banana yellow oblong     0      100
3   pear  green   pear   0.5      100
4 orange orange  round     1       NA
5   kiwi  green  round     0       NA
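What `match()` computes can be seen on plain character vectors standing in for the two Fruit columns (the names `fruits` and `nutrFruits` are illustrative only). For each element of `x`, `match()` returns the position where that value first appears in `table`, or NA if it never appears.

```r
fruits     <- c("apple", "banana", "pear", "orange", "kiwi")  # like fruitData$Fruit
nutrFruits <- c("banana", "pear", "mango")                     # like fruitNutr$Fruit

# "Where in nutrFruits is each of my fruits?"
# banana is at position 1, pear at position 2; the others are missing (NA).
index <- match(x = fruits, table = nutrFruits)
print(index)          # NA 1 2 NA NA

# is.na() flags the fruits that have no calorie entry ...
print(is.na(index))   # TRUE FALSE FALSE TRUE TRUE

# ... and ! flips that, marking the rows that *can* be filled in.
print(!is.na(index))  # FALSE TRUE TRUE FALSE FALSE
```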
I get what the first part means, that first part being this:
fruitData$Calories[!is.na(index)]
That is: go into the fruitData data.frame, specifically into the Calories
column, and only into the rows where !is.na(index) is TRUE (the fruits that
match() found in fruitNutr). But I just can't understand this last part:
fruitNutr$Calories[index[!is.na(index)]]
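One way to read the right-hand side is from the inside out. The sketch below copies the `index` printed above and the calorie values of fruitNutr into plain vectors (a simplification: the names `calories`, `found`, `values` and `target` are illustrative, not from the book) and evaluates each piece separately.

```r
index    <- c(NA, 1L, 2L, NA, NA)  # the result of match() printed above
calories <- c(100, 100, 200)       # like fruitNutr$Calories (banana, pear, mango)

# Step 1: drop the NAs, keeping only the positions that were found.
# banana was found at row 1 of fruitNutr, pear at row 2.
found <- index[!is.na(index)]
print(found)    # 1 2

# Step 2: use those positions to pull out the matching calorie values,
# in the same order as the fruitData rows they belong to.
values <- calories[found]
print(values)   # 100 100

# Step 3: the left-hand side picks the fruitData rows where !is.na(index)
# is TRUE (banana and pear): exactly two slots for exactly two values.
target <- rep(NA, 5)             # stands in for fruitData$Calories
target[!is.na(index)] <- values
print(target)   # NA 100 100 NA NA
```

Both sides are filtered with the same `!is.na(index)`, which is what keeps the two values lined up with the two rows they belong to.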
Two questions.
1. I just don't understand how this code works. It does work,
of course, but I don't know what it's doing, specifically the
[index[!is.na(index)]] part. Could someone explain it to me like I'm five? I'm
new at this...
2. And then: is there any other way to combine these two data.frames that
gives the same result? Maybe an easier-to-understand method?
That same result, again, is:
   Fruit  Color  Shape Juice Calories
1  apple    red  round     1       NA
2 banana yellow oblong     0      100
3   pear  green   pear   0.5      100
4 orange orange  round     1       NA
5   kiwi  green  round     0       NA
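Two alternatives that give the same table, sketched with base R only (the result names `optionA` and `optionB` are hypothetical, and the data are rebuilt from the printouts with assumed column types). Option A drops the NA filtering entirely: indexing a vector with NA simply returns NA, so unmatched fruits get NA automatically. Option B uses `merge()`, a database-style join on the shared Fruit column; `all.x = TRUE` keeps every fruitData row (a left join), and mango, which exists only in fruitNutr, is left out.

```r
fruitNutr <- data.frame(
  Fruit    = c("banana", "pear", "mango"),
  Calories = c(100, 100, 200),
  stringsAsFactors = FALSE
)
fruitData <- data.frame(
  Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
  Color = c("red", "yellow", "green", "orange", "green"),
  Shape = c("round", "oblong", "pear", "round", "round"),
  Juice = c(1, 0, 0.5, 1, 0),
  stringsAsFactors = FALSE
)

# Option A: one line. An NA position in [ ] yields NA, so no filtering is needed.
optionA <- fruitData
optionA$Calories <- fruitNutr$Calories[match(fruitData$Fruit, fruitNutr$Fruit)]
print(optionA)

# Option B: a left join with merge().
optionB <- merge(fruitData, fruitNutr, by = "Fruit", all.x = TRUE)
# merge() sorts the result by Fruit; put the rows back in fruitData's order.
optionB <- optionB[match(fruitData$Fruit, optionB$Fruit), ]
rownames(optionB) <- NULL
print(optionB)
```

Option A stays closest to the book's approach; Option B scales better when several columns need to be brought over at once, at the cost of the reordering step.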
Drake