[R] combining data.frames with is.na & match (), two questions
peter dalgaard
pdalgd at gmail.com
Thu Apr 18 10:29:52 CEST 2019
The whole thing is a merge operation, i.e.
> FruitNutr <- read.table(text="
+ Fruit Calories
+ 1 banana 100
+ 2 pear 100
+ 3 mango 200
+ ")
> FruitData <- read.table(text="
+ Fruit Color Shape Juice
+ 1 apple red round 1
+ 2 banana yellow oblong 0
+ 3 pear green pear 0.5
+ 4 orange orange round 1
+ 5 kiwi green round 0
+ ")
> merge(FruitData, FruitNutr)
   Fruit  Color  Shape Juice Calories
1 banana yellow oblong   0.0      100
2   pear  green   pear   0.5      100
> merge(FruitData, FruitNutr, all.x=TRUE)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100
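As a script rather than a console transcript, the same left join might look like the sketch below. The explicit by = "Fruit", the header = TRUE layout of the input, and the reordering step are additions for illustration, not part of the transcript above. merge() sorts its result on the key by default, so the rows are put back in the order of FruitData with match().

```r
# Rebuild the two example data frames (same values as in the transcript).
FruitNutr <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Fruit Calories
banana 100
pear 100
mango 200
")
FruitData <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Fruit Color Shape Juice
apple red round 1
banana yellow oblong 0
pear green pear 0.5
orange orange round 1
kiwi green round 0
")

# Left join: keep every row of FruitData and fill Calories with NA where
# FruitNutr has no matching Fruit. Naming the key makes the intent explicit.
merged <- merge(FruitData, FruitNutr, by = "Fruit", all.x = TRUE)

# merge() sorted the rows on Fruit; restore the original order of FruitData.
merged <- merged[match(FruitData$Fruit, merged$Fruit), ]
rownames(merged) <- NULL
print(merged)
```

The mango row of FruitNutr is dropped because only all.x = TRUE was requested; all = TRUE would keep it as well.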
Mind you, merge() comes with its own set of confusing options in the more complex cases, which may be why the authors have chosen a more elementary approach.
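For comparison, the elementary approach can be unpacked one step at a time. This is only a sketch on the same data; the intermediate names hit and rows_in_nutr are made up for illustration and do not come from the book, and the Color, Shape and Juice columns are left out for brevity.

```r
fruitNutr <- data.frame(Fruit = c("banana", "pear", "mango"),
                        Calories = c(100, 100, 200),
                        stringsAsFactors = FALSE)
fruitData <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                        stringsAsFactors = FALSE)

# For each fruit in fruitData: the row number where it occurs in fruitNutr,
# or NA if it does not occur there.
index <- match(fruitData$Fruit, fruitNutr$Fruit)

hit <- !is.na(index)          # TRUE for the rows of fruitData that have a match
rows_in_nutr <- index[hit]    # the matching row numbers in fruitNutr, NAs dropped

# Start with an all-NA column, then fill only the matched rows.
fruitData$Calories <- NA
fruitData$Calories[hit] <- fruitNutr$Calories[rows_in_nutr]

# Indexing a vector with NA yields NA, so this shorter form gives the same column.
short <- fruitNutr$Calories[index]
print(fruitData)
```

So index[!is.na(index)] is just "the row numbers in fruitNutr, with the no-match entries thrown away", and the left-hand side picks out exactly as many rows of fruitData to receive them.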
-pd
> On 18 Apr 2019, at 01:24 , Drake Gossi <drake.gossi using gmail.com> wrote:
>
> Hello everyone,
>
> I'm working through this book, *Humanities Data in R* (Arnold & Tilton),
> and I'm just having trouble understanding this maneuver.
>
> In sum, I'm trying to combine data in two different data.frames.
>
> This data.frame is called fruitNutr
>
>    Fruit Calories
> 1 banana      100
> 2   pear      100
> 3  mango      200
>
> And this data.frame is called fruitData
>
>    Fruit  Color  Shape Juice
> 1  apple    red  round     1
> 2 banana yellow oblong     0
> 3   pear  green   pear   0.5
> 4 orange orange  round     1
> 5   kiwi  green  round     0
>
> So, as you can see, these two data.frames overlap insofar as they both have
> banana and pear. So, what happens next is the book suggests this:
>
> fruitData$Calories <- NA
>
>
> As a result, I've created a new column for the fruitData data.frame:
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1       NA
> 2 banana yellow oblong     0       NA
> 3   pear  green   pear   0.5       NA
> 4 orange orange  round     1       NA
> 5   kiwi  green  round     0       NA
>
> Then:
>
>> index <- match (x=fruitData$Fruit, table=fruitNutr$Fruit)
>> index
> [1] NA 1 2 NA NA
>> is.na(index)
> [1] TRUE FALSE FALSE TRUE TRUE
>> fruitData$Calories[!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
>> fruitData
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1       NA
> 2 banana yellow oblong     0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round     1       NA
> 5   kiwi  green  round     0       NA
>
> I get what the first part means, that first part being this:
> fruitData$Calories[!is.na(index)]
> go into the fruitData data.frame, specifically into the Calories column,
> and only for the rows where !is.na(index) is TRUE. But I just literally
> can't understand this last part: fruitNutr$Calories[index[!is.na(index)]]
>
> Two questions.
>
>
> 1. I just literally don't understand how this code works. It does work,
> of course, but I don't know what it's doing, specifically this
> [index[!is.na(index)]] part. Could someone explain it to me like I'm five?
> I'm new at this...
> 2. And then: is there any other way to combine these two data.frames so
> that we get this same result? maybe an easier to understand method?
>
> That same result, again, is
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1       NA
> 2 banana yellow oblong     0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round     1       NA
> 5   kiwi  green  round     0       NA
>
>
> Drake
>
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk Priv: PDalgd using gmail.com