[R] combining data.frames with is.na & match (), two questions

Thu Apr 18 10:29:52 CEST 2019

The whole thing is a merge operation, i.e.

> FruitNutr <- read.table(text="
+ Fruit  Calories
+ 1 banana 100
+ 2 pear 100
+ 3 mango 200
+ ")
> FruitData <- read.table(text="
+ Fruit Color Shape Juice
+ 1 apple red round 1
+ 2 banana yellow oblong 0
+ 3 pear green pear 0.5
+ 4 orange orange round 1
+ 5 kiwi green round 0
+ ")
> merge(FruitData, FruitNutr)
   Fruit  Color  Shape Juice Calories
1 banana yellow oblong   0.0      100
2   pear  green   pear   0.5      100
> merge(FruitData, FruitNutr, all.x=TRUE)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100
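
A side note on the output above: merge() takes the join key from the column names the two data frames share (here Fruit) and sorts the result on that key, which is why the rows come back in alphabetical order. Below is a minimal sketch, assuming you want the key spelled out and the rows back in the order of FruitData. The name `merged` is made up for this illustration, and the data frames are rebuilt with data.frame() only so that the snippet runs on its own.

```r
# Same data as above, rebuilt so the snippet is self-contained.
FruitNutr <- data.frame(Fruit = c("banana", "pear", "mango"),
                        Calories = c(100, 100, 200),
                        stringsAsFactors = FALSE)
FruitData <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                        Color = c("red", "yellow", "green", "orange", "green"),
                        Shape = c("round", "oblong", "pear", "round", "round"),
                        Juice = c(1, 0, 0.5, 1, 0),
                        stringsAsFactors = FALSE)

# Name the key column explicitly instead of relying on the shared column name.
# all.x = TRUE keeps every row of FruitData, matched or not.
merged <- merge(FruitData, FruitNutr, by = "Fruit", all.x = TRUE)

# merge() sorted the rows on Fruit; put them back in FruitData's order.
merged <- merged[match(FruitData$Fruit, merged$Fruit), ]
rownames(merged) <- NULL
merged
```

If the key columns had different names in the two data frames, by.x= and by.y= would take the place of by=.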

Mind you, merge() comes with its own set of confusing options in the more complex cases, which may be why the authors have chosen a more elementary approach.
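
That more elementary approach can be taken apart one step at a time. What follows is only a sketch of one way to read the quoted code, not the book's own explanation. `found` and `oneStep` are names invented here, and the data frames are cut down to the columns that matter and rebuilt with data.frame() so the snippet runs on its own.

```r
FruitNutr <- data.frame(Fruit = c("banana", "pear", "mango"),
                        Calories = c(100, 100, 200),
                        stringsAsFactors = FALSE)
FruitData <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                        Juice = c(1, 0, 0.5, 1, 0),
                        stringsAsFactors = FALSE)

# For each fruit in FruitData, its row number in FruitNutr (NA if absent).
index <- match(FruitData$Fruit, FruitNutr$Fruit)   # NA 1 2 NA NA

# TRUE for the rows of FruitData that have a partner in FruitNutr.
found <- !is.na(index)                             # FALSE TRUE TRUE FALSE FALSE

# Dropping the NAs leaves the FruitNutr row numbers to look up: 1 and 2.
index[found]

# Right-hand side: the calories stored in those FruitNutr rows (100 and 100).
FruitNutr$Calories[index[found]]

# Left-hand side: only the matched rows of FruitData receive a value.
FruitData$Calories <- NA
FruitData$Calories[found] <- FruitNutr$Calories[index[found]]

# Indexing with NA gives NA, so a single step yields the same column.
oneStep <- FruitNutr$Calories[index]
identical(FruitData$Calories, oneStep)
```

So the inner index[!is.na(index)] answers "which rows of the nutrition table?", and the outer fruitNutr$Calories[...] fetches the values from those rows.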

-pd

> On 18 Apr 2019, at 01:24 , Drake Gossi <drake.gossi using gmail.com> wrote:
> 
> Hello everyone,
> 
> I'm working through this book, *Humanities Data in R* (Arnold & Tilton),
> and I'm just having trouble understanding this maneuver.
> 
> In sum, I'm trying to combine data in two different data.frames.
> 
> This data.frame is called fruitNutr
> 
> Fruit  Calories
> 1 banana 100
> 2 pear 100
> 3 mango 200
> 
> And this data.frame is called fruitData
> 
> Fruit Color Shape Juice
> 1 apple red round 1
> 2 banana yellow oblong 0
> 3 pear green pear 0.5
> 4 orange orange round 1
> 5 kiwi green round 0
> 
> So, as you can see, these two data.frames overlap insofar as they both have
> banana and pear. So, what happens next is the book suggests this:
> 
> fruitData$Calories <- NA
> 
> 
> As a result, I've created a new column for the fruitData data.frame:
> 
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0       NA
> 3   pear  green   pear   0.5       NA
> 4 orange orange  round   1.0       NA
> 5   kiwi  green  round   0.0       NA
> 
> Then:
> 
>> index <- match(x=fruitData$Fruit, table=fruitNutr$Fruit)
>> index
> [1] NA  1  2 NA NA
>> is.na(index)
> [1]  TRUE FALSE FALSE  TRUE  TRUE
>> fruitData$Calories[!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
>> fruitData
> 
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round   1.0       NA
> 5   kiwi  green  round   0.0       NA
> 
> I get what the first part means, that first part being this:
> fruitData$Calories[!is.na(index)]
> Go into the fruitData data.frame, specifically into the Calories column,
> and only for the rows where !is.na(index) is TRUE. But I just literally
> can't understand this last part: fruitNutr$Calories[index[!is.na(index)]]
> 
> Two questions.
> 
> 
>   1. I just literally don't understand how this code works. It does work,
>   of course, but I don't know what it's doing, specifically this
>   [index[!is.na(index)]] part. Could someone explain it to me like I'm
>   five? I'm new at this...
>   2. And then: is there any other way to combine these two data.frames so
>   that we get this same result? maybe an easier to understand method?
> 
> That same result, again, is
> 
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round   1.0       NA
> 5   kiwi  green  round   0.0       NA
> 
> 
> Drake

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com