[R] set dataframe field value from lookup table

Thu Dec 9 19:03:41 CET 2010

Thanks to David and William for helpful comments. I'm not sure if the list will accept attachments, but am trying with this. Z_example.txt is a pared-down sample of the target table in text format. What is needed is a z-score based on the DSF and DSB fields relative to the age field. The second table is the full z-score table (DSTz.txt), age group in first row and score (2*DSF + 2*DSB) in the first column.

Jon

Soli Deo Gloria

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Z_example.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20101209/b23a3b23/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: DSTz.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20101209/b23a3b23/attachment-0001.txt>
-------------- next part --------------

On 2010-12-09, at 12:53 PM, William Dunlap wrote:

>> -----Original Message-----
>> From: r-help-bounces at r-project.org 
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Jon Erik Ween
>> Sent: Thursday, December 09, 2010 8:27 AM
>> To: David Winsemius
>> Cc: r-help at r-project.org
>> Subject: Re: [R] set dataframe field value from lookup table
>> 
>> Sorry, I should have included the error I get when using the 
>> initial vesion of step 2):
>> 
>> Error in `$<-.data.frame`(`*tmp*`, "DSTz", value = list(Age7 
>> = c(-1.55,  : 
>>  replacement has 20 rows, data has 955
>> In addition: Warning message:
>> In DSTzlook[, 1] == df$DSF + df$DSB :
>>  longer object length is not a multiple of shorter object length
>> 
>> So, regardless of how you calculate [r,c], the step
>> 
>> df$DSTz<-DSTzlook[r,c]
> 
> You probably want to use [cbind(r,c)], where r
> and c are vectors of row and column numbers.
> 
> Supplying an example that helpers could copy and
> paste into an R session would really help.  E.g.,
> instead of showing the usual printout of the table
> of zscores, show the output of dput(thatTable) or
> the command you used to build it.  Here is my
> guess, given your printout
> 
>  ZScoreTable <- matrix(byrow=TRUE,
>     c( 2.6, 2.6, 2.6, 2.6, 2.6, 2.6,
>        1.8, 1.8, 1.8, 2.0, 2.6, 2.6,
>        1.0, 1.0, 1.8, 1.8, 2.6, 2.6,
>        0.0, 0.5, 1.0, 1.8, 2.6, 2.6,
>        -.5, 0.0, 0.0, 1.0, 1.8, 2.6),
>     nrow=5,
>     ncol=6,
>     dimnames = list(
>        StdScore=c("30", "29", "28", "27", "26"),
>        AgeClass=c("17", "19", "24", "29", "34", "44")
>     )
>  )
> 
> Your structure may be different, but given that
> that is your table of that encodes the mapping of
> the order pair (StdScore,AgeClass) to a z-score
> here is some code to do the mapping:
> 
>  ZScore <- function(age, stdScore) {
>     AgeToColumnNumber <- function(age,
>        ageClassBottoms=as.numeric(colnames(ZScoreTable)))
>     {
>        retval <- findInterval(age, c(ageClassBottoms, Inf))
>        retval[retval==0] <- NA
>        retval
>     }
> 
>     StdScoreToRowNumber <- function(stdScore,
>        knownScores = as.numeric(rownames(ZScoreTable)))
>     {
>        match(stdScore, knownScores)
>     }
> 
>     ZScoreTable[cbind(StdScoreToRowNumber(stdScore),
> AgeToColumnNumber(age))]
>  }
> 
> where a typical usage would be
> 
>> ZScore(age=c(29,44, 10), stdScore=c(28,29,30))
>  [1] 1.8 2.6  NA
> 
> (Age 10 is not in the table so it gets an NA for a z-score).
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com 
> 
>> doesn't work. I've tried various permutations with "apply", 
>> but that didn't work either. Any suggestions?
>> 
>> Jon
>> 
>> Soli Deo Gloria
>> 
>> Jon Erik Ween, MD, MS
>> Scientist, Kunin-Lunenfeld Applied Research Unit 
>> Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
>> Assistant Professor, Dept. of Medicine, Div. of Neurology
>>    University of Toronto Faculty of Medicine
>> 
>> Kimel Family Building, 6th Floor, Room 644 
>> Baycrest Centre
>> 3560 Bathurst Street 
>> Toronto, Ontario M6A 2E1
>> Canada 
>> 
>> Phone: 416-785-2500 x3648
>> Fax: 416-785-2484
>> Email: jween at klaru-baycrest.on.ca
>> 
>> 
>> Confidential: This communication and any attachment(s) may 
>> contain confidential or privileged information and is 
>> intended solely for the address(es) or the entity 
>> representing the recipient(s). If you have received this 
>> information in error, you are hereby advised to destroy the 
>> document and any attachment(s), make no copies of same and 
>> inform the sender immediately of the error. Any unauthorized 
>> use or disclosure of this information is strictly prohibited.
>> 
>> 
>> 
>> On 2010-12-09, at 11:06 AM, David Winsemius wrote:
>> 
>>> 
>>> On Dec 9, 2010, at 10:51 AM, Jon Erik Ween wrote:
>>> 
>>>> Thanks David
>>>> 
>>>> What I am trying to do is set up a script that assigns 
>> z-scores to a large dataframe (2500x300, but has Age in years 
>> and test scores as columns.) from a published table of 
>> age-corrected standard scores on this cognitive test.
>>>> 
>>>> 1) The age intervals in the lookup table are given and not 
>> my choice.
>>> 
>>> You may want to skip the intermediate translation to the 
>> row and column labels and just use the results of findInterval:
>>> 
>>>> findInterval( 16, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 
>> 69, 74, 79, 84, 89) )
>>> [1] 1
>>>> findInterval( 90, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 
>> 69, 74, 79, 84, 89) )
>>> [1] 14
>>> 
>>> Those look like appropriate indices for the column argument
>>>> 
>>>> 2) Sorry I didn't post an example table, it looks 
>> something like this ("Age" is in the first row, standard 
>> scores in the first column):
>>>> 
>>>>      17   19   24   29   34   44 ....
>>>> 30   2.6  2.6  2.6  2.6  2.6  2.6
>>>> 29  1.8  1.8  1.8  2.0  2.6  2.6
>>>> 28  1.0  1.0  1.8  1.8  2.6  2.6
>>>> 27  0.0  0.5  1.0  1.8  2.6  2.6
>>>> 26   -.5  0.0  0.0  1.0  1.8  2.6
>>>> .
>>>> .
>>>> .
>>>> .
>>>> 
>>>> So, if a subject (row) has age==29 and a standard score of 
>> 28, the value should be 1.8, etc.
>>> 
>>> Looks like a job for two findInterval indices to be used 
>> used with "[ r , c ] ".
>>> 
>>> -- 
>>> David.
>>> 
>>>> 
>>>> Thanks
>>>> 
>>>> 
>>>> Jon
>>>> 
>>>> Soli Deo Gloria
>>>> 
>>>> Jon Erik Ween, MD, MS
>>>> Scientist, Kunin-Lunenfeld Applied Research Unit
>>>> Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
>>>> Assistant Professor, Dept. of Medicine, Div. of Neurology
>>>>  University of Toronto Faculty of Medicine
>>>> 
>>>> Kimel Family Building, 6th Floor, Room 644
>>>> Baycrest Centre
>>>> 3560 Bathurst Street
>>>> Toronto, Ontario M6A 2E1
>>>> Canada
>>>> 
>>>> Phone: 416-785-2500 x3648
>>>> Fax: 416-785-2484
>>>> Email: jween at klaru-baycrest.on.ca
>>>> 
>>>> 
>>>> Confidential: This communication and any attachment(s) may 
>> contain confidential or privileged information and is 
>> intended solely for the address(es) or the entity 
>> representing the recipient(s). If you have received this 
>> information in error, you are hereby advised to destroy the 
>> document and any attachment(s), make no copies of same and 
>> inform the sender immediately of the error. Any unauthorized 
>> use or disclosure of this information is strictly prohibited.
>>>> 
>>>> 
>>>> 
>>>> On 2010-12-09, at 10:33 AM, David Winsemius wrote:
>>>> 
>>>>> 
>>>>> On Dec 9, 2010, at 9:34 AM, Jon Erik Ween wrote:
>>>>> 
>>>>>> 
>>>>>> Hi
>>>>>> 
>>>>>> This is (hopefully) a bit more cogent phrasing of a 
>> previous post. I'm
>>>>>> trying to compute a z-score to rows in a large dataframe 
>> based on values in
>>>>>> another dataframe. Here's the script (that does not 
>> work). 2 questons,
>>>>>> 
>>>>>> 1) Anyone know of a more elegant way to calculate the 
>> "rounded" age value
>>>>>> than the nested ifelse's I've used?
>>>>>> 
>>>>>> 2) how to reference the lookup table based on computed indices?
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> Jon
>>>>>> 
>>>>>> # Define tables
>>>>>> DSTzlook <-
>>>>>> 
>> read.table("/Users/jween/Documents/ResearchProjects/ABC/data/D
>> STz.txt",
>>>>>> header=TRUE, sep="\t", na.strings="NA", dec=".", 
>> strip.white=TRUE)
>>>>>> df<-stroke
>>>>>> 
>>>>>> # Compute rounded age.
>>>>>> df$Agetmp
>>>>>> 
>> <-ifelse(df$Age>=89,89,ifelse(df$Age>=84,84,ifelse(df$Age>=79,
> 79,ifelse(df$Age>=74,74,ifelse(df$Age>=69,69,ifelse(df$Age>=>
> 64,64,ifelse(df$Age>=54,54,ifelse(df$Age>=44,44,ifelse(df$Age>
>> =34,34,ifelse(df$Age>=29,29,ifelse(df$Age>=24,24,ifelse(df$Age
>> =19,19,17))))))))))))
>>>>> 
>>>>> Ew, painful. If you want categorized ages (since what the 
>> above coding is producing is not "rounded" in any sense of 
>> that word as I understand it, then why not findInterval() as 
>> an index into the ages you wnat to label these case with?
>>>>> 
>>>>> df$Agetmp <- c(17,19,24,29,34,44,54,64,69,74,79,84)[  # 
>> note Extract operation
>>>>>          findInterval(runif(100,0,100), 
>> c(17,19,24,29,34,44,54,64,69,74,79,84,110) )
>>>>>         ]  # close extraction
>>>>> 
>>>>> 
>>>>> The other option, of course, and a more "honest" one in 
>> this instance would be
>>>>> 
>>>>> cut(vec, breaks=c(...), labels=c(...) )
>>>>> 
>>>>> (It's not clear why you are not picking midpoint ages 
>> within those brackets to me.)
>>>>> 
>>>>>> 
>>>>>> # Reference the lookup table based on computed indices
>>>>>> df$DSTz
>>>>>> 
>> <-DSTzlook[which(DSTzlook[,1]==df$Agetmp),which(DSTzlook[1,]==
> df$DSF+df$DSB)]
>>>>> 
>>>>> I have not been able to figure out what you are trying to 
>> do here. Trying to use a 2d lookup looks promising a a way to 
>> emulate what an Excel user might attempt, but an example (as 
>> requested in the message at the bottom of every posting) 
>> would really be of great help in making this more concrete 
>> for those of us with insufficient abstractive abilities.
>>>>> 
>>>>> -- 
>>>>> David.
>>>>> 
>>>>>> 
>>>>>> # Cleanup
>>>>>> #rm(df)
>>>>>> #df$Agetmp<-NULL
>>>>>> -- 
>>>>>> View this message in context: 
>> http://r.789695.n4.nabble.com/set-dataframe-field-value-from-l
> ookup-table-tp3080245p3080245.html
>>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>>> 
>>>>> 
>>>>> 
>>>>> David Winsemius, MD
>>>>> West Hartford, CT
>>>>> 
>>>> 
>>> 
>>> David Winsemius, MD
>>> West Hartford, CT
>>> 
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>