[R] set dataframe field value from lookup table
Jon Erik Ween
jween at klaru-baycrest.on.ca
Thu Dec 9 19:03:41 CET 2010
Thanks to David and William for helpful comments. I'm not sure if the list will accept attachments, but am trying with this. Z_example.txt is a pared-down sample of the target table in text format. What is needed is a z-score based on the DSF and DSB fields relative to the age field. The second table is the full z-score table (DSTz.txt), age group in first row and score (2*DSF + 2*DSB) in the first column.
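For concreteness, here is a minimal sketch of the kind of lookup described above, using a small made-up table rather than the attached DSTz.txt. The names zTable, subjects and lookupZ and all the values are illustrative assumptions; the column labels are assumed to be the lower bounds of the age classes, and the caller is assumed to pass whatever score the first column of the real table is keyed on.

```r
# Hypothetical stand-in for the published table: rows are standard scores,
# columns are (assumed) lower bounds of the age classes.
zTable <- matrix(
  c(2.6, 2.6, 2.6,
    1.8, 1.8, 2.0,
    1.0, 1.8, 1.8),
  nrow = 3, byrow = TRUE,
  dimnames = list(score = c("30", "29", "28"),
                  age   = c("17", "19", "24")))

# Made-up subjects; the last one is younger than any age class in the table.
subjects <- data.frame(Age = c(18, 25, 20, 10), score = c(30, 28, 29, 30))

lookupZ <- function(age, score, tab) {
  # Row index: exact match of each score against the row labels.
  r <- match(score, as.numeric(rownames(tab)))
  # Column index: the age class each age falls into; 0 means "below the table".
  cl <- findInterval(age, as.numeric(colnames(tab)))
  cl[cl == 0] <- NA
  # A two-column index matrix picks one cell per subject, whereas tab[r, cl]
  # would return the whole length(r) x length(cl) submatrix.
  tab[cbind(r, cl)]
}

subjects$z <- lookupZ(subjects$Age, subjects$score, zTable)
print(subjects)
```

Subjects whose age or score is not covered by the table get NA rather than an error, which seems the safer default for a large dataframe.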
Jon
Soli Deo Gloria
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Z_example.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20101209/b23a3b23/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: DSTz.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20101209/b23a3b23/attachment-0001.txt>
-------------- next part --------------
On 2010-12-09, at 12:53 PM, William Dunlap wrote:
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Jon Erik Ween
>> Sent: Thursday, December 09, 2010 8:27 AM
>> To: David Winsemius
>> Cc: r-help at r-project.org
>> Subject: Re: [R] set dataframe field value from lookup table
>>
>> Sorry, I should have included the error I get when using the
>> initial version of step 2):
>>
>> Error in `$<-.data.frame`(`*tmp*`, "DSTz", value = list(Age7
>> = c(-1.55, :
>> replacement has 20 rows, data has 955
>> In addition: Warning message:
>> In DSTzlook[, 1] == df$DSF + df$DSB :
>> longer object length is not a multiple of shorter object length
>>
>> So, regardless of how you calculate [r,c], the step
>>
>> df$DSTz<-DSTzlook[r,c]
>
> You probably want to use [cbind(r,c)], where r
> and c are vectors of row and column numbers.
>
> Supplying an example that helpers could copy and
> paste into an R session would really help. E.g.,
> instead of showing the usual printout of the table
> of zscores, show the output of dput(thatTable) or
> the command you used to build it. Here is my
> guess, given your printout
>
> ZScoreTable <- matrix(byrow=TRUE,
>     c( 2.6, 2.6, 2.6, 2.6, 2.6, 2.6,
>        1.8, 1.8, 1.8, 2.0, 2.6, 2.6,
>        1.0, 1.0, 1.8, 1.8, 2.6, 2.6,
>        0.0, 0.5, 1.0, 1.8, 2.6, 2.6,
>        -.5, 0.0, 0.0, 1.0, 1.8, 2.6),
>     nrow=5,
>     ncol=6,
>     dimnames = list(
>         StdScore=c("30", "29", "28", "27", "26"),
>         AgeClass=c("17", "19", "24", "29", "34", "44")
>     )
> )
>
> Your structure may be different, but given that
> this is the table that encodes the mapping of
> the ordered pair (StdScore,AgeClass) to a z-score,
> here is some code to do the mapping:
>
> ZScore <- function(age, stdScore) {
>     AgeToColumnNumber <- function(age,
>         ageClassBottoms=as.numeric(colnames(ZScoreTable)))
>     {
>         retval <- findInterval(age, c(ageClassBottoms, Inf))
>         retval[retval==0] <- NA
>         retval
>     }
>
>     StdScoreToRowNumber <- function(stdScore,
>         knownScores = as.numeric(rownames(ZScoreTable)))
>     {
>         match(stdScore, knownScores)
>     }
>
>     ZScoreTable[cbind(StdScoreToRowNumber(stdScore),
>                       AgeToColumnNumber(age))]
> }
>
> where a typical usage would be
>
> > ZScore(age=c(29,44, 10), stdScore=c(28,29,30))
> [1] 1.8 2.6 NA
>
> (Age 10 is not in the table so it gets an NA for a z-score).
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> doesn't work. I've tried various permutations with "apply",
>> but that didn't work either. Any suggestions?
>>
>> Jon
>>
>> Soli Deo Gloria
>>
>> Jon Erik Ween, MD, MS
>> Scientist, Kunin-Lunenfeld Applied Research Unit
>> Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
>> Assistant Professor, Dept. of Medicine, Div. of Neurology
>> University of Toronto Faculty of Medicine
>>
>> Kimel Family Building, 6th Floor, Room 644
>> Baycrest Centre
>> 3560 Bathurst Street
>> Toronto, Ontario M6A 2E1
>> Canada
>>
>> Phone: 416-785-2500 x3648
>> Fax: 416-785-2484
>> Email: jween at klaru-baycrest.on.ca
>>
>>
>> Confidential: This communication and any attachment(s) may
>> contain confidential or privileged information and is
>> intended solely for the address(es) or the entity
>> representing the recipient(s). If you have received this
>> information in error, you are hereby advised to destroy the
>> document and any attachment(s), make no copies of same and
>> inform the sender immediately of the error. Any unauthorized
>> use or disclosure of this information is strictly prohibited.
>>
>>
>>
>> On 2010-12-09, at 11:06 AM, David Winsemius wrote:
>>
>>>
>>> On Dec 9, 2010, at 10:51 AM, Jon Erik Ween wrote:
>>>
>>>> Thanks David
>>>>
>>>> What I am trying to do is set up a script that assigns z-scores
>>>> to a large dataframe (2500x300, with Age in years and test scores
>>>> as columns) from a published table of age-corrected standard
>>>> scores on this cognitive test.
>>>>
>>>> 1) The age intervals in the lookup table are given and not my
>>>> choice.
>>>
>>> You may want to skip the intermediate translation to the row and
>>> column labels and just use the results of findInterval:
>>>
>>> > findInterval( 16, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79, 84, 89) )
>>> [1] 1
>>> > findInterval( 90, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79, 84, 89) )
>>> [1] 14
>>>
>>> Those look like appropriate indices for the column argument
>>>>
>>>> 2) Sorry I didn't post an example table, it looks something like
>>>> this ("Age" is in the first row, standard scores in the first
>>>> column):
>>>>
>>>> 17 19 24 29 34 44 ....
>>>> 30 2.6 2.6 2.6 2.6 2.6 2.6
>>>> 29 1.8 1.8 1.8 2.0 2.6 2.6
>>>> 28 1.0 1.0 1.8 1.8 2.6 2.6
>>>> 27 0.0 0.5 1.0 1.8 2.6 2.6
>>>> 26 -.5 0.0 0.0 1.0 1.8 2.6
>>>> .
>>>> .
>>>> .
>>>> .
>>>>
>>>> So, if a subject (row) has age==29 and a standard score of 28,
>>>> the value should be 1.8, etc.
>>>
>>> Looks like a job for two findInterval indices to be used with
>>> "[ r , c ] ".
>>>
>>> --
>>> David.
>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>> Jon
>>>>
>>>> Soli Deo Gloria
>>>>
>>>>
>>>> On 2010-12-09, at 10:33 AM, David Winsemius wrote:
>>>>
>>>>>
>>>>> On Dec 9, 2010, at 9:34 AM, Jon Erik Ween wrote:
>>>>>
>>>>>>
>>>>>> Hi
>>>>>>
>>>>>> This is (hopefully) a bit more cogent phrasing of a previous
>>>>>> post. I'm trying to compute a z-score for rows in a large
>>>>>> dataframe based on values in another dataframe. Here's the
>>>>>> script (that does not work). 2 questions,
>>>>>>
>>>>>> 1) Anyone know of a more elegant way to calculate the "rounded"
>>>>>> age value than the nested ifelse's I've used?
>>>>>>
>>>>>> 2) how to reference the lookup table based on computed indices?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>> # Define tables
>>>>>> DSTzlook <- read.table(
>>>>>>   "/Users/jween/Documents/ResearchProjects/ABC/data/DSTz.txt",
>>>>>>   header=TRUE, sep="\t", na.strings="NA", dec=".",
>>>>>>   strip.white=TRUE)
>>>>>> df<-stroke
>>>>>>
>>>>>> # Compute rounded age.
>>>>>> df$Agetmp <-
>>>>>>   ifelse(df$Age>=89,89,ifelse(df$Age>=84,84,ifelse(df$Age>=79,79,
>>>>>>   ifelse(df$Age>=74,74,ifelse(df$Age>=69,69,ifelse(df$Age>=64,64,
>>>>>>   ifelse(df$Age>=54,54,ifelse(df$Age>=44,44,ifelse(df$Age>=34,34,
>>>>>>   ifelse(df$Age>=29,29,ifelse(df$Age>=24,24,ifelse(df$Age>=19,19,
>>>>>>   17))))))))))))
>>>>>
>>>>> Ew, painful. If you want categorized ages (since what the above
>>>>> coding is producing is not "rounded" in any sense of that word as
>>>>> I understand it), then why not findInterval() as an index into the
>>>>> ages you want to label these cases with?
>>>>>
>>>>> df$Agetmp <- c(17,19,24,29,34,44,54,64,69,74,79,84)[ # note Extract operation
>>>>>     findInterval(runif(100,0,100),
>>>>>                  c(17,19,24,29,34,44,54,64,69,74,79,84,110) )
>>>>>     ] # close extraction
>>>>>
>>>>>
>>>>> The other option, of course, and a more "honest" one in this
>>>>> instance would be
>>>>>
>>>>> cut(vec, breaks=c(...), labels=c(...) )
>>>>>
>>>>> (It's not clear to me why you are not picking midpoint ages within
>>>>> those brackets.)
>>>>>
>>>>>>
>>>>>> # Reference the lookup table based on computed indices
>>>>>> df$DSTz <- DSTzlook[which(DSTzlook[,1]==df$Agetmp),
>>>>>>                     which(DSTzlook[1,]==df$DSF+df$DSB)]
>>>>>
>>>>> I have not been able to figure out what you are trying to do here.
>>>>> Trying to use a 2d lookup looks promising as a way to emulate what
>>>>> an Excel user might attempt, but an example (as requested in the
>>>>> message at the bottom of every posting) would really be of great
>>>>> help in making this more concrete for those of us with
>>>>> insufficient abstractive abilities.
>>>>>
>>>>> --
>>>>> David.
>>>>>
>>>>>>
>>>>>> # Cleanup
>>>>>> #rm(df)
>>>>>> #df$Agetmp<-NULL
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://r.789695.n4.nabble.com/set-dataframe-field-value-from-lookup-table-tp3080245p3080245.html
>>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>>>
>>>>>
>>>>>
>>>>> David Winsemius, MD
>>>>> West Hartford, CT
>>>>>
>>>>
>>>
>>> David Winsemius, MD
>>> West Hartford, CT
>>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>