[R] longitudinal imputation with PAN
Joanne Hosking
joanne.hosking at pms.ac.uk
Mon Sep 24 15:55:20 CEST 2007
Hello all,
I am working on a longitudinal study of children in the UK and am trying the PAN package for imputation of missing data, since it meets the critical criterion of taking into account each subject's individual trend over time as well as the population trend over time. To validate the procedure, I started by deleting some known values: we have 6 annual measures of height on 300 children, and I imputed the deleted values using PAN and compared them with the real values. For most individuals the imputed values fit the individual trend extremely well.
However, when looking at the trend over time for a handful of individuals, the imputed value was lower than the previous (real) value of height or higher than the next (real) value, making it appear that height went down, which in reality it never does. So my question is: why does this happen when the procedure seems to work so well for the majority of individuals? Am I doing something wrong?
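The check described above (spotting imputed heights that fall below the previous visit or above the next one) could be automated along these lines. This is only a minimal sketch in base R on made-up toy values: the column `ht_imp` (observed heights with the imputed values filled in) and the helper `flag_non_monotone()` are hypothetical names, not part of PAN or of the original analysis.

```r
# Toy long-format data: two children, four visits each (values are made up).
# htmiss has NA where a value was deleted; ht_imp is a hypothetical column
# holding the observed heights with imputed values filled in.
d <- data.frame(
  code   = rep(1:2, each = 4),
  visit  = rep(1:4, times = 2),
  htmiss = c(105.0,    NA, 116.4, 121.2, 107.1, 115.7,    NA, 126.5),
  ht_imp = c(105.0, 104.2, 116.4, 121.2, 107.1, 115.7, 121.0, 126.5)
)

# Return the imputed rows that break monotone growth within a child.
flag_non_monotone <- function(d) {
  d <- d[order(d$code, d$visit), ]
  # Height at the previous and next visit of the same child (NA at the ends)
  prev_ht <- ave(d$ht_imp, d$code, FUN = function(x) c(NA, head(x, -1)))
  next_ht <- ave(d$ht_imp, d$code, FUN = function(x) c(tail(x, -1), NA))
  imputed    <- is.na(d$htmiss)
  below_prev <- !is.na(prev_ht) & d$ht_imp < prev_ht
  above_next <- !is.na(next_ht) & d$ht_imp > next_ht
  d[imputed & (below_prev | above_next), ]
}

bad <- flag_non_monotone(d)
print(bad)
```

In this toy example only child 1 at visit 2 is flagged, because the filled-in value 104.2 lies below the previous height of 105.0.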
As a novice user of R (and new to this area of statistics), I wondered if anyone could point me in the right direction, since the mixed-effects design (plus potential ease and speed) of the PAN procedure for longitudinal data imputation is very appealing.
I would very much appreciate any advice you could give me. Many thanks in advance.
Jo Hosking
The code and a small sample of the data are shown below (I could supply more data to anyone willing):
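library(pan)  # provides pan()

# Read the tab-delimited long-format data (one row per child per visit)
impht.data <- read.delim("impht_long_trunc.dat", header = TRUE)
impht.data$sex <- factor(impht.data$sex, labels = c("Boys", "Girls"))
impht.data$visit <- factor(impht.data$visit)
impht.data$code <- factor(impht.data$code)

y <- impht.data$htmiss   # height with some known values deleted
subj <- impht.data$code  # subject identifier
# Predictor matrix: age, sex, visit
pred <- cbind(impht.data$age, impht.data$sex, impht.data$visit)
xcol <- 1:3  # columns of pred used as fixed effects
zcol <- 1    # column of pred used as random effect
prior <- list(a = 1, Binv = 1, c = 1, Dinv = 1)
ht1 <- pan(y, subj, pred, xcol, zcol, prior, seed = 13579, iter = 1000)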
code  sex  visit  age   ht      htmiss
1     2    1      4.87  105     105
1     2    2      5.86  109.6   NA
1     2    3      6.88  116.4   116.4
1     2    4      7.72  121.2   121.2
1     2    5      8.72  126.7   126.7
1     2    6      9.71  132.3   132.3
2     2    1      4.84  107.1   107.1
2     2    2      6     115.7   115.7
2     2    3      6.86  121.4   121.4
2     2    4      7.69  126.5   126.5
2     2    5      8.7   134.15  134.15
2     2    6      9.76  140     NA
3     2    1      4.62  103     103
3     2    2      5.69  108.9   108.9
3     2    3      6.87  115.1   NA
3     2    4      7.55  118.6   118.6
3     2    5      8.46  123.6   123.6
3     2    6      9.63  128.9   128.9
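One detail of the code above that may be worth inspecting is what the predictor matrix actually contains: cbind() applied to factors keeps only their integer codes, and no intercept column is included. The sketch below, in base R on made-up toy values, compares that with model.matrix(), which adds an intercept and dummy-codes the factors. Whether an intercept or dummy coding is appropriate for this model is an assumption here, not something the original analysis states.

```r
# Toy data in the layout of the sample (values are illustrative only).
d <- data.frame(
  age   = c(4.87, 5.86, 4.90, 5.95),
  sex   = factor(c(2, 2, 1, 1), levels = 1:2, labels = c("Boys", "Girls")),
  visit = factor(c(1, 2, 1, 2))
)

# cbind() drops the factor class and keeps the underlying integer codes,
# so sex and visit enter as plain numeric columns.
pred_codes <- cbind(d$age, d$sex, d$visit)
print(pred_codes)

# model.matrix() adds an intercept column and dummy-codes the factors.
pred_mm <- model.matrix(~ age + sex + visit, data = d)
print(colnames(pred_mm))
```

If a matrix like pred_mm were passed to pan(), xcol and zcol would have to be chosen to match its columns.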