[R] Problems with data structure when using plsr() from package pls
Bjørn-Helge Mevik
b.h.mevik at usit.uio.no
Mon Jan 18 10:26:10 CET 2016
S Ellison <S.Ellison at lgcgroup.com> writes:
> Reading ?plsr examples and inspecting the data they use, you need to arrange
> frame1 so that it has the data from n96 included as columns with names of the
> form "n96.xxx", where xxx can be numbers, names etc.
No, you do not. :) plsr() is happy with a data frame where n96 is a
single variable consisting of a matrix, and this is the recommended way
for matrices with a lot of columns. That is what you get with
frame1 <- data.frame(gushVM, n96 = I(n96))
if n96 is a matrix, or
frame1 <- data.frame(gushVM, n96 = I(as.matrix(n96)))
if it is a data.frame.
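
As a minimal sketch of what those two constructions produce (the values below are made-up toy data, not the original poster's, and n96_df and frame2 are names introduced only for this illustration), the structure can be checked in base R without loading pls:

```r
# Toy data standing in for the poster's objects (values are made up).
gushVM <- c(1.2, 0.7, 3.1, 2.4, 1.9)
n96 <- matrix(rnorm(5 * 96), nrow = 5)   # 5 samples, 96 predictors

# I() protects the matrix, so data.frame() stores it as ONE variable.
frame1 <- data.frame(gushVM, n96 = I(n96))

names(frame1)          # "gushVM" "n96"
dim(frame1)            # 5 2
is.matrix(frame1$n96)  # TRUE
dim(frame1$n96)        # 5 96

# Starting from a data.frame instead: convert it to a matrix first.
n96_df <- as.data.frame(n96)             # hypothetical data.frame version
frame2 <- data.frame(gushVM, n96 = I(as.matrix(n96_df)))
is.matrix(frame2$n96)  # TRUE
```

The point of the check is that the data frame has two variables, not 97, and that the second one is itself a 5 x 96 matrix.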
> If n96 is a data frame, try something like
> names(n96) <- paste("n96", 1:96)
> frame1 <- cbind(gushVM, n96)
>
> pls1 <- plsr(gushVM ~ n96, data = frame1)
Have you actually tried this? It doesn't work. For instance:
> gushVM <- 1:5
> n96 <- data.frame(a=1:5, b=2:6)
> names(n96) <- paste("n96", 1:2)
> n96
n96 1 n96 2
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
> frame1 <- cbind(gushVM, n96)
> frame1
gushVM n96 1 n96 2
1 1 1 2
2 2 2 3
3 3 3 4
4 4 4 5
5 5 5 6
> dim(frame1)
[1] 5 3
> pls1 <- plsr(gushVM ~ n96, data = frame1)
Error in model.frame.default(formula = gushVM ~ n96, data = frame1) :
invalid type (list) for variable 'n96'
The reason is that frame1 does _not_ contain a variable called 'n96', so
plsr() (or actually model.frame.default()) searches in the global
environment, where it finds a _data.frame_ n96. A data.frame is a list,
hence the error message.
> If n96 is a matrix,
>
> frame1 <- data.frame(gushVM, n96=n96)
>
> should also give you a data frame with names of the right format.
It does not:
> n96 <- as.matrix(n96)
> frame1 <- data.frame(gushVM, n96=n96)
> frame1
gushVM n96.n96.1 n96.n96.2
1 1 1 2
2 2 2 3
3 3 3 4
4 4 4 5
5 5 5 6
> dim(frame1)
[1] 5 3
> names(frame1)
[1] "gushVM" "n96.n96.1" "n96.n96.2"
So the data frame still does not have any variable named 'n96'. The
only reason
> pls1 <- plsr(gushVM ~ n96, data = frame1)
seems to work is that the 'n96' variable it now finds in the global
environment happens to be a matrix:
> class(n96)
[1] "matrix"
If that wasn't there, you would get an error:
> rm(n96)
> pls1 <- plsr(gushVM ~ n96, data = frame1)
Error in eval(expr, envir, enclos) : object 'n96' not found
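
One way to catch this kind of silent fallback before fitting is to check that every variable in the formula really is a column of the data frame. The helper below is hypothetical (it is not part of pls or base R), and flat and nested are names made up for this sketch:

```r
# Hypothetical helper: which formula variables are NOT columns of `data`?
vars_missing_from_data <- function(formula, data) {
  setdiff(all.vars(formula), names(data))
}

gushVM <- 1:5
n96 <- matrix(1:10, ncol = 2)

flat   <- data.frame(gushVM, n96 = n96)      # matrix split into n96.1, n96.2
nested <- data.frame(gushVM, n96 = I(n96))   # single matrix variable n96

vars_missing_from_data(gushVM ~ n96, flat)    # "n96"
vars_missing_from_data(gushVM ~ n96, nested)  # character(0)
```

An empty result means the fit will use only what is in the data frame, regardless of what happens to be lying around in the global environment.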
> I() wrapped round a matrix or data frame does nothing like what is needed if
> you include it in a data frame construction, so either things have changed
> since the tutorial was written, or the authors were not handling a matrix or
> data frame with I().
Yes it does. :) Nothing (substantial) has changed, and we did/do handle
matrices with I():
> n96 <- matrix(1:10, ncol=2)
> n96
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
> frame1 <- data.frame(gushVM, I(n96))
> frame1
gushVM n96.1 n96.2
1 1 1 6
2 2 2 7
3 3 3 8
4 4 4 9
5 5 5 10
> dim(frame1)
[1] 5 2
> names(frame1)
[1] "gushVM" "n96"
> rm(n96)
> pls1 <- plsr(gushVM ~ n96, data = frame1)
> pls1
Partial least squares regression , fitted with the kernel algorithm.
Call:
plsr(formula = gushVM ~ n96, data = frame1)
--
Regards,
Bjørn-Helge Mevik