[R] data frame is killing me! help
James W. MacDonald
jmacdon at med.umich.edu
Fri Oct 23 16:00:16 CEST 2009
bbslover wrote:
>
>
> Steve Lianoglou-6 wrote:
>> Hi,
>>
>> On Oct 22, 2009, at 2:35 PM, bbslover wrote:
>>
>>> Usage
>>>      data(gasoline)
>>> Format
>>>      A data frame with 60 observations on the following 2 variables.
>>>      octane  a numeric vector. The octane number.
>>>      NIR     a matrix with 401 columns. The NIR spectrum
>>>
>>> and when I look at the gasoline data, I see this:
>>>   NIR.1686 nm NIR.1688 nm NIR.1690 nm NIR.1692 nm NIR.1694 nm NIR.1696 nm NIR.1698 nm NIR.1700 nm
>>> 1    1.242645    1.250789    1.246626    1.250985    1.264189    1.244678    1.245913    1.221135
>>> 2    1.189116    1.223242    1.253306    1.282889    1.215065    1.225211    1.227985    1.198851
>>> 3    1.198287    1.237383    1.260979    1.276677    1.218871    1.223132    1.230321    1.208742
>>> 4    1.201066    1.233299    1.262966    1.272709    1.211068    1.215044    1.232655    1.206696
>>> 5    1.259616    1.273713    1.296524    1.299507    1.226448    1.230718    1.232864    1.202926
>>> 6     1.24109    1.262138    1.288401    1.291118    1.229769    1.227615     1.22763    1.207576
>>> 7    1.245143    1.265648    1.274731    1.292441    1.218317    1.218147    1.222273    1.200446
>>> 8    1.222581    1.245782     1.26002    1.290305    1.221264    1.220265    1.227947    1.188174
>>> 9    1.234969    1.251559    1.272416    1.287405    1.211995    1.213263    1.215883    1.196102
>>>
>>> Look at these column names: NIR.1686 nm, NIR.1688 nm, NIR.1690 nm,
>>> NIR.1692 nm, NIR.1694 nm, NIR.1696 nm, NIR.1698 nm, NIR.1700 nm.
>>>
>>> How can I add the prefix NIR to my variables? None of my 600
>>> independent variables has NIR as a prefix, but it is needed to fit a
>>> plsr model, e.g. aa=plsr(y~NIR, data=data, ....). How can I do this?
>> I'm not really sure I'm following you, but if your problem is that
>> the column names of your data.frame don't match the variable names
>> you'd like to use in your formula, just change the colnames of your
>> data.frame to match your formula.
>>
>> BTW - I have no idea where to get this gasoline data set, so I'm just
>> imagining:
>>
>> e.g.
>> colnames(gasoline) <- c('put', 'the', 'variable', 'names', 'that',
>>                         'you', 'want', 'here')
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>> | Memorial Sloan-Kettering Cancer Center
>> | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
> Thanks. But there are so many independent variables that it is not easy
> to identify them one by one. Is there a better way?
You don't need to identify anything. What you need to do is read the
help page for the function you want to use, so you (at the very least)
know how to use the function.
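A minimal base-R sketch of what the ?gasoline help page describes: NIR is not 401 separately named columns but a single matrix column stored in the data frame with I(), so `octane ~ NIR` refers to the whole matrix. The `NIR.1686 nm`-style labels in the printout appear to be just how R displays the columns of that matrix. The names `X`, `y`, `flat` and `dat` and the simulated values are hypothetical stand-ins for your own 600 predictors; the plsr() call is only run if the pls package is installed.

```r
# Hypothetical stand-in for your own data: a response y and 600 predictors
# (simulated here; replace X and y with your real data).
set.seed(1)
n <- 60
p <- 600
X <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("V", seq_len(p))))
y <- drop(X[, 1:5] %*% rep(1, 5)) + rnorm(n, sd = 0.1)

# Without I(), data.frame() splits the matrix into 600 separate columns
# named "NIR.V1", "NIR.V2", ..., so there is no single variable called NIR.
flat <- data.frame(y = y, NIR = X)

# With I(), the whole matrix is kept as ONE column named NIR, the same
# layout as the gasoline data, so the formula y ~ NIR refers to it.
dat <- data.frame(y = y, NIR = I(X))

ncol(flat)     # 601
ncol(dat)      # 2
dim(dat$NIR)   # 60 600

# Fit a PLS model on the matrix column, only if pls is installed.
if (requireNamespace("pls", quietly = TRUE)) {
  library(pls)
  fit <- plsr(y ~ NIR, ncomp = 10, data = dat, validation = "CV")
}
```

No renaming is involved: whatever the original column names are, they become the column names of the NIR matrix.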
> library(pls)
> data(gasoline)
> fit <- plsr(octane~NIR, data=gasoline, validation = "CV")
> summary(fit)
Data: X dimension: 60 401
Y dimension: 60 1
Fit method: kernelpls
Number of components considered: 53
VALIDATION: RMSEP
Cross-validated using 10 random segments.
       (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
CV           1.543    1.372   0.3827   0.2522   0.2347   0.2455   0.2281
adjCV        1.543    1.367   0.3740   0.2497   0.2360   0.2407   0.2243
        7 comps  8 comps  9 comps 10 comps 11 comps 12 comps 13 comps
CV       0.2311   0.2352   0.2455   0.2534   0.2737   0.2814   0.2832
adjCV    0.2257   0.2303   0.2395   0.2473   0.2646   0.2705   0.2726
       14 comps 15 comps 16 comps 17 comps 18 comps 19 comps 20 comps
CV       0.2913   0.2932   0.2985   0.3137   0.3289   0.3323   0.3391
adjCV    0.2808   0.2821   0.2863   0.3008   0.3141   0.3172   0.3228
       21 comps 22 comps 23 comps 24 comps 25 comps 26 comps 27 comps
CV       0.3476   0.3384   0.3316   0.3213   0.3155   0.3118   0.3062
adjCV    0.3307   0.3217   0.3154   0.3057   0.3002   0.2964   0.2908
       28 comps 29 comps 30 comps 31 comps 32 comps 33 comps 34 comps
CV       0.3033   0.3034   0.3074   0.3083   0.3094   0.3087   0.3105
adjCV    0.2881   0.2881   0.2917   0.2926   0.2936   0.2929   0.2946
       35 comps 36 comps 37 comps 38 comps 39 comps 40 comps 41 comps
CV       0.3108   0.3106   0.3105   0.3104   0.3104   0.3105   0.3105
adjCV    0.2949   0.2947   0.2946   0.2945   0.2945   0.2945   0.2946
       42 comps 43 comps 44 comps 45 comps 46 comps 47 comps 48 comps
CV       0.3105   0.3105   0.3105   0.3105   0.3105   0.3105   0.3105
adjCV    0.2946   0.2946   0.2946   0.2946   0.2946   0.2946   0.2946
       49 comps 50 comps 51 comps 52 comps 53 comps
CV       0.3105   0.3105   0.3105   0.3105   0.3105
adjCV    0.2946   0.2946   0.2946   0.2946   0.2946
TRAINING: % variance explained
        1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
X         70.97    78.56    86.15     95.4    96.12    96.97    97.32     98.1
octane    31.90    94.66    97.71     98.0    98.68    98.93    99.06     99.1
        9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
X         98.32    98.71    98.84    99.00    99.21    99.46    99.52
octane    99.20    99.24    99.36    99.44    99.49    99.51    99.58
       16 comps 17 comps 18 comps 19 comps 20 comps 21 comps 22 comps
X         99.57    99.64    99.68    99.76    99.78    99.82    99.84
octane    99.65    99.69    99.78    99.81    99.86    99.89    99.92
       23 comps 24 comps 25 comps 26 comps 27 comps 28 comps 29 comps
X         99.88    99.91    99.92    99.93    99.94    99.95    99.96
octane    99.93    99.94    99.95    99.97    99.98    99.99    99.99
       30 comps 31 comps 32 comps 33 comps 34 comps 35 comps 36 comps
X         99.96    99.97    99.97    99.98    99.98    99.98    99.98
octane    99.99   100.00   100.00   100.00   100.00   100.00   100.00
       37 comps 38 comps 39 comps 40 comps 41 comps 42 comps 43 comps
X         99.99    99.99    99.99    99.99      100      100      100
octane   100.00   100.00   100.00   100.00      100      100      100
       44 comps 45 comps 46 comps 47 comps 48 comps 49 comps 50 comps
X           100      100      100      100      100      100      100
octane      100      100      100      100      100      100      100
       51 comps 52 comps 53 comps
X           100      100      100
octane      100      100      100
>
>
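The CV rows above are what you would use to choose the number of components: in this run the cross-validated RMSEP is lowest at 6 components (0.2281), with 4 components (0.2347) nearly as good, and it rises again after that. A sketch of picking that number programmatically from RMSEP() and predicting held-out samples with predict(), assuming the pls package is installed. The 50/10 train/test split, ncomp = 10 and the set.seed() value are arbitrary choices for illustration; because the CV segments are random, the selected number can differ between runs.

```r
# Assumes the pls package is installed; the 50/10 split is arbitrary.
if (requireNamespace("pls", quietly = TRUE)) {
  library(pls)
  data(gasoline, package = "pls")

  gasTrain <- gasoline[1:50, ]   # NIR stays a 401-column matrix column
  gasTest  <- gasoline[51:60, ]

  set.seed(123)                  # CV segments are drawn at random
  fit <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "CV")

  # RMSEP(..., estimate = "CV")$val is an array (estimate x response x model).
  # The first model is the intercept-only one, hence the "- 1".
  cv <- drop(RMSEP(fit, estimate = "CV")$val)
  best <- max(1L, unname(which.min(cv)) - 1L)

  # Predict the held-out samples using the chosen number of components.
  pred <- drop(predict(fit, ncomp = best, newdata = gasTest))
  rmse <- sqrt(mean((pred - gasTest$octane)^2))
  print(c(ncomp = best, test_RMSEP = rmse))
}
```

Evaluating on samples that were not used for fitting gives a check on the CV-based choice that does not depend on the random segments.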
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826