[R] data frame is killing me! help
bbslover
dluthm at yeah.net
Fri Oct 23 18:43:56 CEST 2009
I have read that one. I want to use this method on my own data, but I do not
know how to put my data into R.
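
One way this might be done, sketched under assumptions: the predictors sit in a numeric matrix (simulated here; `X`, `y` and `mydata` are made-up names, and the 20 x 600 size is only for illustration). Wrapping the matrix in I() keeps it as a single data frame column called NIR, which is the layout the gasoline help page describes. The "NIR." prefix seen in the printed column names appears to come from that matrix column, so nothing has to be renamed by hand.

```r
# Hypothetical stand-in for real data: 20 samples, 600 predictors.
# With real data, X might instead come from something like
# as.matrix(read.csv("mydata.csv")[, -1]) (file name and layout are assumptions).
set.seed(1)
X <- matrix(rnorm(20 * 600), nrow = 20, ncol = 600)
colnames(X) <- paste0("x", seq_len(ncol(X)))
y <- rnorm(20)

# I() stores X as ONE matrix column instead of 600 separate columns,
# so the formula can refer to all predictors by the single name NIR.
mydata <- data.frame(y = y, NIR = I(X))

dim(mydata)       # two columns: y and NIR
dim(mydata$NIR)   # the full 20 x 600 predictor matrix

# Fit only if the pls package is installed; ncomp = 5 is an arbitrary choice.
if (requireNamespace("pls", quietly = TRUE)) {
  fit <- pls::plsr(y ~ NIR, data = mydata, ncomp = 5, validation = "CV")
}
```

The column name NIR is not special; any name works as long as the formula uses the same one.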
James W. MacDonald wrote:
>
>
>
> bbslover wrote:
>>
>>
>> Steve Lianoglou-6 wrote:
>>> Hi,
>>>
>>> On Oct 22, 2009, at 2:35 PM, bbslover wrote:
>>>
>>>> Usage
>>>> data(gasoline)
>>>> Format
>>>> A data frame with 60 observations on the following 2 variables.
>>>> octane
>>>> a numeric vector. The octane number.
>>>> NIR
>>>> a matrix with 401 columns. The NIR spectrum
>>>>
>>>> and when I look at the gasoline data I see the following
>>>>   NIR.1686 nm NIR.1688 nm NIR.1690 nm NIR.1692 nm NIR.1694 nm NIR.1696 nm NIR.1698 nm NIR.1700 nm
>>>> 1    1.242645    1.250789    1.246626    1.250985    1.264189    1.244678    1.245913    1.221135
>>>> 2    1.189116    1.223242    1.253306    1.282889    1.215065    1.225211    1.227985    1.198851
>>>> 3    1.198287    1.237383    1.260979    1.276677    1.218871    1.223132    1.230321    1.208742
>>>> 4    1.201066    1.233299    1.262966    1.272709    1.211068    1.215044    1.232655    1.206696
>>>> 5    1.259616    1.273713    1.296524    1.299507    1.226448    1.230718    1.232864    1.202926
>>>> 6    1.24109     1.262138    1.288401    1.291118    1.229769    1.227615    1.22763     1.207576
>>>> 7    1.245143    1.265648    1.274731    1.292441    1.218317    1.218147    1.222273    1.200446
>>>> 8    1.222581    1.245782    1.26002     1.290305    1.221264    1.220265    1.227947    1.188174
>>>> 9    1.234969    1.251559    1.272416    1.287405    1.211995    1.213263    1.215883    1.196102
>>>>
>>>> Look at these column names: NIR.1686 nm, NIR.1688 nm, NIR.1690 nm,
>>>> NIR.1692 nm, NIR.1694 nm, NIR.1696 nm, NIR.1698 nm, NIR.1700 nm
>>>>
>>>> How can I add the letters NIR to my variables? My 600 independent
>>>> variables do not have NIR as a prefix, but it seems to be needed to
>>>> fit the plsr model, for example aa=plsr(y~NIR, data=data ,....).
>>>> How can I do this?
>>> I'm not really sure that I'm getting you, but if your problem is that
>>> the column names of your data.frame don't match the variable names
>>> you'd like to use in your formula, just change the colnames of your
>>> data.frame to match your formula.
>>>
>>> BTW - I have no idea where to get this gasoline data set, so I'm just
>>> imagining:
>>>
>>> eg.
>>> colnames(gasoline) <- c('put', 'the', 'variable', 'names', 'that',
>>> 'you', 'want', 'here')
>>>
>>> -steve
>>>
>>> --
>>> Steve Lianoglou
>>> Graduate Student: Computational Systems Biology
>>> | Memorial Sloan-Kettering Cancer Center
>>> | Weill Medical College of Cornell University
>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>> Thank you. But there are so many independent variables that it is not
>> easy to identify them one by one. Is there a better way?
>
> You don't need to identify anything. What you need to do is read the
> help page for the function you want to use, so you (at the very least)
> know how to use the function.
>
> > library(pls)
> > data(gasoline)
> > fit <- plsr(octane~NIR, data=gasoline, validation = "CV")
> > summary(fit)
> Data: X dimension: 60 401
> Y dimension: 60 1
> Fit method: kernelpls
> Number of components considered: 53
>
> VALIDATION: RMSEP
> Cross-validated using 10 random segments.
>        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
> CV           1.543    1.372   0.3827   0.2522   0.2347   0.2455   0.2281
> adjCV        1.543    1.367   0.3740   0.2497   0.2360   0.2407   0.2243
>        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
> CV      0.2311   0.2352   0.2455    0.2534    0.2737    0.2814    0.2832
> adjCV   0.2257   0.2303   0.2395    0.2473    0.2646    0.2705    0.2726
>        14 comps  15 comps  16 comps  17 comps  18 comps  19 comps  20 comps
> CV       0.2913    0.2932    0.2985    0.3137    0.3289    0.3323    0.3391
> adjCV    0.2808    0.2821    0.2863    0.3008    0.3141    0.3172    0.3228
>        21 comps  22 comps  23 comps  24 comps  25 comps  26 comps  27 comps
> CV       0.3476    0.3384    0.3316    0.3213    0.3155    0.3118    0.3062
> adjCV    0.3307    0.3217    0.3154    0.3057    0.3002    0.2964    0.2908
>        28 comps  29 comps  30 comps  31 comps  32 comps  33 comps  34 comps
> CV       0.3033    0.3034    0.3074    0.3083    0.3094    0.3087    0.3105
> adjCV    0.2881    0.2881    0.2917    0.2926    0.2936    0.2929    0.2946
>        35 comps  36 comps  37 comps  38 comps  39 comps  40 comps  41 comps
> CV       0.3108    0.3106    0.3105    0.3104    0.3104    0.3105    0.3105
> adjCV    0.2949    0.2947    0.2946    0.2945    0.2945    0.2945    0.2946
>        42 comps  43 comps  44 comps  45 comps  46 comps  47 comps  48 comps
> CV       0.3105    0.3105    0.3105    0.3105    0.3105    0.3105    0.3105
> adjCV    0.2946    0.2946    0.2946    0.2946    0.2946    0.2946    0.2946
>        49 comps  50 comps  51 comps  52 comps  53 comps
> CV       0.3105    0.3105    0.3105    0.3105    0.3105
> adjCV    0.2946    0.2946    0.2946    0.2946    0.2946
>
> TRAINING: % variance explained
>         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
> X         70.97    78.56    86.15     95.4    96.12    96.97    97.32     98.1
> octane    31.90    94.66    97.71     98.0    98.68    98.93    99.06     99.1
>         9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
> X         98.32     98.71     98.84     99.00     99.21     99.46     99.52
> octane    99.20     99.24     99.36     99.44     99.49     99.51     99.58
>         16 comps  17 comps  18 comps  19 comps  20 comps  21 comps  22 comps
> X          99.57     99.64     99.68     99.76     99.78     99.82     99.84
> octane     99.65     99.69     99.78     99.81     99.86     99.89     99.92
>         23 comps  24 comps  25 comps  26 comps  27 comps  28 comps  29 comps
> X          99.88     99.91     99.92     99.93     99.94     99.95     99.96
> octane     99.93     99.94     99.95     99.97     99.98     99.99     99.99
>         30 comps  31 comps  32 comps  33 comps  34 comps  35 comps  36 comps
> X          99.96     99.97     99.97     99.98     99.98     99.98     99.98
> octane     99.99    100.00    100.00    100.00    100.00    100.00    100.00
>         37 comps  38 comps  39 comps  40 comps  41 comps  42 comps  43 comps
> X          99.99     99.99     99.99     99.99       100       100       100
> octane    100.00    100.00    100.00    100.00       100       100       100
>         44 comps  45 comps  46 comps  47 comps  48 comps  49 comps  50 comps
> X            100       100       100       100       100       100       100
> octane       100       100       100       100       100       100       100
>         51 comps  52 comps  53 comps
> X            100       100       100
> octane       100       100       100
>
>
>>
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826
>
>
>