[R] Survuval Anaysis
Haddison Mureithi
mure|th|h@dd|@on @end|ng |rom gm@||@com
Thu May 2 14:37:26 CEST 2019
Hello guys this problem was never answered and I happened to come across
the same problem , kindly help. This is a simple R program that I have been
trying to run. I keep running into the "singular matrix" error. I end up
with no sensible results. Can anyone suggest any changes or a way around
this?
I am a total rookie when working with R.
Thanks,
Haddison
> library(survival)
Loading required package: splines
> args(coxph)
function (formula, data, weights, subset, na.action, init, control,
method = c("efron", "breslow", "exact"), singular.ok = TRUE,
robust = FALSE, model = FALSE, x = FALSE, y = TRUE, tt, ...)
NULL
> test1<-read.table("S:/FISHDO/03_Phase_I_Field_Work/Data_6_28_2011/Working
Folder/R_files/4SondesJuly24.csv", header=T, sep=",")
> sondes<-coxph(Surv(Start, Stop, Depart)~DOLoomis + DOI55 + DODamen,
data=test1)
Warning messages:
1: In fitter(X, Y, strats, offset, init, control, weights = weights, :
Loglik converged before variable 1,2 ; beta may be infinite.
2: In coxph(Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 + DODamen, :
X matrix deemed to be singular; variable 3
> summary(sondes)
Call:
coxph(formula = Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 +
DODamen, data = test1)
n= 1737, number of events= 58
(1 observation deleted due to missingness)
coef exp(coef) se(coef) z Pr(>|z|)
DOLoomis -2.152e+00 1.163e-01 1.161e+05 0 1
DOI55 4.560e-01 1.578e+00 3.755e+04 0 1
DODamen NA NA 0.000e+00 NA NA
exp(coef) exp(-coef) lower .95 upper .95
DOLoomis 0.1163 8.5995 0 Inf
DOI55 1.5777 0.6338 0 Inf
DODamen NA NA NA NA
Concordance= 0.5 (se = 0 )
Rsquare= 0 (max possible= 0.01 )
Likelihood ratio test= 0 on 2 df, p=1
Wald test = 0 on 2 df, p=1
Score (logrank) test = 0 on 2 df, p=1
On Wed, 1 May 2019, 1:00 pm , <r-help-request using r-project.org> wrote:
> Send R-help mailing list submissions to
> r-help using r-project.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://stat.ethz.ch/mailman/listinfo/r-help
> or, via email, send a message with subject or body 'help' to
> r-help-request using r-project.org
>
> You can reach the person managing the list at
> r-help-owner using r-project.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of R-help digest..."
>
>
> Today's Topics:
>
> 1. Re: Bug in R 3.6.0? (Martin Maechler)
> 2. Re: Bug in R 3.6.0? (ocjt using free.fr)
> 3. Time series (trend over time) for irregular sampling dates
> and multiple sites (=?UTF-8?Q?Catarina_Serra_Gon=C3=A7alves?=)
> 4. Re: Time series (trend over time) for irregular sampling
> dates and multiple sites (Bert Gunter)
> 5. Passing formula as parameter to `lm` within `sapply` causes
> error [BUG?] (Jens Heumann)
> 6. (no subject) (Haddison Mureithi)
> 7. Help with loop for column means into new column by a subset
> Factor w/131 levels (Bill Poling)
> 8. Re: Help with loop for column means into new column by a
> subset Factor w/131 levels (Bill Poling)
> 9. transpose and split dataframe (Matthew)
> 10. Re: transpose and split dataframe (David L Carlson)
> 11. Re: Passing formula as parameter to `lm` within `sapply`
> causes error [BUG?] (David Winsemius)
> 12. Fwd: Re: transpose and split dataframe (Matthew)
> 13. Re: transpose and split dataframe (Jim Lemon)
> 14. Re: Time series (trend over time) for irregular sampling
> dates and multiple sites (Abs Spurdle)
> 15. Re: Fwd: Re: transpose and split dataframe (David L Carlson)
> 16. Re: Passing formula as parameter to `lm` within `sapply`
> causes error [BUG?] (Duncan Murdoch)
> 17. Re: Time series (trend over time) for irregular sampling
> dates and multiple sites (Abs Spurdle)
> 18. Re: Time series (trend over time) for irregular sampling
> dates and multiple sites (Abs Spurdle)
> 19. Re: Passing formula as parameter to `lm` within `sapply`
> causes error [BUG?] (Jens Heumann)
> 20. Re: Passing formula as parameter to `lm` within `sapply`
> causes error [BUG?] (peter dalgaard)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 30 Apr 2019 16:54:10 +0200
> From: Martin Maechler <maechler using stat.math.ethz.ch>
> To: Morgan Morgan <morgan.emailbox using gmail.com>
> Cc: <r-help using r-project.org>
> Subject: Re: [R] Bug in R 3.6.0?
> Message-ID: <23752.24978.45927.96764 using stat.math.ethz.ch>
> Content-Type: text/plain; charset="utf-8"
>
> >>>>> Morgan Morgan
> >>>>> on Mon, 29 Apr 2019 21:42:36 +0100 writes:
>
> > Hi,
> > I am using the R 3.6.0 on windows. The issue that I report below
> does not
> > exist with previous version of R.
> > In order to reproduce the error you must install a package of your
> choice
> > from source (tar.gz).
>
> > -Create a .Rprofile file with the following command in it :
> setwd("D:/")
> > -Close your R session and re-open it. Your working directory must be
> now set
> > to D:
> > -Install a package of your choice from source, example :
> > install.packages("data.table",type="source")
>
> > In my case the package fail to install and I get the following error
> > message:
>
> > ** R
> > ** inst
> > ** byte-compile and prepare package for lazy loading
> > Error in tools:::.read_description(file) :
> > file 'DESCRIPTION' does not exist
> > Calls: suppressPackageStartupMessages ... withCallingHandlers ->
> > .getRequiredPackages -> <Anonymous> -> <Anonymous>
> > Execution halted
> > ERROR: lazy loading failed for package 'data.table'
> > * removing 'C:/Users/Morgan/Documents/R/win-library/3.6/data.table'
> > * restoring previous
> > 'C:/Users/Morgan/Documents/R/win-library/3.6/data.table'
> > Warning in install.packages :
> > installation of package ‘data.table’ had non-zero exit status
>
> > Now remove the .Rprofile file, restart your R session and try to
> install th
> e
> > package with the same command.
> > In that case everything should be installed just fine.
>
> > FYI the issue happens on macOS as well and I suspect it also does on
> all
> > linux systems.
>
> > My question: Is this expected or is it a bug?
>
> It is a bug, thank you very much for reporting it.
>
> I've been told privately by Ömer An (thank you!) who's been
> affected as well, that this problem seems to affect others, and
> that there's a thread about this over at the Rstudio support site
>
>
> https://support.rstudio.com/hc/en-us/community/posts/200704708-Build-tool-does-not-recognize-DESCRIPTION-file
>
> There, users mention that (all?) packages are affected which
> have a multiline 'Description:' field in their DESCRIPTION file.
> Of course, many if not most packages have this property.
>
> Indeed, I can reproduce the problem (e.g. with my 'sfsmisc'
> package) if I ("silly enough to") add a setwd() call to my
> Rprofile file (the one I set via env.var R_PROFILE or R_PROFILE_USER).
>
> This is clearly a bug, and indeed a bad one.
>
> It seems all R core (and other R expert users who have tried R
> 3.6.0 alpha, beta, and RC versions) have *not* seen the bug as they
> are intuitively smart not to mess with R's working directory in
> a global R profile file ...
>
> For now you definitively have to work around by not doing what's
> the problem : do *NOT* setwd() in your ~/.Rprofile or other
> such R init files.
>
> Best,
> Martin Maechler
> ETH Zurich and R Core Team
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 30 Apr 2019 16:15:46 +0200
> From: <ocjt using free.fr>
> To: "'Morgan Morgan'" <morgan.emailbox using gmail.com>,
> <r-help using r-project.org>
> Subject: Re: [R] Bug in R 3.6.0?
> Message-ID: <002d01d4ff5f$34816be0$9d8443a0$@free.fr>
> Content-Type: text/plain; charset="utf-8"
>
> Hello,
>
> I have exactly the same problem when I install one of my own packages:
>
> Error in tools:::.read_description(file) :
> file 'DESCRIPTION' does not exist
> Calls: suppressPackageStartupMessages ... withCallingHandlers ->
> .getRequiredPackages -> <Anonymous> -> <Anonymous>
> Exécution arrêtée
> ERROR: lazy loading failed for package 'RRegArch'
>
> Best,
> Ollivier
>
>
> -----Message d'origine-----
> De : R-help <r-help-bounces using r-project.org> De la part de Morgan Morgan
> Envoyé : lundi 29 avril 2019 22:43
> À : r-help using r-project.org
> Objet : [R] Bug in R 3.6.0?
>
> Hi,
>
> I am using the R 3.6.0 on windows. The issue that I report below does not
> exist with previous version of R.
> In order to reproduce the error you must install a package of your choice
> from source (tar.gz).
>
> -Create a .Rprofile file with the following command in it : setwd("D:/")
> -Close your R session and re-open it. Your working directory must be now
> set to D:
> -Install a package of your choice from source, example :
> install.packages("data.table",type="source")
>
> In my case the package fail to install and I get the following error
> message:
>
> ** R
> ** inst
> ** byte-compile and prepare package for lazy loading Error in
> tools:::.read_description(file) :
> file 'DESCRIPTION' does not exist
> Calls: suppressPackageStartupMessages ... withCallingHandlers ->
> .getRequiredPackages -> <Anonymous> -> <Anonymous> Execution halted
> ERROR: lazy loading failed for package 'data.table'
> * removing 'C:/Users/Morgan/Documents/R/win-library/3.6/data.table'
> * restoring previous
> 'C:/Users/Morgan/Documents/R/win-library/3.6/data.table'
> Warning in install.packages :
> installation of package ‘data.table’ had non-zero exit status
>
> Now remove the .Rprofile file, restart your R session and try to install
> the package with the same command.
> In that case everything should be installed just fine.
>
> FYI the issue happens on macOS as well and I suspect it also does on all
> linux systems.
>
> My question: Is this expected or is it a bug?
>
> Thank you
> Best regards,
> Morgan
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 1 May 2019 00:57:43 +1000
> From: =?UTF-8?Q?Catarina_Serra_Gon=C3=A7alves?= <catarinasg using gmail.com>
> To: r-help using r-project.org
> Subject: [R] Time series (trend over time) for irregular sampling
> dates and multiple sites
> Message-ID:
> <
> CAOQWJbvY+JKy80sksmfC8tu-C+5qq-tzwAd21XbyGvJAyYjQPQ using mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I have a dataset of marine debris items (number of items standardized per
> effort: Items/(number of volunteers*Hours*Lenght)) taken from 2 main
> locations (WA and Queensland) in Australia (8 Sub Sites in total: 4 in WA
> and 4 in Queensland) at irregular sampling intervals over a period 15
> years.
>
> I want to test if there is a change over the years on the amount of debris
> in these locations and more specifically a change after the implementation
> of a mitigation strategy (in 2013).
> Here’s the head of the data:[image: enter image description here]
> <https://i.stack.imgur.com/VNIpb.png>Description of each one of the
> varables in the dataframe:
>
> *eventid *= each sampling (clean-up) event Location = Queensland and New
> South Wales Sites = all the 9 sampling beaches
>
> *Date *= specific dates for the clean-up events (day-month-year)
>
> *Date1 *= specific dates for the clean-up events (day-month-year) on the
> POSICXT format Year= Year of sampling event (2004 to 2018)
>
> *Month*= Month of the sampling event (jan to dec)
>
> *nMonth*= a number was determined to the respective month of the sampling
> event (1 to 12)
>
> *Day*= Day of sampling (1 to 31) Days = Days since the first date of clean
> up = just another way of using the dates
>
> *MARPOL *= before and after implementation (factor with 2 levels)
>
> *DaysC *= days between sampling events for the same sites = number of days
> since the previous clean-up event
>
> *DaysI *= Days since intervention, all the dates before implementation are
> zero, and after we count the number of days since the implementation date
> (1 jan 2013)
>
> *DaysIa*= same as DayI but instead of zero for before the intervention we
> have negative values (days)
>
> *Items *= number of fishing and shipping items counted in each clean-up
> event
>
> *Hours *= hours spent by all volunteers together at each clean up event
>
> *Lenght *= Lenght of beach sampled by all volunteers together at each clean
> up event volunteers = all volunteers at each clean up event
>
> *HoursVolunteer *= hours spent bt each volunteer at each clean up event
> (Hours/volunteers)
>
> *Ieffort *= the items standarized by the effort (hours, volunteers and
> lenght)
>
> *GrossWeight & **GrossTotal are not relevant *
> ------------------------------
> Problems:
>
> My data has a few problems: (1) I think I will need to fix the effects of
> seasonal variation (Monthly) and (2) of possible spatial correlation
> (probability of finding an item is higher after finding one since they can
> come from the same ship). (3) How do I handle the fact that the
> measurements were not taken at a regular interval?
>
> I was trying to use GAMs to analyse the data and see the trends over time.
> The model I came across is the following:
>
> m4<- gamm(Ieffort ~ s(DaysIa)+MARPOL+ s(nMonth, bs = "ps", k = 12),
> random=list(Site=~1,Location=~1),data = d)
>
> *thank you in advance.*
> -
> *Catarina Serra Gonçalves *
> PhD candidate
>
> Adrift Lab <https://adriftlab.org>
> University of Tasmania <http://www.utas.edu.au/> | Institute for Marine
> and
> Antarctic Studies <http://www.imas.utas.edu.au/>
> Launceston, TAS | Australia
>
> Personal website <https://catarinasg.wixsite.com/acserra>
> <https://catarinasg.wixsite.com/acserra>| E-mail <acserra using utas.edu.au> |
> Twitter <https://twitter.com/CatarinaSerraG>
> Research Gate
> <https://www.researchgate.net/profile/Catarina_Serra_Goncalves> | Google
> Scholar <https://scholar.google.pt/citations?user=8nBrRFwAAAAJ&hl=en>
>
> [[alternative HTML version deleted]]
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 30 Apr 2019 08:28:37 -0700
> From: Bert Gunter <bgunter.4567 using gmail.com>
> To: =?UTF-8?Q?Catarina_Serra_Gon=C3=A7alves?= <catarinasg using gmail.com>
> Cc: R-help <r-help using r-project.org>
> Subject: Re: [R] Time series (trend over time) for irregular sampling
> dates and multiple sites
> Message-ID:
> <CAGxFJbT2YSB1xcs0MajpeqtHbbn4T1ycYoSOBEFvMucFme1t=
> g using mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I have 0 expertise, but I suggest that you check out the SPatioTemporal
> taskview on CRAN (or possibly others, like environmetrics). You might also
> want to move this to the R-Sig-geo list,where you probably are more likely
> to find relevant expertise.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Apr 30, 2019 at 8:13 AM Catarina Serra Gonçalves <
> catarinasg using gmail.com> wrote:
>
> > I have a dataset of marine debris items (number of items standardized per
> > effort: Items/(number of volunteers*Hours*Lenght)) taken from 2 main
> > locations (WA and Queensland) in Australia (8 Sub Sites in total: 4 in WA
> > and 4 in Queensland) at irregular sampling intervals over a period 15
> > years.
> >
> > I want to test if there is a change over the years on the amount of
> debris
> > in these locations and more specifically a change after the
> implementation
> > of a mitigation strategy (in 2013).
> > Here’s the head of the data:[image: enter image description here]
> > <https://i.stack.imgur.com/VNIpb.png>Description of each one of the
> > varables in the dataframe:
> >
> > *eventid *= each sampling (clean-up) event Location = Queensland and New
> > South Wales Sites = all the 9 sampling beaches
> >
> > *Date *= specific dates for the clean-up events (day-month-year)
> >
> > *Date1 *= specific dates for the clean-up events (day-month-year) on the
> > POSICXT format Year= Year of sampling event (2004 to 2018)
> >
> > *Month*= Month of the sampling event (jan to dec)
> >
> > *nMonth*= a number was determined to the respective month of the sampling
> > event (1 to 12)
> >
> > *Day*= Day of sampling (1 to 31) Days = Days since the first date of
> clean
> > up = just another way of using the dates
> >
> > *MARPOL *= before and after implementation (factor with 2 levels)
> >
> > *DaysC *= days between sampling events for the same sites = number of
> days
> > since the previous clean-up event
> >
> > *DaysI *= Days since intervention, all the dates before implementation
> are
> > zero, and after we count the number of days since the implementation date
> > (1 jan 2013)
> >
> > *DaysIa*= same as DayI but instead of zero for before the intervention we
> > have negative values (days)
> >
> > *Items *= number of fishing and shipping items counted in each clean-up
> > event
> >
> > *Hours *= hours spent by all volunteers together at each clean up event
> >
> > *Lenght *= Lenght of beach sampled by all volunteers together at each
> clean
> > up event volunteers = all volunteers at each clean up event
> >
> > *HoursVolunteer *= hours spent bt each volunteer at each clean up event
> > (Hours/volunteers)
> >
> > *Ieffort *= the items standarized by the effort (hours, volunteers and
> > lenght)
> >
> > *GrossWeight & **GrossTotal are not relevant *
> > ------------------------------
> > Problems:
> >
> > My data has a few problems: (1) I think I will need to fix the effects of
> > seasonal variation (Monthly) and (2) of possible spatial correlation
> > (probability of finding an item is higher after finding one since they
> can
> > come from the same ship). (3) How do I handle the fact that the
> > measurements were not taken at a regular interval?
> >
> > I was trying to use GAMs to analyse the data and see the trends over
> time.
> > The model I came across is the following:
> >
> > m4<- gamm(Ieffort ~ s(DaysIa)+MARPOL+ s(nMonth, bs = "ps", k = 12),
> > random=list(Site=~1,Location=~1),data = d)
> >
> > *thank you in advance.*
> > -
> > *Catarina Serra Gonçalves *
> > PhD candidate
> >
> > Adrift Lab <https://adriftlab.org>
> > University of Tasmania <http://www.utas.edu.au/> | Institute for Marine
> > and
> > Antarctic Studies <http://www.imas.utas.edu.au/>
> > Launceston, TAS | Australia
> >
> > Personal website <https://catarinasg.wixsite.com/acserra>
> > <https://catarinasg.wixsite.com/acserra>| E-mail <acserra using utas.edu.au>
> |
> > Twitter <https://twitter.com/CatarinaSerraG>
> > Research Gate
> > <https://www.researchgate.net/profile/Catarina_Serra_Goncalves> | Google
> > Scholar <https://scholar.google.pt/citations?user=8nBrRFwAAAAJ&hl=en>
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> [[alternative HTML version deleted]]
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 30 Apr 2019 17:24:33 +0200
> From: Jens Heumann <jens.heumann using students.unibe.ch>
> To: <r-help using r-project.org>
> Subject: [R] Passing formula as parameter to `lm` within `sapply`
> causes error [BUG?]
> Message-ID: <75abba2b-c528-460e-df92-08f8479ba399 using students.unibe.ch>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> Hi,
>
> `lm` won't take formula as a parameter when it is within a `sapply`; see
> example below. Please, could anyone either point me to a syntax error or
> confirm that this might be a bug?
>
> Best,
> Jens
>
> [Disclaimer: This is my first post here, following advice of how to
> proceed with possible bugs from here: https://www.r-project.org/bugs.html]
>
>
> SUMMARY
>
> While `lm` alone accepts formula parameter `FO` well, the same within a
> `sapply` causes an error. When putting everything as parameter but
> formula `FO`, it's still working, though. All parameters work fine
> within a similar `for` loop.
>
>
> MCVE (see data / R-version at bottom)
>
> > summary(lm(y ~ x, df1, df1[["z"]] == 1, df1[["w"]]))$coef[1, ]
> Estimate Std. Error t value Pr(>|t|)
> 1.6269038 0.9042738 1.7991275 0.3229600
> > summary(lm(FO, data, data[[st]] == st1, data[[ws]]))$coef[1, ]
> Estimate Std. Error t value Pr(>|t|)
> 1.6269038 0.9042738 1.7991275 0.3229600
> > sapply(unique(df1$z), function(s)
> + summary(lm(y ~ x, df1, df1[["z"]] == s, df1[[ws]]))$coef[1, ])
> [,1] [,2] [,3]
> Estimate 1.6269038 -0.1404174 -0.010338774
> Std. Error 0.9042738 0.4577001 1.858138516
> t value 1.7991275 -0.3067890 -0.005564049
> Pr(>|t|) 0.3229600 0.8104951 0.996457853
> > sapply(unique(data[[st]]), function(s)
> + summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ]) # !!!
> Error in eval(substitute(subset), data, env) : object 's' not found
> > sapply(unique(data[[st]]), function(s)
> + summary(lm(y ~ x, data, data[[st]] == s, data[[ws]]))$coef[1, ])
> [,1] [,2] [,3]
> Estimate 1.6269038 -0.1404174 -0.010338774
> Std. Error 0.9042738 0.4577001 1.858138516
> t value 1.7991275 -0.3067890 -0.005564049
> Pr(>|t|) 0.3229600 0.8104951 0.996457853
> > m <- matrix(NA, 4, length(unique(data[[st]])))
> > for (s in unique(data[[st]])) {
> + m[, s] <- summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ]
> + }
> > m
> [,1] [,2] [,3]
> [1,] 1.6269038 -0.1404174 -0.010338774
> [2,] 0.9042738 0.4577001 1.858138516
> [3,] 1.7991275 -0.3067890 -0.005564049
> [4,] 0.3229600 0.8104951 0.996457853
>
> # DATA #################################################################
>
> df1 <- structure(list(x = c(1.37095844714667, -0.564698171396089,
> 0.363128411337339,
> 0.63286260496104, 0.404268323140999, -0.106124516091484, 1.51152199743894,
> -0.0946590384130976, 2.01842371387704), y = c(1.30824434809425,
> 0.740171482827397, 2.64977380403845, -0.755998096151299, 0.125479556323628,
> -0.239445852485142, 2.14747239550901, -0.37891195982917, -0.638031707027734
> ), z = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), w = c(0.7, 0.8,
> 1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)), class = "data.frame", row.names = c(NA,
> -9L))
>
> FO <- y ~ x; data <- df1; st <- "z"; ws <- "w"; st1 <- 1
>
> ########################################################################
>
> > R.version
> _
> platform x86_64-w64-mingw32
> arch x86_64
> os mingw32
> system x86_64, mingw32
> status
> major 3
> minor 6.0
> year 2019
> month 04
> day 26
> svn rev 76424
> language R
> version.string R version 3.6.0 (2019-04-26)
> nickname Planting of a Tree
>
> #########################################################################
>
> NOTE: Question on SO two days ago
> (
> https://stackoverflow.com/questions/55893189/passing-formula-as-parameter-to-lm-within-sapply-causes-error-bug-confirmation)
>
> brought many views but neither answer nor bug confirmation.
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Mon, 29 Apr 2019 21:38:00 +0300
> From: Haddison Mureithi <mureithihaddison using gmail.com>
> To: r-help using r-project.org
> Subject: [R] (no subject)
> Message-ID:
> <CABVwvn6y_M2M1o41HryKYp=
> LQcbsajdtginyw_RPVf81o4BmqQ using mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello guys this problem was never answered and I happened to come across
> the same problem , kindly help. This is a simple R program that I have been
> trying to run. I keep running into the "singular matrix" error. I end up
> with no sensible results. Can anyone suggest any changes or a way around
> this?
>
> I am a total rookie when working with R.
>
> Thanks,
> Rasika
>
> > library(survival)
> Loading required package: splines
> > args(coxph)
> function (formula, data, weights, subset, na.action, init, control,
> method = c("efron", "breslow", "exact"), singular.ok = TRUE,
> robust = FALSE, model = FALSE, x = FALSE, y = TRUE, tt, ...)
> NULL
> > test1<-read.table("S:/FISHDO/03_Phase_I_Field_Work/Data_6_28_2011/Working
> Folder/R_files/4SondesJuly24.csv", header=T, sep=",")
> > sondes<-coxph(Surv(Start, Stop, Depart)~DOLoomis + DOI55 + DODamen,
> data=test1)
> Warning messages:
> 1: In fitter(X, Y, strats, offset, init, control, weights = weights, :
> Loglik converged before variable 1,2 ; beta may be infinite.
> 2: In coxph(Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 + DODamen, :
> X matrix deemed to be singular; variable 3
> > summary(sondes)
> Call:
> coxph(formula = Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 +
> DODamen, data = test1)
>
> n= 1737, number of events= 58
> (1 observation deleted due to missingness)
>
> coef exp(coef) se(coef) z Pr(>|z|)
> DOLoomis -2.152e+00 1.163e-01 1.161e+05 0 1
> DOI55 4.560e-01 1.578e+00 3.755e+04 0 1
> DODamen NA NA 0.000e+00 NA NA
>
> exp(coef) exp(-coef) lower .95 upper .95
> DOLoomis 0.1163 8.5995 0 Inf
> DOI55 1.5777 0.6338 0 Inf
> DODamen NA NA NA NA
>
> Concordance= 0.5 (se = 0 )
> Rsquare= 0 (max possible= 0.01 )
> Likelihood ratio test= 0 on 2 df, p=1
> Wald test = 0 on 2 df, p=1
> Score (logrank) test = 0 on 2 df, p=1
>
> [[alternative HTML version deleted]]
>
>
>
>
> ------------------------------
>
> Message: 7
> Date: Tue, 30 Apr 2019 16:50:48 +0000
> From: Bill Poling <Bill.Poling using zelis.com>
> To: "r-help (r-help using r-project.org)" <r-help using r-project.org>
> Subject: [R] Help with loop for column means into new column by a
> subset Factor w/131 levels
> Message-ID:
> <
> BN7PR02MB50737455E93F882B58EAA4F4EA3A0 using BN7PR02MB5073.namprd02.prod.outlook.com
> >
>
> Content-Type: text/plain; charset="windows-1252"
>
> Good afternoon.
>
> #RStudio Version 1.1.456
> sessionInfo()
> #R version 3.5.3 (2019-03-11)
> #Platform: x86_64-w64-mingw32/x64 (64-bit)
> #Running under: Windows >= 8 x64 (build 9200)
>
>
>
> #I have a DF of 8 columns and 14025 rows
>
> str(hcd2tmp2)
>
> # 'data.frame':14025 obs. of 8 variables:
> # $ Submitted_Charge: num 21021 15360 40561 29495 7904 ...
> # $ Allowed_Amt : num 18393 6254 40561 29495 7904 ...
> # $ Submitted_Units : num 60 240 420 45 120 215 215 15 57 2 ...
> # $ Procedure_Code1 : Factor w/ 131 levels "A9606","J0129",..: 43 113 117
> 125 24 85 85 90 86 25 ...
> # $ AllowByLimit : num 4.268 0.949 7.913 6.124 3.524 ...
> # $ UnitsByDose : num 600 240 420 450 120 215 215 750 570 500 ...
> # $ LimitByUnits : num 4310 6591 5126 4816 2243 ...
> # $ HCPCSCodeDose1 : num 10 1 1 10 1 1 1 50 10 250 ...
>
> #I would like to create four additional columns that are the mean of four
> current columns in the DF.
> #Current columns
> #Allowed_Amt
> #LimitByUnits
> #AllowByLimit
> #UnitsByDose
>
> #The goal is to be able to identify rows where (for instance) Allowed_Amt
> is greater than the average (aka outliers).
>
> #The trick Is I want the means of those columns based on a Factor value
> #The Factor is:
> #Procedure_Code1 : Factor w/ 131 levels "A9606","J0129"
>
> #So each of my four new columns will have 131 distinct values based on the
> mean for the specific Procedure_Code1 grouping
>
> #In SQL it would look something like this:
>
> #SELECT *,
> # NewCol1 = mean(Allowed_Amt) OVER (PARTITION BY Procedure_Code1),
> # NewCol2 = mean(LimitByUnits) OVER (PARTITION BY Procedure_Code1),
> # NewCol3 = mean(AllowByLimit) OVER (PARTITION BY Procedure_Code1),
> # NewCol4 = mean(UnitsByDose) OVER (PARTITION BY Procedure_Code1)
> #INTO NewTable
> #FROM Oldtable
>
> #Here are some sample data
>
> head(hcd2tmp2, n=40)
> # Submitted_Charge Allowed_Amt Submitted_Units Procedure_Code1
> AllowByLimit UnitsByDose LimitByUnits HCPCSCodeDose1
> # 1 21020.70 18393.12 60 J1745
> 4.2679810 600 4309.56 10
> # 2 15360.00 6254.40 240 J9299
> 0.9488785 240 6591.36 1
> # 3 40561.32 40561.32 420 J9306
> 7.9133539 420 5125.68 1
> # 4 29495.25 29495.25 45 J9355
> 6.1244417 450 4815.99 10
> # 5 7904.30 7904.30 120 J0897
> 3.5243000 120 2242.80 1
> # 6 15331.95 10614.31 215 J9034
> 2.0586686 215 5155.91 1
> # 7 15331.95 10614.31 215 J9034
> 2.0586686 215 5155.91 1
> # 8 461.90 0.00 15 J9045
> 0.0000000 750 46.38 50
> # 9 27340.96 15092.21 57 J9035
> 3.2600227 570 4629.48 10
> # 10 768.00 576.00 2 J1190
> 1.3617343 500 422.99 250
> # 11 101.00 38.38 5 J2250
> 59.9687500 5 0.64 1
> # 12 17458.40 0.00 200 J9033
> 0.0000000 200 5990.00 1
> # 13 7885.10 7569.70 1 J1745
> 105.3835445 10 71.83 10
> # 14 2015.00 1155.78 4 J2785
> 5.0051100 0 230.92 0
> # 15 443.72 443.72 12 J9045
> 11.9601078 600 37.10 50
> # 16 113750.00 113750.00 600 J2350
> 3.3025003 600 34443.60 1
> # 17 3582.85 3582.85 10 J2469
> 30.5573561 250 117.25 25
> # 18 5152.65 5152.65 50 J2796
> 1.4362988 500 3587.45 10
> # 19 5152.65 5152.65 50 J2796
> 1.4362988 500 3587.45 10
> # 20 39664.09 0.00 74 J9355
> 0.0000000 740 7919.63 10
> # 21 166.71 102.53 9 J9045
> 3.6841538 450 27.83 50
> # 22 13823.61 9676.53 1 J2505
> 2.0785247 6 4655.48 6
> # 23 90954.00 26436.53 360 J1786
> 1.7443775 3600 15155.28 10
> # 24 4800.00 3494.40 800 J3262
> 0.8861838 800 3943.20 1
> # 25 216.00 105.84 4 J0696
> 42.3360000 1000 2.50 250
> # 26 5300.00 4770.00 1 J0178
> 4.9677151 1 960.20 1
> # 27 35203.00 35203.00 200 J9271
> 3.5772498 200 9840.80 1
> # 28 17589.15 17589.15 300 J3380
> 2.9696855 300 5922.90 1
> # 29 18394.64 17842.79 1 J9355
> 166.7238834 10 107.02 10
> # 30 770.00 731.50 10 J2469
> 6.2388060 250 117.25 25
> # 31 461.90 0.00 15 J9045
> 0.0000000 750 46.38 50
> # 32 8160.00 3342.40 80 J1459
> 1.0260818 40000 3257.44 500
> # 33 1653.48 314.16 6 J9305
> 0.7661505 60 410.05 10
> # 34 13036.50 0.00 194 J9034
> 0.0000000 194 4652.31 1
> # 35 10486.87 0.00 156 J9034
> 0.0000000 156 3741.04 1
> # 36 15360.00 6254.40 240 J9299
> 0.9488785 240 6591.36 1
> # 37 1616.83 1616.83 150 J1453
> 5.2528590 150 307.80 1
> # 38 80685.74 34772.43 96 J9035
> 4.4597077 960 7797.02 10
> # 39 85220.58 35925.13 287 J9299
> 4.5577715 287 7882.17 1
> # 40 3860.17 1627.27 13 J9299
> 4.5577963 13 357.03 1
>
>
> #I hope this is enough inforamtion to warrant your support
> #Thank you
> #WHP
>
>
>
> Confidentiality Notice This message is sent from Zelis. ...{{dropped:13}}
>
>
>
>
> ------------------------------
>
> Message: 8
> Date: Tue, 30 Apr 2019 18:45:40 +0000
> From: Bill Poling <Bill.Poling using zelis.com>
> To: "r-help (r-help using r-project.org)" <r-help using r-project.org>
> Subject: Re: [R] Help with loop for column means into new column by a
> subset Factor w/131 levels
> Message-ID:
> <
> BN7PR02MB5073D732498AB265872F5750EA3A0 using BN7PR02MB5073.namprd02.prod.outlook.com
> >
>
> Content-Type: text/plain; charset="windows-1252"
>
> I ran this routine but I was thinking there must be a more elegant way of
> doing this.
>
>
> #
> https://community.rstudio.com/t/how-to-average-mean-variables-in-r-based-on-the-level-of-another-variable-and-save-this-as-a-new-variable/8764/8
>
> hcd2tmp2_summmary <- hcd2tmp2 %>%
> select(.) %>%
> group_by(Procedure_Code1) %>%
> summarize(average = mean(Allowed_Amt))
> # A tibble: 131 x 2
> # Procedure_Code1 average
> # <fct> <dbl>
> # 1 A9606 57785.
> # 2 J0129 5420.
> # 3 J0178 4700.
> # 4 J0180 13392.
> # 5 J0202 56328.
> # 6 J0256 17366.
> # 7 J0257 7563.
> # 8 J0485 2450.
> # 9 J0490 6398.
> # 10 J0585 4492.
> # ... with 121 more rows
>
> hcd2tmp2 <- hcd2tmp %>%
> group_by(Procedure_Code1) %>%
> summarise(Avg_Allowed_Amt = mean(Allowed_Amt))
>
> view(hcd2tmp2)
>
>
> hcd2tmp3 <- hcd2tmp %>%
> group_by(Procedure_Code1) %>%
> summarise(Avg_AllowByLimit = mean(AllowByLimit))
>
> view(hcd2tmp3)
>
>
> hcd2tmp4 <- hcd2tmp %>%
> group_by(Procedure_Code1) %>%
> summarise(Avg_UnitsByDose = mean(UnitsByDose))
>
> view(hcd2tmp4)
>
> hcd2tmp5 <- hcd2tmp %>%
> group_by(Procedure_Code1) %>%
> summarise(Avg_LimitByUnits = mean(LimitByUnits))
>
> view(hcd2tmp5)
>
> #Joins----
>
>
> hcd2tmp <- left_join(hcd2tmp2, hcd2tmp, by =
> c("Procedure_Code1"="Procedure_Code1"))
> hcd2tmp <- left_join(hcd2tmp3, hcd2tmp, by =
> c("Procedure_Code1"="Procedure_Code1"))
> hcd2tmp <- left_join(hcd2tmp4, hcd2tmp, by =
> c("Procedure_Code1"="Procedure_Code1"))
> hcd2tmp <- left_join(hcd2tmp5, hcd2tmp, by =
> c("Procedure_Code1"="Procedure_Code1"))
>
> view(hcd2tmp)
>
> hcd2tmp$Avg_LimitByUnits <- round(hcd2tmp$Avg_LimitByUnits, digits = 2)
> hcd2tmp$Avg_Allowed_Amt <- round(hcd2tmp$Avg_Allowed_Amt, digits = 2)
> hcd2tmp$Avg_AllowByLimit <- round(hcd2tmp$Avg_AllowByLimit, digits = 2)
> hcd2tmp$Avg_UnitsByDose <- round(hcd2tmp$Avg_UnitsByDose, digits = 2)
>
> view(hcd2tmp)
>
> #Over under columns----
> hcd2tmp$AllowByLimitFlag <- hcd2tmp$AllowByLimit > hcd2tmp$Avg_AllowByLimit
> hcd2tmp$LimitByUnitsFlag <- hcd2tmp$LimitByUnits > hcd2tmp$Avg_LimitByUnits
> hcd2tmp$Allowed_AmtFlag <- hcd2tmp$Allowed_Amt > hcd2tmp$Avg_Allowed_Amt
> hcd2tmp$UnitsByDoseFlag <- hcd2tmp$UnitsByDose > hcd2tmp$Avg_UnitsByDose
>
> view(hcd2tmp)
>
>
> -----Original Message-----
> From: Bill Poling
> Sent: Tuesday, April 30, 2019 12:51 PM
> To: r-help (r-help using r-project.org) <r-help using r-project.org>
> Cc: Bill Poling <Bill.Poling using zelis.com>
> Subject: Help with loop for column means into new column by a subset
> Factor w/131 levels
>
> Good afternoon.
>
> #RStudio Version 1.1.456
> sessionInfo()
> #R version 3.5.3 (2019-03-11)
> #Platform: x86_64-w64-mingw32/x64 (64-bit) #Running under: Windows >= 8
> x64 (build 9200)
>
>
>
> #I have a DF of 8 columns and 14025 rows
>
> str(hcd2tmp2)
>
> # 'data.frame':14025 obs. of 8 variables:
> # $ Submitted_Charge: num 21021 15360 40561 29495 7904 ...
> # $ Allowed_Amt : num 18393 6254 40561 29495 7904 ...
> # $ Submitted_Units : num 60 240 420 45 120 215 215 15 57 2 ...
> # $ Procedure_Code1 : Factor w/ 131 levels "A9606","J0129",..: 43 113 117
> 125 24 85 85 90 86 25 ...
> # $ AllowByLimit : num 4.268 0.949 7.913 6.124 3.524 ...
> # $ UnitsByDose : num 600 240 420 450 120 215 215 750 570 500 ...
> # $ LimitByUnits : num 4310 6591 5126 4816 2243 ...
> # $ HCPCSCodeDose1 : num 10 1 1 10 1 1 1 50 10 250 ...
>
> #I would like to create four additional columns that are the mean of four
> current columns in the DF.
> #Current columns
> #Allowed_Amt
> #LimitByUnits
> #AllowByLimit
> #UnitsByDose
>
> #The goal is to be able to identify rows where (for instance) Allowed_Amt
> is greater than the average (aka outliers).
>
> #The trick Is I want the means of those columns based on a Factor value
> #The Factor is:
> #Procedure_Code1 : Factor w/ 131 levels "A9606","J0129"
>
> #So each of my four new columns will have 131 distinct values based on the
> mean for the specific Procedure_Code1 grouping
>
> #In SQL it would look something like this:
>
> #SELECT *,
> # NewCol1 = mean(Allowed_Amt) OVER (PARTITION BY Procedure_Code1),
> # NewCol2 = mean(LimitByUnits) OVER (PARTITION BY Procedure_Code1),
> # NewCol3 = mean(AllowByLimit) OVER (PARTITION BY Procedure_Code1),
> # NewCol4 = mean(UnitsByDose) OVER (PARTITION BY Procedure_Code1)
> #INTO NewTable
> #FROM Oldtable
>
> #Here are some sample data
>
> head(hcd2tmp2, n=40)
> # Submitted_Charge Allowed_Amt Submitted_Units Procedure_Code1
> AllowByLimit UnitsByDose LimitByUnits HCPCSCodeDose1
> # 1 21020.70 18393.12 60 J1745
> 4.2679810 600 4309.56 10
> # 2 15360.00 6254.40 240 J9299
> 0.9488785 240 6591.36 1
> # 3 40561.32 40561.32 420 J9306
> 7.9133539 420 5125.68 1
> # 4 29495.25 29495.25 45 J9355
> 6.1244417 450 4815.99 10
> # 5 7904.30 7904.30 120 J0897
> 3.5243000 120 2242.80 1
> # 6 15331.95 10614.31 215 J9034
> 2.0586686 215 5155.91 1
> # 7 15331.95 10614.31 215 J9034
> 2.0586686 215 5155.91 1
> # 8 461.90 0.00 15 J9045
> 0.0000000 750 46.38 50
> # 9 27340.96 15092.21 57 J9035
> 3.2600227 570 4629.48 10
> # 10 768.00 576.00 2 J1190
> 1.3617343 500 422.99 250
> # 11 101.00 38.38 5 J2250
> 59.9687500 5 0.64 1
> # 12 17458.40 0.00 200 J9033
> 0.0000000 200 5990.00 1
> # 13 7885.10 7569.70 1 J1745
> 105.3835445 10 71.83 10
> # 14 2015.00 1155.78 4 J2785
> 5.0051100 0 230.92 0
> # 15 443.72 443.72 12 J9045
> 11.9601078 600 37.10 50
> # 16 113750.00 113750.00 600 J2350
> 3.3025003 600 34443.60 1
> # 17 3582.85 3582.85 10 J2469
> 30.5573561 250 117.25 25
> # 18 5152.65 5152.65 50 J2796
> 1.4362988 500 3587.45 10
> # 19 5152.65 5152.65 50 J2796
> 1.4362988 500 3587.45 10
> # 20 39664.09 0.00 74 J9355
> 0.0000000 740 7919.63 10
> # 21 166.71 102.53 9 J9045
> 3.6841538 450 27.83 50
> # 22 13823.61 9676.53 1 J2505
> 2.0785247 6 4655.48 6
> # 23 90954.00 26436.53 360 J1786
> 1.7443775 3600 15155.28 10
> # 24 4800.00 3494.40 800 J3262
> 0.8861838 800 3943.20 1
> # 25 216.00 105.84 4 J0696
> 42.3360000 1000 2.50 250
> # 26 5300.00 4770.00 1 J0178
> 4.9677151 1 960.20 1
> # 27 35203.00 35203.00 200 J9271
> 3.5772498 200 9840.80 1
> # 28 17589.15 17589.15 300 J3380
> 2.9696855 300 5922.90 1
> # 29 18394.64 17842.79 1 J9355
> 166.7238834 10 107.02 10
> # 30 770.00 731.50 10 J2469
> 6.2388060 250 117.25 25
> # 31 461.90 0.00 15 J9045
> 0.0000000 750 46.38 50
> # 32 8160.00 3342.40 80 J1459
> 1.0260818 40000 3257.44 500
> # 33 1653.48 314.16 6 J9305
> 0.7661505 60 410.05 10
> # 34 13036.50 0.00 194 J9034
> 0.0000000 194 4652.31 1
> # 35 10486.87 0.00 156 J9034
> 0.0000000 156 3741.04 1
> # 36 15360.00 6254.40 240 J9299
> 0.9488785 240 6591.36 1
> # 37 1616.83 1616.83 150 J1453
> 5.2528590 150 307.80 1
> # 38 80685.74 34772.43 96 J9035
> 4.4597077 960 7797.02 10
> # 39 85220.58 35925.13 287 J9299
> 4.5577715 287 7882.17 1
> # 40 3860.17 1627.27 13 J9299
> 4.5577963 13 357.03 1
>
>
> #I hope this is enough inforamtion to warrant your support
> #Thank you
> #WHP
>
>
>
> Confidentiality Notice This message is sent from Zelis. ...{{dropped:13}}
>
>
>
>
> ------------------------------
>
> Message: 9
> Date: Tue, 30 Apr 2019 15:24:57 -0400
> From: Matthew <mccormack using molbio.mgh.harvard.edu>
> To: "r-help (r-help using r-project.org)" <r-help using r-project.org>
> Subject: [R] transpose and split dataframe
> Message-ID:
> <0d6ac524-4291-ab03-6bcb-592b3996cc74 using molbio.mgh.harvard.edu>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> I have a data frame that is a lot bigger but for simplicity sake we can
> say it looks like this:
>
> Regulator hits
> AT1G69490 AT4G31950,AT5G24110,AT1G26380,AT1G05675
> AT2G55980 AT2G85403,AT4G89223
>
> In other words:
>
> data.frame : 2 obs. of 2 variables
> $Regulator: Factor w/ 2 levels
> $hits : Factor w/ 6 levels
>
> I want to transpose it so that Regulator is now the column headings
> and each of the AGI numbers now separated by commas is a row. So,
> AT1G69490 is now the header of the first column and AT4G31950 is row 1
> of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header of
> column 2 and AT2G85403 is row 1 of column 2, etc.
>
> I have tried playing around with strsplit(TF2list[2:2]) and
> strsplit(as.character(TF2list[2:2]), but I am getting nowhere.
>
> Matthew
>
>
>
>
> ------------------------------
>
> Message: 10
> Date: Tue, 30 Apr 2019 21:04:50 +0000
> From: David L Carlson <dcarlson using tamu.edu>
> To: "r-help using r-project.org" <r-help using r-project.org>, Matthew
> <mccormack using molbio.mgh.harvard.edu>
> Subject: Re: [R] transpose and split dataframe
> Message-ID: <db8cede89a724defb691cea72a25b092 using tamu.edu>
> Content-Type: text/plain; charset="utf-8"
>
> I neglected to copy this to the list:
>
> I think we need more information. Can you give us the structure of the
> data with str(YourDataFrame). Alternatively you could copy a small piece
> into your email message by copying and pasting the results of the following
> code:
>
> dput(head(YourDataFrame))
>
> The data frame you present could not be a data frame since you say "hits"
> is a factor with a variable number of elements. If each value of "hits" was
> a single character string, it would only have 2 factor levels not 6 and
> your efforts to parse the string would make more sense. Transposing to a
> data frame would only be possible if each column was padded with NAs to
> make them equal in length. Since your example tries use the name TF2list,
> it is possible that you do not have a data frame but a list and you have no
> factor levels, just character vectors.
>
> If you are not familiar with R, it may be helpful to tell us what your
> overall goal is rather than an intermediate step. Very likely R can easily
> handle what you want by doing things a different way.
>
> ----------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
>
>
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Matthew
> Sent: Tuesday, April 30, 2019 2:25 PM
> To: r-help (r-help using r-project.org) <r-help using r-project.org>
> Subject: [R] transpose and split dataframe
>
> I have a data frame that is a lot bigger but for simplicity sake we can
> say it looks like this:
>
> Regulator hits
> AT1G69490 AT4G31950,AT5G24110,AT1G26380,AT1G05675
> AT2G55980 AT2G85403,AT4G89223
>
> In other words:
>
> data.frame : 2 obs. of 2 variables
> $Regulator: Factor w/ 2 levels
> $hits : Factor w/ 6 levels
>
> I want to transpose it so that Regulator is now the column headings
> and each of the AGI numbers now separated by commas is a row. So,
> AT1G69490 is now the header of the first column and AT4G31950 is row 1
> of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header of
> column 2 and AT2G85403 is row 1 of column 2, etc.
>
> I have tried playing around with strsplit(TF2list[2:2]) and
> strsplit(as.character(TF2list[2:2]), but I am getting nowhere.
>
> Matthew
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ------------------------------
>
> Message: 11
> Date: Tue, 30 Apr 2019 15:03:09 -0600
> From: David Winsemius <dwinsemius using comcast.net>
> To: Jens Heumann <jens.heumann using students.unibe.ch>
> Cc: r-help using r-project.org
> Subject: Re: [R] Passing formula as parameter to `lm` within `sapply`
> causes error [BUG?]
> Message-ID: <924255D4-912E-4C24-8E85-6E313EC50203 using comcast.net>
> Content-Type: text/plain; charset="utf-8"
>
> Try using do.call
>
> —
> David
>
> Sent from my iPhone
>
> > On Apr 30, 2019, at 9:24 AM, Jens Heumann <
> jens.heumann using students.unibe.ch> wrote:
> >
> > Hi,
> >
> > `lm` won't take formula as a parameter when it is within a `sapply`; see
> example below. Please, could anyone either point me to a syntax error or
> confirm that this might be a bug?
> >
> > Best,
> > Jens
> >
> > [Disclaimer: This is my first post here, following advice of how to
> proceed with possible bugs from here: https://www.r-project.org/bugs.html]
> >
> >
> > SUMMARY
> >
> > While `lm` alone accepts formula parameter `FO` well, the same within a
> `sapply` causes an error. When putting everything as parameter but formula
> `FO`, it's still working, though. All parameters work fine within a similar
> `for` loop.
> >
> >
> > MCVE (see data / R-version at bottom)
> >
> > > summary(lm(y ~ x, df1, df1[["z"]] == 1, df1[["w"]]))$coef[1, ]
> > Estimate Std. Error t value Pr(>|t|)
> > 1.6269038 0.9042738 1.7991275 0.3229600
> > > summary(lm(FO, data, data[[st]] == st1, data[[ws]]))$coef[1, ]
> > Estimate Std. Error t value Pr(>|t|)
> > 1.6269038 0.9042738 1.7991275 0.3229600
> > > sapply(unique(df1$z), function(s)
> > + summary(lm(y ~ x, df1, df1[["z"]] == s, df1[[ws]]))$coef[1, ])
> > [,1] [,2] [,3]
> > Estimate 1.6269038 -0.1404174 -0.010338774
> > Std. Error 0.9042738 0.4577001 1.858138516
> > t value 1.7991275 -0.3067890 -0.005564049
> > Pr(>|t|) 0.3229600 0.8104951 0.996457853
> > > sapply(unique(data[[st]]), function(s)
> > + summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ]) # !!!
> > Error in eval(substitute(subset), data, env) : object 's' not found
> > > sapply(unique(data[[st]]), function(s)
> > + summary(lm(y ~ x, data, data[[st]] == s, data[[ws]]))$coef[1, ])
> > [,1] [,2] [,3]
> > Estimate 1.6269038 -0.1404174 -0.010338774
> > Std. Error 0.9042738 0.4577001 1.858138516
> > t value 1.7991275 -0.3067890 -0.005564049
> > Pr(>|t|) 0.3229600 0.8104951 0.996457853
> > > m <- matrix(NA, 4, length(unique(data[[st]])))
> > > for (s in unique(data[[st]])) {
> > + m[, s] <- summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1,
> ]
> > + }
> > > m
> > [,1] [,2] [,3]
> > [1,] 1.6269038 -0.1404174 -0.010338774
> > [2,] 0.9042738 0.4577001 1.858138516
> > [3,] 1.7991275 -0.3067890 -0.005564049
> > [4,] 0.3229600 0.8104951 0.996457853
> >
> > # DATA #################################################################
> >
> > df1 <- structure(list(x = c(1.37095844714667, -0.564698171396089,
> 0.363128411337339,
> > 0.63286260496104, 0.404268323140999, -0.106124516091484,
> 1.51152199743894,
> > -0.0946590384130976, 2.01842371387704), y = c(1.30824434809425,
> > 0.740171482827397, 2.64977380403845, -0.755998096151299,
> 0.125479556323628,
> > -0.239445852485142, 2.14747239550901, -0.37891195982917,
> -0.638031707027734
> > ), z = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), w = c(0.7, 0.8,
> > 1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)), class = "data.frame", row.names = c(NA,
> > -9L))
> >
> > FO <- y ~ x; data <- df1; st <- "z"; ws <- "w"; st1 <- 1
> >
> > ########################################################################
> >
> > > R.version
> > _
> > platform x86_64-w64-mingw32
> > arch x86_64
> > os mingw32
> > system x86_64, mingw32
> > status
> > major 3
> > minor 6.0
> > year 2019
> > month 04
> > day 26
> > svn rev 76424
> > language R
> > version.string R version 3.6.0 (2019-04-26)
> > nickname Planting of a Tree
> >
> > #########################################################################
> >
> > NOTE: Question on SO two days ago (
> https://stackoverflow.com/questions/55893189/passing-formula-as-parameter-to-lm-within-sapply-causes-error-bug-confirmation)
> brought many views but neither answer nor bug confirmation.
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> ------------------------------
>
> Message: 12
> Date: Tue, 30 Apr 2019 17:31:28 -0400
> From: Matthew <mccormack using molbio.mgh.harvard.edu>
> To: "r-help using r-project.org" <r-help using r-project.org>
> Subject: [R] Fwd: Re: transpose and split dataframe
> Message-ID:
> <e4a9e321-b437-eed6-344b-472319e85fec using molbio.mgh.harvard.edu>
> Content-Type: text/plain; charset="utf-8"
>
> Thanks for your reply. I was trying to simplify it a little, but must
> have got it wrong. Here is the real dataframe, TF2list:
>
> str(TF2list)
> 'data.frame': 152 obs. of 2 variables:
> $ Regulator: Factor w/ 87 levels "AT1G02065","AT1G13960",..: 17 6 6 54
> 54 82 82 82 82 82 ...
> $ hits : Factor w/ 97 levels
> "AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"|
>
> __truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...
>
> And the first few lines resulting from dput(head(TF2list)):
>
> dput(head(TF2list))
> structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
> 82L), .Label = c("AT1G02065", "AT1G13960", "AT1G18860", "AT1G23380",
> "AT1G29280", "AT1G29860", "AT1G30650", "AT1G55600", "AT1G62300",
> "AT1G62990", "AT1G64000", "AT1G66550", "AT1G66560", "AT1G66600",
> "AT1G68150", "AT1G69310", "AT1G69490", "AT1G69810", "AT1G70510", ...
>
> This is another way of looking at the first 4 entries (Regulator is
> tab-separated from hits):
>
> Regulator
> hits
> 1
> AT1G69490
>
> AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830
> 2
> AT1G29860
>
> AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135
>
> 3
> AT1G2986
>
> AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620,AT2G44370,AT4G15975,AT1G35210,AT5G46295,AT1G11925,AT2G39200,AT1G02920,AT4G14370,AT4G35180,AT4G15417,AT2G18690,AT5G11140,AT1G06135,AT5G42830
>
> So, the goal would be to
>
> first: Transpose the existing dataframe so that the factor Regulator
> becomes a column name (column 1 name = AT1G69490, column2 name
> AT1G29860, etc.) and the hits associated with each Regulator become
> rows. Hits is a comma separated 'list' ( I do not not know if
> technically it is an R list.), so it would have to be comma
> 'unseparated' with each entry becoming a row (col 1 row 1 = AT4G31950,
> col 1 row 2 - AT5G24410, etc); like this :
>
> AT1G69490
> AT4G31950
> AT5G24110
> AT1G05675
> AT5G64905
>
> ... I did not include all the rows)
>
> I think it would be best to actually make the first entry a separate
> dataframe ( 1 column with name = AT1G69490 and number of rows depending
> on the number of hits), then make the second column (column name =
> AT1G29860, and number of rows depending on the number of hits) into a
> new dataframe and do a full join of of the two dataframes; continue by
> making the third column (column name = AT1G2986) into a dataframe and
> full join it with the previous; continue for the 152 observations so
> that then end result is a dataframe with 152 columns and number of rows
> depending on the entry with the greatest number of hits. The full joins
> I can do with dplyr, but getting up to that point seems rather difficult.
>
> This would get me what my ultimate goal would be; each Regulator is a
> column name (152 columns) and a given row has either NA or the same hit.
>
> This seems very difficult to me, but I appreciate any attempt.
>
> Matthew
>
> On 4/30/2019 4:34 PM, David L Carlson wrote:
> > External Email - Use Caution
> >
> > I think we need more information. Can you give us the structure of the
> data with str(YourDataFrame). Alternatively you could copy a small piece
> into your email message by copying and pasting the results of the following
> code:
> >
> > dput(head(YourDataFrame))
> >
> > The data frame you present could not be a data frame since you say
> "hits" is a factor with a variable number of elements. If each value of
> "hits" was a single character string, it would only have 2 factor levels
> not 6 and your efforts to parse the string would make more sense.
> Transposing to a data frame would only be possible if each column was
> padded with NAs to make them equal in length. Since your example tries use
> the name TF2list, it is possible that you do not have a data frame but a
> list and you have no factor levels, just character vectors.
> >
> > If you are not familiar with R, it may be helpful to tell us what your
> overall goal is rather than an intermediate step. Very likely R can easily
> handle what you want by doing things a different way.
> >
> > ----------------------------------------
> > David L Carlson
> > Department of Anthropology
> > Texas A&M University
> > College Station, TX 77843-4352
> >
> >
> >
> > -----Original Message-----
> > From: R-help<r-help-bounces using r-project.org> On Behalf Of Matthew
> > Sent: Tuesday, April 30, 2019 2:25 PM
> > To: r-help (r-help using r-project.org)<r-help using r-project.org>
> > Subject: [R] transpose and split dataframe
> >
> > I have a data frame that is a lot bigger but for simplicity sake we can
> > say it looks like this:
> >
> > Regulator hits
> > AT1G69490 AT4G31950,AT5G24110,AT1G26380,AT1G05675
> > AT2G55980 AT2G85403,AT4G89223
> >
> > In other words:
> >
> > data.frame : 2 obs. of 2 variables
> > $Regulator: Factor w/ 2 levels
> > $hits : Factor w/ 6 levels
> >
> > I want to transpose it so that Regulator is now the column headings
> > and each of the AGI numbers now separated by commas is a row. So,
> > AT1G69490 is now the header of the first column and AT4G31950 is row 1
> > of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header of
> > column 2 and AT2G85403 is row 1 of column 2, etc.
> >
> > I have tried playing around with strsplit(TF2list[2:2]) and
> > strsplit(as.character(TF2list[2:2]), but I am getting nowhere.
> >
> > Matthew
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guidehttp://
> www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> [[alternative HTML version deleted]]
>
>
>
>
> ------------------------------
>
> Message: 13
> Date: Wed, 1 May 2019 07:46:32 +1000
> From: Jim Lemon <drjimlemon using gmail.com>
> To: Matthew <mccormack using molbio.mgh.harvard.edu>
> Cc: "r-help (r-help using r-project.org)" <r-help using r-project.org>
> Subject: Re: [R] transpose and split dataframe
> Message-ID:
> <CA+8X3fUjv3APb=
> UcsNQAD61pmOSbvoYBFsW3caZW7p11eD7umg using mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Matthew,
> Is this what you are trying to do?
>
> mmdf<-read.table(text="Regulator hits
> AT1G69490 AT4G31950,AT5G24110,AT1G26380,AT1G05675
> AT2G55980 AT2G85403,AT4G89223",header=TRUE,
> stringsAsFactors=FALSE)
> # split the second column at the commas
> hitsplit<-strsplit(mmdf$hits,",")
> # define a function that will fill with NAs
> NAfill<-function(x,n) return(x[1:n])
> # get the maximum length of hits
> maxlen<-max(unlist(lapply(hitsplit,length)))
> # fill the list with NAs
> hitsplit<-lapply(hitsplit,NAfill,maxlen)
> # change the names of the list
> names(hitsplit)<-mmdf$Regulator
> # convert to a data frame
> tmmdf<-as.data.frame(hitsplit)
>
> Jim
>
> On Wed, May 1, 2019 at 5:25 AM Matthew <mccormack using molbio.mgh.harvard.edu>
> wrote:
> >
> > I have a data frame that is a lot bigger but for simplicity sake we can
> > say it looks like this:
> >
> > Regulator hits
> > AT1G69490 AT4G31950,AT5G24110,AT1G26380,AT1G05675
> > AT2G55980 AT2G85403,AT4G89223
> >
> > In other words:
> >
> > data.frame : 2 obs. of 2 variables
> > $Regulator: Factor w/ 2 levels
> > $hits : Factor w/ 6 levels
> >
> > I want to transpose it so that Regulator is now the column headings
> > and each of the AGI numbers now separated by commas is a row. So,
> > AT1G69490 is now the header of the first column and AT4G31950 is row 1
> > of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header of
> > column 2 and AT2G85403 is row 1 of column 2, etc.
> >
> > I have tried playing around with strsplit(TF2list[2:2]) and
> > strsplit(as.character(TF2list[2:2]), but I am getting nowhere.
> >
> > Matthew
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> ------------------------------
>
> Message: 14
> Date: Wed, 1 May 2019 09:58:34 +1200
> From: Abs Spurdle <spurdle.a using gmail.com>
> To: =?UTF-8?Q?Catarina_Serra_Gon=C3=A7alves?= <catarinasg using gmail.com>
> Cc: r-help <r-help using r-project.org>
> Subject: Re: [R] Time series (trend over time) for irregular sampling
> dates and multiple sites
> Message-ID:
> <
> CAB8pepxHYbCXQPX5CaUQ868kMAp80z+zSXH7LHak+xDabJOjKg using mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> > My data has a few problems: (1) I think I will need to fix the effects of
> > seasonal variation (Monthly) and (2) of possible spatial correlation
> > (probability of finding an item is higher after finding one since they
> can
> > come from the same ship). (3) How do I handle the fact that the
> > measurements were not taken at a regular interval?
>
> Can I ask two questions:
> (1) Is the data autocorrelated (or "Seasonal") over time?
> If not then this problem is a lot simpler.
> (2) Can you expand on the following statement?
> "possible spatial correlation (probability of finding an item is higher
> after finding one since they can come from the same ship"
>
> [[alternative HTML version deleted]]
>
>
>
>
> ------------------------------
>
> Message: 15
> Date: Tue, 30 Apr 2019 22:29:24 +0000
> From: David L Carlson <dcarlson using tamu.edu>
> To: Matthew <mccormack using molbio.mgh.harvard.edu>, "r-help using r-project.org"
> <r-help using r-project.org>
> Subject: Re: [R] Fwd: Re: transpose and split dataframe
> Message-ID: <1d59b3c0584a40c1b322b0efd5de7646 using tamu.edu>
> Content-Type: text/plain; charset="utf-8"
>
> If you read the data frame with read.csv() or one of the other read()
> functions, use the asis=TRUE argument to prevent conversion to factors. If
> not do the conversion first:
>
> # Convert factors to characters
> DataMatrix <- sapply(TF2list, as.character)
> # Split the vector of hits
> DataList <- sapply(DataMatrix[, 2], strsplit, split=",")
> # Use the values in Regulator to name the parts of the list
> names(DataList) <- DataMatrix[,"Regulator"]
>
> # Now create a data frame
> # How long is the longest list of hits?
> mx <- max(sapply(DataList, length))
> # Now add NAs to vectors shorter than mx
> DataList2 <- lapply(DataList, function(x) c(x, rep(NA, mx-length(x))))
> # Finally convert back to a data frame
> TF2list2 <- do.call(data.frame, DataList2)
>
> Try this on a portion of the list, say 25 lines and print each object to
> see what is happening.
>
> ----------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
>
>
>
>
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Matthew
> Sent: Tuesday, April 30, 2019 4:31 PM
> To: r-help using r-project.org
> Subject: [R] Fwd: Re: transpose and split dataframe
>
> Thanks for your reply. I was trying to simplify it a little, but must
> have got it wrong. Here is the real dataframe, TF2list:
>
> str(TF2list)
> 'data.frame': 152 obs. of 2 variables:
> $ Regulator: Factor w/ 87 levels "AT1G02065","AT1G13960",..: 17 6 6 54
> 54 82 82 82 82 82 ...
> $ hits : Factor w/ 97 levels
> "AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"|
>
> __truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...
>
> And the first few lines resulting from dput(head(TF2list)):
>
> dput(head(TF2list))
> structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list