[R] Using something like the "by" command, but on rows instead of columns
David Freedman
3.14david at gmail.com
Mon Nov 9 22:52:57 CET 2009
Some variation of the following might be what you want:
df <- data.frame(sex = sample(1:2, 100, replace = TRUE), snp.1 = rnorm(100), snp.15 = runif(100))
df$snp.1[df$snp.1 > 1.0] <- NA  # put some missing values into the data
snp <- grep('^snp', names(df)); snp  # indices of the columns whose names begin with 'snp'
apply(df[, snp], 2, summary)
# or
apply(df[, snp], 2, FUN = function(col) mean(col, na.rm = TRUE))
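The calls above summarize the snp columns themselves. The question quoted below asks for something slightly different: the variance of Phenotype among the plants that have a non-missing genotype at each SNP_ column. A minimal sketch of that, assuming the toy data are in a data frame laid out as in the question (the names plants, snp_cols, pheno_var and pheno_var_by_genotype are made up for illustration):

```r
# Toy data from the question (assumed layout; NA marks a missing genotype)
plants <- data.frame(
  Accession = 1:7,
  SNP_CRY2  = c(NA, "BQ", "BQ", "AQ", "AQ", "AS", "AS"),
  SNP_FLC   = c("A", "A", "A", "B", "B", "B", NA),
  Phenotype = c(0.783143079, 0.881714811, 0.886619488, 0.416893034,
                0.621392903, 0.031719125, 0.652375037),
  stringsAsFactors = FALSE
)

# Names of all genotype columns
snp_cols <- grep("^SNP_", names(plants), value = TRUE)

# For each SNP column: variance of Phenotype over plants genotyped at that SNP
pheno_var <- sapply(plants[snp_cols], function(genotype) {
  var(plants$Phenotype[!is.na(genotype)])
})
pheno_var

# "by"-style variant: variance of Phenotype within each genotype class;
# tapply() drops rows whose genotype is NA
pheno_var_by_genotype <- lapply(plants[snp_cols], function(genotype) {
  tapply(plants$Phenotype, genotype, var)
})
pheno_var_by_genotype
```

Because the missing-value mask is rebuilt inside the function for every column, each gene uses its own set of plants, which is what lets the NA rows differ from one SNP_ column to the next.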
hth,
david
Josh B-3 wrote:
>
> Hello R Forum users,
>
> I was hoping someone could help me with the following problem. Consider
> the following "toy" dataset:
>
> Accession SNP_CRY2 SNP_FLC Phenotype
> 1 NA A 0.783143079
> 2 BQ A 0.881714811
> 3 BQ A 0.886619488
> 4 AQ B 0.416893034
> 5 AQ B 0.621392903
> 6 AS B 0.031719125
> 7 AS NA 0.652375037
>
> "Accession" = individual plants, arbitrarily identified by unique numbers
> "SNP_" = individual genes.
> "SNP_CRY2" = the CRY2 gene. The plants either have the BQ, AQ, or AS
> genotype at the CRY2 gene. "NA" = missing data.
> "SNP_FLC" = the FLC gene. The plants either have the A or B genotype at
> the FLC gene. "NA" = missing data.
> "Phenotype" = a continuous variable of interest.
>
> I have a much larger number of columns corresponding to genes (i.e., more
> columns with the "SNP_" prefix) in my real dataset. For each gene in turn
> (i.e., each "SNP_" column), I would like to find the phenotypic variance
> for all of the plants with non-missing data. Note that the plants with
> missing genotype data ("NA") differ for each gene (each "SNP_" column).
>
> Would one of you be able to offer some specific code that could do this
> operation? Please rest assured that I am not a student trying to elicit
> help with a homework assignment. I am a post-doc with limited R skills,
> working with a large genetic dataset.
>
> Thanks very much in advance to a wonderful online community.
> Sincerely,
> Josh