[R] Writing a summary file in R
David Winsemius
dwinsemius at comcast.net
Thu Jul 28 01:19:44 CEST 2011
On Jul 27, 2011, at 7:02 PM, a217 wrote:
> Hello,
>
> I have an input file:
> http://r.789695.n4.nabble.com/file/n3700031/testOut.txt testOut.txt
>
> where column 1 is chromosome, column 2 is start of region, column 3 is
> end of region, columns 4 and 5 are base position, column 6 is total
> reads, column 7 is methylation data, and column 8 is the strand.
>
>
> I would like a summary output file such as:
> http://r.789695.n4.nabble.com/file/n3700031/out.summary.txt
> out.summary.txt
>
> where column 1 is chromosome, column 2 is start of region, column 3 is
> end of region, column 4 is total reads in general, column 5 is total
> reads >=1, column 6 is (col4/col5) or the percentage, and at the end
> I'd like to list 6 more columns based on summary results from the
> summary() function in R.
>
> The summary() function will be used to analyze all of the methylation
> data (col7 from input) for each region (bounded by col2 and col3).
>
> For example for chr1 100 159 summary() gives:
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  0.0400  0.0425  0.0450  0.0450  0.0475  0.0500
>
> which is simply the methylation data input into summary() only in the
> region of chr1 100 159.
>
> I know how to perform all of the required functions line-by-line, but
> the hard part for me is essentially taking the input data with
> multiple positions in each region and assigning all of the summary
> results to one line identified by the region.
>
> If any of you have any suggestions I would appreciate it.

So essentially you want to drop columns 4:5 and column 8, calculate the
proportion of counts >= 1, and get summary stats within separate
categories of start-of-region. Is that correct?

This is probably a job for aggregate, or for ddply in plyr if I felt
comfortable with it, which in general I don't. Its documentation
through the help pages is not great IMO, but there are those who love
it. And I admit the melt function is a major contributor to human
happiness. Why don't you read up on aggregate, which is a base
function (in the R sense, not in the biological sense)? I will see
what I can come up with in the meantime.
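
For reference, here is a minimal sketch of how the aggregate approach might look. It is not tested against the actual testOut.txt: the toy data frame, its column names (chr, start, end, pos1, pos2, reads, meth, strand) and the output names n_pos, n_cov and prop are all invented for illustration. Only the chr1 100 159 methylation values (0.04 and 0.05) are chosen to reproduce the summary() output quoted above. It also assumes that "total reads in general" means the number of positions in a region and "total reads >=1" means the number of positions with at least one read, with the proportion taken as the latter over the former.

```r
# Toy stand-in for testOut.txt; column names and most values are invented.
dat <- data.frame(
  chr    = c("chr1", "chr1", "chr1", "chr1", "chr1"),
  start  = c(100, 100, 200, 200, 200),
  end    = c(159, 159, 259, 259, 259),
  pos1   = c(110, 120, 210, 220, 230),
  pos2   = c(111, 121, 211, 221, 231),
  reads  = c(25, 20, 0, 10, 8),
  meth   = c(0.04, 0.05, 0.00, 0.10, 0.25),
  strand = c("+", "+", "-", "-", "-"),
  stringsAsFactors = FALSE
)
keys <- c("chr", "start", "end")

# Per-region counts: positions, positions with reads >= 1, and their ratio
# (this reading of the requested columns is an assumption).
counts <- aggregate(reads ~ chr + start + end, data = dat,
                    FUN = function(x) c(n_pos = length(x),
                                        n_cov = sum(x >= 1),
                                        prop  = mean(x >= 1)))

# Per-region summary() of the methylation column.
stats <- aggregate(meth ~ chr + start + end, data = dat, FUN = summary)

# aggregate() returns the multi-value results as matrix columns;
# spread them into ordinary columns before combining.
counts_flat <- data.frame(counts[keys], counts$reads, check.names = FALSE)
stats_flat  <- data.frame(stats[keys], stats$meth, check.names = FALSE)

# One line per region: keys, the three counts, then the six summary columns.
out <- merge(counts_flat, stats_flat, by = keys)
print(out)
```

The result could then be written out with something like write.table(out, "out.summary.txt", sep = "\t", quote = FALSE, row.names = FALSE). The formula interface of aggregate drops rows with NA in the columns it uses, so positions with missing methylation values would need separate handling.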
--
David Winsemius, MD
West Hartford, CT