[R] For help in R coding

Gabor Grothendieck ggrothendieck at gmail.com
Sat Jul 2 01:11:32 CEST 2011


On Fri, Jul 1, 2011 at 12:47 PM, Bansal, Vikas <vikas.bansal at kcl.ac.uk> wrote:
> Dear all,
>
> I am doing a project on variant calling using R. I am working on a pileup file. There are 10 columns in my data frame, and I want to count the number of A, C, G and T in each row for column 9. An example of column 9 is given below:
>
>            .a,g,,
>            .t,t,,
>            .,c,c,
>            .,a,,,
>            .,t,t,t
>            .c,,g,^!.
>            .g,ggg.^!,
>            .$,,,,,.,
>            a,g,,t,
>            ,,,,,.,^!.
>            ,$,,,,.,.
>
> This is a bit confusing for me, as these characters are all in one column. How can we scan each row and print the number of A, C, G and T for that row?
> Most of the rows have . and , and other symbols, but we will ignore them. I just want to run a loop with a counter that counts the number of A, C, G and T in each row and gives output something like this:
>
>
> A   C   G  T
> 1   0   1  0
> 0   0   0  2
> 0   2   0  0
> 1   0   0  0
> 0   0   0  3
>
> This output is for the first 5 rows of the example given above.
>

Read the lines into L. Then, for each of a, c, g and t, remove every
other character and count the characters that remain in each string:

Lines <- ".a,g,,
.t,t,,
.,c,c,
.,a,,,
.,t,t,t
.c,,g,^!.
.g,ggg.^!,
.$,,,,,.,
a,g,,t,
,,,,,.,^!.
,$,,,,.,."

L <- readLines(textConnection(Lines))

data.frame(a = nchar(gsub("[^a]", "", L)),
	c = nchar(gsub("[^c]", "", L)),
	g = nchar(gsub("[^g]", "", L)),
	t = nchar(gsub("[^t]", "", L))
)
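
The same idea can be wrapped in a small helper so that it also counts
upper-case bases, which the code above ignores. This is only a sketch:
count_base is a hypothetical name, the sample vector is typed in by hand
from the first five rows of the question, and it assumes that every
a/c/g/t letter in the column really denotes a base.

```r
# First five rows of column 9 from the question, typed in by hand.
reads <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t")

# Hypothetical helper: number of times one base occurs in each string,
# ignoring case. Everything except that base is deleted, then the
# remaining characters are counted.
count_base <- function(x, base) {
  pattern <- paste0("[^", tolower(base), toupper(base), "]")
  nchar(gsub(pattern, "", x))
}

bases <- c("A", "C", "G", "T")

# One column per base; lapply over a named vector keeps the column names.
counts <- as.data.frame(lapply(setNames(bases, bases),
                               function(b) count_base(reads, b)))
counts
```

For a real data frame, the ninth column could be passed in place of
reads, for example as.character(DF[[9]]), where DF stands for whatever
the data frame is called.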

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


