[R] For help in R coding
Bansal, Vikas
vikas.bansal at kcl.ac.uk
Sat Jul 2 18:34:08 CEST 2011
>> Dear all,
>>
>> I am doing a project on variant calling using R. I am working on a
>> pileup file. There are 10 columns in my data frame, and I want to
>> count the number of A, C, G and T in each row of column 9. An example
>> of column 9 is given below:
>>
>> .a,g,,
>> .t,t,,
>> .,c,c,
>> .,a,,,
>> .,t,t,t
>> .c,,g,^!.
>> .g,ggg.^!,
>> .$,,,,,.,
>> a,g,,t,
>> ,,,,,.,^!.
>> ,$,,,,.,.
>>
>> This is a bit confusing for me, as these characters are all in one
>> column. How can we scan each row and print the number of A, C, G
>> and T it contains?
>
> Seems a bit clunky but this does the job (first the data):
>> txt <- " .a,g,,
> + .t,t,,
> + .,c,c,
> + .,a,,,
> + .,t,t,t
> + .c,,g,^!.
> + .g,ggg.^!,
> + .$,,,,,.,
> + a,g,,t,
> + ,,,,,.,^!.
> + ,$,,,,.,."
>
>> txtvec <- readLines(textConnection(txt))
>
> Now the clunky solution. Basically, it subtracts 1 from the count of
> "fragments" that result from splitting on each letter in turn. It could
> be made prettier with a function that did the job.
>
>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
> split="a"), length) , "-", 1)),
> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"),
> length) , "-", 1)),
> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"),
> length) , "-", 1)),
> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"),
> length) , "-", 1)) )
>            A C G T
> .a,g,,     1 0 1 0
> .t,t,,     0 0 0 2
> .,c,c,     0 2 0 0
> .,a,,,     1 0 0 0
> .,t,t,t    0 0 0 2
> .c,,g,^!.  0 1 1 0
> .g,ggg.^!, 0 0 4 0
> .$,,,,,.,  0 0 0 0
> a,g,,t,    1 0 1 1
> ,,,,,.,^!. 0 0 0 0
> ,$,,,,.,.  0 0 0 0
>
> This has the advantage that the input data ends up as row names,
> which was a surprise.
>
> If you wanted to count "A" and "a" as equivalent, then the split
> argument should be "a|A"
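
The "function that did the job" mentioned above might look something like the sketch below. It is not the poster's code: count_matches is a hypothetical helper name, and it counts matches with gregexpr() rather than splitting, using ignore.case = TRUE so that "A" and "a" are treated alike. One difference from the strsplit() approach is worth noting: strsplit() does not return a trailing empty fragment, so a letter at the very end of a string (as in ".,t,t,t") is missed by the fragment count but is counted here.

```r
# Hypothetical helper: number of matches of `pattern` in each element of `x`.
count_matches <- function(x, pattern, ignore.case = FALSE) {
  hits <- gregexpr(pattern, x, ignore.case = ignore.case)
  # gregexpr() reports -1 when there is no match, so count positive positions only.
  vapply(hits, function(h) sum(h > 0L), integer(1))
}

txtvec <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t", ".c,,g,^!.",
            ".g,ggg.^!,", ".$,,,,,.,", "a,g,,t,", ",,,,,.,^!.", ",$,,,,.,.")

# One column per base; upper and lower case are counted together.
bases  <- c(A = "a", C = "c", G = "g", T = "t")
counts <- sapply(bases, function(b) count_matches(txtvec, b, ignore.case = TRUE))
rownames(counts) <- txtvec

print(counts)
```

With this version the row for ".,t,t,t" reports 3 in the T column rather than 2.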
>
>
>> As you mentioned, if I want to count A and a together I should split like this.
But can I also count . and , using:
data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
split=".|,"), length) , "-", 1)),
I tried it, but it is not working. It gives output, but in some places it shows too many . and , and in others it does not count them at all and just shows 0.
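
A likely cause of the odd counts is that the split argument is treated as a regular expression, in which an unescaped "." matches any single character, so split=".|," splits at every character rather than only at dots and commas. The sketch below shows one way to count literal dots and commas instead. It is an illustration under assumptions, not code from the thread: the names dots, commas and either are made up here, and it uses gregexpr() with fixed = TRUE (or a character class such as "[.,]", where the dot is literal) rather than strsplit().

```r
txtvec <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t", ".c,,g,^!.",
            ".g,ggg.^!,", ".$,,,,,.,", "a,g,,t,", ",,,,,.,^!.", ",$,,,,.,.")

# fixed = TRUE makes the pattern a literal string, so "." means a dot.
dots   <- lengths(regmatches(txtvec, gregexpr(".", txtvec, fixed = TRUE)))
commas <- lengths(regmatches(txtvec, gregexpr(",", txtvec, fixed = TRUE)))

# Inside a character class the dot is also literal; this counts both together.
either <- lengths(regmatches(txtvec, gregexpr("[.,]", txtvec)))

print(data.frame(row = txtvec, dots = dots, commas = commas, either = either))
```

For the first row, ".a,g,,", this gives 1 dot and 3 commas.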
>>
>>
>> Thanking you,
>> Warm Regards
>> Vikas Bansal
>> Msc Bioinformatics
>> Kings College London
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT