[BioC] pulling out specific lines from a bam file

Eric Foss [guest] guest at bioconductor.org
Sun Jun 30 02:29:42 CEST 2013


I would like to enter a list of SNPs and a bam file and have Bioconductor return all the lines from the bam file, in human readable form, that contain the SNP in question. Here is my attempt: 

[code]library(GenomicRanges)

genes <- GRanges(seqnames = c("13", "1", "12"), 
				 ranges = IRanges(
				 	start = c(111942495, 114516752, 116434901), 
				 	width = 1), 
				 strand = rep("*", 3), 
				 seqlengths = c("1" = 249250621, "13" = 115169878, "12" = 133851895))

alnFile <- "first_10k_PASRTP_RNA_790790_exome6500SNPtolerant_062913_1_clean_sorted.bam"

aln <- readGappedAlignments(alnFile)

bamGRanges <- GRanges(seqnames = seqnames(aln), 
				 ranges = ranges(aln), 
				 strand = strand(aln), 
				 seqlengths = seqlengths(aln))

answers <- findOverlaps(query = genes, subject = bamGRanges)

final <- aln[subjectHits(answers)]
[/code]

my "final" variable consists of almost what I want, but not quite. It has only some of the fields from the bam file. It doesn't, for example, have the sequence. Can someone point me in the right direction? 

Thank you. 

Eric


 -- output of sessionInfo(): 

R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rsamtools_1.12.3     Biostrings_2.28.0    GenomicRanges_1.12.4 IRanges_1.18.1      
[5] BiocGenerics_0.6.0  

loaded via a namespace (and not attached):
[1] bitops_1.0-5   stats4_3.0.1   tools_3.0.1    zlibbioc_1.6.0

--
Sent via the guest posting facility at bioconductor.org.



More information about the Bioconductor mailing list