[BioC] Getting Introns Expression at a Per Gene Level
Valerie Obenchain
vobencha at fhcrc.org
Wed Jan 30 21:23:46 CET 2013
Hi Fong,
We don't have a single function call to get introns by gene but you can
put one together in a few steps.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
Specifying use.names = TRUE puts the transcript names on the GRangesList.
introns <- intronsByTranscript(txdb, use.names=TRUE)
Unlist and remove duplicate introns.
ulst <- unlist(introns)
intronsNoDups <- ulst[!duplicated(ulst)]
The select() function can map txid to geneid (and others). See ?select.
txnames <- names(intronsNoDups)
map <- select(txdb, keys=txnames, keytype='TXNAME', cols='GENEID')
Not all transcripts are associated with a gene id and some are
associated with more than one.
> head(map, 10)
TXNAME GENEID
1 uc001aaa.3 <NA>
2 uc001aaa.3 <NA>
3 uc010nxq.1 <NA>
4 uc010nxq.1 <NA>
5 uc010nxr.1 <NA>
6 uc010nxr.1 <NA>
7 uc009vjk.2 100133331
8 uc009vjk.2 100133331
9 uc001aau.3 100132287
10 uc021oeh.1 100133331
To get a GRangesList of introns by gene, split the introns on the
associated gene id.
idx <- map$GENEID[!is.na(map$GENEID)]
intronsByGene <- split(intronsNoDups[!is.na(map$GENEID)], idx)
names(intronsByGene) <- unique(idx)
The list names are geneid and the element rownames are txid.
> intronsByGene
GRangesList of length 19784:
$100133331
GRanges with 13 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
uc002qsd.4 chr19 [58858396, 58858718] -
uc002qsd.4 chr19 [58859007, 58861735] -
uc002qsd.4 chr19 [58862018, 58862756] -
...
$100132287
GRanges with 1 range and 0 metadata columns:
seqnames ranges strand
uc003wyw.1 chr8 [18248856, 18257507] +
To count reads against this annotation you can use summarizeOverlaps().
The 'reads' can be Bam files, GappedAlignments or GappedAlignmentPairs.
The 'features' argument will be the GRangesList of introns by gene. The
result will be counts per gene.
To count just introns with no grouping you can still use
summarizeOverlaps() but use the 'intronsNoDups' GRanges object as
'features'.
Valerie
On 01/29/2013 11:38 AM, Fong Chun Chan wrote:
> Hi,
>
> I am using the R package, Genomic Features, to get Intron Expression and
> the function that I am using is the intronsByTranscript(). While this
> function is useful, it returns the number of raw reads that align to each
> intron grouped at the transcript level. Is there an easy way to get it to
> group it by a gene such that I am grabbing all the introns that fall in all
> the transcripts belong to a gene (and without double counting the intronic
> space).
>
> Similar, is there a way to easily get the raw read count at an individual
> intron level rather than having it grouped? The introns() function seems to
> be defunct now as it recommends that we use the intronsByTranscript()
> function.
>
> Thanks,
>
> Fong
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list