[BioC] PWMmatch: position weight matrix or position frequency matrix?

Hervé Pagès hpages at fhcrc.org
Thu Feb 24 21:42:46 CET 2011


3 little additions... (see below)

On 02/24/2011 12:14 PM, Hervé Pagès wrote:
> Hi Zuzanna,
>
> On 02/22/2011 10:59 AM, Hervé Pagès wrote:
> [...]
>> Finally note that the Biostrings package doesn't provide a tool
>> to convert a position frequency matrix (that can be obtained with
>> consensusMatrix) into a position weight matrix.
>
> More on this, and to clarify the role of the PWM() function mentioned
> by Val:
>
> PWM() can be used on a set of short sequences to compute the associated
> Position Weight Matrix using Wasserman & Sandelin's approach.
> As its name suggests, PWM() will always return a PWM, not a PFM.
> The 'type' argument controls the type of Position Weight Matrix that
> is returned.
> The 'prior.params' argument controls the Dirichlet conjugate prior.
> By this argument is set to c(A=0.25, C=0.25, G=0.25, T=0.25).
   ^^^
   by default
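For reference, here is a minimal base-R sketch of the Dirichlet-posterior step. It assumes the usual Wasserman & Sandelin formulation: each base's pseudocount (taken from 'prior.params') is added to its count, each column is divided by the number of sequences plus the sum of the pseudocounts, and type="log2probratio" takes log2 of these probabilities over the background probabilities prior.params / sum(prior.params). The helper pfm_to_pwm() is hypothetical (not part of Biostrings), and the real PWM() may post-process its result in ways this sketch doesn't show.

```r
## Hypothetical helper, NOT the Biostrings API: a sketch of Dirichlet-posterior
## weights computed from a PFM (rows A/C/G/T, one column per position).
pfm_to_pwm <- function(pfm, type = c("log2probratio", "prob"),
                       prior.params = c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)) {
  type <- match.arg(type)
  stopifnot(identical(rownames(pfm), names(prior.params)))
  ## Posterior probability of base b at position j:
  ##   (count[b, j] + prior[b]) / (column total + sum(prior))
  ## 'pfm + prior.params' recycles the prior down each column (one value per row).
  post <- sweep(pfm + prior.params, 2, colSums(pfm) + sum(prior.params), "/")
  if (type == "prob")
    return(post)
  ## Background probabilities implied by the prior.
  background <- prior.params / sum(prior.params)
  log2(post / background)  # row b is divided by background[b]
}

## Counts for the 5 sequences used later in this thread.
pfm <- matrix(c(4, 2, 1, 1,
                1, 1, 1, 1,
                0, 1, 1, 2,
                0, 1, 2, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

pfm_to_pwm(pfm, type = "prob")
pfm_to_pwm(pfm)
```

Each "prob" column sums to 1. Changing one entry of 'prior.params' changes the numerator for that base and also every column's denominator.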

>
> In the example given by Val, PWM(sset, type="prob") returns a PWM
> that is just the PFM divided by a constant (this constant being
> the number of short sequences in the input).

It's not true that this constant is the number of short sequences in
the input. However, it doesn't matter what this constant is (as long as
it's positive).
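A quick way to convince yourself that the value of the constant doesn't matter: a window's score is the sum of the weights picked by its bases, one per column, so multiplying every weight by a positive constant k multiplies every window's score by k. The best-scoring windows stay the same, and a cutoff expressed as a fraction of the maximum possible score selects the same windows. The base-R sketch below assumes that simple additive scoring model. window_scores() is a hypothetical helper, not matchPWM() itself, and the subject string is made up.

```r
## Hypothetical helper (not matchPWM itself): score every width-w window of
## 'subject' by summing, over positions, the weight of the base found there.
window_scores <- function(pwm, subject) {
  bases <- strsplit(subject, "")[[1]]
  w <- ncol(pwm)
  starts <- seq_len(length(bases) - w + 1)
  vapply(starts, function(s) {
    rows <- match(bases[s:(s + w - 1)], rownames(pwm))
    sum(pwm[cbind(rows, seq_len(w))])  # one weight per column
  }, numeric(1))
}

pfm <- matrix(c(4, 2, 1, 1,
                1, 1, 1, 1,
                0, 1, 1, 2,
                0, 1, 2, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))
subject <- "CCAAGTTATGCAACG"  # made-up subject sequence
k <- 0.25                     # any positive constant works

s1 <- window_scores(pfm, subject)
s2 <- window_scores(pfm * k, subject)

## Scores are scaled by k, so their order and score/max ratios are unchanged.
max1 <- sum(apply(pfm, 2, max))
max2 <- sum(apply(pfm * k, 2, max))
all.equal(s2, s1 * k)
identical(order(s1), order(s2))
all.equal(s1 / max1, s2 / max2)
```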

> So, in that particular
> case, matchPWM() will give the same result whether you pass it the
> PFM or the PWM obtained with PWM(sset, type="prob"). (Multiplying
> the PWM by a constant doesn't affect the output of matchPWM().)
>
> But this is only a particular situation. It's not true in general
> that PWM( , type="prob") will return a PWM that is just the
> PFM divided by a constant. For example, it would no longer be the case
> if you were using a 'prior.params' vector whose values are not all
> the same.
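Here is why unequal values in 'prior.params' matter, assuming the usual Dirichlet-posterior formula (count + prior[b]) / (n + sum(prior)), where n is the same for every column. A window's score is then (its PFM score + the sum of the pseudocounts of its bases) / (n + sum(prior)). With identical pseudocounts the second term is the same for every window, so windows rank exactly as they do under the PFM. With unequal pseudocounts that term depends on base composition and can reorder windows. The sketch compares two 4-mers under the default prior and under an arbitrarily chosen, hypothetical C-heavy prior. post_probs() and window_score() are hypothetical helpers, not Biostrings functions.

```r
## Hypothetical helpers (not Biostrings functions).
post_probs <- function(pfm, prior) {
  ## (count[b, j] + prior[b]) / (column total + sum(prior))
  sweep(pfm + prior, 2, colSums(pfm) + sum(prior), "/")
}
window_score <- function(pwm, window) {
  bases <- strsplit(window, "")[[1]]
  sum(pwm[cbind(match(bases, rownames(pwm)), seq_along(bases))])
}

pfm <- matrix(c(4, 2, 1, 1,
                1, 1, 1, 1,
                0, 1, 1, 2,
                0, 1, 2, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

equal_prior   <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
unequal_prior <- c(A = 0.25, C = 1,    G = 0.25, T = 0.25)  # arbitrary choice

## Score the same two windows under the raw counts and both priors.
sapply(list(PFM     = pfm,
            equal   = post_probs(pfm, equal_prior),
            unequal = post_probs(pfm, unequal_prior)),
       function(m) c(GTTG = window_score(m, "GTTG"),
                     CCCC = window_score(m, "CCCC")))
```

GTTG outscores CCCC on the raw counts (5 vs 4) and under the equal prior, but the C-heavy prior reverses the order.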

To make this more concrete: just adding 1 sequence to 'sset' makes
things look very different:

 > sset <- DNAStringSet(c("AGTT", "ATGC", "AACG", "AATG", "CCAA"))
 > consensusMatrix(sset)[DNA_BASES, ]
   [,1] [,2] [,3] [,4]
A    4    2    1    1
C    1    1    1    1
G    0    1    1    2
T    0    1    2    1
 > PWM(sset, type="prob")
          [,1]       [,2]       [,3]       [,4]
A  0.46428571 0.17857143 0.03571429 0.03571429
C  0.03571429 0.03571429 0.03571429 0.03571429
G -0.10714286 0.03571429 0.03571429 0.17857143
T -0.10714286 0.03571429 0.17857143 0.03571429

Note the negative weights!
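Where do negative entries come from in something called type="prob"? Working backwards from the numbers above, they are reproduced by the posterior probabilities (count + 0.25) / (5 + 1), followed by a rescaling of the whole matrix so that the lowest possible window score is 0 and the highest is 1. The rescaling subtracts minScore / ncol from every entry and divides by maxScore - minScore, where maxScore and minScore are the sums of the column maxima and minima. This is an inference from the printed output, not a reading of the Biostrings source, so treat it as a hypothesis about what PWM() did in this version. The subtraction is what pushes the small G and T entries of column 1 below zero.

```r
## Base-R sketch (no Biostrings needed). The rescaling step is inferred from
## the printed output, not taken from the Biostrings source.
seqs  <- c("AGTT", "ATGC", "AACG", "AATG", "CCAA")
bases <- c("A", "C", "G", "T")

## PFM: count each base at each position (matches the consensusMatrix() output).
chars <- do.call(rbind, strsplit(seqs, ""))
pfm <- vapply(seq_len(ncol(chars)),
              function(j) tabulate(match(chars[, j], bases), nbins = length(bases)),
              integer(length(bases)))
rownames(pfm) <- bases

## Dirichlet posterior probabilities with the default prior (0.25 per base).
prior <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
post <- sweep(pfm + prior, 2, colSums(pfm) + sum(prior), "/")

## Rescale so the best possible window scores 1 and the worst scores 0.
max_score <- sum(apply(post, 2, max))
min_score <- sum(apply(post, 2, min))
pwm <- (post - min_score / ncol(post)) / (max_score - min_score)

round(pwm, 8)
```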

Cheers,
H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


