[BioC] processCGH in snapCGH package

Thu Sep 27 16:18:30 CEST 2007

Quoting Sean Davis <sdavis2 at mail.nih.gov> on Thu 27 Sep 2007 00:13:04 BST:

> jhs1jjm at leeds.ac.uk wrote:
> > Quoting jhs1jjm at leeds.ac.uk on Wed 26 Sep 2007 22:54:01 BST:
> >
> >
> >> Quoting Sean Davis <sdavis2 at mail.nih.gov> on Wed 26 Sep 2007 17:30:18 BST:
> >>
> >>
> >>> jhs1jjm at leeds.ac.uk wrote:
> >>>
> >>>> R 2.5.0 on openSUSE 10.2 x86_64.
> >>>> Hi,
> >>>>
> >>>> I'm using the snapCGH package to analyse 2 x 244k Agilent CGH arrays with
> >>>> the aim of identifying regions of gain/loss.
> >>>> So far I've done the following:
> >>>>
> >>>>
> >>>>> targets <- readTargets ("targets.txt")
> >>>>> RG1 <-read.maimages (targets$File_names, source="agilent")
> >>>>> RG2 <- readPositionalInfo (RG1,source="agilent")
> >>>>> RG2$design <- c(-1,-1)
> >>>>> RG3 <- backgroundCorrect (RG2,method="minimum")
> >>>>> MA1 <- normalizeWithinArrays (RG3,method="median")
> >>>>>
> >>>> then
> >>>>
> >>>>> MA2 <- processCGH(MA1,method.of.averaging=mean,ID="MA1$genes$ProbeName")
> >>>>>
> >>>> Error in order(na.last, decreasing, ...) :
> >>>>         argument 2 is not a vector
> >>>>
> >>>> I've looked at ?processCGH and am following the vignette for the snapCGH
> >>>> package fairly closely. Can anyone help with the error?
> >>>>
> >>> You can't quote variable names like above.  I'm not sure that is going
> >>> to fix the problem, but until the syntax is correct, it will be hard to
> >>> diagnose the issue.
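Sean's point about quoting can be illustrated with a minimal base-R sketch. It assumes, as John's later call ID="ProbeName" suggests, that processCGH treats ID as the name of a column in MA$genes; the toy genes data frame below is hypothetical. Quoting the whole expression produces a literal string that names no column, so the lookup returns NULL, and order() raises an "argument 2 is not a vector" error when its second argument is NULL. This shows one way such an error can arise; it does not confirm what processCGH does internally.

```r
# Hypothetical stand-in for MA1$genes
genes <- data.frame(ProbeName = c("A_1", "A_2", "A_3"),
                    Chr = c(2, 1, 1),
                    stringsAsFactors = FALSE)

# Quoting the whole expression gives a string that matches no column
bad  <- genes[["MA1$genes$ProbeName"]]   # NULL
good <- genes[["ProbeName"]]             # the probe names themselves

# order() fails when one of its arguments is NULL rather than a vector
msg <- tryCatch(order(genes$Chr, bad),
                error = function(e) conditionMessage(e))
print(msg)
```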
> >>>
> >>>
> >>>> Also, I'm unsure which background correction and normalization function
> >>>> to use (I've been informed that non-linear methods are unsuitable). If
> >>>> anyone has any experience of Agilent CGH arrays, could they also tell me
> >>>> whether the default estimates used for the foreground and background
> >>>> intensities in read.maimages are suitable? I'd like to determine the most
> >>>> suitable methods beforehand, as I think the segmentation may take some time
> >>>> on my machine. If it's a case of trial and error then that's fine. Thanks
> >>>> for any input.
> >>>>
> >>> I would use the LogRatio column of the Agilent file without any further
> >>> normalization.  The LogRatio is already background corrected.  The CGH
> >>> algorithms in snapCGH do not depend on the center of the data, so there
> >>> isn't really a need to do any further median centering, etc.  In fact,
> >>> there are probably better methods to center the data, but these use the
> >>> segmented data.
> >>>
> >>> Hope that helps.
> >>>
> >>> Sean
> >>>
> >>>
> >> Hi Sean,
> >>
> >> I'm struggling to import the LogRatio column from the Agilent text files.
> >> I'm using read.delim2, but this is bringing my machine to a standstill and
> >> after 45 mins it hadn't finished. Is the following the same?
> >>
> >>
> >>> RG1 <- read.maimages(targets$File_names,source="agilent")
> >>> RG2 <- readPositionalInfo(RG1,"agilent")
> >>> RG2$design <- c(1,-1)
> >>> RG3 <- backgroundCorrect(RG2,method="none")
> >>> MA1 <- normalizeWithinArrays (RG3,method="none")
> >>> LogRatio <- MA1$M
> >>>
> >> Having just looked at the text file, it doesn't appear to be. I've looked
> >> through the R Data Import/Export guide but haven't found anything yet.
> >>
> >>
>
> You will probably need to read the read.maimages help pretty carefully.
> You will need to specify other columns to read in if you want to read in
> the LogRatio column.  Alternatively, change the red and green foreground
> columns to be rProcessedSignal and gProcessedSignal and then do not do
> background correction, as LogRatio is calculated from these.  You will
> also potentially benefit from looking at the Agilent Feature Extraction
> Reference Manual, which explains the columns in the Agilent files.
>
> http://www.chem.agilent.com/scripts/LiteraturePDF.asp?iWHID=50416
>
> > Additionally Sean I tried:
> >
> >
> >> LogRatio <-log2(RG1$R)-log2(RG1$G)
> >>
> >
> > This gives me different results to the text file?
> >
>
> The LogRatio column is calculated from rProcessedSignal and
> gProcessedSignal in the Agilent file.  These columns are not loaded by
> limma by default.
>
> Hope that helps some.
>
> Sean
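A simplified sketch of why log2(RG1$R) - log2(RG1$G) need not match the LogRatio column: with source="agilent" and no columns argument, limma reads the raw foreground means (rMeanSignal and gMeanSignal) into R and G, whereas LogRatio is computed from the processed signals. Here "processing" is reduced to plain background subtraction on made-up numbers; Agilent's actual Feature Extraction processing involves further steps (such as dye normalization), so this only illustrates that the two ratios are built from different inputs.

```r
# Made-up raw foreground means and background levels for three probes
rMean <- c(1500, 900, 600); rBG <- c(100, 100, 100)
gMean <- c(1200, 900, 900); gBG <- c(200, 200, 200)

# Ratio from raw foreground means (roughly what RG1$R / RG1$G gives)
raw.M <- log2(rMean) - log2(gMean)

# Ratio after a simple background subtraction (a stand-in for processing)
bgc.M <- log2(rMean - rBG) - log2(gMean - gBG)

round(cbind(raw.M, bgc.M), 3)
```

Probe 2 has identical raw foreground means, so its raw ratio is exactly 0, but once the unequal backgrounds are removed its ratio is no longer 0.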

Hi Sean,

I did the following:

#read in the intensity data
> RG1 <- read.maimages(targets$File_names, source="agilent",
+     columns=list(R="rProcessedSignal", G="gProcessedSignal"))

It sounded like there was an alternative in your email, but having looked at the
column explanations in the reference manual I couldn't see one.

# insert the chromosomal position of each clone into the $genes data frame
> RG2 <- readPositionalInfo(RG1,source="agilent")
Warning message:
NAs introduced by coercion

#normalize
> MA1 <- normalizeWithinArrays (RG2,method="none")

This gives the log2 ratio, whereas the Agilent text file uses log10. Is this important?
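The choice of base only rescales the ratios by a constant: log2(x) = log10(x) * log2(10), and log2(10) is about 3.32. A log10 LogRatio can therefore be put on limma's log2 M scale by multiplying by log2(10). The minimal check below uses made-up signal values; any fixed gain/loss thresholds applied later would need to be expressed on the same scale as the data.

```r
# Made-up processed red/green signals for three probes
r <- c(1200, 850, 400)
g <- c(1000, 850, 800)

log10.ratio <- log10(r / g)          # Agilent-style base-10 log ratio
log2.ratio  <- log2(r) - log2(g)     # limma-style M value (base 2)

# Converting base 10 to base 2 is a constant rescaling by log2(10)
converted <- log10.ratio * log2(10)
all.equal(converted, log2.ratio)
```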
Following this, I'm getting the same error with processCGH:

> MA2 <- processCGH(MA1,ID="ProbeName")
Error in order(na.last, decreasing, ...) :
        argument 2 is not a vector

Some of the probes do not have location information, could this be the problem?
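One way to test this hypothesis is to drop probes without location information before calling processCGH. The "NAs introduced by coercion" warning from readPositionalInfo suggests that some positions could not be converted to numbers and were left as NA. The sketch below assumes the chromosome and position end up in columns named Chr and Position of MA1$genes (these names are assumptions; check names(MA1$genes)), and uses a toy data frame and matrix in place of the real MAList. Whether processCGH actually requires this filtering is not confirmed in the thread.

```r
# Toy stand-in for MA1$genes after readPositionalInfo();
# the column names Chr and Position are assumptions
genes <- data.frame(
  ProbeName = c("A_1", "A_2", "DarkCorner", "A_3"),
  Chr       = c(1, 1, NA, 2),
  Position  = c(1500, 3200, NA, 800),
  stringsAsFactors = FALSE
)
# Toy log-ratio matrix with one column per array
M <- matrix(c(0.1, -0.2, 0.0, 0.4,
              0.2, -0.1, 0.0, 0.3), ncol = 2)

# Keep only probes that have both a chromosome and a position
keep <- !is.na(genes$Chr) & !is.na(genes$Position)
genes.clean <- genes[keep, ]
M.clean <- M[keep, , drop = FALSE]

nrow(genes.clean)
```

On a real limma MAList the same logical vector can subset all components at once, e.g. MA1.clean <- MA1[keep, ], before passing the result to processCGH.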

Thanks again
John