[BioC] PAIR files -- feature set table

Johnson, Franklin Theodore franklin.johnson at email.wsu.edu
Thu Jun 13 01:31:32 CEST 2013


Dear Dr. Carvalho,

Recently, we corresponded regarding makePdInfoPackage for a NimbleGen apple microarray.
I was able to merge the NDF design file and the XYS files using PROBE_ID.
As a reminder, this is a custom array, and there are no SIGNAL == NA values for control probes.
It seemed to work: 
> makePdInfoPackage(seed, destDir = getwd())
============================================================================================================================================================
Building annotation package for Nimblegen Expression Array
NDF: GPL11164.ndf
XYS: XYS.txt
============================================================================================================================================================
Parsing file: GPL11164.ndf... OK
Parsing file: XYS.txt... OK
Merging NDF and XYS files... OK
Preparing contents for featureSet table... OK
Preparing contents for bgfeature table... OK
Preparing contents for pmfeature table... OK
Creating package in C:/Users/franklin.johnson.PW50-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/pd.gpl11164 
Inserting 2 rows into table featureSet... OK
Inserting 765524 rows into table pmfeature... OK
Inserting 5075 rows into table bgfeature... OK
Counting rows in bgfeature
Counting rows in featureSet
Counting rows in pmfeature
Creating index idx_bgfsetid on bgfeature... OK
Creating index idx_bgfid on bgfeature... OK
Creating index idx_pmfsetid on pmfeature... OK
Creating index idx_pmfid on pmfeature... OK
Creating index idx_fsfsetid on featureSet... OK
Saving DataFrame object for PM.
Saving DataFrame object for BG.
Done.
Warning message:
In is.na(ndfdata[["SIGNAL"]]) :
is.na() applied to non-(list or vector) of type 'NULL'
>

Despite this warning message, a pdInfo package directory was created in my destination folder. It contains four subdirectories ("data", "inst", "man", "R"), two subdirectories inside "inst" ("extdata" and "Unit Tests"), and two text files in the main directory ("DESCRIPTION" and "NAMESPACE").
Before using oligo, I would like to confirm with you, if possible, that this package can be used with oligo, even though a warning that may not pertain to my custom-designed microarray was printed.

Regards,
Franklin

Great minds discuss ideas. Average minds discuss events. Small minds discuss people. -Eleanor Roosevelt




________________________________________
From: Johnson, Franklin Theodore
Sent: Friday, June 07, 2013 10:39 AM
To: Benilton Carvalho
Cc: bioconductor at r-project.org
Subject: RE: [BioC] PAIR files -- feature set table

Resending to the Bioconductor mailing-list thread:

Dear Dr. Carvalho,
Thanks for the response.
As you suggested, I will look into merging the files using "PROBE_ID".
After reading in the data, I will start with merge(dataset1, dataset2, by = "PROBE_ID").
Best Regards,
Franklin

________________________________________
From: Benilton Carvalho [beniltoncarvalho at gmail.com]
Sent: Thursday, June 06, 2013 8:11 PM
To: Johnson, Franklin Theodore
Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu
Subject: Re: [BioC] PAIR files -- feature set table

You will need to merge the PAIR and the NDF using the PROBE_ID column
as key. This will allow you to get the X/Y coordinates needed to
create the XYS as described in the other messages.
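A minimal sketch of the merge described above, in base R. The helper name `pairToXys`, the assumption that the PAIR signal column is called `PM`, and the COUNT convention are hypothetical choices for illustration, not part of pdInfoBuilder or oligo. It uses `merge()` with `all.x = TRUE` so every NDF feature keeps its X/Y coordinates, and features absent from the PAIR table end up with `SIGNAL = NA`.

```r
# Hypothetical helper (not part of pdInfoBuilder or oligo): attach NDF X/Y
# coordinates to PAIR signal values by joining on PROBE_ID.
pairToXys <- function(pair, ndf, signalCol = "PM") {
  # A many-to-many join would silently duplicate rows, so require unique IDs in the PAIR table
  if (anyDuplicated(pair$PROBE_ID)) {
    stop("PROBE_ID is not unique in the PAIR table; resolve duplicates first")
  }
  coords <- ndf[, c("PROBE_ID", "X", "Y")]
  # signalCol = "PM" is an assumption about the PAIR column name
  sig <- data.frame(PROBE_ID = pair$PROBE_ID, SIGNAL = pair[[signalCol]],
                    stringsAsFactors = FALSE)
  # Left join: every NDF feature is kept; features absent from the PAIR get SIGNAL = NA
  xys <- merge(coords, sig, by = "PROBE_ID", all.x = TRUE)
  # Assumed convention: COUNT = 1 for measured features, NA otherwise
  xys$COUNT <- ifelse(is.na(xys$SIGNAL), NA_integer_, 1L)
  # Keep only the XYS columns, ordered by row (Y) then column (X)
  xys <- xys[order(xys$Y, xys$X), c("X", "Y", "SIGNAL", "COUNT")]
  rownames(xys) <- NULL
  xys
}
```

If a PROBE_ID appears at several X/Y positions in the NDF, each position receives the same signal value; whether that is appropriate depends on what the PAIR file actually contains.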

Regarding annotation, you may need to contact NimbleGen to request
this information directly from them...

benilton

2013/6/6 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
> Dear Dr. Carvalho,
>
> Many thanks for the reply.
>
> Actually, this is what my .ndf file looks like:
>> head(ndf)
>   PROBE_DESIGN_ID   CONTAINER DESIGN_NOTE SELECTION_CRITERIA SEQ_ID
> 1  7552_0343_0009 Duplicate_1
> 2  7552_0345_0009 Duplicate_2
> 3  7552_0347_0009 Duplicate_1
> 4  7552_0349_0009 Duplicate_2
> 5  7552_0351_0009 Duplicate_2
> 6  7552_0353_0009 Duplicate_1
>                                                PROBE_SEQUENCE MISMATCH MATCH_INDEX FEATURE_ID ROW_NUM COL_NUM PROBE_CLASS
> 1  cttgactcttctaagttcaaaggtaactcaagtgaagctgtcagatatgatccttcca        0    64535488   64535488       9     343
> 2 cccaagcattaaaccttactcatatacttataatgcagccatcaagagtttgtgcaagg        0    64799310   64799310       9     345
> 3          agggaggctgaaagagagagtgaatggtccagctgggcataattgctgca        0    64476989   64476989       9     347
> 4          ttgttggtgggggtgttgcccttagtaccccagaccttgaagcagttaaa        0    64862794   64862794       9     349
> 5          gtgtggggccccctttctttaactggaacctttctttgaagcaatttggg        0    64832726   64832726       9     351
> 6          ttgtccaattccaacatgccgagacggcagggattgtgatcgtgttgttc        0    64435686   64435686       9     353
>                       PROBE_ID POSITION DESIGN_ID   X Y
> 1    Contig19819_1_f_28_10_535        0      7552 343 9
> 2 Malus_CN899188_2_f_147_1_755        0      7552 345 9
> 3  Contig20738_8_r_1179_2_1432        0      7552 347 9
> 4 Malus_CN880097_2_r_336_2_536        0      7552 349 9
> 5 Malus_CN918117_2_f_632_1_781        0      7552 351 9
> 6     Contig1991_1_f_71_2_1239        0      7552 353 9
>
> The PAIR files (.532 PAIR files only, as these are one-color arrays) contain only the probe ID and signal, after some text at the top describing the experiment. I can already normalize and analyze the RMA files further with sva, limma, etc. My real issue is that I cannot annotate the probes without the array annotation: there are duplicates in the NDF file that were removed in the RMA .pair files available on NCBI/GEO, so the probes do not match any annotation package I have tried.
> So I tried to go back and start from the raw PAIR files; this custom array really is a "custom" array, without NimbleScan.
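Because the GEO PAIR files described here start with free text before the data table, a reader has to skip that preamble. A minimal sketch, assuming (hypothetically) that the table is tab-delimited and that its header row is the first line containing a `PROBE_ID` field; `readPairTable` is an illustrative name, not an existing function.

```r
# Hypothetical reader (not an oligo/pdInfoBuilder function): skip the free-text
# preamble of a PAIR-style file and read the tab-delimited table that follows.
readPairTable <- function(file, keyCol = "PROBE_ID") {
  lines <- readLines(file)
  # Assumed: the header row is the first line whose tab-separated fields include keyCol
  isHeader <- vapply(strsplit(lines, "\t", fixed = TRUE),
                     function(fields) keyCol %in% fields, logical(1))
  hdr <- which(isHeader)[1]
  if (is.na(hdr)) stop("no header line containing ", keyCol, " in ", file)
  # Skip everything above the header row, then read it as a normal table
  read.delim(file, skip = hdr - 1, stringsAsFactors = FALSE)
}
```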
>
> Cheers,
> Franklin
>
> ________________________________________
> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
> Sent: Wednesday, June 05, 2013 6:42 PM
> To: FRANKLIN JOHNSON [guest]
> Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu; pdInfoBuilder Maintainer
> Subject: Re: [BioC] PAIR files -- feature set table
>
> It's an unfortunate mistake to have the pairFile *argument* in the
> call (not in the slots section, but I see your point). :-( I'll make
> sure that this is fixed.
>
> You need to convert the PAIR files to XYS...
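Once X, Y and SIGNAL are available, the table still has to be written out as a file. This is a hedged sketch in base R. The `writeXys` helper is hypothetical, and the single `#` header line with a `designname=` field is an assumption modelled on NimbleScan XYS output. Whether pdInfoBuilder or oligo require particular header fields is not confirmed here, so compare against the format described in the referenced messages.

```r
# Hypothetical helper: write a data.frame with X, Y, SIGNAL, COUNT columns
# as a tab-delimited XYS-style file.
writeXys <- function(xys, path, designName) {
  stopifnot(all(c("X", "Y", "SIGNAL", "COUNT") %in% names(xys)))
  con <- file(path, open = "w")
  on.exit(close(con))
  # Assumed header line (NimbleScan-style); real files carry more key=value fields
  writeLines(paste0("# designname=", designName), con)
  # Tab-delimited body with a column-name row; missing values written as "NA"
  write.table(xys[, c("X", "Y", "SIGNAL", "COUNT")], con, sep = "\t",
              quote = FALSE, row.names = FALSE, na = "NA")
  invisible(path)
}
```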
>
> Some refs that should help you in the process:
>
> https://stat.ethz.ch/pipermail/bioconductor/2012-January/043186.html
> http://comments.gmane.org/gmane.science.biology.informatics.conductor/27547
>
> b
>
> 2013/6/5 FRANKLIN JOHNSON [guest] <guest at bioconductor.org>:
>>
>> Dear Maintainer,
>>
>> I downloaded the available NimbleGen single-channel 532 PAIR files for a custom-built expression microarray from NCBI/GEO (GPL11164). However, I get an error message when I try to build the annotation package for this platform using pdInfoBuilder.
>>
>> In the pdInfoBuilder Reference Manual (June 5, 2013), the NgsExpressionPDInfoPkgSeed entry lists a pairFile slot; however, showClasses("Ngs..") does not show such a slot, only one for the XYS file. Thus, I changed the .pair file extension to .xys.
>>
>> (ndf <- list.files(getwd(), pattern = "\\.ndf$", full.names = TRUE)) # locate the NDF design file
>> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GPL11164.ndf"
>>
>> (xys <- list.files(getwd(), pattern = "\\.xys$", full.names = TRUE)[1]) # first XYS file
>> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GSM618107_14418002_532.xys"
>>
>> However, this resulted in an error message:
>> seed <- new("NgsExpressionPDInfoPkgSeed", ndfFile = ndf, xysFile = xys, author = "FJ", organism = "Apple", species = "Malus x Domestica cv.GD")
>>
>> makePdInfoPackage(seed, destDir = getwd())
>> ============================================================================================================================================
>> Building annotation package for Nimblegen Expression Array
>> NDF: GPL11164.ndf
>> XYS: GSM618107_14418002_532.xys
>> ============================================================================================================================================
>> Parsing file: GPL11164.ndf... OK
>> Parsing file: GSM618107_14418002_532.xys... OK
>> Merging NDF and XYS files... OK
>> Preparing contents for featureSet table... Error in `[.data.frame`(ndfdata, , colsFS) : undefined columns selected
>> In addition: Warning message:
>> In is.na(ndfdata[["SIGNAL"]]) :
>>   is.na() applied to non-(list or vector) of type 'NULL'
>>
>> The only files available from NCBI/GEO are 24 PAIR files and one NDF file. It seems that .xys has a different layout than .pair; does that mean the .ndf cannot be used to annotate the .pair files? Any suggestions?
>> Hope to hear from you soon.
>> Franklin
>>
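The error and warning above are consistent with the renamed .pair file lacking the X, Y and SIGNAL columns that the NDF/XYS merge expects, since the warning shows `ndfdata[["SIGNAL"]]` was `NULL`. Here is a minimal pre-flight check. It assumes those three column names and '#'-prefixed header lines. `missingXysColumns` is a hypothetical helper, not a pdInfoBuilder function.

```r
# Hypothetical pre-flight check (not part of pdInfoBuilder): report which of the
# columns an XYS-style table is assumed to need are missing from a file.
missingXysColumns <- function(file, required = c("X", "Y", "SIGNAL")) {
  # Lines starting with '#' (e.g. a NimbleScan-style header) are skipped;
  # only the column names and one data row are read
  cols <- names(read.delim(file, comment.char = "#", nrows = 1,
                           check.names = FALSE))
  setdiff(required, cols)
}
```

If a file's preamble is not '#'-prefixed, its first text line is read as the header, so the check will report every required column as missing.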
>>  -- output of sessionInfo():
>>
>>> sessionInfo()
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>>
>> attached base packages:
>>  [1] tcltk     grid      parallel  stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>>  [1] pdInfoBuilder_1.24.0 oligo_1.24.0         oligoClasses_1.22.0  affxparser_1.32.1    RSQLite_0.11.4       DBI_0.2-7
>>  [7] Mfuzz_2.18.0         DynDoc_1.38.0        widgetTools_1.38.0   e1071_1.6-1          class_7.3-7          gplots_2.11.0.1
>> [13] KernSmooth_2.23-10   caTools_1.14         gdata_2.12.0.2       gtools_2.7.1         timecourse_1.32.0    MASS_7.3-26
>> [19] Biobase_2.20.0       BiocGenerics_0.6.0   limma_3.16.5         ggplot2_0.9.3.1      BiocInstaller_1.10.1
>>
>> loaded via a namespace (and not attached):
>>  [1] affyio_1.28.0         Biostrings_2.28.0     bit_1.1-10            bitops_1.0-5          codetools_0.2-8       colorspace_1.2-2
>>  [7] dichromat_2.0-0       digest_0.6.3          ff_2.2-11             foreach_1.4.0         GenomicRanges_1.12.4  gtable_0.1.2
>> [13] IRanges_1.18.1        iterators_1.0.6       labeling_0.1          marray_1.38.0         munsell_0.4           plyr_1.8
>> [19] preprocessCore_1.22.0 proto_0.3-10          RColorBrewer_1.0-5    reshape2_1.2.2        scales_0.2.3          splines_3.0.1
>> [25] stats4_3.0.1          stringr_0.6.2         tkWidgets_1.38.0      tools_3.0.1           zlibbioc_1.6.0
>>>
>>
>>
>> --
>> Sent via the guest posting facility at bioconductor.org.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


