[BioC] Data formats

Vincent Carey 525-2265 stvjc at channing.harvard.edu
Wed May 28 08:31:25 MEST 2003


> Is there a list of the data formats accepted by bioconductor modules. I have
> checked the FAQ and can't find this information. I know .cel, .gpr (genepix)
> are accepted. What about other formats?
>
> Also if I wish to re-analyse published data, the raw data files are
> frequently not available, if I use read.table(filename, row.names =1, header
> =TRUE), what bioconductor modules are/are not available.

the documentation for read.marrayRaw (in package marrayInput)
indicates that it can handle .xls, .spot, .gpr ... files.

but your second question is more telling.  there is no a priori
restriction on module use based on data source.
procedures that are based on the exprSet class can be used provided
you populate an exprSet with the information from your raw data file.
typically this involves setting the exprs slot to hold the matrix
of expression values (rows are genes, columns are samples), and
setting the phenoData slot to hold information about the
samples/design (that is, about the columns of the expression
matrix).  edd is an example of a module that requires an exprSet.
some of the affy procedures also work from exprSets.
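
as a rough sketch of the steps just described, here is how a table read
with read.table might be turned into the two pieces an exprSet needs.
the file contents, the gene and sample names, and the "group" variable
are all made up for illustration. the final exprSet construction is left
commented out because it assumes the phenoData/exprSet classes of the
Biobase release you have installed; check ?exprSet before relying on it.

```r
# hypothetical stand-in for a published table: the first column holds the
# gene id, the remaining columns are samples. it is written to a temporary
# file only so that this sketch runs on its own.
tab <- data.frame(gene = c("g1", "g2", "g3"),
                  s1 = c(120.5, 30.2, 800.0),
                  s2 = c(110.0, 45.1, 760.3),
                  s3 = c(400.2, 28.9, 150.7),
                  s4 = c(390.8, 33.3, 170.1))
f <- tempfile(fileext = ".txt")
write.table(tab, f, sep = "\t", quote = FALSE, row.names = FALSE)

# the call from the question: the gene ids become the row names
dat <- read.table(f, row.names = 1, header = TRUE, sep = "\t")

# expression matrix: rows are genes, columns are samples
e <- as.matrix(dat)

# sample/design information: one row per column of e (made-up grouping)
pd <- data.frame(group = c("ctl", "ctl", "trt", "trt"),
                 row.names = colnames(e))
stopifnot(identical(rownames(pd), colnames(e)))

# assumption: with a Biobase that still provides exprSet, the two pieces
# would be combined roughly like this (not run here):
# library(Biobase)
# pheno <- new("phenoData", pData = pd,
#              varLabels = list(group = "treatment group"))
# eset  <- new("exprSet", exprs = e, phenoData = pheno)
```

the point of the stopifnot line is that the rows of the sample table must
line up with the columns of the expression matrix before the two are
combined.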

genefilter and geneplotter do not require an exprSet.   you
can use tools in genefilter on a matrix of expression values.
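
for instance, a plain matrix can be filtered directly. the sketch below
uses a made-up 4 x 5 matrix and an arbitrary rule (keep genes with at
least k = 3 samples above A = 100). the rule is written first in base R;
the genefilter version (filterfun with kOverA) only runs if the package
is installed, and is there to show the matrix being passed in as-is.

```r
# made-up expression matrix: 4 genes (rows) by 5 samples (columns)
e <- matrix(c(250, 300, 280, 310, 260,
               20,  35,  15,  40,  25,
              150,  90, 210,  80, 120,
               60, 500,  70, 450,  55),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("g", 1:4), paste0("s", 1:5)))

# arbitrary thresholds chosen for illustration
k <- 3
A <- 100

# base R: keep genes with at least k samples above A
keep <- rowSums(e > A) >= k
e.filt <- e[keep, , drop = FALSE]

# the same rule via genefilter, applied straight to the matrix
if (requireNamespace("genefilter", quietly = TRUE)) {
  ff <- genefilter::filterfun(genefilter::kOverA(k, A))
  keep2 <- genefilter::genefilter(e, ff)
  stopifnot(all(keep2 == keep))
}
```

with these numbers g1 and g3 pass the filter and g2 and g4 are dropped.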

the annotation modules are usable in many different contexts.

perhaps a good way to think about this is: if Bioconductor does not
explicitly provide a way to handle a certain data resource, you have
the full power of R and omegahat (www.omegahat.org, which deals with
many intersystem interfaces) to transform that data resource into
something immediately amenable to analysis with tools in Bioconductor.
if there's a specific data resource you are having trouble with, give
us the details and a path may be suggested or coded.


