[R] label storage and conversions: DBMS and R

Frank E Harrell Jr fharrell at virginia.edu
Sun Feb 9 23:16:07 CET 2003

On Sun, 9 Feb 2003 16:29:51 EST
TyagiAnupam at aol.com wrote:

> Hi R users, 
> I am new to using a DBMS with R for large datasets. Thanks to all who responded 
> with useful suggestions to my earlier postings about using large datasets and 
> a DBMS with R. I am writing to get some help on how to design good tables in 
> a DBMS to take full advantage of the wonderful built-in facilities in R, like 
> labels.
> I am using the RMySQL client. Because R makes good use of variable and value 
> labels and data (column) types, I would like to create tables with an 
> appropriate design in terms of:
> (1) the datatype (char, varchar, int, etc.) in the DBMS, such that it corresponds 
> to the appropriate datatype in R (factor, numeric, etc.) when converted;
> (2) how best to store variable and value labels and formats in the DBMS, so they 
> are correctly included in the data.frame that DBMS clients like RMySQL create 
> for use in R. 
> If I had only a few variables and values this would not be a problem; I could 
> use meaningful variable names or create labels directly in R. But with 1600 
> variables, many with about 10 categorical values, this approach does not look 
> promising. Is there a document somewhere that addresses this issue? What 
> would be a good way to solve this problem?
> Anupam.
> *********************************************************
> Prediction is very difficult, especially about the future. 
>                   -- Niels Bohr

We are working on a PostgreSQL-based system in which all metadata are defined in XML.  Ultimately I will interpret the XML metadata in R to fetch the variable labels.  In the Hmisc library I have a label function that makes it easy to assign a 'label' attribute to an individual variable, and a function upData that makes it easy to assign many labels at once.  I will use the same 'label' attribute when fetching labels from XML.
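
As a rough illustration of the two Hmisc functions mentioned above, here is a minimal sketch. The data frame and its column names are made up for the example, and it assumes the Hmisc package is installed.

```r
library(Hmisc)

# Hypothetical example data; the column names are invented for illustration.
d <- data.frame(age = c(34, 51, 28), sbp = c(120, 135, 118))

# Assign a 'label' attribute to a single variable.
label(d$age) <- "Age in years"

# Assign labels to one or more variables in a single call.
d <- upData(d, labels = c(sbp = "Systolic blood pressure (mmHg)"))

# Retrieve the stored labels.
label(d$age)
label(d$sbp)
```

With many variables, the named vector passed as labels= would typically be built from a metadata source rather than typed by hand.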

Another possibility is to make a table defining variable-specific metadata.  Then you could just read in the table and write a short function to pull out labels after matching on variable names, assigning the labels to an attribute of your choosing.
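
The metadata-table approach could look something like the sketch below, written in base R only. The table layout (columns varname and label), the function name apply_labels, and the example data are all assumptions made for illustration; in practice meta would be read from the DBMS rather than typed in.

```r
# Hypothetical metadata table, standing in for one read from the DBMS.
meta <- data.frame(
  varname = c("age", "sbp", "sex"),
  label   = c("Age in years", "Systolic blood pressure (mmHg)", "Sex"),
  stringsAsFactors = FALSE
)

# Hypothetical data frame, standing in for one returned by the database client.
d <- data.frame(sbp = c(120, 135), age = c(34, 51), id = 1:2)

# Attach an attribute (default name 'label') to every column of `data`
# whose name has a matching row in `meta`; other columns are left alone.
apply_labels <- function(data, meta, attr_name = "label") {
  idx <- match(names(data), meta$varname)  # NA where no metadata row exists
  for (j in which(!is.na(idx))) {
    attr(data[[j]], attr_name) <- meta$label[idx[j]]
  }
  data
}

d <- apply_labels(d, meta)
attr(d$age, "label")
```

Matching on names rather than on column position means the order of columns in the data and rows in the metadata table need not agree, and variables without metadata simply stay unlabeled.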
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
