[R] Documenting data

Pito Salas pitosalas at brandeis.edu
Thu Jun 30 17:46:39 CEST 2016


Thanks to you both. I think you’re saying/implying that once I “test drive” a particular bit of cleaning I should capture it in a function which does it reproducibly against the raw data, and that becomes the best documentation for it. That makes sense.
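
As a minimal sketch of that idea (the data, the column names, and the function name clean_survey are all made up for illustration, and this is only one way to do it), the cleaning steps live in one function that is always run against the untouched raw data:

```r
# Hypothetical raw data; in practice this would be read from the raw file,
# which is never edited by hand.
raw <- data.frame(
  id     = c(1, 2, 3, 4),
  height = c("170", "165", "n/a", "180"),  # centimetres, stored as text
  weight = c(65, 72, 80, NA),              # kilograms
  stringsAsFactors = FALSE
)

# clean_survey() is an illustrative name, not an existing function.
# Each step is commented with what it does and why, so the function
# doubles as the documentation of the derived data.
clean_survey <- function(raw) {
  cleaned <- raw

  # "n/a" marks a missing height in the raw data; turn it into NA,
  # then convert the column from text to numeric.
  cleaned$height[cleaned$height == "n/a"] <- NA
  cleaned$height <- as.numeric(cleaned$height)

  # Derived column: body mass index = weight (kg) / height (m)^2.
  cleaned$bmi <- cleaned$weight / (cleaned$height / 100)^2

  # Attach a short description to the derived column itself.
  comment(cleaned$bmi) <- "BMI = weight_kg / (height_cm / 100)^2"

  cleaned
}

cleaned <- clean_survey(raw)
```

Rerunning clean_survey(raw) always rebuilds the derived data from scratch, and comment(cleaned$bmi) retrieves the note attached to the new column.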

Pito Salas
Brandeis Computer Science
Feldberg 131

> On Jun 30, 2016, at 11:44 AM, Robert Baer <rbaer at atsu.edu> wrote:
> 
> You might look at:
> 
> http://stackoverflow.com/questions/7979609/automatic-documentation-of-datasets
> 
> You might also try File | Compile Notebook from within RStudio (https://www.rstudio.com/) on your well-documented R scripts to get a nice, reproducible record/report of your data analysis workflow. Similar functionality is available from base R, but involves more work. There are many other approaches, but the best choice depends on your precise needs.
> 
> And, as a programmer, you are probably already familiar with things like:
> https://google.github.io/styleguide/Rguide.xml
> 
> 
> 
> On 6/30/2016 9:51 AM, Pito Salas wrote:
>> I am studying statistics and using R in doing it. I come from software development where we document everything we do.
>> 
>> As I “massage” my data (adding columns to a data frame, computing on other data, perhaps cleaning), I feel the need to document in detail the meaning, background, and calculations behind the data. After all, it is now derived from my raw data (which may have been well documented), but it is “new.”
>> 
>> Is this a real problem? Is there a “best practice” to address this?
>> 
>> Thanks!
>> 
>> Pito Salas
>> Brandeis Computer Science
>> Feldberg 131
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 


