[R] Documenting data

Ulrik Stervbo ulrik.stervbo at gmail.com
Thu Jun 30 17:44:51 CEST 2016


Vince Buffalo covers this nicely in his book "Bioinformatics Data
Skills". The original data should stay exactly as it was: treat it as
immutable. Vince then suggests keeping a text file in your data directory
that explains where the data came from, which scripts you used to create
modified versions, when you did this, and so on.
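As a rough illustration of that convention in R, here is a minimal sketch. The directory layout, file names, cleaning step and script name are all hypothetical, not taken from the book. The raw file is written once and made read-only, derived data goes to a separate file, and a plain-text README records how the derived file was produced:

```r
# Sketch: raw data stays immutable, provenance goes in a plain-text README.
# All paths, file names and the cleaning step are hypothetical.
data_dir <- tempfile("data")
dir.create(data_dir, recursive = TRUE)

# Write the "original" data once.
raw_file <- file.path(data_dir, "measurements_raw.csv")
write.csv(data.frame(id = 1:3, value = c(2.5, NA, 4.0)), raw_file,
          row.names = FALSE)

# Make the raw file read-only so it is not overwritten by accident.
Sys.chmod(raw_file, mode = "0444")

# Derived data is written to a new file; the raw file is never modified.
raw <- read.csv(raw_file)
clean <- raw[!is.na(raw$value), ]
clean_file <- file.path(data_dir, "measurements_clean.csv")
write.csv(clean, clean_file, row.names = FALSE)

# Plain-text README: where the data came from and how it was changed.
readme <- c(
  "measurements_raw.csv: original data, do not edit.",
  "measurements_clean.csv: created by clean_measurements.R (hypothetical)",
  paste("  on", format(Sys.Date()), "- rows with missing 'value' dropped."),
  paste("  md5 of raw file:", unname(tools::md5sum(raw_file)))
)
writeLines(readme, file.path(data_dir, "README.txt"))
```

Recording a checksum of the raw file is an optional extra. It lets you check later that the "immutable" data really was not touched.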

I find roxygen comments and knitr extremely useful for keeping track
of what I intend to do and why, because together they let me export all the
reasoning, summary tables and plots to a format I can share with
collaborators who don't care about the R code used to get there.
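One way to combine the two, assuming this matches what is meant here, is knitr's spin(). In spin(), lines starting with #' (roxygen-style comments) become the prose of the report, and everything else becomes R code chunks. The script content below is hypothetical, and the knitr package must be installed. With knit = FALSE, spin() only returns the intermediate R Markdown. Knitting a real script file instead produces the shareable report:

```r
# Sketch: an R script whose #' comments carry the reasoning.
# knitr::spin() turns the comments into prose and the code into chunks.
# Script content and the stated reason for dropping rows are hypothetical.
library(knitr)

script <- c(
  "#' ## Cleaning the measurements",
  "#' Rows with a missing `value` are dropped because the instrument",
  "#' logs NA when a reading fails (hypothetical reason).",
  "raw <- data.frame(id = 1:3, value = c(2.5, NA, 4.0))",
  "clean <- raw[!is.na(raw$value), ]",
  "#' Summary of the cleaned data:",
  "summary(clean$value)"
)

# knit = FALSE: return the generated R Markdown instead of rendering it.
rmd <- spin(text = script, knit = FALSE)
cat(rmd, sep = "\n")
```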

HTH
Ulrik


On Thu, 30 Jun 2016 at 17:30 Pito Salas <pitosalas at brandeis.edu> wrote:

> I am studying statistics and using R in doing it. I come from software
> development where we document everything we do.
>
> As I “massage” my data (adding columns to a data frame, computing on other
> data, perhaps cleaning), I feel the need to document in detail the meaning,
> background, calculations, or whatever else applies to the data. After
> all, it is now derived from my raw data (which may have been well
> documented), but it is “new.”
>
> Is this a real problem? Is there a “best practice” to address this?
>
> Thanks!
>
> Pito Salas
> Brandeis Computer Science
> Feldberg 131
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
