[R] gene name problem in Excel, and an R analogue

John Kane jrkrideau at inbox.com
Wed Sep 14 12:48:41 CEST 2016

I see it (after a lot of peering at the code). It is a nasty problem, but I suspect it is one that would get flagged later in an analysis (well, in most cases).

The Excel problem is serious in another way. Many people use Excel or other spreadsheets as a data entry tool (which I think was the cause of the issue in the gene study) and can lose the data completely if there is no paper backup. In your example, one can run str(), diagnose the problem, and recover (i.e. convert) the data.
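
A minimal sketch of that diagnosis, reusing the two-line file from the quoted message below. It only shows what str() reports. The as.character() line is there to illustrate that converting in place gives the parsed value back as text, not the original spelling, so going back to the file may be the safer way to recover.

```r
# Recreate the first example file from the quoted message
ff1 <- tempfile()
cat(file = ff1, "12345", "1E002", sep = "\n")
xdf1 <- read.fwf(ff1, widths = 5, stringsAsFactors = FALSE)

# Diagnose: str() reports V1 as num, so "1E002" was parsed as a number
str(xdf1)
col_classes <- sapply(xdf1, class)
print(col_classes)

# Converting in place returns the parsed values as text ("12345" "100"),
# not the original "1E002"
as_text <- as.character(xdf1$V1)
print(as_text)
```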

If I have 30,000 rows of data in a spreadsheet, is there any way I can tell whether some of my character data has been converted to numerical dates, and convert it back?
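
One possible heuristic, sketched in R under loud assumptions: the sheet has been exported to text, the converted cells show up as strings such as "2-Sep", and the column is not supposed to contain dates at all. The vector `genes` and its contents are made up for illustration. This only flags suspicious cells; it does not attempt to convert them back.

```r
# Hypothetical column as it might look after export from a spreadsheet
genes <- c("TP53", "2-Sep", "BRCA1", "1-Mar", "DEC1")

# Pattern for day-month strings such as "2-Sep", built from R's month.abb
month_alt <- paste(month.abb, collapse = "|")
pattern <- paste0("^[0-9]{1,2}-(", month_alt, ")$")

# Flag cells that look like dates in a column that should hold names
date_like <- grepl(pattern, genes, ignore.case = TRUE)
flagged_rows <- which(date_like)
print(flagged_rows)
print(genes[date_like])
```

If the export contains date serial numbers or another date format instead, the pattern would need to change accordingly.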

John Kane
Kingston ON Canada

> -----Original Message-----
> From: erich.neuwirth at univie.ac.at
> Sent: Wed, 14 Sep 2016 07:54:44 +0200
> To: r-help at r-project.org
> Subject: [R] gene name problem in Excel, and an R analogue
> Since many people commenting on the gene name problem in Excel
> essentially tell us
> This could never have happened with R
> I want to show you a somewhat related issue:
> ff1 <- tempfile()
> cat(file = ff1, "12345", "1E002", sep = "\n")
> xdf1 <- read.fwf(ff1, widths = 5, stringsAsFactors=FALSE)
> ff2 <- tempfile()
> cat(file = ff2, "12345", "1E002","1A010", sep = "\n")
> xdf2 <- read.fwf(ff2, widths = 5, stringsAsFactors=FALSE)
> In xdf1, the variable is numeric; in xdf2, it is a character variable.
> Of course, in hindsight this makes sense. But the problem is similar to the
> Excel problem, where something which could be a date is interpreted as a
> date.
> A possible solution with my read.fwf problem would be to have a parameter
> forcing variables to be read as strings.
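
A minimal sketch of one way to force strings, assuming the colClasses argument is passed through by read.fwf to read.table (read.fwf forwards its extra arguments there). The file is the same two-line example as above; the name `xdf1c` is only for illustration.

```r
# Same file as the first example in the quoted message
ff1 <- tempfile()
cat(file = ff1, "12345", "1E002", sep = "\n")

# Ask for every column to be read as character, so no type guessing happens
xdf1c <- read.fwf(ff1, widths = 5, colClasses = "character")

str(xdf1c)        # V1 is now chr
print(xdf1c$V1)   # the original spelling "1E002" is kept
```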
