[R] importing data in excel
(Ted Harding)
Ted.Harding at nessie.mcc.ac.uk
Mon Aug 30 11:42:24 CEST 2004
On 30-Aug-04 ronggui wong wrote:
> if under windows ,the command works well,but under linux,
> it does NOT,as linux can keep couples of file in clipboar.
> so i want to know how to do similar thing under linux.
> anyone knows?
Since you're using Linux, maybe the following suggestions will
work for you (it's how I almost always deal with this situation).
This follows on from Thomas Petzoldt's suggestions.
1. Save the spreadsheet out of Excel as a CSV (Comma Separated
Variables) file. There are appropriate options available
in the "save as ... " menu.
Comments. There can be complications if the original .xls file
was badly prepared. You need to ensure that it comes out with
exactly the same number of columns as there are variables, and
exactly the same number of rows as there are rows of data in
the .xls file (plus 1 for the header row of variable names, if
present). There should be (k-1) commas in every line if there
are k columns of data.
a) I have known spreadsheets where the person entering the data
has simply entered nothing in "short rows", i.e. where the
last few variables have "missing values".
b) Likewise, some rows may have had "space" entered beyond the
range of data columns, so some rows are too long.
c) Likewise, some rows below the range of data rows may have
had blank entered in some cells, so Excel sees these as
having content.
For the most part, these complications will be avoided if, as
Thomas suggests, you "mark" the full rectangular area prior to
saving the marked area. Of course, you may already know that
these problems will not arise in your case; but I can assure
you from experience that if you are receiving your data from
someone else then you should watch out for them.
d) Similarly, you may encounter cases where the user has entered
a "." (or similar) in blank/missing cells, though not necessarily
in all, and possibly (as in (b) above) where they should not have.
2. Transfer the resulting file (say "mydata.csv") from the Windows
machine to the Linux machine.
3. Run "dos2unix" on this file:
dos2unix mydata.csv
This ensures in particular that the end-of-line terminator is
a clean unix-compatible LF ("\n") and not the Windows pair CRLF
("\r\n") which can cause complications; and also will strip any
DOS-style "filler bytes" from the end of the file. It also copes
with files originating on MAC systems (where the EOL is "\r").
4. If the resulting file "mydata.csv" is now nice and clean (which
normally it should be, unless things like (a-d) above apply),
then you should be able to bring it into R as a dataframe by
simply giving the command
mydata.df <- read.csv("mydata.csv")
or similar.
5. However, if it has fallen victim to the sort of complication
described above, then you will need to clean it up before use.
Unix/Linux happily comes provided with plenty of file utilities
which assist with such clean-up. These include the programs
'sed', 'awk', 'grep', 'sort' and the like, and for the most part
I can get such jobs done just using 'sed' and 'awk'.
If you find yourself in this kind of situation and need further
advice, get in touch with me off-list.
Good luck,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 30-Aug-04 Time: 10:42:23
------------------------------ XFMail ------------------------------
More information about the R-help
mailing list