[R] How to do indexing after splitting my data-frame?

Oliver Bandel oliver at first.in-berlin.de
Sat Dec 20 21:35:08 CET 2008


Hello,

After splitting a data frame, I want to access the results.

Maybe the problem is that the factor/index is a string...

...or am I missing some details of how indexing works?

Please take a look and help:

=======================================
> weblog <- read_weblog("web.log")
>
>
> str(weblog)
'data.frame':	2247 obs. of  18 variables:
 $ host      : Factor w/ 77 levels "124.0.210.117",..: 23 44 44 23 46 46 26 26 42 32 ...
 $ lname     : Factor w/ 1 level "-": 1 1 1 1 1 1 1 1 1 1 ...
 $ user      : Factor w/ 1 level "-": 1 1 1 1 1 1 1 1 1 1 ...
 $ date_time : chr  "29/Nov/2008:00:09:52" "29/Nov/2008:01:08:37" "29/Nov/2008:01:08:37" "29/Nov/2008:03:39:45" ...
 $ timezone  : chr  "+0100" "+0100" "+0100" "+0100" ...
 $ status    : int  404 200 304 403 301 200 200 404 304 200 ...
 $ size      : num  307  32   0 314 333 ...
 $ referrer  : Factor w/ 19 levels "-","http://messenger.su/",..: 1 1 1 1 1 1 11 1 1 1 ...
 $ client    : Factor w/ 45 levels "digsby-asynchttp/0.1",..: 30 4 4 30 28 28 20 20 27 41 ...
 $ req_file  : chr  "/software/tools/newfileaction/pftdbns/" "/robots.txt" "/kurama_2007/tn_kurama_fire_festival_hpim4496.jpg" "/software/libraries/mboxlib/mbox.mli.html" ...
 $ req_method: chr  "GET" "GET" "GET" "GET" ...
 $ req_prot  : chr  "HTTP/1.0" "HTTP/1.1" "HTTP/1.1" "HTTP/1.0" ...
 $ date      : chr  "29-Nov-2008" "29-Nov-2008" "29-Nov-2008" "29-Nov-2008" ...
 $ hour      : chr  "00" "01" "01" "03" ...
 $ day       : chr  "29" "29" "29" "29" ...
 $ month     : chr  "Nov" "Nov" "Nov" "Nov" ...
 $ year      : chr  "2008" "2008" "2008" "2008" ...
 $ t_sec     : atomic  1.23e+09 1.23e+09 1.23e+09 1.23e+09 1.23e+09 ...
  ..- attr(*, "tzone")= chr ""
>
>
> weblog_by_date <- split(weblog, weblog$date)
>
> weblog_by_date$"01-Dec-2008"$host
 [1] 74.6.22.164     74.6.22.164     74.6.22.164     67.195.37.169
 [5] 67.195.37.169   74.6.22.164     174.36.196.98   174.36.196.98
 [9] 67.195.37.169   72.30.65.23     72.30.65.23     65.55.210.177
[13] 65.55.210.177   74.6.22.160     74.6.22.160     74.6.22.121
[17] 74.6.22.121     208.80.194.30   66.249.71.141   66.249.71.141
[21] 66.249.71.141   216.34.181.101  216.34.181.101  65.55.210.182
[25] 65.55.210.182   38.99.44.101    217.212.224.183 217.212.224.186
[29] 89.111.176.102  89.111.176.102  66.249.71.141   65.55.210.180
[33] 65.55.210.180   65.55.210.179   65.55.210.179
77 Levels: 124.0.210.117 145.253.3.244 160.91.44.155 ... 94.23.3.220
>
> myindex <- "01-Dec-2008"
>
> weblog_by_date$myindex$host
NULL
> weblog_by_date[myindex]$host
NULL
>

=======================================

How can I reach into these data structures, indexing first by the date
string (held in a variable) and then by column names such as "host"?
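A minimal sketch of why both lookups in the transcript return NULL and what works instead. It uses a small hypothetical stand-in for weblog (the real frame comes from the poster's own read_weblog(), which is not shown, and the toy values are invented). The behaviour it relies on is standard R: $ does not evaluate its right-hand side, [ on a list returns a sub-list, and [[ evaluates a variable and returns the element itself.

```r
# Hypothetical toy stand-in for the poster's weblog data frame;
# the values are made up for illustration only.
weblog <- data.frame(
  host   = c("74.6.22.164", "67.195.37.169", "74.6.22.164", "65.55.210.177"),
  status = c(200L, 404L, 304L, 200L),
  date   = c("29-Nov-2008", "01-Dec-2008", "01-Dec-2008", "29-Nov-2008"),
  stringsAsFactors = FALSE
)

# One data frame per date, named by the date string.
weblog_by_date <- split(weblog, weblog$date)
myindex <- "01-Dec-2008"

# `$` takes its argument literally: this looks for an element named
# "myindex", finds none, and returns NULL.
weblog_by_date$myindex                 # NULL

# `[` keeps the list wrapper: a one-element list whose only element is
# named "01-Dec-2008", so `$host` on it finds nothing and gives NULL.
class(weblog_by_date[myindex])         # "list"
weblog_by_date[myindex]$host           # NULL

# `[[` evaluates myindex and returns the data frame itself,
# so the usual column access works.
weblog_by_date[[myindex]]$host         # "67.195.37.169" "74.6.22.164"
```

The working call in the post, weblog_by_date$"01-Dec-2008"$host, succeeds only because $ accepts a quoted literal name; it never looks up a variable. With the real data, host is a factor, so the [[ version prints the factor values together with its 77 levels.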

In other words: is it possible to use split() in a way that keeps the
original column names ("host", "status" and so on) usable?
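split() on a data frame already keeps the column names: each element of the result is a data frame with the same columns as the original, only fewer rows. A minimal sketch on the same kind of hypothetical toy data (invented values, not the poster's log), showing a lookup where both the date and the column name come from variables, plus two alternatives: splitting a single column, and extracting one column from every group with lapply().

```r
# Hypothetical toy data; values invented for illustration.
weblog <- data.frame(
  host   = c("74.6.22.164", "67.195.37.169", "74.6.22.164", "65.55.210.177"),
  status = c(200L, 404L, 304L, 200L),
  date   = c("29-Nov-2008", "01-Dec-2008", "01-Dec-2008", "29-Nov-2008"),
  stringsAsFactors = FALSE
)
weblog_by_date <- split(weblog, weblog$date)

# Every group keeps the original column names.
names(weblog_by_date[["01-Dec-2008"]])   # "host" "status" "date"

# Date and column name can both be held in variables when `[[` is used.
day <- "01-Dec-2008"
col <- "host"
weblog_by_date[[day]][[col]]             # "67.195.37.169" "74.6.22.164"

# If only one column is needed, split that column directly:
# the result is a list of plain vectors, one per date.
hosts_by_date <- split(weblog$host, weblog$date)
hosts_by_date[[day]]                     # "67.195.37.169" "74.6.22.164"

# Or pull the same column out of every group at once.
hosts_all <- lapply(weblog_by_date, `[[`, "host")
names(hosts_all)                         # "01-Dec-2008" "29-Nov-2008"
```

Splitting just the host column avoids carrying all 18 columns into each group when only one of them is needed; splitting the whole data frame is the better choice when several columns are used per date.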


Ciao,
   Oliver


