[R] R usage for log analysis

Allen S. Rout asr at ufl.edu
Mon Jun 12 06:44:51 CEST 2006

"Gabriel Diaz" <gabidiaz at gmail.com> writes:

> and what is the correct path to do it?
> I mean, put logs files in a mysql or somehting like that, and then
> make R use that data, using the data from the files directly?

I haven't put anything in a DB yet, so I can't say how much of the
database's own machinery R uses under the covers.

> pre-parse the log files to accomodate them to R?

Probably not; a little familiarity with the reading functions will
obviate most needs to pre-parse.
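
As a rough illustration of why pre-parsing is often unnecessary, here
is a minimal sketch. The log lines and their layout are made up for the
example. By default read.table() splits on whitespace and keeps quoted
strings together, which already covers many log formats.

```r
# Hypothetical whitespace-separated log lines with one quoted field.
lines <- c(
  '10.0.0.1 200 512 "GET /index.html"',
  '10.0.0.2 404 0 "GET /missing"'
)

# Default sep is any whitespace; default quote handling keeps the
# request string in a single column.
hits <- read.table(text = lines, stringsAsFactors = FALSE)

str(hits)
```

For a real file you would pass the file name as the first argument
instead of text =.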

> I need faqs, manuals, books, whatever to learn about this, can anyone
> give some advice?


Don't expect a warm welcome.  This community is like all open-source
communities: sharply focused on its own concerns and expertise.  And,
in an unusual experience for computer types, our core competencies
hold little or no sway here; they don't even give us much of a leg up.
Just wait until you want to do something nutso like produce a business
graphic. :)

I'm working on understanding enough of R packaging and documentation
to begin a 'task view' focused on systems administration, for humble
submission. That might end up being mostly "log analysis"; the term
can describe much of what we do, if it's stretched a bit.  I'm hoping
the task view will attract the teeming masses of sysadmins trapped in
the mire of Gnuplot and friends.

For starters, become familiar with read.table(); with a few
variations it will take care of all the 

while (<>) { @blah = split(/,/); etc. etc. etc. } 

you've been accustomed to doing.  
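
A minimal sketch of the R counterpart to that loop, assuming
comma-separated records. The sample lines are invented for the example.

```r
# Hypothetical comma-separated log records.
raw <- c(
  "2006-06-11,23:59:58,web01,200",
  "2006-06-12,00:00:03,web02,404"
)

# One call does the reading and the splitting; each field becomes a
# column (V1, V2, ...) of a data frame.
recs <- read.table(text = raw, sep = ",", stringsAsFactors = FALSE)

dim(recs)
```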

Name columns;  this makes it easier to think about your data.  
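
One way to do that is the col.names argument of read.table().  A small
sketch; the column names and the sample data are my own choices for
illustration.

```r
# Hypothetical comma-separated log records.
raw <- c(
  "2006-06-11,23:59:58,web01,200",
  "2006-06-12,00:00:03,web02,404"
)

# Name the columns while reading, instead of living with V1..V4.
recs <- read.table(text = raw, sep = ",", stringsAsFactors = FALSE,
                   col.names = c("date", "time", "host", "status"))

# Named columns make selections read like the question being asked.
errors <- recs[recs$status >= 400, ]
```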


Start thinking of your data in generic sets, as opposed to specific
rows.  Situations which required iteration over specific rows in
Perl-land fall neatly to precise assignment in R.  For example, if
you've got records with dates and times and you want to work with time
as a single value, in Perl you'd 

foreach (...) 
{$foo->{pdate} = parsedate($foo->{date}." ".$foo->{time})}

or some such.  In R-land, the iteration is implicit.  Here's a snippet
from something I'm using 


You're really acting on whole columns all at once here, treating each
one as a single logical unit.  This is fantastically more efficient in
terms of your thought processes.  
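
To make the date-and-time case concrete, here is a minimal sketch.  It
is not taken from any production script; the data frame, the column
names, and the timestamp format are assumptions for the example.

```r
# Hypothetical records with separate date and time strings.
recs <- data.frame(
  date = c("2006-06-11", "2006-06-12"),
  time = c("23:59:58", "00:00:03"),
  stringsAsFactors = FALSE
)

# One assignment parses every row; paste() and as.POSIXct() both work
# on whole vectors, so no explicit loop is needed.
recs$pdate <- as.POSIXct(paste(recs$date, recs$time),
                         format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Column-wise arithmetic follows the same pattern, e.g. the gaps
# between consecutive records.
gaps <- diff(recs$pdate)
```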

- Allen S. Rout
