[R] one way to write scripts in R

Gabor Grothendieck ggrothendieck at gmail.com
Mon Mar 29 18:19:59 CEST 2010


Thanks.

You might want to repost it as a text attachment since many of the
lines wrapped around.

Another more permanent possibility would be to put it on the R wiki at
http://rwiki.sciviews.org/doku.php

Note that the gsubfn package has a facility for quasi-perl type string
interpolation as well. Just preface any function with fn$ and the
facility is applied to the arguments of the function (subject to
certain heuristics which determine which args to apply it to).

> library(gsubfn)
> today <- format(Sys.Date())
> show <- list()
> show$syntax <- 43
> Sys.setenv(AN_ENV_VAR="greetings (I'm an env var!)")
>
> fn$cat('Getting `Sys.getenv("AN_ENV_VAR")` from the environment, on $today,
+ `show$syntax` is also possible.\n')
Getting greetings (I'm an env var!) from the environment, on 2010-03-29,
43 is also possible.



On Mon, Mar 29, 2010 at 11:41 AM, Jason E. Aten <j.e.aten at gmail.com> wrote:
> Dear R users,
>
> A colleague of mine asked me how to write a script (an executable text file
> containing R code) in R. After I showed
> him, he said that after extensive searching of the R archives, he had not
> found anything like these techniques.
>
> He suggested that I share these methods to enable others to leverage R as a
> better alternative to bash/perl scripts.
>
> So in the interest of giving back to the R community, and with all humility,
> I offer the
> following small demonstration of one method for creating scripts of R code
> that are
> executable from the (at least Linux) command line.
>
> I don't make any warrantees that this will work for you, but if it helps
> somebody at least
> get starting utilizing R effectively in scripts, then great!
>
> Best regards,
>
> Jason
>
> --
> Jason E. Aten, Ph.D.
>
>
> # file: scriptdemo.rsh
>
> #!/bin/bash
> exec R --vanilla -q --slave -e "source(file=pipe(\"tail -n +4 $0\"))" --args
> $@
> #debug: exec R --vanilla --verbose -e "source(file=pipe(\"tail -n +4 $0\"))"
> --args $@
> ### The above line starts R and then reads in this script, starting at line
> 4:
> #
> # scriptdemo.rsh : a simple filter script to demonstrate how to write a
> script in R that
> #                reads stdin and utilizes command line argv. Also shows how
> to use ppp() to do
> #                bash scripting like variable substitution, which is really
> just syntactic
> #                sugar. But sugar can be sweet.
> #
> # NB: Only tested on Linux, YMMV, and you may have to adapt to your OS. If
> it breaks, you
> #     get to keep both pieces.
>
> # 1st point of note: notice the exec R invocation above, with the pipe and
> tail combo.
> #   This file becomes the program read into R. If it is set chmod+x then you
> can execute this file.
>
>  pp=function(...) paste(sep="",...)
>  script="scriptdemo.rsh"
>  usage=pp(script,": put help info here")
>
>  argv = commandArgs(trailingOnly=TRUE)
>
>
>  # --help
>  if(any(argv=="--help")) {
>    cat(usage)
>    quit(save="no",status=0)
>  }
>
> # 2nd point of note: this is how to read stdin inside a script:
> #
>
>  # slurp in all the input
>  r=readLines("stdin")
>
>  bad=grep("^#",r) # remove comments
>
>  # write out lines that didn't start with #
>  cat(r[setdiff(1:length(r),bad)],sep="\n")
>
>
> # 3rd point of note: if you want nice bash shell scripting string
> substitution and backticking
> #  you can use my ppp() function. Note it's not well vectorized at the
> moment, so it will expect
> #  variables that are substituted from the environment to be of length 1.
> #  A bit hackish in places (sure the |@|@| and 34HERE43 stuff makes me
> wince), but it gets the job done,
> #  as it's meant as a proof of concept.
>
> ##########################
> # utility functions leading up to final definition of ppp() : shell
> scripting like facilities for R
> #  Skip to the end of this file to see what ppp() does for you.
> ##########################
>
> # delete one trailing whitespace
> chomp=function(x) {
>   n=nchar(x)
>   a=substr(x,n,n)
>   w=which(a==" " | a == "\n" | a=="\t")
>   if (length(w)) {
>     x[w]=substr(x[w],1,n[w]-1)
>   }
>   x
> }
>
> # delete one leading whitespace
> prechomp=function(x) {
>   n=nchar(x)
>   a=substr(x,1,1)
>   w=which(a==" " | a == "\n" | a == "\t")
>   if (length(w)) {
>     x[w]=substr(x[w],2,n[w])
>   }
>   x
> }
>
>
> # eliminate whitespace leading/trailing from a string
> trim=function(x) {
>   y=chomp(x)
>   while(any(y!=x)) {
>     x=y
>     y=chomp(x)
>   }
>
>   y=prechomp(x)
>   while(any(y!=x)) {
>     x=y
>     y=prechomp(x)
>   }
>
>  x
> }
>
> strsplit2=function(x,split,...) {
>    # detect trailing split : and add "" afterwards, so we know if it was
> there.
>    a=strsplit(pp(x,"|@|@|"),split,...)
>    lapply(a,function(x) gsub("|@|@|","",x,fixed=TRUE))
> }
>
> strsplit3=function(x,split,keepsplit=FALSE,...) {
>    if (keepsplit) {
>      repstring="34HERE43"
>      if (length(grep(repstring,x))) { die(repstring, " repstring already
> found. Arg! Aborting")  } # sanity check
>      # note where we want to split, using \\1 backref to keep the original
>      a=gsub(pattern=pp("(",split,")") ,replacement=pp(repstring,"\\1"),x)
>    } else {
>       a=x
>       repstring=split
>    }
>    b=strsplit2(a,repstring,...) # split, keeping the original delimiters
> }
>
> pp=function(...) paste(...,sep="") # pp() must be defined in outermost scope
> for ppp() to work
>
> replacer=function(s,begin.string="${",end.string="}",keepend=FALSE,require.end=TRUE)
> {
>
>     translate.env=function(x) {
>       if (exists(x)) return(x)
>       a=Sys.getenv(x)
>       if (a!="") return(pp("\"",a,"\""))
>       x
>     }
>
>     parts=strsplit2(s,begin.string,fixed=T)[[1]]
>     if (length(parts) < 2 || all(parts=="")) return(s)
>     if (any(trim(parts[-1])=="")) {
>        warning(pp("ppp::replacer(): found begin.string '",begin.string,"'
> in '",s,"' but had empty/blankspace/end of string following it."))
>        return(s)
>     }
>
>     collap=c()
>     collap[1]=parts[1]
>     for (i in 2:length(parts)) {
>       tmp=strsplit3(parts[i],end.string,keepsplit=keepend)[[1]]
>       if (length(tmp)==1) {
>         if (require.end) {
>           warning(pp("ppp::replacer(): could not find end.string
> '",end.string,"' in string '",s,"' and require.end=TRUE, so karping."))
>           collap[(i-1)*2]=parts[i]
>           collap[(i-1)*2+1]=""
>         } else {
>           collap[(i-1)*2]=pp("\",",translate.env(tmp[1]),",\"") # allow
> newline to terminate as well, if end not required
>           collap[(i-1)*2+1]=""
>         }
>       } else {
>         collap[(i-1)*2]=pp("\",",translate.env(tmp[1]),",\"")
>         # collect the rest of parts[i] following tmp[1] and the end.string
> (assumes end.string is only ever length 1)
>
> collap[(i-1)*2+1]=substr(parts[i],nchar(tmp[1])+1+(1-as.numeric(keepend)),nchar(parts[i]))
>       }
>     }
>     text=pp("pp(\"",pp(collap,collapse=""),"\")")
>
>     # sys.frame(-2) is necessary to get definitions from calling function
> before where we were defined.
>     if (sys.nframe() > 1) {
>       ftext=eval(parse(text=text),envir=sys.frame(-2))
>     } else {
>       ftext=eval(parse(text=text))
>     }
>     ftext
> }
>
> pp=function(...) paste(...,sep="") # must be defined in outermost scope for
> ppp() to work
>
> # shell like string interpolation... ppp("fill in ${myvar} here after
> `hostname` is $myvar")
> ppp=function(...) {
>
>  sa=paste(sep="",...)
>  res=c()
>  for (j in 1:length(sa)) {
>     s=sa[j]
>     s2=replacer(s,"${","}")
>     terminators="\t| |\\.|`|\\$|\\{|\\}|\\(|\\)|<|>|\\|"
>     s3=replacer(s2,"$",terminators,keepend=TRUE,require.end=FALSE) #
> !require.end allows end of line termination
>     res[j]=s3
>  }
>
>  do.sys.expecting.output=function(cmd) {
>    got=pp(system(intern=T,cmd),collapse="\n")
>    if (got=="") die("do.sys() on '",cmd,"' returned no output.")
>    got
>  }
>
>  # now check for backtick system call requests as well, *after* variable
> substitution is all finished.
>  bt=grep("`",res)
>  if(length(bt)) {
>     sa=res[bt]
>     for (j in 1:length(sa)) {
>         s=sa[j]
>         parts=strsplit2(s,"`",fixed=T)[[1]]
>         if (length(parts) < 3) { res[bt[j]]=s; next; }
>
>         collap=c()
>         collap[1]=parts[1]
>         for (i in seq(2,length(parts)-1,2)) {
>            cmd=parts[i]
>            collap[i]=do.sys.expecting.output(cmd)
>            collap[i+1]=parts[i+1]
>         }
>         text=pp(collap,collapse="")
>         res[bt[j]]=text
>       } # for j
>     } #end if length(bt)
>  res
> }
>
>
>
> #
> # now demonstrate the use of ppp() in a scripting context:
> #
>
> today="date"
> month=3
> year=2010
>
> show=list()
> show$syntax = 43
>
> Sys.setenv(AN_ENV_VAR="greetings (I'm an env var!)")
>
> demo=ppp("Getting $AN_ENV_VAR from the environment, on `$today`,
> substituting ${show$syntax} in named lists is also possible. `cal $month
> $year| head` ")
>
> cat("here's the demo output\n")
> cat(demo,sep="\n")
>
>
> ## # output of this demo script when run, show how to use stdin and ppp()
> ##
> ##
> ## me at host:~/uns/bin$ cat ~/tmp/test | template2.rsh
> ## not comment 1
> ## not comment 2
> ## not comment 3
> ## not comment 4
> ## not comment 5
> ## here's the demo output
> ## Getting greetings (I'm an env var!) from the environment, on Mon Mar 29
> 10:23:49 CDT 2010, substituting 43 in named lists is also possible.
>  March 2010
> ## Su Mo Tu We Th Fr Sa
> ##     1  2  3  4  5  6
> ##  7  8  9 10 11 12 13
> ## 14 15 16 17 18 19 20
> ## 21 22 23 24 25 26 27
> ## 28 29 30 31
> ##
> ## me at host:~/uns/bin$ cat ~/tmp/test
> ## # comment 1
> ## not comment 1
> ## not comment 2
> ## # comment 2
> ## not comment 3
> ## not comment 4
> ## not comment 5
> ## # comment 3
>



More information about the R-help mailing list