[R] File checking problem
    Barry Rowlingson 
    b.rowlingson at lancaster.ac.uk
       
    Thu Mar  5 20:19:32 CET 2009
    
    
  
2009/3/5 ling ling <metal_licaling at live.com>:
>
> Dear all,
>
> I am a newcomer to R programming and have run into a problem:
>
> I have a lot of .txt files in my directory.
>
> First, I check whether the file meets any of these conditions:
> 1. the file is empty
> 2. the "Rep" column of the file has no "useractivity_idle" or no
> "useractivity_act"
> 3. the "Rep" column has both of them, but the number of "useractivity_idle" == the number of "useractivity_act" == 1
> If the file meets one of those conditions, I want to skip it and read the next .txt file.
> I wrote the program as:
>
> name<-list.files(path = ".", pattern = NULL, all.files = FALSE,
>           full.names = FALSE, recursive = FALSE,
>           ignore.case = FALSE)
>
> for(k in 1:length(name)){
>
> log1<-read.table(name[k],header=TRUE,stringsAsFactors=FALSE)
>
> x<-which(log1$Rep=="useractivity_act")
> y<-which(log1$Rep=="useractivity_idle")
>
> while(all(log1$Rep!="useractivity_act")||all(log1$Rep!="useractivity_idle")||(length(x)==1
> && length(y)==1)||(file.info(name[k])$size== 0)){
> k=k+1
> log1<-read.table(name[k],header=TRUE,stringsAsFactors=FALSE)
> }
>
> ........
>
> }
>
> But I always get the following information:
> Error in file(file, "r") : cannot open the connection
> In addition: Warning message:
> In file(file, "r") : cannot open file 'NA': No such file or directory
>
>
> I have been exploring this for a long time; any help would be appreciated. Thanks a lot!
You are trying to read one more file than you have! Simplified, your
code looks like this:
name = list.files(...)
for(k in 1:length(name)){
  log1 = read.table(name[k],....)
  while(something){
    k =k + 1
    log1 = read.table(name[k],...)     # 1
  }
}
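What happens is this: if the while loop is still running when it
reaches the last file, k is incremented past the length of name, so
name[k] is NA and read.table fails at #1 with the "cannot open file
'NA'" error you saw. Note also that changing k inside the body does
not make the for loop skip ahead: R sets k to the next value of
1:length(name) at the start of each iteration.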
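Both points can be checked at the console. A minimal sketch, assuming three made-up file names (no files are actually read):

```r
## Hypothetical file names, for illustration only
name <- c("a.txt", "b.txt", "c.txt")

## Indexing one past the end of a character vector gives NA,
## which is what read.table was being asked to open
pastEnd <- name[length(name) + 1]
print(is.na(pastEnd))

## Assigning to k inside the body does not change which values
## the for loop visits
visited <- integer(0)
for (k in seq_along(name)) {
  visited <- c(visited, k)
  k <- k + 1   # overwritten at the start of the next iteration
}
print(visited)
```

So the k = k + 1 in the inner loop cannot be used to skip files; it only changes which file the inner read.table tries to open.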
 I think you've overcomplicated it. You just need one loop with an
'if' in it. I'd write it as:
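processFiles = function(){
  name<-list.files(path = ".", pattern = NULL, all.files = FALSE,
            full.names = FALSE, recursive = FALSE,
            ignore.case = FALSE)
  ## seq_along copes with an empty directory, where 1:length(name) would not
  for(k in seq_along(name)){
    ## read.table stops with an error on an empty file, so skip it first
    if(file.info(name[k])$size == 0){
      cat("Skipping ",name[k]," (empty file)\n")
      next
    }
    log1<-read.table(name[k],header=TRUE,stringsAsFactors=FALSE)
    if(testCondition(log1)){
      cat("Processing ",name[k],"\n")
      processLog(log1)
    }else{
      cat("Skipping ",name[k],"\n")
    }
  }
}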
Then you need two more functions, testCondition and processLog.
testCondition takes a data frame and decides whether you want to
process it or not. I'm not sure I've got the test logic right here,
but you should get the idea:
`testCondition` <-
  function(log1){
    ## test for Rep column:
    if(!any(names(log1)=="Rep"))return(FALSE)
    ## test active/idle count
    nAct = sum(log1$Rep == "useractivity_act")
    nIdle = sum(log1$Rep == "useractivity_idle")
    ## if we have no active or idle, return False
    if(nAct + nIdle == 0)return(FALSE)
    ## if we only have one of either, return False
    if(nAct == 1 || nIdle ==1) return(FALSE)
    ## maybe some other tests here?
    return(TRUE)
  }
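The quoted while() condition skips a file when either label is absent, or when each label appears exactly once, which is slightly different from the test above. Here is a minimal sketch that mirrors the quoted condition, assuming that reading is the intended one. The name testConditionQuoted is hypothetical, and na.rm=TRUE is an added guard against NA entries in Rep:

```r
## Hypothetical variant of testCondition that follows the quoted while() test
testConditionQuoted <- function(log1){
  ## no Rep column: nothing to count
  if(!("Rep" %in% names(log1))) return(FALSE)
  nAct  <- sum(log1$Rep == "useractivity_act",  na.rm = TRUE)
  nIdle <- sum(log1$Rep == "useractivity_idle", na.rm = TRUE)
  ## skip when either label is absent
  if(nAct == 0 || nIdle == 0) return(FALSE)
  ## skip when each label appears exactly once
  if(nAct == 1 && nIdle == 1) return(FALSE)
  TRUE
}

## Hand-made data frames, one per case
onlyAct <- data.frame(Rep = c("useractivity_act", "other"),
                      stringsAsFactors = FALSE)
oneEach <- data.frame(Rep = c("useractivity_act", "useractivity_idle"),
                      stringsAsFactors = FALSE)
twoAct  <- data.frame(Rep = c("useractivity_act", "useractivity_act",
                              "useractivity_idle"),
                      stringsAsFactors = FALSE)

results <- c(onlyAct = testConditionQuoted(onlyAct),
             oneEach = testConditionQuoted(oneEach),
             twoAct  = testConditionQuoted(twoAct))
print(results)
```

The twoAct case is where the two versions disagree: this sketch keeps it, while the test above would skip it because nIdle == 1.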
Here is a simple processLog function that just prints the summary of
the data frame. Put whatever you want in here:
`processLog` <-
  function(log1){
     ## for example:
    print(summary(log1))
  }
How's that? Note the use of comments and breaking the code up into
small independent, testable functions.
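One way to exercise a file loop like this without touching real data is to run it against scratch files in a temporary directory. The sketch below is an assumption-laden illustration, not the code above: processDir is a hypothetical helper that takes the directory and the test function as arguments, and the stand-in test only checks for a Rep column.

```r
## Hypothetical helper: same shape as the loop in processFiles, but the
## directory and the test function are passed in, and the names of the
## files that pass are returned instead of being processed
processDir <- function(dir, test){
  name <- list.files(dir, full.names = TRUE)
  kept <- character(0)
  for(f in name){
    ## an empty file cannot be read by read.table, so skip it
    if(file.info(f)$size == 0) next
    log1 <- read.table(f, header = TRUE, stringsAsFactors = FALSE)
    if(test(log1)) kept <- c(kept, basename(f))
  }
  kept
}

## Stand-in test: keep any file that has a Rep column
hasRep <- function(log1) "Rep" %in% names(log1)

## Scratch directory with three small files
d <- tempfile("logs")
dir.create(d)
file.create(file.path(d, "empty.txt"))
writeLines(c("Other Value", "a 1", "b 2"), file.path(d, "norep.txt"))
writeLines(c("Rep Value", "useractivity_act 1", "useractivity_idle 2"),
           file.path(d, "withrep.txt"))

kept <- processDir(d, hasRep)
print(kept)

## Clean up the scratch directory
unlink(d, recursive = TRUE)
```

Passing the test function in as an argument means the loop and the test can each be checked on their own.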
Barry
    
    