[R] "subscript out of bounds" error when using koRpus+Tree Tagger
Jiayue Wang
@@@h@@w@ng2017 @ending from y@ndex@com
Sun Dec 9 07:23:36 CET 2018
Hi,
I'm trying to do text corpus processing on some novels with the koRpus
package and Tree Tagger. The script lists all txt files (11 in all) in a
directory and processes them one by one.
##########
rm(list = ls())
library(koRpus)
library(koRpus.lang.en)

# Point koRpus at the Tree Tagger wrapper script and set the language
set.kRp.env(TT.cmd = "/pathto/tree-tagger-english", lang = "en")

outdir <- "/pathto/corpora"
corpdir <- file.path(outdir, "morrison11")
# pattern is a regular expression, not a shell glob
files <- list.files(path = corpdir, pattern = "\\.txt$", full.names = FALSE)
n <- length(files)

# Append the results for all files to a single text file
output <- file(file.path(outdir, "calc_results_morrison11.txt"), open = "at")
for (i in seq_len(n)) {
  cat(i, " - ", files[i], "\n", file = output)
  # Tag the current file with Tree Tagger, using the settings from set.kRp.env()
  tagged.results <- treetag(file.path(corpdir, files[i]),
                            treetagger = "kRp.env")
  capture.output(flesch(tagged.results), file = output)
  cat("\n", file = output)
  capture.output(TTR(tagged.results), file = output)
  cat("\n", file = output)
  capture.output(textFeatures(tagged.results), file = output)
  cat("\n===========================\n", file = output)
}
close(output)
#########
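A side note on the file listing: the pattern argument of list.files() is interpreted as a regular expression, not as a shell glob. The sketch below uses made-up file names (they are assumptions, not the actual corpus files) to show an anchored regular expression and the equivalent produced by glob2rx() from the utils package.

```r
# Hypothetical file names, for illustration only
candidates <- c("beloved.txt", "sula.txt.bak", "notes_txt", "jazz.txt")

# Anchored regular expression: a literal dot followed by "txt" at the end
anchored <- grepl("\\.txt$", candidates)

# glob2rx() translates a shell-style glob into a regular expression
from_glob <- grepl(glob2rx("*.txt"), candidates)

txt_files <- candidates[anchored]
print(txt_files)
```

Either form can be passed as pattern to list.files(); both select only names that end in ".txt".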
The problem is that the script always throws the following error when it
reaches the last txt file, and then exits prematurely:
Error in all.patterns[[word.length]] : subscript out of bounds
I can't figure out what this message means. The directories are correct,
there's no problem with the Tree Tagger installation, and n and files have
the correct values.
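One way to narrow this down, as a rough sketch rather than a fix: wrap the per-file work in tryCatch() so that the loop carries on and records which file fails and with what message. The names process_all and fake_step below are hypothetical. fake_step only stands in for the treetag()/flesch() calls so that the example runs without Tree Tagger, and it deliberately triggers the same kind of error by indexing a list past its end.

```r
# Hypothetical helper: apply process_one to each path and keep going on errors
process_all <- function(paths, process_one) {
  results <- vector("list", length(paths))
  names(results) <- basename(paths)
  for (i in seq_along(paths)) {
    results[[i]] <- tryCatch(
      process_one(paths[i]),
      error = function(e) {
        message("Failed on ", paths[i], ": ", conditionMessage(e))
        e  # keep the condition object so it can be inspected afterwards
      }
    )
  }
  results
}

# Stand-in for the real per-file step (an assumption, not koRpus code):
# it fails for one input by indexing a list out of range.
fake_step <- function(path) {
  if (grepl("last", path)) {
    patterns <- list("a", "b")
    patterns[[5]]  # raises "subscript out of bounds"
  }
  nchar(path)
}

res <- process_all(c("first.txt", "last.txt"), fake_step)

# Names of the inputs whose processing raised an error
failed <- names(res)[vapply(res, inherits, logical(1), what = "error")]
print(failed)
```

In the real script, process_one would contain the treetag() call and the analysis functions. Running them one at a time on the failing file, followed by traceback(), should show which call raises the error.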
Please help, many thanks!
Jiayue