[R] help reading a variably formatted text file
Michael Na Li
lina at u.washington.edu
Tue Nov 19 23:57:06 CET 2002
On Tue, 19 Nov 2002, ripley at stats.ox.ac.uk verbalised:
> On Tue, 19 Nov 2002, Michael Na Li wrote:
>
> > It would be nice to have more powerful regex in R, such as returning
> > matched substring grouped with "()".
>
> I think you are overlooking the power of gsub. You can certainly do that.
I want something like:
> REGEXFUN ("abc ([0-9]+)", "abc 30 and ABC 40 and abc 80")
[[1]]
[1] "30" "80"
I'm not sure how to achieve this with 'gsub'.
The best I can come up with is:
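regex.match <- function (pattern, x) {
    ## Wrap each captured group in "*| ... |*" markers, then split on "*"
    a <- strsplit (gsub(pattern, "*| \\1 |*", x), split = "\\*")
    ## Keep only the pieces that carry the "|" markers
    b <- lapply (a, function (x) x[grep ("^\\|.*\\|", x)])
    ## Strip the markers and drop the empty strings left by the split
    lapply (b, function (x) {
        temp <- unlist (strsplit (x, split = " *\\| *"))
        temp[temp != ""]
    })
}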
> regex.match ("abc ([0-9]+)", "abc 30 and ABC 40 and abc 80")
[[1]]
[1] "30" "80"
Unfortunately this is not very useful: it breaks down when the pattern contains
two "()" groups, or none. For instance, only the first group is returned here:
> regex.match ("abc ([0-9]+) and ABC ([0-9]+)", "abc 30 and ABC 40 and abc 80")
[[1]]
[1] "30"
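For comparison, here is a minimal sketch of the kind of REGEXFUN asked for above. It assumes a later version of R that provides regmatches(), gregexpr() and regexec() in base; these were not available when this thread was written, so this is not the approach suggested in the quoted reply. The name regex.groups is hypothetical. The sketch re-applies the pattern to each full match to pull out the groups, which should be fine for simple patterns like the ones here but may not hold for patterns that depend on surrounding context.

```r
## Hypothetical helper: for each element of x, return a list with one
## character vector per match, holding the text captured by each "()" group.
regex.groups <- function (pattern, x) {
    ## All full matches of the pattern, one character vector per input string
    full <- regmatches (x, gregexpr (pattern, x))
    lapply (full, function (m) {
        ## Re-apply the pattern to each full match to recover its groups
        g <- regmatches (m, regexec (pattern, m))
        ## The first element is the full match itself; keep only the groups
        lapply (g, function (v) v[-1])
    })
}

x <- "abc 30 and ABC 40 and abc 80"

## One group: two matches, one captured value each
one <- regex.groups ("abc ([0-9]+)", x)
unlist (one[[1]])    # "30" "80"

## Two groups: one match, both captured values kept together
two <- regex.groups ("abc ([0-9]+) and ABC ([0-9]+)", x)
two[[1]][[1]]        # "30" "40"
```

Keeping one vector per match, rather than flattening everything, is a design choice so that the values from a pattern with several groups stay paired with the match they came from.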
Michael
--
----------------------------------------------------------------------------
Michael Na Li
Email: lina at u.washington.edu
Department of Biostatistics, Box 357232
University of Washington, Seattle, WA 98195
---------------------------------------------------------------------------