[Rd] error handling in strcapture
Michael Lawrence
lawrence.michael at gene.com
Wed Sep 21 23:10:41 CEST 2016
Hi Bill,
Thanks, another good suggestion. strcapture() now returns NAs for
non-matches. It's nice to have someone kicking the tires on that
function.
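
For illustration, a minimal sketch of the new behaviour, assuming a build of R whose utils package includes this change. The data.frame prototype and the `unmatched` variable are illustrative choices, not part of the original report:

```r
# Same pattern and input as in the quoted report below.
x <- c("One 1", "noSpaceInLine", "Three 3")

# Prototype describing the desired column names and types (illustrative).
proto <- data.frame(Name = character(), Number = numeric(),
                    stringsAsFactors = FALSE)

res <- utils::strcapture("(.+) (.+)", x, proto = proto)

# A row in which every column is NA marks an input that did not match.
unmatched <- rowSums(!is.na(res)) == 0
x[unmatched]
```

Here the second element has no space, so its row should come back as all NA and can be located by checking for rows that are NA in every column.
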
Michael
On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel
<r-devel at r-project.org> wrote:
> Michael, thanks for looking at my first issue with utils::strcapture.
>
> Another issue is how it deals with lines that don't match the pattern.
> Currently it gives an error:
>
>> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
>      proto=list(Name="", Number=0))
> Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"), :
>   number of matches does not always match ncol(proto)
>
> First, isn't the 'number of matches' the number of parenthesized
> subpatterns in the regular expression? I thought that if the entire
> pattern matches then the subpatterns without matches would be
> shown as matches at position 0 with length 0. Hence either the
> pattern is compatible with the prototype or it isn't; that does not
> depend on the text input. E.g.,
>
>> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280"))
> [[1]]
> [1] 1 1 1 0
> attr(,"match.length")
> [1] 6 6 6 0
> attr(,"useBytes")
> [1] TRUE
>
> [[2]]
> [1] 1 1 0 1
> attr(,"match.length")
> [1] 2 2 0 2
> attr(,"useBytes")
> [1] TRUE
>
> [[3]]
> [1] -1
> attr(,"match.length")
> [1] -1
> attr(,"useBytes")
> [1] TRUE
>
> Second, an error message like 'some lines were bad' is not very helpful.
> Should it put NA's in all the columns of the current output row if the
> input line didn't match the pattern and perhaps warn the user that there
> were problems? The user could then look for rows of NA's to see where the
> problems were.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com