[R] grep

Fri Aug 2 06:28:50 CEST 2024

At 02:10 on 02/08/2024, Steven Yen wrote:
> Good morning. Below I use a statement like
> 
> j<-grep(".r\\b",colnames(mydata),value=TRUE); j
> 
> with the \\b option, which I read about a long time ago and have found useful.
> 
> Are there more of these options, other than in ?grep? Thanks.
> 
> dstat is just my own descriptive routine.
> 
>  > x
>   [1] "age"          "sleep"        "primary"      "middle"
>   [5] "high"         "somewhath"    "veryh"        "somewhatm"
>   [9] "verym"        "somewhatc"    "veryc"        "somewhatl"
> [13] "veryl"        "village"      "married"      "social"
> [17] "agricultural" "communist"    "minority"     "religious"
>  > colnames(mydata)
>   [1] "depression"     "sleep"          "female"         "village"
>   [5] "agricultural"   "married"        "communist"      "minority"
>   [9] "religious"      "social"         "no"             "primary"
> [13] "middle"         "high"           "veryh"          "somewhath"
> [17] "notveryh"       "verym"          "somewhatm"      "notverym"
> [21] "veryc"          "somewhatc"      "notveryc"       "veryl"
> [25] "somewhatl"      "notveryl"       "age"            "village.r"
> [29] "married.r"      "social.r"       "agricultural.r" "communist.r"
> [33] "minority.r"     "religious.r"    "male.r"         "education.r"
>  > j<-grep(".r\\b",colnames(mydata),value=TRUE); j
> [1] "village.r"      "married.r"      "social.r"       "agricultural.r"
> [5] "communist.r"    "minority.r"     "religious.r"    "male.r"
> [9] "education.r"
>  > j<-c(x,j); j
>   [1] "age"            "sleep"          "primary"        "middle"
>   [5] "high"           "somewhath"      "veryh"          "somewhatm"
>   [9] "verym"          "somewhatc"      "veryc"          "somewhatl"
> [13] "veryl"          "village"        "married"        "social"
> [17] "agricultural"   "communist"      "minority"       "religious"
> [21] "village.r"      "married.r"      "social.r"       "agricultural.r"
> [25] "communist.r"    "minority.r"     "religious.r"    "male.r"
> [29] "education.r"
>  > data<-mydata[j]
>  > cbind(
> +   dstat(subset(data,male.r==1))[,1:2],
> +   dstat(subset(data,male.r==0))[,1:2]
> + )
> Sample statistics (Weighted =  FALSE )
> 
> Sample statistics (Weighted =  FALSE )
> 
>                  Mean Std.dev  Mean Std.dev
> age            6.279   0.841 6.055   0.813
> sleep          6.483   1.804 6.087   2.045
> primary        0.452   0.498 0.408   0.491
> middle         0.287   0.453 0.176   0.381
> high           0.171   0.377 0.082   0.275
> somewhath      0.522   0.500 0.447   0.497
> veryh          0.254   0.435 0.250   0.433
> somewhatm      0.419   0.493 0.460   0.498
> verym          0.544   0.498 0.411   0.492
> somewhatc      0.376   0.484 0.346   0.476
> veryc          0.593   0.491 0.615   0.487
> somewhatl      0.544   0.498 0.504   0.500
> veryl          0.390   0.488 0.389   0.487
> village        0.757   0.429 0.752   0.432
> married        0.936   0.245 0.906   0.291
> social         0.538   0.499 0.528   0.499
> agricultural   0.780   0.414 0.826   0.379
> communist      0.178   0.383 0.038   0.190
> minority       0.071   0.256 0.081   0.273
> religious      0.088   0.284 0.102   0.302
> village.r      0.243   0.429 0.248   0.432
> married.r      0.064   0.245 0.094   0.291
> social.r       0.462   0.499 0.472   0.499
> agricultural.r 0.220   0.414 0.174   0.379
> communist.r    0.822   0.383 0.962   0.190
> minority.r     0.929   0.256 0.919   0.273
> religious.r    0.912   0.284 0.898   0.302
> male.r         1.000   0.000 0.000   0.000
> education.r    0.090   0.286 0.334   0.472
>  >
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hello,

The reference for regex metacharacters is the help page ?regex.
If you want to know whether there are more metacharacters similar to \b,
there are \< and \>. Below are examples of using them instead of \b.

Also, the pattern '.r' does not match a literal period followed by an 'r':
an unescaped period is a metacharacter that matches any single character.
To match a literal period you must escape it. The correct regex is '\\.r'.
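
Your column names happen to give the same result with both patterns, so the
difference is easy to miss. Here is a small sketch with made-up names (nms
and its values are my own illustration, they are not in your data) where the
unescaped and escaped patterns disagree:

```r
# Hypothetical names, chosen so that the two patterns give different results
nms <- c("male.r", "other", "malexr", "rate")

# Unescaped: '.' matches any single character, so "er" and "xr" qualify too
loose <- grep(".r\\b", nms, value = TRUE)

# Escaped: only a literal period followed by 'r'
strict <- grep("\\.r\\b", nms, value = TRUE)

# A bracket expression is another way to write a literal period
bracket <- grep("[.]r\\b", nms, value = TRUE)

print(loose)
#> [1] "male.r" "other"  "malexr"
print(strict)
#> [1] "male.r"
print(bracket)
#> [1] "male.r"
```

Whether you write '\\.' or '[.]' is mostly a matter of taste; both avoid
the accidental matches.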

x <- c("age", "sleep", "primary", "middle", "high", "somewhath", "veryh",
       "somewhatm", "verym", "somewhatc", "veryc", "somewhatl", "veryl",
       "village", "married", "social", "agricultural", "communist",
       "minority", "religious")
colnms <- c("depression", "sleep", "female", "village", "agricultural",
            "married", "communist", "minority", "religious", "social", "no",
            "primary", "middle", "high", "veryh", "somewhath", "notveryh",
            "verym", "somewhatm", "notverym", "veryc", "somewhatc", "notveryc",
            "veryl", "somewhatl", "notveryl", "age", "village.r", "married.r",
            "social.r", "agricultural.r", "communist.r", "minority.r",
            "religious.r", "male.r", "education.r")

grep("\\.r\\b", colnms, value = TRUE)
#> [1] "village.r"      "married.r"      "social.r"       "agricultural.r"
#> [5] "communist.r"    "minority.r"     "religious.r"    "male.r"
#> [9] "education.r"
# the same as above
# \\> matches the empty string at the end of a word,
# \\b matches the empty string at both ends of a word
grep("\\.r\\>", colnms, value = TRUE)
#> [1] "village.r"      "married.r"      "social.r"       "agricultural.r"
#> [5] "communist.r"    "minority.r"     "religious.r"    "male.r"
#> [9] "education.r"

# 4 col names have a 'm' and end in '.r' therefore 4 matches
grep("m.*\\.r\\>", colnms, value = TRUE)
#> [1] "married.r"   "communist.r" "minority.r"  "male.r"
# only the strings starting with 'm'
grep("\\bm.*\\.r\\b", colnms, value = TRUE)
#> [1] "married.r"  "minority.r" "male.r"
grep("\\<m.*\\.r\\>", colnms, value = TRUE)
#> [1] "married.r"  "minority.r" "male.r"
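
One caveat, in case it matters for your data: \\b and \\> mark the end of a
word, not the end of the string, so a name that continues after '.r' with
another period still matches. If you really mean "ends in .r", the anchor $
or endsWith() may be closer to what you want. A sketch with made-up names
(nms2 is hypothetical, none of these variants are in your data):

```r
# Hypothetical names: one ends in ".r", two merely contain it
nms2 <- c("male.r", "male.r.old", "male.rate")

# Word boundary: the '.' after 'r' in "male.r.old" also ends a word
at_boundary <- grep("\\.r\\b", nms2, value = TRUE)

# End-of-string anchor: only names whose last two characters are ".r"
at_end <- grep("\\.r$", nms2, value = TRUE)

# endsWith() does the same test without a regular expression
by_suffix <- nms2[endsWith(nms2, ".r")]

print(at_boundary)
#> [1] "male.r"     "male.r.old"
print(at_end)
#> [1] "male.r"
print(by_suffix)
#> [1] "male.r"
```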

Hope this helps,

Rui Barradas
