[R] grep
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Fri Aug 2 06:28:50 CEST 2024
At 02:10 on 02/08/2024, Steven Yen wrote:
> Good morning. Below I use a statement like
>
> j<-grep(".r\\b",colnames(mydata),value=TRUE); j
>
> with the \\b option, which I read about a long time ago and have found useful.
>
> Are there more of these options, other than in ?grep? Thanks.
>
> dstat is just my own descriptive routine.
>
> > x
> [1] "age" "sleep" "primary" "middle"
> [5] "high" "somewhath" "veryh" "somewhatm"
> [9] "verym" "somewhatc" "veryc" "somewhatl"
> [13] "veryl" "village" "married" "social"
> [17] "agricultural" "communist" "minority" "religious"
> > colnames(mydata)
> [1] "depression" "sleep" "female" "village"
> [5] "agricultural" "married" "communist" "minority"
> [9] "religious" "social" "no" "primary"
> [13] "middle" "high" "veryh" "somewhath"
> [17] "notveryh" "verym" "somewhatm" "notverym"
> [21] "veryc" "somewhatc" "notveryc" "veryl"
> [25] "somewhatl" "notveryl" "age" "village.r"
> [29] "married.r" "social.r" "agricultural.r" "communist.r"
> [33] "minority.r" "religious.r" "male.r" "education.r"
> > j<-grep(".r\\b",colnames(mydata),value=TRUE); j
> [1] "village.r" "married.r" "social.r" "agricultural.r"
> [5] "communist.r" "minority.r" "religious.r" "male.r"
> [9] "education.r"
> > j<-c(x,j); j
> [1] "age" "sleep" "primary" "middle"
> [5] "high" "somewhath" "veryh" "somewhatm"
> [9] "verym" "somewhatc" "veryc" "somewhatl"
> [13] "veryl" "village" "married" "social"
> [17] "agricultural" "communist" "minority" "religious"
> [21] "village.r" "married.r" "social.r" "agricultural.r"
> [25] "communist.r" "minority.r" "religious.r" "male.r"
> [29] "education.r"
> > data<-mydata[j]
> > cbind(
> + dstat(subset(data,male.r==1))[,1:2],
> + dstat(subset(data,male.r==0))[,1:2]
> + )
> Sample statistics (Weighted = FALSE )
>
> Sample statistics (Weighted = FALSE )
>
>                 Mean Std.dev  Mean Std.dev
> age            6.279   0.841 6.055   0.813
> sleep          6.483   1.804 6.087   2.045
> primary        0.452   0.498 0.408   0.491
> middle         0.287   0.453 0.176   0.381
> high           0.171   0.377 0.082   0.275
> somewhath      0.522   0.500 0.447   0.497
> veryh          0.254   0.435 0.250   0.433
> somewhatm      0.419   0.493 0.460   0.498
> verym          0.544   0.498 0.411   0.492
> somewhatc      0.376   0.484 0.346   0.476
> veryc          0.593   0.491 0.615   0.487
> somewhatl      0.544   0.498 0.504   0.500
> veryl          0.390   0.488 0.389   0.487
> village        0.757   0.429 0.752   0.432
> married        0.936   0.245 0.906   0.291
> social         0.538   0.499 0.528   0.499
> agricultural   0.780   0.414 0.826   0.379
> communist      0.178   0.383 0.038   0.190
> minority       0.071   0.256 0.081   0.273
> religious      0.088   0.284 0.102   0.302
> village.r      0.243   0.429 0.248   0.432
> married.r      0.064   0.245 0.094   0.291
> social.r       0.462   0.499 0.472   0.499
> agricultural.r 0.220   0.414 0.174   0.379
> communist.r    0.822   0.383 0.962   0.190
> minority.r     0.929   0.256 0.919   0.273
> religious.r    0.912   0.284 0.898   0.302
> male.r         1.000   0.000 0.000   0.000
> education.r    0.090   0.286 0.334   0.472
> >
>
Hello,
The reference for regular expression metacharacters is the documentation
page ?regex.
If you want to know whether there are more metacharacters similar to \b,
there are \< and \>. Below are examples of using them instead of \b.
Also, the pattern '.r' does not match a period followed by an 'r': an
unescaped period ('.') matches any single character. To match a literal
period you must escape it. The correct regex is '\\.r'.
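To see why the escape matters, here is a minimal sketch. The vector nms is hypothetical (it is not part of the original data) and was chosen so that some names end in 'r' without a preceding period.

```r
# Hypothetical column names, not from the original data
nms <- c("village.r", "number", "color", "male.r", "rate")

# Unescaped: '.' matches any character, so any name whose last two
# characters are "<something>r" is picked up as well
loose <- grep(".r\\b", nms, value = TRUE)

# Escaped: only a literal period followed by 'r' at a word edge
strict <- grep("\\.r\\b", nms, value = TRUE)

# fixed = TRUE treats the pattern as a plain string (no metacharacters);
# it then matches ".r" anywhere in the name, not only at the end
plain <- grep(".r", nms, value = TRUE, fixed = TRUE)

print(loose)
print(strict)
print(plain)
```

With these names, "number" and "color" show up only in loose. In the original data no such column exists, which is why the unescaped pattern happened to give the right answer there.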
x <- c("age", "sleep", "primary", "middle", "high", "somewhath", "veryh",
       "somewhatm", "verym", "somewhatc", "veryc", "somewhatl", "veryl",
       "village", "married", "social", "agricultural", "communist",
       "minority", "religious")
colnms <- c("depression", "sleep", "female", "village", "agricultural",
            "married", "communist", "minority", "religious", "social", "no",
            "primary", "middle", "high", "veryh", "somewhath", "notveryh",
            "verym", "somewhatm", "notverym", "veryc", "somewhatc", "notveryc",
            "veryl", "somewhatl", "notveryl", "age", "village.r", "married.r",
            "social.r", "agricultural.r", "communist.r", "minority.r",
            "religious.r", "male.r", "education.r")
grep("\\.r\\b", colnms, value = TRUE)
#> [1] "village.r" "married.r" "social.r" "agricultural.r"
#> [5] "communist.r" "minority.r" "religious.r" "male.r"
#> [9] "education.r"
# the same as above
# \\> matches the empty string at the end of a word,
# \\b matches the empty string at either edge (start or end) of a word
grep("\\.r\\>", colnms, value = TRUE)
#> [1] "village.r" "married.r" "social.r" "agricultural.r"
#> [5] "communist.r" "minority.r" "religious.r" "male.r"
#> [9] "education.r"
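The difference between the two one-sided anchors can be sketched as follows. The strings in s are hypothetical and only serve to place an 'r' at the start, at the end, and after a period; the behaviour shown assumes R's default regex engine (no perl = TRUE).

```r
# Hypothetical strings, not from the original data
s <- c("rmale", "male.r", "maler")

# \\< : an 'r' that begins a word (at the start of the string,
#       or right after a non-word character such as '.')
starts <- grepl("\\<r", s)

# \\> : an 'r' that ends a word
ends <- grepl("r\\>", s)

print(data.frame(s, starts, ends))
```

In "male.r" the period is not a word character, so the final 'r' is a one-letter word of its own and satisfies both anchors.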
# 4 column names contain an 'm' and end in '.r', hence 4 matches
grep("m.*\\.r\\>", colnms, value = TRUE)
#> [1] "married.r" "communist.r" "minority.r" "male.r"
# only the strings starting with 'm'
grep("\\bm.*\\.r\\b", colnms, value = TRUE)
#> [1] "married.r" "minority.r" "male.r"
grep("\\<m.*\\.r\\>", colnms, value = TRUE)
#> [1] "married.r" "minority.r" "male.r"
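One caveat with word-edge anchors may be worth a sketch: \\b and \\> only require a word edge after the 'r', not the end of the string. The name "village.r.old" below is invented to show this; the $ anchor and endsWith() are alternatives, not something used in the original question.

```r
# Hypothetical names; "village.r.old" is invented to show the difference
nms <- c("village.r", "village.r.old", "male.r", "number")

# \\b only needs a word edge after the 'r', and the '.' of ".old" provides one
by_boundary <- grep("\\.r\\b", nms, value = TRUE)

# $ anchors the match at the end of the string
by_anchor <- grep("\\.r$", nms, value = TRUE)

# endsWith() is a plain suffix comparison in base R, no regex involved
by_suffix <- nms[endsWith(nms, ".r")]

print(by_boundary)
print(by_anchor)
print(by_suffix)
```

If the intent is "names whose suffix is .r", the last two forms say so directly; for the column names in this thread all three give the same nine names.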
Hope this helps,
Rui Barradas