[R] Patterns on postal codes
Frede Aakmann Tøgersen
frtog at vestas.com
Wed Jan 8 07:42:33 CET 2014
Hi
Something like this.
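## 4 valid zips + 4 invalid zips (the last one contains two spaces)
zipcode <- c("22942-0173", "32601", "N9YZE6", "S7V 1J9",
             "0022942-0173", "32-601", "NN9YZE6", "S7V  1J9")
## Map every character to its class. Letters must be replaced before digits,
## otherwise the "N"s inserted by the digit step would turn into "A"s.
tmp <- gsub("[[:space:]]", "_", zipcode)   # space  -> "_" (keeps it visible)
tmp <- gsub("[[:alpha:]]", "A", tmp)       # letter -> "A"
tmp <- gsub("[[:digit:]]", "N", tmp)       # digit  -> "N"
tmp
## [1] "NNNNN-NNNN" "NNNNN" "ANAAAN" "ANA_NAN" "NNNNNNN-NNNN"
## [6] "NN-NNN" "AANAAAN" "ANA__NAN"
## The accepted patterns, with a space written as "_"
patterns <- c("NNNNN-NNNN", "NNNNN", "ANAAAN", "ANA_NAN")
zipcode[tmp %in% patterns]     # valid codes
## [1] "22942-0173" "32601" "N9YZE6" "S7V 1J9"
zipcode[!tmp %in% patterns]    # invalid codes
## [1] "0022942-0173" "32-601" "NN9YZE6" "S7V  1J9"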
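If only the four patterns named in the question matter, a different technique also works: test each code directly against anchored regular expressions with grepl(), with no intermediate A/N mask. This is a minimal sketch. The regex is my own encoding of NNNNN, NNNNN-NNNN, ANAAAN and ANA NAN, not an official postal-code specification:

```r
## Same 8 test codes as above (the last one contains two spaces)
zipcode <- c("22942-0173", "32601", "N9YZE6", "S7V 1J9",
             "0022942-0173", "32-601", "NN9YZE6", "S7V  1J9")

## One anchored alternative per accepted pattern:
##   NNNNN or NNNNN-NNNN | ANAAAN | ANA NAN (exactly one space)
valid_re <- paste0("^(",
                   "[[:digit:]]{5}(-[[:digit:]]{4})?", "|",
                   "[[:alpha:]][[:digit:]][[:alpha:]]{3}[[:digit:]]", "|",
                   "[[:alpha:]][[:digit:]][[:alpha:]] [[:digit:]][[:alpha:]][[:digit:]]",
                   ")$")

ok <- grepl(valid_re, zipcode)   # TRUE where the whole string matches
zipcode[ok]                      # valid codes
zipcode[!ok]                     # invalid codes
```

The mask approach above is more general because it reveals every pattern that occurs, including unexpected ones. The regex is only handy once the accepted set is fixed.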
Yours sincerely / Kind regards
Frede Aakmann Tøgersen
Specialist, M.Sc., Ph.D.
Plant Performance & Modeling
Technology & Service Solutions
T +45 9730 5135
M +45 2547 6050
frtog at vestas.com
http://www.vestas.com
Company reg. name: Vestas Wind Systems A/S
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Jeff Johnson
> Sent: 8 January 2014 00:11
> To: r-help at r-project.org
> Subject: [R] Patterns on postal codes
>
> Hi all,
>
> I'm pretty new to R and have a question. I have a postal_code field which
> can have a variety of values such as:
> For US postal codes: 22942-0173 or 32601
> For Canada postal codes: N9YZE6 or S7V 1J9
>
> What I want to do is represent these as patterns, such as:
> US: NNNNN-NNNN or NNNNN
> Canada: ANAAAN or ANA NAN
> where N = any digit and A = any alphabetic character; a space stays a space,
> and other characters (such as ') are kept as they are.
>
> Ultimately I want to count these to see how many have a pattern of
> NNNNN-NNNN, ANA NAN, etc., so that I can visualize the outliers.
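For the counting step, here is one possible sketch that is not part of the original thread. It maps each code to its mask with the same letter/digit gsub() steps as the reply above, but it leaves spaces as spaces, as the question asks. It then tabulates the masks with table(). Sorting the counts puts the rarest masks, which are the likely outliers, first. The helper name zip_pattern() and the sample vector are hypothetical:

```r
## Hypothetical helper: letters -> "A", digits -> "N"; every other
## character (space, hyphen, apostrophe, ...) is left unchanged.
## Letters are replaced first; doing digits first would turn the
## inserted "N"s into "A"s in the second step.
zip_pattern <- function(x) {
  x <- as.character(x)                 # also accepts a factor
  x <- gsub("[[:alpha:]]", "A", x)
  gsub("[[:digit:]]", "N", x)
}

## Illustrative data only
postal_code <- c("22942-0173", "32601", "N9YZE6", "S7V 1J9",
                 "32601", "22942", "S7V 1J9", "32-601")

## Frequency of each mask, rarest (candidate outliers) first
pattern_counts <- sort(table(zip_pattern(postal_code)))
pattern_counts

## Quick visual check of the distribution
barplot(pattern_counts, las = 2)
```

With real data, the long tail of masks that occur only once or twice is probably where typos and foreign formats will show up.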
>
> Does anyone know if there is a built-in function in R to do this?
> Currently, the str() function on the postal_code field shows a factor with
> 90,993 levels, which isn't particularly helpful.
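Because postal_code is a factor, the mask can be computed once per level and then expanded back to one value per row through the factor's integer codes. The gsub() calls then run once per distinct code rather than once per row. This is a sketch only, and it assumes the column sits in a data frame, here given the hypothetical name df. R's documentation for `[` states that indexing by a factor uses its integer codes; as.integer() just makes that explicit:

```r
## Hypothetical data frame standing in for the real data
df <- data.frame(postal_code = factor(c("22942-0173", "32601", "N9YZE6",
                                        "S7V 1J9", "32601", "32-601")))

## Letters -> "A", digits -> "N", everything else unchanged
## (letters first, so the inserted "N"s are not turned into "A"s)
zip_pattern <- function(x) {
  x <- gsub("[[:alpha:]]", "A", x)
  gsub("[[:digit:]]", "N", x)
}

lev_pattern <- zip_pattern(levels(df$postal_code))      # one entry per level
df$pattern  <- lev_pattern[as.integer(df$postal_code)]  # one entry per row

table(df$pattern)   # counts per pattern instead of per raw code
```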
>
> Thanks in advance!
>
> --
> Jeff
>