[R] - Trying to replicate VLOOKUP in R - help needed

Mitchell Maltenfort mm@|ten @end|ng |rom gm@||@com
Mon Nov 16 22:41:22 CET 2020


ASSIGNED_COMPANY[grep("ADELPHI",NAME)] <-"NEC ADELPHI" is what I'd try

On Mon, Nov 16, 2020 at 4:27 PM Andrew Robinson <apro using unimelb.edu.au> wrote:

> Hi Gregg,
>
> it's not clear from your context if all of ASSIGNED _COMPANY is NA or what
> the classes of the objects are.  Try the following ideas, none of which are
> tested.  I assume that the data set is called location.
>
> location$ASSIGNED_COMPANY <- as.character(location$NAME)
>
> is.a.FORT <- substr(location$ASSIGNED_COMPANY, 1, 4) == "FORT"
>
> location$ASSIGNED_COMPANY[!is.a.FORT] <-
>   sapply(location$ASSIGNED_COMPANY[!is.a.FORT],
>   function(x) strsplit(x)[[1]][[1]]) # retains first name if not a fort
>
> location$ASSIGNED_COMPANY[is.a.FORT] <-
>   substr(location$ASSIGNED_COMPANY[is.a.FORT], 6,
>   nchar(location$ASSIGNED _COMPANY[is.a.FORT])) # Strips FORT from Forts
>
> substr(location$ASSIGNED_COMPANY, 2, nchar(location$ASSIGNED_COMPANY)) <-
>   tolower(substr(location$ASSIGNED _COMPANY, 2,
>   nchar(location$ASSIGNED _COMPANY))) # lower case word
>
> location$ASSIGNED_COMPANY <- paste("NEC", location$ASSIGNED_COMPANY)
>
> or you can just do
>
> location$ASSIGNED_COMPANY[location$NAME == "ABERDEEN PROVING GROUND"] <-
> "NEC Aberdeen"
>
> for each option ....
>
> Cheers,
>
> Andrew
>
> --
> Andrew Robinson
> Director, CEBRA and Professor of Biosecurity,
> School/s of BioSciences and Mathematics & Statistics
> University of Melbourne, VIC 3010 Australia
> Tel: (+61) 0403 138 955
> Email: apro using unimelb.edu.au
> Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/
>
> I acknowledge the Traditional Owners of the land I inhabit, and pay my
> respects to their Elders.
> On Nov 17, 2020, 8:05 AM +1100, Gregg via R-help <R-help using r-project.org>,
> wrote:
> PROBLEM: I am trying to replicate something like a VLOOKUP in R but am
> having no success - need a bit of help.
>
> GIVEN DATA SET (data.table): (looks something like this, but much bigger)
>
> NAME TOTALAUTH ASSIGNED_COMPANY
> ABERDEEN PROVING GROUND 1 NA
> ADELPHI LABORATORY CENTER 1 NA
> CARLISLE BARRACKS 1 NA
> DETROIT ARSENAL 1 NA
> DUGWAY PROVING GROUND 1 NA
> FORT A P HILL 1 NA
> FORT BELVOIR 1 NA
> FORT BENNING 1 NA
> FORT BLISS 1 NA
> FORT BRAGG 1 NA
> FORT BUCHANAN 1 NA
>
>
> I am trying to update the values in the ASSIGNED_COMPANY column from NAs
> to a value that matches based on the "key" word like below.
>
> NAME TOTALAUTH ASSIGNED_COMPANY
> ABERDEEN PROVING GROUND 1 NEC Aberdeen
> ADELPHI LABORATORY CENTER 1 NEC Adelphi
> CARLISLE BARRACKS 1 NEC Carlise
> DETROIT ARSENAL 1 NEC Detroit
> DUGWAY PROVING GROUND 1 NEC Dugway
> FORT A P HILL 1 NEC AP Hill
> FORT BELVOIR 1 NEC Belvoir
> FORT BENNING 1 NEC Benning
> FORT BLISS 1 NEC Bliss
> FORT BRAGG 1 NEC Bragg
> FORT BUCHANAN 1 NEC Buchanon
>
>
> In a nutshell, for instance.......
>
> I want to search for the keyword "ABERDEEN" in the NAME column, and for
> every row where it exists, I want to update the NA in the ASSIGNED_COMPANY
> column to "NEC Aberdeen"
>
> I want to search for the keyword "ADELPHI" in the NAME column, and for
> every row where it exists, I want to update the NA in the ASSIGNED_COMPANY
> column to "NEC ADELPHI"
>
> ....... and so on for every value in the NAME column - so in the end a I
> have matching names in the ASSIGNED_COMPANY column.
>
> I can use an if statement because it is not vectorized.
>
> If I use an ifelse statement, the "else" rewrites any changes with ""
>
> Something so simple should not be difficult.
>
> Some of the methods I attempted to use are below along with the errors I
> get...
>
>
>
>
>
>
> ###################CODE#######################################
>
> library(data.table)
> library(dplyr)
> library(stringr)
>
>
> VLOOKUP_inR <- data.table::fread("DATASET_TESTINGONLY.csv")
>
> #METHOD 1 FAILS
> VLOOKUP_inR %>% dplyr::rename_if(grepl("ADELPHI", VLOOKUP_inR$NAME,
> useBytes = TRUE), "NEC Adelphi")
>
> Error in get(.x, .env, mode = "function") :
>
> object 'NEC Adelphi' of mode 'function' was not found
>
> #METHOD 2 FAILS
> if(stringr::str_detect(VLOOKUP_inR$NAME, "ADELPHI")) {
> VLOOKUP_inR$ASSIGNED_COMPANY == "NEC Adelphi"
> }
>
> Warning message:
> In if (stringr::str_detect(VLOOKUP_inR$NAME, "ADELPHI")) { :
> the condition has length > 1 and only the first element will be used
>
>
> #METHOD 3 FAILS
> ifelse(stringr::str_detect(ASIP_combined_location_tally$NAME, "ADELPHI"),
> ASIP_combined_location_tally$ASSIGNED_COMPANY ==
> ASIP_combined_location_tally$ASSIGNED_COMPANY)
>
> Error in ifelse(stringr::str_detect(ASIP_combined_location_tally$NAME, :
>
> argument "no" is missing, with no default
>
> #METHOD4 FAILS
> VLOOKUP_inR_matching <- VLOOKUP_inR %>% mutate(ASSIGNED_COMPANY =
> ifelse(grepl(pattern = 'ABERDEEN', x = NAME), 'NEC Aberdeen', ''))
> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY =
> ifelse(grepl(pattern = 'ADELPHI', x = NAME), 'NEC Adelphi', ''))
>
> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY =
> ifelse(grepl(pattern = 'CARLISLE', x = NAME), 'NEC Carlisle Barracks', ''))
> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY =
> ifelse(grepl(pattern = 'DETROIT', x = NAME), 'NEC Detroit Arsenal', ''))
> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY =
> ifelse(grepl(pattern = 'BELVOIR', x = NAME), 'NEC Fort Belvoir', ''))
>
> -----------the 4th method just over writes all previous changers back to ""
>
>
>
>
>
> ######################################################################
>
> Any help offered would be so very greatly appreciated.
>
> Thanks you.
>
> r/
> gregg powell
> AZ
>
>
>
>
>
>
>
>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list