[R] Problem with filling dataframe's column

Bill Dunlap
Tue Jun 13 20:00:33 CEST 2023


It is safer to use !grepl(...) instead of -grep(...) here.  If there are no
matches, grep() returns integer(0), and negating an empty index still selects
nothing, so the latter will give you a zero-row data.frame while the former
gives you the entire data.frame.

E.g.,

> d <- data.frame(a=c("one","two","three"), b=c(10,20,30))
> d[-grep("Q", d$a),]
[1] a b
<0 rows> (or 0-length row.names)
> d[!grepl("Q", d$a),]
      a  b
1   one 10
2   two 20
3 three 30
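
A minimal sketch of the same idea applied to the patterns from the thread
below. The data frame d and its Layer values are made up for illustration.
Note also that an unanchored "0" matches any value containing a zero (for
example "Level 10"); whether only exact matches should be dropped is an
assumption about the intent, so both variants are shown.

```r
# Toy data; the Layer values are hypothetical.
d <- data.frame(Layer = c("Level 1", "Level 10", "_esmdes", "_Des Section", "0"),
                stringsAsFactors = FALSE)

# Combine the unwanted values into one alternation pattern.
pat <- paste(c("_esmdes", "_Des Section", "0"), collapse = "|")

# Unanchored: "0" also matches "Level 10", so that row is dropped as well.
kept_loose <- d[!grepl(pat, d$Layer), , drop = FALSE]

# Anchored: only values that are exactly one of the three strings are dropped.
pat_exact <- paste0("^(", pat, ")$")
kept_exact <- d[!grepl(pat_exact, d$Layer), , drop = FALSE]

# Same result as the anchored pattern, without regular expressions.
kept_in <- d[!(d$Layer %in% c("_esmdes", "_Des Section", "0")), , drop = FALSE]
```

If exact matching is what is wanted, the %in% form avoids regular expressions
altogether and, like !grepl(), keeps every row when nothing matches.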

-Bill

On Tue, Jun 13, 2023 at 6:19 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:

> At 17:18 on 13/06/2023, javad bayat wrote:
> > Dear Rui;
> > Hi. I used your code, but it does not seem to work for me.
> >
> >> pat <- c("_esmdes|_Des Section|0")
> >> dim(data2)
> >      [1]  281549      9
> >> grep(pat, data2$Layer)
> >> dim(data2)
> >      [1]  281549      9
> >
> > What does the grep function do? I expected it to remove 3 rows of the
> > dataframe.
> > I do not know the reason.
> >
> >
> >
> >
> >
> >
> > On Mon, Jun 12, 2023 at 5:16 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> >
> >> At 23:13 on 12/06/2023, javad bayat wrote:
> >>> Dear Rui;
> >>> Many thanks for the email. I tried your code and found that the
> >>> "Values" and "Names" vectors must have equal lengths, otherwise the
> >>> results are not useful.
> >>> For the values in the Layer column that I do not need filled in the LU
> >>> column, I used "NA".
> >>> I also need to delete some rows from the table, as they are useless to
> >>> me. I tried the code below to delete every row of the dataframe that
> >>> contains one of these three values in the Layer column, but it gave me
> >>> the following warnings.
> >>>
> >>>> data3 = data2[-grep(c("_esmdes","_Des Section","0"), data2$Layer),]
> >>>        Warning message:
> >>>         In grep(c("_esmdes", "_Des Section", "0"), data2$Layer) :
> >>>         argument 'pattern' has length > 1 and only the first element will be used
> >>>
> >>>> data3 = data2[!grepl(c("_esmdes","_Des Section","0"), data2$Layer),]
> >>>       Warning message:
> >>>       In grepl(c("_esmdes", "_Des Section", "0"), data2$Layer) :
> >>>       argument 'pattern' has length > 1 and only the first element will be used
> >>>
> >>> How can I do this?
> >>> Sincerely
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Sun, Jun 11, 2023 at 5:03 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> >>>
> >>>> At 13:18 on 11/06/2023, Rui Barradas wrote:
> >>>>> At 22:54 on 11/06/2023, javad bayat wrote:
> >>>>>> Dear Rui;
> >>>>>> Many thanks for your email. I used one of your snippets,
> >>>>>> "data2$LU[which(data2$Layer == "Level 12")] <- "Park"", and it works
> >>>>>> correctly for me.
> >>>>>> Actually, I need to extend the code to cover all the "Levels" in the
> >>>>>> "Layer" column. There are more than a hundred levels in the Layer column.
> >>>>>> If I use the code you provided, I have to write it a hundred times, as below:
> >>>>>> data2$LU[which(data2$Layer == "Level 1")] <- "Park";
> >>>>>> data2$LU[which(data2$Layer == "Level 2")] <- "Agri";
> >>>>>> ...
> >>>>>> ...
> >>>>>> ...
> >>>>>> .
> >>>>>> Is there any other way to extend the code so that it handles all of
> >>>>>> the levels simultaneously? Like the code below:
> >>>>>> data2$LU[which(data2$Layer == c("Level 1","Level 2", "Level 3", ...))] <-
> >>>>>> c("Park", "Agri", "GS", ...)
> >>>>>>
> >>>>>>
> >>>>>> Sincerely
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Sun, Jun 11, 2023 at 1:43 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> >>>>>>
> >>>>>>> At 21:05 on 11/06/2023, javad bayat wrote:
> >>>>>>>> Dear R users;
> >>>>>>>> I am trying to fill a column based on a specific value in another
> >>>>>>>> column of a dataframe, but there seems to be a problem with my code!
> >>>>>>>> "Layer" and "LU" are two different columns of the dataframe.
> >>>>>>>> How can I fix this?
> >>>>>>>> Sincerely
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> for (i in 1:nrow(data2$Layer)){
> >>>>>>>>               if (data2$Layer == "Level 12") {
> >>>>>>>>                   data2$LU == "Park"
> >>>>>>>>                   }
> >>>>>>>>               }
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> There are three bugs in your code:
> >>>>>>>
> >>>>>>> 1) the index i is not used in the loop
> >>>>>>> 2) the assignment operator is `<-`, not `==`
> >>>>>>> 3) data2$Layer is a vector, so nrow(data2$Layer) is NULL and the
> >>>>>>>    loop bounds fail; use nrow(data2) instead
> >>>>>>>
> >>>>>>>
> >>>>>>> Here is the corrected loop.
> >>>>>>>
> >>>>>>> for (i in seq_len(nrow(data2))) {
> >>>>>>>       if (data2$Layer[i] == "Level 12") {
> >>>>>>>         data2$LU[i] <- "Park"
> >>>>>>>       }
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> But R is a vectorized language; the following two ways are the
> >>>>>>> idiomatic ways of doing what you want to do.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> i <- data2$Layer == "Level 12"
> >>>>>>> data2$LU[i] <- "Park"
> >>>>>>>
> >>>>>>> # equivalent one-liner
> >>>>>>> data2$LU[data2$Layer == "Level 12"] <- "Park"
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> If there are NAs in data2$Layer it's probably safer to wrap the
> >>>>>>> logical index in which() (see ?which) to get a numeric one.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> i <- which(data2$Layer == "Level 12")
> >>>>>>> data2$LU[i] <- "Park"
> >>>>>>>
> >>>>>>> # equivalent one-liner
> >>>>>>> data2$LU[which(data2$Layer == "Level 12")] <- "Park"
> >>>>>>>
> >>>>>>>
> >>>>>>> Hope this helps,
> >>>>>>>
> >>>>>>> Rui Barradas
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>> Hello,
> >>>>>
> >>>>> You don't need to repeat the same instruction 100+ times; there is a
> >>>>> way of assigning all new LU values at the same time with match().
> >>>>> This assumes that you have the new values in a vector.
> >>>>
> >>>> Sorry, this is not clear. I mean
> >>>>
> >>>>
> >>>> This assumes that you have the new values in a vector, the vector
> >>>> Names below. The vector of values to be matched is created from the data.
> >>>>
> >>>>
> >>>> Rui Barradas
> >>>>
> >>>>>
> >>>>>
> >>>>> Values <- sort(unique(data2$Layer))
> >>>>> Names <- c("Park", "Agri", "GS")
> >>>>>
> >>>>> i <- match(data2$Layer, Values)
> >>>>> data2$LU <- Names[i]
> >>>>>
> >>>>>
> >>>>> Hope this helps,
> >>>>>
> >>>>> Rui Barradas
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide
> >>>>> http://www.R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>>
> >>>
> >> Hello,
> >>
> >> Please cc the r-help list; R-Help is threaded and this can be helpful to
> >> others in the future.
> >>
> >> You can combine several patterns like this:
> >>
> >>
> >> pat <- c("_esmdes|_Des Section|0")
> >> grep(pat, data2$Layer)
> >>
> >> or, programmatically,
> >>
> >>
> >> pat <- paste(c("_esmdes","_Des Section","0"), collapse = "|")
> >>
> >>
> >> Hope this helps,
> >>
> >> Rui Barradas
> >>
> >>
> >
> Hello,
>
> I only posted a corrected grep statement, the complete code should be
>
>
> pat <- c("_esmdes|_Des Section|0")
> data3 <- data2[-grep(pat, data2$Layer),]
>
>
> Sorry for the confusion.
>
> Hope this helps,
>
> Rui Barradas
>
