[R] problem for strsplit function

Jeff Newmiller
Sat Jul 10 05:10:57 CEST 2021


A bit too fast there, Duncan... x[[c(1,2)]] is illegal.
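
To be specific about where that holds, here is a minimal sketch (x and lst are made-up names for illustration, not objects from the earlier posts). As far as I know, a length-2 index to [[ is an error on an atomic vector, while on a list it is treated as a recursive index, so the two cases are worth trying separately:

```r
x <- 1:10

# Atomic vector: [[ accepts only a single index, so this signals an error.
# tryCatch() turns the error into a marker string instead of stopping.
res_atomic <- tryCatch(x[[c(1, 2)]], error = function(e) "error")

# List (hypothetical example data): [[ with a vector index recurses,
# so lst[[c(1, 2)]] is the same as lst[[1]][[2]].
lst <- list(a = list(10, 20), b = "z")
rec <- lst[[c(1, 2)]]

# Single brackets subset instead: the result is still a list, of length 2.
sub <- lst[c(1, 2)]

print(res_atomic)
print(identical(rec, lst[[1]][[2]]))
str(sub)
```

So whether x[[c(1,2)]] is usable appears to depend on whether x is atomic or a list.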

On July 9, 2021 5:16:13 PM PDT, Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>On 09/07/2021 6:44 p.m., Bert Gunter wrote:
>> OK, I stand somewhat chastised.
>> 
>> But my point still is that what you get when you "extract" depends on
>> how you define "extract." Do note that ?"[" yields a help file titled
>> "Extract or Replace Parts of an object"; and afaics, the term "subset"
>> is not explicitly used as Duncan prefers.
>
>?"[[" gives you the same page, but I agree: this part of the
>documentation isn't written very clearly. The "Introduction to R"
>manual uses the terms I used (see section 2.7, "Index vectors;
>selecting and modifying subsets of a data set"), as does the source
>code (and the R Language Definition manual, though it's not as clear
>as the Intro).
>
>But the point isn't to chastise you, it's to educate you (and the OP). 
>Thinking of [] as subsetting is more helpful than thinking of it as 
>extraction.  That way the result of x[c(1,2)] makes sense.  It's a 
>little bit more of a stretch, but the result of x[[c(1,2)]] also makes 
>sense when you think of it as extraction.
>
>Duncan Murdoch
>
>> The relevant part of the Help file for "[" says, for recursive
>> objects: "Indexing by [ is similar to atomic vectors and selects a
>> list of the specified element(s)."  That a data.frame is a list is
>> explicitly stated, as I noted; that lists are in fact vectors is also
>> explicitly stated (?list says: "Almost all lists in R internally are
>> Generic Vectors") but then one is stuck with: a data.frame is a list
>> and therefore a vector, but is.vector(d3) is FALSE. The explanation is
>> explicit again in ?is.vector ("is.vector returns TRUE if x is a vector
>> of the specified mode having no attributes other than names. It
>> returns FALSE otherwise."). But I would say these issues are
>> sufficiently murky that my warning to be precise is not entirely
>> inappropriate; unfortunately, I may have made them more so. Sigh....
>> 
>> Cheers,
>> Bert
>> 
>> 
>> 
>> Bert Gunter
>> 
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> 
>> On Fri, Jul 9, 2021 at 3:05 PM Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>>>
>>> On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
>>>> "Strictly speaking", Greg is correct, Bert.
>>>>
>>>>
>>>> https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects
>>>>
>>>> Lists in R are vectors. What we colloquially refer to as "vectors"
>>>> are more precisely referred to as "atomic vectors". And without a
>>>> doubt, this "vector" nature of lists is a key underlying concept that
>>>> explains why adding a dim attribute creates a matrix that can hold
>>>> data frames. It is also a stumbling block for programmers from other
>>>> languages that have things like linked lists.
>>>
>>> I would also object to v3 (below) as "extracting" a column from d.
>>> "d[2]" doesn't extract anything, it "subsets" the data frame, so the
>>> result is a data frame, not what you get when you extract something
>>> from a data frame.
>>>
>>> People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly legal.
>>> That extracts the 3rd element (the number 3).  The problem is that R
>>> has no way to represent a scalar number, only a vector of numbers, so
>>> x[[3]] gets promoted to a vector containing that number when it is
>>> returned and assigned to y.
>>>
>>> Lists are vectors of R objects, so if x is a list, x[[3]] is something
>>> that can be returned, and it is different from x[3].
>>>
>>> Duncan Murdoch
>>>
>>>>
>>>> On July 9, 2021 2:36:19 PM PDT, Bert Gunter <bgunter.4567 using gmail.com> wrote:
>>>>> "1.  a column, when extracted from a data frame, *is* a vector."
>>>>> Strictly speaking, this is false; it depends on exactly what is meant
>>>>> by "extracted." e.g.:
>>>>>
>>>>>> d <- data.frame(col1 = 1:3, col2 = letters[1:3])
>>>>>> v1 <- d[,2] ## a vector
>>>>>> v2 <- d[[2]] ## the same, i.e
>>>>>> identical(v1,v2)
>>>>> [1] TRUE
>>>>>> v3 <- d[2] ## a data.frame
>>>>>> v1
>>>>> [1] "a" "b" "c"  ## a character vector
>>>>>> v3
>>>>>    col2
>>>>> 1    a
>>>>> 2    b
>>>>> 3    c
>>>>>> is.vector(v1)
>>>>> [1] TRUE
>>>>>> is.vector(v3)
>>>>> [1] FALSE
>>>>>> class(v3)  ## data.frame
>>>>> [1] "data.frame"
>>>>> ## but
>>>>>> is.list(v3)
>>>>> [1] TRUE
>>>>>
>>>>> which is simply explained in ?data.frame (where else?!) by:
>>>>> "A data frame is a **list** [emphasis added] of variables of the
>>>>> same number of rows with unique row names, given class "data.frame".
>>>>> If no variables are included, the row names determine the number of
>>>>> rows."
>>>>>
>>>>> "2.  maybe your question is "is a given function for a vector, or
>>>>>      for a data frame/matrix/array?".  if so, i think the only way is
>>>>>      reading the help information (?foo)."
>>>>>
>>>>> Indeed! Is this not what the Help system is for?! But note also that
>>>>> the S3 class system may somewhat blur the issue: foo() may work
>>>>> appropriately and differently for different (S3) classes of objects.
>>>>> A detailed explanation of this behavior can be found in appropriate
>>>>> resources or (more tersely) via ?UseMethod .
>>>>>
>>>>> "you might find reading ?"[" and  ?"[.data.frame" useful"
>>>>>
>>>>> Not just "useful" -- **essential** if you want to work in R, unless
>>>>> one gets this information via any of the numerous online tutorials,
>>>>> courses, or books that are available. The Help system is accurate and
>>>>> authoritative, but terse. I happen to like this mode of
>>>>> documentation, but others may prefer more extended expositions. I
>>>>> stand by this claim even if one chooses to use the "Tidyverse", the
>>>>> data.table package, or other alternative frameworks for handling
>>>>> data. Again, others may disagree, but R is structured around these
>>>>> basics, and imo one remains ignorant of them at their peril.
>>>>>
>>>>> Cheers,
>>>>> Bert
>>>>>
>>>>>
>>>>> Bert Gunter
>>>>>
>>>>> "The trouble with having an open mind is that people keep coming along
>>>>> and sticking things into it."
>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>
>>>>> On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minshall using umich.edu> wrote:
>>>>>>
>>>>>> Kai,
>>>>>>
>>>>>>> one more question, how can I know if the function is for column
>>>>>>> manipulations or for vector?
>>>>>>
>>>>>> i still stumble around R code.  but, i'd say the following (and look
>>>>>> forward to being corrected! :):
>>>>>>
>>>>>> 1.  a column, when extracted from a data frame, *is* a vector.
>>>>>>
>>>>>> 2.  maybe your question is "is a given function for a vector, or for
>>>>>>       a data frame/matrix/array?".  if so, i think the only way is
>>>>>>       reading the help information (?foo).
>>>>>>
>>>>>> 3.  sometimes, extracting the column as a vector from a data
>>>>>>       frame-like object might be non-intuitive.  you might find
>>>>>>       reading ?"[" and ?"[.data.frame" useful (as well as
>>>>>>       ?"[.data.table" if you use that package).  also, the str()
>>>>>>       command can be helpful in understanding what is happening.
>>>>>>       (the lobstr:: package's sxp() function, as well as more
>>>>>>       verbose .Internal(inspect()) can also give you insight.)
>>>>>>
>>>>>>       with the data.table:: package, for example, if "DT" is a
>>>>>>       data.table object, with "x2" as a column, adding or leaving
>>>>>>       off quotation marks for the column name can make all the
>>>>>>       difference between ending up with a vector, or with a (much
>>>>>>       reduced) data table:
>>>>>> ----
>>>>>>> is.vector(DT[, x2])
>>>>>> [1] TRUE
>>>>>>> str(DT[, x2])
>>>>>>    num [1:9] 32 32 32 32 32 32 32 32 32
>>>>>>>
>>>>>>> is.vector(DT[, "x2"])
>>>>>> [1] FALSE
>>>>>>> str(DT[, "x2"])
>>>>>> Classes ‘data.table’ and 'data.frame':  9 obs. of  1 variable:
>>>>>>    $ x2: num  32 32 32 32 32 32 32 32 32
>>>>>>    - attr(*, ".internal.selfref")=<externalptr>
>>>>>> ----
>>>>>>
>>>>>>       a second level of indexing may or may not help, mostly
>>>>>>       depending on the use of '[' versus '[['.  this can sometimes
>>>>>>       cause confusion when you are learning the language.
>>>>>> ----
>>>>>>> str(DT[, "x2"][1])
>>>>>> Classes ‘data.table’ and 'data.frame':  1 obs. of  1 variable:
>>>>>>    $ x2: num 32
>>>>>>    - attr(*, ".internal.selfref")=<externalptr>
>>>>>>> str(DT[, "x2"][[1]])
>>>>>>    num [1:9] 32 32 32 32 32 32 32 32 32
>>>>>> ----
>>>>>>
>>>>>>       the tibble:: package (used in, e.g., the dplyr:: package) also
>>>>>>       (always?) returns a single column as a non-vector.  again, a
>>>>>>       second indexing with double '[[]]' can produce a vector.
>>>>>> ----
>>>>>>> DP <- tibble(DT)
>>>>>>> is.vector(DP[, "x2"])
>>>>>> [1] FALSE
>>>>>>> is.vector(DP[, "x2"][[1]])
>>>>>> [1] TRUE
>>>>>> ----
>>>>>>
>>>>>>       but, note that a list of lists is also a vector:
>>>>>>> is.vector(list(list(1), list(1,2,3)))
>>>>>> [1] TRUE
>>>>>>> str(list(list(1), list(1,2,3)))
>>>>>> List of 2
>>>>>>    $ :List of 1
>>>>>>     ..$ : num 1
>>>>>>    $ :List of 3
>>>>>>     ..$ : num 1
>>>>>>     ..$ : num 2
>>>>>>     ..$ : num 3
>>>>>>
>>>>>>       etc.
>>>>>>
>>>>>> hth.  good luck learning!
>>>>>>
>>>>>> cheers, Greg
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>

-- 
Sent from my phone. Please excuse my brevity.


