[R] Problem with data distribution

John Fox jfox using mcmaster.ca
Thu Feb 17 20:31:30 CET 2022


Dear Neha gupta,

I hope that I'm not overstepping my role when I say that googling 
solutions to specific problems is an inefficient way to learn a 
programming language, and will probably waste your time in the long run. 
There are many good introductions to R.

Best,
  John

On 2022-02-17 2:27 p.m., Neha gupta wrote:
> Dear John, thanks a lot for the detailed answer.
> 
> Yes, I am not an expert in the R language, and when a problem comes up, I 
> google it or post it on these forums. (I have just a little bit of 
> experience with ML in R.)
> 
> 
> 
> On Thu, Feb 17, 2022 at 8:21 PM John Fox <jfox using mcmaster.ca> wrote:
> 
>     Dear Neha gupta,
> 
>     On 2022-02-17 1:54 p.m., Neha gupta wrote:
>      > Hello everyone
>      >
>      > I have a dataset with output variable "bug" having the following
>      > values (at the bottom of this email). My advisor asked me to provide
>      > the data distribution of bugs with 0 values and bugs with more than
>      > 0 values.
>      >
>      > data = readARFF("synapse.arff")
>      > data2 = readARFF("synapse.arff")
>      > data$bug
>      > library(tidyverse)
>      > data %>%
>      >    filter(bug == 0)
>      > data2 %>%
>      >    filter(bug >= 1)
>      > boxplot(data2$bug, data$bug, range=0)
>      >
>      > But both graphs are exactly the same. How is that possible?
>      > What am I doing wrong?
> 
>     As it turns out, you're doing several things wrong.
> 
>     First, you're not using pipes and filter() correctly. That is, you don't
>     do anything with the filtered versions of the data sets. You're
>     apparently under the incorrect impression that filtering modifies the
>     original data set.
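
To illustrate the first point, here is a minimal sketch, assuming the dplyr package is installed. The toy data frame and the names zero_bugs and nonzero_bugs are made up for illustration and are not part of the original post.

```r
library(dplyr)

# Hypothetical toy data standing in for the ARFF file
toy <- data.frame(bug = c(0, 1, 0, 2, 0, 3))

# filter() returns a new data frame; `toy` itself is left untouched
toy %>% filter(bug == 0)
nrow(toy)  # still 6 rows

# To keep the subsets, assign the filtered results to new names
zero_bugs    <- toy %>% filter(bug == 0)
nonzero_bugs <- toy %>% filter(bug >= 1)
```

Only the assigned objects contain the filtered rows; the unassigned pipeline just prints its result and discards it.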
> 
>     Second, you're greatly complicating a simple problem. You don't need to
>     read the data twice and keep two versions of the data set. As well,
>     processing the data with pipes and filter() is entirely unnecessary. The
>     following code works:
> 
>          with(data, boxplot(bug[bug == 0], bug[bug >= 1], range=0))
> 
>     Third, and most fundamentally, the parallel boxplots you're apparently
>     trying to construct don't really make sense. The first "boxplot" is just
>     a horizontal line at 0 and so conveys no information. Why not just plot
>     the nonzero values if that's what you're interested in?
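
One possible way to act on the third point, sketched in base R: tabulate zero versus nonzero values, then plot only the nonzero counts. The short vector below is a made-up stand-in for data$bug, and the choice of a bar plot is an assumption, not something the thread prescribes.

```r
# Hypothetical short vector standing in for data$bug
bug <- c(0, 1, 0, 0, 2, 0, 1, 4, 0, 0, 3, 1)

# How many observations have zero bugs and how many have at least one
zero_vs_nonzero <- table(ifelse(bug == 0, "zero", "nonzero"))
print(zero_vs_nonzero)

# Frequency of each nonzero bug count
nonzero_counts <- table(bug[bug > 0])
print(nonzero_counts)

# Bar plot of the nonzero values only
barplot(nonzero_counts, xlab = "Number of bugs", ylab = "Frequency")
```

The two-cell table summarizes the zero/nonzero split without needing a degenerate boxplot for the zeros.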
> 
>     Fourth, you didn't share your data in a convenient form. I was able to
>     reconstruct them via
> 
>         bug <- scan()
>         0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
>         0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
>         1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
>         0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
>         0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
>         0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
> 
>         data <- data.frame(bug)
> 
>     Finally, it's better to post to the list in plain-text email, rather
>     than html (as the posting guide suggests).
> 
>     I hope this helps,
>        John
> 
>      >
>      >
>      > data$bug
>      >   [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
>      >  [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
>      >  [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
>      > [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
>      > [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
>      > [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
>      >
>      >       [[alternative HTML version deleted]]
>      >
>     -- 
>     John Fox, Professor Emeritus
>     McMaster University
>     Hamilton, Ontario, Canada
>     web: https://socialsciences.mcmaster.ca/jfox/
> 
-- 
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/


