[R] Problem with data distribution
John Fox
jfox using mcmaster.ca
Thu Feb 17 20:31:30 CET 2022
Dear Neha gupta,
I hope that I'm not overstepping my role when I say that googling
solutions to specific problems is an inefficient way to learn a
programming language, and will probably waste your time in the long run.
There are many good introductions to R.
Best,
John
On 2022-02-17 2:27 p.m., Neha gupta wrote:
> Dear John, thanks a lot for the detailed answer.
>
> Yes, I am not an expert in the R language, and when a problem comes up, I
> google it or post it on these forums. (I have just a little
> experience of ML in R.)
>
>
>
> On Thu, Feb 17, 2022 at 8:21 PM John Fox <jfox using mcmaster.ca> wrote:
>
> Dear Neha gupta,
>
> On 2022-02-17 1:54 p.m., Neha gupta wrote:
> > Hello everyone
> >
> > I have a dataset with output variable "bug" having the following values
> > (at the bottom of this email). My advisor asked me to provide the data
> > distribution of bugs with 0 values and bugs with more than 0 values.
> >
> > data = readARFF("synapse.arff")
> > data2 = readARFF("synapse.arff")
> > data$bug
> > library(tidyverse)
> > data %>%
> > filter(bug == 0)
> > data2 %>%
> > filter(bug >= 1)
> > boxplot(data2$bug, data$bug, range=0)
> >
> > But both the graphs are exactly the same. How is that possible?
> > What am I doing wrong?
>
> As it turns out, you're doing several things wrong.
>
> First, you're not using pipes and filter() correctly. That is, you don't
> do anything with the filtered versions of the data sets. You're
> apparently under the incorrect impression that filtering modifies the
> original data set.
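
The point about filtering can be illustrated with a minimal sketch. It makes several assumptions not in the original post: a small made-up data frame replaces the poster's data, base R's subset() stands in for dplyr's filter() (which likewise returns a new data frame rather than changing its input), and the native pipe |> (R 4.1 or later) stands in for %>%. The names zero_bugs and nonzero_bugs are hypothetical.

```r
# Toy data standing in for the poster's data set (hypothetical values).
data <- data.frame(bug = c(0, 1, 0, 2, 0, 3))

# Filtering without assignment: the result is printed and then discarded.
data |> subset(bug == 0)
nrow(data)  # `data` itself is unchanged and still has all 6 rows

# Assign the filtered results to new objects in order to keep them.
zero_bugs    <- data |> subset(bug == 0)
nonzero_bugs <- data |> subset(bug >= 1)

nrow(zero_bugs)     # rows where bug is 0
nrow(nonzero_bugs)  # rows where bug is at least 1
```

Only the assigned objects hold the filtered rows, which is why plotting data$bug and data2$bug after an unassigned filter gives two identical boxplots.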
>
> Second, you're greatly complicating a simple problem. You don't need to
> read the data twice and keep two versions of the data set. As well,
> processing the data with pipes and filter() is entirely unnecessary.
> The following code works:
>
> with(data, boxplot(bug[bug == 0], bug[bug >= 1], range=0))
>
> Third, and most fundamentally, the parallel boxplots you're apparently
> trying to construct don't really make sense. The first "boxplot" is just
> a horizontal line at 0 and so conveys no information. Why not just plot
> the nonzero values if that's what you're interested in?
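
One possible way to describe zero versus nonzero values, offered only as a sketch and not necessarily what was intended above, is to tabulate the two groups and then plot only the nonzero counts. The vector below is a short made-up example rather than the poster's data, and the names zero_vs_nonzero and nonzero_counts are hypothetical.

```r
# Hypothetical bug counts; substitute data$bug for real use.
bug <- c(0, 1, 0, 0, 2, 0, 1, 0, 4, 0)

# How many observations have no bugs versus at least one bug?
zero_vs_nonzero <- table(ifelse(bug == 0, "0 bugs", ">= 1 bug"))
print(zero_vs_nonzero)

# Frequency of each distinct nonzero count.
nonzero_counts <- table(bug[bug > 0])
print(nonzero_counts)

# Bar plot of the nonzero counts only.
barplot(nonzero_counts, xlab = "Number of bugs", ylab = "Frequency")
```

A frequency table reports the zero group as a single number, which carries the same information a flat boxplot at 0 would, and leaves the plot free to show the nonzero values.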
>
> Fourth, you didn't share your data in a convenient form. I was able to
> reconstruct them via
>
> bug <- scan()
> 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0
> 0 0 0
> 0 4 1 0
> 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1
> 1 0 0
> 0 0 0 0
> 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0
> 0 0 0
> 7 0 0 1
> 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0
> 0 0 0
> 0 1 0 0
> 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4
> 1 1 0
> 0 0 0 1
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
>
> data <- data.frame(bug)
>
> Finally, it's better to post to the list in plain-text email, rather
> than html (as the posting guide suggests).
>
> I hope this helps,
> John
>
> >
> >
> > data$bug
> > [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0
> 1 0 0 0 0 0
> > 0 4 1 0
> > [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0
> 0 1 1 1 0 0
> > 0 0 0 0
> > [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5
> 0 0 0 0 0 0
> > 7 0 0 1
> > [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0
> 0 0 0 0 0
> > 0 1 0 0
> > [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1
> 0 4 1 1 0
> > 0 0 0 1
> > [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
> >
> --
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://socialsciences.mcmaster.ca/jfox/
>
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/