[R] Loop to check for large dataset

Adrian Dușa dusa.adrian at unibuc.ro
Mon Oct 10 23:34:33 CEST 2016


Granted, there are better solutions than my "KISS" (keep it simple and
stupid) example.

Hopefully, Christoph will have learned from both.

Best,
Adrian

On 10 Oct 2016 13:44, "PIKAL Petr" <petr.pikal at precheza.cz> wrote:

> Hi
>
>
>
> Given this example data, you can get the same answer with less typing and
> without loops.
>
>
>
> res <- xtabs(~W+P+S, mydata)
> res1 <- which(res == 0, arr.ind = TRUE)
> head(res1)
>       W P S
> 10   10 1 1
> 11   11 1 1
> 82   82 1 1
> 100 100 1 1
> 117 117 1 1
> 148 148 1 1
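
One caveat that may be worth illustrating: which(..., arr.ind = TRUE) returns positions along each dimension of the table, not the original codes. In the example above the two coincide because W, P and S are consecutive integers starting at 1 and every value occurs at least once. Below is a minimal sketch of mapping positions back through dimnames(), using made-up toy data (the names toy, tab, idx and missing are only illustrative, not part of the original example):

```r
# Toy data: 2 stores, 2 products, weeks coded 5, 6, 7 (deliberately not 1..n)
toy <- expand.grid(S = c(8, 12), P = c(1, 2), W = 5:7)

# Drop one combination so that there is something to find
toy <- toy[!(toy$S == 12 & toy$P == 2 & toy$W == 6), ]

tab <- xtabs(~ W + P + S, toy)
idx <- which(tab == 0, arr.ind = TRUE)   # positions within each dimension

# Translate the positions back to the original codes via the dimnames
missing <- data.frame(
  W = as.numeric(dimnames(tab)$W[idx[, "W"]]),
  P = as.numeric(dimnames(tab)$P[idx[, "P"]]),
  S = as.numeric(dimnames(tab)$S[idx[, "S"]])
)
missing
```

Note also that a week (or store, or product) that never occurs anywhere in the data gets no level in the table at all, so it cannot show up as a zero cell unless the columns are first converted to factors with explicit levels.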
>
>
>
> Cheers
>
> Petr
>
>
>
>
>
> *From:* dusa.adrian at gmail.com [mailto:dusa.adrian at gmail.com] *On Behalf
> Of *Adrian Dușa
> *Sent:* Monday, October 10, 2016 12:26 PM
> *To:* Christoph Puschmann <c.puschmann at student.unsw.edu.au>
> *Cc:* r-help at r-project.org; PIKAL Petr <petr.pikal at precheza.cz>
> *Subject:* Re: [R] Loop to check for large dataset
>
>
>
> This is an example of what reproducible code looks like, assuming you
> have three columns in your dataset named S (store), P (product) and W
> (week), and also assuming they have integer values from 1 to 19, 1 to 22
> and 1 to 157 respectively:
>
> #########
>
> mydata <- expand.grid(seq(19), seq(22), seq(157))
> names(mydata) <- c("S", "P", "W")
>
> # randomly delete 65626 - 63127 = 2499 rows
> set.seed(12345) # make it replicable
>
> mydata <- mydata[-sample(seq(nrow(mydata)), nrow(mydata) - 63127), ]
>
> #########
>
>
> Now the dataframe mydata contains exactly 63127 rows, just as in your
> case. The task is to find which weeks are missing, from which store and for
> which product.
>
> Below is one possible way to do that. Given that you have a small number of
> stores and products, I'll keep it simple and stupid, by using for loops:
>
>
>
>
>
> #########
>
> result <- matrix(nrow = 0, ncol = 3)
>
> for (i in seq(19)) {
>     for (j in seq(22)) {
>         miss <- setdiff(seq(157), mydata$W[mydata$S == i & mydata$P == j])
>         if (length(miss) > 0) {
>             result <- rbind(result, cbind(S = i, P = j, W = miss))
>         }
>     }
> }
>
> # The result matrix contains 2499 rows that are missing.
>
> > head(result)
>      S P   W
> [1,] 1 1  10
> [2,] 1 1  11
> [3,] 1 1  82
> [4,] 1 1 100
> [5,] 1 1 117
> [6,] 1 1 148
>
> #########
>
>
>
>
>
> In this example, for S(tore) number 1 and P(roduct) number 1, you are
> missing W(eek) 10, 11, 82 and so on.
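
For comparison, here is a loop-free sketch of the same idea: build the complete S x P x W grid and keep the rows that do not occur in the data. It assumes the same simulated data as in the example above; full, key and missing are hypothetical names introduced only for this illustration, and pasting the three columns into a text key is just one of several ways to compare rows.

```r
# Complete grid of all store / product / week combinations
full <- expand.grid(S = seq(19), P = seq(22), W = seq(157))

# Simulated data with 2499 rows removed, as in the example above
set.seed(12345)
mydata <- full[-sample(seq(nrow(full)), nrow(full) - 63127), ]

# One text key per row, e.g. "1_1_10" for store 1, product 1, week 10
key <- function(d) paste(d$S, d$P, d$W, sep = "_")

# Rows of the full grid whose key is absent from the data
missing <- full[!key(full) %in% key(mydata), ]
missing <- missing[order(missing$S, missing$P, missing$W), ]

nrow(missing)
```

With real data, the ranges passed to expand.grid() would have to come from what the design is supposed to contain (19 stores, 22 products, 157 weeks), not from the values observed in the data.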
>
>
>
> Hoping you can adapt this code to your particular example,
>
> Adrian
>
>
>
> On Sun, Oct 9, 2016 at 2:26 AM, Christoph Puschmann <
> c.puschmann at student.unsw.edu.au> wrote:
> >
> > Dear Adrian,
> >
> > Yes it is a cyclical data set and theoretically it should repeat this
> > interval until 61327. The data set itself is divided into 2 Parts:
> > 1. Product category (column 10)
> > 2. Number of Stores Participating (column 01)
> > Overall there are 22 different products and in each you have 19
> > different stores participating. And theoretically each store over each
> > product category should have a 1 - 157 week interval.
> >
> > The part I am struggling with is how do I run a loop over the whole data
> > set, while checking if all stores participated 157 weeks over the different
> > products.
> >
> > So far I came up with this:
> >
> > n=61327                           # Generate Matrix to check for values
> > Control = matrix(
> >   0,
> >   nrow = n,
> >   ncol = 1)
> >
> > s <- seq(from =1 , to = 157, by = 1)
> > CW = matrix(
> >   s,
> >   nrow = 157,
> >   ncol = 1
> > )
> >
> > colnames(CW)[1] <- 's'
> >
> > CW = as.data.frame(CW)
> >
> > for (i in 1:nrow(FD)) {           # Let it run through all the rows
> >   for (j in 1:157) {
> >     if (FD$WEEk[j] == CW$s[j]) {
> >       Control[i] = 1               # corresponding control row = 1
> >     } else {
> >       Control[i] = 0               # corresponding control row = 0
> >     }
> >   }
> > }
> >
> > I coded an MRE and attached a sample of my data set.
> >
> > MRE:
> >
> > #MRE
> >
> > dat <- data.frame(
> >   Store = c(rep(8, times = 157), rep(12, times = 157)),  # Number of stores
> >   WEEK = rep(seq(from=1, to = 157, by = 1), times = 2)
> > )
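
As a small illustration of the check being asked for, the sketch below reuses the MRE data frame and removes one row so that there is a gap to detect. The removed week (40) and the names weeks_seen, incomplete and missing_weeks are assumptions made up for this example; with a Product column in the data, the grouping would need to include it as well.

```r
# The MRE data: two stores, each with weeks 1..157
dat <- data.frame(
  Store = c(rep(8, times = 157), rep(12, times = 157)),
  WEEK  = rep(seq(from = 1, to = 157, by = 1), times = 2)
)

# Hypothetical gap: pretend store 12 did not report week 40
dat <- dat[!(dat$Store == 12 & dat$WEEK == 40), ]

# Number of distinct weeks seen per store
weeks_seen <- tapply(dat$WEEK, dat$Store, function(w) length(unique(w)))

# Stores that do not cover all 157 weeks
incomplete <- names(weeks_seen)[weeks_seen < 157]

# The weeks each store is missing (empty vector if none)
missing_weeks <- lapply(split(dat$WEEK, dat$Store),
                        function(w) setdiff(1:157, w))

incomplete
missing_weeks[incomplete]
```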
> >
> >
> >
> >
>
>
>
> --
> Adrian Dusa
> University of Bucharest
> Romanian Social Data Archive
> Soseaua Panduri nr.90
> 050663 Bucharest sector 5
> Romania
>
