[R] Remove duplicates from a data frame but with some special requirements

Thu Dec 17 06:08:25 CET 2009

Hi,
Try:

subset(Samps, !duplicated(Samps$ESR_Ref_edit) | Samps$Loaded == "Y")

I'd need a specific example (i.e., your input and the desired output) to
be sure that this is exactly what you want, but indexing with a logical
vector is probably going to be the solution.
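
Here is a minimal sketch of the logical-index idea on made-up data. The
column names come from your message; the values are invented. It also
shows one possible alternative in case you want exactly one row per
sample with "Y" rows preferred. That reading of the requirement is an
assumption on my part, so treat the second half as a suggestion only.

```r
# Toy data: column names follow the thread, the values are invented.
Samps <- data.frame(
  ESR_Ref_edit = c("A", "A", "B", "B", "C"),
  Loaded       = c("N", "Y", "N", "N", "Y"),
  stringsAsFactors = FALSE
)

# Logical index: TRUE for the first row of each sample name,
# or for any row flagged "Y".
keep   <- !duplicated(Samps$ESR_Ref_edit) | Samps$Loaded == "Y"
either <- Samps[keep, ]   # sample "A" still appears twice here

# Assumed alternative: exactly one row per sample, preferring "Y" rows.
# Put "Y" rows first (FALSE sorts before TRUE; ties keep original order),
# take the first row per sample name, then restore the original row order.
ord            <- order(Samps$Loaded != "Y")
first_per_name <- ord[!duplicated(Samps$ESR_Ref_edit[ord])]
one_per_sample <- Samps[sort(first_per_name), ]
```

With the one-liner, a sample whose first row is not "Y" keeps both that
row and its "Y" row; the second version keeps only the "Y" row for that
sample.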

Best,
Gray

On Wed, Dec 16, 2009 at 7:55 PM, gcam <gcam032 at gmail.com> wrote:
>
> Hi all.
>
> So I have a data frame with multiple columns/variables.  The first variable
> is a major sample name for which there are some sub-samples.  Currently I
> have used the following command to remove the duplicates:
>
> Samps_working<-Samps[-c(which(duplicated(Samps$ESR_Ref_edit))),]
>
> This removes all of the duplicated sample rows.
>
> However, I just realised that, of course, this keeps only the first observation
> of each duplicated set, whereas I wish to retain any that have the code
> "Y" in another variable, Samps$Loaded.  I'm at a bit of a loss as to how best
> to approach this problem.
>
> Just to reiterate: I want to remove all duplicate lines based on sample
> name, but I want the lines to be removed with a preference given to those
> that do not include a "Y" in the Loaded variable column.

-- 
Gray Calhoun

Assistant Professor of Economics
Iowa State University