[R] HOW TO FILTER DATA
MacQueen, Don
macqueen1 at llnl.gov
Thu Jan 4 17:41:36 CET 2018
Just a couple of minor comments:
> help.search('read_delim')
No vignettes or demos or help files found with alias or concept or
title matching 'read_delim' using regular expression matching.
read_delim is not part of base R; it comes from a contributed package (readr), which the earlier reply did not name. I'd recommend using base R as much as possible for someone who is new to R, as I suspect the original poster is.
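Following the suggestion to stay with base R, here is a minimal sketch of reading the pipe-delimited data with read.delim() instead of read_delim(). It assumes the four-column layout shown in the original question. The inline `text =` string (a few rows from that question) stands in for the real file, and the `patents_txt` name and the file path in the comment are only illustrative. `colClasses` keeps IPC as a character string.

```r
# A few rows from the original question, used here in place of the real file.
# For the actual data (hypothetical path), call
#   read.delim("path/to/file.txt", sep = "|", colClasses = ...)
patents_txt <- "Appln_id|Prio_Year|App_year|IPC
1|1999|2000|H04Q007/32
1|1999|2000|H04M001/02
2|1991|1992|C07K019/00
2|1991|1992|C07K016/26"

# read.delim() is base R; sep = "|" sets the delimiter, and colClasses reads
# the ID and the two years as integers while keeping IPC as character
df <- read.delim(text = patents_txt, sep = "|",
                 colClasses = c("integer", "integer", "integer", "character"))
str(df)
```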
The call to subset would be better written as
df_new <- subset(df, IPC == 'H04M001/02' | IPC == 'C07K016/26' )
instead of
df_new <- subset(df, df$IPC == 'H04M001/02' | df$IPC == 'C07K016/26' )
IPC is a column of the data frame, and subset() evaluates its condition within that data frame, so there is no need to prefix the column with the data frame's name (df$) in the logical expression.
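When more IPC codes are needed, a base-R alternative to chaining `IPC == ... | IPC == ...` for every code is the `%in%` operator, which tests each value against a vector of codes. A small sketch, reusing a few rows from the original question; the name `wanted` is hypothetical.

```r
# Sample rows taken from the original question
df <- data.frame(
  Appln_id = c(1, 1, 1, 2, 2),
  IPC      = c("H04Q007/32", "H04M001/02", "H04M001/275",
               "C07K019/00", "C07K016/26"),
  stringsAsFactors = FALSE
)

# IPC codes to keep (add more codes to this vector as needed)
wanted <- c("H04M001/02", "C07K016/26")

# %in% checks each IPC value for membership in `wanted`,
# replacing a chain of IPC == ... | IPC == ... comparisons
df_new <- subset(df, IPC %in% wanted)
df_new
```

Both forms match codes exactly, so "H04M001/02" does not also pick up "H04M001/275".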
-Don
--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
On 1/3/18, 12:54 PM, "R-help on behalf of Leilei Ruan" <r-help-bounces at r-project.org on behalf of ruanleilei at gmail.com> wrote:
Try the code below:
df <- read_delim("C:/Users/lruan1/Desktop/1112.csv", "|",
                 escape_double = FALSE, trim_ws = TRUE)
df_new <- subset(df,df$IPC == 'H04M001/02'| df$IPC == 'C07K016/26' )
You can add more conditions with "|" in the subset function. Good luck!
On Wed, Jan 3, 2018 at 2:53 PM, Saptorshee Kanto Chakraborty <chkstr at unife.it> wrote:
> Hello,
>
> I have patent data from the OECD in delimited text format, with IPC being
> one column. I want to filter the data by selecting only certain IPCs in that
> column and deleting the rows that do not have my required IPCs. Can
> anybody guide me in doing it? Also, the IPC codes are string variables.
>
> The data look somewhat like the sample below, but it's a huge dataset
> containing more than 11 million rows.
>
>
> Appln_id|Prio_Year|App_year|IPC
> 1|1999|2000|H04Q007/32
> 1|1999|2000|G06K019/077
> 1|1999|2000|H01R012/18
> 1|1999|2000|G06K017/00
> 1|1999|2000|H04M001/2745
> 1|1999|2000|G06K007/00
> 1|1999|2000|H04M001/02
> 1|1999|2000|H04M001/275
> 2|1991|1992|C12N015/62
> 2|1991|1992|C12N015/09
> 2|1991|1992|C07K019/00
> 2|1991|1992|C07K016/26
>
>
>
> Thank you
>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
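The question mentions more than 11 million rows. If loading the whole file before filtering is a memory problem, one option (a swapped-in technique, not something proposed in this thread) is to stream the file in chunks with base R's readLines() and keep only lines whose IPC field is wanted. A minimal sketch, assuming the "|"-delimited layout above, IPC as the fourth field, and no quoted fields that themselves contain "|". The function name `filter_ipc_file`, the temporary files, and the chunk size are all hypothetical.

```r
# Hypothetical helper: copy the header plus every line whose 4th "|"-separated
# field (assumed to be IPC) is in `wanted`, reading `chunk_size` lines at a time
filter_ipc_file <- function(infile, outfile, wanted, chunk_size = 1e6) {
  con_in  <- file(infile, open = "r")
  con_out <- file(outfile, open = "w")
  on.exit({ close(con_in); close(con_out) })

  # Keep the header line unchanged
  writeLines(readLines(con_in, n = 1), con_out)

  repeat {
    lines <- readLines(con_in, n = chunk_size)
    if (length(lines) == 0) break
    # Split on the literal "|" and take the 4th field; short lines give NA,
    # which %in% treats as not wanted
    ipc <- vapply(strsplit(lines, "|", fixed = TRUE),
                  function(f) f[4], character(1))
    writeLines(lines[ipc %in% wanted], con_out)
  }
  invisible(outfile)
}

# Usage on a tiny temporary file built from the sample rows
tmp_in  <- tempfile(fileext = ".txt")
tmp_out <- tempfile(fileext = ".txt")
writeLines(c("Appln_id|Prio_Year|App_year|IPC",
             "1|1999|2000|H04Q007/32",
             "1|1999|2000|H04M001/02",
             "2|1991|1992|C07K016/26"), tmp_in)
filter_ipc_file(tmp_in, tmp_out, c("H04M001/02", "C07K016/26"), chunk_size = 2)
readLines(tmp_out)
```

The filtered file is much smaller and can then be read with read.delim(..., sep = "|") as usual. A larger chunk_size means fewer loop iterations at the cost of more memory per chunk.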