[R] subselecting on Data frame

PQuery pierre.khoueiry at embl.de
Sat Aug 3 17:42:38 CEST 2013


Dear all,

I have a data frame of features (example pasted below) from which I would
like to extract, for instance, the following:

the number of triplets of features (rows) that have the same Scaff, the
same Cat, a Score > 0.6, and lie within a distance of at most 10000 of each
other (distance defined as Start of row[i+1] - End of row[i]).
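To make the distance criterion concrete, here is a minimal sketch. It assumes that the rows of one Scaff/Cat group are already sorted by Start. Under that assumption, every gap Start[i+1] - End[i] can be computed in one vectorised step with head() and tail(). The three rows are cat2 rows copied from the data pasted below. Treating "max 10000" as inclusive (<=) is an assumption, and the variable names are only illustrative.

```r
# Three consecutive cat2 rows on scaff_234, copied from the posted data
x <- data.frame(Start = c(1822690, 1824905, 1826092),
                End   = c(1822890, 1825105, 1826292))

# Gap from each row to the next one: Start[i+1] - End[i]
gaps <- tail(x$Start, -1) - head(x$End, -1)
print(gaps)                 # 2015 987

# Assumed rule: the three rows form a triplet if every gap is <= 10000
print(all(gaps <= 10000))   # TRUE
```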

I've been trying to do this using selectors and combn in R, but it is
getting complicated.
Is there an intuitive and elegant way to achieve this?

Many thanks,
Best,

Scaff	Start	End	Score	Cat
scaff_234	767099	767299	0.93	cat1
scaff_234	790221	790421	0.924	cat1
scaff_234	1341263	1341463	0.845	cat2
scaff_234	1543343	1543543	0.715	cat2
scaff_234	1551844	1552044	0.967	cat1
scaff_234	1560829	1561029	0.825	cat2
scaff_234	1580868	1581068	0.929	cat3
scaff_234	1589612	1589812	0.744	cat3
scaff_234	1597306	1597885	0.864	cat2
scaff_234	1598617	1599091	0.908	cat2
scaff_234	1613500	1613700	0.705	cat2
scaff_234	1614297	1614643	0.748	cat1
scaff_234	1623852	1624052	0.799	cat2
scaff_234	1669873	1670073	0.691	cat2
scaff_234	1670210	1670515	0.904	cat1
scaff_234	1822690	1822890	0.918	cat2
scaff_234	1824905	1825105	0.854	cat2
scaff_234	1826092	1826292	0.95	cat2
scaff_234	1855240	1855457	0.962	cat2
scaff_234	1872803	1873106	0.97	cat2
scaff_234	1894767	1894967	0.945	cat1
scaff_234	1903338	1903538	0.854	cat3
scaff_234	1920157	1920509	0.739	cat1
scaff_234	1944032	1944232	0.871	cat2
scaff_234	1976753	1976953	0.847	cat2
scaff_234	1992677	1992877	0.694	cat2
scaff_234	2007772	2007972	0.916	cat2
scaff_234	2009638	2010167	0.945	cat2
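One possible approach is sketched below under an explicit interpretation of the question. It keeps rows with Score > 0.6, splits them by Scaff and Cat, and sorts each group by Start. It then counts windows of three consecutive rows in which both gaps (Start of the next row minus End of the current row) are at most 10000. That "triplet" means three consecutive rows of a group, rather than any three rows as combn would enumerate, is an assumption to check. The function name count_triplets and its arguments are hypothetical.

```r
# The data pasted above (whitespace-separated)
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Scaff Start End Score Cat
scaff_234 767099 767299 0.93 cat1
scaff_234 790221 790421 0.924 cat1
scaff_234 1341263 1341463 0.845 cat2
scaff_234 1543343 1543543 0.715 cat2
scaff_234 1551844 1552044 0.967 cat1
scaff_234 1560829 1561029 0.825 cat2
scaff_234 1580868 1581068 0.929 cat3
scaff_234 1589612 1589812 0.744 cat3
scaff_234 1597306 1597885 0.864 cat2
scaff_234 1598617 1599091 0.908 cat2
scaff_234 1613500 1613700 0.705 cat2
scaff_234 1614297 1614643 0.748 cat1
scaff_234 1623852 1624052 0.799 cat2
scaff_234 1669873 1670073 0.691 cat2
scaff_234 1670210 1670515 0.904 cat1
scaff_234 1822690 1822890 0.918 cat2
scaff_234 1824905 1825105 0.854 cat2
scaff_234 1826092 1826292 0.95 cat2
scaff_234 1855240 1855457 0.962 cat2
scaff_234 1872803 1873106 0.97 cat2
scaff_234 1894767 1894967 0.945 cat1
scaff_234 1903338 1903538 0.854 cat3
scaff_234 1920157 1920509 0.739 cat1
scaff_234 1944032 1944232 0.871 cat2
scaff_234 1976753 1976953 0.847 cat2
scaff_234 1992677 1992877 0.694 cat2
scaff_234 2007772 2007972 0.916 cat2
scaff_234 2009638 2010167 0.945 cat2
")

# Hypothetical helper: one output row per window of three consecutive rows
# (same Scaff and Cat, sorted by Start) whose two gaps are both <= max_gap
count_triplets <- function(d, min_score = 0.6, max_gap = 10000) {
  d <- d[d$Score > min_score, ]
  groups <- split(d, list(d$Scaff, d$Cat), drop = TRUE)
  res <- lapply(groups, function(g) {
    g <- g[order(g$Start), ]
    n <- nrow(g)
    if (n < 3) return(NULL)
    gap <- g$Start[-1] - g$End[-n]  # gap[i] = Start[i+1] - End[i]
    # window starting at row i needs gap[i] and gap[i+1] within max_gap
    ok <- gap[-(n - 1)] <= max_gap & gap[-1] <= max_gap
    i <- which(ok)
    if (length(i) == 0) return(NULL)
    data.frame(Scaff = g$Scaff[i], Cat = g$Cat[i],
               Start1 = g$Start[i], Start2 = g$Start[i + 1],
               Start3 = g$Start[i + 2])
  })
  res <- Filter(Negate(is.null), res)
  if (length(res) == 0) return(data.frame())
  do.call(rbind, unname(res))
}

trip <- count_triplets(df)
print(trip)
print(nrow(trip))
```

On the data above, and under this interpretation, only the three cat2 rows starting at 1822690, 1824905 and 1826092 qualify. A sliding window visits each row of a sorted group once. By contrast, enumerating combinations with combn(nrow(g), 3) grows as choose(n, 3) per group, although it would be the direct route if any three rows of a group should count.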



--
View this message in context: http://r.789695.n4.nabble.com/subselecting-on-Data-frame-tp4672992.html
Sent from the R help mailing list archive at Nabble.com.