[R] < symbols in a data frame
Marc Schwartz
marc_schwartz at me.com
Wed Jul 9 19:29:42 CEST 2014
On Jul 9, 2014, at 12:19 PM, Sam Albers <tonightsthenight at gmail.com> wrote:
> Hello,
>
> I have recently received a dataset from a metal analysis company. The
> dataset is filled with less than symbols. What I am looking for is an
> efficient way to subset for any whole numbers from the dataset. The column
> is automatically formatted as a factor because of the "<" symbols making it
> difficult to deal with the numbers in a useful way.
>
> So in sum any ideas on how I could subset the example below for only whole
> numbers?
>
> Thanks in advance!
>
> Sam
>
> #code
>
> metals <-
>
>
> structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
> 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label
> = c("Antimony",
> "Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
> "Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
> "Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
> "Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L,
> 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
> 4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
> "<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
> "1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
> "22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
> "516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
> "77", "89", "951"), class = "factor")), .Names = c("Parameter",
> "Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")
Sam,
You can use ?gsub to remove the '<' characters from the column and then use ?subset to select the records you want.
Note that gsub() returns a character vector, so you will want to coerce the result to numeric with as.numeric().
> as.numeric(gsub("<", "", metals$Cedar.Creek))
 [1]  100  100  500  100   10 1000  100  516  550   10  200  500  100
[14]  500  100  951 1000 1000  100
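As a side note on why the conversion goes through gsub() rather than calling as.numeric() on the column directly: as.numeric() applied to a factor returns the internal integer level codes, not the values shown in the labels. A minimal sketch, using a small made-up factor 'f' that is not part of your data:

```r
# 'f' is a hypothetical factor, used only for illustration
f <- factor(c("<100", "516", "<10"))

# as.numeric() on a factor returns the integer level codes,
# which are positions in levels(f), not the printed values
codes <- as.numeric(f)

# gsub() coerces the factor to character first, so the labels
# themselves are cleaned and then converted
vals <- as.numeric(gsub("<", "", f))
```

Here 'vals' holds 100, 516 and 10, while 'codes' only holds the positions of each label within levels(f).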
For example:
> subset(metals, as.numeric(gsub("<", "", Cedar.Creek)) == 100)
   Parameter Cedar.Creek
1   Antimony        <100
2    Arsenic        <100
4  Beryllium        <100
7     Cobalt        <100
13  Selenium        <100
15  Thallium        <100
19  Antimony        <100
> subset(metals, as.numeric(gsub("<", "", Cedar.Creek)) <= 500)
    Parameter Cedar.Creek
1    Antimony        <100
2     Arsenic        <100
3      Barium        <500
4   Beryllium        <100
5     Cadmium         <10
7      Cobalt        <100
10    Mercury         <10
11 Molybdenum        <200
12     Nickel        <500
13   Selenium        <100
14     Silver        <500
15   Thallium        <100
19   Antimony        <100
You can also just create a new column that is numeric and go from there:
metals$CC.Num <- as.numeric(gsub("<", "", metals$Cedar.Creek))
> str(metals)
'data.frame':	19 obs. of  3 variables:
 $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
 $ Cedar.Creek: Factor w/ 45 levels "<1","<10","<100",..: 3 3 7 3 2 4 3 34 36 2 ...
 $ CC.Num     : num  100 100 500 100 10 1000 100 516 550 10 ...
> metals
    Parameter Cedar.Creek CC.Num
1    Antimony        <100    100
2     Arsenic        <100    100
3      Barium        <500    500
4   Beryllium        <100    100
5     Cadmium         <10     10
6    Chromium       <1000   1000
7      Cobalt        <100    100
8      Copper         516    516
9        Lead         550    550
10    Mercury         <10     10
11 Molybdenum        <200    200
12     Nickel        <500    500
13   Selenium        <100    100
14     Silver        <500    500
15   Thallium        <100    100
16        Tin         951    951
17   Vanadium       <1000   1000
18       Zinc       <1000   1000
19   Antimony        <100    100
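If you also need to remember which values were reported with a '<', or to pull out only the values reported without one, a possible approach is to flag them with ?grepl before stripping the symbol. This is only a sketch on a small made-up data frame 'd' that mimics the structure of 'metals'; the column names 'Below.Limit' and 'Num' are just illustrative:

```r
# 'd' is a small hypothetical data frame mimicking the structure of 'metals'
d <- data.frame(Parameter   = c("Antimony", "Copper", "Lead", "Mercury"),
                Cedar.Creek = factor(c("<100", "516", "550", "<10")))

# TRUE where the reported value carries a "<" prefix
d$Below.Limit <- grepl("<", d$Cedar.Creek, fixed = TRUE)

# Numeric version of the column with the "<" stripped
d$Num <- as.numeric(gsub("<", "", d$Cedar.Creek, fixed = TRUE))

# Keep only the rows that were reported without a "<"
d.reported <- subset(d, !Below.Limit)
```

With the flag kept as its own column, the numeric column can be used for comparisons while the information that a value was reported as "less than" is not lost.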
Regards,
Marc Schwartz