[R] (no subject)
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Mon Sep 16 20:47:28 CEST 2024
On 16/09/2024 at 15:23, Francesca wrote:
> Sorry for posting code that was hard to read. On my screen the dataset
> looked correct.
>
>
> I recreated my dataset, following your example:
>
> # cbind() makes each vector one column; matrix() would read them as nrow/ncol
> test<-data.frame(cbind(
> c( 8, 8, 5, 5, NA, NA, 1, 15, 20, 5, NA, 17,
> 2, 5, 5, 2, 5, NA, 5, 10, 10, 5, 12, NA),
> c( 18, 5, 5, 5, NA, 9, 2, 2, 10, 7, 5, 19,
> NA, 10, NA, 4, NA, 8, NA, 5, 10, 3, 17, NA),
> c( 4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4, 3, 4, 4, 4, 2,
> 2, 3, 2, 3, 3, 2, 2, 4),
> c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3, 8, 4, 7, 5,
> 8, 5, 1, 2, 4, 7, 6, 6)))
> colnames(test) <-c("cp1","cp2","role","groupid")
>
> What I have done so far is the following, that works:
> test %>%
> group_by(groupid) %>%
> mutate(across(starts_with("cp"), list(mean = mean)))
>
> But the problem is with NA: every time the mean encounters an NA, it
> returns NA for all group members.
> I need the mean to be calculated ignoring NA. So when the group is
> made of three people, the mean of the three.
> If the group has two values and an NA, the mean of the two.
>
> My code works: it computes a group mean at each position and puts it
> in place of each individual's value.
> But when an NA appears, the whole group gets NA.
>
> Perhaps there is a different way to obtain the same result.
>
>
>
> On Mon, 16 Sept 2024 at 11:35, Rui Barradas <ruipbarradas using sapo.pt> wrote:
>
>> On 16/09/2024 at 08:28, Francesca wrote:
>>> Dear Contributors,
>>> I hope someone has found a similar issue.
>>>
>>> I have this data set,
>>>
>>>
>>>
>>>      cp1  cp2  role  groupid
>>>  1    10   13     4        5
>>>  2     5   10     3        1
>>>  3     7    7     4        6
>>>  4    10    4     2        7
>>>  5     5    8     3        2
>>>  6     8    7     4        4
>>>  7     8    8     4        7
>>>  8    10   15     3        3
>>>  9    15   10     2        2
>>> 10     5    5     2        4
>>> 11    20   20     2        5
>>> 12     9   11     3        6
>>> 13    10   13     4        3
>>> 14    12    6     4        2
>>> 15     7    4     4        1
>>> 16    10    0     3        7
>>> 17    20   15     3        8
>>> 18    10    7     3        4
>>> 19     8   13     3        5
>>> 20    10    9     2        6
>>>
>>>
>>>
>>> I need to average by group, using the values of column groupid, and
>>> create a twin dataset in which each individual value is replaced by
>>> the mean of its group.
>>> So for example, for groupid 3, I calculate the mean (12+18)/2 and then,
>>> in the new data frame, put that mean in the same positions in place
>>> of 12 and 18.
>>> I found this solution, where db10_means is the output dataset, db10 is my
>>> initial data.
>>>
>>> db10_means<-db10 %>%
>>> group_by(groupid) %>%
>>> mutate(across(starts_with("cp"), list(mean = mean)))
>>>
>>> It works perfectly, except for NA values: there it gives NA to all
>>> group members, even though in some cases the group is made of some NA
>>> and some values.
>>> So, when I have a group of two values and one NA, I would like those
>>> with a value to get the mean, and those with NA to keep the NA.
>>> Here the mean function does not have the na.rm=T option set, and it
>>> appears that this option cannot be used in this case. I am not
>>> even sure that this would be enough to solve my problem.
>>> Thanks for any help provided.
>>>
>> Hello,
>>
>> Your data is a mess, please don't post HTML, this is a plain-text-only
>> list. Anyway, I managed to create a data frame by copying the data to a
>> file named "rhelp.txt" and then running
>>
>>
>>
>> db10 <- scan(file = "rhelp.txt", what = character())
>> header <- db10[1:4]
>> db10 <- db10[-(1:4)] |> as.numeric()
>> db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |>
>>   as.data.frame() |>
>>   setNames(header)
>>
>> str(db10)
>> #> 'data.frame': 25 obs. of 4 variables:
>> #> $ cp1 : num 1 5 3 7 10 5 2 4 8 10 ...
>> #> $ cp2 : num 10 2 1 4 4 5 6 4 4 15 ...
>> #> $ role : num 13 5 3 6 2 8 8 7 7 3 ...
>> #> $ groupid: num 4 10 7 4 7 3 7 8 8 3 ...
>>
>>
>> And here is the data in dput format.
>>
>>
>>
>> db10 <-
>> structure(list(
>> cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
>> 2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
>> cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
>> 4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
>> role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
>> 11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
>> groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
>> 20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
>> class = "data.frame", row.names = c(NA, -25L))
>>
>>
>>
>> As for the problem, I am not sure if you want summarise instead of
>> mutate but here is a summarise solution.
>>
>>
>>
>> library(dplyr)
>>
>> db10 %>%
>>   group_by(groupid) %>%
>>   summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)))
>>
>> # same result, summarise's new argument .by avoids the need to group_by
>> db10 %>%
>>   summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)),
>>             .by = groupid)
>>
>>
>>
>> Can you post the expected output too?
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>>
>> --
>> This e-mail was scanned for viruses by AVG antivirus software.
>> www.avg.com
>>
>
>
Hello,

Something like this?


test <-
  structure(list(
    cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
            2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
    cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
            4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
    role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
             11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
    groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
                20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
    class = "data.frame", row.names = c(NA, -25L))

library(dplyr)

test %>%
  group_by(groupid) %>%
  mutate(across(starts_with("cp"), list(mean = ~ mean(.x, na.rm = TRUE))))
#> # A tibble: 25 × 6
#> # Groups:   groupid [11]
#>      cp1   cp2  role groupid cp1_mean cp2_mean
#>    <dbl> <dbl> <dbl>   <dbl>    <dbl>    <dbl>
#>  1     1    10    13       4     7        8
#>  2     5     2     5      10     5        2
#>  3     3     1     3       7     6.17     5.17
#>  4     7     4     6       4     7        8
#>  5    10     4     2       7     6.17     5.17
#>  6     5     5     8       3    10.7     13.3
#>  7     2     6     8       7     6.17     5.17
#>  8     4     4     7       8     5        4
#>  9     8     4     7       8     5        4
#> 10    10    15     3       3    10.7     13.3
#> # ℹ 15 more rows

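
The data above has no missing values, so here is a minimal sketch of the same call on the data with NA from your last message. The name test_na and the if_else() step are my assumptions, not something from your code: the if_else() keeps NA for the rows whose own value is NA, which is how I read "for those with NA, the NA is replaced". Drop it if every group member should get the group mean.

```r
library(dplyr)

# Data with missing values, as posted earlier in this thread
# (test_na is a hypothetical name chosen for this sketch)
test_na <- data.frame(
  cp1 = c(8, 8, 5, 5, NA, NA, 1, 15, 20, 5, NA, 17,
          2, 5, 5, 2, 5, NA, 5, 10, 10, 5, 12, NA),
  cp2 = c(18, 5, 5, 5, NA, 9, 2, 2, 10, 7, 5, 19,
          NA, 10, NA, 4, NA, 8, NA, 5, 10, 3, 17, NA),
  role = c(4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4, 3, 4, 4, 4, 2,
           2, 3, 2, 3, 3, 2, 2, 4),
  groupid = c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3, 8, 4, 7, 5,
              8, 5, 1, 2, 4, 7, 6, 6)
)

res <- test_na %>%
  group_by(groupid) %>%
  mutate(across(
    starts_with("cp"),
    # group mean ignoring NA; rows that were NA themselves stay NA
    list(mean = ~ if_else(is.na(.x), NA_real_, mean(.x, na.rm = TRUE)))
  )) %>%
  ungroup()

# look at one group: rows 3, 11 and 19 have groupid 1
res[res$groupid == 1, ]
```

In groupid 1 the cp1 values are 5, NA and 5, so cp1_mean should come out as 5, NA and 5 for those three rows.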
Hope this helps,
Rui Barradas