[R] Data import R: some explanatory variables not showing up correctly in summary
David Winsemius
dwinsemius at comcast.net
Thu Jun 1 18:17:27 CEST 2017
> On Jun 1, 2017, at 8:57 AM, William Dunlap via R-help <r-help at r-project.org> wrote:
>
> Check for leading or trailing spaces in the strings in your data.
> dput(dataset) would show them.
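As a small illustration of that check, here is a sketch on made-up data (the data frame `dat` and its values are hypothetical, not the original poster's). A single trailing space is enough to create two levels that look identical in summary() output:

```r
# Hypothetical data: "Island " carries a trailing space, so it becomes its own level
dat <- data.frame(Position = c("Bank", "Island", "Island ", "Water"),
                  stringsAsFactors = TRUE)

# levels() and dput() print the strings in quotes, which exposes stray spaces
levels(dat$Position)
dput(levels(dat$Position))

# Flag the values that begin or end with a blank
has_pad <- grepl("^[[:blank:]]|[[:blank:]]$", as.character(dat$Position))
dat$Position[has_pad]
```

The same grepl() test can be run on each non-numeric column to see which ones need cleaning.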
This function would strip any leading or trailing spaces from a column:
trim <- function(s) {
    s <- as.character(s)
    s <- sub(pattern = "^[[:blank:]]+", replacement = "", x = s)
    s <- sub(pattern = "[[:blank:]]+$", replacement = "", x = s)
    s
}
You could restrict it to non-numeric columns with:
my_dfrm[ !sapply(my_dfrm, is.numeric) ] <- lapply( my_dfrm[ !sapply(my_dfrm, is.numeric) ], trim)
It would have the side effect (desirable in my opinion, though opinions do vary on this matter) of converting any factor columns to character class.
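A minimal sketch of the whole step on made-up data follows. The contents of `my_dfrm` here are hypothetical, and trim() is repeated only so the example runs on its own. It also shows the factor-to-character side effect and one way to convert back to a factor if that is what later code expects:

```r
# Same trim() as above, repeated so this sketch is self-contained
trim <- function(s) {
  s <- as.character(s)
  s <- sub("^[[:blank:]]+", "", s)
  sub("[[:blank:]]+$", "", s)
}

# Hypothetical data frame (not the poster's data)
my_dfrm <- data.frame(
  Position = c("Bank", "Island", "Island ", " Water"),
  Count    = c(3, 1, 2, 5),
  stringsAsFactors = TRUE
)

# Trim only the non-numeric columns
non_num <- !sapply(my_dfrm, is.numeric)
my_dfrm[non_num] <- lapply(my_dfrm[non_num], trim)

class(my_dfrm$Position)    # character now, no longer factor
unique(my_dfrm$Position)   # the padded duplicates have collapsed

# Optional: back to factor, now with one level per real category
my_dfrm$Position <- factor(my_dfrm$Position)
nlevels(my_dfrm$Position)
```

Base R (3.2.0 and later) also ships trimws(), which could be passed to lapply() in place of trim(); by default it strips newlines and carriage returns as well as spaces and tabs.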
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Jun 1, 2017 at 8:49 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com>
> wrote:
>
>> Hi Tara,
>>
>> It seems that you categorise and count for each category. Could it be that
>> the method you use puts everything that doesn't match the predefined
>> categories in Other?
>>
>> I'm only guessing because without a minimal reproducible example it's
>> difficult to do anything else.
>>
>> Best wishes
>> Ulrik
>>
>> Rui Barradas <ruipbarradas at sapo.pt> wrote on Thu, 1 June 2017, 17:30:
>>
>>> Hello,
>>>
>>> In order for us to help we need to know how you've imported your data.
>>> What was the file type? What instructions have you used to import it?
>>> Did you use base R or a package?
>>> Give us a minimal but complete code example that can reproduce your
>>> situation.
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>> On 01-06-2017 11:02, Tara Adcock wrote:
>>>> Hi,
>>>>
>>>> I have a question regarding data importing into R.
>>>>
>>>> When I import my data into R and review the summary, some of my
>>>> explanatory variables are being reported as if, instead of being one
>>>> variable, they were two with the same name. See below for an example:
>>>>
>>>>   Behav person        Behav dog              Position
>>>> **combination : 38    combination :  4**     Bank   :372
>>>> **combination :  7    combination :  4**   **Island :119**
>>>>   fast        :123    fast        : 15     **Island : 11**
>>>>   slow        :445    slow        : 95       Land   :  3
>>>>   stat        :111    stat        : 14       Water  :230
>>>>
>>>> Also, all of the distances I have imported are showing up in the
>>>> summary along with a line entitled "other". However, I haven't used
>>>> any other distances?
>>>>
>>>>   Distance        Distance.dog
>>>>   2-10m  :184     <50m   : 35
>>>>   <50m   :156     2-10m  : 27
>>>>   10-20m :156     20-30m : 23
>>>>   20-30m : 91     30-40m : 16
>>>>   40-50m : 57     10-20m : 13
>>>> **(Other): 82     (Other): 18**
>>>>
>>>> I have checked my data sheet over and over again and I think I have
>>>> standardised the data, but the issue keeps arising. I'm assuming I
>>>> need to clean the data set, but as a nearly complete novice in R I am
>>>> not certain how to do this. Any help at all with this would be much
>>>> appreciated. Thanks so much.
>>>>
>>>> Kind Regards,
>>>>
>>>> Tara Adcock.
>>>>
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
David Winsemius
Alameda, CA, USA