[R] For column values-Quality control
Bansal, Vikas
vikas.bansal at kcl.ac.uk
Sat Jul 9 18:45:28 CEST 2011
Dear sir,
I was working with different code, which is why you did not get the output I described. Please use this code on the summary file.
I have a file, summary.txt (attached). We can read
this file using:
dfa <- read.table("summary.txt", fill = TRUE, colClasses = "character", header = TRUE)
The V10 column holds ASCII characters, which I converted into decimal
numbers using this code:
dfa$V10 <- sapply(dfa$V10, function(a) paste(as.integer(charToRaw(a)), collapse = ' '))
Now you will get this output:
dfa
V7 V8 V9 V10
1 0 1 G 96
2 0 1 T 97
3 0 1 C 97
4 0 1 A 97
5 0 1 G 95
6 0 1 G 94
7 0 1 C 94
8 0 1 C 92
9 0 1 A 98
10 0 1 T 97
11 0 1 g 94
12 0 1 A 92
13 0 1 C 95
14 0 1 G 97
15 0 1 C 88
16 0 1 C 96
17 0 1 G 97
18 0 1 G 95
19 0 1 G 97
20 0 1 G 97
21 0 1 A 97
22 0 1 G 97
23 0 1 G 97
24 0 1 C 97
25 0 1 A 97
26 0 1 C 95
27 0 1 A 88
28 0 1 g 96
29 0 2 GG 92 92
30 0 2 GG 91 94
31 0 2 AT 89 94
32 0 2 GG 96 93
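The conversion step above can be tried without the attachment. The following is only a sketch: `toy` is a hypothetical two-row stand-in for the real file, and `USE.NAMES = FALSE` is added here so the result is a plain, unnamed character vector.

```r
# Hypothetical stand-in for the attached file: raw quality characters in V10.
toy <- data.frame(V9 = c("G", "AT"), V10 = c("`", "Y^"),
                  stringsAsFactors = FALSE)

# charToRaw() returns the bytes of a string; as.integer() gives their decimal codes.
as.integer(charToRaw("`"))   # 96

# sapply() + paste() keeps V10 an ordinary character column:
# one space-separated string of codes per row.
toy$V10 <- sapply(toy$V10,
                  function(a) paste(as.integer(charToRaw(a)), collapse = " "),
                  USE.NAMES = FALSE)
toy$V10   # "96" "89 94"
```

With `lapply()` instead of `sapply()` and `paste()`, V10 becomes a list column holding one numeric vector per row, which prints as `c(89, 94)`.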
The values in column V10 correspond to the bases (A, C, G, T) in column V9. I want to keep
only those bases whose score is more than 90, so the output for the above should be:
V7 V8 V9 V10
1 0 1 G 96
2 0 1 T 97
3 0 1 C 97
4 0 1 A 97
5 0 1 G 95
6 0 1 G 94
7 0 1 C 94
8 0 1 C 92
9 0 1 A 98
10 0 1 T 97
11 0 1 g 94
12 0 1 A 92
13 0 1 C 95
14 0 1 G 97
16 0 1 C 96
17 0 1 G 97
18 0 1 G 95
19 0 1 G 97
20 0 1 G 97
21 0 1 A 97
22 0 1 G 97
23 0 1 G 97
24 0 1 C 97
25 0 1 A 97
26 0 1 C 95
28 0 1 g 96
29 0 2 GG 92 92
30 0 2 GG 91 94
31 0 2 T 89 94
32 0 2 GG 96 93
So in the output, the 15th and 27th rows should be deleted and the 31st row should be:
31 0 2 T 89 94
because 89 is the score for A and 94 is the score for T. Therefore A has been deleted, because its score is less than 90.
Can you help me, please?
Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
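One possible way to do the filtering described in this message is sketched below. It is not a reply from the thread. `filter_bases`, `cutoff`, and `toy` are hypothetical names, and the sketch assumes V9 holds one letter per score and V10 holds space-separated decimal scores. Unlike the desired output shown in the message, it also drops the score of a removed base (row 31 would end up with V10 equal to "94"), and it leaves V8 unchanged.

```r
# Keep only the bases whose score exceeds `cutoff`; drop rows left with no bases.
filter_bases <- function(df, cutoff = 90) {
  bases  <- strsplit(df$V9, "")                                   # one letter per element
  scores <- lapply(strsplit(df$V10, " ", fixed = TRUE), as.integer)
  keep   <- lapply(scores, function(s) s > cutoff)                # logical mask per row
  df$V9  <- mapply(function(b, k) paste(b[k], collapse = ""),
                   bases, keep, USE.NAMES = FALSE)
  df$V10 <- mapply(function(s, k) paste(s[k], collapse = " "),
                   scores, keep, USE.NAMES = FALSE)
  df[nzchar(df$V9), , drop = FALSE]                               # remove emptied rows
}

# Hypothetical rows modelled on rows 1, 15, 29 and 31 of the message.
toy <- data.frame(V7 = "0", V8 = c("1", "1", "2", "2"),
                  V9 = c("G", "C", "GG", "AT"),
                  V10 = c("96", "88", "92 92", "89 94"),
                  stringsAsFactors = FALSE)
res <- filter_bases(toy)
res
```

Here the row with score 88 is removed and "AT" is reduced to "T". Splitting both columns and applying one logical mask to each keeps the letters and the scores aligned.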
________________________________________
From: David Winsemius [dwinsemius at comcast.net]
Sent: Saturday, July 09, 2011 12:04 AM
To: Bansal, Vikas
Cc: r-help at r-project.org
Subject: Re: [R] For column values-Quality control
On Jul 8, 2011, at 6:46 PM, Bansal, Vikas wrote:
> Yes sir, you are right. After this I use this code to convert the ASCII
> values in column V10 to decimal numbers:
>
> dfa$V10=lapply(dfa[,4], function(c) as.numeric(charToRaw(c)))
>
> Now you will get output something like this:
>
> V7 V8 V9  V10
> 0  1  G   82
> 0  1  CGT c(90, 92, 96)
> 0  1  GA  c(78, 92)
> 0  1  GAG c(90, 92, 92)
> 0  1  G   88
> 0  1  A   96
> 0  1  ATT c(90, 96, 92)
> 0  1  T   94
> 0  1  C   97
>
> Now, after this, I am facing the problem:
>
I don't think so. Here's what I get as the top of dfa after that
operation:
> str(dfa)
'data.frame': 111 obs. of 4 variables:
$ V7 : chr "0" "0" "0" "0" ...
$ V8 : chr "1" "1" "1" "1" ...
$ V9 : chr "G" "T" "C" "A" ...
$ V10:List of 111
..$ : num 96
..$ : num 97
..$ : num 97
..$ : num 97
..$ : num 95
..$ : num 90
..$ : num 94
..$ : num 92
..$ : num 90
..$ : num 97
..$ : num 94
..$ : num 92
..$ : num 95
..$ : num 97
..$ : num 88
..$ : num 96
..$ : num 97
..$ : num 95
..$ : num 97
..$ : num 97
..$ : num 97
..$ : num 97
..$ : num 97
..$ : num 97
..$ : num 97
..$ : num 95
..$ : num 88
..$ : num 96
..$ : num 92 92
..$ : num 91 94
..$ : num 89 94
... more follows and output was terminated
I say again: read the Posting Guide and use dump() or dput().
--
David.
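To illustrate the dput() suggestion: it prints R code that rebuilds an object exactly, so the recipient sees the same structure regardless of how the mail client wraps or re-encodes a printed table. This is a sketch only; `toy` is a hypothetical stand-in for dfa.

```r
# Hypothetical stand-in for dfa, with raw quality characters in V10.
toy <- data.frame(V9 = c("G", "AT"), V10 = c("`", "Y^"),
                  stringsAsFactors = FALSE)

# dput() writes an R expression that recreates the object;
# paste that text into the email body instead of the printed table.
dput(toy)

# Round trip: capture the text, evaluate it, and compare with the original.
txt  <- paste(capture.output(dput(toy)), collapse = "\n")
back <- eval(parse(text = txt))
isTRUE(all.equal(back, toy))
```

For a large data frame, `dput(head(dfa, 20))` keeps the posted text short while still showing column types.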
> The values in column V10 correspond to the bases (A, C, G, T) in column V9. I want
> only those whose score is more than 91, so the output for the above should be:
>
> V7 V8 V9 V10
> 0  1  GT c(90, 92, 96)
> 0  1  A  c(78, 92)
> 0  1  AG c(90, 92, 92)
> 0  1  A  96
> 0  1  TT c(90, 96, 92)
> 0  1  T  94
> 0  1  C  97
>
> The first row should be deleted because it contains 82, which is less
> than 91. In the second row, C should be deleted because its score in
> column V10 is less than 91.
>
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ________________________________________
> From: David Winsemius [dwinsemius at comcast.net]
> Sent: Friday, July 08, 2011 11:37 PM
> To: Bansal, Vikas
> Cc: r-help at r-project.org
> Subject: Re: [R] For column values-Quality control
>
> I get something entirely different when I execute that input command
> with the attached file:
>
> This is what I see as the first 14 lines for a displayed value for
> dfa:
>
>> dfa
> V7 V8 V9 V10
> 1 0 1 G `
> 2 0 1 T a
> 3 0 1 C a
> 4 0 1 A a
> 5 0 1 G _
> 6 0 1 G Z
> 7 0 1 C ^
> 8 0 1 C \\
> 9 0 1 A Z
> 10 0 1 T a
> 11 0 1 g ^
> 12 0 1 A \\
> 13 0 1 C _
> 14 0 1 G a
>
> If this is different than what you see when you type dfa after input
> of that file in that manner then you should consider alternative
> methods of communicating an unambiguous representation of your dfa
> object.... as I have detailed in prior private messages.
>
> --
>
> David.
>
> On Jul 8, 2011, at 6:10 PM, Bansal, Vikas wrote:
>
>>
>> Dear all,
>>
>> I am really sorry for not giving the input file; in my mail I
>> did not explain my problem in the best way.
>>
>> I have a file, summary.txt (attached). We can read
>> this file using:
>>
>> dfa <- read.table("summary.txt", fill = TRUE, colClasses = "character", header = TRUE)
>>
>> The V10 column holds ASCII characters, which I converted into decimal
>> numbers using this code:
>>
>> dfa$V10=lapply(dfa[,4], function(c) as.numeric(charToRaw(c)))
>>
>> Now I have a data frame dfa with columns something like this:
>>
>> V7 V8 V9  V10
>> 0  1  G   82
>> 0  1  CGT c(90, 92, 96)
>> 0  1  GA  c(78, 92)
>> 0  1  GAG c(90, 92, 92)
>> 0  1  G   88
>> 0  1  A   96
>> 0  1  ATT c(90, 96, 92)
>> 0  1  T   94
>> 0  1  C   97
>>
>> The values in column V10 correspond to the bases (A, C, G, T) in column V9. I want
>> only those whose score is more than 91, so the output for the above should be:
>>
>> V7 V8 V9 V10
>> 0  1  GT c(90, 92, 96)
>> 0  1  A  c(78, 92)
>> 0  1  AG c(90, 92, 92)
>> 0  1  A  96
>> 0  1  TT c(90, 96, 92)
>> 0  1  T  94
>> 0  1  C  97
>>
>> Can you please tell me the solution?
>>
>> Thanking you,
>> Warm Regards
>> Vikas Bansal
>> Msc Bioinformatics
>> Kings College London
>> <summary.txt>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
David Winsemius, MD
West Hartford, CT
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: summary.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110709/15f0aa60/attachment.txt>