[R] column-binary data
David Barron
david.barron at said-business-school.oxford.ac.uk
Tue Sep 20 11:14:48 CEST 2005
Thanks for the replies. That's not quite what I meant. These data are multipunched to allow more than one variable to be coded in the same column. For example, the first 7 columns of the first card of the data I'm trying to read contain the following:
Column  Rows  Description
-------------------------------
1-5           Serial number
6             Card number
7       Y,X   Sex of respondent
7       0-3   Marital status
7       4-9   Occupational status
I happen to know that the actual punches for the first respondent are 00001,1,Y,1,4.
When I use
> ip <- readBin(ff,what="raw",n=14,signed=FALSE)
I get
> 08 00 08 00 08 00 08 00 04 00 04 00 24 20
for these seven columns.
When I use raw2bin from package caTools I get:
> binip <- raw2bin(ip,"integer",size=2)
> 8 8 8 8 4 4 8228
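For what it's worth, the same integers can probably be obtained without caTools, since base R's readBin() also accepts a raw vector as its input. A minimal sketch, assuming each card column is stored as a little-endian unsigned 16-bit integer (which is what the bytes above suggest); the vector ip is retyped by hand here so the example stands alone:

```r
# The 14 raw bytes quoted above, retyped by hand so the example is self-contained
ip <- as.raw(c(0x08, 0x00, 0x08, 0x00, 0x08, 0x00, 0x08, 0x00,
               0x04, 0x00, 0x04, 0x00, 0x24, 0x20))

# Assumption: each card column is a little-endian unsigned 16-bit integer
binip <- readBin(ip, what = "integer", n = length(ip) / 2,
                 size = 2, signed = FALSE, endian = "little")
print(binip)
```

Setting endian explicitly keeps the result the same on big-endian machines, where the platform default would swap the two bytes of each column.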
Now I can see that the relationship between binary numbers and punches is this:
binary  punch      binary  punch
--------------     --------------
     1  3             256  9
     2  2             512  8
     4  1            1024  7
     8  0            2048  6
    16  X            4096  5
    32  Y            8192  4
    64              16384
   128              32768
I can also see that the binary value for column 7 (8228) is the sum of the values for the three punches in that column (Y = 32, 1 = 4, 4 = 8192; 32 + 4 + 8192 = 8228). What I don't get is how to make R work out the punches, either from the raw values or from the binary values. If anyone can suggest anything I would be very, very grateful!
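One possible way to do this, sketched under the assumption that the table above is the complete mapping, is to test each of the twelve bit values in turn with integer division and remainder. The names punch_value and decode_punches are made up for this illustration:

```r
# Bit value of each punch row, taken from the table above (names are the punches)
punch_value <- c(Y = 32, X = 16, "0" = 8, "1" = 4, "2" = 2, "3" = 1,
                 "4" = 8192, "5" = 4096, "6" = 2048, "7" = 1024,
                 "8" = 512, "9" = 256)

# Return the labels of all punches present in one column value
decode_punches <- function(value) {
  is_set <- (value %/% punch_value) %% 2 == 1
  names(punch_value)[is_set]
}

punches <- decode_punches(8228)
print(punches)

# Split the punches of column 7 into the three variables coded there
sex        <- intersect(punches, c("Y", "X"))
marital    <- intersect(punches, as.character(0:3))
occupation <- intersect(punches, as.character(4:9))
```

For column 7 of the first respondent this should give the punches Y, 1 and 4, matching the known values. In more recent versions of R, bitwAnd(value, punch_value) != 0 would be an equivalent test.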
David
-----Original Message-----
From: Ted Harding [mailto:Ted.Harding at nessie.mcc.ac.uk]
Sent: Fri 9/16/2005 14:31
To: E-Mail
Cc: David Barron
Subject: Re: [R] column-binary data
On 16-Sep-05 jim holtman wrote:
> Each card column had 12 rows, so as binary it comes in as 12 bits. The
> question is does this come as a 16 bit integer, or a string of 12 bits
> that I have to extract from. Either case is not that difficult to do.
Indeed ... as an example of how one could proceed, I "deconstruct"
my example below (see at end).
> On 9/16/05, Ted Harding <Ted.Harding at nessie.mcc.ac.uk> wrote:
>>
>> On 16-Sep-05 David Barron wrote:
>> > I have a number of datasets that are multipunch column-binary
>> > format.
>> > Does anyone have any advice on how to read this into R? Thanks.
>> >
>> > David
>>
>> Do you mean something like the old
>>
>> HOLLERITH PUNCHED CARD BINARY FORMAT?
>> 1111111110111111101111011111101111110
>> 0000000001000000010000100000010000001
>> 0000010100110000000010000001100010011
>> 1111001010001010000000001100100101001
>> 0111100100011001100001000100001101011
>> 0100010000001100001010010101001110001
>> 0100101000010101001100001010100101101
>>
>> (here "1" = hole in card, binary representation of 7-bit ASCII
>> encoding, high-order bit on top).
#First, construct a vector ASCII consisting of the printable
#characters:
ASCII<-c(" ","!","\"","#","$","%","&","'","(",")",
"*","+",",","-",".","/","0","1","2","3",
"4","5","6","7","8","9",":",";","<","=",
">","?","@","A","B","C","D","E","F","G",
"H","I","J","K","L","M","N","O","P","Q",
"R","S","T","U","V","W","X","Y","Z","[",
"\\","]","^","_","`","a","b","c","d","e",
"f","g","h","i","j","k","l","m","n","o",
"p","q","r","s","t","u","v","w","x","y",
"z","{","|","}","~")
#Next, a vector of powers of 2:
rad<-2^(6:0)
#Read in the data from stdin():
M<-t(matrix(as.integer(unlist((strsplit(scan(stdin(),
what="character"),split="")))),ncol=7))
#Paste the 7 lines at the prompts and end with an empty line; the session looks like this:
#1: 1111111110111111101111011111101111110
#2: 0000000001000000010000100000010000001
#3: 0000010100110000000010000001100010011
#4: 1111001010001010000000001100100101001
#5: 0111100100011001100001000100001101011
#6: 0100010000001100001010010101001110001
#7: 0100101000010101001100001010100101101
#8:
#Read 7 items
#and convert the columns to ASCII codes:
R<-rad%*%M
#and see what you've got:
paste(ASCII[R-31],collapse="")
#[1] "HOLLERITH PUNCHED CARD BINARY FORMAT?"
The above can be adapted to whatever your binary data represent
and to how they are laid out in the input.
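As one illustration of such an adaptation, the sketch below runs the idea in reverse for 12-row cards: instead of multiplying a 0/1 matrix by powers of 2, it rebuilds the matrix of holes from the 16-bit column values quoted at the top of this thread. It assumes the bit-to-punch table given there is complete; punch_value, column_value and holes are names chosen for this example only.

```r
# Bit value of each punch row, as given in the table at the top of the thread
punch_value <- c(Y = 32, X = 16, "0" = 8, "1" = 4, "2" = 2, "3" = 1,
                 "4" = 8192, "5" = 4096, "6" = 2048, "7" = 1024,
                 "8" = 512, "9" = 256)

# The seven column values quoted at the top of the thread
column_value <- c(8, 8, 8, 8, 4, 4, 8228)

# holes[i, j] is TRUE when row i is punched in card column j
holes <- outer(punch_value, column_value,
               function(bit, value) (value %/% bit) %% 2 == 1)
rownames(holes) <- names(punch_value)

# Columns 1-5 carry one punch each, so the punched row is the digit itself
serial <- paste(apply(holes[, 1:5], 2, function(h) rownames(holes)[h]),
                collapse = "")
print(serial)
```

The logical matrix plays the same role as M in the 7-bit example, so multipunched columns can then be picked apart by row name.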
Others may find a slicker way of doing this.
The only fly in the above ointment is that I haven't located
in R a character-vector constant which consists of the printable
ASCII characters, or a function to convert numerical ASCII code
to characters, so I made my own.
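For the record, base R does appear to offer what is wanted here: intToUtf8() turns integer code points into a string, and rawToChar(as.raw(x)) does the same for byte values. A minimal sketch, with the seven lines held in a character vector (called card purely for illustration) instead of being read from stdin():

```r
# The seven punched lines from the example, high-order bit first
card <- c("1111111110111111101111011111101111110",
          "0000000001000000010000100000010000001",
          "0000010100110000000010000001100010011",
          "1111001010001010000000001100100101001",
          "0111100100011001100001000100001101011",
          "0100010000001100001010010101001110001",
          "0100101000010101001100001010100101101")

# One row per punched line, one column per character (7 x 37)
M <- do.call(rbind, lapply(strsplit(card, split = ""), as.integer))

# Weight the rows by powers of 2 to get one ASCII code per column
codes <- as.vector(2^(6:0) %*% M)

# Convert the codes to text without a hand-made lookup vector
msg <- intToUtf8(codes)
print(msg)
```

Since all codes here are below 128, rawToChar(as.raw(codes)) should give the same string.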
Best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Sep-05 Time: 22:26:16
------------------------------ XFMail ------------------------------