[R] Ever see a stata import problem like this?
Paul Johnson
pauljohn at ku.edu
Wed Sep 22 00:34:32 CEST 2004
Greetings Everybody:
I generated a 1.2MB dta file based on the General Social Survey with
Stata 8 for Linux. The file can be re-opened with Stata, but when I bring
it into R, most of the variables show every value as missing.
The dataset is called "morgen.dta", and I put a copy online in case
you are interested:
http://www.ku.edu/~pauljohn/R/morgen.dta
Here is how it looks to R (I tried various options on the read.dta command):
> myDat <- read.dta("morgen.dta")
> summary(myDat)
     CASEID              year            id            hrs1          hrs2
 Min.   :   19721   Min.   :1972   Min.   :   1   NAP :    0   NAP :    0
 1st Qu.: 1983475   1st Qu.:1978   1st Qu.: 445   DK  :    0   DK  :    0
 Median : 1996808   Median :1987   Median : 905   NA  :    0   NA  :    0
 Mean   : 9963040   Mean   :1986   Mean   : 990   NA's:40933   NA's:40933
 3rd Qu.:19872187   3rd Qu.:1994   3rd Qu.:1358
 Max.   :20002817   Max.   :2000   Max.   :3247
     prestige         agewed        age           educ         paeduc
 DK,NA,NAP:    0   NAP :    0   DK  :    0   NAP :    0   NAP :    0
 NA's     :40933   DK  :    0   NA  :    0   DK  :    0   DK  :    0
                   NA  :    0   NA's:40933   NA  :    0   NA  :    0
                   NA's:40933                NA's:40933   NA's:40933
    maeduc       speduc                income
 NAP :    0   NAP :    0   $25000 OR MORE:14525
 DK  :    0   DK  :    0   $10000 - 14999: 5022
 NA  :    0   NA  :    0   $15000 - 19999: 3869
 NA's:40933   NA's:40933   $20000 - 24999: 3664
                           REFUSED       : 1877
                           (Other)       : 8523
                           NA's          : 3453
>
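
One possible mechanism that would produce this pattern, offered as a guess and not confirmed for morgen.dta: if a numeric variable carries value labels only for its special codes (NAP, DK, NA), and the import turns it into a factor whose levels are just those labelled codes, then every ordinary number becomes a missing value. The sketch below reproduces that effect on made-up data. The vector `hours`, the codes -1, 98, 99 and the helper names are all hypothetical. The guarded call at the end uses the `convert.factors` argument of `read.dta` from the foreign package and runs only if the file is present.

```r
# Sketch only: the values and codes below are invented for illustration
# and are not taken from morgen.dta.
hours <- c(40, 37, 48, 60, 21)        # ordinary numeric answers
label_codes <- c(-1, 98, 99)          # hypothetical codes that carry value labels
label_names <- c("NAP", "DK", "NA")   # label text as it appears in summary(myDat)

# A factor whose levels are only the labelled codes maps every other
# value to a missing value.
hours_factor <- factor(hours, levels = label_codes, labels = label_names)
print(summary(hours_factor))

# If the file is available, reading it without factor conversion shows
# whether the underlying numbers survive the import.
if (file.exists("morgen.dta") && requireNamespace("foreign", quietly = TRUE)) {
  raw <- foreign::read.dta("morgen.dta", convert.factors = FALSE)
  print(summary(raw$hrs1))
}
```

If the numbers do appear with `convert.factors = FALSE`, that would point at the label conversion and away from the file itself.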
Here's what Stata sees when I load the same thing:
summarize, detail
                 Case identification number
-------------------------------------------------------------
     Percentiles       Smallest
 1%       197432          19721
 5%       199649          19722
10%      1974116          19723       Obs               40933
25%      1983475          19724       Sum of Wgt.       40933
50%      1996808                      Mean            9963040
                        Largest       Std. Dev.       9006352
75%     1.99e+07       2.00e+07
90%     2.00e+07       2.00e+07       Variance       8.11e+13
95%     2.00e+07       2.00e+07       Skewness         .18931
99%     2.00e+07       2.00e+07       Kurtosis       1.045409
                GSS YEAR FOR THIS RESPONDENT
-------------------------------------------------------------
     Percentiles       Smallest
 1%         1972           1972
 5%         1973           1972
10%         1974           1972       Obs               40933
25%         1978           1972       Sum of Wgt.       40933
50%         1987                      Mean           1986.421
                        Largest       Std. Dev.       8.61136
75%         1994           2000
90%         1998           2000       Variance       74.15552
95%         2000           2000       Skewness      -.0789223
99%         2000           2000       Kurtosis       1.799939
                    RESPONDENT ID NUMBER
-------------------------------------------------------------
     Percentiles       Smallest
 1%           18              1
 5%           89              1
10%          178              1       Obs               40933
25%          445              1       Sum of Wgt.       40933
50%          905                      Mean           989.9129
                        Largest       Std. Dev.      689.0596
75%         1358           3244
90%         2027           3245       Variance       474803.2
95%         2437           3246       Skewness       .8359211
99%         2867           3247       Kurtosis       3.311248
              NUMBER OF HOURS WORKED LAST WEEK
-------------------------------------------------------------
     Percentiles       Smallest
 1%            6              0
 5%           15              0
10%           21              0       Obs               23279
25%           37              0       Sum of Wgt.       23279
50%           40                      Mean           41.05206
                        Largest       Std. Dev.      13.95931
75%           48             89
90%           60             89       Variance       194.8624
95%           65             89       Skewness        .195045
99%           82             89       Kurtosis       4.448998
             NUMBER OF HOURS USUALLY WORK A WEEK
-------------------------------------------------------------
     Percentiles       Smallest
 1%            4              0
 5%           15              0
10%           20              1       Obs                 774
25%           38              2       Sum of Wgt.         774
50%           40                      Mean           39.79199
                        Largest       Std. Dev.      13.43383
75%           45             89
90%           55             89       Variance       180.4677
95%           60             89       Skewness      -.0002332
99%           80             89       Kurtosis       5.009869
            RS OCCUPATIONAL PRESTIGE SCORE (1970)
-------------------------------------------------------------
     Percentiles       Smallest
 1%           14             12
 5%           17             12
10%           20             12       Obs               24267
25%           30             12       Sum of Wgt.       24267
50%           39                      Mean           39.35645
                        Largest       Std. Dev.      14.03712
75%           48             82
90%           60             82       Variance       197.0407
95%           62             82       Skewness       .2927414
99%           76             82       Kurtosis       2.775553
                   AGE WHEN FIRST MARRIED
-------------------------------------------------------------
     Percentiles       Smallest
 1%           15             12
 5%           17             12
10%           17             12       Obs               25382
25%           19             12       Sum of Wgt.       25382
50%           21                      Mean           22.09609
                        Largest       Std. Dev.      4.813944
75%           24             63
90%           28             68       Variance       23.17405
95%           31             73       Skewness       2.002265
99%           39             73       Kurtosis       11.28279
                      AGE OF RESPONDENT
-------------------------------------------------------------
     Percentiles       Smallest
 1%           19             18
 5%           21             18
10%           24             18       Obs               40790
25%           30             18       Sum of Wgt.       40790
50%           42                      Mean           45.14798
                        Largest       Std. Dev.      17.53519
75%           58             89
90%           71             89       Variance       307.4828
95%           77             89       Skewness       .4774907
99%           86             89       Kurtosis       2.239618
              HIGHEST YEAR OF SCHOOL COMPLETED
-------------------------------------------------------------
     Percentiles       Smallest
 1%            3              0
 5%            7              0
10%            8              0       Obs               40806
25%           11              0       Sum of Wgt.       40806
50%           12                      Mean           12.48152
                        Largest       Std. Dev.      3.176226
75%           14             20
90%           16             20       Variance       10.08841
95%           18             20       Skewness      -.3389303
99%           20             20       Kurtosis       3.960311
            HIGHEST YEAR SCHOOL COMPLETED, FATHER
-------------------------------------------------------------
     Percentiles       Smallest
 1%            0              0
 5%            3              0
10%            4              0       Obs               29347
25%            8              0       Sum of Wgt.       29347
50%           11                      Mean           10.20994
                        Largest       Std. Dev.      4.342143
75%           12             20
90%           16             20       Variance       18.85421
95%           17             20       Skewness      -.1628909
99%           20             20       Kurtosis       2.826482
            HIGHEST YEAR SCHOOL COMPLETED, MOTHER
-------------------------------------------------------------
     Percentiles       Smallest
 1%            0              0
 5%            3              0
10%            6              0       Obs               34151
25%            8              0       Sum of Wgt.       34151
50%           12                      Mean           10.41478
                        Largest       Std. Dev.      3.709352
75%           12             20
90%           14             20       Variance       13.75929
95%           16             20       Skewness      -.6324499
99%           18             20       Kurtosis       3.605715
            HIGHEST YEAR SCHOOL COMPLETED, SPOUSE
-------------------------------------------------------------
     Percentiles       Smallest
 1%            4              0
 5%            7              0
10%            8              0       Obs               22780
25%           12              0       Sum of Wgt.       22780
50%           12                      Mean           12.53095
                        Largest       Std. Dev.      3.103418
75%           14             20
90%           16             20       Variance       9.631203
95%           18             20       Skewness       -.287755
99%           20             20       Kurtosis       4.051822
                     TOTAL FAMILY INCOME
-------------------------------------------------------------
     Percentiles       Smallest
 1%            1              1
 5%            3              1
10%            5              1       Obs               37480
25%            9              1       Sum of Wgt.       37480
50%           11                      Mean            9.75619
                        Largest       Std. Dev.      2.994967
75%           12             13
90%           12             13       Variance       8.969825
95%           13             13       Skewness       -1.29205
99%           13             13       Kurtosis       3.759778
.
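
The "Obs" counts in the Stata output imply how many NA's R ought to report for each variable after a faithful import. The sketch below does that subtraction. The counts are copied by hand from the output above, and pairing them with the R column names by position is an assumption, as are the names `n_total`, `stata_obs` and `expected_na`.

```r
# Non-missing counts copied by hand from the "Obs" lines of the Stata output.
# Pairing them with R column names by position is an assumption.
n_total <- 40933
stata_obs <- c(hrs1 = 23279, hrs2 = 774, prestige = 24267, agewed = 25382,
               age = 40790, educ = 40806, paeduc = 29347, maeduc = 34151,
               speduc = 22780, income = 37480)

# Number of NA's R should report for each variable if the import were faithful.
expected_na <- n_total - stata_obs
print(expected_na)

# With the data frame from read.dta() loaded as myDat, the actual counts
# could be compared like this:
# actual_na <- colSums(is.na(myDat[names(stata_obs)]))
# print(cbind(expected_na, actual_na))
```

For income the expected count is 40933 - 37480 = 3453, which matches the NA's line in summary(myDat), so that variable appears to have come through intact. For hrs1 the expected count is 17654, not the 40933 that R reports.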
--
Paul E. Johnson email: pauljohn at ku.edu
Dept. of Political Science http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas Office: (785) 864-9086
Lawrence, Kansas 66044-3177 FAX: (785) 864-5700