CSES Module 1 Data Set Errata
Posted: October 1, 2002

STATA Data Types/Storage Formats: Data Descriptor Statements

Some variables in the CSES Module 1 dataset, in particular the weights and some of the identification variables, require larger data types/storage formats than were provided in the original STATA data descriptor statements. As a result, these variables are rounded when the data are read into STATA.
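
The rounding can be illustrated outside STATA. The Python sketch below is an illustration only and is not part of the CSES files. It assumes that STATA's default `float` type is a 4-byte IEEE 754 single and that `double` is an 8-byte IEEE 754 double. It mimics storing a value in each type by round-tripping the value through the standard `struct` module. The sample values are made up.

```python
import struct


def as_stata_float(x: float) -> float:
    """Round-trip x through a 4-byte IEEE 754 single, like STATA's `float` storage type."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def as_stata_double(x: float) -> float:
    """Round-trip x through an 8-byte IEEE 754 double, like STATA's `double` storage type."""
    return struct.unpack("<d", struct.pack("<d", float(x)))[0]


# Hypothetical values, not taken from the CSES data.
for value in (16777217, 1234567891, 0.123456789):
    print(value, as_stata_float(value), as_stata_double(value))
```

With these inputs, the `float` round trip returns 16777217 as 16777216.0 and 1234567891 as 1234567936.0. The `double` round trip returns all three values unchanged. The decimal value keeps only about seven significant digits in `float`.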

This can have several adverse effects. For example, some of the longer identification variables, particularly A1003 and A1009, may no longer be unique after rounding, so merges based on those variables will not work correctly. In addition, the weight variables for some countries are recorded with more precision than the original storage format can hold, so they lose accuracy when read in.
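
The loss of uniqueness can also be reproduced outside STATA. This Python sketch uses two hypothetical 10-digit identifiers that differ only in their last digits. Once stored at 4-byte `float` precision, they collapse to the same value. A lookup keyed on them, used here as a simple stand-in for a one-to-one merge, therefore silently loses a record. Stored as doubles, the two identifiers stay distinct. The helper names and values are illustrative assumptions, not CSES code or data.

```python
import struct


def to_float32(x: float) -> float:
    """Round x to 4-byte IEEE 754 single precision (the precision of STATA's `float`)."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


# Hypothetical respondent IDs, each with a value attached (not real CSES data).
records = [(1234567891, "respondent A"), (1234567900, "respondent B")]


def build_lookup(rows, store):
    """Key each row by its ID after applying the storage function `store`."""
    lookup = {}
    for raw_id, payload in rows:
        key = store(raw_id)
        if key in lookup:
            print(f"collision: {raw_id} stored as {key!r}, same as an earlier ID")
        lookup[key] = payload  # a colliding ID overwrites the earlier record
    return lookup


as_float = build_lookup(records, to_float32)  # both IDs become 1234567936.0
as_double = build_lookup(records, float)      # Python's float is a double; IDs stay distinct
print(len(as_float), len(as_double))
```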

Solution: STATA users should use a text editor to revise the STATA data descriptor file 'cm1_col.dct' so that it declares the proper data types/storage formats, as shown here (excerpted and revised from 'cm1_col.dct'):

long   A1003        30-  37
double A1009        80-  89
double A1010_1      91- 101
double A1010_2     103- 112
double A1010_3     114- 124
double A1011_1     126- 135
double A1011_2     137- 146
double A1011_3     148- 157
double A1012_1     159- 169
double A1012_2     171- 181
double A1012_3     183- 193
double A1013       195- 204
double A1014_1     206- 215
double A1014_2     217- 227
double A1014_3     229- 239
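
The column ranges in the excerpt help explain why these types are needed. A1003 occupies columns 30-37 (8 digits) and A1009 occupies columns 80-89 (10 digits). The Python sketch below is a hypothetical helper, not anything distributed with CSES. It picks the smallest STATA numeric type that holds every non-negative integer of a given width exactly. It uses STATA's documented maximums for `byte`, `int`, and `long`, and the 2^53 exact-integer limit of `double`. It assumes the field contains only digits, with no sign or decimal point.

```python
# Largest positive values of STATA's integer storage types (per STATA's documentation).
STATA_INT_MAX = {"byte": 100, "int": 32_740, "long": 2_147_483_620}
FLOAT_EXACT_MAX = 2**24   # 16,777,216: beyond this, float cannot represent every integer
DOUBLE_EXACT_MAX = 2**53  # 9,007,199,254,740,992: the same limit for double


def field_width(start: int, end: int) -> int:
    """Number of columns in an inclusive column range such as 30-37."""
    return end - start + 1


def smallest_exact_type(width: int) -> str:
    """Smallest STATA type that stores every `width`-digit non-negative integer exactly."""
    largest = 10**width - 1
    for name in ("byte", "int", "long"):
        if largest <= STATA_INT_MAX[name]:
            return name
    # float is not an option here: it is exact only up to FLOAT_EXACT_MAX, below long's range.
    if largest <= DOUBLE_EXACT_MAX:
        return "double"
    raise ValueError(f"{width}-digit integers cannot all be stored exactly; read as a string")


print("A1003:", smallest_exact_type(field_width(30, 37)))  # 8 digits
print("A1009:", smallest_exact_type(field_width(80, 89)))  # 10 digits
```

An 8-digit field fits in `long` but not in `float`, which is exact only up to 16,777,216. A 10-digit field exceeds the `long` maximum of 2,147,483,620 and so needs `double`. The weight fields are 10 or 11 columns wide and can carry more significant digits than the roughly seven that `float` preserves. This is consistent with declaring them `double` as well.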

For your convenience, a revised version of the file is available for download: cm1_col.dct.

After editing 'cm1_col.dct', most users will want to read in their ASCII data again using the revised STATA statements, so that the revised data types/storage formats are applied to these variables. Users who instead choose to read in and merge only the affected variables will need to re-apply any errata, merges, or other corrections that relied on identification variables A1003 and/or A1009.

This revision increases the amount of memory STATA needs to hold the data. Depending on your computing resources and the version of STATA you are using, you may also need to use a text editor to change the 'set memory' command in the file 'cm1_run.do'. The relevant line in 'cm1_run.do' reads 'set memory 60m'. We recommend setting the value to '65m' or higher, which has worked well for us.
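
A rough estimate shows where the extra memory goes. This sketch assumes that the 14 variables declared `double` in the excerpt were previously read as 4-byte `float`s. Under that assumption, each observation grows by 4 bytes per variable. A1003 does not change size, since `long` and `float` both take 4 bytes. The observation count below is hypothetical and is not the actual size of the CSES Module 1 file.

```python
def extra_bytes(n_obs: int, n_vars: int = 14, old_size: int = 4, new_size: int = 8) -> int:
    """Additional bytes needed when n_vars variables change from old_size to new_size bytes."""
    return n_obs * n_vars * (new_size - old_size)


n_obs = 50_000  # hypothetical observation count, for illustration only
added = extra_bytes(n_obs)
print(f"{added:,} bytes (about {added / 1024**2:.1f} MB)")
```

For the hypothetical 50,000 observations, this comes to 2,800,000 bytes, about 2.7 MB. The actual figure scales with the number of observations in your file. STATA also needs memory beyond the data itself.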

This is the first CSES release for which STATA statements have been provided. If you encounter any errors in the STATA data descriptor statements, or in using CSES files in STATA in general, please provide a detailed description of the problem by e-mail to cses@umich.edu so that we may investigate the problem. Thank you!