The American Mosaic Project is a multiyear, multi-method study of the bases of solidarity and diversity in American life. The principal investigators of this project are Doug Hartmann, Penny Edgell and Joseph Gerteis at the University of Minnesota. The survey portion of the project consists of a random-digit-dial telephone survey (N=2,081) conducted during the summer of 2003 by the University of Wisconsin Survey Center. The survey was designed to gather data on attitudes about race, religion, politics and American identity as well as demographic information and social networks.
- Data File
- Cases: 2,081
- Weight Variables: DESWT, PSWT1, FINALWT
- Data Collection
- Date Collected: May 20, 2003 to August 27, 2003
- Funded By
- Edelstein Family Foundation
- Collection Procedures
- The interviews were conducted by phone by the University of Wisconsin Survey Center's paid interviewing staff from its central phone bank on the campus of the University of Wisconsin-Madison. All interviewers had been through UWSC's general multisession training, had done "mock" one-on-one interviews, and had listened to actual interviews conducted by experienced staff, as well as having their own work systematically monitored. Specific training for this project included printed materials and a number of "briefing" sessions in which the project was discussed, the instrument was reviewed, and various project-specific factors were explained.
During the field period, alongside the normal monitoring, productivity was watched and supervisors spoke with interviewers who seemed to be encountering difficulty with completions or refusals. A number of special sessions (refusal-avoidance seminars, in-progress debriefings, and the like), including two plenary sessions with the PIs present, were held. Interviewing proceeded using a Computer Assisted Telephone Interviewing (CATI) system developed at the University of California, Berkeley (CASES), a program widely used in academically based survey operations. Cases (phone numbers) were "delivered" to interviewers, with the computer keeping track of appointments, times the potential respondent might have asked NOT to be disturbed, and so on. Branching was handled automatically by CASES without special intervention by interviewers, as were the three split-half series, to which respondents were randomly assigned. In the first stages of interviewing, no case was abandoned unless it had been finally resolved or at least fifteen separate calls had been made. This figure was a minimum, and in practice many more calls were made to some hard-to-reach numbers.
Once a number was reached and determined to be a (probable) household, a randomly chosen adult was designated as the intended respondent, and no other person in the household was allowed to be substituted. If the designated respondent was not available, an appointment was sought and automatically "remembered" by the computer. If the designated respondent (or a household informant) refused to cooperate, the case went into a special refusal queue where specialists attempted to elicit cooperation. Detailed calling records made the calling history of the number available in real time, and conversion attempts proceeded where possible at a time other than when the original refusal had occurred. (Interviewers could also make comments to assist those making later calls, which were displayed along with the case history.)
- Sampling Procedures
- The intent of the project was not only to derive a dataset from which the characteristics of the adult, non-institutionalized U.S. population could be estimated and analyzed, but also to obtain sufficient numbers of African-Americans and Hispanics so that these two key groups could be examined separately. Specifically, while the original target was approximately 2,000 persons for the nation as a whole, the goal was also to find approximately 400 African-Americans and 400 Hispanics. This required oversamples.
The original scheme was to begin with a nationwide RDD sample but oversample areas with higher concentrations of blacks and/or Hispanics so that these targets could be met. This would actually produce three partially overlapping datasets: one representing the country as a whole, one representing black residents of the United States, and one representing Hispanic residents. The sampling scheme was operationalized by grouping telephone exchanges into five basic categories. Two were "high" in minority prevalence for blacks and Hispanics respectively, two were "moderate," and the fifth was neither. Consultation with UWSC's sampling expert determined that this was the best way of meeting the goals expeditiously, without too much wasted effort "screening out" non-Hispanics and non-African-Americans and without having to put a racial screening question too close to the front of the questionnaire, with possible harmful impact on cooperation and raised salience of race as other questions were being asked.
Telephone numbers were delivered to interviewers without explicitly reminding them of the existence of oversamples or whether a number fell into one of the special strata (though since this code was embedded in the case ID, it was not entirely invisible to them). Originally the sample had been divided into 150 replicates, with each stratum represented in each replicate. The sample was to be added whole replicate by whole replicate, with the plan that if either desired subsample were running too far ahead (or behind), that strategy might be modified. As it turned out, this scheme was more efficacious for blacks, while Hispanics were lagging a bit as we got into July. In consultation with the sampling expert and the PIs, it was determined to stop releasing the two especially black strata after replicate 120, while the two Hispanic strata and the residual stratum would be released up to replicate 135. This wound up producing a black oversample that somewhat overachieved its target (494 overall, 424 black but not Hispanic) and a Hispanic oversample (399) that came within one case of the theoretically desired number.
- Principal Investigators
- Doug Hartmann, Penny Edgell and Joseph Gerteis
- Related Publications
- Croll, Paul R. 2007. "Modeling Determinants of White Racial Identity: Results from a New National Survey." Social Forces, 86(2): 613-642.
Edgell, Penny and Danielle Docka. 2007. "Beyond the Nuclear Family? Familism and Gender Ideology in Diverse Religious Communities." Sociological Forum 22 (1): 26-51.
Edgell, Penny, Joseph Gerteis, and Douglas Hartmann. 2006. "Atheists as 'Other': Moral Boundaries and Cultural Membership in American Society." American Sociological Review 71 (2): 211-234.
Edgell, Penny and Eric Tranby. 2007. "Religious Influences on Understandings of Racial Inequality in the United States." Social Problems 54 (2): 263-288.
Hartmann, Douglas and Joyce Bell. 2007. "Diversity in Everyday Discourse: The Cultural Ambiguities and Consequences of 'Happy Talk.'" American Sociological Review, 72 (December): 895-914.
Hartmann, Douglas and Paul R. Croll. 2006. "Measuring Whiteness." in Encyclopedia of Race, Ethnicity, and Society, edited by Richard T. Schaefer. SAGE Publications, Inc.
Hartmann, Douglas and Joseph Gerteis. 2005. "Dealing with Diversity: Mapping Multiculturalism in Sociological Terms." Sociological Theory 23 (2): 218-240.
Hartmann, Douglas, Xuefeng Zhang, and William Wischstadt. 2005. "One (Multicultural) Nation Under God? Changing Uses and Meanings of the Term 'Judeo-Christian' in the American Media." Journal of Media and Religion 4 (4): 207-240.
King, Ryan D. and Melissa Weiner. 2007. "Group Position, Collective Threat, and American Anti-Semitism." Social Problems 54 (1): 57-77.
King, Ryan D. and Darren Wheelock. 2007. "Group Threat and Social Control: Race, Perceptions of Minorities and the Desire to Punish." Social Forces 85 (3):1255-1280.
- Weight Variables
- There are three separate weight variables (DESWT, PSWT1, and FINALWT), each stored to eight digits after the decimal point. The first step reflects characteristics of the sample design, which gave different cases different likelihoods of coming into the final sample. First, telephone exchanges had different probabilities of being selected: those relatively high on blacks and those high on Hispanics had the highest probability, followed by those moderate on blacks or moderate on Hispanics. The final stratum consisted of exchanges neither especially high on blacks nor on Hispanics relative to the population overall. In order to locate more African-Americans and Hispanics for the oversamples, those strata were overrepresented, and the weights begin by "correcting" for this, as well as for the related fact that as the study proceeded it became clear that we were "overachieving" on blacks and slightly underachieving on Hispanics, so fewer replicates from the black strata were included. The fact that the strata achieved different response rates was taken into account in the next step, which also equalized the distribution of respondent selection codes across strata so that the number with RND1 of "1" (male preference) was weighted to be equal to that with RND1 of "0" (female preference). Taken together, these factors yield a household design weight correcting for unequal probabilities of numbers coming into the sample. The last "design" component was household size: since all telephone households had (after the above corrections) equal chances of being selected, and since only one respondent was taken from a given household, those in one-person households had, all else equal, twice as great a chance of being interviewed as those in two-person households, who themselves had twice the chance of someone in a four-person household, and so on. This weight is shown as DESWT and corrects for factors stemming from the sampling design and procedures.
It is normed so that the total weighted "N" is the same as the unweighted number of cases (2,081). A second stage of weighting involved "post-stratification." Specifically, this adjusted the weights from the design phase to recover the age group by gender distribution drawn from Census estimates. For completeness, this factor (which is identical for each case falling into one of the eight cells of the age group by gender table) is shown separately, but its major role was to be multiplied against the "design weight" to yield the final weight, taking both design and post-stratification into account.
Studies differ on what goes into a post-stratification weight. In this case, after experimentation, it was determined not to include race explicitly, since the way the questions were asked – in order to identify persons thinking of themselves in a way that would qualify for either the black or the Hispanic sample – did not yield estimates that precisely paralleled the Census divisions (related to, among other things, at what stage, if any, categories were suggested, how multiple racial/ethnic classifications were handled, and whether a single explicit question on the order of "do you consider yourself Hispanic" was included). As it turns out, the weighted proportions falling under black and Hispanic are quite reasonable, without subjecting the sample to the vagaries of trying to match each of the cells in a multi-dimensional age by gender by racial/ethnic table.
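As a concrete illustration, the design-weight logic described above amounts to an inverse-probability calculation. The sketch below is hypothetical: the stratum selection probabilities and case data are placeholders, not the values actually used by UWSC, and the replicate-release and response-rate adjustments are omitted for brevity.

```python
# Hypothetical sketch of how a design weight like DESWT could be assembled.
# Stratum probabilities and household sizes below are illustrative only.

def design_weight(stratum_prob, hh_size):
    """Inverse-probability weight: numbers in oversampled strata had a higher
    chance of selection (hence a smaller weight), and a respondent in a larger
    household had a lower chance of being the one chosen (hence a larger one)."""
    return (1.0 / stratum_prob) * hh_size

def norm_weights(weights):
    """Rescale so the weighted total equals the unweighted N (mean of 1.00)."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]

# Illustrative cases: (stratum selection probability, number of adults in household)
cases = [(0.4, 1), (0.4, 2), (0.1, 1), (0.1, 4), (0.2, 2)]
deswt = norm_weights([design_weight(p, h) for p, h in cases])
```

Norming at the end reproduces the convention noted in the codebook: the weighted total matches the raw number of cases.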
DESWT 'SAMPLING BASED WEIGHT'
Weight, as described above to equalize the different probabilities of coming into the sample based on selection of exchanges, release of replicates, response rates, respondent selection procedures, and number of households. Normed for a mean of 1.00 and a total of 2081.0 (to match the raw “N”).
PSWT1 'POST-STRATIFICATION ADJUSTMENT'
Weight adjusting the age group by gender distribution from the above weighted procedure to match the corresponding distribution based on census estimates. Normed for a mean of 1.00 and a total of 2081.0 (to match the raw “N”). DESWT was multiplied by PSWT1 to get the unnormed version of FINALWT. This weight is included for completeness. If one desires to poststratify based on a combination of variables other than age group by gender, one would create a new weight in the place of PSWT1 and use that to create the new “final” weight.
FINALWT 'TOTAL WEIGHT'
Final weight, multiplicatively "correcting" for the design- and sampling-related components and recovering the age by gender distribution. This is the weight applied by default in the SPSS system file, and it should normally be used. Normed for a mean of 1.00 and a total of 2081.0 (to match the raw "N").
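The post-stratification step and the construction of the final weight can be sketched as follows. This follows the codebook's description (a cell-level age group by gender factor, then a normed product), but the census proportions and case values below are made-up placeholders, not actual estimates.

```python
# Sketch of post-stratification (PSWT1-style factor) and the final weight
# (FINALWT-style), per the codebook's description. All numbers illustrative.

def poststrat_factors(cells, deswt, census_props):
    """For each case, the ratio of the census proportion of its
    age-group-by-gender cell to the design-weighted sample proportion
    of that cell."""
    total = sum(deswt)
    sample_props = {}
    for cell, w in zip(cells, deswt):
        sample_props[cell] = sample_props.get(cell, 0.0) + w / total
    return [census_props[c] / sample_props[c] for c in cells]

def final_weight(deswt, pswt1):
    """Multiply the design weight by the post-stratification factor,
    then renorm so the weighted total matches the unweighted N."""
    raw = [d * p for d, p in zip(deswt, pswt1)]
    n = len(raw)
    total = sum(raw)
    return [w * n / total for w in raw]
```

After applying the final weight, the weighted distribution across the age-by-gender cells matches the census targets, which is the point of the adjustment.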
- Constructed Variables
- In addition to the variables coming directly from answers to questions on the survey (shown in the order asked) there are several “supplemental variables” in the ASCII dataset and the SPSS system file. Notes pertaining to each are appended.
LDAT 'DATE OF INTERVIEW'
MMDDYYYY format; date the interview was completed
GENDER 'RECORDED GENDER'
SPLTHLF 'SPLIT HALF FOR STERXX, MARRXX, 151XX-152XX'
Except for about 100 cases at the start of interviewing, who were asked both halves, this variable determined which half of each of the three "split-ballot" series was asked. For instance, a code of "2" meant that the case was randomly assigned to the B series for the first battery, the B series for the second, and the A series for the last.
0 'BOTH HALVES ASKED'
GRP1 'RANDOM START POINT FOR GROUPS'
Respondents were asked how much members of ten groups agreed with their general view of society (questions in the 121a-n range). The "start point" was determined randomly, as was the direction in which the groups were presented. Thus, for instance, a 3 for this variable and a 1 for the next meant that the respondent was first presented with "recent immigrants" and then proceeded down the list (white Americans, Jews, Muslims, etc.), ending with Asian Americans, who would be presented last. A 5 and a 2 meant that Jews were mentioned first, followed by white Americans and so on in reverse order, ending with conservative Christians and Muslims.
2 'ASIAN AMERICANS'
3 'RECENT IMMIGRANTS'
4 'WHITE AMERICANS'
7 'CONSERVATIVE CHRISTIANS'
GRPDIR 'RANDOM DIRECTION FOR GROUPS'
See directions for GRP1 (above)
1 'DOWN (a->b->c)'
2 'UP (f->e->d)'
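The interaction of GRP1 and GRPDIR can be sketched as a walk around a circular list of ten positions. Only positions 2-7 are named in this codebook, so the remaining slots below are placeholders, not the actual groups:

```python
# Sketch of how GRP1 (random start point) and GRPDIR (direction) determine
# the presentation order of the ten groups. Positions 1, 8, 9, and 10 are
# placeholders; their actual group names are not documented here.

GROUPS = {
    1: "group 1 (not documented here)",
    2: "Asian Americans",
    3: "recent immigrants",
    4: "white Americans",
    5: "Jews",
    6: "Muslims",
    7: "conservative Christians",
    8: "group 8 (not documented here)",
    9: "group 9 (not documented here)",
    10: "group 10 (not documented here)",
}

def presentation_order(grp1, grpdir):
    """Walk the circular list of ten groups starting at position grp1;
    GRPDIR 1 = down (a->b->c), GRPDIR 2 = up (f->e->d)."""
    step = 1 if grpdir == 1 else -1
    return [GROUPS[(grp1 - 1 + step * i) % 10 + 1] for i in range(10)]
```

With GRP1 = 3 and GRPDIR = 1 this reproduces the example in the text: recent immigrants first, then white Americans, wrapping around to end with Asian Americans; GRP1 = 5 and GRPDIR = 2 starts with Jews and ends with conservative Christians and Muslims.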
ETHOPEN 'BLACK-HISP ANSWER TO OPENENDED RACE'
Race was determined first by asking, without prompting, about a person's background; the answers were classified by interviewers into one of several precoded categories, including mixed, other, and multiple. For those who named more than one race, this question was followed by a query asking the respondent to designate the best single descriptor. Finally, a last question asked whether there was any other special group with which the respondent identified. For each of these, verbatim responses were noted for any unclear case and examined to determine whether any answer would have classified the individual as black (African-American), Hispanic, or both. This code allows users to determine which, if any, of these classifications applied.
1 'BLACK-CODED ANSWER ONLY'
2 'HISP-CODED ANSWER ONLY'
3 'BOTH BLACK & HISPANIC ANSWERS'
RND1 'RANDOM CHOICE FOR RESPONDENT'
In making the random selection of a respondent from a multiperson household, the interviewer asked either how many of the adults were men or how many were women. If both genders were represented, this variable determined from which set the selection would be made (with a second random selection if there was more than one adult of the "preferred" gender). The variable is included for completeness and because this factor also figured into the weighting scheme described above.
0 'FEMALE PREFERENCE'
1 'MALE PREFERENCE'
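A minimal sketch of the within-household selection tied to RND1 follows. The exact UWSC procedure is not fully specified in this codebook, so the function and its details are illustrative assumptions:

```python
import random

# Illustrative sketch of respondent selection driven by an RND1-style
# gender preference; the actual interviewer script may differ in detail.

def select_respondent(n_men, n_women, rnd1, rng=random):
    """RND1 = 1 means male preference, 0 means female preference. If adults
    of the preferred gender are present, one of them is chosen at random;
    otherwise the selection falls to the other gender."""
    if rnd1 == 1:
        gender, count = ("man", n_men) if n_men > 0 else ("woman", n_women)
    else:
        gender, count = ("woman", n_women) if n_women > 0 else ("man", n_men)
    return gender, rng.randrange(count) + 1  # 1-based index within that gender
```

Because the preference itself is assigned at random, each adult in the household retains a knowable selection probability, which is what lets the weighting step correct for it later.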
STAT 'FIRST TWO DIGITS OF FIPS CODE'
CNTY 'DIGITS THREE TO FIVE OF FIPS'
BLKDUM 'DUMMY FOR BLACK'
Respondents may be treated as black if they gave an answer originally coded as black in the first of the race questions (128) or in the followup for best single choice (129) or gave an openended answer which belonged under black or African American (see ETHOPEN above). This variable appears as 1 for all cases meeting any of the criteria. The unweighted number of such cases was 494, which makes up the black subsample.
0 'NOT BLACK'
1 'BLACK ON 128,129 OR OPEN'
HISPDUM 'DUMMY FOR HISPANIC'
Respondents may be treated as Hispanic if they gave an answer in the same series falling under Spanish-surnamed or Hispanic. Note that it is possible for a person to meet both the black and the Hispanic criteria. The overall number of such persons (unweighted) was 399, comprising the Hispanic subsample.
0 'NOT HISP'
1 'HISP ON 128,129 OR OPEN'
AGEGRP1 'GROUPED AGE (2003-BIRTHYR)'
Respondents were asked for their year of birth, from which – treating all those born in the same calendar year the same – chronological age may be calculated and grouped as shown below. Those who did not answer the birth year question fell into one of the two special categories for “dk” and “ref”, respectively.
1 '18-29 YEARS'
2 '30-44 YEARS'
3 '45-54 YEARS'
4 '55 OR MORE'
8 'DK BIRTHYR'
9 'REF BIRTHYR'
AGEGRP2 'IMPUTED AGE (MISSING SET TO MEAN)'
In order to calculate weights, it was necessary to assign cases missing birth year to one of the age groups. For this purpose, the 20 cases for which this information was not present were assigned the mean age of 46, placing them in category "3".
1 '18-29 YEARS'
2 '30-44 YEARS'
3 '45-54 YEARS'
4 '55 OR MORE'
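The construction of AGEGRP1 and AGEGRP2 from birth year, including the mean-age imputation described above, can be sketched directly from the codebook's rules (the "dk"/"ref" string codes here stand in for however missingness is flagged in the actual file):

```python
# Sketch of AGEGRP1/AGEGRP2 construction: age = 2003 - birth year, grouped
# as in the value labels above; missing birth years imputed to the mean
# age of 46 (category 3) for AGEGRP2.

def agegrp1(birthyr):
    """Return 1-4 for valid birth years, 8 for 'dk', 9 for 'ref'."""
    if birthyr == "dk":
        return 8
    if birthyr == "ref":
        return 9
    age = 2003 - birthyr
    if age <= 29:
        return 1
    if age <= 44:
        return 2
    if age <= 54:
        return 3
    return 4

def agegrp2(birthyr):
    """As AGEGRP1, but missing cases go to the mean-age category (3)."""
    g = agegrp1(birthyr)
    return 3 if g in (8, 9) else g
```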
HHSIZE 'HH SIZE'
Number of adults from among whom respondent was chosen
LANGREST 'LANGUAGE RESTRICTION'
Cases where it appeared an interview would have to be done in Spanish were split off into a special "Spanish queue" for handling by multilingual interviewers. If these were later determined NOT to require Spanish, they were restored to the general queue. This variable also operates as a dummy for whether the interview was conducted in Spanish. Note that the latter is not a perfect indication of the language in which a given question was asked, since interviewers had the option of "toggling back and forth" between Spanish and English.
0 'NOT RESTRICTED INTERVIEW IN ENGLISH'
1 'RESTRICTED TO SPANISH INTERVIEWERS, INTERVIEW IN SPANISH'