Michael O. Emerson, Rice University
David Sikkink, University of Notre Dame
Adele D. James, Rice University
The Portraits of American Life Study (PALS) is a multi-level panel study of adults in the U.S., focused on religion and several other topics, with a particular emphasis on capturing ethnic and racial diversity. From April to October 2006, face-to-face interviews were conducted with 2,610 respondents. The following sections describe the sampling methodology, data collection, outcome rates, and data weighting.
In 2012, over 1,300 PALS respondents were re-interviewed, along with approximately 100 new respondents who were living in the households of the original 2006 respondents and were 14-18 years old in 2006 (so they were 20-24 in 2012). The methodology of the 2012 survey is described at the end of each section.
In 2006, to obtain a probability sample while still achieving the goal of racially diverse oversamples, a four-stage sampling procedure was used. The sample design and interviews were conducted by RTI International, the second largest independent nonprofit research organization in the United States. The PALS covers the civilian, non-institutionalized household population in the continental U.S. who were 18 years of age or older at the time the survey was conducted and who spoke English or Spanish. The sampling frame was based on residential mailing lists, supplemented with a frame-linking procedure to add households not included on the lists. In a recently completed national household survey, RTI estimated that this combined sampling frame accounted for over 98% of the occupied housing units in the U.S.
As noted above, RTI selected the sample in four stages. At the first stage, they used Census data to construct a nationally representative sampling frame of Primary Sampling Units (PSUs), defined as three-digit Zip Code Tabulation Areas. After the frame was constructed, RTI selected a first-stage sample of 60 PSUs with probabilities proportional to a composite size measure that weighted PSUs with concentrations of minorities higher than other PSUs with the same number of addresses. The sample of 60 PSUs yielded a variety of local areas from across the country and provided an adequate number of degrees of freedom for variance estimation. While the use of composite size measures reduced screening costs by focusing the sample on PSUs with concentrations of minorities, the coverage of the sample was not adversely affected: PSUs that were mostly “nonminority” still had a chance of being selected, as did non-minority households within mostly “minority” PSUs.
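The selection logic can be illustrated with a short sketch. The code below is a hypothetical illustration, not RTI's actual procedure: `composite_size` and its `minority_factor` value are invented for the example, and systematic PPS sampling is one common way to select units with probability proportional to a size measure.

```python
import random

def composite_size(n_minority, n_other, minority_factor=2.0):
    # Hypothetical composite size measure: minority households count more
    # toward a unit's size, raising its selection probability. The factor
    # 2.0 is an invented illustration, not the value PALS used.
    return minority_factor * n_minority + n_other

def pps_systematic_sample(sizes, n_draws, rng=None):
    """Systematic PPS selection: lay the units end-to-end on a line whose
    length is the total size measure, then take n_draws equally spaced
    points from a random start; a unit is selected once for each point
    that falls in its interval."""
    rng = rng or random.Random()
    total = float(sum(sizes))
    step = total / n_draws
    point = rng.uniform(0, step)
    selected, cum, idx = [], 0.0, 0
    for _ in range(n_draws):
        while cum + sizes[idx] < point:
            cum += sizes[idx]
            idx += 1
        selected.append(idx)
        point += step
    return selected
```

Under this scheme a unit with twice the composite size is twice as likely to be drawn, which is what lets the design oversample minority-dense PSUs without excluding the rest.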
At the second stage, RTI selected two five-digit Zip Codes from each selected PSU (120 Zip Codes in all) as Secondary Sampling Units (SSUs), again using composite size measures that weighted SSUs with concentrations of minorities higher than other SSUs with the same number of addresses.
At the third stage, RTI selected an average of about 100 addresses from each selected Zip Code. Some of these were found ineligible because they were not occupied, had no English or Spanish speakers (rarely), or because residents were physically or mentally unable to participate. After the addresses were selected, RTI produced digital maps for a sub-sample of selected addresses to facilitate the half-open interval (HOI) frame-linking procedure, which identified and included housing units not on the mailing lists. Housing units may be missing because they were built between frame development and data collection, or because of errors in the frame-development stage. Field Interviewers reported to the home office any housing units missing from the field enumeration; once the home office confirmed that a unit had been excluded, it was added to the sample to improve coverage (McMichael et al., 2008).
At the fourth stage, RTI selected one eligible person per selected housing unit for interview. RTI generated a sample selection table for use by the Field Interviewers at each address to randomly determine which eligible person at the address should be asked to participate in the study.
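The selection-table step amounts to choosing one eligible adult in the household with equal probability. A minimal sketch (`select_respondent` is a hypothetical stand-in for RTI's printed table, not the actual instrument):

```python
import random

def select_respondent(eligible_adults, rng=None):
    """Choose one eligible adult uniformly at random, standing in for
    RTI's sample selection table (hypothetical illustration). Returns
    the chosen person and the within-household selection probability,
    which later feeds into the sampling weight."""
    rng = rng or random.Random()
    chosen = rng.choice(eligible_adults)
    p_person = 1.0 / len(eligible_adults)
    return chosen, p_person
```

Note that adults in larger households have a lower chance of selection (1/4 in a four-adult household versus 1/1 for someone living alone), which is exactly the differential the weighting stage must undo.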
After data collection was completed, RTI assigned a sampling weight to each respondent that reflected his/her probability of selection at each stage. The weight was calculated as the inverse of the overall selection probability and can be thought of as the number of persons in the population that the sample member represents. Moreover, and importantly, RTI used Census projections to post-stratify the weights of respondents to compensate for differential non-response and noncoverage. Also, due to the design, the data should be analyzed in ways that correct for clustering (by obtaining correct standard errors). Programs such as Stata or SPSS Complex Samples are designed to calculate corrected standard errors and significance tests.
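The base-weight calculation can be sketched in a few lines. This is an illustration of the inverse-probability idea only (function names are invented, and the post-stratification adjustment described above is not shown):

```python
def sampling_weight(p_psu, p_zip, p_address, p_person):
    """Base weight = inverse of the product of the selection probabilities
    at the four stages, before the post-stratification adjustment."""
    return 1.0 / (p_psu * p_zip * p_address * p_person)

def weighted_percent(indicator, weights):
    """Weighted percentage of respondents with indicator == 1; each
    respondent counts as the number of people he/she represents."""
    return 100.0 * sum(i * w for i, w in zip(indicator, weights)) / sum(weights)
```

For example, a respondent whose PSU, Zip Code, address, and within-household selection probabilities were 0.1, 0.5, 0.2, and 0.25 would carry a base weight of 1/(0.1 × 0.5 × 0.2 × 0.25) = 400, i.e. stand in for 400 people.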
In 2012, our sampling strategy was much simpler: attempt to locate and interview the 2006 respondents, plus interview anyone living in the households of PALS respondents who in 2006 was aged 15-17 (and thus had become an adult by 2012). The 2012 response rate is reported in the outcome rates section below.
For the 2006 interviews, to establish a strong baseline and connection with respondents, interviews were done in-home. Advance letters were mailed to all selected households four to five days before interviewers' initial visits. Interviewers then visited sample households and completed a screening interview, narrowing our sample to meet the subsample goals and identifying English- or Spanish-speaking adults. The screening was conducted using a paper-and-pencil instrument (PAPI). Upon selecting a respondent from the household, and if the respondent agreed to participate, a questionnaire was administered using a laptop computer. Respondents were paid an incentive of $50 to complete the interview, which took an average of 80 minutes.
A portion of the questionnaire covered sensitive topics such as relationship behaviors and quality, deviance, attitudes about race and ethnicity, moral attitudes, and religious beliefs and authority. At this point, the respondent was given a device for audio computer-assisted self-interviewing (ACASI) to complete about 70 questions. During this portion of the survey, the respondent wore earphones to hear the prerecorded questions and entered responses directly into the computer, without the knowledge or aid of the interviewer.
In addition to the primary questionnaire, other PAPI instruments were left behind or mailed to spouses or partners at a later date to complete and return on their own. A $15 incentive check was mailed to all spouses or partners who returned a completed questionnaire.
For the 2012 interviews, our aim was to conduct most interviews on-line, though we gave respondents the option to do the survey by telephone as well. So that we could later assess the impact of interview mode, we randomly assigned some respondents to the telephone survey. Most interviews occurred on-line (80%), another 13% were by telephone, and 7% were in person; the last group consisted of respondents who responded neither to requests for the on-line survey nor to telephone attempts. Mode of interview had very little impact, apparently affecting just five variables. For the list of those variables and the analysis, see the mode analysis PowerPoint. Researchers concerned about mode-of-response effects can include the variable phoneonlyflag_w2 in their analyses.
Respondents were paid $50 for on-line surveys, $30 for phone surveys, and $50 for in-person surveys.
We calculated outcome rates -- contact rate, screening completion rate, cooperation rate, and response rate -- for PALS Wave 1 using formulas based on the definitions provided by the American Association for Public Opinion Research (AAPOR, 2009).
In 2006, of the homes in which interviewers attempted to reach an eligible respondent, 83% were successfully contacted. Of those contacted persons, 86% were screened. Of the persons screened and selected for an interview, 82% completed an interview. This yielded a response rate of 58% (.83 contact rate x .86 screening completion rate x .82 cooperation rate).
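The overall response rate is simply the product of the three component rates. A small check (note that the components shown here are rounded, so the product differs slightly from the published 58%, which was presumably computed from unrounded rates):

```python
def composite_response_rate(contact, screening, cooperation):
    # Overall response rate = product of the three component rates.
    return contact * screening * cooperation

rate = composite_response_rate(0.83, 0.86, 0.82)  # approx. 0.585
```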
In 2012, the overall follow-up response rate was 51%. If we remove those 2006 respondents who were no longer living in 2012, the response rate was 53%.
By applying the weight variable, the 2006 national-level PALS sample closely mirrors the averaged 2005-2007 American Community Surveys (ACS) estimates. Table 1 compares the unweighted and weighted percentages for certain demographics alongside the ACS figures. Once the weight variable is applied, the distributions by race and ethnicity, household income, educational attainment, median age, and marital status for the two sets of data are not significantly different from each other.
Table 1: Comparison of 2006 PALS and Averaged 2005-2007 American Community Surveys (ACS)
| | Unweighted PALS | Weighted PALS | ACS 2005-2007 |
|---|---|---|---|
| Race & Ethnicity | | | |
| % White, Non-Hispanic | 48% | 69% | 69% |
| % Black, Non-Hispanic | 20% | 11% | 11% |
| % Hispanic | 20% | 12% | 13% |
| % Asian, Non-Hispanic | 7% | 4% | 4% |
| % Married | 46% | 57% | 53% |
| Median Age in Years | 42 | 45 | 45 |
| Household Income | | | |
| Less than $30,000 | 37% | 30% | 30% |
| $30,000 up to $59,999 | 30% | 31% | 28% |
| $60,000 up to $99,999 | 21% | 24% | 23% |
| $100,000 and up | 12% | 16% | 19% |
| Educational Attainment | | | |
| Less than High School | 14% | 12% | 16% |
| H.S. Grad up to a 4-yr degree | 60% | 60% | 59% |
| Bachelor's Degree | 17% | 16% | 16% |
| Advanced Degree | 10% | 11% | 9% |
Due to rounding, figures may not add to 100%.
The same is true for 2012. Weighting was based on the 2010 U.S. Census. The U.S. Census was used instead of later versions of the ACS because only one additional year of the ACS (2011) was available at the time the 2012 weights were created, and because of the higher reliability of the census. The weights correct for any bias introduced by respondents who participated in 2006 but did not respond in 2012.
Weights were also created for analyzing CHANGE and CONTINUITY between 2006 and 2012. For the full details of all the weights, how they were created, and which weights to use when, see the PALS Weight Documentation.
i The American Association for Public Opinion Research, Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys, Revised 2009. See http://www.aapor.org/AM/Template.cfm?Section=Standard_Definitions1&Template=/CM/ContentDisplay.cfm&ContentID=1814
ii The number of persons contacted was adjusted to exclude an estimated number of refusals that would not have met the PALS screening criteria. This estimate was based on the ratio of the actual count of persons not selected for interview (2,109) to the total number of completed screenings (5,689) and applied to the number of refusals at the screening stage.
iii Outcome rates for PALS Wave 1 were calculated using formulas based on the definitions provided by the American Association for Public Opinion Research (AAPOR, 2009). For the contact rate, we used Contact Rate 2 (CON2), calculated as ((I+P)+R+O)/((I+P)+R+O+NC+e(UH+UO)), where I=completed screenings, P=partial screenings, R=refusals and break-offs, NC=non-contacts, O=other contacts, e=estimated proportion of cases of unknown eligibility that are eligible, UH=unknown if household occupied, and UO=other unknown. Using the proportional allocation (CASRO) method, the estimated eligible cases were calculated by applying the proportion of eligible cases among all persons screened to the number of unknown-eligibility cases. A screening completion rate was calculated to reflect nonresponse during the screening phase; this rate is the proportion of completed screenings to the adjusted number of persons contacted. For the interview rate, or cooperation rate, we used AAPOR's Cooperation Rate 2 (COOP2), calculated as (I+P)/((I+P)+R+O), where I=completed interviews, P=partial interviews, R=refusals and break-offs, and O=other contacts. Because data collection involved a two-stage process of screening (to meet race subsample targets) and then interviewing, we present a response rate that is the product of the contact rate, the screening completion rate (reflecting refusals before the screening), and the cooperation rate: 83% x 86% x 82% = 58%. AAPOR recognizes this approach for a multi-stage design, and we are able to demonstrate that the PALS sample is representative of the U.S. population, using the 2005-2007 American Community Survey as the comparison. At the end of the data collection period, a sample of 620 households had been opened and work begun on contacting them (preliminary letters sent out, in some cases an initial contact made) but not concluded. These cases were excluded from all of the outcome rate calculations. If we include those households in the potential sample, the contact rate is 82%, the screening completion rate is 84%, the cooperation rate remains 82%, and the response rate becomes 56% (.82 x .84 x .82).
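The two AAPOR rates defined in the note above translate directly into code. A sketch, with variable names following the definitions given there:

```python
def contact_rate_2(I, P, R, O, NC, e, UH, UO):
    """AAPOR Contact Rate 2 (CON2):
    ((I+P)+R+O) / ((I+P)+R+O+NC+e*(UH+UO))."""
    contacted = (I + P) + R + O
    return contacted / (contacted + NC + e * (UH + UO))

def cooperation_rate_2(I, P, R, O):
    """AAPOR Cooperation Rate 2 (COOP2): (I+P) / ((I+P)+R+O)."""
    return (I + P) / ((I + P) + R + O)
```

The counts passed in would come from the final case dispositions; the illustration shows only the arithmetic, not the PALS disposition totals.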
iv US Census Bureau (2009). American Community Survey, 3-year estimates 2005-2007. Internet release date: January 16, 2009. Retrieved July 30, 2009 from http://factfinder.census.gov/servlet/DatasetMainPageServlet?_program=ACS&_submenuId=&_lang=en&_ts=