Portraits of American Life Study
Hosted by The ARDA
The PALS seeks to understand the impact of religion in everyday life, and ultimately the connections between religious change and other forms of change among diverse individuals and families over the course of their lives.

Controlling for Complex Sampling Design when using PALS data:

Because the PALS data were collected using a complex sampling technique, analyses must account for sampling weights, clustering, and stratification. The following definitions are drawn from the Stata manuals ("Survey Data").

1) Sampling Weights: In sample surveys, observations are selected through a random process, but different observations may have different probabilities of selection. Weights are equal to (or proportional to) the inverse of the probability of being sampled. Omitting weights from the analysis results in estimates that may be biased, sometimes seriously so. Sampling weights also need to be taken into account when estimating standard errors and for the purposes of testing and inference.
2) Clustering: Individuals are not sampled independently in almost all survey designs. Collections of individuals (for example: counties, city blocks, or households) are typically sampled as a group, known as a cluster. Sampling by cluster implies a sample-to-sample variability of the resulting estimator that is usually greater than that obtained through sampling individually, and this variability must be accounted for when estimating standard errors, testing, or performing other inferences.
3) Stratification: Because the PALS was collected using a single-stage complex design, there is no stratum variable to identify different strata.

Summary: It is important to use sampling weights to get the point estimates correct. We must consider the weights, clustering, and strata to obtain correct standard errors. If clustering is not accounted for, standard errors are likely to be smaller than they should be.
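
The summary above can be made concrete with the standard design-based formulas. This is only a sketch under the assumptions stated in this section (one stage of cluster sampling, a single stratum, no finite population correction). The symbols \pi_i, w_i, y_i, z_c, and n are notation introduced here for illustration and do not come from the PALS documentation.

```latex
% w_i: sampling weight of respondent i; \pi_i: probability that respondent i is selected
w_i \propto \frac{1}{\pi_i}, \qquad
\hat{\bar{y}} = \frac{\sum_i w_i \, y_i}{\sum_i w_i}

% Linearized variance: z_c is the weighted residual total of PSU c; n is the number of PSUs
z_c = \frac{\sum_{i \in c} w_i \,(y_i - \hat{\bar{y}})}{\sum_i w_i}, \qquad
\widehat{\operatorname{Var}}(\hat{\bar{y}}) = \frac{n}{n-1} \sum_{c=1}^{n} z_c^{2}
```

The weights enter the point estimate, while the variance is built from PSU-level totals rather than from individual observations. This is why ignoring clustering tends to understate the standard errors.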

Using the Stata 'SVY' commands with the PALS

The 'svy' prefix in Stata fits statistical models for complex survey data. Because the PALS was collected using a single-stage complex design, the data must be identified as such before analysis. The data contain a primary sampling unit identifier (psu_id) and a primary sampling weight variable (pawt2). The 'svyset' command stores these design settings so that every subsequent 'svy' command applies the sampling weights to the point estimates and accounts for clustering in the standard errors. Follow the directions below to set your data for further analyses.

The first step in obtaining statistics with the PALS is to declare the data as complex survey data using the 'svyset' command. The 'svyset' command not only declares the data to be from a complex survey design but also designates the variables that contain information about the survey design and specifies the default method for variance estimation. The 'svyset' command must be run before using any 'svy' command to obtain descriptive or model-based statistics.
 

Example Code:

  To clear any existing survey settings from the data, run the following command:

svyset, clear

  Next, use the 'svyset' command to identify the primary sampling unit for the single-stage design and to set the sampling weight variable of the PALS data:

svyset psu_id [pweight=pawt2]

For a list and explanation of which weight variable to use, see PALS WEIGHT DOCUMENTATION.

  Here the primary sampling unit is identified by the 'psu_id' variable and the sampling weight by the 'pawt2' variable.

  The command should produce the following output, which confirms that the data are weighted and set for the single-stage complex survey design:

pweight: pawt2
VCE: linearized
Single Unit: missing
Strata 1: <one>
SU 1: psu_id
FPC 1: <zero>


Now that the data are set to correct for the complex survey design, the 'svyset' command may be entered alone at any time to recall the above settings. To ensure that your data are prepared for analysis, enter the 'svyset' command at the beginning of each Stata session. The above output will be repeated.
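
As an optional further check, Stata's 'svydescribe' command summarizes the design that 'svyset' recorded. The following is a minimal sketch, assuming the 'svyset' command shown above has already been run on the PALS data:

```stata
* Assumes: svyset psu_id [pweight=pawt2] has already been run.
* Reports, for each stratum, the number of PSUs and the
* minimum, mean, and maximum number of observations per PSU.
svydescribe
```

With the settings above, the report should show a single stratum, since no stratum variable was declared.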

All subsequent analyses must now include the 'svy:' prefix before the estimation command. An example of the 'svy:' prefix with linear regression follows:

* linear regression with survey data
svy: regress depvar indepvars
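
The same prefix works with other estimation commands. The lines below are an illustrative sketch only: 'yvar', 'catvar', 'binvar', 'xvar1', and 'xvar2' are hypothetical placeholder names, not actual PALS variables, and the sketch assumes the data have already been set with 'svyset' as described above.

```stata
* Placeholder variable names (yvar, catvar, binvar, xvar1, xvar2) are
* hypothetical; substitute variables from your PALS extract.

* Weighted mean with a design-based (linearized) standard error
svy: mean yvar

* Design effects (DEFF and DEFT) for the estimate just computed
estat effects

* Weighted one-way table of a categorical variable
svy: tabulate catvar

* Logistic regression for a binary outcome
svy: logit binvar xvar1 xvar2
```

A DEFF above 1 from 'estat effects' indicates that the complex design yields a larger variance than a simple random sample of the same size would.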


SPSS Users

We recommend conducting statistical analyses with the IBM SPSS Complex Samples add-on module to account for the PALS sample design and its use of sampling weights and clustering. Users must specify an "analysis plan" that tells the program which variables define the sampling design. SPSS specifically requests information for the "strata." The PALS sample is stratified by the variable "re_race."

This is an example of the syntax for the CSPlan (Complex Sample Plan):

* Analysis Preparation Wizard.
CSPLAN ANALYSIS
/PLAN FILE='C:\directory_name\SPSS\file_name.csaplan'
/PLANVARS ANALYSISWEIGHT=pawt2
/SRSESTIMATOR TYPE=WOR
/PRINT PLAN
/DESIGN STRATA=re_race CLUSTER=psu_id
/ESTIMATOR TYPE=WR.

"WR" and "WOR" stand for WITH REPLACEMENT and WITHOUT REPLACEMENT. "WR" is used because it will allow for specifying additional stages of the data sampling procedure. See the SPSS Complex Samples module user's guide for more information.

Recommended Sources (containing examples and syntax):

-- Stata Manuals (Current Edition), "Survey Data"

-- IBM SPSS Complex Samples Manuals, "Analysis Plan"

-- http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial/sample_surveys/svy_commands/

-- http://www.ats.ucla.edu/stat/Stata/faq/svy_introsurvey.htm

 
