Controlling for Complex Sampling Design when using PALS data:
Because the PALS data were collected using a complex sampling design, analyses must account for sampling weights, clustering, and stratification. The definitions below are drawn from the Stata manuals ("Survey Data").
Sampling Weights: In sample surveys, observations are selected through a random process, but different observations may have different probabilities of selection. Weights are equal to (or proportional to) the inverse of the probability of being sampled. Omitting weights from the analysis results in estimates that may be biased, sometimes seriously so. Sampling weights also need to be taken into account when estimating standard errors and when performing tests and other inference.
Clustering: Individuals are not sampled independently in almost all survey designs. Collections of individuals (for example: counties, city blocks, or households) are typically sampled as a group, known as a cluster. Sampling by cluster implies a sample-to-sample variability of the resulting estimator that is usually greater than that obtained through sampling individually, and this variability must be accounted for when estimating standard errors, testing, or performing other inferences.
Stratification: Because the PALS was collected using a single-stage complex design, there is no stratum variable to identify different strata.
Summary: It is important to use sampling weights to get the point estimates correct. We must consider the weights, clustering, and strata to obtain correct standard errors. If clustering is not accounted for, standard errors are likely to be smaller than they should be.
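The difference between ignoring and honoring the design can be seen by estimating the same mean three ways. The following is a minimal sketch, not an official PALS example: it uses the 'svyset' declaration described in the next section, and 'yvar' is a hypothetical placeholder for any numeric PALS variable.

```stata
* Declare the design using the PALS variables named in this guide
svyset psu_id [pweight=pawt2]

* yvar is a hypothetical placeholder; substitute a real PALS variable
* 1. Unweighted: ignores the sampling design entirely
mean yvar

* 2. Weighted point estimate, but clustering is not declared
mean yvar [pweight=pawt2]

* 3. Weighted point estimate with design-based standard errors
svy: mean yvar

* Report design effects (DEFF and DEFT) for the survey estimate
estat effects
```

Estimates 2 and 3 should share the same point estimate. Only the 'svy:' version also accounts for clustering in its standard error, and 'estat effects' summarizes how much the design changes the variance relative to simple random sampling.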
Using the Stata 'SVY' commands with the PALS
The 'svy' prefix in Stata fits statistical models for complex survey data. Because the PALS was collected using a single-stage complex design, the data must be identified as such before any statistical analysis. The data contain a primary sampling unit identifier (psu_id) and a primary sampling weight variable (pawt2). The 'svyset' command records these design variables so that every subsequent 'svy' estimation applies the sampling weights to the point estimates and accounts for clustering in the standard errors. Follow the directions below to set your data for further analyses.
The first step in obtaining statistics with the PALS is to declare it as complex survey data using the 'svyset' command. The 'svyset' command not only declares the data to be from a complex survey design but also designates the variables that contain information about the survey design and specifies the default method for variance estimation. The 'svyset' command must be run before using any 'svy' command for obtaining descriptive and model-based statistics.
||To make sure that any existing survey settings for the data are cleared, run the following command:
svyset, clear
||Next, use the 'svyset' command to identify the primary sampling unit for the single-stage design and to set the sampling weight variable of the PALS data:
svyset psu_id [pweight=pawt2]
For a list and explanation of which weight variable to use, see PALS WEIGHT DOCUMENTATION.
||Here the primary sampling unit is identified using the 'psu_id' variable and the sampling weight is identified as the 'pawt2' variable.
||The command should produce the following output, confirming that the data are set for the single-stage complex survey design.
Single Unit: missing
Strata 1: <one>
SU 1: psu_id
FPC 1: <zero>
Now that the data are set to correct for the complex survey design, the 'svyset' command may be entered alone at any time to recall the above settings. To ensure your data are prepared for analysis, enter the 'svyset' command at the beginning of each Stata session. The above output will be repeated.
All subsequent analyses must now include the 'svy:' prefix before the estimation command. An example of the 'svy:' prefix for linear regression follows:
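As a quick check at the start of a session, the stored settings can be redisplayed and the sample layout inspected. This is a brief sketch that assumes the data have already been declared with the 'svyset' command shown above; 'svydescribe' is a standard Stata command that is not mentioned elsewhere in this guide.

```stata
* Redisplay the survey design settings currently stored with the data
svyset

* Summarize the design: number of strata and sampled units per stage
svydescribe
```

If 'svyset' reports that no survey characteristics are set, rerun the full 'svyset psu_id [pweight=pawt2]' declaration before continuing.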
svy: regress depvar indepvars (linear regression with survey data)
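The same prefix works with other estimation commands. The lines below are an illustrative sketch only: 'depvar', 'indepvars', 'binvar', 'catvar', and 'female' are hypothetical placeholders rather than PALS variable names, and the data are assumed to have been declared with 'svyset' as described above.

```stata
* All variable names below are hypothetical placeholders
* Design-based mean
svy: mean depvar

* Design-based one-way table of proportions
svy: tabulate catvar

* Logistic regression for a binary outcome
svy: logit binvar indepvars

* Subgroup analysis: subpop() keeps the full sample for variance estimation
svy, subpop(if female == 1): regress depvar indepvars
```

For subgroup analyses, the 'subpop()' option is generally preferable to dropping cases or using a plain 'if' qualifier, because the standard errors are then computed from the full sample design.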
For SPSS users, it is recommended that statistical analysis be conducted with the IBM SPSS Complex Samples add-on module, which accounts for the PALS sample design and its use of sampling weights and clustering. Users must specify an "analysis plan" that tells the program which variables are integral to the sampling design. SPSS specifically requests information for the "strata." The PALS sample is stratified by the variable "re_race."
This is an example of the syntax for the CSPLAN (Complex Samples Plan) command:
* Analysis Preparation Wizard.
/DESIGN STRATA=re_race CLUSTER=psu_id
"WR" and "WOR" refer to WITH REPLACEMENT and WITHOUT REPLACEMENT. "WR" is used because it will allow for specifying additional stages of the data sampling procedure. See the SPSS Complex Samples Module users guide for more information.
Recommended Sources (containing examples and syntax):
--Stata Manuals (Current Edition) "Survey Data".
--IBM SPSS Complex Samples Manuals "Analysis Plan"