The following examples include a line-by-line explanation of the command syntax. The examples are based on data from Manitoba Health and from the Manitoba Bureau of Vital Statistics (Statistics Canada).
Abbreviations used in the examples are as follows:
VSTS: Vital Statistics
MHSIP: Manitoba Health Services Insurance Plan
SEX: Sex
DYR: Death year
DMO: Death month
DDA: Death day
BYR: Birth year
BMO: Birth month
LCA: Locality
MST: Marital status
INT: Initials
A. TESTPW Module
Syntax
SAS-PC format:
%_TESTPW(<options>);
%_VAR(<variables>);
%_RUN;
Unix:
_TESTPW <options>;
_VAR <variables>;
_RUN;
Notes:
- The _TESTPW statement has three options:
1. DATA=SAS-data-set - Specifies the input dataset. If the DATA= option is omitted, the most recently created SAS dataset is used.
2. OUT=SAS-data-set - Specifies the dataset containing the output from the TESTPW procedure.
3. FREQ=YES | NO - Prints frequency tables for variables appearing on the _VAR statement. The default is YES.
- The _VAR statement lists the variables to be used in the analysis.
- The _RUN statement indicates the end of the procedure statements. This statement is required.
- The TESTPW procedure is run separately on each dataset to be used in the linkage.
Example of Program
_TESTPW DATA=VSTS FREQ=NO;
_VAR SEX DYR DMO DDA BYR BMO LCA MST;
_RUN;
_TESTPW DATA=MHSC FREQ=NO;
_VAR SEX DYR DMO DDA BYR BMO LCA MST;
_RUN;
Output
(Note: the FREQ=NO option has been selected to suppress frequency tables.)
DATA=VSTS:
Variable  Missing  Levels  Discriminating Power  Shannon Entropy
BYR             0      98               55.5323          58.1759
DDA             0      31               49.4068          49.5028
BMO           970      12               35.8170          35.8330
DMO             0      12               35.7624          35.8068
DYR             0       4               19.9890          19.9945
LCA            49      24               15.9797          28.1261
MST           924       3               10.3678          11.5908
SEX             0       2                9.8636           9.9314
DATA=MHSC:
Variable  Missing  Levels  Discriminating Power  Shannon Entropy
BYR             0      91               56.0918          58.5864
DDA             0      31               49.4451          49.4872
BMO           394      12               35.8168          35.8330
DMO             0      13               35.7973          35.8311
DYR             0       4               19.9925          19.9962
LCA             8      24               16.3289          28.5645
MST           110       3               12.9641          14.0811
SEX             0       2                9.5929           9.7925
From these tables it can be noted that:
1. The variable BYR (birth year) is the most powerful discriminator in each dataset and, therefore, the most useful for linkage purposes. Conversely, SEX is the least powerful and the least useful for linkage.
2. There is a substantial number of missing values for several of the VSTS variables (970 for BMO, 924 for MST and 49 for LCA). This could suggest data quality problems, such as coding errors. Before proceeding with the linkage, it would be advisable to review the data and correct as many of these errors as possible.
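To illustrate why a variable with many evenly used levels, such as BYR, separates records better than a two-level variable such as SEX, the following sketch computes the standard Shannon entropy (in bits) of a variable from its value frequencies. It is a minimal Python sketch and not the TESTPW implementation: the formulas and scaling behind the Discriminating Power and Shannon Entropy columns are not given in this section, the function name shannon_entropy is hypothetical, missing values are assumed to be coded as None, and the sample values are invented.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Return the Shannon entropy, in bits, of the non-missing values.

    Assumption: missing values are represented by None and are ignored.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return 0.0
    counts = Counter(observed)
    total = len(observed)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Invented sample data: a two-level variable and a four-level variable.
sex = ["M", "F", "M", "F", "M", "F", "M", "F"]
birth_year = [1901, 1902, 1903, 1904, 1901, 1902, 1903, 1904]

print(shannon_entropy(sex))         # two equally frequent levels
print(shannon_entropy(birth_year))  # four equally frequent levels
```

With equally frequent levels the entropy is the base-2 logarithm of the number of levels, so the four-level variable carries twice the information of the two-level one.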
B. TESTPK Module
Syntax
SAS-PC format:
%_TESTPK(<options>);
%_VAR(<variables>);
%_RUN;
Unix:
_TESTPK <options>;
_VAR <variables>;
_RUN;
Notes:
- The _TESTPK statement has one option:
DATA=SAS-data-set - Specifies the input dataset. If the DATA= option is omitted, the most recently created SAS dataset is used.
- The _VAR statement lists the variables used to define the pockets. One pocket definition is created for each variable on the _VAR statement, with each variable added to those listed before it. For example, the statement
_VAR A B C;
would generate output for three pocket definitions: the pockets defined by A, the pockets defined by A and B, and the pockets defined by A, B and C.
For each pocket definition, the following information is provided:
1. N: The number of pockets.
2. MIN: The minimum number of records in a pocket.
3. MAX: The maximum number of records in a pocket.
4. MEAN: The mean number of records in a pocket.
- The _RUN statement indicates the end of the procedure statements. This statement is required.
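The cumulative pocket definitions described in the notes above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the TESTPK code: records are held as a list of dictionaries, the function name pocket_stats is hypothetical, the input is assumed to be non-empty, and the way TESTPK treats missing values is not covered here. The sample records are invented.

```python
from collections import Counter


def pocket_stats(records, variables):
    """Return (definition, N, MIN, MAX, MEAN) for each cumulative pocket definition.

    records: non-empty list of dicts, one per record (hypothetical layout).
    variables: variable names in the order given on the _VAR statement.
    """
    results = []
    for i in range(1, len(variables) + 1):
        definition = tuple(variables[:i])
        # Count the records that share each combination of values.
        sizes = Counter(tuple(rec[v] for v in definition) for rec in records)
        counts = list(sizes.values())
        mean = sum(counts) / len(counts)
        results.append((definition, len(counts), min(counts), max(counts), mean))
    return results


# Invented sample data.
records = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 11},
    {"A": 1, "B": "y", "C": 10},
    {"A": 2, "B": "x", "C": 10},
    {"A": 2, "B": "x", "C": 10},
]

for definition, n, smallest, largest, mean in pocket_stats(records, ["A", "B", "C"]):
    print(" ".join(definition), n, smallest, largest, round(mean, 2))
```

Each added variable can only split existing pockets, so N never decreases and the mean pocket size never increases as the definition grows.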
Example of Program
_TESTPK DATA=VSTS;
_VAR SEX DYR DMO DDA BYR BMO LCA MST;
_RUN;
Output
DATA=VSTS:
Pocket variables                      N     Min     Max      Mean
SEX                                   2  12 823  15 592  14 207.5
SEX DYR                               8   3 091   4 140   3 551.9
SEX DYR DMO                          96     214     472     296.0
SEX DYR DMO DDA                   2 936       1     173       9.7
SEX DYR DMO DDA BYR              25 242       1      13       1.1
SEX DYR DMO DDA BYR BMO          28 053       1       3       1.0
SEX DYR DMO DDA BYR BMO LCA      28 316       1       3       1.0
SEX DYR DMO DDA BYR BMO LCA MST  28 407       1       2       1.0
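The TESTPK output can be checked for internal consistency. If every record falls into exactly one pocket, the mean pocket size equals the total number of records divided by N. The two SEX pockets hold 12 823 and 15 592 records, which implies 28 415 records in total. The short Python check below recomputes the mean for each pocket definition on that basis; it assumes that no records are excluded from any definition, which the output does not state.

```python
# (pocket definition, N, reported mean) copied from the TESTPK output for VSTS.
rows = [
    ("SEX", 2, 14207.5),
    ("SEX DYR", 8, 3551.9),
    ("SEX DYR DMO", 96, 296.0),
    ("SEX DYR DMO DDA", 2936, 9.7),
    ("SEX DYR DMO DDA BYR", 25242, 1.1),
    ("SEX DYR DMO DDA BYR BMO", 28053, 1.0),
    ("SEX DYR DMO DDA BYR BMO LCA", 28316, 1.0),
    ("SEX DYR DMO DDA BYR BMO LCA MST", 28407, 1.0),
]

# Total implied by the two SEX pockets (MIN and MAX of the first row).
total_records = 12823 + 15592

for name, n, reported in rows:
    implied = round(total_records / n, 1)
    print(f"{name}: implied mean {implied}, reported {reported}")
```

The implied and reported means agree to one decimal place for every row. By the last definition the mean is 1.0, that is, almost every pocket holds a single record.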
C. LINKEX Module
Syntax
SAS-PC format:
%_LINKEX(<options>);
%_BY(<variables>);
%_VAR(<variables>);
%_LPX('SAS statement');
%_WTX('SAS statement');
%_RUN;
Unix:
_LINKEX <options>;
_BY <variables>;
_VAR <variables>;
_LPX 'SAS statement';
_WTX 'SAS statement';
_RUN;
Notes:
- The _LINKEX statement has three options:
1. DATA1=SAS-data-set - Specifies the first input dataset.
2. DATA2=SAS-data-set - Specifies the second input dataset.
3. LMIN=Numeric-value - Specifies the minimum number of variables listed in the _VAR statement that must agree exactly. This number is in addition to the variables listed in the _BY statement. LMIN= specifies only how many variables must agree, not which ones; the _LPX statement gives greater control over which variables must match. This option is optional. The default value is 1.
- _BY <variables> - Specifies the variables that must match exactly. If a _BY statement is specified without a _VAR statement, the program performs a deterministic linkage only, creating a file of records that match uniquely on all _BY variables. This is equivalent to a MERGE statement in SAS.
The _BY statement is optional, but highly recommended to avoid formation of spurious links, and to increase the efficiency of the linkage (see documentation for the _TESTPK module).
- _VAR <variables> - Specifies the variables that are compared when there is agreement on the _BY variables.
The _VAR statement is required for probabilistic linkage.
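The interaction of the _BY variables, the _VAR variables and the LMIN= option described in these notes can be sketched in Python. This is a simplified illustration under stated assumptions, not the LINKEX implementation: it selects candidate pairs only and does not compute linkage weights or apply _LPX or _WTX statements, missing values receive no special handling, and the function names, record layout, id field and sample records are all hypothetical.

```python
from collections import defaultdict


def group_by_key(records, by):
    """Group records by their values on the _BY variables."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[v] for v in by)].append(rec)
    return groups


def candidate_pairs(data1, data2, by, var=(), lmin=1):
    """Pair records that agree exactly on all `by` variables.

    Without `var`: keep a pair only if its key is unique in both files
    (deterministic linkage). With `var`: keep pairs in which at least
    `lmin` of the `var` variables agree exactly.
    """
    groups1 = group_by_key(data1, by)
    groups2 = group_by_key(data2, by)
    pairs = []
    for key, recs1 in groups1.items():
        recs2 = groups2.get(key, [])
        if not var:
            if len(recs1) == 1 and len(recs2) == 1:
                pairs.append((recs1[0], recs2[0]))
            continue
        for r1 in recs1:
            for r2 in recs2:
                agreements = sum(r1[v] == r2[v] for v in var)
                if agreements >= lmin:
                    pairs.append((r1, r2))
    return pairs


# Invented sample records.
vsts = [
    {"id": "v1", "SEX": "M", "DYR": 1990, "BYR": 1920, "BMO": 5, "LCA": "A"},
    {"id": "v2", "SEX": "F", "DYR": 1990, "BYR": 1931, "BMO": 7, "LCA": "B"},
]
mhsc = [
    {"id": "m1", "SEX": "M", "DYR": 1990, "BYR": 1920, "BMO": 6, "LCA": "C"},
    {"id": "m2", "SEX": "M", "DYR": 1990, "BYR": 1925, "BMO": 1, "LCA": "D"},
    {"id": "m3", "SEX": "F", "DYR": 1990, "BYR": 1931, "BMO": 7, "LCA": "B"},
]

by = ["SEX", "DYR"]
var = ["BYR", "BMO", "LCA"]
for lmin in (1, 2):
    ids = [(a["id"], b["id"]) for a, b in candidate_pairs(vsts, mhsc, by, var, lmin)]
    print(lmin, ids)
```

Raising lmin from 1 to 2 drops the pair that agrees on BYR alone, which shows how LMIN= trades the number of candidate pairs against the risk of spurious links. Calling candidate_pairs without var returns only the pair whose _BY key is unique in both files.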