Max Rady College of Medicine
Concept: MCHP Data Quality SAS® Macros
Last Updated: 2013-11-15
Data Quality, encompassing the completeness and accuracy of the data, is an important aspect for program management as well as for research purposes. This concept briefly describes the MCHP Data Quality Framework that is applied to data sets acquired for the MCHP Data Repository and provides details on the tools that assist during this phase of the Data Management process. These tools include a set of SAS® Macros that are used to assess the quality of data of every data set in the MCHP Data Repository. This concept also describes the logical step-by-step process that applies these macros during a typical data quality assessment of a new data set.
The information contained in this concept comes from MCHP's original internal Data Quality Framework document (2012) and is based on information that was developed during the project that produced the deliverable
A Systematic Investigation of Manitoba's Provincial Laboratory Data
- The MCHP Data Quality Process is constantly evolving. On November 15, 2013, the MCHP Data Quality process / information was made available on our external web site. Please visit for more up-to-date information on the current data quality process, including the SAS macros available for download and the current documentation related to the Data Quality Process at MCHP.
MCHP Data Quality Framework
At MCHP, a six-step Data Management process was created with input from a variety of individuals, including MCHP researchers and data analysts, representatives from Manitoba Health and the province's Regional Health Authorities. For more information, please read
Chapter 2: The Data Management Process at MCHP
in the deliverable. As part of the Data Management process, a formal MCHP Data Quality Framework was developed, following a review of provincial, national, and international quality evaluation frameworks for secondary data sources and with consideration for data privacy legislation in Manitoba.
The MCHP Data Quality Framework allows MCHP to evaluate the quality of data it receives and assists with data quality evaluations during the research process. Within the Framework, "we identify dimensions of quality that are important for administrative data, measures of quality encompassed by the Framework, and tools (i.e.: SAS® macros) that have been developed to operationalize key components of the Framework". For more information, please read Chapter 3: A Framework for Data Quality Evaluation at MCHP in the deliverable.
The MCHP Data Quality Framework is illustrated in Figure 3.1: MCHP Data Quality Framework. This figure illustrates two different perspectives on data quality: 1) database-specific, and 2) project-specific.
1. Database-Specific Quality
"Database-specific quality encompasses concepts of accuracy, internal and external validity, timeliness, and interpretability. Each of these quality concepts is useful for assessing the usability of a database in a different way and is measured by one or more indicators using quantitative or qualitative methods. Some of the quality indicators are linked to macros, written in SAS and developed as part of this study. These macros can be used to generate summary data for a Data Quality Report."
The data quality SAS® macros developed by MCHP are described in the next section of this concept.
2. Project-Specific Quality
"Project-specific quality focuses on concepts of accuracy and validity. These concepts have the same meaning as the corresponding database-specific quality concepts, but indicators of these concepts are applied to a specific cohort, region, or time period that is the focus of the project. Accordingly, data quality evaluation results may vary for database-specific and project-specific analyses. For example, completeness of a field may be very high for the entire population, but if analysis is limited to a specific segment of the population (e.g., to study participants of a program), completeness may be much lower because of the characteristics of the program administration or population participation in the program."
MCHP Data Quality SAS® Macros
The information in this section relates to the original macros developed for the 2012 deliverable. Please see the
MCHP Data Quality web section
for more up-to-date information.
This section identifies and describes the five main SAS macros that are used to investigate the database-specific quality of data sets at MCHP. This includes a description of what each macro does, the syntax for the macro call, the parameters that are available for the macro and examples of the macro call. The majority of this information comes from Appendix C: Description of Macros for Data Quality Evaluation in the deliverable. For each macro, a link to the actual MCHP SAS macro code is provided. The SAS macro code is formatted in a text file, but contains valid SAS code.
NOTE: The main macros involve calls to other SAS macros that are nested within the main macros. These "intermediate" macros are listed and described in the Intermediate Macros section of this concept.
The five main data quality SAS macros are:
LINK macro; and
- AGREEMENT macro;
1. CONTENTS Macro
Description: This macro runs PROC CONTENTS for a series of tables within a specified Domain and Database and generates a single overview table.
Syntax: %CONTENTS (DOMAIN=, DB);
Parameters:DOMAIN= Database domain on SPDS
DB= Database prefix
- %CONTENTS (HEALTH,BMD);
CONTENTS Macro SAS Code: CONTENTS.sas.txt
2. VIMO Macro
Description: For a specified data file, this macro generates a table of valid, invalid, missing, and outlier (VIMO) observations. The table is in Excel format.
Syntax: %VIMO (DS= );
Parameters:DS= Name of data file, which is in SAS format. This could be a temporary or permanent SAS dataset
INVALID= Option to turn invalid checks on or off (Default value=ON)
MEMNUM= List number of cluster members to include in the VIMO table. This parameter is not specified if there are no clusters.
MUNCODES= List of variables containing municipal codes, separated by blanks.
POSTALS= List of variables containing postal codes, separated by blanks.
- %VIMO (health.MHCPL_virustests_19922010);
- %VIMO (DS = health.MHCPL_virustests_19922010, INVALIDS = OFF);
- %VIMO (DS = HEALTH.mhmed_1997apr, MEMNUM = 23 24 25);
- %VIMO (DS = social.hcm_edi_2006jan, MEMNUM = 3, POSTALS = POSTAL p_code_original CL_POSTAL P_CODE P_CODE_E POSTAL_CODE POSTAL_CODE_HCM );
VIMO Macro SAS Code: VIMO.sas.txt
3. TREND Macro
Description: This macro conducts a trend analysis for a specified period of time. The results are summarized in a graphical format. The graph(s) are shown on screen and also saved in a PNG file.
Syntax: %TREND (DS=, STARTYR=, ENDYR=, BYDATE=, BYVAR=, BYFMT=, BYMONTH=);
Parameters:DS= Name of dataset
STARTYR= Beginning fiscal year (1st part, 4-digit)
ENDYR= Ending fiscal year (1st part, 4-digit)
BYDATE= Desired date variable (Must be a SAS date)
BYVAR= An optional categorical variable to conduct stratified analyses. If omitted only one analysis is conducted for all records in the dataset.
BYFMT= An optional format for BYVAR.
BYMONTH= An optional parameter that will produce the analyses by month, instead of year, if assigned a value of YES (default value = NO).
- %TREND (DS=health.wrha_ccic_med_2003mar, STARTYR=2003, ENDYR=2010, BYDATE=admit_dt, BYVAR=HOSP);
- %TREND (DS=health.MHCPL_virustests_19922010, STARTYR=1992, ENDYR=2009, BYDATE=RECEIVEDDT);
- %TREND (DS=health.MHCPL_virustests_19922010, STARTYR=1992, ENDYR=2009, BYDATE=RECEIVEDDT, BYFMT=$HOSPFMTL.);
- %TREND (DS=health.MHCPL_virustests_19922010, STARTYR=1992, ENDYR=2009, BYDATE=RECEIVEDDT, BYMONTH=YES);
TREND Macro SAS Code: TREND.sas.txt
4. LINK Macro
Description: For a series of data files, this macro creates a table (or members of a cluster) that calculates the linkability of individual data files based on the personal health information number (PHIN) in the Manitoba Health Insurance Registry. The output is shown on screen and also saved in an Excel file. The macro will also generate a frequency table for PHIN types.
Syntax: %LINK (DOMAIN=, DB=, PHIN=);
Parameters:DOMAIN= Database domain
DB= Database prefix (or full name of cluster)
PHIN= Name of PHIN variable (Default=SCRPHIN)
TYPE= Name of PHINTYPE variable (Default=SCRPHINTYPE)
- %LINK (health,MHCPL);
- %LINK (health,MHCPL, PHIN=filephin);
- %LINK (DOMAIN=social, DB=hcm_edi_2006jan, PHIN=FILEPHIN, TYPE=FILEPHINTYPE);
LINK Macro SAS Code: LINK.sas.txt
5. AGREEMENT Macro
Description: This macro measures the agreement between a dataset and the Manitoba Health Insurance Registry and produces kappa statistics for sex and date of birth.
Syntax: %AGREEMENT (DS=, REGYR=, SEX=, M=, F=, BIRTHDT=);
Parameters:DS= Name of dataset
REGYR= Latest available registry file (Default=2010)
PHIN= Variable containing PHIN (Default=SCRPHIN)
SEX= Variable containing sex (Default=SEX)
M= Numeric value assigned to males (Default=1)
F= Numeric value assigned to females (Default=2)
BIRTHDT= Variable containing date of birth (Default=BIRTHDT)
- %AGREEMENT (DS=health.MHCPL_SPSECTION_19922010);
- %AGREEMENT (DS=health.MHCPL_SPSECTION_19922010, REGYR=2009);
AGREEMENT Macro SAS Code: AGREEMENT.sas.txt
Several of the main data quality macros call intermediate macros that assist in the processing of the main macro. These intermediate macros are listed alphabetically and include a description and a link to the macro SAS code:
- this macro gets the levels of character variables in a dataset and lists them. If there are too many then only first and last level will be shown. This is called by the VIMO macro.
AUTOLEVEL Macro SAS Code: AUTOLEVEL.sas.txt
- this macro creates a fiscal year format to be used on a SAS Date variable. The format will be saved as a SAS file in /dq/saswork/[USERNAME]/ under the name 'temp_fy.sas'. Use %include 'temp_fy.sas' in your programs to call it. The format name is 'fy.'
FISCALYR Macro SAS Code: FISCALYR.sas.txt
- for a given dataset this macro creates a temporary SAS dataset called contents (which is basically output of PROC CONTENTS). Also two macro variables NVFMT & CVFMT containing numeric and character variables with their formats separated by a blank will be created.
GETFORMAT Macro SAS Code: GETFORMAT.sas.txt
- this macro returns the Number of Observations in the given dataset through the macro variable NO. It is used in several of the main Data Quality macros.
GETNOBS Macro SAS Code: GETNOBS.sas.txt
- for a given dataset this macro creates two macro variables containing a list of all character and numeric variables separated by a blank: NVARLIST & CVARLIST. It is used in several of the main Data Quality macros.
GETVARLIST Macro SAS Code: GETVARLIST.sas.txt
- this macro is called from the VIMO macro. It compares values of character variables with the values in their associated formats and provides percentages and list of mismatching values. The VIMO macro captures and adds this information into VIMO tables. This macro also can be called directly (results can be found in Work Directory in a SAS dataset called Invalid).
INVALID Macro SAS Code: INVALID.sas.txt
- this macro creates a table for Percentage of linkable records for a single table (or a whole cluster) . The table will be shown on screen and also will be saved in Excel format in /dq/saswork/[USERNAME]/ under name of [LIBNAME]_[DSNAME]_linkability_over_years.xls
LINKYR Macro SAS Code: LINKYR.sas.txt
- this macro creates a monthly-based format to be used on a SAS Date variable. The format will be saved as a SAS file in /dq/saswork/[USERNAME]/ under the name 'temp_monthly.sas'. Use % include 'temp_monthly.sas' in your programs to call it. The format name is 'monthly.'
MONTHLY Macro SAS Code: MONTHLY.sas.txt
- this macro is called from a number of main macros for all numeric variables and adds it to VIMO's output.
OUTLIER Macro SAS Code: OUTLIER.sas.txt
- this macro is called from the VIMO MACRO if Postal and Municipal code validation is requested. Percentage of postal and municipal codes which don't match with the Canadian patterns for these codes will be reported in Invalid code column and the actual invalid codes will be added to the Comment column of VIMO Table.
POSTMUN Macro SAS Code: POSTMUN.sas.txt
Application of the Data Quality SAS Macros
The following section provides a logical, step-by-step application of the Data Quality SAS macros, and also introduces two additional SAS macros relevant to the Data Acquisition/Data Quality process at MCHP. These macros are described below, along with a link to the specific SAS macro code.
- this MCHP specific macro generates a Metadata dataset using PROC CONTENTS with OUT option for a series of tables within a specified Domain and Database that is used in the Data Quality and Documentation processes. The output data will be generated in a WORK directory and the user will save it later in its' proper location. The output dataset is named Meta_[DB]. Note that all the SAS formats must be available to produce additional descriptive output.
META Macro SAS Code: META.sas.txt
to generate a list of database contents.
to generate the VIMO table for the Data Quality report.
to generate the graphs for the Data Quality report.
- this Macro performs a few checks on the validity of PHIN particularly on SCRPHIN variable. For all individual specific PHINs, the PHINCHEK macro checks the position of numeric values within the PHIN and validates them. It also checks the distribution of the first position and compares it with the corresponding PHINs from registry files and if there is a significant difference it will be flagged on the results.
PHINCHECK Macro SAS Code: PHINCHECK.sas.txt
to calculate the linkability of individual data files for the Data Quality report.
to calculate the level of agreement between the data set and the registry file for the Data Quality report.
NOTE: The last three steps (macros) are "registry-based" and require access to the Manitoba Health Insurance Registry data.
Data Quality Report
Information generated from the Data Quality macros can be used to develop a Data Quality report for a specific data set. For an example Data Quality report, please see Chapter 4: Data Quality Report for the Cadham Provincial Laboratory Data in the deliverable.
Cautions / Limitations
The following cautions / limitations are provided:
these macros were developed to work specifically within the MCHP computing environment with specific data sets, file structures, directory settings and formats related directly to the MCHP data. If you choose to work with these macros, they will need to be adapted to your organization's data management practices.
The MCHP Data Quality SAS macros were developed using SAS® version 9.2 running on a Solaris (Unix) operating system. Therefore, some of the macro code is specific to these programming and operating system environments.
provide support for the original macros at other sites. Please use them at your own discretion.
- The information in this concept is based on the deliverable titled A Systematic Investigation of Manitoba's Provincial Laboratory Data (2012). For more current information related to data quality activities at MCHP, please visit the MCHP Data Quality web section / information .
- Data Management Process
- Data Quality Framework
- Database-Specific Quality
- Project-Specific Quality
- Statistical Analysis System / Statistical Analysis Software (SAS®)
- Validity - Data Quality
- Lix L, Smith M, Azimaee M, Dahl M, Nicol P, Burchill C, Burland E, Goh C, Schultz J, Bailly A. A Systematic Investigation of Manitoba's Provincial Laboratory Data. Winnipeg, MB: Manitoba Centre for Health Policy, 2012. [Report] [Summary] (View)
Manitoba Centre for Health Policy
Community Health Sciences, Max Rady College of Medicine,
Rady Faculty of Health Sciences,
Room 408-727 McDermot Ave.
University of Manitoba
Winnipeg, MB R3E 3P5 Canada