The MCHP SAS MANUAL - Exercises on simulated MB Health data


Exercises on simulated MB Health data

Prior to completing the exercises, several steps are needed to prepare the data:

  • Open the lbls93 file in the Program Editor window and comment out the FORMAT statement (i.e., add * to the beginning of the statement). Save the revised file and clear the Program Editor window. Programs can then reference the original values of variables rather than the formatted values (it is simpler, for example, to refer to regionre='1' than to regionre='central Manitoba').

  • Open, in the Program Editor window, the program that creates the temporary SAS data set "test" from the simulated Manitoba Health data set. No changes need to be made to this program.

  • Submit the program and check the log for messages; the log should indicate that a temporary SAS data set called "test" (in the WORK library) was created for use during this SAS session.
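
As a minimal illustration of the first step above: a SAS comment statement begins with an asterisk and ends at the next semicolon, so a single leading * disables the whole FORMAT statement. The variable and format names shown here are hypothetical; the actual statement in lbls93 lists its own.

```sas
/* Before (hypothetical excerpt from lbls93):            */
/*    format regionre $regionf. gender $genderf.;        */

/* After: the leading asterisk turns the whole statement */
/* (up to its semicolon) into a comment.                 */
* format regionre $regionf. gender $genderf.;
```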

It is assumed that all the questions are completed within one SAS session. If not, both the data set and the formats must be re-created for the next SAS session (the record layout shows which format corresponds to each variable in the data set).

Programs can be developed and tested in a number of ways. If the code for all the questions below is saved in one file, there is no need to submit the entire file to test a portion of it: highlight the portion to be tested before pressing the submit key. The resulting log and output can then be checked to confirm the code is correct before it is kept as part of the larger program.

For each of the questions, add: 1) a title descriptive of the data set being used, and 2) either a second title or a footnote indicating the question number. The same title can be used for each question, so there is no need to repeat the TITLE1 statement for the other questions (SAS will automatically keep the same title for the duration of the SAS session unless instructed otherwise).
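
The titling convention described above might be sketched as follows. The title text is only a suggestion, and "test" is the temporary data set created in the preparation steps.

```sas
/* TITLE1 stays in effect for every later step in the session */
title1 'Simulated Manitoba Health data';   /* suggested wording only */
title2 'Question 1';

proc print data=test(obs=5);
run;

/* For the next question only the second title changes;  */
/* TITLE1 is not repeated and remains in effect.         */
title2 'Question 2';
```

A FOOTNOTE statement (for example, footnote 'Question 1';) could be used in place of TITLE2 in the same way.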

  1. Produce the following listings of data:
    • For the first 20 observations, specify the following variables to be shown on the output (original values): gender, age, los, op01, diag01, and diag02.
    • Sort the data by gender and regionre and produce a listing of the first 40 observations. Display only ncase, gender, regionre, and icd17brk in the output. This time display the formatted, or labeled, values rather than the original values for all except ncase.
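
One possible approach to question 1, offered as a sketch rather than the reference solution. The OBS= data set option limits how many observations are read. The format names in the second listing are hypothetical (and assume character variables); substitute the names given in the record layout. The output data set name test_sorted is likewise an assumption.

```sas
/* Part 1: first 20 observations, original (unformatted) values */
proc print data=test(obs=20);
   var gender age los op01 diag01 diag02;
run;

/* Part 2: sort into a new data set so "test" keeps its original order */
proc sort data=test out=test_sorted;
   by gender regionre;
run;

proc print data=test_sorted(obs=40);
   var ncase gender regionre icd17brk;
   /* Hypothetical format names; use those from the record layout. */
   /* ncase is left unformatted, as the question asks.             */
   format gender $genderf. regionre $regionf. icd17brk $icd17f.;
run;
```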

  2. For a later exercise, utilization for Winnipeg vs non-Winnipeg residents will be compared. Create two formats, one that will be used to group regionre into new values and one that will be used to label the new values:
    • Name the grouping format $wpgf; this format should be able to group the Winnipeg value into '1' and non-Winnipeg values into '0',
    • Name the labeling format $wpgl; this format should be able to label each of the two new values.

      Although this question could be done using only one format (i.e., specifying the label 'Winnipeg' in the first format instead of '1'), the two-step process is typically used, for example, to simplify specification of values of the new variable within a SAS program - e.g., to be able to use '1' within a line of code rather than 'Winnipeg' to reference Winnipeg records.
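
The two-format approach might look like the sketch below. The Winnipeg code 'W' is a placeholder assumption; check the record layout for the actual regionre value. Note that OTHER also captures any unexpected or missing codes, which may or may not be desirable.

```sas
proc format;
   /* Grouping format: Winnipeg becomes '1', everything else '0' */
   value $wpgf  'W'   = '1'      /* 'W' is a placeholder code */
                other = '0';

   /* Labeling format: describes the two grouped values */
   value $wpgl  '1' = 'Winnipeg'
                '0' = 'Non-Winnipeg';
run;
```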

  3. Obtain information on the number of observations and the mean, minimum, and maximum values, setting maximum decimal places to 2 for the following:
    • The variables for age, length of stay, and days to death.
    • Note the skewed results for days to death (deathsep): the value 9999 actually denotes those still alive. Rerun the program for this variable only, adding a WHERE statement to keep only values less than 9999.
    • The variables for age and length of stay, this time showing the results by region of residence. Use the region format to attach labels to region of residence.
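
A sketch of the three PROC MEANS runs, assuming age, los, and deathsep are numeric. The format name $regionf. is hypothetical; use the region format named in the record layout. CLASS is used here instead of BY so that the data do not need to be sorted first.

```sas
/* Part 1: N, mean, minimum, and maximum to at most 2 decimals */
proc means data=test n mean min max maxdec=2;
   var age los deathsep;
run;

/* Part 2: deathsep only, excluding 9999 (still alive) */
proc means data=test n mean min max maxdec=2;
   where deathsep < 9999;
   var deathsep;
run;

/* Part 3: age and length of stay by region of residence */
proc means data=test n mean min max maxdec=2;
   class regionre;
   var age los;
   format regionre $regionf.;   /* hypothetical format name */
run;
```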

  4. How does the distribution of hospital discharges for selected categories of ICD-9-CM diagnoses (icd17brk) differ by gender (gender)? Display the information using original values and again using formatted values.

  5. Examine the relationship among variables for the following:
    • Is the presence of high-risk diagnoses on admission (charyes) associated with neighbourhood income level (incdr)? Display the formatted values for both variables.
    • How does the relationship between these two variables differ by gender (use the formatted value for this variable as well)?
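
Question 5 could be approached with two-way and three-way tables in PROC FREQ, as sketched below. In a request of the form a*b*c, the first variable defines the strata, so one charyes-by-incdr table is produced per gender. The format names are hypothetical and assume character variables; the CHISQ option is an optional addition for testing association, not something the question requires.

```sas
/* Part 1: high-risk diagnoses by income level */
proc freq data=test;
   tables charyes*incdr / chisq;
   format charyes $charf. incdr $incf.;   /* hypothetical format names */
run;

/* Part 2: the same table, produced separately for each gender */
proc freq data=test;
   tables gender*charyes*incdr / chisq;
   format gender $genderf. charyes $charf. incdr $incf.;   /* hypothetical */
run;
```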

  6. Develop a program that will create the following new variables (always within a data step):
    • loswks - a numeric variable that has values of length of stay calculated in weeks.
    • losgroup - a character variable that groups length of stay into 3 categories (0 to 30 days, 31 to 365 days, and 366+ days). Both a grouping format and a labeling format can be created; the PROC FORMAT step must come before the DATA step that creates these variables.
    • wpgres - a character variable created from region of residence that uses the previously created $wpgf format.
    • diag3x - a character variable created from diag01 that will include only the first 3 digits.
    • op2x - a character variable created from op01 that will include only the first 2 digits.
    Also create labels for each of the 5 new variables within the same data step. Before submitting the DATA step, add the PROCs in the next question to the program.
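
A sketch of one way to build the five variables. It assumes that los is numeric with whole-day values, that diag01, op01, and regionre are character, and that the Winnipeg grouping format from question 2 (named $wpgf there) already exists. The names losgf, $losgl, and test2 are hypothetical.

```sas
proc format;
   /* Grouping format for length of stay (numeric input) */
   value losgf   0-30     = '1'
                 31-365   = '2'
                 366-high = '3';
   /* Labeling format for the grouped values */
   value $losgl  '1' = '0 to 30 days'
                 '2' = '31 to 365 days'
                 '3' = '366+ days';
run;

data test2;                      /* hypothetical output data set name */
   set test;
   length losgroup $1 wpgres $1 diag3x $3 op2x $2;

   loswks   = los / 7;                  /* length of stay in weeks     */
   losgroup = put(los, losgf.);         /* PUT always returns character */
   wpgres   = put(regionre, $wpgf.);    /* format from question 2       */
   diag3x   = substr(diag01, 1, 3);     /* first 3 characters of diag01 */
   op2x     = substr(op01, 1, 2);       /* first 2 characters of op01   */

   label loswks   = 'Length of stay (weeks)'
         losgroup = 'Length of stay group'
         wpgres   = 'Winnipeg residence'
         diag3x   = 'First diagnosis, 3 digits'
         op2x     = 'First operation, 2 digits';
run;
```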

  7. Check the new variables:
    • For losgroup and wpgres, use a side-by-side listing (PROC FREQ) to compare original variables against the new variables, ensuring that labeled values are used for the 3 character variables. Both comparisons can be run within the same PROC FREQ.
    • For loswks (the only new numeric variable) and los, run a PROC MEANS.
    • For the remaining two character variables, run a PROC PRINT for the first 30 observations, showing both original and new variables (i.e., output for a total of 4 variables).
    • Do a PROC CONTENTS on the data set to ensure the new variables were properly labeled.
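
The four checks could be sketched as below, assuming the new variables were written to a data set named test2 (a hypothetical name) and that labeling formats named $losgl and $wpgl exist. The region format name $regionf. is also hypothetical. The LIST option prints each combination of original and new values on one line, which makes the side-by-side comparison easy to scan.

```sas
/* Character variables: original vs new values, side by side */
proc freq data=test2;
   tables los*losgroup regionre*wpgres / list missing;
   format losgroup $losgl. wpgres $wpgl.
          regionre $regionf.;        /* hypothetical region format */
run;

/* Numeric variables */
proc means data=test2 n mean min max maxdec=2;
   var los loswks;
run;

/* Truncated codes: original and new variables for 30 observations */
proc print data=test2(obs=30);
   var diag01 diag3x op01 op2x;
run;

/* Confirm the labels on the new variables */
proc contents data=test2;
run;
```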

The program, log, and output are all available for the above questions. For additional practice, another set of more research-focused questions has been developed.

Contact: Charles Burchill       Telephone: (204) 789-3429
Manitoba Centre for Health Policy
Department of Community Health Sciences, University of Manitoba
4th floor Brodie Centre
408 - 727 McDermot Avenue
Winnipeg, Manitoba R3E 3P5       Fax: (204) 789-3910
Last modified on Wednesday, 24-Aug-2005 13:43:35 CDT