Prior to completing the exercises, several steps are needed to prepare the data:
- Open the lbls93 file in the Program Editor window to comment out the FORMAT
statement (i.e., add * to the beginning of the statement). Save
the revised file and clear the Program Editor window.
With the FORMAT statement commented out, programs can refer to the original
values of variables rather than the formatted values (it is simpler,
for example, to refer to regionre='1' than to regionre='central Manitoba').
- Open the program that creates the temporary
SAS data set "test" from the simulated Manitoba Health data set
into the Program Editor window. No changes need to be made to
this program.
- Submit the program and check the log for messages; the log should
indicate that a temporary SAS data set called "test" (in the WORK library)
was created for use for this SAS session.
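As a minimal sketch of the first step, a FORMAT statement becomes a comment when an asterisk is placed in front of it; SAS then ignores everything up to the next semicolon, even if the statement spans several lines. The variable-to-format pairs below are hypothetical placeholders, since the actual contents of lbls93 are not reproduced here.

```sas
/* Hypothetical FORMAT statement, disabled by the leading asterisk.   */
/* SAS treats everything from the * to the next semicolon as comment. */
* format regionre $regionl.
         gender   $genderl. ;
```

Removing the asterisk later restores the statement, which is an easy way to switch back to labeled output.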
It is assumed that the questions are completed during the course of one SAS
session. If not, both the data set and the formats must be re-created in the
next SAS session (the record
layout shows which formats correspond with each of the variables
in the data set).
Programs can be developed and tested in a number of different ways. If the
programming for all the questions below is saved in one file,
there is no need to submit the entire file to test a portion of the
code: highlight the portion to be tested and then press the submit key. The
resulting log and output can then be checked to confirm the code is accurate
before it is kept as part of the larger program.
For each of the questions, add: 1) a title descriptive of the data set being
used, and 2) either a second title or a footnote indicating the
question number. The same title can be used for each question, so
there is no need to repeat the TITLE1 statement for the other questions
(SAS will automatically keep the same title
for the duration of the SAS session unless instructed otherwise).
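A minimal sketch of this titling scheme follows. The title wording and the question label are illustrative assumptions, not text prescribed by the exercise.

```sas
/* TITLE1 is set once and stays in effect for the rest of the session. */
title1 'Simulated Manitoba Health hospital data';

/* TITLE2 identifies the question; reissue it for each new question.   */
title2 'Question 1a';

/* Alternative to TITLE2: put the question number in a footnote.       */
/* footnote1 'Question 1a';                                            */
```

Issuing a new TITLE2 replaces the previous second title and leaves TITLE1 unchanged.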
- Produce the following listings of data:
- For the first 20 observations, specify the following variables
to be shown on the output (original values):
gender, age, los, op01, diag01,
and diag02.
- Sort the data by gender and regionre and produce
a listing of the first 40 observations. Display only ncase,
gender, regionre, and icd17brk in the
output. This time display the formatted, or labeled, values
rather than the original values for all except ncase.
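One possible approach to the two listings is sketched below. It assumes the FORMAT statement in lbls93 was commented out, so unformatted values print by default. The output data set name sorted and the format names $genderl., $regionl., and $icd17l. are hypothetical; the record layout gives the actual formats, and numeric variables would need formats without the $ prefix.

```sas
/* First 20 observations, original (unformatted) values. */
proc print data=test (obs=20);
   var gender age los op01 diag01 diag02;
run;

/* Sort into a separate data set (hypothetical name) so test is unchanged. */
proc sort data=test out=sorted;
   by gender regionre;
run;

/* First 40 sorted observations; ncase has no format, so it stays as is. */
proc print data=sorted (obs=40);
   var ncase gender regionre icd17brk;
   format gender   $genderl.   /* hypothetical format names */
          regionre $regionl.
          icd17brk $icd17l.;
run;
```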
- For a later exercise, utilization for Winnipeg vs non-Winnipeg
residents will be compared. Create two formats,
one that will be used to group regionre into new values and one
that will be used to label the new values:
- Name the grouping format $wpgf; this format should be
able to group the Winnipeg value into '1' and non-Winnipeg values into '0',
- Name the labeling format $wpgl; this format should be able
to label each of the two new values.
Although this question could be done using only one format (i.e.,
specifying the label 'Winnipeg' in the first format instead of '1'), the
two-step process is typically used, for example, to simplify specification
of values of the new variable within a SAS program - e.g., to be able to use
'1' within a line of code rather than 'Winnipeg' to reference Winnipeg records.
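The two formats might be sketched as follows. The regionre code used for Winnipeg ('W') is a placeholder assumption; substitute the actual code from the record layout. The label text for the non-Winnipeg group is also an illustrative choice.

```sas
proc format;
   /* Grouping format: maps regionre codes to '1' or '0'. */
   value $wpgf 'W'   = '1'     /* placeholder code for Winnipeg */
               other = '0';    /* every other region            */

   /* Labeling format: attaches text to the two new values. */
   value $wpgl '1' = 'Winnipeg'
               '0' = 'Non-Winnipeg';
run;
```

Using OTHER in the grouping format avoids listing every non-Winnipeg code, at the cost of also sweeping any unexpected or missing codes into '0'.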
- Obtain information on the number of observations and the mean, minimum, and
maximum values, setting maximum decimal places to 2 for the following:
- The variables for age, length of stay, and days to death.
- Note the skewed results for days to death (deathsep). A value of 9999
indicates that the person was still alive. Run the program again for
this variable only, adding a WHERE statement to keep only the
values that are less than 9999.
- The variables for age and length of stay, this time showing the
results by region of residence. Use the region format to attach
labels to region of residence.
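A sketch of the three PROC MEANS runs is given below, assuming age, los, and deathsep are numeric. The region format name $regionl. is hypothetical; use the name given in the record layout.

```sas
/* N, mean, minimum, and maximum with at most 2 decimal places. */
proc means data=test n mean min max maxdec=2;
   var age los deathsep;
run;

/* Days to death only, excluding the 9999 "still alive" code. */
proc means data=test n mean min max maxdec=2;
   where deathsep < 9999;
   var deathsep;
run;

/* Age and length of stay by region of residence, with labels. */
proc means data=test n mean min max maxdec=2;
   class regionre;
   var age los;
   format regionre $regionl.;   /* hypothetical format name */
run;
```

CLASS is used here rather than BY so that the data set does not have to be sorted by regionre first.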
- How does the distribution of hospital discharges for selected categories
of ICD-9-CM diagnoses (icd17brk) differ by gender (gender)?
Display the information using original values and again using formatted
values.
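One way to produce both versions of the table is sketched here. The format names $icd17l. and $genderl. are hypothetical stand-ins for those in the record layout, and the sketch assumes both variables are character.

```sas
/* Crosstabulation using original values. */
proc freq data=test;
   tables icd17brk*gender;
run;

/* Same table with formatted (labeled) values. */
proc freq data=test;
   tables icd17brk*gender;
   format icd17brk $icd17l.    /* hypothetical format names */
          gender   $genderl.;
run;
```

Because gender is the column variable, the column percentages show the distribution of diagnosis categories within each gender.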
- Examine the relationship among variables for the following:
- Is the presence of high-risk diagnoses on admission (charyes)
associated with neighbourhood income level (incdr)? Display the
formatted values for both variables.
- How does the relationship between these two variables differ by
gender (use the formatted value for this variable as well)?
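The two tables could be requested as sketched below. The format names are hypothetical, the variables are assumed to be character, and the CHISQ option is an optional addition for testing association that the question does not explicitly require.

```sas
/* Two-way table: high-risk diagnoses by income level. */
proc freq data=test;
   tables charyes*incdr / chisq;
   format charyes $charyl.     /* hypothetical format names */
          incdr   $incdrl.;
run;

/* Three-way table: one charyes-by-incdr table per gender. */
proc freq data=test;
   tables gender*charyes*incdr / chisq;
   format gender  $genderl.
          charyes $charyl.
          incdr   $incdrl.;
run;
```

In a three-way TABLES request, the first variable defines the strata, so a separate two-way table is printed for each gender.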
- Develop a program that will create the following new variables (always within a data step):
- loswks - a numeric variable that has values of length of stay
calculated in weeks.
- losgroup - a character variable that groups length of
stay into 3 categories (0 to 30 days, 31 to 365 days, and
366+ days). Create both a grouping format and a labeling
format for this variable; the PROC FORMAT step must come before the data
step that creates these variables.
- wpgres - a character variable created from region of residence
that uses the previously created $wpgf format.
- diag3x - a character variable created from diag01
that will include only the first 3 digits.
- op2x - a character variable created from op01 that
will include only the first 2 digits.
Also create labels for each of the 5 new variables within the same
data step. Before submitting the DATA step, add the PROCs in the next
question to the program.
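A sketch of the PROC FORMAT and DATA steps follows. It assumes los is numeric, that regionre, diag01, and op01 are character, and that the $wpgf format from the earlier question exists in the current session. The format names losgrpf and $losgrpl, the output data set name test2, and the label text are all hypothetical choices.

```sas
proc format;
   /* Grouping format: numeric length of stay to a group code. */
   value losgrpf    0 - 30   = '1'
                   31 - 365  = '2'
                  366 - high = '3';
   /* Labeling format for the group codes. */
   value $losgrpl '1' = '0 to 30 days'
                  '2' = '31 to 365 days'
                  '3' = '366+ days';
run;

data test2;                                /* hypothetical data set name */
   set test;
   length losgroup wpgres $ 1 diag3x $ 3 op2x $ 2;

   loswks   = los / 7;                     /* length of stay in weeks    */
   losgroup = put(los, losgrpf.);          /* group code '1', '2', '3'   */
   wpgres   = put(regionre, $wpgf.);       /* '1' Winnipeg, '0' other    */
   diag3x   = substr(diag01, 1, 3);        /* first 3 characters         */
   op2x     = substr(op01, 1, 2);          /* first 2 characters         */

   label loswks   = 'Length of stay (weeks)'
         losgroup = 'Length of stay group'
         wpgres   = 'Winnipeg residence'
         diag3x   = 'First diagnosis, 3 digits'
         op2x     = 'First operation, 2 digits';
run;
```

The PUT function applies a format and returns a character value, which is why the grouping formats map to quoted codes.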
- Check the new variables:
- For losgroup and wpgres, use a side-by-side
listing (PROC FREQ) to compare original variables against
the new variables, ensuring that labeled values are used for
the 3 character variables. Both comparisons can be run within
the same PROC FREQ.
- For loswks (the only new numeric variable) and los,
run a PROC MEANS.
- For the remaining two character variables, run a PROC PRINT
for the first 30 observations, showing both original and
new variables (i.e., output for a total of 4 variables).
- Do a PROC CONTENTS on the data set to ensure the new variables
were properly labeled.
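The checks might look like the sketch below. It assumes the new variables were written to a data set named test2 and that labeling formats named $losgrpl., $wpgl., and $regionl. are available; test2, $losgrpl., and $regionl. are hypothetical names.

```sas
/* Side-by-side listings of original against new variables. */
proc freq data=test2;
   tables los*losgroup regionre*wpgres / list missing;
   format losgroup $losgrpl.
          wpgres   $wpgl.
          regionre $regionl.;
run;

/* Numeric check: weeks should be days divided by 7. */
proc means data=test2 n mean min max maxdec=2;
   var los loswks;
run;

/* Truncated codes next to the variables they came from. */
proc print data=test2 (obs=30);
   var diag01 diag3x op01 op2x;
run;

/* Confirm that variable labels were stored. */
proc contents data=test2;
run;
```

The LIST option prints each combination of values on one line, which makes mismatches between an original value and its group easy to spot.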
The program, log,
and output are all available for
the above questions. For additional practice, another set of more
research-focused questions has been
developed.