Concept: Socioeconomic Factor Index (SEFI) - Version 2 (SEFI-2)
Last Updated: 2023-03-23
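NOTE: The SEFI-2 methodology was first introduced in Metge et al. (2009). Since then, the terms SEFI and SEFI-2 have been used interchangeably in MCHP research, but the methodology reflects the SEFI-2 definition.
This concept provides detailed information on the SEFI-2, including the Census variables used to create it, the major steps in creating the index, the statistical methodology (factor analysis), the SEFI-2 factor scores, and the SAS code examples and formats.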
Census Variables Used to Create the SEFI-2
The SEFI-2 uses four variables that come directly from, or are derived from, the Census data. These include:
- unemployment rate age 15+;
- average household income age 15+;
- proportion of single parent households (derived); and
- proportion of population age 15+ without high school graduation (derived).
NOTE: The SAS code examples below contain specific variable names used in the 2011 Census and the coding necessary to calculate the SEFI-2 factor scores. The specific Census variable names change slightly from one Census year to the next, but the descriptions here can be generalized to other Census years.
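To make the two derived variables concrete, here is a minimal sketch in Python (not the MCHP SAS code). The field names, the counts, and the choice of denominators (total households, and population age 15+) are assumptions made for illustration; the actual Census variables and denominators are defined in the SAS code for each Census year.

```python
def proportion(numerator, denominator):
    """Return numerator / denominator, or None when a value is missing or the denominator is zero."""
    if numerator is None or not denominator:
        return None
    return numerator / denominator


# Hypothetical counts for one dissemination area (DA); names and values are illustrative only.
da_counts = {
    "single_parent_households": 45,
    "total_households": 300,
    "pop_15plus_no_high_school": 120,
    "pop_15plus": 800,
}

# Derived variable 1: proportion of single parent households (assumed denominator).
prop_single_parent = proportion(
    da_counts["single_parent_households"], da_counts["total_households"]
)

# Derived variable 2: proportion of population age 15+ without high school graduation.
prop_no_high_school = proportion(
    da_counts["pop_15plus_no_high_school"], da_counts["pop_15plus"]
)

print(prop_single_parent, prop_no_high_school)
```

Returning None for a missing or zero denominator keeps suppressed areas visibly missing, so they can be picked up by a later imputation step instead of being silently set to zero.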
Major Steps in Creating the Index
The six major steps in creating values for the SEFI-2 at MCHP, as summarized from the SAS code examples below, are:
Step 1 - read in Census data at the dissemination area (DA) and Census Subdivision (CSD) levels;
Step 2 - calculate the derived variables from the Census at the DA and CSD levels. The Unemployment Rate 15+ and Average Household Income Age 15+ variables come directly from the Census data. The two derived variables are:
- proportion of single parent households, and
- proportion of population age 15+ without high school graduation.
Step 3 - merge the data files at the DA level. Check for missing values at the DA level;
Step 4 - if data is missing at the DA level, impute the missing values from the Census data at the CSD level. Check for values that are missing at the CSD level;
Step 5 - for people living in First Nations Communities with no Census data available (due to suppression of data by Statistics Canada or non-participation of the First Nations community), impute the appropriate weighted values for either North or South communities based on geographic location. The postal code conversion file (PCCF) is used to provide geographical information for assigning First Nations Communities to the North and South groups.
Step 6 - run the SAS procedure PROC FACTOR (factor analysis - see description below) on the data set containing the four variables, to calculate the SEFI-2 value for each DA. Save a copy of the "SEFI-2 Values" file for use in other work. The value for each DA can then be assigned to the corresponding records (based on postal code) in other data sets.
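The imputation order in Steps 3 to 5 can be sketched as a simple fallback chain. This is a Python illustration under stated assumptions, not the MCHP SAS code: the variable names, the sample values, and the `north_values` record (standing in for the weighted North or South values) are all hypothetical.

```python
VARIABLES = (
    "unemployment_rate",
    "avg_household_income",
    "prop_single_parent",
    "prop_no_high_school",
)


def impute_da(da_values, csd_values=None, region_values=None):
    """Fill missing (None) DA values from the CSD, then from a regional fallback.

    region_values is meant only for First Nations communities with no Census
    data, where weighted North or South values are used (Step 5).
    """
    filled = {}
    for name in VARIABLES:
        value = da_values.get(name)
        if value is None and csd_values is not None:
            value = csd_values.get(name)      # Step 4: CSD-level value
        if value is None and region_values is not None:
            value = region_values.get(name)   # Step 5: North/South value
        filled[name] = value
    return filled


# Hypothetical DA with two suppressed values, and its CSD with one suppressed value.
da = {"unemployment_rate": 7.5, "avg_household_income": None,
      "prop_single_parent": 0.18, "prop_no_high_school": None}
csd = {"unemployment_rate": 6.9, "avg_household_income": 58000,
       "prop_single_parent": 0.16, "prop_no_high_school": None}

filled = impute_da(da, csd)
still_missing = [name for name, value in filled.items() if value is None]

# Hypothetical weighted values for the North group, applied to a community with no Census data.
north_values = {"unemployment_rate": 20.0, "avg_household_income": 35000,
                "prop_single_parent": 0.30, "prop_no_high_school": 0.55}
fn_filled = impute_da({}, None, north_values)

print(filled, still_missing, fn_filled)
```

Checking `still_missing` after each pass mirrors the "check for missing values" instructions in Steps 3 and 4: a DA that remains incomplete cannot contribute to the factor analysis in Step 6.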
Statistical Methodology - Factor Analysis
The SEFI is a composite index: a mathematical combination of several variables that generates a single number from all of them. The SEFI-2 was generated using factor analysis (run with the SAS procedure PROC FACTOR), a "statistical procedure that identifies the common variance amongst a set of observed variables (i.e., indicators), and creates a factor (i.e., index) comprised of that common variance. The factor scores are calculated with a linear equation that incorporates a weighted contribution of each of the variables that are included in the analysis. The contribution (i.e., weight) of each variable is relative to the amount of variance in common with the other variables." (Metge et al., 2009).
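The idea of a single factor score built from weighted, standardized variables can be sketched in plain Python. This is an illustration only, not the MCHP PROC FACTOR program: it assumes a one-factor solution extracted by the principal component method from the correlation matrix, with scores computed by the regression method, and the MCHP code may use a different extraction method or options. The six DA records are made up, and orienting the factor so that unemployment loads positively is a convention chosen for this sketch.

```python
import math
import statistics


def standardize(column):
    """Convert a column to z-scores using the sample standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]


def correlation_matrix(z_columns):
    """Correlation matrix of already-standardized columns."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z_columns]
            for ci in z_columns]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    size = len(matrix)
    vec = [1.0 / (i + 1) for i in range(size)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(size))
                     for i in range(size))
    return eigenvalue, vec


def one_factor_scores(columns):
    """Return (loadings, scores) for a single factor across the given variables."""
    z = [standardize(c) for c in columns]
    eigenvalue, vec = leading_eigenpair(correlation_matrix(z))
    if vec[0] < 0:                      # sketch convention: first variable loads positively
        vec = [-v for v in vec]
    loadings = [v * math.sqrt(eigenvalue) for v in vec]
    n = len(columns[0])
    # Weighted sum of z-scores, rescaled so the scores have unit variance.
    scores = [sum(vec[k] * z[k][i] for k in range(len(z))) / math.sqrt(eigenvalue)
              for i in range(n)]
    return loadings, scores


# Hypothetical values for six DAs (one list per variable).
unemployment = [4.0, 6.5, 12.0, 8.0, 15.0, 5.0]
income = [82000, 64000, 38000, 55000, 30000, 75000]
single_parent = [0.10, 0.15, 0.30, 0.20, 0.35, 0.12]
no_high_school = [0.12, 0.20, 0.40, 0.25, 0.45, 0.15]

loadings, scores = one_factor_scores(
    [unemployment, income, single_parent, no_high_school]
)
print(loadings)
print(scores)
```

In this made-up data, income loads negatively and the other three variables load positively, so the DA with the highest unemployment and lowest income receives the highest score.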
For a thorough discussion of factor analysis and how the composite index was created, please see the MCHP Composite Measures/Indices of Health and Health System Performance (2009) deliverable.
SEFI-2 Factor Scores
SEFI-2 factor scores have been developed for Census years 1981 to 2021. The mean and median scores by RHA and for Manitoba are also available for each 5-year Census interval from 1981 to 2021. This information is available in two PDF documents.
- As with income quintiles, some postal codes will have a missing factor score. This can happen for several reasons, including:
- Census data was suppressed (and a valid value could not be imputed);
- The postal code was not in the postal code conversion file (PCCF); and
- The postal code is new, and not in the format.
SAS Code Examples and Formats
SAS® code and formats have been developed for using the SEFI-2. The SAS Code and Formats section below contains example SAS code/programs for running the SEFI-2 factor analysis. Both internal and external versions of this SAS code are available.
SAS formats are available in the MCHP SAS Format Library (internal access only) for applying the SEFI-2 factor scores to a population via postal codes. The formats are available annually from 1979 to 2021. The index uses Census data from the years 1981, 1986, 1991, 1996, 2001, 2006, 2011, 2016 and 2021, and the Census data used for each annual format is the one that is closest to the year in question, within +/- 2 years. For example, the 2016 Census values are applied to the formats for calendar years 2014-2018.
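The "closest Census within +/- 2 years" rule can be written as a small lookup. This is a Python sketch of the rule as described above, not the code used to build the formats; the function name is hypothetical.

```python
CENSUS_YEARS = (1981, 1986, 1991, 1996, 2001, 2006, 2011, 2016, 2021)


def census_year_for(calendar_year):
    """Return the Census year whose values apply to the given calendar year."""
    nearest = min(CENSUS_YEARS, key=lambda census: abs(census - calendar_year))
    if abs(nearest - calendar_year) > 2:
        raise ValueError(f"no Census within 2 years of {calendar_year}")
    return nearest


# Calendar years 2014-2018 all map to the 2016 Census.
mapping = {year: census_year_for(year) for year in range(2014, 2019)}
print(mapping)
```

Because Census years are five years apart, each calendar year in range falls within two years of exactly one Census, so there are no ties to resolve.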
A valid Manitoba postal code is required to use the format and assign the index score to a record. The following example assigns the corresponding factor score by postal code for the year 2011:
    sefi = input(postal,sefi11f.);
For 2021, the following code assigns SEFI values using the 2021 Census data:
    sefi = input(postal,sefi21f.);
NOTE: the annual format names usually include only the last two digits of the applicable calendar year.
- The formats apply the average factor score value for a particular postal code, but some postal codes may contain more than one Dissemination Area (DA)/Enumeration Area (EA) and vice versa. In some situations, especially for First Nations communities that are adjacent to a town and share the same postal code, a higher SES value may be assigned to that area. When analyzing data by small geographical area or by First Nations community, apply the index score to the population using the SAS code available below (Applying the Factor Scores to the Population - SAS Code, internal access only) rather than the formats.
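The averaging behaviour described in this note can be illustrated with a small Python sketch. It is not the MCHP format-building code: the postal code to DA links and the scores are made up, and a simple unweighted mean is assumed because the weighting used in the actual formats is not stated here.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical (postal code, DA id, factor score) links; all values are made up.
links = [
    ("R0A 1A0", "DA-001", 0.90),
    ("R0A 1A0", "DA-002", -0.30),
    ("R3T 2N2", "DA-003", -0.75),
]


def normalize(postal):
    """Strip spaces and upper-case a postal code so lookups are consistent."""
    return postal.replace(" ", "").upper()


# Collect every DA score linked to each postal code.
scores_by_postal = defaultdict(list)
for postal, _da_id, score in links:
    scores_by_postal[normalize(postal)].append(score)

# One value per postal code: the (assumed unweighted) mean of its DA scores.
postal_lookup = {postal: fmean(values) for postal, values in scores_by_postal.items()}


def sefi_for(postal):
    """Return the averaged score for a postal code, or None when it is not in the lookup."""
    return postal_lookup.get(normalize(postal))


print(sefi_for("R0A 1A0"), sefi_for("R3T 2N2"), sefi_for("X0X 0X0"))
```

The first postal code spans two DAs with quite different scores, and the lookup returns a single blended value for both. This is why applying scores directly at the DA level is suggested for small-area or First Nations community analyses.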
Metge et al. (2009)
In the Composite Measures/Indices of Health and Health System Performance deliverable, Metge et al. (2009) investigated and developed a number of composite indices designed to measure specific aspects of health, use of health services and health system performance. The pros and cons of using composite indices are presented in Table 1.1.
The SEFI-2 was one of the indices developed for this project, and represents the most recent version of the SEFI. The SEFI is used as a proxy measure of socioeconomic status. For more information on the SEFI and SEFI-2, please read section 8.6.1 Socioeconomic Factor Index.
In this deliverable, measures for both the original SEFI and the new SEFI-2 were developed and presented. Graphic illustrations of the SEFI and SEFI-2 scores by Regional Health Authority (RHA) and Winnipeg Community Areas (CA) are available at:
- Figure 8.11 to Figure 8.14 in this deliverable.
The deliverable found that there is a strong relationship between both the SEFI indices and premature mortality rate (PMR). Areas with a higher PMR tended to have higher SEFI and SEFI-2 scores, while areas with lower PMR tended to have lower SEFI and SEFI-2 scores.
Additional work from this research provides comparative information on the SEFI and SEFI-2. The first document, titled sefi_vs_sefi2_Mar_31_2009.pdf, provides information on the correlation between the SEFI and SEFI-2, a list of simple statistics for the SEFI and SEFI-2, the SEFI Principal Component Analysis results, and the SEFI-2 Factor Analysis results. The second document, titled compare_sefi_2001_2006_Oct_8_2009.pdf, provides statistics by RHA on the SEFI-2 values calculated using either the 2006 or 2001 Census data.
Brownell et al. (2010)
In the Evaluation of the Healthy Baby Program deliverable, Brownell et al. (2010) used the SEFI as an area-level socioeconomic status (SES) measure. The SEFI was used as a predictor variable in the regression analysis performed in the research.
For more information on the use of SEFI in this research, please read the section titled Predictor Variables in this deliverable, and view various tables provided in this publication.
Fransoo et al. (2011)
In the Adult Obesity in Manitoba: Prevalence, Associations, and Outcomes deliverable, Fransoo et al. (2011) analyzed several key indicators (e.g., physician visits, prescription drug use and hospital separations) using multivariate modelling. The models included several variables, one of which was socioeconomic status (SES) as measured by scores on the SEFI-2.
For more information on the methods and results related to SEFI-2 found in this research, please see:
- Methods section - Multivariate Modelling
- Table 5.2: Factors Related to Physician Visits in the Year After Survey Date, Survey Participants Age 18 and Older Measured/corrected BMI
- Table 5.4: Factors Related to Prescription Drug Costs in the Year After Survey Date, Survey Participants Age 18 and Older
- Table 5.7: Factors Related to Inpatient Hospital Separations in the Year After Survey Date, Survey Participants Age 18 and Older
Fransoo et al. (2013)
In The 2013 RHA Indicators Atlas deliverable, Fransoo et al. (2013) used the SEFI-2 as a measure of socioeconomic status (SES). Analyses were performed by current RHAs (5) and former RHAs (11), RHA Districts, and Winnipeg Community Areas (CA), for 2001 and 2006.
For more information on the methods and results found in this research, please see the deliverable.
Fransoo et al. (2019)
In The 2019 RHA Indicators Atlas deliverable, Fransoo et al. (2019) used the SEFI-2 as a measure of socioeconomic status (SES). Analyses were performed by current RHAs (5) for 2011 and 2016.
For more information on the methods and results found in this research, please see the deliverable.