Concept: Socioeconomic Factor Index (SEFI) - Version 2 (SEFI-2)

Concept Description

Last Updated: 2021-01-18

    This concept describes the methodology for developing the Socioeconomic Factor Index (SEFI) - Version 2 (SEFI-2) and discusses the use of this index in MCHP research. The SEFI is a measure of socioeconomic characteristics based on Canadian Census data that reflects non-medical social determinants of health. The SEFI-2 is the most recent modification to the original SEFI.
    NOTE: The SEFI-2 methodology was first introduced in Metge et al. (2009). Since that time, the SEFI and SEFI-2 terms have been used interchangeably in MCHP research, but the methodology reflects the SEFI-2 definition.
    This concept provides detailed information on the SEFI-2, including:

    • a definition and general description of the SEFI-2;
    • a description of the methodology used to develop the SEFI-2, including a list of Census variables used to create the SEFI-2, the major steps in creating values for the index, a description of the statistical method called factor analysis that is used to create the SEFI-2 values, a link to a list of factor scores for the index for Census years 1986 to 2011, and discussion of the SAS© formats and links to SAS code examples for using the SEFI-2;
    • identification of the major differences between the original SEFI and SEFI-2;
    • a description of how the SEFI-2 has been used in MCHP research, with links to detailed information and results published in MCHP research; and
    • a list of pros and cons of using the SEFI.

    The majority of information for this concept comes from published MCHP deliverables (see the References section below) and from discussion with staff involved in the development and use of the SEFI-2.
Definition of SEFI-2
    The SEFI is a factor score derived from Canadian Census data that reflects non-medical social determinants of health and is used as a proxy measure of socioeconomic status (SES). The SEFI-2 uses four variables, which either come directly from the Census or are derived from Census variables, to calculate an overall score.

    SEFI-2 is calculated at the dissemination area (DA) level reported in the Census and the resulting values are assigned to residents based on postal codes. Using the postal codes, the SEFI-2 can be applied to different geographic levels, such as Manitoba Regional Health Authorities (RHAs), RHA Districts and Winnipeg Community Areas (CA).

    When interpreting the SEFI-2 values, scores less than zero indicate more favourable socioeconomic conditions, while scores greater than zero indicate less favourable socioeconomic conditions.
    The methodology section of this concept identifies the variables used for calculating the SEFI-2, lists and describes the major steps in creating the SEFI-2, provides links to example SAS© code and format files related to creating and using the SEFI-2, and provides access to internal information on factor scores developed from Census data from 1986 to 2016.
Census Variables Used to Create the SEFI-2
    The SEFI-2 uses four variables that come directly from or are derived from the Census data. These include:

    • unemployment rate age 15+;
    • average household income age 15+;
    • proportion of single parent households (derived); and
    • proportion of population age 15+ without high school graduation (derived).

    NOTE: The SAS code examples below contain the specific variable names used in the 2011 Census and the coding necessary to calculate the SEFI-2 factor scores. The specific Census variable names change from one Census year to the next, so the descriptions given here are kept general enough to apply to other Census years.
Major Steps in Creating the Index
    The six major steps in creating values for the SEFI-2 at MCHP, as summarized from the SAS code examples below, are:
    Step 1 - read in Census data at the dissemination area (DA) level and at the Census Subdivision (CSD) level;

    Step 2 - calculate the derived variables from the Census at the DA and CSD level. The Unemployment Rate 15+ and Average Household Income Age 15+ variables come directly from the Census data. The two derived variables are:
    • proportion of single parent households, and
    • proportion of population age 15+ without high school graduation.

    Step 3 - merge the data files at the DA level. Check for missing values at the DA level;

    Step 4 - if data is missing at the DA level, impute the missing values from the Census data at the CSD level. Check for values that are missing at the CSD level;

    Step 5 - for people living in First Nations Communities with no Census data available (due to suppression of data by Statistics Canada or non-participation of the First Nations community), impute the appropriate weighted values for either North or South communities based on geographic location. The postal code conversion file (PCCF) is used to provide geographical information for assigning First Nations Communities to the North and South groups.

    Step 6 - run the SAS procedure called PROC FACTOR (factor analysis - see description below) on the data set containing the four variables, to calculate the SEFI-2 value for each DA. Save a copy of the "SEFI-2 Values" file for use in other work. The value for each DA can then be assigned to the corresponding records (based on postal code) in other data sets.
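
    Steps 2 to 4 above can be sketched in a few lines. This is a minimal Python illustration, not the MCHP SAS code: the field names, the denominators used for the two derived proportions, and all values are assumptions made for the example. Step 5 (the weighted North/South values for First Nations communities) is not covered.

```python
# Illustrative sketch of Steps 2-4; all field names and values are hypothetical.
VARIABLES = (
    "unemployment_rate",
    "avg_household_income",
    "prop_single_parent",
    "prop_no_high_school",
)


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when a value is missing or zero."""
    if numerator is None or denominator in (None, 0):
        return None
    return numerator / denominator


def derive(area):
    """Step 2: keep the two direct variables and compute the two derived ones."""
    return {
        "unemployment_rate": area["unemployment_rate"],
        "avg_household_income": area["avg_household_income"],
        # Assumed denominators: all households, and the population aged 15+.
        "prop_single_parent": ratio(area["single_parent_households"], area["households"]),
        "prop_no_high_school": ratio(area["no_high_school_15plus"], area["population_15plus"]),
    }


def impute_from_csd(da_values, csd_values):
    """Step 4: fill each missing DA value with the value for its parent CSD."""
    return {
        name: csd_values[name] if da_values[name] is None else da_values[name]
        for name in VARIABLES
    }


def missing(values):
    """Checks in Steps 3 and 4: list the variables that still have no value."""
    return [name for name in VARIABLES if values[name] is None]


csd_raw = {
    "CSD_A": {
        "unemployment_rate": 6.0, "avg_household_income": 60000.0,
        "single_parent_households": 300, "households": 2000,
        "no_high_school_15plus": 1000, "population_15plus": 5000,
    }
}
da_raw = [
    {"da_id": "DA001", "csd_id": "CSD_A",
     "unemployment_rate": 4.0, "avg_household_income": 72000.0,
     "single_parent_households": 20, "households": 200,
     "no_high_school_15plus": 50, "population_15plus": 500},
    {"da_id": "DA002", "csd_id": "CSD_A",  # suppressed DA: every value missing
     "unemployment_rate": None, "avg_household_income": None,
     "single_parent_households": None, "households": None,
     "no_high_school_15plus": None, "population_15plus": None},
]

csd_values = {csd_id: derive(area) for csd_id, area in csd_raw.items()}
sefi_inputs = {}
for area in da_raw:
    da_values = derive(area)
    sefi_inputs[area["da_id"]] = impute_from_csd(da_values, csd_values[area["csd_id"]])
```

    In this sketch DA002 has no usable values of its own, so it takes all four values from CSD_A, while DA001 keeps its own. Any variable still listed by missing() after this step would need the Step 5 treatment or would remain without a score.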
Statistical Methodology - Factor Analysis
    The SEFI is a composite index, a mathematical combination of several variables that generates a single number from all the variables. The SEFI-2 was generated using factor analysis (run with the SAS procedure PROC FACTOR), a "statistical procedure that identifies the common variance amongst a set of observed variables (i.e., indicators), and creates a factor (i.e., index) comprised of that common variance. The factor scores are calculated with a linear equation that incorporates a weighted contribution of each of the variables that are included in the analysis. The contribution (i.e., weight) of each variable is relative to the amount of variance in common with the other variables." (Metge et al., 2009).
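
    The idea of a single weighted linear score can be illustrated with a short sketch. The Python below is not the MCHP procedure: it uses the first principal component of the correlation matrix as a stand-in for the single factor, because the extraction method and options used in the PROC FACTOR run are not stated here. The data values are made up, and the choice to orient the score so that unemployment loads positively is an assumption made so that higher scores mean less favourable conditions.

```python
import math
import statistics


def standardize(column):
    """Convert a column to z-scores (mean 0, sample standard deviation 1)."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    return [(value - mean) / sd for value in column]


def correlation_matrix(z_columns):
    """Correlation matrix of already standardized columns."""
    n = len(z_columns[0])
    return [
        [sum(a * b for a, b in zip(col_i, col_j)) / (n - 1) for col_j in z_columns]
        for col_i in z_columns
    ]


def first_component(matrix, iterations=500):
    """Leading eigenvalue and unit eigenvector, found by power iteration."""
    size = len(matrix)
    vector = [float(i + 1) for i in range(size)]  # arbitrary non-zero start
    for _ in range(iterations):
        product = [sum(row[j] * vector[j] for j in range(size)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    eigenvalue = sum(
        vector[i] * matrix[i][j] * vector[j] for i in range(size) for j in range(size)
    )
    return eigenvalue, vector


def factor_scores(columns):
    """Score each row on the first component, scaled to unit variance."""
    z_columns = [standardize(column) for column in columns]
    eigenvalue, weights = first_component(correlation_matrix(z_columns))
    if weights[0] < 0:  # assumption: orient so the first variable loads positively
        weights = [-w for w in weights]
    scale = math.sqrt(eigenvalue)
    return [
        sum(w * z for w, z in zip(weights, row)) / scale
        for row in zip(*z_columns)
    ]


# Made-up values for six areas, ordered from most to least favourable.
unemployment = [3.0, 5.0, 7.0, 10.0, 14.0, 20.0]
income = [95000.0, 80000.0, 70000.0, 55000.0, 42000.0, 30000.0]
single_parent = [0.08, 0.10, 0.15, 0.18, 0.25, 0.35]
no_high_school = [0.10, 0.12, 0.20, 0.22, 0.30, 0.45]

scores = factor_scores([unemployment, income, single_parent, no_high_school])
```

    With these made-up inputs the scores are centred on zero and rise from the first area to the last, which matches the interpretation that negative scores indicate more favourable conditions and positive scores less favourable ones.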

    For a thorough discussion of factor analysis and how the composite index was created, please see the MCHP Composite Measures/Indices of Health and Health System Performance (2009) deliverable.

SEFI-2 Factor Scores
    Factor scores have been developed for Census years 1986 to 2019. The average factor score values for each RHA and for Manitoba are also available for each index for each Census year from 1986 to 2011. This information is available in the pdf document titled SEFI_Factor_Scores_1986_to_2011.pdf (internal access only). An update to this document for 2019 is being worked on.
    As with income quintiles, some postal codes will have a missing factor score. Possible reasons include:

    • Census data was suppressed (and a valid value could not be imputed);
    • the postal code was not in the postal code conversion file (PCCF); and
    • the postal code is new and not yet in the format.
SAS Code Examples and Formats
    SAS code and format files have been developed for using the SEFI-2. The SAS Code and Formats section below contains SAS code/programs for running the SEFI-2 factor analysis. Both internal and external versions of this code are available.

    Format files

    SAS format files are available in the MCHP SAS Format Library (internal access only) for applying the SEFI-2 factor scores to a population via postal codes. The formats are available annually from 1984 to 2019. The index uses Census data from the years 1986, 1991, 1996, 2001, 2006, 2011 and 2016, and the Census data used for each annual format is the one that is closest to the year in question, within +/- 2 years. For example, the 1996 Census values are applied to the formats for calendar years 1994-1998.
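
    The rule for choosing which Census feeds each annual format can be sketched as a nearest-year lookup. This is a Python illustration of the rule as described above, not code from the MCHP format library; the function name is hypothetical.

```python
# Census years named in the text above.
CENSUS_YEARS = (1986, 1991, 1996, 2001, 2006, 2011, 2016)


def census_year_for(calendar_year):
    """Return the Census year closest to the given calendar year."""
    return min(CENSUS_YEARS, key=lambda census: abs(census - calendar_year))


# Calendar years 1994-1998 all map to the 1996 Census.
years_using_1996 = [year for year in range(1984, 2020) if census_year_for(year) == 1996]
```

    Because the Censuses are five years apart, every calendar year from 1984 to 2018 falls within two years of exactly one Census. The year 2019 is three years from the 2016 Census, so the nearest-year rule still returns 2016 but outside the +/- 2 year window.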

    A valid Manitoba postal code is required to use the format and assign the index score to a record. The following example calculates the corresponding factor scores by postal code for the year 2011.
    sefi = input(postal,sefi11f.);

    For 2019, the following code is used as a temporary method to calculate SEFI values, using the 2016 Census data. This method will be updated when the 2021 Census data is available:

    sefi = input(postal,sefi19_2016f.);
    NOTE: the annual format names usually include only the last two digits of the applicable calendar year.
    The formats apply the average factor score value for a particular postal code, but some postal codes contain more than one Dissemination Area (DA)/Enumeration Area (EA), and vice versa. In some situations, especially where a First Nations community is adjacent to a town and shares its postal code, a higher SES value may be assigned to that area. When analyzing data by small geographical area or by First Nations community, apply the index score to the population with the SAS code available below (Applying the Factor Scores to the Population - SAS Code, internal access only) rather than with the formats.
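
    The effect of averaging over a shared postal code can be shown with a small sketch. This Python example is an illustration only: the postal codes, DA identifiers and scores are made up, and the use of a simple unweighted mean is an assumption, since the weighting behind the formats is not described here.

```python
from collections import defaultdict
from statistics import fmean


def mean_score_by_postal_code(postal_da_links, da_scores):
    """Average the DA-level scores linked to each postal code (unweighted)."""
    grouped = defaultdict(list)
    for postal_code, da_id in postal_da_links:
        grouped[postal_code].append(da_scores[da_id])
    return {postal_code: fmean(values) for postal_code, values in grouped.items()}


# Made-up scores: a town DA and an adjacent community DA share one postal code.
da_scores = {"DA_TOWN": -0.5, "DA_COMMUNITY": 1.5, "DA_OTHER": 0.25}
postal_da_links = [
    ("R9X 9X1", "DA_TOWN"),
    ("R9X 9X1", "DA_COMMUNITY"),
    ("R9X 9X2", "DA_OTHER"),
]

postal_scores = mean_score_by_postal_code(postal_da_links, da_scores)

# Rough analogue of a format lookup: unknown postal codes yield a missing score.
shared = postal_scores.get("R9X 9X1")
unknown = postal_scores.get("R9X 9X9")
```

    Here the shared postal code receives 0.5, although the community DA on its own scores 1.5. Residents of that DA would therefore appear to live in more favourable conditions than their own DA suggests, which is why DA-level assignment is preferable for small-area analyses.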
Differences Between the Original SEFI and SEFI-2
    SEFI-2 is a simplified version of the original SEFI, which utilizes prior factor scores of multiple education and employment variables, an additional measure of single parent families and an age-dependency ratio. Most importantly, because of data restrictions in Censuses prior to 2001, the original SEFI did not include a measure of income in its calculation of socioeconomic risk. The SEFI-2 was developed to take advantage of the available Census income data.

    There are two notable differences between the original SEFI and the SEFI-2. The first relates to the geographic level of aggregated Census data. The SEFI-2 uses the dissemination area (DA) as the smallest aggregation level. Prior to the 2001 Census, the Enumeration Area (EA) was the smallest aggregation level of Census data.

    The second difference between the SEFI and SEFI-2 is the number of variables used to develop the index. The original SEFI uses six variables derived from Census data, including two variables that are principal components of other Census variables, and is created in a multi-stage process involving several principal components analyses. The SEFI-2 is created using only four variables that are either directly from or derived from Census data in a single factor analysis.

    The original SEFI variables include:

    • age dependency ratio;
    • percent single parent families;
    • proportion of female single parent families;
    • labour force participation rate - female;
    • unemployment rate; and
    • proportion with a high school graduation certificate.

    The last two variables are derived from principal components analysis from several other variables. See the Socioeconomic Factor Index (SEFI) - Based on the 1986, 1991, and 1996 Census Data and the Socioeconomic Factor Index (SEFI) - Based on the 2001 Census Data concepts for more detailed information on this method.

    For the SEFI-2, three variables that were part of the original SEFI were removed and one new variable was added, as follows:

    • age-dependency ratio - removed because its loading on the original SEFI is quite low;
    • the proportion of female single parent families - removed because it was deemed redundant since we included the proportion of single parent families and most of these are female-based;
    • female labour force participation rate - removed because it was deemed redundant since we included the overall unemployment rate; and
    • average household income - added because it is a measure of socioeconomic status and is now available in the Census data due to changes made in the reporting of income, as well as in the way Census data is now disseminated.
MCHP Research Using the SEFI-2 Methodology
    This section identifies a sample of MCHP deliverables (reports) that have used the SEFI-2 in their research. For each publication there is a brief description of how the index was used and links to specific information available in the publication.
Metge et al. (2009)
    In the Composite Measures/Indices of Health and Health System Performance deliverable, Metge et al. (2009) investigated and developed a number of composite indices designed to measure specific aspects of health, use of health services and health system performance. The pros and cons of using composite indices are presented in Table 1.1 of the deliverable.

    The SEFI-2 was one of the indices developed for this project, and represents the most recent version of the SEFI. The SEFI is used as a proxy measure of socioeconomic status. For more information on the SEFI and SEFI-2, please read section 8.6.1 Socioeconomic Factor Index.

    In this deliverable, measures for both the original SEFI and the new SEFI-2 were developed and presented. Graphic illustrations of the SEFI and SEFI-2 scores by Regional Health Authority (RHA) and Winnipeg Community Area (CA) are available in the deliverable.

    The deliverable found that there is a strong relationship between both the SEFI indices and premature mortality rate (PMR). Areas with a higher PMR tended to have higher SEFI and SEFI-2 scores, while areas with lower PMR tended to have lower SEFI and SEFI-2 scores.

    Additional work from this research provides comparative information on the SEFI and SEFI-2. The first document, titled sefi_vs_sefi2_Mar_31_2009.pdf, provides information on the correlation between the SEFI and SEFI-2, a list of simple statistics for the SEFI and SEFI-2, the SEFI Principal Component Analysis results, and the SEFI-2 Factor Analysis results. The second document, titled compare_sefi_2001_2006_Oct_8_2009.pdf, provides statistics by RHA on the SEFI-2 values calculated using either the 2006 or 2001 Census data.
Brownell et al. (2010)
    In the Evaluation of the Healthy Baby Program deliverable, Brownell et al. (2010) used the SEFI as an area-level socioeconomic status (SES) measure. The SEFI was used as a predictor variable in the regression analysis performed in the research.

    For more information on the use of SEFI in this research, please read the section titled Predictor Variables in this deliverable, and view various tables provided in this publication.
Fransoo et al. (2011)
Fransoo et al. (2013)
Fransoo et al. (2019)
    In The 2019 RHA Indicators Atlas deliverable, Fransoo et al. (2019) used the SEFI-2 as a measure of socioeconomic status (SES). Analyses were performed for the five current RHAs for 2011 and 2016.

    For more information on the methods and results found in this research, please see the deliverable.

Pros and Cons of Using the SEFI
  • SEFI-2 is easier to construct than the original SEFI and, by including a measure of income, has greater face validity than the original SEFI as a measure of socioeconomic status (SES);
  • the advantage of SES indicators such as SEFI over health-based indicators is that the SES indicators can be calculated for very small areas and are available every five years. Calculations for premature mortality (PMR) and life expectancy require either a much larger population for the same time period (i.e., a larger geographic area), or a much longer time period for a small geographic area, due to the rarity of the outcome (i.e., death);
  • the SEFI is based on aggregated data from Statistics Canada, some of which is suppressed, so imputation is required to fill in any missing values. Indicators such as PMR come from person-level data and no imputation is required.

SAS code and formats 

References
  • Brownell M, Chartier M, Au W, Schultz J. Evaluation of the Healthy Baby Program. Winnipeg, MB: Manitoba Centre for Health Policy, 2010.
  • Chateau D, Metge C, Prior H, Soodeen RA. Learning from the census: The socio-economic factor index (SEFI) and health outcomes in Manitoba. Canadian Journal of Public Health 2012;103(Suppl 2):S23-S27.
  • Fransoo R, Martens P, Prior H, Chateau D, McDougall C, Schultz J, McGowan K, Soodeen R, Bailly A. Adult Obesity in Manitoba: Prevalence, Associations, and Outcomes. Winnipeg, MB: Manitoba Centre for Health Policy, 2011.
  • Fransoo R, Martens P, The Need to Know Team, Prior H, Burchill C, Koseva I, Bailly A, Allegro E. The 2013 RHA Indicators Atlas. Winnipeg, MB: Manitoba Centre for Health Policy, 2013.
  • Fransoo R, Mahar A, The Need to Know Team, Anderson A, Prior H, Koseva I, McCulloch S, Jarmasz J, Burchill S. The 2019 RHA Indicators Atlas. Winnipeg, MB: Manitoba Centre for Health Policy, 2019.
  • Metge C, Chateau D, Prior H, Soodeen R, De Coster C, Barre L. Composite Measures/Indices of Health and Health System Performance. Winnipeg, MB: Manitoba Centre for Health Policy, 2009.