Concept: Socioeconomic Factor Index (SEFI) - Based on the 2001 Census Data
Last Updated: 2009-01-05
This concept describes the changes made in 2001 to the MCHP Socioeconomic Factor Index (SEFI) measure in response to changes in the 2001 Canadian Census data. It includes: background information about the SEFI; a brief description of the changes in the SEFI using 2001 Census data; and the method used to develop the SEFI based on these changes, including the variables used, how SEFI values were created and assigned, and the imputation strategy employed. This concept also contains a list of SEFI values by geographic area (e.g., RHA, RHA Districts, Neighbourhood Clusters) calculated from the 2001 Census data, and compares the 1996 and 2001 SEFI values resulting from the methods described here. MCHP research using this version of the SEFI is identified, with a description of how the SEFI is used.
NOTE: More recent information on the current methods and use of the SEFI in MCHP research is available in the Socioeconomic Factor Index (SEFI) - Version 2 (SEFI-2) concept.
Background Information on the SEFI
The Socioeconomic Factor Index (SEFI) is an index created at MCHP for examining the relationship of a population's socioeconomic characteristics to its health status and use of health care services. It provides an alternative to such measures as income quintiles and permits mapping of SEFI values according to geographic areas based on residence. The original SEFI is based on the earlier SERI measure (see the Socio-Economic Risk Index (SERI) concept) and combines the socioeconomic characteristics most strongly related to health outcomes into a single score (Martens et al., 2002). These characteristics include unemployment rates for various age groups, rates of adults who completed high school, rates of families headed by a lone parent, and rates of women in the workforce.
SEFI has been calculated at MCHP, using factor analysis, for four census years: 2001, 1996, 1991, and 1986. Significant changes in the 2001 Census data necessitated a review and update of the methods used at MCHP to calculate and use the SEFI in its analyses. These changes are described below. For more background information on the original MCHP SEFI, please read the Socioeconomic Factor Index (SEFI) - Based on the 1986, 1991 and 1996 Censuses concept.
Changes in the SEFI Using 2001 Census Data
With the 2001 Census, the most significant change to the SEFI was the geographic level at which the SEFI values were calculated. This changed from the larger municipal code / neighbourhood cluster (NC) areas to the smaller enumeration area (EA) / dissemination area (DA) level. The former method limited the geographic areas for which the SEFI could be calculated (i.e., only those areas fully defined by a municipality code or an NC) and population-weighted the area-level SEFI scores prior to the factor analysis. The SEFI can now be calculated at the smallest possible geographic area: the DA level using 2001 Census data and the enumeration area (EA) level using 1996 and earlier Census data, and then calculated for other geographic areas, such as Regional Health Authorities (RHA). This finer level of assignment allows for greater discrimination and power, for example, for regression analyses. SEFI is also recommended for grouping and ranking areas routinely used in MCHP research. SEFI is considered to be a stable, valid index since individual DA values are not reported and because the DA level values are weighted by population to form the larger area values.
Using DA as the level of assignment resulted in about 750 different SEFI values for Winnipeg (versus 25 using the larger-area NC method), and about 1800 SEFI values (versus 300 using the larger-area municipal code method) for all of Manitoba. A weighted average score can also be calculated for the 25 Neighbourhood Clusters (NC) recognized by the Winnipeg Regional Health Authority. The scores for the 25 NC areas can further be aggregated into four socioeconomic status (SES) groups based on standard deviations from the mean (based on the log value of SEFI, transformed into standard deviations) for the area definition chosen for analysis:
- more than 1 SD below the mean - highest SES (or most advantaged)
- less than 1 SD below the mean - middle SES
- less than 1 SD above the mean - middle-low SES
- more than 1 SD above the mean - lowest SES (or most disadvantaged)
See 2001 Census SEFI Values by Geographic Area for additional information.
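The four-group rule above can be sketched as follows. This is an illustration in Python, not MCHP's SAS code. The function name ses_group and the example areas and scores are hypothetical. The sketch assumes each area score has already been expressed as a number of standard deviations from the mean, and the handling of scores falling exactly on -1, 0 or 1 is an assumption, since the text does not specify it.

```python
def ses_group(z):
    """Assign an SES group from a standardized area score z (SDs from the mean).

    Lower scores indicate more favourable socioeconomic conditions.
    Boundary handling (z exactly -1, 0 or 1) is an assumption.
    """
    if z < -1:
        return "highest SES"
    if z < 0:
        return "middle SES"
    if z <= 1:
        return "middle-low SES"
    return "lowest SES"


# Hypothetical standardized scores for four areas (not real SEFI values).
example = {"Area A": -1.4, "Area B": -0.3, "Area C": 0.6, "Area D": 1.8}
groups = {area: ses_group(z) for area, z in example.items()}
```

Because the cut-points are fixed at one standard deviation, the groups need not contain equal numbers of areas or people.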
Methods to Develop the SEFI From 2001 Census Data
This section describes the methods used to develop the SEFI values using the 2001 Census data, including: a description of the Census variables used in the SEFI, including changes in the definition of key variables over time; how the SEFI index was developed, including the imputation strategy for dealing with small cell sizes, which are suppressed by Statistics Canada; a list of the 2001 SEFI values and a comparison between 1996 and 2001 SEFI values; and additional information, including a list of the ordering of SEFI values for different geographic areas and a list of SAS® formats available for using the SEFI measure in your research.
1. Census Variables
The SEFI components come from the DA/EA level of Census records provided to MCHP through the Statistics Canada Data Liberation Initiative (DLI). Information for each of the following variables was collected at the DA/EA level and used to calculate the SEFI. Some of the variables are available directly from the census and others are calculated from two or more variables on the census.
- Age dependency ratio - expresses the ratio of the dependent-age population in a region to the population aged 15-64. The age dependency ratio was tested in two ways: 1) Pop 65+ / Pop 15-64 and 2) (Pop 0-14 + Pop 65+) / Pop 15-64. The second method was selected for use, following Statistics Canada's definition. See the SEFI Age Variables and Age Dependency Ratio Development documentation and resulting age dependency ratio values.
- Percent single parent families - calculated by dividing the number of lone-parent families by the total number of census families. In the past (1996 census and earlier), the percent of single parent households among households with children aged 0-14 was used. This information was not available from the 2001 census. Hence this variable was replaced by the percent of families which are single parent. Census variables are:
  - famstat_totcenfam - total number of census families in private households (20% Sample Data)
  - famstat_totlpar - total lone-parent families (20% Sample Data)
- Percent female single parent families (see percent single parent families above) - calculated by dividing the number of female lone-parent families by the number of census families. Census variables are:
  - famstat_totcenfam - total number of census families in private households (20% Sample Data)
  - famstat_totflpar - female parent lone-parent families
- Labour force participation rate female - rate of women working or seeking work on census day, reported directly on the census files. The numerator is the number of females aged 15 or more in the labour force. The denominator is the population of all women aged 15 or more. Census variables are:
  - lf_fempoptot15 - total population, females 15 years and over - labour force activity
  - lf_femprate15 - participation rate, females 15 years and over
  - lf_fempop15 - population of females 15 years and over in the labour force
  Labour force definitions:
  - Labour Participation rate = Pop in Labour Force / Population Aged 15 and over * 100
  - Employment rate = Employed Labour Force / Population Aged 15 and over * 100
  - Unemployment rate = Unemployed Labour Force / Pop in Labour Force * 100
  - Labour Force = Employed + Unemployed
  - Total Pop = Pop in Labour Force + Pop Not in the Labour Force
- Unemployment rate 15-24, 25-34, 35-44, 45-54 - reported directly on the census. The unemployed include persons who, during the week prior to the census, were without work, had looked for work in the previous four weeks, and were available for work in the week of the census. The denominator for each age group was the count of the total labour force in that age group. See SEFI Unemployment Variables for more information.
- Proportion with a High School Graduation Certificate by Age Group 25-34, 35-44, 45-54 - the numerator is the age-group-specific population minus the number of residents on census day reporting less than a high school graduation certificate. The age-specific rates were computed by dividing this value by the total population in the age group. See SEFI High School Graduation Variables for more information.
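The ratio and rate definitions in this list can be illustrated with a short Python sketch (Python is used here for illustration only; it is not the SAS code used at MCHP). The function names and all counts are hypothetical, and the sketch does not handle suppressed or zero denominators.

```python
def age_dependency_ratio(pop_0_14, pop_15_64, pop_65_plus):
    """Method 2 from the text: (Pop 0-14 + Pop 65+) / Pop 15-64."""
    return (pop_0_14 + pop_65_plus) / pop_15_64


def percent_lone_parent(lone_parent_families, census_families):
    """Lone-parent families as a percentage of all census families."""
    return 100.0 * lone_parent_families / census_families


def labour_force_rates(employed, unemployed, not_in_labour_force):
    """Apply the labour force definitions to counts of persons aged 15+."""
    labour_force = employed + unemployed
    pop_15_plus = labour_force + not_in_labour_force
    return {
        "participation_rate": 100.0 * labour_force / pop_15_plus,
        "employment_rate": 100.0 * employed / pop_15_plus,
        "unemployment_rate": 100.0 * unemployed / labour_force,
    }


# Hypothetical counts for one small area (not real census values).
adr = age_dependency_ratio(pop_0_14=120, pop_15_64=400, pop_65_plus=80)
pct_lone = percent_lone_parent(lone_parent_families=30, census_families=150)
rates = labour_force_rates(employed=270, unemployed=30, not_in_labour_force=200)
```

Note that the unemployment rate uses the labour force as its denominator, while the participation and employment rates use the whole population aged 15 and over.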
2. SEFI Index Development
SEFI values were created and assigned in the following way:
- Factor analysis was carried out at the DA level first (without population weighting). The education variables and the unemployment variables were each reduced to a single factor (one education factor and one unemployment factor) using an un-weighted principal component analysis (SAS PRINCOMP procedure). These two factors were used in place of the seven education/unemployment variables in the calculation of the SEFI.
- SEFI values were assigned to postal codes using the Statistics Canada PCCF (postal code conversion file), which links each postal code to the "best" DA. The SEFI score was defined for each DA/EA using the first principal component factor from an un-weighted principal components analysis of the group of socioeconomic characteristics previously listed.
- Population-weighted average SEFI values were calculated for each area. For each postal code, a SEFI value was generated for the First Nations reserves in the postal code and for the non-First Nations communities in the postal code. The First Nations DAs are identified by the census by CSD type. The SEFI values were then linked to the 2001 Manitoba population file by postal code and municipality code type (0 = non First Nations municipality, 1 = First Nation Municipality). The geographic area level SEFI values are calculated by taking the mean of the values linked to the population. This method more accurately reflects the SEFI values of districts with First Nations reserve areas within them. Unlike the previous SEFI methodology, these values are NOT standardized to have a weighted mean of 0 and an un-weighted standard deviation of 1. This avoids the problem of changing geographic area SEFI values whenever there is a slight change in the number of geographic areas.
- The mean and standard deviation of the logarithm of the SEFI averages were calculated.
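The data-reduction step described above, taking the first principal component of a group of variables across small areas, can be sketched in Python. This is not the SAS PRINCOMP procedure itself: the sketch swaps in a simple power iteration, assumes the analysis is run on the correlation matrix of the variables, and uses hypothetical unemployment rates for six areas. The function names are illustrative.

```python
import math


def standardize(column):
    """Centre a column to mean 0 and scale it to sample SD 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def first_principal_component(rows, iterations=500):
    """Return (loadings, scores) for the first principal component.

    rows: one list of variable values per area (un-weighted).
    Power iteration on the correlation matrix is an assumed stand-in
    for a full eigen-decomposition.
    """
    n, p = len(rows), len(rows[0])
    cols = [standardize([row[j] for row in rows]) for j in range(p)]
    # Correlation matrix of the standardized columns.
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Component score of each area: standardized values times loadings.
    scores = [sum(cols[j][k] * v[j] for j in range(p)) for k in range(n)]
    return v, scores


# Hypothetical unemployment rates (%) for ages 15-24, 25-34, 35-44, 45-54.
unemployment = [
    [12.0, 6.0, 5.0, 4.0],
    [20.0, 11.0, 9.0, 8.0],
    [8.0, 4.0, 3.5, 3.0],
    [15.0, 9.0, 6.0, 6.5],
    [25.0, 14.0, 12.0, 10.0],
    [10.0, 5.0, 4.5, 3.5],
]
loadings, unemployment_factor = first_principal_component(unemployment)
```

The sign of a principal component is arbitrary in general, so in practice the direction of the resulting factor should be checked against the input variables (here, higher scores correspond to higher unemployment).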
Calculating SEFI Values for Larger Geographic Areas
Three methods for calculating the SEFI at the NC level and then grouping the NCs into 4 groups by SEFI were investigated and compared.
- Method 1 takes the weighted mean of the DA level SEFI values directly to the NC level and weights by the 2001 census population of Manitoba. No Manitoba Health Insurance Registry (Registry) population values are involved.
- Method 2 assigns the DA level SEFI values to the 2001 Registry population by postal code. The NC was defined from the Manitoba population using municipal code and postal code. The mean value of SEFI is calculated for each NC, which is essentially weighted by the Registry population.
- Method 3 assigns the DA level SEFI values to the 2001 Registry population by postal code. The NC was defined from the Manitoba population using postal code alone. The mean value of SEFI is calculated for each NC weighted by the Registry population. However, method 3 is problematic due to grouping by postal code only, and was dropped from consideration as a viable method.
Comparing the SEFI levels resulting from each of the methods shows some movement within the groupings between methods. Arguments can be made for choosing either Method 1 or Method 2, since both were deemed methodologically sound, but Method 2 was considered the most appropriate for the Inequalities in Child Health study (Brownell et al., 2004) because it uses the Registry population. This method reflects a small refinement of the earlier methods used to calculate SEFI values for larger geographic areas.
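The difference between Method 1 and Method 2 can be sketched in Python as two ways of averaging DA-level values up to a larger area. This is an illustration only, not MCHP's SAS code. The function names, area labels, postal code labels and values are all hypothetical.

```python
from collections import defaultdict


def weighted_mean_by_area(da_records):
    """Method 1 style: weight each DA's SEFI by its census population.

    da_records: iterable of (area, sefi, census_population), one per DA.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for area, sefi, population in da_records:
        totals[area][0] += sefi * population
        totals[area][1] += population
    return {area: s / p for area, (s, p) in totals.items()}


def person_mean_by_area(people, sefi_by_postal_code):
    """Method 2 style: link SEFI to each Registry person, then average.

    people: iterable of (area, postal_code), one per person.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for area, postal_code in people:
        sums[area][0] += sefi_by_postal_code[postal_code]
        sums[area][1] += 1
    return {area: s / n for area, (s, n) in sums.items()}


# Hypothetical DA-level values and census populations.
das = [("NC1", -0.5, 400), ("NC1", 0.5, 600), ("NC2", 1.0, 500)]
method1 = weighted_mean_by_area(das)

# Hypothetical Registry residents and postal-code-level SEFI values.
sefi_by_pc = {"PC_A": -0.5, "PC_B": 0.5}
residents = [("NC1", "PC_A")] * 3 + [("NC1", "PC_B")]
method2 = person_mean_by_area(residents, sefi_by_pc)
```

In the second function the weighting is implicit: each area mean is weighted by however many Registry residents are linked to each postal code, which is why the two methods can give slightly different area values.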
Imputation Strategy
To protect the confidentiality of individual responses on the Census, Statistics Canada has adopted a technique known as area suppression, or the deletion of all characteristic data for geographic areas below a specified size. Income distributions and related statistics are suppressed if the total non-institutional population in the area from either the 100% or 20% databases is less than 250. Other characteristics are suppressed if the total non-institutional population in the area from either the 100% or 20% databases is less than 40. Imputation methods were used to provide a value for the SEFI components that were missing at the DA/EA level, according to the strategy below:
- Non-First Nations Communities - socioeconomic characteristics were imputed for missing values at the DA/EA level using records at the census subdivision level if they were not defined as a First Nations community.
- First Nations Communities - socioeconomic characteristics were imputed for missing values at the DA/EA level for First Nations communities from the weighted north or south First Nation community average value according to whether the community was defined as northern or southern (by latitude).
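The two imputation rules above can be sketched in Python as a single fallback function. This is an illustration only. The function name, argument names and values are hypothetical, and a suppressed value is represented by None.

```python
def impute_component(da_value, is_first_nation, region, csd_value, fn_region_means):
    """Return a DA/EA-level value, imputing it when it is suppressed.

    da_value: the DA/EA value, or None if suppressed.
    region: "north" or "south" (used for First Nations communities only).
    csd_value: the census subdivision value (non-First Nations fallback).
    fn_region_means: weighted First Nations averages keyed by region.
    """
    if da_value is not None:
        return da_value
    if is_first_nation:
        return fn_region_means[region]
    return csd_value


# Hypothetical values for one SEFI component (e.g., an unemployment rate).
fn_means = {"north": 28.0, "south": 21.0}
kept = impute_component(9.5, False, None, 7.0, fn_means)
from_csd = impute_component(None, False, None, 7.0, fn_means)
from_fn = impute_component(None, True, "north", 7.0, fn_means)
```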
3. 2001 SEFI Values and 1996/2001 SEFI Comparisons by Geographic Regions
The following provides the SEFI values calculated for a variety of geographic regions using the 2001 Census data:
- SEFI value listings for RHA, RHA district, and Winnipeg NC. See SEFI Values by Geographic Area for more information.
NOTE: SEFI scores less than zero indicate more favourable socioeconomic conditions, while SEFI scores greater than zero indicate less favourable socioeconomic conditions.
The following provides an illustrated comparison of the 1996 SEFI values using the "old" methodology and the "newer" 2001 methodology at the RHA, RHA district, Winnipeg NC and Winnipeg Community Area (CA) levels:
- Comparison of SEFI Values Using Different Methods document.
The R² values at each of the geographic levels are quite high, which indicates a strong correlation between the values generated by the two methods. The R² value at the RHA district level is relatively low because a few of the districts had been assigned the same SEFI value using the old method. These districts could not be assigned unique SEFI values because they could not be fully defined by municipality code. The new 2001 method allows unique SEFI values to be calculated for all of the RHA districts.
A comparison of the 1996 SEFI using the 2001 methodology to the 2001 SEFI was done. There were 2318 DAs in 2001 and 1802 EAs in 1996. Of these, 1785 were comparable for calculating a correlation between the SEFI values for these small areas in 1996 and in 2001. The correlations are as follows:
- Pearson r = .693
- Spearman r = .637
These are quite good correlations for a variable such as this calculated at such a low level. Small changes in income, education, or single parent proportions could change values considerably, especially since it is not the same people answering the long form in every DA in both Censuses.
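For readers unfamiliar with the two coefficients reported here, the following Python sketch shows how each is computed for paired small-area values. The data are hypothetical (five areas, not the 1785 compared above), the function names are illustrative, and the Spearman coefficient is computed as the Pearson correlation of average ranks.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical SEFI values for the same five small areas in two census years.
sefi_1996 = [-1.2, -0.4, 0.1, 0.9, 2.0]
sefi_2001 = [-1.0, -0.6, 0.3, 0.2, 1.5]
r = pearson_r(sefi_1996, sefi_2001)
rho = spearman_r(sefi_1996, sefi_2001)
```

Pearson r reflects agreement in the actual values, while Spearman r reflects agreement in the ordering of areas, which is the more relevant property when the SEFI is used to rank or group areas.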
4. Additional Information
- The sefi_region3.txt conversion file contains the new 2001 ordering of SEFI based on DA level for RHAs, RHA Districts, Winnipeg Community Areas and Winnipeg Neighbourhood Clusters.
- SAS® formats available in the MCHP SAS® Format Library include:
  - to create SEFI from Winnipeg NC (Neighbourhood Clusters) values: sefigroup=put(wpg_nc,$sefincg.)
  - to create SEFI from RHA district values: sefigroup=put(district,$sefincg.)
MCHP Research using the 2001 SEFI Methodology
The following sections identify a sample of MCHP deliverables (reports) that use this version of the SEFI, briefly describe how the SEFI is used in the research, and provide links to relevant results / findings.
Brownell et al. (2004)
In the Manitoba Child Health Atlas deliverable, Brownell et al. (2004) used the SEFI as a measure of socioeconomic status (SES). They calculated SEFI scores for the 1146 dissemination areas (DAs) within Winnipeg and for the 1172 DAs outside of Winnipeg, using publicly available data from the 2001 Census. SEFI scores for each of the 25 Winnipeg Neighbourhood Clusters were calculated using a weighted average of the scores for each DA in that neighbourhood. Likewise, a SEFI score was calculated for each RHA District using a weighted average of the scores for each DA in that district. For ease of presentation, for both Winnipeg and non-Winnipeg areas, the neighbourhoods or districts were divided into four groups based on how different they were from the average score for all neighbourhoods or districts. Thus both Winnipeg and non-Winnipeg areas have four SEFI groups: Low SES (or most disadvantaged), Low-Mid SES, Middle SES, and High SES.
As shown in the Winnipeg map, the more disadvantaged areas (Low SES areas shown in red on the map) tend to be found in the central part of Winnipeg, with the most advantaged areas (High SES areas in dark green on the map) on the outskirts of the city. The Non-Winnipeg map (RHA Districts) shows where each of these four groups is located in non-Winnipeg areas of the province. It is clear that the more disadvantaged (red) areas tend to be in the northern parts of the province, with the more advantaged areas in the south central parts of Manitoba. It should be noted that the total number of people and the total number of children residing in these SES groups are not equal: the Middle SES category in Winnipeg has almost half of Winnipeg's total population, and the Middle SES category has just over half of the non-Winnipeg population.
Socioeconomic characteristics for each neighbourhood area of Winnipeg and for each non-Winnipeg RHA district can be found in the following tables and graphs:
- SEFI Variable Values by Region - lists SEFI variable values and identifies the SES of RHA Districts and Winnipeg NC using SES quartiles.
- Average Number of Children per Family for Winnipeg and Non-Winnipeg by SES (SEFI) Group
For more information on this research, please see the Manitoba Child Health Atlas 2004 Web Site.
Finlayson et al. (2007)
In the Allocating Funds for Healthcare in Manitoba Regional Health Authorities: A First Step--Population-Based Funding deliverable, Finlayson et al. (2007) investigated the SEFI (as a measure of socioeconomic status (SES)) as one of the top 5 factors expected to affect the need / use of health services. The SEFI was used in all statistical models as an independent variable / covariate.
Appendix B: Detailed Results in the deliverable provides the SEFI contribution to the calculations for all predictive models used in this research. Additional information on the parameter estimates used in this research is available in Table A.1 of this same appendix.
One of the important findings of this work was that community-level socioeconomic status is a better predictor of health services utilization than any of the other community characteristics that were considered: aboriginal population, older population, population density, infant mortality rate, etc. This is valuable information because it indicates that although aboriginal status and infant mortality rates (for example) may be important in determining health services use, socioeconomic status is able to explain more variability in utilization than the other factors.
Limitations and Future Development
Limitations related to the use of this version of the SEFI and suggested future developments include:
- Longitudinal analyses are currently problematic given that absolute values of SEFI cannot be compared over time (although relative values/proportions might be roughly compared).
- Statistical comparison tests are to be developed, to permit identifying significant differences from the mean.
- SEFI programming currently assigns only some PCH residents a SEFI value; others are removed where the DA consists solely of the PCH. This has implications for analyses using mortality calculations. Because income quintile assignment methods remove all PCH residents, the income quintile macro should be used to explicitly identify PCH postal codes and remove such residents.
- Future development might include categorizing SEFI similarly to income quintiles (e.g., into SEFI deciles).
Related concepts
- Income Quintiles - Child Health Income Quintiles
- Socio-Economic Risk Index (SERI)
- Socioeconomic Factor Index (SEFI) - Based on the 1986, 1991, and 1996 Census Data
- Socioeconomic Factor Index (SEFI) - Version 2 (SEFI-2)
Related terms
- Canadian Census Data
- Risk Factors
- Socio-Economic Factor Index (SEFI)
- Socio-Economic Indicators
- Socio-Economic Status (SES)
- Socioeconomic Factor Index (SEFI) - Version 2 (SEFI-2)
References
- Brownell M, Roos NP, Fransoo R, Guevremont A, Frohlich N, Kozyrskyj A, Bond R, Bodnarchuk J, Derksen S, MacWilliam L, Dahl M, Dik N, Bogdanovic B, Sirski M, Prior H. Manitoba Child Health Atlas. Winnipeg, MB: 2004. Available from: http://www.umanitoba.ca/centres/mchp/reports/child_inequalities/index.shtml. Accessed on: 2007 Oct 19.
- Finlayson GS, Forget E, Ekuma O, Derksen S, Bond R, Martens P, De Coster C. Allocating Funds for Healthcare in Manitoba Regional Health Authorities: A First Step--Population-Based Funding. Manitoba Centre for Health Policy, 2007.
- Martens P, Frohlich N, Carriere K, Derksen S, Brownell M. Embedding child health within a framework of regional health: Population health status and sociodemographic indicators. Canadian Journal of Public Health 2002;93(Suppl 2):S15-S20.
Manitoba Centre for Health Policy
Community Health Sciences, Max Rady College of Medicine,
Rady Faculty of Health Sciences,
Room 408-727 McDermot Ave.
University of Manitoba
Winnipeg, MB R3E 3P5 Canada