The Gender Inequality Index and its correlations in the UN Data Futures Platform data

By | 2022-02-23

The Data Futures Platform is a project of the United Nations Development Programme (UNDP) that provides data and visualization tools for different social and economic metrics of the world's countries. Its main aim is to show the impact of the COVID-19 pandemic, and it does so by presenting various insights grouped into five categories (health, social protection, economic recovery, economic impact, and social cohesion).

There is, however, also the Access All Data page, where you can select two country metrics from a long list and plot them against each other. These metrics include social factors like unemployment rates or the percentage of people living in urban areas, financial factors like inflation or GDP, and development indices like the Human Development Index or the Gender Inequality Index. That is all nice to plot and to look at, but there are so many metrics that it is hard to pick the ones that could be interesting. Fortunately, there is the option to download all the data, so I could automate this decision.

Still, there are 163 metrics, and looking at all of the more than 13,000 possible pairs is a bit much. So I fixed one of the two metrics: the Gender Inequality Index (GII). The GII is one of the Human Development Indices of the United Nations and measures (surprise!) the degree of inequality between the genders in a country. It combines different metrics in the categories Health, Empowerment, and Labor Market into a number between 0 (fairly equal) and 1 (as unequal as possible). What I wanted to learn is how the GII behaves compared to all the other metrics in the UNDP data.

For that purpose, I downloaded the UNDP data and wrote a little Python script to compute the correlations. To be specific, I used the pandas library to compute the Pearson coefficient r. This coefficient describes the linear correlation between two datasets as a number between -1 and +1. If it is +1, the data points lie exactly on a straight line with positive slope: whenever the data in one set increases, the data in the second set increases proportionally. Conversely, if it is -1, the points lie exactly on a line with negative slope: whenever the data in one set increases, the data in the second set decreases proportionally. If it is 0, there is no linear correlation between the two datasets.
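The script itself is not shown here, so the following is only a minimal sketch of such a computation with pandas. The numbers are toy values and the column names are made up for illustration; they are not taken from the UNDP download.

```python
import pandas as pd

# Toy data, NOT real UNDP values; all column names are hypothetical.
df = pd.DataFrame({
    "GII": [0.1, 0.2, 0.3, 0.4, 0.5],
    "falls_with_gii": [0.8, 0.6, 0.4, 0.2, 0.0],  # 1 - 2 * GII
    "rises_with_gii": [1.3, 1.6, 1.9, 2.2, 2.5],  # 1 + 3 * GII
    "unrelated": [1.0, -1.0, 0.0, -1.0, 1.0],
})

# Pairwise Pearson coefficients; keep only the column for the GII,
# drop the trivial GII-vs-GII entry, and sort from most negative to most positive.
r = df.corr(method="pearson")["GII"].drop("GII").sort_values()
print(r.round(2))
```

Note that the two exactly linear columns give r = -1 and r = +1 even though their slopes and offsets differ: r measures how well the points fit a line, not how steep that line is.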

If we compare our GII with some other metric, this means:

  • if r is close to -1, a decrease in gender inequality corresponds to a rise in that metric,
  • if r is close to 1, a decrease in gender inequality corresponds to a decline in that metric,
  • and if r is around 0, there is no obvious correlation between the behavior of the GII and that of the metric.

Just to be sure: we are talking about correlation, not causality. A high correlation indicates some relationship, but it does not tell us which is the cause and which is the effect. Moreover, some of the data used to compute the GII was likely also used in some other metrics of the UNDP data, in which case the correlations between the GII and these metrics are partly trivial. [1] With that in mind, let’s look at some selected results.

Metric                                                   r
Human development index (HDI)                        -0.92
Internet users, total (% of population)              -0.85
Government Effectiveness                             -0.83
Population ages 65 and above, % of total population  -0.81
GDP per capita                                       -0.79
Rule of law                                          -0.77
Can people be trusted                                -0.68
Net debt as percent of GDP                           -0.12
Share of female business owners of new LLCs          -0.06
GII – female labour force participation rate         -0.03
Large Enterprises employment, % of total              0.05
Mob violence                                         -0.12
Migrant Acceptance Index                              0.10
Unemployment rate                                     0.24
Multidimensional Poverty Index (MPI)                  0.71
Fragile State Index: Economy                          0.77
Economic Inequality                                   0.84
Fragile State Index: Demographic Pressure             0.89
Fragile State Index: Public Services                  0.91

This table shows metrics with large negative correlation, almost no correlation, and large positive correlation. However, before we interpret anything, we need to plot the data (you can do that yourself, too, on the Access All Data page of the UNDP).

Always plot the data!

[Image gallery, 20 images: plots of the GII against the metrics in the table above]

Aha, there are some outliers! In the metric Net debt as percent of GDP, we have Norway with a very low net debt (the top 3 with high net debt are Lebanon, Japan, and Italy). In the Share of female business owners of new LLCs (data from 2014-2018), there are quite a few zeros, remarkably also for countries with low GII (the four points in the bottom left are Sweden, France, the Republic of Korea, and Austria). In Mob violence, India tops the list by far, followed by Mexico, Brazil, and Bangladesh. However, this is an absolute measure (the sum of the incidents) and thus not super useful. Economy, Demographic Pressure, and Public Services are all parts of the Fragile State Index. These metrics seem to be given as integers. Also, we need to know that, contrary to what I expected from the name, a larger value in a metric related to the Fragile State Index corresponds to a worse state of the country with respect to that measure.
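Outliers like these can change the Pearson coefficient a lot, so it is worth recomputing r without them. The following is a small sketch with made-up numbers (not UNDP data); dropping the single largest value is just an arbitrary rule chosen for the illustration.

```python
import pandas as pd

# Made-up numbers, not UNDP data: ten countries where the metric rises with
# the GII, plus one low-GII country with an extreme value of the metric.
df = pd.DataFrame({
    "GII":    [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    "metric": [100.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})

# Series.corr computes the Pearson coefficient by default.
r_all = df["GII"].corr(df["metric"])

# Drop the row with the single largest value of the metric
# (an arbitrary outlier rule for this sketch).
trimmed = df.drop(df["metric"].nlargest(1).index)
r_trimmed = trimmed["GII"].corr(trimmed["metric"])

print(f"with outlier: {r_all:.2f}, without: {r_trimmed:.2f}")
```

In this toy example a single point is enough to hide an otherwise perfectly linear relationship, which is one more reason to look at the plot before trusting the number.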

The top of the list for negative correlations is the Human Development Index (HDI), and the corresponding plot shows a nice linear behavior. The HDI summarizes how well a country scores with respect to a long and healthy life of its citizens, access to knowledge, and the standard of living. Also pretty high up in the list of negative correlations to the GII are the percentage of internet users and the GDP per capita. This all indicates that rich countries are more equal with respect to gender than poorer countries. What I also found interesting is the negative correlation between the GII and the percentage of participants in the World Values Survey who confirmed that “most people can be trusted”. However, if you look at the corresponding plot, you’ll see that for countries with low GII this varies drastically, while a large gender inequality correlates well with low trust in others. On the side of positive correlations the picture is similar, with a worse state in an economic metric corresponding to higher gender inequality (remember that larger values in Economy, Demographic Pressure, and Public Services mean a worse state in these measures).

A good state of the economy correlates with low gender inequality.

But there are a few interesting observations if we take a look at the metrics that are uncorrelated with the GII. For example, the Share of female business owners of new LLCs and the GII – female labour force participation rate show almost no correlation with the GII, although the GII explicitly includes the labor market in its computation. It could be interesting to look at the raw data of the GII to understand why the correlation is missing here. Also, the GII is uncorrelated with the count of mob violence incidents (even if the outliers are removed) as well as with the Migrant Acceptance Index. Especially the last one puzzles me a bit, as I would have guessed that a society where genders are treated as equal would also treat foreigners as equal, but I might be wrong on that.

So, that’s it for now! It is good to see that there are few surprises in the strongly positively or negatively correlated metrics, as a surprise there rarely hints at a great discovery and more likely points to a problem in the data or in the reasoning. And it is also good to see that there are some surprises in the uncorrelated metrics that can be analyzed further. Now go ahead and do that!

  1. If you make a metric based on the sum of apples and books that you have at home and a metric based on the sum of apples and the number of toothbrushes, both will go up if you increase the number of apples.
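The effect described in this footnote can be checked with a small simulation. This is only a sketch: the counts are random numbers, and the sample size, the seed, and the range 0 to 9 are arbitrary choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 10_000                      # number of simulated households (arbitrary)

# Three independent counts per household, each uniform on 0..9.
apples = rng.integers(0, 10, size=n)
books = rng.integers(0, 10, size=n)
toothbrushes = rng.integers(0, 10, size=n)

metric_a = pd.Series(apples + books)
metric_b = pd.Series(apples + toothbrushes)

# Books and toothbrushes are independent, so any correlation between the
# two metrics comes only from the shared apples term.
r_shared = metric_a.corr(metric_b)
r_parts = pd.Series(books).corr(pd.Series(toothbrushes))

print(f"metrics: {r_shared:.2f}, books vs. toothbrushes: {r_parts:.2f}")
```

Because the three counts have the same variance here, the shared term accounts for half of each metric's variance, so the correlation between the two metrics should come out near 0.5, while books and toothbrushes on their own stay close to 0.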
