Math IA: analysis of the Lorenz curve, different methods of plotting the Lorenz curve

Author Niki Li
Course Foundations of Marketing
Institution University of Queensland
Pages 14

Summary

Maths IA on different methods of plotting the Lorenz curve...


Description

An evaluation of the different methods in calculating the Gini coefficient


Introduction

Due to the recent global COVID-19 pandemic, there have been devastating consequences for the global economy, notably a rise in economic inequality. The virus poses a particular risk to the lower classes of the population [ CITATION And20 \l 3081 ]. While studying development economics in my Economics HL class, I became interested in income inequality after witnessing how the more vulnerable socio-economic groups bear a larger burden of financial exposure and health risks. The ways in which each wealth class coped with the economic effects of the virus drew my attention to the drastic inequalities within nations of the global economy. In this International Baccalaureate Higher Level Economics course, I was introduced to a calculation called the Gini coefficient, a measure of inequality widely used by governments [ CITATION Mal11 \l 3081 ].

Background information

Lorenz curve

As illustrated in Figure 1, the Lorenz curve is a graphical representation of the income distribution across the population of an economy. It plots the cumulative proportion of the nation's income against the cumulative proportion of the population, producing the curved Lorenz curve [ CITATION Pra19 \l 3081 ]. The population is divided into quintiles: five groups, each containing 20% of the population, ranked from the lowest to the highest income. Within this graph, the line of perfect equality, represented by the 45° line, shows the case where each quintile earns the same share of income as every other quintile [ CITATION Mal11 \l 3081 ]. This line of perfect equality is not achieved in practice; it serves as a benchmark against which the Lorenz curve is compared to assess the income distribution. If income is distributed unequally within a nation, the Lorenz curve falls beneath the line of perfect equality [ CITATION Dam20 \l 3081 ]. From this, an indicator of inequality called the Gini coefficient can be calculated as the ratio of the area between the line of perfect equality and the Lorenz curve to the total triangular area under the line of perfect equality [CITATION Ken \l 3081 ].

Gini coefficient

The Gini coefficient is a measure of statistical dispersion representing the distribution of wealth, based on the net income of residents [ CITATION Smith2020 \l 3081 ]. The value of the coefficient ranges from 0 to 1, where 0 indicates perfect equality and 1 represents perfect inequality [ CITATION Juh19 \l 3081 ]. In Economics, the formula I have been taught to calculate the Gini coefficient is as follows:

$$\text{Gini coefficient} = \frac{A}{A+B}$$

Figure 1 – The Lorenz curve (self-constructed in Word)

Here, A represents the area between the line of perfect equality and the Lorenz curve, and B is the area under the Lorenz curve, as shown in Figure 1. A lower value therefore corresponds to greater equality, while a higher value represents greater inequality. The Gini coefficient calculated by the Australian Bureau of Statistics was 0.328 [ CITATION Aus19 \l 3081 ]. As this value is closer to 0 than to 1, it represents a relatively equal income distribution in Australia [ CITATION Mog19 \l 3081 ]. However, as the website does not state the method used, I will investigate how the Gini coefficient, typically a computer-generated value, can be quantified mathematically. I am curious to investigate which mathematical methods I could use from the scope of a Mathematics SL student.

To compare the methods, the accuracy of each will be measured by how close its calculated value is to the actual Gini coefficient of 0.328. Hence, the aim of this investigation is to determine which method is the most accurate in calculating the Gini coefficient. The first method I will use is the trapezium rule, which divides the area under the Lorenz curve into one trapezium per quintile; the areas of the trapeziums are calculated and totalled to give the total area under the Lorenz curve. Secondly, I will model the Lorenz curve as a quadratic function through three coordinates, where the three coefficients of the parabolic equation are identified by solving simultaneous equations. Then, through definite integration, the area between the line of perfect equality and the parabolic Lorenz curve will be calculated to identify areas A and B. Lastly, a fifth-degree polynomial Lorenz function will be constructed, where each coefficient of the function is calculated using the Vandermonde matrix. Then, the integration method of calculating the area between two curves will be used again to determine areas A and B.

Mathematical application

For this investigation, I will use the equivalised disposable household income statistics (income after tax) from the 2018 Household Income and Wealth summary produced by the Australian Bureau of Statistics (ABS). Although the data are published as percentages, I converted all percentages to decimals so that the values can be entered directly into the Vandermonde matrix. In addition, all final calculations will be presented to 3 decimal places to match the decimal places of the ABS Gini coefficient.

| Cumulative proportion of population | Proportional income share | Cumulative proportion of income |
|---|---|---|
| 0.000 | 0.000 | 0.000 |
| 0.200 | 0.074 | 0.074 |
| 0.400 | 0.125 | 0.199 |
| 0.600 | 0.170 | 0.369 |
| 0.800 | 0.227 | 0.596 |
| 1.000 | 0.404 | 1.000 |

Table 1 – Proportional income distribution within Australia, 2018–2019 [ CITATION Aus19 \l 3081 ]

To graph the Lorenz curve, the cumulative proportion of income must be calculated to form the y-values. Each cumulative proportion of income is the sum of the income share of that quintile and the shares of all lower quintiles. A worked example for the cumulative population proportion of 0.400 is shown below.

$$\text{cumulative}_{0.400} = 0.125 + 0.074 = 0.199$$
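The running total described above can be reproduced in a few lines. This is a minimal Python sketch, assuming the five quintile shares from Table 1 are entered lowest quintile first; the variable names are illustrative only.

```python
from itertools import accumulate

# Proportional income share of each quintile from Table 1 (lowest quintile first)
shares = [0.074, 0.125, 0.170, 0.227, 0.404]

# Cumulative proportion of income: a running total of the shares,
# with 0 prepended for the origin of the Lorenz curve.
# Rounding to 3 decimal places hides floating-point noise such as 0.19899999...
cumulative = [0.0] + [round(total, 3) for total in accumulate(shares)]

print(cumulative)
```

The final entry should be 1.0, since the five quintiles together receive all of the nation's income; this is a quick check that the shares were entered correctly.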

The data are plotted in Figure 2, showing the cumulative proportion of income for each cumulative proportion of the population. Along the line of perfect equality every quintile earns the same share of income, so it is represented by the equation $E(x) = x$. Throughout the investigation, the function of the Lorenz curve is written as $L_n(x)$, where $n$ is the method number, while the Gini coefficient is referred to as the constant $G$. As each method calculates the area under the curve, the resulting values of areas A and B are substituted back into $\frac{A}{A+B}$ to calculate the Gini coefficient.

Method 1 – Modelling the Lorenz curve using the trapezium rule

After considering the shape of the region under the Lorenz curve, I noticed that it resembled several trapeziums. Therefore, the first method I will use is the trapezium rule, in which the area under the curve (area B) is estimated by representing each quintile as a trapezium, as illustrated in Figure 2.

Figure 2 – Graph of the Lorenz curve with the quintiles divided into trapeziums for the trapezium rule

Here, each trapezium's area can be calculated using the formula:

$$\text{Area} = \frac{a+b}{2}\,h_1$$

The variables $a$ and $b$ represent the lengths of the two parallel sides of the trapezium, while $h_1$ represents the distance between these two sides. The parallel sides correspond to the $y_n$ values, as illustrated in Figure 2. The calculated trapezium areas are displayed in Table 2 below. However, a slight adjustment is needed: because the Lorenz curve passes through the origin, the first quintile forms a triangle instead of a trapezium. To account for this, the first quintile's area is calculated using the formula for the area of a triangle:

$$\text{Area} = \frac{1}{2}\,y_1 h_1$$

In this formula, $h_1$ is the base length along the x-axis and the height is represented by $y_1$. The calculation of the triangular area is shown below:

$$\text{Area} = \frac{1}{2} \times 0.074 \times 0.200 = 0.0074$$

| Trapezium | T1 | T2 | T3 | T4 |
|---|---|---|---|---|
| Value of $a$ | 0.074 | 0.199 | 0.369 | 0.596 |
| Value of $b$ | 0.199 | 0.369 | 0.596 | 1.000 |
| Value of $h_1$ | 0.200 | 0.200 | 0.200 | 0.200 |
| Area | 0.027 | 0.057 | 0.097 | 0.160 |

Table 2 – Value of each variable used to calculate each trapezium's area

Following the calculation of the trapezium areas, the areas are summed to estimate the total area under the Lorenz curve, as shown below:

$$0.007 + 0.027 + 0.057 + 0.097 + 0.160 = 0.348$$

Once the areas of the triangle and all of the trapeziums have been summed to give area B, this total is subtracted from the total triangular area under the line of perfect equality (area A + B), which is calculated using the following formula:

$$\text{Area}_{A+B} = \frac{1}{2}\,y_5 \times h_2$$

Here, $y_5$ is the length of the side of the triangle measured along the y-axis, while $h_2$ is the length of its base measured along the x-axis. Therefore, $y_5 = 1.000$ and $h_2 = 1.000$.

$$\text{Area}_{A+B} = \frac{1}{2}(1.000) \times (1.000) = 0.500$$

Lastly, to calculate area A, area B is subtracted from the area under the line of perfect equality. The previously calculated values are substituted into the following formula:

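$$\text{Area}_A = \left(\frac{1}{2}\,y_5 \times h_2\right) - \left(\frac{1}{2}\,h_1\left[y_1 + 2\left(y_2 + y_3 + \dots + y_{n-1}\right) + y_n\right] + \frac{1}{2}\,y_1 \times h_1\right), \quad n = 5$$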

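Using the unrounded value of area B ($0.0074 + 0.0273 + 0.0568 + 0.0965 + 0.1596 = 0.3476$):

$$\text{Area}_A = (0.500) - (0.3476) = 0.1524 \approx 0.152$$

Therefore, to calculate the Gini coefficient, areas A and A + B are substituted into the original formula:

$$G = \frac{0.1524}{0.500} = 0.3048$$

$$\therefore G \approx 0.305$$

The Gini coefficient calculated using the trapezium rule has a percentage error of approximately 7.0% compared with the actual value of 0.328. This was calculated using the formula: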

$$\%\,\text{error} = \frac{\text{actual} - \text{experimental}}{\text{actual}} \times 100\%$$

This underestimate is expected from the method: because the Lorenz curve is convex, the straight top edge of each trapezium lies on or above the curve, so the trapezium rule overestimates area B and therefore underestimates area A. Nevertheless, the result indicates that the trapezium rule is a reasonably accurate way of estimating the Gini coefficient. Despite the small difference, it also validates the approach of calculating area A by subtracting area B from the area under the line of perfect equality.
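The whole of Method 1 can be checked with a short script. The sketch below is an illustration rather than the author's working: it uses Python's `fractions.Fraction` to keep the arithmetic exact, treats the first quintile as a trapezium with a = 0 (which reduces to the triangle formula ½y₁h₁), and takes 0.328 as the ABS value quoted in the text. The function names are hypothetical.

```python
from fractions import Fraction

# Table 1 data as exact fractions: cumulative population (x), cumulative income (y)
xs = [Fraction(s) for s in ["0", "0.2", "0.4", "0.6", "0.8", "1"]]
ys = [Fraction(s) for s in ["0", "0.074", "0.199", "0.369", "0.596", "1"]]
ABS_GINI = 0.328  # value reported by the ABS

def trapezium_area_under_curve(xs, ys):
    """Estimate area B as the sum of (a + b) / 2 * h over each quintile.

    For the first quintile a = 0, so the term equals the triangle 1/2 * y1 * h1.
    """
    return sum((ys[i] + ys[i + 1]) / 2 * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

def percentage_error(actual, experimental):
    return abs(actual - experimental) / actual * 100

area_b = trapezium_area_under_curve(xs, ys)
area_a_plus_b = Fraction(1, 2)   # triangle under E(x) = x: 1/2 * 1.000 * 1.000
area_a = area_a_plus_b - area_b
gini = area_a / area_a_plus_b

print(f"B = {float(area_b):.4f}, A = {float(area_a):.4f}, G = {float(gini):.4f}")
# The error is computed from G rounded to 3 d.p., following the text's convention
print(f"percentage error = {percentage_error(ABS_GINI, round(float(gini), 3)):.1f}%")
```

Because the fractions are exact, any difference from a hand calculation comes from the working or its rounding, not from floating-point error.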

Method 2 – Modelling the Lorenz curve as a quadratic function

I then decided to investigate further and search for another method that could calculate the Gini coefficient more accurately. I remembered the formula for the area between two curves from my SL Mathematics class. To calculate area A, the definite integral of the bottom curve is subtracted from the definite integral of the top curve between given limits on the x-axis, using the formula:

$$\int_a^b \left(\text{top curve} - \text{bottom curve}\right)dx$$

The variables $a$ and $b$ are the lower and upper limits respectively. Throughout this investigation the limits are the same, $a = 0$ and $b = 1$, because the Lorenz curve only spans cumulative population proportions from 0 to 1.

When each area has been calculated, it is substituted into the original Gini coefficient formula, which can be expressed as:

$$G = \frac{\int_0^1 \left(E(x) - L_n(x)\right)dx}{\int_0^1 E(x)\,dx}$$

where $\int_0^1 \left(E(x) - L_n(x)\right)dx$ represents area A, while $\int_0^1 E(x)\,dx$ represents area A + B.

While investigating the Lorenz curve, I wondered what kind of equation could be used to model it. Since the curve appeared roughly parabolic, the first model I devised was a quadratic function:

$$L_2(x) = ax^2 + bx + c$$

To determine the values of $a$, $b$ and $c$, simultaneous equations can be used to solve for each value algebraically [CITATION Con \l 3081 ]. To improve accuracy, the data points are taken from the beginning, middle and end of the data set so that they are evenly spread. Therefore, the points with cumulative population proportions of 0.000, 0.600 and 1.000 are used to construct the simultaneous equations. To begin, the first data point (0.000, 0.000) is substituted into the model equation to determine the value of $c$, the y-intercept.

$$L_2(0.000) = a(0.000)^2 + b(0.000) + c = 0.000 \quad (1)$$

$$c = 0.000$$

Hence $c = 0.000$, so the y-intercept is 0, as expected for a Lorenz curve that passes through the origin. Next, the point (0.600, 0.369) is substituted into the model equation to form equation (2).

$$L_2(0.600) = a(0.600)^2 + b(0.600) = 0.369$$

$$0.369 = 0.360a + 0.600b \quad (2)$$

The point (1.000, 1.000) is then substituted in the same way to create equation (3).

$$L_2(1.000) = a(1.000)^2 + b(1.000) = 1.000$$

$$1.000 = 1.000a + 1.000b \quad (3)$$

Equations (2) and (3) are then solved simultaneously to determine the value of $a$, using the elimination method. Equation (2) is multiplied by a factor of $\frac{5}{3}$ so that its $b$ term is identical to that of equation (3), giving equation (4):

$$\frac{5}{3} \times (2): \quad 0.615 = 0.600a + 1.000b \quad (4)$$

Equation (4) is then multiplied by $-1$ to allow the $b$ term to be eliminated:

$$-0.615 = -0.600a - 1.000b \quad (5)$$

Adding equations (5) and (3) eliminates $b$, so $a$ can be determined:

$$\begin{aligned}
1.000 - 0.615 &= (1.000 - 0.600)a \\
0.385 &= 0.400a \\
a &= \frac{0.385}{0.400} = 0.9625 \approx 0.963
\end{aligned}$$

After identifying $a$, it is substituted back into equation (2) to determine $b$:

$$\begin{aligned}
0.369 &= 0.360(0.963) + 0.600b \\
0.600b &= 0.369 - 0.34668 = 0.02232 \\
b &= \frac{0.02232}{0.600} = 0.0372 \approx 0.037
\end{aligned}$$

Lastly, the values of $a$ and $b$ are substituted into the model quadratic equation to produce the following function:

$$L_2(x) = 0.963x^2 + 0.037x$$

From this, the function can be graphed electronically to examine how closely it passes through the data points.
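The elimination above can be cross-checked by writing the three equations as a matrix and solving them with Gauss-Jordan elimination, which applies the same elimination idea systematically. This Python sketch is an illustration under stated assumptions: the helper `solve_linear` is hypothetical (written here, not taken from a library), and exact fractions are used so that no rounding happens between steps.

```python
from fractions import Fraction

def solve_linear(matrix, rhs):
    """Solve matrix * unknowns = rhs by Gauss-Jordan elimination (exact fractions)."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Choose a row with a non-zero entry in this column and move it up
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot entry becomes 1
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this unknown from every other row
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

# Points used in the text: (0, 0), (0.6, 0.369) and (1, 1)
points = [(Fraction("0"), Fraction("0")),
          (Fraction("0.6"), Fraction("0.369")),
          (Fraction("1"), Fraction("1"))]

# Each row is [x^2, x, 1], matching the unknowns (a, b, c) of ax^2 + bx + c
matrix = [[x ** 2, x, Fraction(1)] for x, _ in points]
rhs = [y for _, y in points]

a, b, c = solve_linear(matrix, rhs)
print(f"a = {float(a)}, b = {float(b)}, c = {float(c)}")
```

Solved without intermediate rounding, $b$ comes out as 0.0375 rather than 0.037. The value in the text comes from substituting the rounded $a \approx 0.963$ back into equation (2); the rounded pair still satisfies $a + b = 1$, so the fitted curve still passes through (1, 1).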


Figure 3 – Quadratic graph of the Lorenz curve accompanied by the line of perfect equality.

From the quadratic equation, the formula for the area between two curves can be used to calculate area A. Firstly, the top and bottom curves are identified as $E(x) = x$ and $L_2(x) \approx 0.963x^2 + 0.037x$ respectively, and they are substituted into the formula for the area between two curves. The limits for this integral are $x = 0$ and $x = 1$, as these are the points of intersection of the two curves and the values of the first and last data points.

$$\begin{aligned}
\text{Area}_A &= \int_0^1 \left(E(x) - L_2(x)\right)dx \\
&= \int_0^1 \left(x - (0.963x^2 + 0.037x)\right)dx \\
&= \left[\frac{x^2}{2} - \frac{0.963x^3}{3} - \frac{0.037x^2}{2}\right]_0^1
\end{aligned}$$

The limits 1 and 0 are then substituted into the integrated expression:

$$\begin{aligned}
\text{Area}_A &= \left(\frac{(1)^2}{2} - \frac{0.963(1)^3}{3} - \frac{0.037(1)^2}{2}\right) - \left(\frac{(0)^2}{2} - \frac{0.963(0)^3}{3} - \frac{0.037(0)^2}{2}\right) \\
&= 0.1605 \\
&\approx 0.161
\end{aligned}$$

Then the value of area A and the value of area A + B (0.500, calculated in Method 1) are substituted into the formula for the Gini coefficient:

$$G = \frac{0.1605}{0.500} = 0.321$$
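Because $L_2(x)$ is a polynomial, the definite integral can also be evaluated exactly by applying the power rule term by term. The following Python sketch illustrates this; the helper names are hypothetical, and coefficients are listed in ascending powers (constant term first) and passed as strings so that `Fraction` stores them exactly.

```python
from fractions import Fraction

def integral_0_to_1(coeffs):
    """Exact integral from 0 to 1 of c0 + c1*x + c2*x^2 + ...

    By the power rule, c_k * x^k integrates to c_k * x^(k+1) / (k+1),
    which equals c_k / (k+1) at x = 1 and 0 at x = 0.
    """
    return sum(Fraction(c) / (k + 1) for k, c in enumerate(coeffs))

def gini_from_polynomial(lorenz_coeffs):
    area_a_plus_b = integral_0_to_1(["0", "1"])              # E(x) = x
    area_a = area_a_plus_b - integral_0_to_1(lorenz_coeffs)  # integral of E(x) - L(x)
    return area_a, area_a / area_a_plus_b

# L2(x) = 0.963x^2 + 0.037x, written in ascending powers
area_a, gini = gini_from_polynomial(["0", "0.037", "0.963"])
print(f"A = {float(area_a):.4f}, G = {float(gini):.3f}")

error = abs(0.328 - round(float(gini), 3)) / 0.328 * 100
print(f"percentage error = {error:.1f}%")
```

Passing the unrounded coefficients 0.9625 and 0.0375 instead gives $G = 77/240 \approx 0.321$ as well, so rounding the coefficients does not change the result at three decimal places.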

The Gini coefficient calculated from the quadratic function has a percentage error of approximately 2.1%. This may be due to the discrepancies visible in Figure 3, where the curve deviates slightly from the data at cumulative population proportions of 0.200, 0.400 and 0.800, because the quadratic function only uses the points at 0.000, 0.600 and 1.000. The difference between the actual Gini coefficient and the value produced from the quadratic function is very small, suggesting that this method is reasonably successful and accurate. Additionally, the quadratic function satisfies the requirement that the Lorenz curve passes through the origin (0.000, 0.000) and through (1.000, 1.000). However, the remaining difference indicates minor inaccuracies, prompting me to look for another method whose value could be closer to the actual Gini coefficient and which ideally fits all of the data points.

Method 3 – Modelling the Lorenz curve as a polynomial function using the Vandermonde matrix

As I was not completely satisfied with the previously calculated Gini coefficients, I continued to research a method to construct a more accurate Lorenz curve. My research led me to a method similar to the quadratic function, called a polynomial function, which would construct an equation using all six coordinates rather than only three. A polynomial function takes the form

$$y = C_0 + C_1x + C_2x^2 + C_3x^3 + \dots + C_nx^n$$

where the $C$ terms denote the set of coefficients and $n$ is the degree of the polynomial. To maximise accuracy, I decided to use a fifth-degree polynomial: six points with distinct x-values determine a unique polynomial of degree at most five, so five is the highest degree whose coefficients can be fully determined by all six coordinates, and the resulting curve passes through every data point exactly. As the polynomial function uses all of the data points, it is presumed to model the most accurate Lorenz curve. The fifth-degree polynomial function of the Lorenz curve is expressed as follows:

$$L_3(x) = C_5x^5 + C_4x^4 + C_3x^3 + C_2x^2 + C_1x + C_0$$

To calculate the set of coefficients, the simultaneous elimination method could be used. However, this would be time-consuming and inefficient, which prompted me to research a more efficient way. In doing so, I found the Vandermonde matrix, which is used when interpolating a polynomial. By substituting the data points, the set of coefficients can be calculated [ CITATION Harnd \l 3081 ]. The matrix takes the following form:

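$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\
1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^{n-1}
\end{bmatrix}
\begin{bmatrix} C_0 \\ C_1 \\ \vdots \\ C_{n-1} \end{bmatrix}$$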

As there are six data points, the matrix has six rows, and as the fifth-degree polynomial has six unknown coefficients, it has six columns. Therefore, the matrix for the polynomial function can be expressed as:

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix} =
\begin{bmatrix}
1 & x_1 & x_1^2 & x_1^3 & x_1^4 & x_1^5 \\
1 & x_2 & x_2^2 & x_2^3 & x_2^4 & x_2^5 \\
1 & x_3 & x_3^2 & x_3^3 & x_3^4 & x_3^5 \\
1 & x_4 & x_4^2 & x_4^3 & x_4^4 & x_4^5 \\
1 & x_5 & x_5^2 & x_5^3 & x_5^4 & x_5^5 \\
1 & x_6 & x_6^2 & x_6^3 & x_6^4 & x_6^5
\end{bmatrix}
\begin{bmatrix} C_0 \\ C_1 \\ C_2 \\ C_3 \\ C_4 \\ C_5 \end{bmatrix}$$

The ascending $x_n$ values refer to the cumulative proportion of the population, while the ascending $y_n$ values denote the cumulative proportion of income shared amongst the nation. Furthermore, each $x_n$ value is raised to ascending powers across the columns.

Firstly, the data values from Table 1 are substituted into the matrix as follows:

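$$\begin{bmatrix} 0 \\ 0.074 \\ 0.199 \\ 0.369 \\ 0.596 \\ 1 \end{bmatrix} =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0.2 & 0.04 & 0.008 & 0.0016 & 0.00032 \\
1 & 0.4 & 0.16 & 0.064 & 0.0256 & 0.01024 \\
1 & 0.6 & 0.36 & 0.216 & 0.1296 & 0.07776 \\
1 & 0.8 & 0.64 & 0.512 & 0.4096 & 0.32768 \\
1 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}
\begin{bmatrix} C_0 \\ C_1 \\ C_2 \\ C_3 \\ C_4 \\ C_5 \end{bmatrix}$$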


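The matrix equation is then rearranged to isolate the unknown coefficients by inverting the matrix, so that the vector of coefficients equals the inverse of the Vandermonde matrix multiplied by the vector of y-values. Due to the size of the matrix, the multiplication and inversion were completed using a GDC...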
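The GDC step can also be reproduced in software. The Python sketch below is illustrative and makes a few assumptions: it solves the system V·C = y by Gauss-Jordan elimination with exact fractions rather than forming the inverse matrix explicitly (both give the same coefficients when V is invertible, as it is for distinct x-values), and the helper names `vandermonde` and `solve_linear` are hypothetical. It then integrates $L_3(x)$ term by term to obtain area B and hence $G$.

```python
from fractions import Fraction

def vandermonde(xs):
    """Row i is [1, x_i, x_i^2, ..., x_i^(n-1)]: ascending powers across the columns."""
    n = len(xs)
    return [[x ** k for k in range(n)] for x in xs]

def solve_linear(matrix, rhs):
    """Solve matrix * unknowns = rhs by Gauss-Jordan elimination (exact fractions)."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Move a row with a non-zero pivot into place, then scale the pivot to 1
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear this column from every other row
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

# Table 1: cumulative population (x) and cumulative income (y)
xs = [Fraction(s) for s in ["0", "0.2", "0.4", "0.6", "0.8", "1"]]
ys = [Fraction(s) for s in ["0", "0.074", "0.199", "0.369", "0.596", "1"]]

# Coefficients C0..C5 of L3(x) = C0 + C1 x + ... + C5 x^5
coeffs = solve_linear(vandermonde(xs), ys)

# Area B: integral of L3 from 0 to 1, term by term (power rule)
area_b = sum(c / (k + 1) for k, c in enumerate(coeffs))
area_a_plus_b = Fraction(1, 2)      # area under E(x) = x
gini = (area_a_plus_b - area_b) / area_a_plus_b

for k, c in enumerate(coeffs):
    print(f"C{k} = {float(c):.4f}")
print(f"G = {float(gini):.3f}")
```

Solving the system directly is usually preferred to computing an explicit inverse, since it needs fewer operations; with exact fractions both routes give identical coefficients, and a quick check is that the fitted polynomial reproduces every value in Table 1.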

