
MIT8107 - Advanced Database Systems

Entropy and Information Gain (Data Warehousing – Decision Trees)

Nickson Mwikya

137440


Entropy and Information Gain in Decision Tree

Decision Tree is one of the most popular algorithms in machine learning. It is relatively simple, yet able to produce good accuracy. But the main reason it is widely used is its interpretability: we can see quite clearly how it works. Decision Tree is a supervised machine learning algorithm: we train the model on a dataset so that it learns, and then we can use it to predict the output. It can be used for both regression and classification. Regression is when the output is a number; classification is when the output is a category. In this article I would like to focus on Entropy and Information Gain, using investment funds as an example.

Entropy

In thermodynamics, entropy is the level of disorder or randomness in a system. Similarly, in data analytics, entropy is the level of disorder or randomness in the data. If we have 100 numbers and all of them are 5, then the data is in very good order. The level of disorder is zero; there is no randomness in the data. Everywhere you look you get 5. The entropy is zero. If these 100 numbers contain different numbers, then the data is in a disordered state and the level of randomness is high. When you pick a number you might get 4, or 7, or any other number; you don’t know what you are going to get. The data is “completely” random, and the entropy of the data is very high.

The distribution of the different numbers in the data determines the entropy. If there are 4 possible numbers and they are distributed 25% each, then the entropy is at its maximum. But if they are distributed 97%, 1%, 1%, 1% then the entropy is very low. And if it’s 70%, 10%, 10%, 10% the entropy is somewhere in between (medium). The minimum value of entropy is 0; for a two-class outcome the maximum is 1 (in general, log2 of the number of classes).

Information Gain

Now that we have a rough idea of what entropy is, let’s try to understand Information Gain. A Decision Tree consists of many levels. In the picture below it consists of 2 levels. Level 1 consists of node A. Level 2 consists of node B and node C.

[Figure: a two-level decision tree – node A at level 1; nodes B and C at level 2]

Information Gain is the decrease in entropy from one level to the next. Node B has entropy = 0.85, a decrease of 0.1 from Node A’s entropy of 0.95. So Node B has an Information Gain of 0.1. Similarly, Node C has an Information Gain of 0.95 – 0.75 = 0.2.

When the entropy goes down from 0.95 to 0.75, why do we say that the amount of information is higher (a gain)? Higher entropy means the data is more mixed and unpredictable; lower entropy means the data is purer and more predictable. When the entropy decreases we know more about the outcome than we did before, so we have “additional” information. That is Information Gain.

Calculating Entropy

Now we know what Entropy is and what Information Gain is. Let us now calculate the entropy. First let’s find the formula for entropy. In thermodynamics, entropy is the logarithmic measure of the number of states. In information theory, entropy is the average of the information content (link). The information content I of an event E1 is the log of 1/(the probability of E1):

I1 = log(1/p(E1)) = -log p(E1)

If we have another event E2, its information content is I2 = log(1/p(E2)) = -log p(E2). The average of the information contents I1 and I2 (i.e. the entropy) is the sum over the events of (the information content of the event x the probability of that event occurring):

I1 x p(E1) + I2 x p(E2) = -p(E1) x log p(E1) - p(E2) x log p(E2)

If we have i events, the entropy is:

Entropy = -sum over i of ( p(Ei) x log p(Ei) )

Fund Price

Now that we know how to calculate entropy, let us try to calculate the entropy of the probability of the price of a fund going up in the next 1 year.
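As a quick check, here is a minimal Python sketch of this formula (my own illustration, not from the original article); the example distributions are the ones discussed in the Entropy section above:

    import math

    def entropy(probs):
        # Shannon entropy in bits: -sum(p * log2(p)), skipping zero probabilities.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Distributions of 4 possible numbers, from the Entropy section:
    print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0   -> maximal (log2 of 4 classes)
    print(entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.24 -> very low
    print(entropy([0.70, 0.10, 0.10, 0.10]))  # ~1.36 -> in between (medium)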

[Table: the fund dataset – the four features for each fund and the fund price 1 year from now (Up or Down)]

In the above table, the last column is the price of a fund 1 year from now, which can be higher or lower than today. This is denoted with “Up” or “Down”. This price is determined from 4 factors or features:

1. The performance of the fund in the last 3 years (annualised, gross of fees). This past performance is divided into 3 buckets: “Less than 0” (down), “0 to 5%” (up) and “More than 5%” (up).

2. The interest rate, for example LIBOR GBP 1 Year today. Today’s interest rate is compared with the interest rate 1 year ago and divided into 3 buckets: higher than 1 year ago, lower than 1 year ago, or the same (constant).

3. The value of the companies that the fund invests in, obtained by comparing the book value to the share price of the company today, and the earnings (income) the companies make compared to the share price (cyclically adjusted). This company value factor is divided into 3 buckets: overvalued, undervalued and fair value.

4. The ESG factors, i.e. Environment, Social and Governance factors such as pollution, remuneration, the board of directors, employee rights, etc. This is also divided into 3 buckets: high (good), medium and low (bad).
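To make the shape of this dataset concrete, here is a minimal illustrative sketch in Python; the field names and the example row are my own placeholders, not taken from the original table:

    from dataclasses import dataclass

    @dataclass
    class FundRecord:
        # The four features, each with 3 buckets, plus the Up/Down label.
        performance: str    # "Less than 0", "0 to 5%", "More than 5%"
        interest_rate: str  # "Higher", "Lower", "Constant"
        value: str          # "Overvalued", "Undervalued", "Fair value"
        esg: str            # "High", "Medium", "Low"
        price_1y: str       # "Up" or "Down"

    # One illustrative row (not taken from the original table):
    example = FundRecord("More than 5%", "Lower", "Undervalued", "High", "Up")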


The Four Factors

1. Past performance. Funds which have been going up a lot generally have a tendency to revert to the mean, meaning the price is going to go down. But another theory says that if the fund price has been going up, it has a tendency to keep going up, because of momentum. Who is right is up for debate. In my opinion the momentum principle has a stronger effect than the “reversion to the mean” principle.

2. Interest rate. The value/price of the fund is not only affected by the companies or shares in the fund, but also by external factors, and the interest rate represents these external factors. When the interest rate is high, share price growth is usually constrained because more of investors’ money is invested in cash. On the contrary, when the interest rate is low, people don’t invest in cash and invest in shares (or bonds) instead. The factor we are considering here is the change in the interest rate, but the impact is generally the same: if the interest rate is going up, investment in equity decreases, putting pressure on the share price and resulting in a lower share price.

3. Value. If the company valuation is too high, investors become concerned, afraid that the price will go down. This concern creates pressure on the share price, and the share price eventually goes down. On the contrary, if the company valuation is low compared to similar companies in the same industry sector and country (and of similar size), then investors feel that the stock is cheap and are more inclined to buy, which naturally pushes the price up.

4. ESG. Factors like climate change, energy management, health & safety, compensation, product quality and employee relations can affect the company value. Good ESG scores usually increase the value of the companies in the fund, and therefore collectively increase the value of the fund. On the contrary, concerns such as accidents, controversies, pollution, excessive CEO compensation and issues with the auditability/control of the board of directors are real risks to the companies’ futures and therefore affect their share price.

Entropy at Top Level

Now that we know the factors, let us calculate the Information Gain for each factor (feature). This Information Gain is the entropy at the top level minus the weighted entropy at the branch level.


Of the total of 30 events, there are 12 “Price is down” events and 18 “Price is up” events. The probability of the price of a fund going down is 12/30 = 0.4 and the probability of it going up is 18/30 = 0.6. The entropy at the top level is therefore:

-U*Log(U,2) -D*Log(D,2), where U is the probability of Up and D is the probability of Down
= -0.6*Log(0.6,2) -0.4*Log(0.4,2)
= 0.97

Information Gain of the Performance branch

The Information Gain of the Performance branch is calculated as follows:
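This top-level figure can be checked with a small sketch, reusing the entropy helper defined earlier:

    up, down = 18 / 30, 12 / 30       # 18 "Up" and 12 "Down" events out of 30
    top_level = entropy([up, down])   # entropy() from the earlier sketch
    print(round(top_level, 2))        # 0.97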

First we calculate the entropy of the Performance branch for “Less than 0”, where U is the probability of the price going up when the performance is less than zero, and D is the probability of the price going down when the performance is less than zero:

= -0.5*Log(0.5,2) -0.5*Log(0.5,2) = 1

Then we calculate the entropy of the Performance branch for “0 to 5%”:

= -0.56*Log(0.56,2) -0.44*Log(0.44,2) = 0.99

Then we calculate the entropy of the Performance branch for “More than 5%”:

= -0.69*Log(0.69,2) -0.31*Log(0.31,2) = 0.89

Then we calculate the probabilities of “Less than 0”, “0 to 5%” and “More than 5%”, which are 8/30 = 0.27, 9/30 = 0.3 and 13/30 = 0.43. So if Performance was the first branch, it would look like this:

[Figure: Performance as the first branch, splitting into “Less than 0” (entropy 1, probability 0.27), “0 to 5%” (entropy 0.99, probability 0.3) and “More than 5%” (entropy 0.89, probability 0.43)]

Then we sum the weighted entropies for “Less than 0”, “0 to 5%” and “More than 5%” to get the total entropy for the Performance branch:

1 * 0.27 + 0.99 * 0.3 + 0.89 * 0.43 = 0.95

So the Information Gain for the Performance branch is 0.97 – 0.95 = 0.02.

Information Gain for the Interest Rate, Value and ESG branches

We can calculate the Information Gain for the Interest Rate branch, the Value branch and the ESG branch in the same way:
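The whole Performance calculation can be reproduced by continuing the earlier sketch; the per-bucket Up/Down counts below are inferred from the probabilities and bucket sizes quoted above (8 events at 50%/50%, 9 at 56%/44%, 13 at 69%/31%):

    # (up, down) counts per Performance bucket, inferred from the text:
    buckets = {"Less than 0": (4, 4), "0 to 5%": (5, 4), "More than 5%": (9, 4)}
    total = sum(u + d for u, d in buckets.values())   # 30 events

    weighted = sum((u + d) / total * entropy([u / (u + d), d / (u + d)])
                   for u, d in buckets.values())
    print(round(weighted, 2))              # 0.95
    print(round(top_level - weighted, 2))  # 0.02 -> Information Gain for Performance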

[Table: the Information Gain calculations for the Interest Rate, Value and ESG branches]

Why do we calculate the entropy? Because we need the entropy to know the Information Gain. But why do we need to know the Information Gain? Because the decision tree is more efficient if we put the factor with the largest Information Gain as the first branch (the highest level). In this case, the factor with the largest Information Gain is Value, with an Information Gain of 0.31. So Value should be the first branch, followed by ESG, Interest Rate and, last of all, Performance.
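Choosing the first branch then amounts to taking the feature with the maximum Information Gain. In the sketch below only the Value (0.31) and Performance (0.02) figures come from the text; the ESG and Interest Rate numbers are hypothetical placeholders that merely respect the ordering stated above:

    # Only the Value and Performance gains are given in the text; the ESG and
    # Interest Rate numbers are hypothetical placeholders respecting the stated order.
    gains = {"Value": 0.31, "ESG": 0.20, "Interest Rate": 0.10, "Performance": 0.02}
    print(max(gains, key=gains.get))                   # Value -> first branch
    print(sorted(gains, key=gains.get, reverse=True))  # full branch order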

