ISYE6501 Analytics Project LB PDF

Title	ISYE6501 Analytics Project LB
Course	Intro to Analytics Modeling
Institution	Georgia Institute of Technology
Pages	5
File Size	219.3 KB
File Type	PDF

Summary

Final Project...

Description

Industry: Energy & Utilities
Company: WaterBot
Case Study Link: https://www.ibm.com/case-studies/waterbot

Project Goals: WaterBot plans to build a Smarter City Management grid with an early warning detection system that can forecast changes and diagnose leaks up front to ensure efficient water management.

Analytics problem statement: How can we detect and predict whether a contaminant exists in the drinking water? Is the water safe to drink? And if a water leak is happening, what is the cause of it?

I would like to propose two steps for approaching this problem:
1. IoT infrastructure
2. Analytics

1. IoT infrastructure

Using IoT, we can place sensors in the water tank of each house. Multiple sensors are needed, because each one measures a particular type of contaminant. In addition to the contaminant sensors, we can install sensors that measure temperature, pressure and volumetric flow to detect any anomaly in the water flow. This set of sensors is placed inside the house. To detect leakage and contaminants further upstream, the same sensors also need to be placed along the water distribution pipes. The resulting readings can then be assessed with a time series model to find anomalies.

Several substances are treated as drinking water contaminants. The list below shows some of the major ones:
● Inorganic Compounds: Arsenic, Nitrate, and Lead
● Infectious agents: Bacteria, Viruses, and Parasites
● Organic Compounds: Atrazine, DEHP, TCE, and PCE
● Disinfection Byproducts: Trihalomethanes and Haloacetic Acids
● Radionuclides: Uranium and Radium

The data will then be sent to the cloud so that analytics can be performed. The water management control command will have the analytics and visualization tools to monitor the entire system and help solve this issue.

2. Analytics

We’ll use three analytics models:

a) Detect if a particular contaminant is high in each house using the Kernel CUSUM algorithm

Because there are multiple contaminants, we will have to build an individual model for each of them. As a first example, I will discuss bacteria detection.

• Given: Bacteria sensor data, as described in the IoT infrastructure section
• Use: CUSUM to detect an increase in contaminant concentration, because CUSUM is designed for change detection
• Result: An alert will be sent to both the house owner and the water management control command whenever the amount of bacteria increases
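A minimal sketch of the basic CUSUM rule for detecting an increase may help here. It is plain CUSUM, not the kernel variant, and it rests on assumptions: the baseline level, the slack value and the alert threshold are hypothetical tuning parameters, and the readings are made-up numbers rather than real sensor data.

```python
def cusum_increase(readings, baseline, slack, threshold):
    """Return the index of the first alert, or None if no alert is raised.

    Running statistic: S_t = max(0, S_{t-1} + (x_t - baseline - slack)).
    An alert is raised the first time S_t >= threshold.
    """
    s = 0.0
    for t, x in enumerate(readings):
        s = max(0.0, s + (x - baseline - slack))
        if s >= threshold:
            return t
    return None


# Hypothetical bacteria readings: one noisy spike (16), then a sustained rise.
readings = [10, 11, 9, 16, 10, 10, 13, 14, 14, 15]
alert_at = cusum_increase(readings, baseline=10, slack=1, threshold=8)
print(alert_at)  # 7
```

In this toy series the single spike of 16 does not raise an alert on its own. The alert comes at index 7, once the readings have stayed above the baseline for several steps.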

The advantage of using CUSUM to detect change, instead of simply applying a threshold to the raw data, is that it is more robust to sudden jumps caused by noise in the sensor measurements. The CUSUM model will trigger an alert only when there is a significant change in the sensor reading. We can further use this model to identify false alarms raised by these sensors.

Online change detection involves monitoring a stream of data for changes in the statistical properties of incoming observations. A good change detector will detect any changes shortly after they occur, while raising only a few false alarms. Although there are algorithms with confirmed optimality properties for this task, they rely on exact specification of the probability distributions involved, which limits their practicality. The Kernel CUSUM (KCUSUM) algorithm is a kernel-based variant of the Cumulative Sum (CUSUM) change detection algorithm that can detect changes under less restrictive assumptions. Instead of using the likelihood ratio, which is a parametric quantity, KCUSUM compares incoming data with samples from a reference distribution using a statistic based on the Maximum Mean Discrepancy (MMD) non-parametric testing framework. It is applicable in settings where a large amount of background data is available and it is desirable to detect a change away from this background setting. Because the test statistic has a random-walk structure, bounds on the performance of the algorithm can be derived, including the expected delay and the average time to false alarm.

b) Classify whether the water in each house is safe to drink.

The previous model, which uses Kernel CUSUM, will only detect an increase in each individual contaminant. Consider the following example case (the numbers here are purely illustrative): A virus alert will be triggered when the amount of virus is 2.2 or more (the green point is at a dangerous level). A bacteria alert will be triggered when the amount of bacteria is 14.5 or more (the blue point is at a dangerous level).
But there might be a case where the amount of virus is below its threshold and the amount of bacteria is also below its threshold, yet because both virus and bacteria are present, the water becomes very dangerous to drink (represented by the red point).

Point    Amount of bacteria    Amount of virus
Green    1                     2.2
Blue     15                    1
Red      14                    2

The above example uses only two variables, but we need to consider all of the contaminant variables. Before feeding this data into the classification model, we will smooth it to increase its signal-to-noise ratio. Exponential smoothing can be used to smooth out the sensor readings. The alpha value (smoothing constant) will depend on our confidence in each sensor reading. Taking alpha as the weight on the current observation, the more precise the sensor, the larger alpha can be (closer to 1), because we can trust the current measurement. A noisier sensor calls for a smaller alpha, which leans more on the previous smoothed value.

• Given: Each contaminant time series data
• Use: Exponential smoothing
• Result: Noise reduced data
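The smoothing step can be sketched as follows, assuming the common form S_t = alpha * x_t + (1 - alpha) * S_{t-1} with S_0 = x_0. The function name and the sample readings are illustrative only. With this convention, alpha close to 1 tracks the raw readings and alpha close to 0 smooths heavily.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: S_t = alpha * x_t + (1 - alpha) * S_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]  # initialise with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical noisy contaminant readings.
print(exponential_smoothing([10, 20, 10, 20], alpha=0.5))  # [10, 15.0, 12.5, 16.25]
```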

After obtaining the smoothed contaminant data, we need to classify whether the water is safe to drink. For this we can use a Support Vector Machine (SVM) model. SVM is a supervised machine learning algorithm that can be used for both classification and regression problems. In the SVM algorithm, we plot each data item as a point in n-dimensional space (here n is the number of contaminants), with the value of each feature being the value of a particular coordinate. We then perform classification by finding the hyperplane that best separates the two classes. Here we will use a soft margin.

An important consideration when choosing the hyperplane is the trade-off between false positives and false negatives. Here a false positive is when the model classifies safe water as unsafe, while a false negative is when the model classifies unsafe water as safe. In statistics this is known as the trade-off between Type I and Type II error. In our case a false negative, i.e. water that the model classifies as safe to drink but that is actually unfit for drinking, is far less acceptable than a false positive, because it affects the health and well-being of the public. Hence, it is better to err on the safe side.

• Given: Noise reduced contaminants data
• Use: Support vector machine
• Result: Classification of safe or unsafe water
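As a rough illustration of a soft-margin SVM that penalises false negatives more heavily, here is a small linear SVM trained by batch subgradient descent on a class-weighted hinge loss. Everything specific in it is an assumption made for the sketch: the normalised two-contaminant readings, the labels (+1 unsafe, -1 safe), the class weights and the training hyperparameters. A real project would more likely rely on an established SVM library and on properly labelled data.

```python
def train_linear_svm(X, y, class_weight, lam=0.001, lr=0.05, epochs=5000):
    """Batch subgradient descent on the class-weighted soft-margin objective:

    lam/2 * ||w||^2 + (1/n) * sum_i c[y_i] * max(0, 1 - y_i * (w.x_i + b))
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified
                c = class_weight[yi] / n
                for j in range(d):
                    grad_w[j] -= c * yi * xi[j]
                grad_b -= c * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (unsafe) or -1 (safe)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical normalised (bacteria, virus) readings.
X = [(0.1, 0.1), (0.3, 0.2), (0.2, 0.3), (0.1, 0.4),   # safe
     (1.0, 0.1), (0.1, 1.0), (0.7, 0.7)]               # unsafe
y = [-1, -1, -1, -1, 1, 1, 1]

# Errors on unsafe water (false negatives) cost three times as much.
w, b = train_linear_svm(X, y, class_weight={1: 3.0, -1: 1.0})
print(predict(w, b, (0.7, 0.7)))  # both contaminants moderately high
```

On cleanly separable toy data like this the class weights change little. They matter when the two classes overlap, where a larger weight on the unsafe class pushes the boundary toward the safe side.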

Whenever the model classifies water as unsafe for drinking, it will trigger an alarm to the house owner as well as to the water system command centre. This is also a great way to prevent and control waterborne diseases.

c) Use analytics to find the root cause or find the main pipe that causes water leak.

A negative pressure wave is generated when a pipeline leaks. Because of its distinctive nature, the negative pressure wave is often used to detect leaks in pipelines. To solve the leak detection problem, a multivariate classification model can be built using a Decision Tree and Support Vector Machine, which has the advantages of speed and efficiency in classification. The model can be trained with a fault feature vector made up of dimensionless values extracted from the characteristic parameters of the pipeline pressure signal, and the trained model is then used to test the samples. This method can be effectively applied to leakage detection in pipelines.

The Decision Tree based Support Vector Machine method decomposes the multi-class classification problem into a series of binary classification problems, which are distributed across the nodes of the decision tree. During modeling, the classes at the root node and branch nodes are divided into subsets step by step according to different attributes, until all the leaf nodes are obtained.

• Given: Pressure data
• Use: Support Vector Machine combined with Decision Tree
• Result: Leakage identification
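The structure of such a tree of binary classifiers can be sketched as below. This shows only the routing logic: each node holds a linear decision function of the kind a trained linear SVM would produce, but the weights here are hand-picked placeholders, not trained values. The class labels, the two dimensionless features and the `SvmNode` and `classify` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class SvmNode:
    """One binary split: a linear decision function w.x + b and two children."""
    w: Sequence[float]
    b: float
    negative: Union["SvmNode", str]  # followed when w.x + b < 0
    positive: Union["SvmNode", str]  # followed when w.x + b >= 0


def classify(node, features):
    """Walk from the root to a leaf, applying one binary decision per node."""
    while isinstance(node, SvmNode):
        score = sum(wj * xj for wj, xj in zip(node.w, features)) + node.b
        node = node.positive if score >= 0 else node.negative
    return node  # a leaf is just a class label


# Placeholder weights: the root separates "no leak" from any leak,
# the second node separates small leaks from large ones.
leak_size = SvmNode(w=(0.0, 1.0), b=-0.5, negative="small leak", positive="large leak")
root = SvmNode(w=(1.0, 0.0), b=-0.5, negative="no leak", positive=leak_size)

print(classify(root, (0.8, 0.9)))  # large leak
```

A three-class problem needs only two binary classifiers in this layout, and a sample classified as "no leak" is resolved at the root without evaluating the second node.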

Once a leak is identified, the water company can get it fixed. We can also add a feature that sends a message or an email to the repair department whenever the model identifies a leak, which will help reduce water waste as well....