AI - Lectures - Lecture notes

Title: AI - Lectures - Lecture notes
Author: Burhanuddin Fakhri
Course: Artificial Intelligence
Institution: Brunel University London



Description

Unsupervised Learning

Learning
• Remember the early chatbots? We talked about chatbots passing the Turing test and looked at the ELIZA chatbot. What it was doing was simple pattern matching: there was a fixed database of key terms it would recognise - pre-recorded expert knowledge stored in a database. But those chatbots couldn't learn: if you had a conversation with them, they wouldn't remember things or adapt.
• This idea of being able to adapt and learn is fundamental in modern AI, and is essentially what machine learning is.
• Early chatbots: no ability to learn / adapt.
• Learning is key to AI and appears in different forms.

Different Classes of Learning
• Unsupervised learning: learning without the desired output ('teacher' signals). You are looking for structure in the data you are given; there is no particular desired output, you just take in a lot of data and try to structure it somehow.
• Supervised learning: learning with the desired output.
• Reinforcement learning: reward / punishment signals. It is more in line with the way you would train a dog to do tricks - it works from reward and punishment signals rather than from being given the right answer. That is a really powerful form of AI.
• Clustering is one of the most widely used unsupervised learning methods and is widely used in modern AI. Other techniques that come under the unsupervised umbrella are:
  o Dimensionality reduction (e.g. PCA)
  o Association rule mining / recommender systems
• Supervised learning includes:
  o Classification (see next lecture)
  o Regression
• Reinforcement learning is commonly seen in neural network training (see in 2 weeks).

Clustering - in everyday learning
• Teach a piece of software / an algorithm to find something new.
• We naturally do this. One of the first things we do as a child, before we can communicate, is start to make sense of the world around us: through the visual cortex we do this kind of clustering - whether it is our mother, family, human or animal, we naturally cluster the things we see into groups. The clustering process catches something around us and gives it a label.

Define clustering
o We have to partition a data set into subsets, so that the data in each subset have something in common. We need some way to measure distance or similarity between the objects.
o Another example is clustering vehicles by their top speed and weight.

Clustering
• Clustering: to partition a data set into subsets (clusters), so that the data in each subset share some common trait - often similarity or proximity for some defined distance measure.
• The process of organizing objects into groups whose members are similar in some way.
• A cluster is therefore a collection of objects which are "similar" to each other and "dissimilar" to the objects belonging to other clusters.
• Unsupervised: no need for the 'teacher' signals, i.e. the desired output.

Uses: Social networks
• If you have a network like the one shown in the diagram, you can use the links to measure distance, which allows you to identify all sorts of interesting things about the network.
  o If the social network is people talking to each other, you can identify who to target in order to spread viral messages.
• Marketing
• Terror networks
• Allocation of resources in a company / university

Uses: Customer Segmentation

Customer segmentation in business intelligence
• When you buy things online, those companies make decisions about you: what you bought, when you bought it, how often you buy. They segment you as a particular type of customer - they cluster customers using the data and then label you.

Uses: Gene networks
• Clustering was huge in the 1990s for trying to understand genomics and the function of different genes: the behaviour of different genes would be clustered together.
• Understanding gene interactions
• Identifying important genes linked to disease

Clustering is useful and powerful. How do we actually do it, in a more formalised way?

How to do clustering?
• What we know: patterns represented by their feature vectors.
• What we need to find out: the clusters.

To find the clusters we need to measure the similarity or distance between patterns.

Pattern Similarity
• A key concept in clustering: similarity. Clusters are formed by similar patterns.
• In computer science, we need to define a metric to measure similarity. One of the most commonly adopted similarity metrics is distance.
• A general definition of the distance between patterns A and B (summing over the feature dimensions i):

  d(A, B) = ( Σ_i |A_i − B_i|^b )^(1/b)

  o b = 2: Euclidean distance
  o b = 1: Manhattan distance
• The shorter the distance, the more similar the two patterns.

Euclidean / Manhattan Distance

Example

• Euclidean:
  √((5.5−0.2)² + (2.9−1.0)² + (4.8−4.8)² + (6.7−3.8)² + (0.6−9.2)²)
  = √(5.3² + 1.9² + 0.0² + 2.9² + (−8.6)²)
  = √(28.09 + 3.61 + 0.0 + 8.41 + 73.96)
  = √114.07 = 10.68
• Manhattan:
  |5.5−0.2| + |2.9−1.0| + |4.8−4.8| + |6.7−3.8| + |0.6−9.2|
  = 5.3 + 1.9 + 0.0 + 2.9 + 8.6 = 18.7

How to calculate distance?
• Euclidean distance:
  o Take the two data points, subtract one from the other one dimension at a time, square each difference, sum the squares and take the square root.
• Manhattan distance:
  o The slight difference here is that there is no squaring: you subtract one from the other, dimension by dimension, and sum the absolute differences.
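A minimal Python sketch (not from the lecture notes) that reproduces these calculations; NumPy is assumed to be available, and the exponent is called p here to avoid a clash with the vector name b:

```python
import numpy as np

# The two five-dimensional example patterns from the worked example above
a = np.array([5.5, 2.9, 4.8, 6.7, 0.6])
b = np.array([0.2, 1.0, 4.8, 3.8, 9.2])

def minkowski(a, b, p):
    """General Minkowski distance: p=2 gives Euclidean, p=1 gives Manhattan."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

print(minkowski(a, b, 2))  # Euclidean: ~10.68
print(minkowski(a, b, 1))  # Manhattan: 18.7
```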

Pattern Similarity & Distance Metrics

Distance Metrics
• Euclidean
• Correlation
  o Measures how similar the shapes of two data points are. Choosing a similarity measure is not simple - it is very application dependent.
• Minkowski
• Manhattan
• Mahalanobis

Relationship Metrics
• How long is a piece of string?
• Often application dependent.

Algorithm 1: K-Means Clustering
1. Place K points into the feature space. These points represent the initial cluster centroids.
2. Assign each pattern to the closest cluster centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat steps 2 and 3 until the assignments do not change.

K-means Clustering
• Baseline algorithm.
• You set one parameter, k, which determines how many clusters you are going to find. Randomly place k points, which will be the centroids of your clusters, in the feature space. Then assign each data point to the nearest centroid. When you have done that with all your data points, recalculate the positions of the k centroids so they move into the centre of the data points assigned to them. Then repeat steps 2 and 3 - assignment to a centroid, then the centroid moves, then points are reassigned and it moves again - until the assignments stop changing. (A minimal code sketch follows the discussion questions below.)

Interactive demos:
https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
http://util.io/k-means
http://stanford.edu/class/ee103/visualizations/kmeans/kmeans.html

Discussions
1. How to determine k, the number of clusters?
2. Any alternative ways of choosing the initial cluster centroids?
3. Does the algorithm converge to the same results with different selections of initial cluster centroids? If not, what should we do in practice?
4. Intuitively, what is the ideal partition of the following problem? Can the K-means clustering algorithm give a satisfactory answer to this problem?
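A minimal NumPy sketch of the K-means steps above (not the lecturer's code): the initial centroids are chosen as random data points, and empty clusters are not handled, which ties in with the discussion questions:

```python
import numpy as np

def k_means(X, k, max_iters=100, seed=0):
    """Very small K-means sketch: X is an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: assign each pattern to the closest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recalculate each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop when the centroids (and hence assignments) no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on a few made-up 2-D points
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
labels, centroids = k_means(X, k=2)
print(labels, centroids)
```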





How to determine k, the number of clusters?
o Choosing the initial value of k is not easy.
o We also started with random centroids - is that the best solution? Does it always converge? Does it come up with different results every time you run it?
o You may find different results for a given k every time you run it. Is that bad or is it good? Sometimes it is good.

Shape of the clusters:
o K-means does not work well with elongated clusters, because of the distance measure. It also does not work well when the clusters have very different sizes - for example one dense cluster with lots of data and a few small ones.

Limitations of K-Means At each iteration of the K-Means Clustering algorithm, a pattern can be assigned to one cluster only, i.e. the assignment is ‘hard’.

Pros and Cons of K-Means

Advantages
• May be computationally faster than hierarchical clustering (if K is small).
• May produce tighter clusters than hierarchical clustering, especially if the clusters are globular.

Disadvantages
• The fixed number of clusters can make it difficult to predict what K should be.
• Different initial partitions can result in different final clusters.
• Potential empty clusters (not always bad).
• Does not work well with non-globular clusters.

Hierarchical (agglomerative) Clustering
• Hierarchical clustering results in a series of clustering results.

• The results start off with each object in its own cluster and end with all of the objects in the same cluster.
• The intermediate clusters are created by a series of merges.
• The resulting tree-like structure is called a dendrogram.

Hierarchical (agglomerative) clustering
• It is performed by merging clusters together.
• It generates something useful: it visualises the clusters as it goes, and it gives you a whole set of clusterings rather than just one. It starts off by assuming every data point is in its own cluster; based on the distances to all the other data points it then starts to merge them into clusters of two, and you keep doing that until you have one cluster with everything inside (which by itself is useless). As you go through the process you build a dendrogram, so you end up with a rich visualisation of the clusters.
• Dendrogram:
  o A powerful way to understand your data and the relationships between clusters.

The Hierarchical Clustering Algorithm
1) Each item is assigned to its own cluster (n clusters of size one).
2) Let the distances between the clusters equal the distances between the objects they contain.
3) Find the closest pair of clusters and merge them into a single cluster (one less cluster).
4) Re-compute the distances between the new cluster and each of the old clusters.
5) Repeat steps 3 and 4 until there is only one cluster left.

• You should know both of these algorithms well enough to explain them in detail.
  o Both of the algorithms have variations.

Example: Hierarchical clustering - linkage
• There are different ways to compute the distance between clusters. Firstly, we can look at the nearest points between two clusters and call that the distance: this is single linkage clustering.
• We can look at the furthest points between two clusters: this is complete linkage clustering.
• Or we can take the average over all of the data points in the two clusters: this is average linkage.
(A short code sketch of the three linkage options follows below.)
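A minimal sketch of agglomerative clustering with the three linkage variants, using SciPy (SciPy is an assumption, not part of the notes; the small 2-D data set is made up for illustration, and the lecture itself points to interactive applets below):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# A small made-up 2-D data set
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8], [9.0, 1.0]])

# The three linkage variants discussed above
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)                    # the series of merges
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the dendrogram into 2 clusters
    print(method, labels)

# dendrogram(Z) would draw the tree-like structure (requires matplotlib)
```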

How does Hierarchical Clustering work?

http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletH.html
https://live.yworks.com/demos/analysis/clustering/index.html

Pros and Cons of HC

Advantages
• Can produce an ordering of the objects, which may be informative for data display.
• Smaller clusters are generated, which may be helpful for discovery.

Disadvantages
• No provision can be made for a relocation of objects that may have been 'incorrectly' grouped at an early stage.
• Use of different distance metrics for measuring distances between clusters may generate different results.



Advantages
o The ordering of the clustering is based on the structure of the data, which is really informative.
o You can control the size of the clusters: if you want really detailed, small clusters you cut the dendrogram lower down; if you want high-level clusters you cut it further up.

Disadvantages
o It is completely deterministic, and there is no provision for reallocating data points, so it is a kind of greedy search. As you move up the tree there could be a better clustering, but the algorithm is trapped: a cluster that should later be separated cannot be, because it has already been merged.

Other clustering methods

Fuzzy Clustering
• For example: fuzzy c-means.
• In real applications there is often no sharp boundary between clusters, so fuzzy clustering is often better suited.
• Fuzzy c-means is a fuzzification of k-means and the most well-known fuzzy method.
• Cluster membership is now a weight between zero and one, and the distance to a centroid is multiplied by the membership weight.
• Example: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletFCM.html
• What fuzzy clustering does is assign a weight towards each cluster rather than allocating a point to a single cluster: a probability or likelihood that you belong to a cluster.
• This is useful for data where behaviour is not consistent. You could be someone who spends little on weekdays and a lot at the weekend, so you could be part of both clusters.
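A minimal sketch of the membership-weight idea for fixed centroids (the formula is the standard fuzzy c-means membership update, which the notes do not spell out; the fuzzifier m = 2 and the data are illustrative assumptions, and this is only the membership step, not the full algorithm):

```python
import numpy as np

def fcm_memberships(X, centroids, m=2.0):
    """Membership weight of each point in each cluster (each row sums to 1)."""
    # Distance of every point to every centroid
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # avoid division by zero for points sitting on a centroid
    # Standard fuzzy c-means membership formula: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

X = np.array([[1.0, 1.0], [5.0, 5.0], [3.0, 3.0]])   # the last point sits between the clusters
centroids = np.array([[1.0, 1.0], [5.0, 5.0]])
print(fcm_memberships(X, centroids))  # the middle point gets ~0.5 weight in each cluster
```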

DBSCAN (from Ranjay Sankar notes, University of Florida)
• DBSCAN is a density-based clustering algorithm.
• Density = the number of points within a specified radius (Eps).
• A point is a core point if it has more than a specified number of points (MinPts) within Eps (a core point is in the interior of a cluster).

• Density-Reachable (directly and indirectly):
  o A point p is directly density-reachable from p2.
  o p2 is directly density-reachable from p1.
  o p1 is directly density-reachable from q.
  o p...
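A minimal usage sketch of DBSCAN via scikit-learn (scikit-learn is an assumption, not part of the notes), showing the Eps and MinPts parameters described above:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# A small made-up 2-D data set: two dense groups plus one isolated point
X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.1],
              [9.0, 9.0]])

# eps plays the role of Eps and min_samples the role of MinPts described above
db = DBSCAN(eps=0.5, min_samples=3).fit(X)
print(db.labels_)  # points labelled -1 are noise: not density-reachable from any core point
```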

