Lab Manual Machine Learning PDF

Title	Lab Manual Machine Learning
Course	Machine Learning
Institution	Galgotias University
Pages	27

SCHOOL OF COMPUTING SCIENCE AND ENGINEERING GALGOTIAS UNIVERSITY, GREATER NOIDA UTTAR PRADESH

LAB Manual

V Semester

Course: Machine Learning Lab (BTCS9304)

Prepared By Ms. Vaishali Gupta

Course objectives: This course will enable students to
1. Make use of data sets in implementing the machine learning algorithms.
2. Implement the machine learning concepts and algorithms in any suitable language of choice.

Lab Experiments:
1. Write a program to demonstrate the working of Simple Linear Regression. Use an appropriate data set for the implementation.
2. Write a program to demonstrate the working of Logistic Regression. Use an appropriate data set for the implementation.
3. Write a program to demonstrate the working of Random Forest Regression. Use an appropriate data set for the implementation.
4. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
5. Write a program to demonstrate the working of the Naive Bayes classifier. Compute the accuracy of the classifier, considering a few test data sets.
6. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.
7. Write a program to demonstrate the working of a Neural Network. Use an appropriate data set for the implementation.
8. Write a program to demonstrate the working of the Support Vector Machine. Use an appropriate data set for the implementation.
9. Write a program to demonstrate the working of K-means Clustering. Use an appropriate data set for the implementation.


Experiment 1

Aim:- Write a program to demonstrate the working of Simple Linear Regression. Use an appropriate dataset for the implementation.

Introduction:- Linear regression is one of the simplest and most popular machine learning algorithms. It is a statistical method used for predictive analysis. Linear regression makes predictions for continuous (real-valued or numeric) variables such as sales, salary, age and product price. It models a linear relationship: it finds how the value of the dependent variable changes with the value of the independent variable.

Implementation using Python:-

# Import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Importing the dataset in Google Colab
from google.colab import files
uploaded = files.upload()
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values   # all columns except the last (features)
y = dataset.iloc[:, -1].values    # last column (target)
print(dataset)

# Splitting the dataset into the Training set and Test set
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

# Training the Simple Linear Regression model on the Training set
regressor = LinearRegression()
regressor.fit(x_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(x_test)
print(y_pred)

# Visualising the Training set results
plt.scatter(x_train, y_train, color='red')
plt.plot(x_train, regressor.predict(x_train), color='blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

# Visualising the Test set results
# (the fitted line is the same one learned from the training set)
plt.scatter(x_test, y_test, color='red')
plt.plot(x_train, regressor.predict(x_train), color='blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Outputs:

Result:- Visualization for both the Training and Test sets has been done.

Conclusion:- The program for Linear Regression has been demonstrated using a suitable dataset, values predicted and the result visualized.
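To see what a simple (single-feature) linear regression fits, the least-squares slope and intercept can also be computed by hand. The following is a minimal sketch using only the standard library; the function name fit_line and the toy data are illustrative assumptions and are not taken from Salary_Data.csv.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data lying exactly on y = 2x + 1
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(years, salary)
print("slope:", slope, "intercept:", intercept)
```

On real data the points do not lie exactly on a line, and the same formulas give the line that minimises the sum of squared vertical distances.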

Experiment 2

Aim:- Write a program to demonstrate the working of Logistic Regression. Use an appropriate dataset for the implementation.

Introduction:- Logistic Regression is a supervised machine learning algorithm used for classification problems. It is a predictive analysis algorithm based on the concept of probability. It predicts a categorical dependent variable from a given set of independent variables.
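The probability idea can be sketched for the two-class case: a weighted sum of the features is passed through the sigmoid function, and the resulting probability is compared with a threshold. This is a minimal illustration only; the weights, bias and feature values are made-up assumptions, and the helper names predict_proba and predict_label are hypothetical rather than scikit-learn's API.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(weights, bias, features, threshold=0.5):
    """Classify as 1 when the probability reaches the threshold."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical parameters and sample (not learned from any dataset)
weights = [0.8, -0.4]
bias = -1.0
sample = [3.0, 1.0]          # z = -1.0 + 2.4 - 0.4 = 1.0

p = predict_proba(weights, bias, sample)
label = predict_label(weights, bias, sample)
print("probability:", round(p, 4), "label:", label)
```

In practice the weights are learned from training data; the iris data set used in this experiment has three classes, which the library handles by extending the same idea to several classes.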

Implementation using Python:-

# Import the libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the dataset
data = sns.load_dataset('iris')
print(data)

# Prepare the training dataset
# X = feature values, all the columns except the last column
X = data.iloc[:, :-1]
# y = target value, last column of the data frame
y = data.iloc[:, -1]

# Plot the relation of each feature with each species
plt.xlabel('Features')
plt.ylabel('Species')

pltX = data.loc[:, 'sepal_length']
pltY = data.loc[:, 'species']
plt.scatter(pltX, pltY, color='blue', label='sepal_length')

pltX = data.loc[:, 'sepal_width']
pltY = data.loc[:, 'species']
plt.scatter(pltX, pltY, color='green', label='sepal_width')

pltX = data.loc[:, 'petal_length']
pltY = data.loc[:, 'species']
plt.scatter(pltX, pltY, color='red', label='petal_length')

pltX = data.loc[:, 'petal_width']
pltY = data.loc[:, 'species']
plt.scatter(pltX, pltY, color='black', label='petal_width')

plt.legend()
plt.show()

# Split the data into 80% training and 20% testing
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = LogisticRegression()
model.fit(x_train, y_train)

# Predict on the test set
predictions = model.predict(x_test)
print(predictions)
print(y_test)

# Check precision, recall and f1-score, and print the accuracy
print(classification_report(y_test, predictions))
print("accuracy is:", accuracy_score(y_test, predictions))

Output:-

Result:- Accuracy is 1.0.

Conclusion:- The program for Logistic Regression has been demonstrated, values predicted and accuracy computed.

Experiment 3

Aim:- Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

Introduction:- Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A tree can be seen as a piecewise constant approximation.

Implementation:-

import math
import numpy as np
from data_loader import read_data


class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute


def subtables(data, col, delete):
    # Split the rows of data by the distinct values found in column col
    tables = {}
    items = np.unique(data[:, col])
    count = np.zeros(items.shape[0], dtype=np.int32)
    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1
    for x in range(items.shape[0]):
        tables[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                tables[items[x]][pos] = data[y]
                pos += 1
        if delete:
            # Drop the attribute that was just used for the split
            tables[items[x]] = np.delete(tables[items[x]], col, 1)
    return items, tables


def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0
    counts = np.zeros(items.shape[0])
    sums = 0
    for x in range(items.shape[0]):
        counts[x] = np.sum(S == items[x]) / (S.size * 1.0)
    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums


def gain_ratio(data, col):
    items, tables = subtables(data, col, delete=False)
    total_size = data.shape[0]
    entropies = np.zeros(items.shape[0])
    intrinsic = np.zeros(items.shape[0])
    for x in range(items.shape[0]):
        ratio = tables[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(tables[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)
    total_entropy = entropy(data[:, -1])
    iv = -1 * np.sum(intrinsic)
    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    return total_entropy / iv


def create_node(data, metadata):
    # Leaf: every remaining row has the same answer
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node
    gains = np.zeros(data.shape[1] - 1)
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)
    split = np.argmax(gains)
    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)
    items, tables = subtables(data, split, delete=True)
    for x in range(items.shape[0]):
        child = create_node(tables[items[x]], metadata)
        node.children.append((items[x], child))
    return node


def empty(size):
    s = ""
    for x in range(size):
        s += " "
    return s


def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)


metadata, traindata = read_data("tennis.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)

data_loader.py

import csv


def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)

tennis.csv

outlook,temperature,humidity,wind,answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no

Output

 outlook
  overcast
   b'yes'
  rain
   wind
    b'strong'
     b'no'
    b'weak'
     b'yes'
  sunny
   humidity
    b'high'
     b'no'
    b'normal'
     b'yes'

Conclusion: The Decision Tree has been made and values have been predicted.
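The choice of outlook as the root can be checked by hand with entropy and information gain, the criterion classically used by ID3 (the program in this experiment additionally divides the gain by the split's intrinsic value, giving a gain ratio). The sketch below uses only the outlook and answer columns of tennis.csv; the helper names are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, col):
    """Entropy of the answers minus the weighted entropy after splitting on col."""
    labels = [row[-1] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# (outlook, answer) pairs in the same order as the rows of tennis.csv
rows = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rain", "yes"),
    ("rain", "yes"), ("rain", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rain", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rain", "no"),
]

base = entropy([answer for _, answer in rows])   # 9 yes, 5 no
gain = information_gain(rows, 0)
print("entropy of answers:", round(base, 3))     # about 0.940
print("gain for outlook:", round(gain, 3))       # about 0.247
```

The overcast branch contains only "yes" answers, so its entropy is zero, which is why it becomes a leaf directly under the root in the printed tree.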

Experiment 4

Aim:- Write a program to demonstrate the working of Random Forest Regression. Use an appropriate data set for the implementation.

Introduction:- Random forest is a supervised learning algorithm used for both classification and regression. Just as a forest is made up of trees, and more trees make a more robust forest, the random forest algorithm builds decision trees on samples of the data, gets a prediction from each of them and then combines the predictions: by voting for classification and by averaging for regression. It is an ensemble method that is usually better than a single decision tree because combining many trees reduces over-fitting.

Implementation of Random Forest using Python:-

Step 1: Import the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

Step 2: Importing the dataset

data = pd.read_csv('Position_Salaries.csv')
X = data.iloc[:, 1:-1].values   # position level
y = data.iloc[:, -1].values     # salary (last column)

Step 3: Training the Random Forest Regression model on the data

regressor = RandomForestRegressor(n_estimators=10, random_state=0)
regressor.fit(X, y)

Step 4: Predicting a new result

regressor.predict([[6.5]])

Output:
array([167000.])

Step 5: Visualising the Random Forest

Code for visualising data

RandomForest Regression Curve

Result:- The above curve represents two steps between the position levels. The X axis represents Position level and the Y axis represents Salary.

Conclusion:- The Random Forest Regression has been made and values have been predicted.
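The averaging step behind a regression forest can be illustrated without any library. The sketch below is a simplified bagging example under stated assumptions: each "tree" is a one-split stump, every stump is trained on a bootstrap sample, and the final prediction is the mean of the stump predictions. It is not scikit-learn's RandomForestRegressor, and the levels and salaries are made-up toy values rather than the contents of Position_Salaries.csv.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree: (threshold, left_mean, right_mean)."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that both sides of the split are non-empty
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # The bootstrap sample held a single distinct x: predict the mean
        mean = sum(ys) / len(ys)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees=10, seed=0):
    """Train n_trees stumps, each on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all stumps."""
    return sum(predict_stump(s, x) for s in forest) / len(forest)


# Hypothetical toy data: position level versus salary (arbitrary units)
levels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
salaries = [10, 12, 15, 20, 28, 40, 55, 75, 100, 140]

forest = fit_forest(levels, salaries, n_trees=10, seed=0)
prediction = predict_forest(forest, 6.5)
print("prediction for level 6.5:", prediction)
```

Because each stump outputs the mean of some training targets, the averaged prediction always stays within the range of the observed salaries, which is also why a plotted forest prediction looks like a staircase rather than a smooth curve.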

Experiment 5

Aim:- Write a program to demonstrate the working of the Naive Bayes Classifier. Compute the accuracy of the classifier, considering a few data sets.

Introduction:- The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and used for solving classification problems. It is mainly used in text classification, which involves high-dimensional training datasets. The Naïve Bayes Classifier is one of the simplest and most effective classification algorithms, and it helps in building fast machine learning models that can make quick predictions. It is a probabilistic classifier, which means it predicts on the basis of the probability of an object belonging to a class. Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis and classifying articles.

Implementation of Gaussian Naive Bayes Classifier using scikit-learn in Python:-

# Import the libraries
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics

# Load the iris dataset
data = pd.read_csv('Iris.csv')
print(data)
iris = load_iris()

# Store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# Splitting X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# Training the model on the training set
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Making predictions on the testing set
y_pred = gnb.predict(X_test)

# Comparing actual response values (y_test) with predicted response values (y_pred)
print("Gaussian Naive Bayes model accuracy:", metrics.accuracy_score(y_test, y_pred))

Output:-

Result:-Accuracy computed is 0.95. Conclusion:-The Naïve Bayes classifier has been demonstrated, values predicted and accuracy computed.
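The "probabilistic classifier" idea can be sketched from scratch: for each class, estimate a prior and a per-feature mean and variance, then pick the class with the highest log prior plus sum of Gaussian log-likelihoods. This is a minimal illustration under assumptions, not scikit-learn's GaussianNB implementation; the function names, the small variance floor of 1e-9 and the toy data are all made up for the example.

```python
import math


def fit_gaussian_nb(X, y):
    """Return {label: (prior, [(mean, variance) per feature])}."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        prior = len(rows) / len(X)
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            # Small floor (an assumption) keeps the variance strictly positive
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model


def log_gaussian(x, mean, var):
    """Log of the normal density, computed directly to avoid underflow."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_gaussian_nb(model, x):
    """Pick the class with the largest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            score += log_gaussian(value, mean, var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: two well-separated classes with two features each
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]))
print(predict_gaussian_nb(model, [5.1, 6.1]))
```

The "naive" part is the sum over features: each feature contributes its own likelihood term independently of the others, given the class.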

Experiment-6

Aim:- Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.

Implementation:

import csv
import random
import math
import operator


def loadDataset(filename, split, trainingSet, testSet):
    # Read the CSV file and randomly assign each row to the training or test set
    with open(filename, 'r') as csvfile:
        lines = csv.reader(csvfile)
        dataset = list(lines)
        for x in range(len(dataset) - 1):
            for y in range(4):
                dataset[x][y] = float(dataset[x][y])
            if random.random() < split:
                trainingSet.append(dataset[x])
            else:
                testSet.append(dataset[x])


def euclideanDistance(instance1, instance2, length):
    distance = 0
    for x in range(length):
        distance += pow((instance1[x] - instance2[x]), 2)
    return math.sqrt(distance)


def getNeighbors(trainingSet, testInstance, k):
    distances = []
    length = len(testInstance) - 1   # ignore the class label in the last position
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distances.append((trainingSet[x], dist))
    distances.sort(key=operator.itemgetter(1))
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors


def getResponse(neighbors):
    # Majority vote among the class labels of the neighbours
    classVotes = {}
    for x in range(len(neighbors)):
        response = neighbors[x][-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]


def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0


def main():
    # prepare data
    trainingSet = []
    testSet = []
    split = 0.67
    loadDataset('knndat.data', split, trainingSet, testSet)
    print('Train set: ' + repr(len(trainingSet)))
    print('Test set: ' + repr(len(testSet)))
    # generate predictions
    predictions = []
    k = 3
    for x in range(len(testSet)):
        neighbors = getNeighbors(trainingSet, testSet[x], k)
        result = getResponse(neighbors)
        predictions.append(result)
        print('> predicted=' + repr(result) + ', actual=' + repr(testSet[x][-1]))
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy: ' + repr(accuracy) + '%')


main()

OUTPUT:

Confusion matrix is as follows
[[11  0  0]
 [ 0  9  1]
 [ 0  1  8]]

Accuracy metrics
             precision   recall   f1-score   support
0            1.00        1.00     1.00       11
1            0.90        0.90     0.90       10
2            0.89        0.89     0.89       9
Avg/Total    0.93        0.93     0.93       30
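The figures in the report can be re-derived from the confusion matrix. The sketch below assumes that rows are actual classes and columns are predicted classes, and reads the last row as [0 1 8], which is what the support of 9 for class 2 implies. The function name per_class_metrics is an illustrative assumption.

```python
def per_class_metrics(cm):
    """Return a list of (precision, recall, f1, support), one tuple per class."""
    n = len(cm)
    results = []
    for k in range(n):
        true_positive = cm[k][k]
        predicted_k = sum(cm[r][k] for r in range(n))   # column total
        actual_k = sum(cm[k])                           # row total (support)
        precision = true_positive / predicted_k
        recall = true_positive / actual_k
        f1 = 2 * precision * recall / (precision + recall)
        results.append((precision, recall, f1, actual_k))
    return results


confusion = [
    [11, 0, 0],
    [0, 9, 1],
    [0, 1, 8],
]

total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[k][k] for k in range(len(confusion))) / total

for label, (p, r, f1, support) in enumerate(per_class_metrics(confusion)):
    print(label, round(p, 2), round(r, 2), round(f1, 2), support)
print("accuracy:", round(accuracy, 2))
```

Of the 30 test samples, 28 lie on the diagonal, so the overall accuracy is 28/30, which rounds to the 0.93 shown in the report.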

Experiment-7

Aim:- Write a program to demonstrate the working of the Neural Network. Use an appropriate data set for the implementation.

Implementation:

from math import exp
from random import seed
from random import random


# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights': [random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights': [random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network


# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]   # the last weight is the bias
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation


# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))


# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs


# Calculate the derivative of a neuron output
def transfer_derivative(output):
    return output * (1.0 - output)


# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network) - 1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])


# Update network weights with error
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']


# Train a network for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            sum_error += sum([(expected[i] - outputs[i]) ** 2 for i in range(len(expected))])
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))


# Test training backprop algorithm
seed(1)
dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]
n_inputs = len(dataset[0]) - 1
n_outputs = len(set([row[-1] for row in dataset]))
network = initialize_network(n_inputs, 2, n_outputs)
train_network(network, dataset, 0.5, 20, n_outputs)
for layer in network:
    print(layer)

OUTPUT:

>epoch=0, lrate=0.500, error=6.350
>epoch=1, lrate=0.500, error=5.531
>epoch=2, lrate=0.500, error=5.221
>epoch=3, lrate=0.500, error=4.951
>epoch=4, lrate=0.500, error=4.519
>epoch=5, lrate=0.500, error=4.173
>epoch=6, lrate=0.500, error=3.835
>epoch=7, lrate=0.500, error=3.506
>epoch=8, lrate=0.500, error=3.192
>epoch=9, lrate=0.500, error=2.898
>epoch=10, lrate=0.500, error=2.626
>epoch=11, lrate=0.500, error=2.377
>epoch=12, lrate=0.500, error=2.153
>epoch=13, lrate=0.500, error=1.953
>epoch=14, lrate=0.500, error=1.774
>epoch=15, lrate=0.500, error=1.614
>epoch=16, lrate=0.500, error=1.472
>epoch=17, lrate=0.500, error=1.346
>epoch=18, lrate=0.500, error=1.233
>epoch=19, lrate=0.500, error=1.132
[{'weights': [-1.4688375095432327, 1.850887325439514, 1.0858178629550297], 'output': 0.029980305604426185, 'delta': -0.0059546604162323625}, {'weights': [0.37711098142462157, -0.0625909894552989, 0.2765123702642716], 'output': 0.9456229000211323, 'delta': 0.0026279652850863837}]
[{'weights': [2.515394649397849, -0.3391927502445985, -0.9671565426390275], 'output': 0.23648...