LP-II Lab Manual (Final), Including All Assignments

Title	LP-II Lab Manual (Final), including all the assignments
Author	Anonymous User
Course	Computer Engineering
Institution	Savitribai Phule Pune University
Pages	84


Department of Computer Engineering

Lab Manual Laboratory Practice II (410247)

BE COMPUTER Semester II Academic Year 2018-19

Vision

To create an Engineer, receptive to the changing demands of the global market.

Mission

To provide technically competent professionals in service to the Nation. To prepare graduates to respond to the needs of dynamically changing technology.

Program Education Objectives (PEOs)

PEO1: To prepare graduates to work productively as successful computer professionals.
PEO2: To prepare students with the latest skills in the field of technology, supplemented with practical orientation, to face the challenges of the modern computing industry.
PEO3: To provide an environment that fosters professional growth, communication skills, teamwork, life-long learning skills and the ability to create awareness in society about applications of technology.

Program Specific Outcomes (PSOs)

PSO1 Problem Solving and Programming Skills: Graduates will be able to apply computational techniques and complete individual practical experiences in a variety of programming languages and situations.
PSO2 Professional Skills: Graduates will be able to design and develop efficient and effective software by following standard software engineering principles.
PSO3 Successful Career: Graduates will be able to become entrepreneurs and to pursue higher studies / careers in the IT industry.

Program Outcomes (POs)

Graduates will be able to:
1. Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems. [Engineering knowledge]
2. Identify, formulate, research literature, and analyse complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences. [Problem analysis]
3. Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations. [Design/development of solutions]
4. Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions. [Conduct investigations of complex problems]
5. Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modelling to complex engineering activities with an understanding of the limitations. [Modern tool usage]
6. Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice. [The engineer and society]
7. Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development. [Environment and sustainability]
8. Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice. [Ethics]
9. Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings. [Individual and team work]
10. Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions. [Communication]
11. Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s own work, as a member and leader in a team, to manage projects and in multidisciplinary environments. [Project management and finance]
12. Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change. [Life-long learning]

Index

Sr. No. | Title of Assignment | CO | PO | PSO

1. Bridge the Gap: Introduction to Rapid Miner
   CO: C404d.5 | PO: 1, 5 | PSO: 1

2. For an organization of your choice, choose a set of business processes. Design star / snowflake schemas for analyzing these processes. Create a fact constellation schema by combining them. Extract data from different data sources, apply suitable transformations and load into destination tables using an ETL tool. For example: Business Origination: Sales, Order, Marketing Process.
   CO: C404d.1 | PO: 1, 2, 3, 5, 8 | PSO: 1, 2

3. Consider a suitable dataset. For clustering of data instances in different groups, apply different clustering techniques (minimum 2). Visualize the clusters using a suitable tool.
   CO: C404d.2 | PO: 1, 2, 3, 5, 8 | PSO: 1, 2

4. Apply the a-priori algorithm to find frequently occurring items from given data and generate strong association rules using support and confidence thresholds. For example: Market Basket Analysis.
   CO: C404d.4 | PO: 1, 2, 3, 5, 8 | PSO: 1, 2

5. Consider a suitable text dataset. Remove stop words, apply stemming and feature selection techniques to represent documents as vectors. Classify documents and evaluate precision, recall.
   CO: C404d.5 | PO: 1, 2, 3, 5, 8 | PSO: 1, 2

6. Mini project on classification: Consider a labeled dataset belonging to an application domain. Apply suitable data preprocessing steps such as handling of null values, data reduction, and discretization. For prediction of class labels of given data instances, build classifier models using different techniques (minimum 3), analyze the confusion matrix and compare these models. Also apply cross validation while preparing the training and testing datasets. For example: Health Care Domain for predicting disease.
   CO: C404d.1, C404d.2, C404d.3, C404d.5 | PO: 1, 2, 3, 5, 8, 9, 10, 11, 12 | PSO: 1, 2

7. Mini-Project 1: Create a small application by selecting relevant system environment / platform and programming languages. Narrate a concise Test Plan consisting of features to be tested and bug taxonomy. Prepare Test Cases inclusive of Test Procedures for identified Test Scenarios. Perform selective Black-box and White-box testing covering Unit and Integration tests by using suitable testing tools. Prepare Test Reports based on Test Pass/Fail Criteria and judge the acceptance of the application developed.
   CO: C405b.2, C405b.3 | PO: 1, 2, 3, 5, 8, 10, 11, 12 | PSO: 1, 2

8. Mini-Project 2: Create a small web-based application by selecting relevant system environment / platform and programming languages. Narrate a concise Test Plan consisting of features to be tested and bug taxonomy. Narrate scripts in order to perform regression tests. Identify the bugs using Selenium WebDriver and IDE and generate test reports encompassing exploratory testing.
   CO: C405b.2, C405b.3, C405b.4 | PO: 1, 2, 3, 5, 8, 10, 11, 12 | PSO: 1, 2

9. Content Beyond Syllabus: Introduction to Orange Tool (alternate tool for Rapid Miner)
   CO: C404d.5 | PO: 1, 3, 5 | PSO: 1

Bridge the Gap: Handling .CSV Files in Python and RStudio

Getting your data into RStudio

There are numerous ways to get data into R. Here, I will go over getting a .csv file into RStudio. If you have RStudio on your own computer, skip straight to Step 2.

Step 1: Get your .csv into your ONID account

Open RStudio; in the Files tab, click Upload and choose your .csv file.

Step 2: Load your data into RStudio

In RStudio, click on the Workspace tab, and then on "Import Dataset" -> "From text file". A file browser will open; locate the .csv file and click Open. You'll see a dialog that gives you a few options for the import. Of particular importance: if your file has column names, make sure that Header is set to "Yes". The column names should be bolded in the Data Frame box. Click "Import". RStudio will now run some R code to import your data. For this fake file, : homework_5 d hc plot(hc) # plot the dendrogram
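The section title also mentions Python. As a minimal sketch using only the standard library (not part of the original manual), the `csv` module's `DictReader` treats the first row as the column names, much like setting Header to "Yes" in RStudio's import dialog. The file name `sample.csv` and the inline data are hypothetical.

```python
import csv
import io


def load_csv(file_obj):
    """Read CSV text whose first row holds the column names.

    Returns (header, rows), where each row is a dict keyed by column name.
    """
    reader = csv.DictReader(file_obj)  # the first line becomes the field names
    rows = list(reader)
    return reader.fieldnames, rows


# Hypothetical data; for a real file use:
#   with open("sample.csv", newline="") as f:
#       header, rows = load_csv(f)
sample = io.StringIO("name,score\nasha,81\nravi,67\n")
header, rows = load_csv(sample)
print(header)            # ['name', 'score']
print(rows[0]["score"])  # 81 (values are read as strings)
```

Values come back as strings, so numeric columns need an explicit conversion such as `int(row["score"])` before analysis.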

Conclusion

With the help of such tools, we can visualize the clusters formed on sample datasets and perform analysis.

Assignment Questions
1. What is the difference between supervised and unsupervised learning?
2. What are the similarities between the K-means and KNN algorithms?
3. What is Euclidean distance? Explain with a suitable example.
4. What is Hamming distance? Explain with a suitable example.
5. What is Chi-Square distance? Explain with a suitable example.
6. What are the different types of clustering?
7. What is the Weka tool? Explain the steps to perform clustering on a sample dataset.

References
1. www.r-tutor.com/gpu-computing/clustering/hierarchical-cluster-analysis
2. http://www.rdatamining.com/examples/kmeans-clustering
3. http://www.r-statistics.com/2013/08/k-means-clustering-from-r-in-action
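Questions 3 and 4 ask for examples of Euclidean and Hamming distance, two measures commonly used when grouping instances. A minimal sketch of both follows; the points and strings are illustrative only.

```python
import math


def euclidean(p, q):
    """Straight-line distance: square root of the sum of squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s, t))


print(euclidean((1, 2), (4, 6)))      # 5.0, since sqrt(3**2 + 4**2) = 5
print(hamming("karolin", "kathrin"))  # 3 (positions 3, 4 and 5 differ)
```

Euclidean distance suits numeric attributes, while Hamming distance suits categorical or binary attributes compared position by position.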


ASSIGNMENT NO. 4

Title: Apply the a-priori algorithm to find frequently occurring items from given data and generate strong association rules using support and confidence thresholds. For example: Market Basket Analysis.

Problem Definition: Market Basket Analysis

Prerequisite: Basic Concepts of ETL

Software Requirements: Rapid Miner

Hardware Requirement: PIV, 2GB RAM, 500 GB HDD, Lenovo A13-4089 Model.

Learning Objectives: Model associations between products by determining sets of items frequently purchased together and building association rules to derive recommendations.

Outcomes: Create association rules that can be used for product recommendations, depending on the confidence of the rules.

Theory Concepts:

Association rule mining:
• Introduced by R. Agrawal, T. Imieliński and A. Swami in 1993; the Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994.
• It is an important data mining model studied extensively by the database and data mining community.
• It assumes all data are categorical.
• It was initially used for Market Basket Analysis, to find how items purchased by customers are related.

The Apriori algorithm:
• The best-known algorithm for association rule mining.
• Two steps:
– Find all itemsets that have minimum support (frequent itemsets, also called large itemsets).
– Use the frequent itemsets to generate association rules that satisfy the minimum support and confidence.
– E.g., if a customer buys a toothbrush, the rules may suggest Colgate (toothpaste) and a tongue cleaner.

Data Set

Table 1: Data Set

Given: Minimum Support = 60%, Minimum Confidence = 80%

Candidate Table C1: Now find the support count of each itemset.

Table 2: Candidate Table C1


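Now find the minimum support count:
• Minimum support count = 60/100 × 5 = 3
• where 5 is the number of transactions (entries) in the data set.
• Compare the support count of each itemset in C1 with the minimum support count; the itemsets that satisfy it form L1.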
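The first pass (count every single item, then keep those reaching the minimum support count) can be sketched as follows. The manual's Table 1 is not reproduced in this text, so the five transactions below are a hypothetical data set assumed for illustration; the ceiling handles cases where the percentage of transactions is not a whole number.

```python
import math
from collections import Counter

# Hypothetical transactions (assumed for illustration; not the manual's Table 1).
transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "K", "I", "E"},
]

min_support_pct = 60
# Minimum support count = 60/100 * 5 = 3 (rounded up if the product is fractional)
min_count = math.ceil(min_support_pct * len(transactions) / 100)

# C1: support count of every individual item
c1 = Counter(item for t in transactions for item in t)

# L1: items whose support count is at least the minimum support count
l1 = {item: n for item, n in c1.items() if n >= min_count}

print(min_count)           # 3
print(sorted(l1.items()))  # [('E', 4), ('K', 5), ('M', 3), ('O', 3), ('Y', 3)]
```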

Table 3: L1 Support Count

Candidate Table C2:

Table 4: Candidate Table C2


Now compare the support count of each itemset in C2 with the minimum support count of 3.

L2 Support Count

Table 5: L2 Support Count

Pair the itemsets of L2 that satisfied the minimum support criterion to generate C3.

Candidate Table C3

Table 6: Candidate Table C3
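Forming C3 by pairing itemsets of L2 corresponds to the standard Apriori candidate step: a join (union two frequent k-itemsets whose union has k+1 items) followed by a prune (drop any candidate with an infrequent k-item subset). A minimal sketch; the L2 used below continues the hypothetical data set from the earlier sketch and is not the manual's Table 5.

```python
from itertools import combinations


def generate_candidates(frequent_k, k):
    """Apriori candidate step: build (k+1)-itemsets from frequent k-itemsets.

    Join: union two frequent k-itemsets whose union has exactly k+1 items.
    Prune: discard a candidate if any of its k-item subsets is not frequent.
    """
    frequent_k = {frozenset(s) for s in frequent_k}
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) != k + 1:
                continue  # not a valid join of two k-itemsets
            if all(frozenset(sub) in frequent_k for sub in combinations(union, k)):
                candidates.add(union)
    return candidates


# Hypothetical L2 (assumed example data, not the manual's table)
l2 = [{"M", "K"}, {"O", "K"}, {"O", "E"}, {"K", "E"}, {"K", "Y"}]
print([sorted(c) for c in generate_candidates(l2, 2)])  # [['E', 'K', 'O']]
```

Here {M, O, K} is pruned because {M, O} is not in L2, which is why only {O, K, E} survives; its support count must still be checked against the data to obtain L3.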


L3 Support Count

Now compare each itemset in C3 with the minimum support count of 3.

Table 7: L3 Support Count

Now create association rules with support and confidence for {O, K, E}:

Confidence(A ⇒ B) = Support count(A ∪ B) / Support count(A)

Table 8: Association Rules

Compare the confidence of each rule with the minimum confidence of 80%.

Table 9: Support and Confidence
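Generating rules from a frequent itemset and keeping those that meet the confidence threshold can be sketched as below. The support counts are hypothetical (derived from the assumed five-transaction data set used in the earlier sketches, not from the manual's tables).

```python
from itertools import combinations

# Hypothetical support counts (from an assumed data set, not the manual's tables)
support_count = {
    frozenset("OKE"): 3,
    frozenset("OK"): 3, frozenset("OE"): 3, frozenset("KE"): 4,
    frozenset("O"): 3, frozenset("K"): 5, frozenset("E"): 4,
}


def rules_from_itemset(itemset, support_count, min_conf):
    """Return (antecedent, consequent, confidence) for every rule meeting min_conf."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            antecedent = frozenset(combo)
            consequent = itemset - antecedent
            # Confidence(A => B) = support_count(A u B) / support_count(A)
            confidence = support_count[itemset] / support_count[antecedent]
            if confidence >= min_conf:
                rules.append((antecedent, consequent, confidence))
    return rules


for a, c, conf in rules_from_itemset("OKE", support_count, 0.80):
    print(sorted(a), "=>", sorted(c), f"{conf:.0%}")
```

This sketch also checks single-item antecedents, so with these assumed counts it reports O ⇒ {E, K} alongside the two-item rules; K ∧ E ⇒ O (3/4 = 75%) falls below the 80% threshold.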


Hence, the final association rules are {O ∧ K ⇒ E} and {O ∧ E ⇒ K}.
1. From the first rule, we predict that a customer who buys item O and item K will also buy item E.
2. From the second rule, we predict that a customer who buys item O and item E will also buy item K.

Market Basket Analysis using Rapid Miner

Rapid Miner is a data science software platform, developed by the company of the same name, that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics. It is used for business and commercial applications as well as for research, education, training, rapid prototyping, and application development, and it supports all steps of the machine learning process, including data preparation, results visualization, model validation and optimization. Rapid Miner is developed on an open core model. The Rapid Miner Studio Free Edition, which is limited to 1 logical processor and 10,000 data rows, is available under the AGPL license. Commercial pricing starts at $2,500 and is available from the developer.

MARKET BASKET ANALYSIS

Model associations between products by determining sets of items frequently purchased together and building association rules to derive recommendations.

Figure 1: Market Basket Analysis

Figure 2: Frequent Item Sets (FP-Growth)

Conclusion

Thus, we have learned to use the a-priori algorithm to find frequently occurring items from given data and to generate strong association rules using support and confidence thresholds.

Assignment Questions
1. Explain association rules.
2. What are the applications of the a-priori algorithm?
3. What is Market Basket Analysis? Explain with a suitable example.

References: https://docs.rapidminer.com/downloads/RapidMiner-v6-user-manual.pdf


ASSIGNMENT NO. 5

Title: Consider a suitable text dataset. Remove stop words, apply stemming and feature selection techniques to represent documents as vectors. Classify documents and evaluate precision, recall.

Problem Definition: Remove stop words

Prerequisite: Basic Concepts of ETL

Software Requirements: Rapid Miner

Hardware Requirement: PIV, 2GB RAM, 500 GB HDD, Lenovo A13-4089 Model.

Learning Objectives: Learn how to tokenize and filter a document into its different words and then count the occurrences of each word in a text document.

Outcomes: You will be able to see a word list containing all the different words in your document, with their occurrence counts in the "Total Occurrences" column.

Theory Concepts:

Text Processing Tutorial with Rapid Miner

In this manual, we are going to learn how to tokenize and filter a document into its different words and then count each word in a text document.

Open Rapid Miner and click "New Process". On the left-hand pane of your screen, there should be a tab that says "Operators"; this is where you can search for and find all of the operators for Rapid Miner and its extensions. By searching the Operators tab for "read", you should get an output like this:

Figure 1: Searching the Operators tab for "read"

There are multiple read operators depending on which file you have, and most of them work the same way. If you scroll down, there is a "Read Documents" operator. Select this operator and drag it into your Main Process window. When you select the Read Documents operator in the Main Process window, you should see a file uploader in the right-hand pane.

Figure 2: Drag and drop the "Read Documents" operator


Select the text file you want to use

Figure 3: Select the text file you want to use

After you have chosen your file, make sure that the output port on the Read Documents operator is connected to the "res" node in your Main Process. Click the "play" button to check that your file has been received correctly. Switch to the Results perspective by clicking the icon that looks like a display chart above the "Process" tab at the top of the Main Process pane. Click the "Document (Read Document)" tab. Your output text should look something like this, depending on the file you have chosen to process:

Figure 4: Run the process

Now we will move on to processing the document to get a list of its different words and their individual counts. Search the Operators list for "Process Documents". Drag this operator into the main panel the same way as you did for the "Read Documents" operator.

Figure 5: Search the Operators list for "Process Documents"

Double-click the Process Documents operator to get inside it. This is where we will link operators together to take the entire text document and split it into its word components. This consists of several operators that can be chosen by going into the Operators pane and looking at the Text Processing folder. You should see several more folders, such as "Tokenization", "Extraction", "Filtering", "Stemming", "Transformation", and "Utility"; their names describe the kinds of operations you can apply to your document. The first thing you will want to do is tokenize the document. Tokenization creates a "bag of words" contained in your document, which allows you to do further filtering. Search for the "Tokenize" operator and drag it into the "Process Documents" process.

Figure 6: Search for the "Tokenize" operator
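Outside Rapid Miner, the idea of tokenization can be sketched in a few lines. The Tokenize operator offers several modes; this hypothetical `tokenize` helper only mimics splitting on runs of non-letter characters (ASCII letters only) and is not Rapid Miner's implementation.

```python
import re


def tokenize(text):
    """Split text into tokens at every run of non-letter characters (ASCII letters only)."""
    # Digits, punctuation and whitespace all act as separators; empty strings are dropped.
    return [tok for tok in re.split(r"[^A-Za-z]+", text) if tok]


print(tokenize("The cat sat on the mat, and the dog ran!"))
# ['The', 'cat', 'sat', 'on', 'the', 'mat', 'and', 'the', 'dog', 'ran']
```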

Connect the "doc" node of the process to the "doc" input node of the operator if it has not connected automatically. Now we are ready to filter the bag of words. In the "Filtering" folder under the "Text Processing" operator folder, you can see the various filtering methods that you can apply to your process. For this example, I want to filter out words that carry no real meaning for the document itself (such as a, and, the, as, of, etc.); therefore, I will drag "Filter Stopwords (English)" into my process because my document is in English. I also want to filter out any remaining words that are shorter than three characters. Select "Filter Tokens by Length" and set its parameters as desired (in this case, the minimum number of characters is 3, and the maximum is an arbitrarily large number, since I don't need an upper bound). Connect the nodes of each subsequent operator accordingly, as in the picture.

Figure 7: Select the operator "Transform Cases" and drag it into the process

After filtering the bag of words by stop words and length, I want to transform all of my words to lowercase, since otherwise the same word would be counted separately in uppercase and lowercase. Select the operator "Transform Cases" and drag it into the process.
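The three steps above (stop-word filtering, length filtering and case transformation) can be sketched on a list of tokens as follows. The short `STOP_WORDS` set is a hypothetical, abbreviated list; Rapid Miner ships its own, much longer English list. This sketch lowercases each token first so that capitalized stop words such as "The" are also removed, whereas the manual's process applies Transform Cases after the filters.

```python
# Abbreviated, hypothetical stop-word list (not Rapid Miner's built-in list)
STOP_WORDS = {"a", "an", "and", "as", "of", "on", "the", "is", "in", "to"}


def filter_tokens(tokens, stop_words=STOP_WORDS, min_len=3, max_len=None):
    """Lowercase tokens, drop stop words, and keep only tokens within the length bounds."""
    kept = []
    for tok in tokens:
        tok = tok.lower()  # Transform Cases (lower case)
        if tok in stop_words:  # Filter Stopwords
            continue
        if len(tok) < min_len:  # Filter Tokens by Length (minimum)
            continue
        if max_len is not None and len(tok) > max_len:  # optional upper bound
            continue
        kept.append(tok)
    return kept


print(filter_tokens(["The", "Cat", "sat", "on", "a", "mat", "of", "Wool", "ox"]))
# ['cat', 'sat', 'mat', 'wool']
```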


Figure 8: Check all node connections and click the "Play" button to run the process

Now that I have sufficient operators in my process for this example, I check all of my node connections and click the "Play" button to run my process. If all goes well, your output should look like this in the Results view:

Figure 9: Expected output in the Results view
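The word list in the results counts how often each remaining word occurs. A minimal sketch of that count using `collections.Counter`; the token list is hypothetical, and sorting by count and then alphabetically is a choice of this sketch, not necessarily the order Rapid Miner displays.

```python
from collections import Counter


def word_list(tokens):
    """Return (word, total occurrences) pairs, most frequent first, ties alphabetical."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical tokens, as they might look after tokenizing, filtering and lowercasing
tokens = ["cat", "sat", "cat", "ran", "mat", "cat", "mat"]
for word, total in word_list(tokens):
    print(f"{word:<10}{total}")
```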


Conclusion

We are now able to see a word list containing all the different words in the document, with their occurrence counts in the "Total Occurrences" column. If you do not get this output, make sure that all of your nodes are connected correctly and to ports of the right type. Some errors occur because the output at one node does not match the type expected at the input of the next operator.

Assignment Questions
1. What is the use of the Tokenize operator?
2. What are the different modes of the Tokenize operator?
3. How do you use the Read Documents operator?
4. Why do we use Filter Tokens and Filter Stopwords?
5. How do you use the Filter Class operator?

References: https://docs.rapidminer.com/downloads/RapidMiner-v6-user-manual.pdf


Data Mining and Warehousing

MINI PROJECT ON CLASSIFICATION No. 1

1.1 Title

Consider a labeled dataset belonging to an application domain. Apply suitable data preprocessing steps such as handling of null values, data reduction, discretization. For prediction of class labels of given data instance...