Microprojects and Notes of Subjects from the MSBTE Syllabus

Title: Microprojects and Notes of Subjects from the MSBTE Syllabus
Author: OK Lomate
Course: Computer Engineering
Institution: Government Polytechnic, Pune

Summary

Microprojects and notes of subjects from the MSBTE Syllabus (Maharashtra State Board)
1. PWP Subject Notes
2. Management Subject Microproject
3. ETI Subject Microproject...


Description

MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION

The Shetkari Shikshan Mandal’s
BHIVARABAI SAWANT COLLEGE OF ENGINEERING & RESEARCH POLYTECHNIC

Academic Year: 2020-21

MICRO PROJECT ON

Emerging Trends in Computer and Information Technology (22618)

Program: Computer

Program code: CO6I

Course: ETI

Course code: 22618

MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION

Certificate

This is to certify that Mr. Omkar Lomate, Roll No. 31, of VI Semester of the Diploma in Computer Engineering at BSCOER POLYTECHNIC (Code: 1606) has satisfactorily completed the Micro Project in the subject ETI (22618) for the academic year 2020-21, as prescribed in the curriculum.

Place: NARHE Date:

Subject Teacher

Enrollment No: 1916060189 Exam Seat No:

Head of the Department

Principal

INDEX

1. Abstract
2. Introduction
3. Types of Data Mining
4. Data Mining Techniques
5. Data Mining on Credit Card Fraud Detection
6. Weekly Progress Report
7. Annexure II

Abstract

Data mining refers to extracting, or "mining", knowledge from large amounts of data. There are a number of data mining techniques, such as clustering, neural networks, regression, and multiple predictive models. Here we discuss only the few techniques considered important for handling fraud detection. Fraudulent electronic transactions are already a significant problem, and one that will grow in importance as the number of access points in the nation's financial information system grows. Financial institutions today typically develop custom fraud detection systems targeted to their own asset bases. Most of these systems employ machine learning and statistical analysis algorithms to produce pattern-directed inference, using models of anomalous or errant transaction behaviour to forewarn of impending threats. These algorithms require analysis of large and inherently distributed databases of information about transaction behaviour to produce models of "probably fraudulent" transactions. Recently, banks have come to realize that a unified, global approach is required, involving the periodic sharing of information about attacks. Such information sharing is the basis of a global fraud detection infrastructure in which local detection systems propagate attack information to one another, preventing intruders from disabling the global financial network. Credit card transactions continue to grow in number, taking an ever-larger share of the US payment system and leading to a higher rate of stolen account numbers and subsequent losses by banks. Improved fraud detection has thus become essential to maintaining the viability of the US payment system. Banks have used early fraud warning systems for some years, and large-scale data mining techniques can improve on the state of the art in commercial practice.

Developing scalable techniques to analyze massive amounts of transaction data and efficiently compute fraud detectors in a timely manner is an important problem, especially for e-commerce. Besides scalability and efficiency, the fraud-detection task exhibits technical problems that include skewed distributions of training data and non-uniform cost per error, neither of which has been widely studied in the knowledge-discovery and data mining community. In this article, we survey and evaluate a number of techniques that address these three main issues concurrently.

Introduction

The first use of data mining came from service providers in the mobile phone and utilities industries. Mobile phone and utilities companies use data mining and business intelligence to predict "churn", the term they use for when a customer leaves to get their phone, gas, or broadband from another provider. They collate billing information, customer service interactions, website visits, and other metrics to give each customer a probability score, then target offers and incentives at the customers they perceive to be at higher risk of churning. Retailers segment customers into "Recency, Frequency, Monetary" (RFM) groups and target marketing and promotions at those different groups. A customer who spends little but often, and did so recently, will be handled differently from a customer who spent big but only once, and some time ago. The former may receive loyalty, upsell, and cross-sell offers, whereas the latter may be offered a win-back deal, for instance.
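The RFM idea above can be sketched in a few lines of Python. This is a minimal illustration; the customers, purchase records, and segmentation comments are all hypothetical:

```python
from datetime import date

# Hypothetical purchase history: (customer_id, purchase_date, amount).
purchases = [
    ("alice", date(2021, 3, 1), 20.0),
    ("alice", date(2021, 3, 20), 15.0),
    ("bob",   date(2020, 6, 5), 500.0),
]

def rfm(purchases, today):
    """Return {customer: (recency_in_days, frequency, monetary)}."""
    out = {}
    for cust, when, amount in purchases:
        rec, freq, mon = out.get(cust, (None, 0, 0.0))
        days = (today - when).days
        rec = days if rec is None else min(rec, days)
        out[cust] = (rec, freq + 1, mon + amount)
    return out

scores = rfm(purchases, date(2021, 4, 1))
# "alice" bought recently and often but spends little: a loyalty/upsell target.
# "bob" spent big, once, long ago: a win-back candidate.
```

Real systems would bin these raw values into quantile-based RFM scores before targeting offers, but the raw triple already separates the two customer profiles described above.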

Applications

• Future Healthcare
• Market Basket Analysis
• Manufacturing Engineering
• CRM
• Fraud Detection
• Intrusion Detection
• Customer Segmentation
• Financial Banking
• Lie Detection
• Corporate Surveillance
• Criminal Investigation

Types of Data Mining

1. Data stored in a database
A database system, also called a database management system (DBMS), stores data that are related to one another in some way, together with a set of software programs used to manage the data and provide easy access to it. These programs serve many purposes, including defining the structure of the database, ensuring that the stored information remains secure and consistent, and managing different types of data access, such as shared, distributed, and concurrent access. A relational database consists of named tables, each with a set of attributes, that can store large numbers of rows or records. Every record stored in a table has a unique key. An entity-relationship model provides a representation of a relational database in terms of entities and the relationships that exist between them.

2. Data warehouse
A data warehouse is a single storage location that collects data from multiple sources and stores it under a unified schema. Before data is stored in a data warehouse, it undergoes cleaning, integration, loading, and refreshing. Data in a warehouse is organized in several parts; if you want information on data stored 6 or 12 months back, you will typically get it in summarized form.

3. Transactional data
A transactional database stores records that are captured as transactions, such as a flight booking, a customer purchase, or a click on a website. Every transaction record has a unique ID and lists the items that make up the transaction.

4. Other types of data
Many other types of data exist, known for their structure, semantic meaning, and versatility, and used in a wide range of applications. A few examples: data streams, engineering design data, sequence data, graph data, spatial data, and multimedia data.
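The relational ideas above (named tables, attributes, unique keys) can be illustrated with Python's built-in sqlite3 module. The table name and rows here are made up for illustration:

```python
import sqlite3

# An in-memory relational table; the PRIMARY KEY enforces the unique-key
# property described above for records in a table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id TEXT PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("T100", "alice", 20.0), ("T101", "bob", 500.0)],
)
rows = conn.execute(
    "SELECT customer, amount FROM transactions ORDER BY id"
).fetchall()
# rows == [("alice", 20.0), ("bob", 500.0)]
```

Inserting a second record with id "T100" would raise an IntegrityError, which is exactly the DBMS keeping the stored information consistent.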

Data Mining Techniques

1. Association
Association is one of the most widely used data mining techniques. A transaction and the relationships between its items are used to identify patterns, which is why it is also referred to as a relation technique. It underlies market basket analysis, which aims to find the products that customers regularly buy together. The technique is very helpful for retailers, who can study past sales data, look for products that customers buy together, and then place those products in close proximity in their stores, both saving customers time and increasing sales.

2. Clustering
Clustering creates meaningful clusters of objects that share the same characteristics. It is often confused with classification, but the distinction is simple: classification puts objects into predefined classes, whereas clustering derives the classes from the data itself. Consider a library full of books on different topics. The challenge is to organize the books so that readers can easily find those on a particular topic. Clustering can be used to keep similar books on one shelf and give each shelf a meaningful name, so that readers looking for a topic can go straight to the right shelf instead of roaming the entire library.

3. Classification
Classification has its origins in machine learning. It assigns items or variables in a data set to predefined groups or classes, using linear programming, statistics, decision trees, and artificial neural networks, amongst other techniques. Classification is used to develop software that assigns items in a data set to different classes. For instance, candidates who attended an interview could be classified into two groups: those who were selected and those who were rejected.

4. Prediction
Prediction models the relationship between independent and dependent variables. It can be used, for example, to predict future profit from sales: taking profit as the dependent variable and sales as the independent variable, a regression curve fitted to past sales data yields a profit prediction for the future.

5. Sequential patterns
This technique uses transaction data to identify similar trends, patterns, and events over a period of time. Historical sales data can reveal items that buyers purchase together at different times of the year. A business can act on this information by recommending those products at times when the historical data does not suggest customers would buy them, pushing the recommendation with deals and discounts.
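The market-basket side of association can be sketched by counting how often item pairs co-occur across transactions. The baskets below are invented for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (one set of items per transaction).
baskets = [
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
]

# Count how often each unordered item pair appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    pair_counts.update(combinations(sorted(basket), 2))

top_pair, count = pair_counts.most_common(1)[0]
# ("bread", "butter") co-occurs in 3 of 4 baskets (support 0.75), so a
# retailer might shelve those two products together.
```

Full association-rule miners such as Apriori extend this idea to larger itemsets and add confidence thresholds, but pair counting already captures the core of the technique.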

Data Mining on Credit Card Fraud Detection

How is data mining used in credit card fraud detection? The system described here applies a supervised anomaly detection algorithm to detect fraud in real-time transactions on the internet, classifying each transaction as legitimate, suspicious, or fraudulent. The anomaly detection algorithm is built on Neural Networks, which mimic the working principle of the human brain: just as humans learn from past experience and base present-day decisions on what they have learned, the network learns from historical transactions.

Data mining techniques for fraud detection. The most cost-effective approach to fraud detection is to "tease out possible evidences of fraud from the available data using mathematical algorithms". Data mining techniques, which make use of advanced statistical methods, are divided into two main approaches: supervised and unsupervised methods. Both are based on training an algorithm with a record of observations from the past. Supervised methods require that each observation used for learning carries a label indicating which class it belongs to; in the context of fraud detection, this means that for each observation we know whether it belongs to the class "fraudulent" or the class "legitimate". Often, however, we do not know which class an observation belongs to. Take the case of an online order whose payment was rejected: one will never know whether it was a legitimate order or whether it had been correctly rejected. Such occurrences favour the use of unsupervised methods, which do not require labelled data and instead look for extreme data occurrences, or outliers. To get the best of both worlds, some solutions combine supervised and unsupervised techniques.
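The unsupervised "look for outliers" idea can be sketched with a simple z-score rule that needs no fraud labels at all. The amounts and the 2-sigma threshold are illustrative, not taken from the report:

```python
from statistics import mean, stdev

# Unsupervised sketch: flag transactions whose amount deviates strongly
# from the mean of recent transactions, with no fraud labels required.
amounts = [12.0, 25.0, 18.0, 22.0, 15.0, 980.0, 20.0, 17.0]

mu, sigma = mean(amounts), stdev(amounts)
# Any amount more than two standard deviations from the mean is an outlier.
outliers = [a for a in amounts if abs(a - mu) / sigma > 2]
# The 980.0 transaction stands out as a candidate for manual review.
```

A production system would use many features beyond the amount and a less fragile outlier model, but the principle of flagging extreme observations without labels is the same.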
A few authors have studied unsupervised methods for fraud detection: one explored the use of graph analysis for fraud detection in a telecommunications setting; another proposed a mixed approach in which a self-organising map feeds a Neural Network whenever a transaction does not fall into an identified normal behaviour for the given customer; a third compared supervised and unsupervised Neural Networks and found that the unsupervised method performed far below the supervised one. Supervised methods have dominated the fraud detection literature. In general, the emphasis of research in the late 90s and early 2000s was on Neural Networks: proposals included a Neural Network for fraud detection at a commercial bank, a profiling approach to telecommunications fraud, and the combination of multiple classifiers in an attempt to create scalable systems able to deal with large volumes of data. More recently, other works have made use of newer classification techniques: a model based on a Hidden Markov Model, focused on fraud detection for credit-card issuing banks; work on credit-card fraud detection with data from a bank, addressing in particular the pre-processing of the data and studying the aggregation of transactions when using Random Forests, Support Vector Machines, Logistic Regression, and K-Nearest Neighbour techniques; and a comparison of Random Forests, Support Vector Machines, and Logistic Regression for detecting fraud in credit-card transactions at an international financial institution. Two criticisms of data mining studies of fraud detection stand out: the lack of publicly available data and the lack of published literature on the topic. Most of the literature on credit-card fraud detection has focused on classification models with data from banks.

Such data invariably consist of transaction registries, where it is possible to find fraud evidence such as "collision" or "high velocity" events, i.e. transactions happening at the same time in different locations. Some authors have also addressed techniques for finding the best derived features; it has been shown that transaction aggregation improves performance in some situations, with the aggregation period being an important parameter. However, none of these particularities seems to apply to detecting fraud with data from one single merchant, as in our case. In this study, we chose supervised learning methods for the classification problem, because fraud detection applications commonly have labelled data for training. We chose to test three different models: Logistic Regression because of its popularity, and Random Forests and Support Vector Machines, which have shown superior performance in a variety of applications. Support Vector Machines have been shown to perform well in classification problems, and Random Forests have been claimed to be very attractive for fraud detection due to their ease of application and computational efficiency.
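As a minimal illustration of the supervised route, here is a logistic regression trained by plain stochastic gradient descent on tiny made-up features. The data, the two features (normalised amount and hour of day), and the learning rate are all hypothetical, not from the study:

```python
import math

# Toy training set: (normalised amount, normalised hour-of-day) per
# transaction; label 1 means fraudulent, 0 means legitimate.
X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.30),   # legitimate
     (0.90, 0.95), (0.85, 0.90), (0.95, 0.80)]   # fraudulent
y = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Train weights and bias by stochastic gradient descent on log loss.
w1, w2, b = 0.0, 0.0, 0.0
lr = 0.5
for _ in range(2000):                        # epochs over the tiny data set
    for (x1, x2), label in zip(X, y):
        err = sigmoid(w1 * x1 + w2 * x2 + b) - label
        w1 -= lr * err * x1
        w2 -= lr * err * x2
        b  -= lr * err

def predict(x1, x2):
    """Return 1 if the model scores the transaction as fraudulent."""
    return int(sigmoid(w1 * x1 + w2 * x2 + b) >= 0.5)
```

Random Forests and Support Vector Machines would replace only the model here; the supervised workflow of labelled training data followed by classification of new transactions is the same.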

WEEKLY PROGRESS REPORT: MICRO PROJECT

SR.NO.   WEEK    ACTIVITY PERFORMED
1        1st     Discussion and finalization of topic
2        2nd     Preparation and submission of Abstract
3        3rd     Literature Review
4        4th     Collection of Data
5        5th     Discussion and outline of Content
6        6th     Formulation of Content
7        7th     Editing and proofreading of Content
8        8th     Compilation of Report and Presentation
9        9th     Seminar
10       10th    Viva voce
11       11th    Final submission of Micro Project

Sign of the student

SIGN OF GUIDE

DATE

Sign of the faculty

ANNEXURE II
Evaluation Sheet for the Micro Project

Academic Year: 2020-21
Name of the Faculty: Prof. WALHEKAR P.D
Course: ETI
Course code: 22618
Semester: VI
Title of the project: Emerging Trends in Computer and Information Technology (22618)

COs addressed by the Micro Project:

CO1: Describe Artificial Intelligence, Machine Learning and Deep Learning
CO2: Interpret IoT concepts
CO3: Compare models of Digital Forensic Investigation
CO4: Describe evidence handling procedures
CO5: Describe the Ethical Hacking process
CO6: Detect network, operating system and application vulnerabilities

Major learning outcomes achieved by the student by doing the project:

(a) Practical outcomes:
PO1. Basic and discipline-specific knowledge: Apply knowledge of basic mathematics, science and engineering fundamentals and engineering specialization to solve engineering problems.
PO2. Problem analysis: Identify and analyse well-defined engineering problems using codified standard methods.
PO3. Design/development of solutions: Design solutions for well-defined technical problems and assist with the design of system components or processes to meet specified needs.
PO4. Engineering tools, experimentation and testing: Apply modern engineering tools and appropriate techniques to conduct standard tests and measurements.
PO5. Engineering practices for society, sustainability and environment: Apply appropriate technology in the context of society, sustainability, environment and ethical practices.
PO6. Project management: Use engineering management principles, individually or as a team member or leader, to manage projects and communicate effectively about well-defined engineering activities.

PO7. Life-long learning: Ability to analyse individual needs and engage in updating in the context of technological changes.

(b) Unit outcomes in the Cognitive domain:
2a. State the domains and application areas of embedded systems.
2b. Describe IoT systems in which information and knowledge are inferred from data.

POs Addressed: PO1, PO2, PO3, PO7

Roll No: 31
Student Name: Omkar Lomate
Marks out of 6 for performance in group activity (D5 Col. 8):
Marks out of 4 for performance in oral/presentation (D5 Col. 9):
Total out of 10:

Prof. WALHEKAR P.D
(Signature of Faculty)

