Mid term project description for IS665 PDF

Title Mid term project description for IS665
Author Nikhil Teja Vedagiri
Course Data Analytics For Info System
Institution New Jersey Institute of Technology

Summary

Carry out a data analytics project and build a model using machine learning concepts.


Description

Mid-Term Project MUST-read Guidelines:

● Teams
a. Please form a team of 1 (min) to 3 (max) students. Please pick your teammates wisely. I do not allow “divorce or break-up” once a team is formed.
b. Each team must register its team info by sending an email to our TA before 11/3. One student from each team acts as the representative and sends the email. The team info should include your teammates’ names and email addresses.

● Submission
a. Each team must select ONLY ONE student to make the project submission. A 10% penalty will be applied to duplicate submissions from the same team.
b. The submission should be named in the following format: group_x (x is the group id that will be published after the team registration).
c. Each submission must contain: a three-page report, a notebook file with source code and comments explaining the code, and a readme file with your teammate info. The requirements for the report are listed in the later sections.

● Grading
a. Every team member gets the same score, and the grading is based solely on the quality of the submitted project.

Project Requirements

1. Goal: Predict the probability that an online transaction is fraudulent, as denoted by the binary target isFraud.

2. Data: The data is split into two files, identity and transaction, which are joined by TransactionID. Not all transactions have corresponding identity information.
a. Files
■ train_{transaction, identity}.csv - the training set
■ test_{transaction, identity}.csv - the test set (you must predict the isFraud value for these observations)
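The join described in item 2 can be sketched as follows. This is a minimal illustration on tiny made-up frames, not the project data; the values and the helper column name has_identity are assumptions introduced for the example. A left join keeps every transaction, including those with no identity record.

```python
import pandas as pd

# Tiny made-up stand-ins for the two tables (NOT the real project data).
# With the real files one would presumably start from
# pd.read_csv("train_transaction.csv") and pd.read_csv("train_identity.csv").
transaction = pd.DataFrame({
    "TransactionID": [1001, 1002, 1003],
    "TransactionAMT": [50.0, 120.5, 9.99],
    "isFraud": [0, 1, 0],
})
identity = pd.DataFrame({
    "TransactionID": [1001, 1003],
    "DeviceType": ["mobile", "desktop"],
})

# Left join on TransactionID: every transaction is kept, and identity
# columns are NaN where no identity row exists.
train = transaction.merge(
    identity, on="TransactionID", how="left", indicator="has_identity"
)

# Turn the merge indicator into a 0/1 flag (hypothetical helper feature).
train["has_identity"] = (train["has_identity"] == "both").astype(int)

print(train)
```

Whether a missing identity record is itself informative is something to check during EDA; the flag above is only one possible way to expose it to a model.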

3. Detailed data description: In the Transaction Table *

● TransactionDT: timedelta from a given reference datetime (not an actual timestamp)
● TransactionAMT: transaction payment amount in USD
● ProductCD: product code, the product for each transaction
● card1 - card6: payment card information, such as card type, card category, issue bank, country, etc.
● addr: address
● dist: distance
● P_ and (R__) emaildomain: purchaser and recipient email domain
● C1-C14: counting, such as how many addresses are found to be associated with the payment card, etc. The actual meaning is masked.
● D1-D15: timedelta, such as days between the previous transaction, etc.
● M1-M9: match, such as names on card and address, etc.
● Vxxx: Vesta engineered rich features, including ranking, counting, and other entity relations.

Categorical Features:
● ProductCD
● card1 - card6
● addr1, addr2
● P_emaildomain
● R_emaildomain
● M1 - M9

In the Identity Table *

Variables in this table are identity information: network connection information (IP, ISP, Proxy, etc.) and digital signature (UA/browser/os/version, etc.) associated with transactions. They are collected by Vesta’s fraud protection system and digital security partners. (The field names are masked, and a pairwise dictionary will not be provided, for privacy protection and contract agreement.)

Categorical Features: DeviceType, DeviceInfo, id_12 - id_38
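Most models need the categorical features listed above turned into numbers. The sketch below shows one possible option, integer codes with missing values treated as their own category, on a made-up frame. The values are invented, and the choice of encoding is an assumption for illustration, not a requirement of the assignment.

```python
import pandas as pd

# Made-up rows using column names from the description (NOT real data).
df = pd.DataFrame({
    "ProductCD": ["W", "C", "W", None],
    "DeviceType": ["mobile", None, "desktop", "mobile"],
    "TransactionAMT": [50.0, 120.5, 9.99, 30.0],
})

# Only two of the listed categorical columns are used here, for brevity.
categorical_cols = ["ProductCD", "DeviceType"]

for col in categorical_cols:
    # Treat missing values as an explicit "missing" category, then map
    # each distinct category to an integer code.
    df[col] = df[col].fillna("missing").astype("category").cat.codes

print(df)
```

Integer codes impose an arbitrary order on the categories. Tree-based models usually tolerate that, while linear models tend to do better with one-hot encoding (for example pandas.get_dummies), so the choice is worth justifying in the report.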

4. Requirements: Please use the existing features in the dataset and/or create new features on top of the existing ones to predict the probability that an online transaction is fraudulent.
a. Your approach should achieve a high accuracy or AUC (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html), at least above 0.80. The higher the better.
b. To achieve the above score, you must properly perform the following: EDA (exploratory data analysis), feature engineering, feature selection, model selection.

5. Submission:
a. Please finish the analysis in a notebook and include concrete explanations of your code (every method you use and its parameters). The explanation is very important for evaluating your understanding of your code, and it helps us verify that the code was not copied and pasted from somewhere else without the authors understanding it. Failing to provide concrete explanations will cost a significant amount of credit.
b. Write a short report (3 pages max, single-spaced, 12-point font) to explain your choices of EDA, feature engineering/selection, and model selection with parameters. For example, if you decide to remove some features, you need to explain why in your report. Failing to provide the report will cost a significant amount of credit.

...
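The AUC check in requirement 4a can be sketched as below. This is only an illustration under stated assumptions: the data is synthetic (generated with make_classification, not the project dataset), and logistic regression is a placeholder model, not a recommended choice. The point is that roc_auc_score is given predicted probabilities for the positive class on held-out data, not hard 0/1 labels.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the fraud data (NOT the project dataset).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# Hold out a validation split; stratify keeps the class ratio in both parts.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Placeholder model, chosen only to keep the sketch short.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Column 1 of predict_proba is the estimated probability of class 1.
valid_proba = model.predict_proba(X_valid)[:, 1]
valid_auc = roc_auc_score(y_valid, valid_proba)
print(f"validation AUC: {valid_auc:.3f}")

# Hand-checkable case: 3 of the 4 (positive, negative) pairs are ranked
# correctly, so the AUC is 0.75.
toy_auc = roc_auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(f"toy AUC: {toy_auc}")
```

With a heavily imbalanced target, accuracy can look high even for a model that never predicts fraud, which is one reason to report AUC on a validation split alongside it.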

