CMT423 Assignment 2 - Business model using data analytics

Author Wei Yew Huong
Course Decision Support Systems & Business Intelligence
Institution Universiti Sains Malaysia


SCHOOL OF COMPUTER SCIENCES



Semester 2, 2019/2020

CMT423 Decision Support Systems and Business Intelligence

Assignment 2

Lecturer: Dr. Noor Farizah Ibrahim

Group Members
Ooi Tiat Han 132980
Huong Wei-Yew 132915

Submission date: 22 May 2020

Table of Contents

Introduction to the organization
Business objectives
Business problems (motivation)
2. Overview of the dataset and list all the information
3. Justification on the choice of dataset. Hint: It should relate to your goal/objectives.
4. Conceptual framework - Presenting a conceptual framework of your problem solving. How will it influence the organization?
5. Discussion - Discuss the suggested predictive analytics techniques and how the techniques can help to solve your business problems. Relate to your choice of dataset and business problems.
6. Business model e.g. graph/flowchart
7. Conclusion
8. References

Introduction to the organization

XYZ Analytics is an Airbnb management service company for vacation rentals, headquartered in Seattle. Driven by big-data machine learning and predictive analytics, XYZ Analytics has helped over 10,000 listings in Seattle.

Here at XYZ Analytics, our mission is simple: aggregate and catalog every room on the planet and use that data to provide actionable, in-depth intelligence about the expanding vacation rental landscape.

With clients involved in tourism, property management, and real estate investment, our line of products provides data-based insights and serves a breadth of industry needs. Leveraging savvy, real-time reporting allows our clients to understand any rental market, make informed decisions, and outperform the competition.

Business objectives

We are an Airbnb management service company operating in Seattle. Our job is to observe local developments and gain strategic insights into the customer experience of the Airbnb platform in this city. The purpose is to find ways to improve that experience. It may not be necessary to enhance the customer experience across the entire city of Seattle, but improving it through promotions, offers or new functions is the main objective. Note that "customer" refers to anyone using the platform, whether guest or host. Both groups need an overall positive experience of the Airbnb platform for the platform to thrive. This project focuses on finding ways in which hosts can improve the guests' experience.

Business problems (motivation)

The business context is that Airbnb would like to understand more about the experience that guests have in the city of Seattle. Beyond that, linking the experience to things that hosts can change would help answer our main business goal. A large part of this is finding a good way to quantify customer experience or customer satisfaction. Since Airbnb is a platform where hosts post listings which guests then visit, we take the "customer" to be the guest, and their experience is reflected in how they rate their stay using the rating mechanisms provided in the website or app. Any customer experience metric we derive should take this rating mechanism into account.

The aim is to see what Airbnb hosts can do that is most likely to improve the guest experience on the Airbnb platform. We can extend our understanding of how customer experience is linked to listing parameters in order to understand what Airbnb hosts can easily do to improve the guests' experience. To summarise, translating our business goals into data mining goals, we have:

1. Can we find an appropriate metric for understanding the quality of guests' experience?
2. How can we link listing attributes with the customer experience?
3. What aspects of the listings can hosts easily alter to significantly boost the guests' experience?

2. Overview of the dataset and list all the information

The dataset contains three CSV files:

Listings, including full descriptions and average review score

Reviews, including unique id for each reviewer and detailed comments

Calendar, including listing id and the price and availability for that day

This dataset was collected using a Python script written by Tom Slee. The script scrapes the Airbnb website to collect data about the shape of the company's business. The data are then posted on the Inside Airbnb website for public download as CSV files. The files range in size from 3819 to more than 104 thousand records.

Here are the columns for the three files in the dataset:

review.csv: listing_id, id, date, reviewer_id, reviewer_name, comments
calendar.csv: listing_id, date, available, price
listing.csv: id


3. Justification on the choice of dataset. Hint: It should relate to your goal/objectives.

This dataset is used because it is free, public and accessible to everyone. It suits us because public data cannot easily be manipulated or compromised by unethical competitors. The questions we can explore with this dataset include:

Can you describe the vibe of each Seattle neighbourhood using listing descriptions?

What are the busiest times of the year to visit Seattle? By how much do prices spike?

Is there a general upward trend of both new Airbnb listings and total Airbnb visitors to Seattle?

4. Conceptual framework - Presenting a conceptual framework of your problem solving. How will it influence the organization?

The proposed conceptual model consists of three major components: identifying organizational issues, modelling the issues, and giving decision-making advice. As customer experience analysts, we will find the major issues of the Airbnb business, model those issues and provide decision-making advice. Customer experience is a very important aspect of an Airbnb business because it affects the Airbnb rating, customer retention and revenue. Failure to improve and adapt to customer needs will harm the business economically and strategically and might cause it to close down. Hence, it is important for the business to use its data to make strategic decisions.

A. Identify Organizational Issues

The Airbnb business owner or agent should identify the issues affecting the customer experience and discover the most critical issues that need to be resolved. The issues can be identified by analyzing the data from all the Seattle Airbnb listings. The listings data hold a lot of hidden information that can be uncovered through detailed analysis. To obtain data on the issues, all the Seattle Airbnb listings are gathered. Customer reviews are gathered with the listings as well, and they can be analyzed to assess customer satisfaction based on the review scores.

B. Issue Modeling

Issue modelling involves identifying the factors that directly or indirectly affect the customer experience. There are three processes in the problem-solving framework: Data Extraction, Data Transformation and Data Loading. Data Extraction involves a full extraction of the Airbnb listings data into a staging area. After extraction, the data are examined to understand the records and the columns available. Useful features are then selected, and the data undergo a transformation process consisting of data selection, data matching, data cleansing, data standardization, and character and unit conversion. After the data are transformed, they are stored in the data warehouse to be used for analysis. The listings consist of description, neighborhood overview, transit, host name, host response time, bedrooms, bathrooms, price and other aspects. Detailed techniques and methods are discussed in the Discussion section.

C. Decision Making Advice

Decision-making advice involves analyzing multiple factors or criteria when making a decision. The advice summarizes the set of factors and evaluates the important ones that affect the customer experience. With this analysis, we can suggest solutions to Airbnb hosts to improve their services so that the customer experience improves. This framework is important for our organization because we are an Airbnb management service company that needs to manage many Airbnb businesses. Our goal is to deliver a better customer experience and hence increase the sales of those businesses. We need analysis to help us decide what to improve and to identify the trends that customers like.

5. Discussion - Discuss the suggested predictive analytics techniques and how the techniques can help to solve your business problems. Relate to your choice of dataset and business problems.

We chose this dataset because we are a company that manages Airbnb businesses in Seattle. The dataset provides all the Seattle listings over time, which allows us to analyze and understand the performance and customer experience of these businesses. There are 3818 rows and 92 columns in the data. Before starting the predictive analysis, we go through the data to understand it. We separate the columns into feature groups: the review-based group, property listed group, geolocation group, policies group, descriptive group, host group, availability group and system group.

Groups and descriptions:

Review-based: Consists of columns that relate to customer reviews, such as number of reviews and rating.
Property listed: Consists of columns that relate to the Airbnb property, such as property type, room type, bedrooms and bathrooms.
Geolocation: Consists of geolocation information of the property, such as city, state, latitude, longitude and neighborhood.
Policies: Consists of terms that guide the customer, such as price, security deposit, cleaning fee, extra people, minimum nights and maximum nights.
Descriptive: Consists of columns that describe the property, such as summary, description and experiences offered.
Host: Consists of information about the host, such as host name, host location, host profile picture, host acceptance rate and host response time.
System: Consists of system information, such as listing URL, host URL and picture information.

Data transformation will be done after understanding the data. Columns in which more than half of the values are null will be excluded from the analysis because they carry too little information to be useful. The data will be cleaned by standardizing the values and the data types. For example, the dollar sign will be removed from the price and the data type changed to float. Extreme outliers will also be removed, as they are unusual records that do not help much in the analysis.
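The cleaning rules above can be sketched in plain Python. This is a minimal, dependency-free illustration, not the pipeline used in the analysis: the sample rows, the helper names (`drop_sparse_columns`, `parse_price`, `remove_outliers`) and the 1.5 x IQR outlier rule are assumptions made for the example, since the text does not state which outlier rule was applied.

```python
from statistics import quantiles


def drop_sparse_columns(rows, max_null_ratio=0.5):
    """Keep only columns whose share of missing (None) values is at most max_null_ratio."""
    keep = [c for c in rows[0]
            if sum(r[c] is None for r in rows) / len(rows) <= max_null_ratio]
    return [{c: r[c] for c in keep} for r in rows]


def parse_price(text):
    """Convert a price string such as '$1,250.00' to a float."""
    return float(text.replace("$", "").replace(",", ""))


def remove_outliers(rows, column, k=1.5):
    """Drop rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = quantiles([r[column] for r in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if low <= r[column] <= high]


# Hypothetical sample rows: 'square_feet' is mostly missing, one price is extreme.
raw_prices = ["$80.00", "$85.00", "$90.00", "$100.00",
              "$110.00", "$120.00", "$130.00", "$5,000.00"]
raw = [{"id": i, "price": p, "square_feet": 900.0 if i == 0 else None}
       for i, p in enumerate(raw_prices)]

rows = drop_sparse_columns(raw)          # removes 'square_feet'
for r in rows:
    r["price"] = parse_price(r["price"])  # '$80.00' becomes a float
cleaned = remove_outliers(rows, "price")  # drops the extreme price

print(sorted(cleaned[0]), len(cleaned))
```

The null threshold is kept as a parameter so that the "more than half" rule can be tightened or relaxed without touching the rest of the sketch.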

To find an appropriate metric for the quality of a customer’s stay at the Airbnb, the data in each column will be analyzed and the response variables used to calculate the customer experience score will be chosen. Looking at the histograms of the number of reviews and of reviews per month, we can see that the number of reviews is concentrated in a narrow range of low values, whereas reviews per month is more evenly distributed. Reviews per month is also the better variable because a listing can accumulate a high number of reviews simply by being active for a long period of time. To select the most suitable review score, all the review score columns are placed in a correlation matrix shown as a heatmap. From the heatmap, we can see that most of the review score columns correlate strongly with review scores rating. Hence, review scores rating is chosen as the second variable for the customer experience score. Review scores rating and reviews per month are chosen as the response variables because review scores rating reflects the quality of the customer’s stay and reviews per month represents the number of customer experiences. A new column, customer experience score, is added by multiplying the rate of the review scores rating by reviews per month.
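As a worked illustration of this score, the sketch below assumes that the "rate" of the review scores rating means the rating divided by a maximum of 100, and that a listing with a missing rating or missing reviews per month receives no score. Both are assumptions, and the function name and sample values are hypothetical.

```python
def customer_experience_score(review_scores_rating, reviews_per_month, max_rating=100.0):
    """Return (rating / max_rating) * reviews_per_month, or None if an input is missing."""
    if review_scores_rating is None or reviews_per_month is None:
        return None
    return (review_scores_rating / max_rating) * reviews_per_month


# Hypothetical listings; field names follow the column names used in the text.
listings = [
    {"id": 1, "review_scores_rating": 95.0, "reviews_per_month": 4.0},
    {"id": 2, "review_scores_rating": 80.0, "reviews_per_month": 0.5},
    {"id": 3, "review_scores_rating": None, "reviews_per_month": 1.2},
]

for listing in listings:
    listing["customer_experience_score"] = customer_experience_score(
        listing["review_scores_rating"], listing["reviews_per_month"])
    print(listing["id"], listing["customer_experience_score"])
```

Under these assumptions, a well-rated listing that is reviewed often scores higher than an equally well-rated listing that is rarely booked, which is the behaviour the text asks of the metric.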

The customer experience score can be used in a correlation matrix to find the important features. A correlation matrix summarizes a large amount of data so that patterns become visible. All the features are correlated with the customer experience score to find the variables that correlate most strongly with it, and a bar chart is plotted to visualize the correlation of each feature. Features with a large positive correlation are those that customers enjoy, while features with a negative correlation are those that customers dislike and that have a negative impact. Running this analysis, we find that the top four negatively correlating features are days as host, host response time, having a weekly price and having a monthly price. The top four positively correlating features are breakfast amenities, instant bookable, the ratio of security deposit to price and whether the host is in the neighborhood. Further analysis is done using the features with a correlation magnitude of more than 0.45.

Multiple regression models will be tested to find the model that fits the data best. We will try an Ordinary Least Squares regression model and a Ridge Regression model using 6-fold cross-validation. Ordinary Least Squares regression is an analysis that estimates the relationship between independent variables and a dependent variable (Ordinary Least Squares Regression 2020). The method estimates the relationship by minimizing the sum of the squared differences between the observed and predicted values of the dependent variable, modelled as a straight line. Ridge Regression is a technique for analyzing multiple regression data that suffer from multicollinearity (Stephanie, 2018). When multicollinearity occurs, least squares estimates are unbiased, but their variances are large, so they may be far from the true value. By adding a degree of bias to the regression estimates, Ridge Regression reduces the standard errors. It is hoped that the net effect will be to give estimates that are more reliable.
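For reference, the two estimators described above can be written in their standard textbook form. The notation is an assumption introduced here, not taken from the analysis: $X$ is the matrix of selected listing features, $y$ the vector of customer experience scores, $\beta$ the coefficient vector and $\lambda \ge 0$ the ridge penalty. The closed form for Ordinary Least Squares additionally assumes that $X^\top X$ is invertible.

```latex
\begin{align}
\hat{\beta}_{\mathrm{OLS}}
  &= \arg\min_{\beta} \lVert y - X\beta \rVert_2^2
   = (X^\top X)^{-1} X^\top y \\
\hat{\beta}_{\mathrm{ridge}}
  &= \arg\min_{\beta} \left( \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 \right)
   = (X^\top X + \lambda I)^{-1} X^\top y
\end{align}
```

Setting $\lambda = 0$ recovers Ordinary Least Squares. A larger $\lambda$ shrinks the coefficients towards zero, which introduces some bias but reduces the variance that multicollinearity causes.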


By comparing the root mean square error (RMSE) of the training and test sets, we can judge how well a model fits: if the difference between the two RMSE values is small, the model fits the data better. From the analysis, we found that Ridge Regression performs better than Ordinary Least Squares regression in terms of RMSE. Hence, using Ridge Regression, we can obtain the feature importance, which answers our business problem of what aspects hosts can alter to boost the guest’s experience. After running the regression, we found that the top five positive coefficients are extra people ratio, host verified on Facebook, room type of entire home/apartment, instant bookable and breakfast amenities.

In short, we can answer our problem of finding an appropriate metric for the quality of a customer’s stay through the analysis that determines the customer experience score. Reviews per month and review scores rating are used to obtain the customer experience score. Furthermore, by separating the data into feature groups, we were able to understand the data and find important information easily during feature engineering. The datasets were used together with the customer experience scores to link the listing attributes with the quality of the customer’s stay. The important things that an Airbnb owner can change to improve the customer’s experience are to be verified on Facebook, to enable the instant bookable feature, to respond to customers' queries faster and to provide breakfast to Airbnb customers.
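The train/test RMSE comparison under 6-fold cross-validation can be sketched without any external library. This is an illustrative sketch under stated assumptions, not the code used for the analysis: the synthetic data, the penalty value `lam=1.0`, the interleaved fold assignment and the omission of an intercept are all choices made for the example. In practice a library such as scikit-learn would normally be used.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(X, y, lam):
    """Closed-form fit of (X'X + lam*I) beta = X'y; lam = 0 gives OLS. No intercept."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def rmse(X, y, beta):
    """Root mean square error of the linear predictions X beta against y."""
    errors = [sum(b * x for b, x in zip(beta, row)) - t for row, t in zip(X, y)]
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


def cross_validated_rmse(X, y, lam, k=6):
    """Return (mean training RMSE, mean test RMSE) over k interleaved folds."""
    train_scores, test_scores = [], []
    for fold in range(k):
        test_idx = set(range(fold, len(y), k))
        train_idx = [i for i in range(len(y)) if i not in test_idx]
        Xtr, ytr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        Xte, yte = [X[i] for i in sorted(test_idx)], [y[i] for i in sorted(test_idx)]
        beta = fit_ridge(Xtr, ytr, lam)
        train_scores.append(rmse(Xtr, ytr, beta))
        test_scores.append(rmse(Xte, yte, beta))
    return sum(train_scores) / k, sum(test_scores) / k


# Synthetic data (hypothetical): the first two columns are nearly collinear.
rng = random.Random(0)
X = []
for _ in range(60):
    a = rng.gauss(0, 1)
    X.append([a, a + rng.gauss(0, 0.05), rng.gauss(0, 1)])
y = [2.0 * row[0] + 1.0 * row[2] + rng.gauss(0, 0.5) for row in X]

for name, lam in [("OLS", 0.0), ("Ridge", 1.0)]:
    train, test = cross_validated_rmse(X, y, lam)
    print(f"{name}: train RMSE={train:.3f}, test RMSE={test:.3f}, gap={test - train:.3f}")
```

The printed gap between test and training RMSE is the quantity the text uses to compare models. Which model shows the smaller gap depends on the data and on the penalty, so the synthetic example should not be read as reproducing the reported result.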

6. Business model e.g. graph/flowchart

Our company works on a B2B business model. We are a strategic partner for Airbnb. We provide predictive analytics services that give Airbnb useful insights for enhancing its services and products; in return, Airbnb pays us, creating value on both sides.

Our solution is unique because we have multi-disciplinary specialists from the real-estate industry, the tourism industry, interior design and data science, who extract the most value from the analysis of Airbnb’s historical data. Our experts will be able to offer custom-made solutions up to suggesting way...
