Assignment 4: Visualization with Tableau Public

Author: Fernando Quijano
Course: Business Intelligence
Institution: Florida State University


ISM3540: Big Data Lab Session – Text Analytics & Mining

Job Postings Analysis

Goal: In this exercise, you will use the text analytics capabilities of SPSS Modeler to analyze job description texts pulled from the Monster.com website. An easy way to collect text data from web pages (e.g., HTML-based websites) is to use the RSS feeds of such pages (when available). RSS (Rich Site Summary; originally RDF Site Summary; often called Really Simple Syndication) is a type of web feed that allows users to access updates to online content in a standardized, computer-readable format. RSS standardizes the data provided on a page using common metadata (e.g., tags).

If you aren't already in one of the College of Business's computer labs, go to FSU's VLab and click on the Lab Machines icon at the top. Once logged onto that machine, open the application IBM SPSS Modeler, create a new stream, and click "OK". FYI: a stream is what SPSS calls a data analytics model.
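To see what "standardized, computer-readable" means in practice, here is a minimal sketch of parsing an RSS 2.0 feed with Python's standard library. The feed content below is hypothetical (it is not real Monster.com output), but it follows the standard `<item>`/`<title>`/`<description>` metadata tags that the Web Feed node relies on.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 feed, shaped like a job-search feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Job Search Results</title>
    <item>
      <title>Data Analyst</title>
      <link>http://example.com/job/1</link>
      <description>Analyze data and build dashboards.</description>
    </item>
    <item>
      <title>Big Data Engineer</title>
      <link>http://example.com/job/2</link>
      <description>Build pipelines for large datasets.</description>
    </item>
  </channel>
</rss>"""

def parse_rss(xml_text):
    """Extract the standardized metadata tags (title, link, description)
    from each <item> element of an RSS feed."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        records.append({
            "title": item.findtext("title", ""),
            "link": item.findtext("link", ""),
            "short_description": item.findtext("description", ""),
        })
    return records

jobs = parse_rss(SAMPLE_RSS)
```

Because every RSS feed uses the same tag names, the same parser works regardless of which site published the feed; this is exactly why the Web Feed node can ingest arbitrary feeds without site-specific configuration.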

Steps:

1. Many websites such as Monster.com provide RSS feeds for the content that they publish on the web. To access these feeds and import the data into SPSS Modeler, you can use the "Web Feed" node under the "IBM SPSS Text Analytics" tab by dragging it onto your stream (your empty screen). Add this node to your stream now. Right-click it, select Edit, and then enter the following URL in the Input area. (FYI: you can't copy/paste directly into VLab, but you can copy text from your local machine into VLab by clicking on the clipboard link in the black circle menu at the top center; once that is open, you can paste text into it and then paste into any needed fields in SPSS.)

http://rss.jobsearch.monster.com/rssquery.ashx?q=data%analytics 

By entering this URL, we are querying Monster.com for jobs containing the phrase "data analytics" and collecting the information provided on those job postings. To preview the first 10 results brought back from Monster, click on the Preview button. Normally you would do this to check that your RSS feed works: there SHOULD be at least one result, and this is a quick way to see whether any data resides at that address. If it returns nothing, then that RSS feed is either empty, misspelled, or not there at all. I tested this one and it works.
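As a side note on how query URLs like this are built: the space in a phrase such as "data analytics" normally gets percent-encoded (`%20`) or plus-encoded (`+`) in a URL query string. The sketch below shows the standard encoding using Python's `urllib.parse`; note that the lab's URLs use a bare `%` between the words, which this particular server apparently tolerates but is not standard encoding.

```python
from urllib.parse import urlencode

# Base endpoint taken from the lab's URL; the query phrase is whatever
# job-search term you want to feed to the RSS query.
BASE = "http://rss.jobsearch.monster.com/rssquery.ashx"

def build_query_url(phrase):
    # urlencode plus-encodes the space, producing q=data+analytics;
    # urllib.parse.quote would instead produce q=data%20analytics.
    return BASE + "?" + urlencode({"q": phrase})

url = build_query_url("data analytics")
```

Either standard encoding is safe to paste into the Web Feed node if the bare-`%` form ever stops working.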

2. If you are confident in your output and want to see ALL of the results in the format of a spreadsheet, hit OK on your Web Feed node and then add a "Table" output node (located in the Output tab) to your stream. You can connect them by right-clicking on the Web Feed node, selecting "Connect", and then clicking on the Table node. Once connected, your stream should look like this.

3. Run this entire "flow" by clicking on the big green arrow at the top, and then look at the results.

4. Note that the execution of the node took me ~50 seconds, and depending upon the time of day there may be different numbers of "Data Analytics" jobs posted. You might only have 25 as of this morning. If you want to save this output for later in-depth analysis in Excel, you can click on File/Export/Comma Delimited.

Question 1:

A. Which attributes (columns) are generated after the execution of this node? Holding your mouse over a highlighted chunk of text will reveal that cell's content. Which attribute provides the most information about the job position? Answer: short description.

B. Notice that we can add multiple URLs to the existing Web Feed node. Replace the existing URL with the following 4 so that you can collect information about more job postings in different analytics-related domains. This data extraction will take 2-3 minutes! RSS URLs to pull from Monster.com:

http://rss.jobsearch.monster.com/rssquery.ashx?q=data%analytics
http://rss.jobsearch.monster.com/rssquery.ashx?q=big%data
http://rss.jobsearch.monster.com/rssquery.ashx?q=business%analytics
http://rss.jobsearch.monster.com/rssquery.ashx?q=business%intelligence

Paste a screenshot of your table's output here. FYI: I had 84 records as of 10/27 7:00 am, but yours will vary from hour to hour.

C. Since you now know what your data looks like, you can remove your Table node (or at least disconnect it from your Web Feed). As you have undoubtedly learned in my lectures, one of the steps of the data mining process is to transform your data. As you saw in step 1A, there are 7 columns of data, but we only need two of them. Since our goal is to analyze job descriptions, we can focus our analysis on only the pertinent attributes. Find the "Filter" node in the Favorites tab and connect it to the output of your Web Feed node. Configure it to keep the "Title" attribute and whatever your answer was to Question 1A, and click on the arrows of the rest to remove them from the analysis. Then click "OK".
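The Filter step and the Comma Delimited export together amount to "keep two columns and write them to CSV". A minimal sketch of that transformation, using Python's `csv` module on some hypothetical records shaped like the Web Feed node's output:

```python
import csv
import io

# Hypothetical records in roughly the shape the Web Feed node produces;
# the extra "Link" column stands in for the attributes we filter out.
records = [
    {"Title": "Data Analyst", "Short Description": "Analyze data.",
     "Link": "http://example.com/job/1"},
    {"Title": "BI Developer", "Short Description": "Build reports.",
     "Link": "http://example.com/job/2"},
]

def export_csv(rows, fieldnames):
    """Keep only the named columns (like the Filter node) and write the
    result as comma-delimited text (like File/Export/Comma Delimited)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames,
                            extrasaction="ignore")  # drop unlisted columns
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

csv_text = export_csv(records, ["Title", "Short Description"])
```

The `extrasaction="ignore"` argument is what plays the role of the Filter node here: any column not listed in `fieldnames` is silently dropped.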

Finally, add the "Text Mining" node from the "Text Analytics" tab. This takes 3-4 minutes to add, so please note all of the linguistics libraries that are being installed. Take special note of the ones for Customer Satisfaction Opinions (English) and Hotel Satisfaction (English), which we will use later. These are all loaded after you drag the node onto your stream. Connect the output of the Filter node from the previous step to this new Text Mining node.

D. Configure this new Text Mining module to use “Short Description” as the text field.

E. Then, on the Expert tab, set the global frequency count to at least "3".

F. Run this stream now to generate the "concepts" (i.e., terms) within the text data and compute the occurrence frequencies of these concepts within the documents. This is the "Term-by-Document matrix" that I discussed in class the other day.

G. Note that the execution of this step may take ~3-4 minutes. Once the execution is complete, take a look at the results (Interactive Workbench) and, in the lower-left Extract pane, locate the concept called "data analytics" in the list at the bottom left of the screen. For instance, in the following screenshot the concept "data" was found 54 times across 27 different job postings (documents in your corpus).

H. Sort the Concept column by clicking on the word Concept, and then find the row "big data" or "data analysis" in the concept list. This gives you the Term-by-Document matrix discussed in class. How many times were these concepts found in your corpus? How many documents (i.e., job postings) had the word "data" in them? If you want to see the 15 docs (at least, that was the answer today when I did the lab) that contain the phrase "Big Data", click on the "Display" tool.
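The two numbers the Workbench reports for each concept — global frequency (total occurrences) and document count (how many postings contain it) — are easy to reproduce by hand. Here is a sketch over three made-up "job postings", including the minimum global frequency cutoff of 3 that we set on the Expert tab. The documents and the whitespace tokenizer are simplifying assumptions; SPSS's linguistic extraction of multi-word concepts is far more sophisticated.

```python
from collections import Counter

# Three hypothetical job-posting texts standing in for the corpus.
docs = [
    "big data analytics for big data teams",
    "data analyst role using data analytics",
    "business intelligence developer",
]

def term_document_matrix(documents, min_global_freq=1):
    """For each term, count total occurrences (global frequency) and the
    number of documents it appears in (document count), then keep only
    terms meeting the global frequency cutoff."""
    global_freq = Counter()
    doc_count = Counter()
    for doc in documents:
        tokens = doc.split()            # naive whitespace tokenizing
        global_freq.update(tokens)
        doc_count.update(set(tokens))   # each doc counts a term once
    return {t: (global_freq[t], doc_count[t])
            for t in global_freq if global_freq[t] >= min_global_freq}

matrix = term_document_matrix(docs, min_global_freq=3)
# Only "data" survives the cutoff: 4 occurrences across 2 documents.
```

Note how the two counts can differ: a term mentioned twice in one posting adds 2 to its global frequency but only 1 to its document count.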

Term                 Docs (Count)
Data                 37
Data Analytics       13
Big data developer   7

I. In order to see which concepts are linked to a given term, click on the "Map" button on the top right side of the list and then maximize the output concept map on your screen.

Question 2: Replace my sample screenshot below with a screenshot of your output for your map of the "Data Analytics" concept. Which other concept(s) are closely related to "Data Analytics"? Mine looked like the image below; the heavier the line, the more often those two phrases were matched in the same document. It looks like algorithms are closely aligned with "Big Data" in the following image. Also, note that you can play with the "Map Display Limits" slider bar to alter the number of hits required for a linkage to show up on your visualization. Slide the bar to the left or right until only approximately the top 10 results are shown. For instance, initially there were 17 concepts mapped to the Big Data concept, but after sliding the bar to the right a bit I was able to get the map to show only the top 10. Notice the purple-colored one, which is a Location, so there must be a significant number of these jobs in Austin. For points on your Lab 4, take a screenshot of your resulting map.
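Underneath, a concept map is just co-occurrence counting: how many documents contain both concepts of a pair, with the slider hiding pairs below a minimum count. A sketch of that idea, on hypothetical postings reduced to sets of extracted concepts (the concept sets here are invented for illustration):

```python
from collections import Counter
from itertools import combinations

# Each hypothetical posting reduced to the set of concepts found in it.
postings = [
    {"big data", "algorithms", "hadoop"},
    {"big data", "algorithms"},
    {"big data", "austin"},
    {"data analytics", "sql"},
]

def concept_links(docs, min_links=1):
    """Count, for every pair of concepts, the number of documents where
    both appear; min_links plays the role of the Map Display Limits
    slider, dropping the weaker linkages."""
    pair_counts = Counter()
    for concepts in docs:
        # sorted() makes each pair order-independent: (a, b) == (b, a)
        for pair in combinations(sorted(concepts), 2):
            pair_counts[pair] += 1
    return {p: n for p, n in pair_counts.items() if n >= min_links}

links = concept_links(postings, min_links=2)
# Only the algorithms/big data pair co-occurs in 2+ documents.
```

Raising `min_links` is exactly what sliding the bar to the right does: the map keeps only the heaviest lines.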

Question 3: Closing this window and getting back to your Interactive Workbench, notice that in the upper-left Category window there is, by default, no (or little) categorization of the concepts extracted from the text documents. You can let SPSS Modeler automatically categorize these concepts into higher-level categories by using the "Build" option at the top of the screen. Do this now.

Once the categories are built, find "big data" under the various headings. Since this is the main focus of our analysis, we want to promote this value to its own category. Right-click on it, select "Move to Category" and then "Create New Category", name this new category Big Data, and hit OK.

Find your new category at the bottom of your list and then click on the “Score” and then the “Display” arrow.

This will generate a category bar, a category web, and a category web table for your new "Big Data" category. These show you which concepts are linked to which, and how often those linkages occur in your corpus.

What a mess. You can clean it up by deleting some of the categories that aren't linked as often: click the "show slider" tool and slide the bar to the right from 1 to 2 or 3, etc., until you get the top few. This eliminates all but the highest-ranking categories. After dragging the minimum up to 2, mine looked like this after a little cleanup. Note that you can drag the categories around to tidy your visualization if you need to. The lower-right corner of the Interactive Workbench will also show a list of all of the documents that contain the phrase "Big Data", if you want to double-check its logic.

Create a screenshot of this document viewer window and paste it here:

Close this window; there is no need to save it, since you can re-create it from your stream later (assuming that you save your stream to your desktop so that you can find it easily!).

Customer Service Analysis

Goal: In this exercise, you will use text data summarizing the Trip Advisor feedback given by customers who recently visited the Bellagio hotel in Las Vegas, in order to identify the common problems encountered by this hotel and the overall sentiment of its customers.

Data: Use the dataset titled "Trip Advisor Dataset.xlsx" with this exercise and upload it onto your VLab desktop. You can either use the upload button or drag/drop (drag/drop worked on my PC this morning!).

Steps:

A. Before you continue, make sure that your Excel file IS NOT OPEN!!! Create a new stream and add a source node of type Excel from the Sources tab. Edit the node to point to the dataset.

Question 4: Click on the preview button and attach a screenshot of the first 10 observations.

B. Next, add a "Text Mining" node, connect it to your Excel source node, edit it, and on the Fields tab, in the Text field, select "Feedback"; then run it with the default options, as shown below.

Question 5: Run your Text Mining node and then follow the same process that you used with the job postings above. Take a look at the concept results in the Interactive Workbench. Click on the Global header to sort by count. Which concept (i.e., word) is most frequently used in the text documents, and how many times is it mentioned in how many different documents?

What word? room

How Many Times Was It Mentioned Overall? 148

How Many Documents? 97

Question 6: Click on the concept “pool” and generate a map of this concept. By looking at this map, what are some of the commonly linked phrases? Attach a screenshot of your map.

Question 7: Now, go back to your stream and edit the Feedback "Text Mining" node's configuration. Go to the Model tab and change the model to use a "Text Analysis Package" by clicking on the radio button in front of "Text Analysis package". If it is grayed out/unselectable, delete the Text Mining node from your stream and re-add it.

Click on the Load button, then find and select the linguistics library called "Customer Satisfaction.tap" with "Mixed Opinions" selected, as shown below. This change enables the software to use a pre-determined package for analysis (e.g., a customer satisfaction ontology).

Run this stream now by clicking on the Load button on this screen and then the “Run” button on the text mining screen. Look at the Interactive Workbench results. You will see that there are pre-defined types such as “Negative” and “Positive” for the different concepts.

Question 8: How many people's comments were negative, and how many were positive? Change the Concept view to "Type". Look at the frequencies of the "Negative" and "Positive" types. Which attitude is more prevalent towards this company? Sort by Global and then attach a screenshot of your negative and positive types as shown below. Yours will look different, since yours should be sorted!
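Conceptually, what the Text Analysis Package does here is match each comment against pre-built lists of opinion words and tally the Positive and Negative types. A toy sketch of that idea, with a tiny hand-made lexicon (the word lists and comments below are invented; the real .tap package is a far richer linguistic resource):

```python
from collections import Counter

# Tiny hypothetical opinion lexicon standing in for the package's
# Positive and Negative type definitions.
POSITIVE = {"great", "friendly", "clean"}
NEGATIVE = {"slow", "dirty", "rude"}

# Hypothetical hotel feedback comments.
feedback = [
    "great pool and friendly staff",
    "check-in was slow and rude",
    "clean room",
    "slow service",
]

def tally_sentiment(comments):
    """Count the comments containing at least one positive word and the
    comments containing at least one negative word (a comment with both
    kinds of words counts toward both types)."""
    counts = Counter()
    for text in comments:
        words = set(text.split())
        if words & POSITIVE:
            counts["Positive"] += 1
        if words & NEGATIVE:
            counts["Negative"] += 1
    return counts

sentiment = tally_sentiment(feedback)
```

Comparing the two counts is the same comparison you make in the Workbench when you sort the Positive and Negative types by their Global frequencies.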

Question 9: Now, re-display the concepts and find the "Check-in" concept, as shown below. Clicking on the "Display" button shows you all of the texts (documents) that were found in the initial analysis. Take a look at these to see if they make sense and actually contain the "check-in" phrase. Look at some of the surrounding comments to see what they are saying about the check-in experience.

Create a concept map of this "check-in" concept and try to identify common issues where customers are not satisfied with the company. Mine looked like this:

Attach a screenshot of your concept map, and then take a moment to look over the texts for these check-in comments (click on the "Display" button to see the texts corresponding to the category) to see what people are saying about this important process in any business. Copy/paste a typical comment here:

Question 10: Now that you have some skills built up, go online and either find an RSS feed with at least 40 responses, or copy/paste comments from any website that you want to perform a sentiment analysis on. Attach screenshots similar to what you just produced in step 9, and provide a BRIEF (1 paragraph is sufficient) summary of your findings. I usually find an RSS feed by "Googling" RSS and then the subject that I am looking for.

You can usually get the URL by finding the RSS icon, right-clicking on it, and selecting "Copy Link Address". Then paste this link into your Web Feed node and test it out by previewing it. Warning: not all RSS feeds work, so you will need to test them all by previewing the results from within your Web Feed node, as you did earlier in the exercise (Web Feed dumping results to a Table output node). For instance, I may find a feed that returns only 20 documents. In that case, I would have to find another related feed (or feeds) to supply the remaining 20 documents. Please keep in mind that you are looking for sites that have opinions about the same topic. Don't just go to ESPN and grab forty ESPN articles, as they will most likely be about 40 different topics and your map won't have any linkages.

IF YOU HAVE PROBLEMS FINDING AN RSS FEED: Some students have trouble with Question 10 if they cannot find a good RSS feed (or feeds) containing 40 records to summarize. If you experience this and want to do it a way that you may be more comfortable with, do the following:

Find a site that has customer feedback on a product, service, movie, band, sports team, etc. This could be any site that lets users post feedback. For instance, if you want to do sentiment analysis on a movie, you can go to Rotten Tomatoes, find a movie, and locate the critics' feedback. One comment at a time, copy and paste the critics' feedback into the Excel file provided, titled A4 Sentiment Analysis Template.xlsx. Once you have 40 comments pasted into column A of the worksheet, do the sentiment analysis EXACTLY as you did in the step above for the Bellagio hotel customer feedback.

LOG OUT CORRECTLY from the machine that you are on by clicking on the Windows icon in the lower right corner of your screen, selecting your name, and then logging off.

