M.Sc. Thesis- The Design and Implementation of a Web-Based News Recommender System Using Case Based Reasoning PDF

Title M.Sc. Thesis- The Design and Implementation of a Web-Based News Recommender System Using Case Based Reasoning
Author A. Michael (Ph.D)
Pages 11
File Size 310.7 KB
File Type PDF


Description

THE DESIGN AND IMPLEMENTATION OF A WEB BASED NEWS RECOMMENDER SYSTEM USING CASE-BASED REASONING.

By

Michael O. Awoleye

DECLARATION I confirm that the work contained in this MSc Thesis report has been composed solely by myself and has not been accepted in any previous application for a degree. All sources of information have been specifically acknowledged and all verbatim extracts are distinguished by quotation marks.

Signed……………………………

Date………………………

ACKNOWLEDGEMENTS

I first praise and thank the Most High God, who is making my end better than my beginning as He has promised; He is the ever-faithful God. I also give my profound appreciation to my able supervisor, Dr Nirmallie Wiratunga, whose academic excellence I have admired and from whose wealth of experience I have wished to draw. Your assistance and guidance in putting this report together are well appreciated. I must also thank Messrs Sadiq Sanni and Ibrahim Adeyanju for their assistance and advice. Last but not least, I thank my family and relatives: first my wife Stella and my children Debby and David, for their forbearance and prayers since I left Nigeria for this programme; and also my mother, Mrs C.A. Awoleye, and my siblings, Mrs T.A. Okunola (late), Mrs A. Akinade, Mrs O. Eluyemi, Mrs A. Olabimtan, Mrs O. Ajileye and Mr Kola Awoleye. I thank you all.

ABSTRACT

This is the concluding volume (Vol. 2) of the M.Sc. Thesis. The main goal of this work is to design and implement a web-based news recommender system using Case-Based Reasoning (CBR). This volume covers the system design, implementation and evaluation of that system. The aims and objectives have been reviewed, and priority has been given to the processes involved in text indexing and in the appropriate mapping of a user query to the case-base, which is the basis of the recommendation. The application was implemented with the jCOLIBRI framework developed by Juan A. Recio-García. The case-base was populated using a crawler, whose output was then parsed into the required format and design based on selected, related news attributes. The application employs Lucene indexing and cosine similarity for the similarity computation, and k-nearest neighbour (KNN) for the retrieval of k cases. The performance of the application was tested using a small sample of user ratings to measure the efficiency of the recommender. The results show that the application (WebNCBR) predicts accurately when compared with the users' perspective. Accuracy stood at 93%; the remaining differences are inconsistencies attributed to users' misconceptions, which cause a slight deflection in the sinusoidal pattern observed in the comparison graph.

LIST OF FIGURES

Figure 2.1: Important steps in document indexing: tokenization, filtration and stemming
Figure 2.2: Documents dispersed in vector space
Figure 2.3: Venn diagram representation of term-document relationship
Figure 2.4: Theta as an angle between two coordinates
Figure 3.1: Tree Diagram of Article Parameters
Figure 3.2: Crawling Loop
Figure 3.3: Recommender Architecture
Figure 3.4: Layout Design of the query page
Figure 3.5: Layout Design of the result page
Figure 3.2: Document Root Structure
Figure 4.1: The description of the available tiers
Figure 4.2: Interaction between a Web Client and a Web Application that uses Servlet
Figure 5.1: Tree representation of similarity relationship
Figure 5.2: Comparison of Human Judgment of Similarity with WebNCBR
Figure 5.3: Query: “late penalty stuns Manchester City; Liverpool breeze past West Brom”
Figure 5.4: Chelsea win again; Wigan shock Spurs
Figure 5.5: Valencia kick-off La Liga with victor
Figure 5.6: Big guns await Champions League draw

LIST OF TABLES

Table 2.1: Term-document matrix
Table 5.1: The structure of the comparisons
Table 5.2: Sample of the headline matches
Table 5.3: Similarity Comparison of WebNCBR with Human Judgment
Table 5.4: Machine output of news headlines matches
Table 5.5: Query: recession is over economy now booming
Table 5.6: Query: Intel warns sales will fall short of forecasts
Table 5.7: Query: can marriage end in divorce

Table of Contents

CHAPTER ONE: INTRODUCTION
1.1 Research Background
1.2 Motivation and Problem Specification
1.3 Aims & Objectives

CHAPTER TWO: THEORETICAL BACKGROUND
2.1 Document Indexing
2.1.1 Representation
2.2 Indexing Using Lucene
2.2.1 Local Weights
2.2.2 Global Weights
2.3 Documents in Vector Space
2.4 Venn Diagram Representation of Documents
2.6 Cosine Similarity

CHAPTER THREE: SYSTEM ANALYSIS & DESIGN
3.1 Representation of News Articles
3.1.1 Essential attributes
3.2 Web Crawling
3.1.1 Web Crawler Architectures
3.3 Recommender System Architecture
3.3.1 The Front-End
3.3.2 The Back-End
3.4 File Organisation
3.5 Class Design
3.5.1 News.Java
3.5.2 LuceneTextSimilarity.java
3.5.2 LuceneIndexCreator.java
3.5.4 NewsDescription.java
3.5.5 CaseComponent.java

CHAPTER FOUR: IMPLEMENTATION
4.1 Tools and Technologies
4.1.1 The JAVA EE 6.0
4.1.2 Application Memory Enhancement
4.1.3 The Distributed Multi-tiered Capability
4.1.4 Security Feature
4.1.5 Java EE Clients
4.1.6 Web Clients
4.1.7 Application Clients
4.1.8 Web Integration

CHAPTER FIVE: EXPERIMENTATION AND PERFORMANCE ANALYSIS
5.1 The Methodology
5.2 Human Judgement of Similarity
5.3 The Automation (WebNCBR)
5.4 Relationship Tree
5.5 Pattern of Comparison of Similarity Judgment
5.6 Recommendation with WebNCBR

CHAPTER SIX: CONCLUSION AND RECOMMENDATION

REFERENCES
APPENDIX I: Glossary of Terms
APPENDIX II: Program Code Listing
APPENDIX III: Documentation
APPENDIX IV: Class Diagram

CHAPTER ONE INTRODUCTION

This is the second volume of the M.Sc. project report on the design and implementation of a web-based news recommender system using case-based reasoning, christened WebNCBR. While the first volume covers the project aim, objectives, motivation, background information on CBR and a survey of recommender systems with related theories and principles, this second volume focuses on the design, implementation and evaluation. The selection of implementation tools is also discussed and justified, and a summary and conclusion on the efficacy of the application (a web-based news recommender system) are reported.

1.1 Research Background

Choosing from numerous alternatives has long been a great challenge for mankind, all the more so since the advent of the Internet, which has resulted in information overload (Montaner, López and De La Rosa 2003). Internet users constantly need help in handling this situation (Hanani, Shapira and Shoval 2001): too many options are available to choose from in limited time. Recommender systems come into play in this context as tools capable of producing individualized recommendations in different domains (Manouselis and Costopoulou 2008). Their application has been explored extensively by researchers in different domains. In the tourism industry, for example, they have been found to enable tourists to access reliable and accurate information and to make reservations and plans faster and at their convenience (Setten, Pokraev and Koolwaaij 2004; Kabassi 2010). They are also widely used for book recommendation, for example at Amazon.com, the most popular book recommender (Montaner, López and De La Rosa 2003). For music album recommendation in the music industry, CDnow is cited as a good example by Sarwar et al. (2000).

It has also been observed that recommender systems play a prominent role in the e-commerce domain (Sarwar et al. 2000; Burke 2002), and the benefits have been reported to be unprecedented. For example, Michael Strickman, the CEO of choicestream.com, reported an increase of about 80% in sales of music downloads since the introduction of a recommender system in their business (Leavitt 2006). Similarly, Netflix's chief product officer, Neil Hunt, projected that their customer base would grow to 20 million by 2012, up from 2.6 million in 2004. This prediction hinged on the steady growth of the customer base in the preceding years, which stood at 4.2 million in 2005 and 5.9 million in 2006 (Leavitt 2006). These are only a few examples; the list is by no means exhaustive.

This work would not be complete without reference to applications in the media industry, on which it is focused. Numerous personalized news services are emerging, as investigated by Billsus and Pazzani (2000), especially with the new trend of the web towards personalized information access. The question now is whether there is a dearth in the midst of plenty: do people get satisfaction from these services, despite the acclaimed proliferation of recommender systems? If not, why not? And what is the way forward? These are some of the research questions this work seeks to investigate and attempts to resolve.

1.2 Motivation and Problem Specification

The idea of personalized and intelligent agents, search engines and recommender systems has been widely accepted in the literature, especially for users who require assistance in overcoming information overload when carrying out information retrieval (IR) activities (Montaner, López and De La Rosa 2003). A number of web portals undoubtedly offer personalized access to daily news stories from a wide range of categories; Yahoo, Lycos and Excite, amongst others, are worthy of mention. Nevertheless, some pertinent issues remain unresolved, and some of these are highlighted below. Lee and Park (2007), in their work on MONERS, a news recommender for the mobile web, present a system that gathers news articles from various news content providers and uses them as input for recommendations to its users, based on their profiles. The present work has explored a number of news recommender systems with a view to employing the text-matching techniques on queries (Billsus and Pazzani 2000) that are best suited to a news recommender.

To achieve accuracy in the predictions of a recommender system, a number of research questions were considered, some of which are:
(i) the efficacy of capturing appropriate keywords from a user's query;
(ii) the quest to produce high-quality recommendations that will be of interest to users;
(iii) the question of adaptation to a new dataset.

1.3 Aims & Objectives

The main aim of this research is to design and implement a web-based news recommendation system using CBR. The lower-level objectives are to:
i. conduct an extensive literature survey on news recommender systems vis-à-vis case-based reasoning;
ii. explore suitable features to represent news articles;
iii. measure the impact of similarity between two vectors of n dimensions with a view to improving the recommendation of search results;
iv. implement the processes involved in step (iii) above for web accessibility;
v. carry out a performance evaluation of the CBR news recommender.
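As a rough illustration of objective (iii), the sketch below compares a query with a few headlines using cosine similarity over raw term-frequency vectors and returns the k most similar cases. It is a minimal sketch under stated assumptions, not the WebNCBR implementation: it deliberately avoids the Lucene and jCOLIBRI APIs, applies no stemming or term weighting, and the class and method names (CosineKnnSketch, termFrequencies, cosine, retrieve) are hypothetical. The sample headlines are taken from the figure and table captions listed earlier in this document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: every name here is hypothetical, not taken from WebNCBR.
final class CosineKnnSketch {

    // Lower-cases the text, splits on non-alphanumeric characters and counts each term.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a sparse term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Returns (at most) the k cases most similar to the query, most similar first.
    static List<String> retrieve(String query, List<String> caseBase, int k) {
        Map<String, Integer> q = termFrequencies(query);
        List<String> ranked = new ArrayList<>(caseBase);
        // Negate the score so that an ascending sort puts the best match first.
        ranked.sort(Comparator.comparingDouble((String c) -> -cosine(q, termFrequencies(c))));
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    public static void main(String[] args) {
        List<String> caseBase = Arrays.asList(
            "Big guns await Champions League draw",
            "Chelsea win again; Wigan shock Spurs",
            "Intel warns sales will fall short of forecasts");
        System.out.println(retrieve("Chelsea win again", caseBase, 2));
    }
}
```

Raw term counts keep the sketch short; a weighting scheme such as the local and global weights listed under Chapter Two would change the vector entries but not the cosine or k-nearest-neighbour steps.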

These objectives have been carefully chosen to make this work immediately relevant, given its demand-driven nature. Recommender systems today know no boundaries; that is, they are not bound to any one discipline or sector. Their application depends on the thinking and experience of the designer and the developer.

This work is structured as follows. Chapter one gives an insight into the research background, a brief statement of the motivation for the work and the highlights of the objectives. It concludes with a paragraph on the prospects of the thesis and the application developed.

Chapter two gives an account of the related theoretical background, such as document indexing, including indexing using Lucene. It closes with the section on cosine similarity, which is central to this work.

Chapter three discusses system analysis and design as they relate to the news recommender system, under the following sections: (i) representation of news articles, which discusses the chosen attributes, their data types and the justification for using them to represent a news article; (ii) web crawling, which describes the process of harvesting and formatting data for the application's use; (iii) recommender system architecture, which discusses the components of the application and the flow of its processes; and lastly (iv) the class modules, which describes the classes and interfaces that make up the implementation of the system.

Chapter four is dedicated to the implementation and discusses the tools and technologies used in this work. It specifically describes Java EE and the available tiers.

Chapter five handles the experiments and performance analysis. It discusses the methodology, the human judgement and the automation (WebNCBR), and compares WebNCBR with human judgement extensively.

Chapter six is a brief summary of the whole work. All other necessary information is listed in the appendices: appendix I for the glossary of terms, appendix II for the code listing, appendix III for the documentation and appendix IV for the class diagram.

