Alonso 2018 Article Data Mining Algorithms And Techniq PDF

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/326423098

Data Mining Algorithms and Techniques in Mental Health: A Systematic Review

Article in Journal of Medical Systems · July 2018

DOI: 10.1007/s10916-018-1018-2


Journal of Medical Systems (2018) 42: 161 https://doi.org/10.1007/s10916-018-1018-2

SYSTEMS-LEVEL QUALITY IMPROVEMENT

Data Mining Algorithms and Techniques in Mental Health: A Systematic Review

Susel Góngora Alonso 1 & Isabel de la Torre-Díez 1 & Sofiane Hamrioui 2 & Miguel López-Coronado 1 & Diego Calvo Barreno 1 & Lola Morón Nozaleda 3 & Manuel Franco 4

Received: 17 May 2018 / Accepted: 16 July 2018 / Published online: 21 July 2018

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract

Data Mining in medicine is an emerging field of great importance for providing a prognosis and a deeper understanding of disease classification, specifically in Mental Health. The main objective of this paper is to review the existing research on Data Mining techniques and algorithms in Mental Health, specifically for the most prevalent diseases: dementia, Alzheimer’s, schizophrenia and depression. The academic databases used to perform the searches were Google Scholar, IEEE Xplore, PubMed, Science Direct, Scopus and Web of Science, covering publications from the last 10 years, from 2008 to the present. Several search criteria were established, such as ‘techniques’ AND ‘Data Mining’ AND ‘Mental Health’, and ‘algorithms’ AND ‘Data Mining’ AND ‘dementia’ AND ‘schizophrenia’ AND ‘depression’, and the papers of greatest interest were selected. A total of 211 articles were found related to techniques and algorithms of Data Mining applied to the main Mental Health diseases. Of these, 72 articles were identified as relevant works, of which 32% concern Alzheimer’s, 22% dementia, 24% depression, 14% schizophrenia and 8% bipolar disorders. Many of the papers address the prediction of risk factors in these diseases. From the articles analyzed, it can be said that Data Mining techniques applied to diseases such as dementia, schizophrenia and depression can be of great help in clinical decision making and diagnosis prediction, and can improve the patient’s quality of life.

Keywords Algorithms · Data mining · Mental health · Techniques

This article is part of the Topical Collection on Systems-Level Quality Improvement

* Isabel de la Torre-Díez

1 Department of Signal Theory and Communications, and Telematics Engineering, University of Valladolid, Paseo de Belén, 15, 47011 Valladolid, Spain

2 Bretagne Loire and Nantes Universities, UMR 6164, IETR Polytech Nantes, Nantes, France

3 Nozaleda and Lafora Mental Health Clinic, C/ José Ortega Y Gasset, 44, 28006 Madrid, Spain

4 Psychiatry Service, Hospital Zamora, Hernán Cortés, Zamora, Spain

Introduction

Mental Health is measured by a high grade of impairment, such as affective disorder that results in depression and different anxiety disorders. Worldwide, 25% of people suffer from Mental Health problems, in both developed and developing countries. The data is growing into terabytes and petabytes, 80% of which is unstructured, so it is difficult to process with database management tools and other traditional techniques. The global cost of Mental Health treatment is about $2.3 trillion. By improving the quality of treatments we can reduce costs significantly, and this quality can be improved by introducing Data Mining tools and techniques in Mental Health [1].

In the last two decades there has been a steady increase in the use of Data Mining techniques in various disciplines [2]. Data Mining is a path to knowledge discovery and a significant process for discovering patterns by exploring and modeling large amounts of data. It incorporates automatic learning algorithms to learn, extract and identify useful information and subsequent knowledge from large databases [3]. In the last 10 years Data Mining techniques have been used in medical research, mainly in neuroscience and biomedicine. More recently, psychiatry has begun to use these techniques to gain a better understanding of the genetic composition of mental disease [4].

According to the World Health Organization (WHO) [5], there is a variety of mental disorders, the main ones being dementia, schizophrenia, depression, bipolar disorders and Alzheimer’s as a dementia-derived disease. Currently many people suffer from neurodegenerative disorders related to the brain [6]. These disorders lead to various diseases. Dementia in this case is a general term for a decrease in mental capacity severe enough to interfere with daily life [7]. Alzheimer’s Disease (AD) is the most common type of dementia and accounts for 60–80% of dementia cases [8]. Diagnosing the disease at an early stage is a crucial task; therefore, it is of medical interest to develop predictive tools to evaluate this risk [9]. The objective of predictive data mining in this area is to build models from high-dimensional medical information and use them to predict diagnostic results on unseen medical data in order to support clinical decision making [10].
Approaches in predictive data mining can be applied to the construction of decision models for medical procedures such as prognosis, diagnosis and treatment planning, which can be embedded into clinical systems as systematic support components [11].

In this paper we pose the research question: Is there work related to Data Mining techniques and algorithms applied to Mental Health with the purpose of obtaining predictions of diseases in this field? Therefore, the aim of our paper is to present a state-of-the-art review of Data Mining techniques and algorithms for the prevalent Mental Health diseases. This exhaustive study is the main contribution of our paper and allows us to direct future research towards the creation of new prediction algorithms. This paper continues a first review [12] focused on analyzing sources and techniques of Big Data in the health sector and identifying which of these techniques are the most used in the prediction of chronic diseases. Other reviews base their study on the review, analysis and evaluation of Machine Learning techniques for the early detection of Alzheimer’s disease [13], as well as on the scope and limits of Data Mining techniques for predictive analysis in Mental Health [14].

The methodology used in this review is described below, followed by the results obtained, their discussion and the final conclusions of this research work.

Methodology

In this paper we have carried out a review of the published works related to techniques and algorithms of Data Mining in Mental Health until March 2018. To carry out the review, the following scientific databases were used: Google Scholar, IEEE Xplore, PubMed, Science Direct, Scopus and Web of Science. These databases cover most of the scientific information in multidisciplinary fields, engineering and medicine, and they allow articles to be found and accessed in scientific and academic journals, or in repositories, archives and other collections of scientific texts.

The key terms introduced in the search engines of these databases are: “Techniques” AND “Algorithms” AND “Data Mining” AND (“dementia” OR “depression” OR “Alzheimer” OR “schizophrenia” OR “mental health”), both in Spanish and English. Those terms are searched in “Abstract/Title/Keywords”, from 2008 to the present. The search criteria shown in Table 1 are those provided specifically by the search engine of each database.

The selection process was carried out by reading the titles and abstracts of the results obtained; the papers were classified by reading their abstracts, as well as the full article when necessary. The selection criteria used to classify the papers were the following:

1) Studies of Data Mining techniques applied to the main Mental Health diseases.
2) Studies of Data Mining algorithms applied to the main Mental Health diseases.
3) Studies aimed at other types of disease not related to Mental Health are excluded.

All articles repeated in more than one database are removed. Fig. 1 shows the flow diagram used in the review. Of the 211 publications found, 89 were duplicates or had a title irrelevant to this research. The abstracts of the remaining 122 studies were read and analyzed to determine which were of interest, resulting in 72 documents that made relevant contributions.

The following section presents the most relevant works found and analyzes the main techniques and algorithms identified in the review.
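The search string and the screening counts described in the Methodology can be checked with a few lines of Python. This is only an illustrative sketch: the helper `build_query` is a hypothetical name introduced here, and the real search engines each have their own query syntax, which the review does not spell out.

```python
# Hypothetical helper (not from the paper): composes the boolean search string
# given in the Methodology from required and alternative terms.
def build_query(required, alternatives):
    """Join required terms with AND and alternative terms with OR."""
    required_part = " AND ".join(f'"{term}"' for term in required)
    alternatives_part = " OR ".join(f'"{term}"' for term in alternatives)
    return f"{required_part} AND ({alternatives_part})"


query = build_query(
    ["Techniques", "Algorithms", "Data Mining"],
    ["dementia", "depression", "Alzheimer", "schizophrenia", "mental health"],
)

# Screening counts as reported in the text.
found = 211                 # publications returned by the searches
excluded_first_pass = 89    # duplicates or irrelevant titles
screened = found - excluded_first_pass
included = 72               # documents kept after reading abstracts
excluded_after_reading = screened - included

print(query)
print(screened, excluded_after_reading)
```

The subtraction reproduces the 122 studies whose abstracts were read; the difference between that figure and the 72 included documents is derived here by arithmetic and is not a number stated in the review.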

Table 1 Search criteria in the different scientific databases

Keywords / Databases | Google Scholar | IEEE Xplore | PubMed | Science Direct | Scopus | Web of Science
Techniques OR Algorithms AND Data Mining AND Mental Health | “abstract, title, keywords” | “abstract” | “title, abstract” | “abstract, title, keywords” | “abstract, title, keywords” | “title, abstract”
Techniques OR Algorithms AND Data Mining AND Dementia | “abstract, title, keywords” | “abstract” | “title, abstract” | “abstract, title, keywords” | “abstract, title, keywords” | “title, abstract”
Techniques OR Algorithms AND Data Mining AND Depression | “abstract, title, keywords” | “abstract” | “title, abstract” | “abstract, title, keywords” | “abstract, title, keywords” | “title, abstract”
Techniques OR Algorithms AND Data Mining AND Alzheimer | “abstract, title, keywords” | “abstract” | “title, abstract” | “abstract, title, keywords” | “abstract, title, keywords” | “title, abstract”
Techniques OR Algorithms AND Data Mining AND Schizophrenia | “abstract, title, keywords” | “abstract” | “title, abstract” | “abstract, title, keywords” | “abstract, title, keywords” | “title, abstract”

Fig. 1 Flow diagram used in the literature review

Main techniques and algorithms of data mining used in the review

Data Mining techniques have recently become a predominant field of research, with wide applications in medical healthcare, financial services, telecommunications, natural sciences, etc. Data Mining is a process to discover useful models in data, with the aim of interpreting existing behaviors or predicting future results [15]. Data Mining algorithms are classified into two categories: descriptive (or unsupervised learning) and predictive (or supervised learning). Descriptive data mining clusters data by measuring the similarity between objects (or records) and discovers unknown patterns or relationships in data, while predictive learning infers prediction rules (classification / prediction models) from training data and applies the rules to unpredicted / unclassified data [16].

The algorithms used in the prognosis and diagnosis of Mental Health diseases are supervised learning algorithms that include Artificial Neural Networks (ANNs), Decision Tree (DT), genetic algorithms and linear discriminant analysis. Other techniques generally in use are Support Vector Machine (SVM), Association Rules (ARs) mining and Ensemble methods [12].

ANNs are computational models inspired by networks of the central nervous system, capable of machine learning and pattern recognition. In general, they are presented as systems of interconnected “neurons” that can compute values from inputs by feeding information through their network [9].

Convolutional Neural Networks (CNNs/ConvNets) are inspired by the human visual system and are similar to classic neural networks. This architecture has been specifically designed on the explicit assumption that the raw data consists of two-dimensional images, which allows certain properties to be encoded and also reduces the number of hyperparameters. The CNN topology uses spatial relationships to reduce the number of parameters that must be learned and, therefore, improves upon general feed-forward back propagation training [17].

DT (for example, C4.5) is used to model sequential decision problems. Decision trees are composed of nodes and edges: internal nodes represent a predicate over the objects in the data set, while each edge represents a division rule over an attribute (typically, a binary division rule). Indeed, every node has two (or more) outgoing branches: one is associated with objects whose attributes satisfy the predicate, the other with those that do not. Generalized DT classifiers, such as C4.5, rely on an entropy rule or information gain rule, which finds at each node a predicate that optimizes an entropy function of the defined partition [18].

There are classification methods such as those defined below:

SVM builds a separating hyperplane that maximizes the minimum distance between data of different classes in a new space obtained by applying a kernel function to the original data. SVMs are particularly suitable for binary classification tasks; in this case, the input data are two sets of n-dimensional vectors [18].

Rule-based classifiers assign a given class to each object according to a specific function r: condition → c (called a classification rule), such that the rule r covers an object x if the attributes of x satisfy the condition of r. Therefore, in this type of classification, the classifier uses logical propositional formulas in disjunctive or conjunctive normal form (“if-then rules”) to classify the given samples [18].

Among the different classifiers, Ensemble and Random Forest classifiers have attracted the attention of researchers due to their handling of missing values and noisy data, and their classification and selection of features to form tree nodes [19]. Random Forest is a variant of ensemble classifier consisting of a collection of tree-structured classifiers {h(x, Θk), k = 1, 2, ...}, where the {Θk} are independent identically distributed random vectors. Each tree casts a unit vote for the most popular class at input x. It is a popular supervised classification and regression method that uses the concept of random feature selection for building decision trees [19].
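The entropy-based choice of a node predicate mentioned for decision trees can be illustrated with a small calculation. The sketch below computes plain information gain for a boolean predicate; it is not the full C4.5 procedure (which normalizes the gain into a gain ratio and handles pruning), and the records and labels are made-up toy values, not data from any reviewed study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, predicate):
    """Entropy reduction obtained by splitting a node on a boolean predicate."""
    yes = [lab for row, lab in zip(rows, labels) if predicate(row)]
    no = [lab for row, lab in zip(rows, labels) if not predicate(row)]
    if not yes or not no:
        return 0.0  # the predicate does not actually split the node
    weighted = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
    return entropy(labels) - weighted


# Made-up toy records of the form (age, smoker); purely illustrative.
rows = [(72, True), (65, True), (80, False), (58, False)]
labels = ["AD", "healthy", "AD", "healthy"]

gain_age = information_gain(rows, labels, lambda r: r[0] > 70)
gain_smoker = information_gain(rows, labels, lambda r: r[1])
print(gain_age, gain_smoker)
```

On this toy set the age predicate separates the two classes perfectly, while the smoker predicate leaves both branches evenly mixed, so a tree builder using this criterion would pick the age predicate for the node.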
Naïve Bayesian classifier is a selective classifier which calculates a set of probabilities by counting the frequency and combinations of values in a given data set. It assumes that all the variables that contribute towards classification are mutually independent. The Naïve Bayesian classifier is based on Bayes’ theorem and the theorem of total probability [7].

Fig. 2 Relevant papers statistics found in the last 10 years
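The frequency counting behind the Naïve Bayesian classifier can be sketched as follows. This is a minimal illustration for categorical features, assuming every feature takes one of `n_values` possible values; the add-one smoothing is an assumption added here to avoid zero probabilities and is not mentioned in the text, and the records are invented toy data.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def predict(row, class_counts, value_counts, n_values=2):
    """Return the class with the highest unnormalised posterior score."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Independence assumption: multiply per-feature likelihoods.
            # Add-one smoothing is an assumption of this sketch.
            score *= (value_counts[(i, label)][value] + 1) / (count + n_values)
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up toy records of the form (smoker, diabetic); purely illustrative.
rows = [("yes", "yes"), ("yes", "no"), ("no", "no"), ("no", "no")]
labels = ["AD", "AD", "healthy", "healthy"]

model = train(rows, labels)
print(predict(("yes", "yes"), *model))
print(predict(("no", "no"), *model))
```

The product of per-feature likelihoods is where the mutual independence assumption enters; the normalising constant from Bayes' theorem is omitted because it is the same for every class.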


JRip (RIPPER) is one of the basic and most popular algorithms. Classes are examined in increasing size and an initial set of rules for the class is generated using incremental reduced error pruning [7]. It proceeds by treating all the examples of a particular judgment in the training data as a class and finding a set of rules that cover all members of that class. It then proceeds to the next class and does the same, repeating this until all classes have been covered.

K-means is a basic clustering technique in biocomputing. The goal is to find K patterns by calculating the distance between each sample value [20].

Apriori is an algorithm that determines the associations between data by checking frequencies. It is one of the main algorithms based on association rules. The main purpose of using the Apriori algorithm in Data Mining is to find patterns in a data set. It calculates the conditional probability of each case and returns the most probable data set, which appears most frequently in the input data [21].

The Data Mining approach can significantly help research into mental illness by finding patterns and knowledge embedded in the data. It requires exploration and analysis of large quantities of data for the purpose of better understanding and deriving knowledge regarding the problem at hand [22]. Below we show the results obtained.
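The frequency checking that the text attributes to Apriori can be sketched as a level-wise search for frequent itemsets. This is a simplified illustration under stated assumptions: candidates are generated by joining frequent sets of the previous level, the usual subset-pruning step is left out (every candidate is still verified by counting), and the transactions are invented symptom sets, not data from the reviewed studies.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: only frequent k-itemsets are extended to size k + 1."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    result = {}
    while level:
        for itemset in level:
            result[itemset] = support(itemset, transactions)
        # Candidates: unions of two frequent sets that are exactly one item larger.
        size = len(level[0]) + 1
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        level = [c for c in candidates if support(c, transactions) >= min_support]
    return result


# Made-up toy transactions; purely illustrative.
transactions = [
    frozenset({"insomnia", "fatigue", "low_mood"}),
    frozenset({"insomnia", "fatigue"}),
    frozenset({"insomnia", "low_mood"}),
    frozenset({"fatigue"}),
]

freq = frequent_itemsets(transactions, min_support=0.5)

# Confidence of the rule low_mood -> insomnia, an estimate of P(insomnia | low_mood).
confidence = freq[frozenset({"low_mood", "insomnia"})] / freq[frozenset({"low_mood"})]
print(len(freq), confidence)
```

The confidence of a rule is the conditional probability estimate that the text refers to: the support of the combined itemset divided by the support of the rule's antecedent.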

Results

In the literature review we found a large number of studies that base their research on the use of Data Mining techniques and algorithms applied to Mental Health diseases. Figure 2 shows the statistics of the relevant papers found in the last 10 years. From a total of 72 papers found, 35 belong to journals while 38 are conference papers.

Alzheimer

Alzheimer is a multifaceted disease in which the accumulated cerebral pathology produces a progressive cognitive deterioration that finally leads to dementia [23, 24]. It is characterized by

Table 2 Studies of the bibliographic review related to Data Mining techniques and algorithms applied to patients with Alzheimer’s

Techniques and Algorithms, in row order, each followed by the entries of the adjoining column:

1. Naïve Bayes
- Naïve Bayes can predict the conversion with a reasonably good AUC value after feature selection.

2. Neural Networks and Machine Learning: Random Forest tree, Multilayer Perceptron
- It was discovered that some specific genetic factors, diabetes, age and smoking were the strongest risk factors for AD.
- The classification model was validated with the test cases and achieved classification average accuracy of 99.25% with Random Forest tree and the Multilayer Perceptron.

3. SVM, Bayes statistics, voting feature intervals (VFI)
- Bayes and VFI yielded superior results compared to the SVM approach, showing a novel approach to identify regions of high discriminatory power for AD identification and conversion prediction to AD between MCI.

4. Association rules
- The proposed method yields up to 94.87% classification accuracy (sensitivity = 91.07%, specificity = 100%) overcoming the methods recently developed until that year for AD early diagnosis.

5. SVM
- They show that subjects with early stage AD can be distinguished with an accuracy of 79% of healthy subjects of the same age.

Authors, Year of publication and Study proposal, in row order:

1. Qu, Yuan, & Liu. [8], 2009: Predictive model to identify possible conversions from MCI to AD based on the ADNI database.

2. Joshi et al. [27], 2010: They propose a new model for the AD classification when considering the risk factors that most influence. Different models were developed for AD classification.

3. Plant et al. [28], 2010:

They develop a new data mining framework in combination with three different classifiers to derive a quan...