Research Review by Ammar K. M. Abasi

Date: 2020-02-26 15:00

Venue: VIVA Room, Level 7, School of Computer Sciences, Universiti Sains Malaysia

TARGET AUDIENCE: CS Postgraduate (by Research) Students 


Mr. Ammar Kamal Mousa Abasi will give a seminar. The details are as follows:


Main Supervisor: Prof. Dr. Ahamad Tajudin Khader

Date: 26/02/2020 (Wednesday)   Time: 3.00pm

Venue: VIVA Room, Level 7 - School of Computer Sciences

Abstract: The identification of scientific topics is of current interest for understanding historical and emerging ideas in scientific publications. The topics of scientific publications are short phrases that represent the content of scientific papers. In some contexts, these topics are formulated manually by humans, for example by researchers when they submit a manuscript to a journal, or by professionals when they update their online profiles. In the digital world, large numbers of scientific publications are added to the web every day, so selecting topics from these text documents manually is not feasible. Automatically identifying topics is therefore a good alternative to formulating them manually. In general, this process can be divided into two main tasks: (i) Text Document Clustering (TDC) and (ii) Topic Extraction (TE). This study aims to propose a suitable TE approach that provides a better overview of multi-scale scientific publications. To achieve this aim: (i) a new feature selection method for TDC, the binary multi-verse optimizer algorithm (BMVO), is proposed to eliminate irrelevant and redundant features and obtain a new subset of more informative features; (ii) three multi-verse optimizer (MVO) algorithms, namely (a) basic MVO, (b) modified MVO, and (c) hybrid MVO, are proposed to solve the TDC problem, each an incremental improvement on the preceding version; and (iii) a novel ensemble method for automatic TE from a collection of scientific publications, treated as text documents, is proposed to extract topics from the clustered documents. To evaluate the proposed TDC methods, six external measures (i.e., accuracy, precision, recall, F-measure, purity, and entropy) are used. Sixteen datasets, comprising six standard text datasets and ten scientific publication datasets, are used in the experiments.
The results produced by the proposed TDC algorithms are compared with well-regarded methods, including clustering methods and metaheuristic-based methods. Notably, the proposed methods outperform all comparative methods on all datasets used, according to almost all external measures. To evaluate the proposed ensemble TE method, three external measures (i.e., precision, recall, and F-measure) are used, again on the same ten scientific publication datasets. The results produced by the proposed ensemble TE method are compared with those of five statistical methods established in the literature. The proposed ensemble TE method outperforms all comparative methods on all external measures for almost all datasets. Because the ensemble combines the advantages of the previous methods, it obtains superior results.
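Two of the external clustering measures named in the abstract, purity and entropy, can be illustrated with a short sketch. It assumes the common textbook definitions: purity is the fraction of documents that belong to the majority class of their cluster, and entropy is the size-weighted average of each cluster's class entropy. The function names, the base-2 logarithm, and the weighting are assumptions for illustration and are not necessarily the exact formulas used in this study.

```python
import math
from collections import Counter


def _group_by_cluster(labels_true, labels_pred):
    """Collect the true class labels of the documents in each predicted cluster."""
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, []).append(true_label)
    return clusters


def purity(labels_true, labels_pred):
    """Fraction of documents assigned to the majority class of their cluster (1.0 is best)."""
    n = len(labels_true)
    clusters = _group_by_cluster(labels_true, labels_pred)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / n


def entropy(labels_true, labels_pred):
    """Size-weighted average of per-cluster class entropies, base 2 (0.0 is best)."""
    n = len(labels_true)
    total = 0.0
    for members in _group_by_cluster(labels_true, labels_pred).values():
        n_j = len(members)
        # Entropy of the class distribution inside this cluster.
        e_j = -sum((c / n_j) * math.log2(c / n_j) for c in Counter(members).values())
        total += (n_j / n) * e_j
    return total


if __name__ == "__main__":
    # Hypothetical toy data: six documents from two classes, grouped into two clusters.
    classes = ["a", "a", "a", "b", "b", "b"]
    clusters = [0, 0, 1, 1, 1, 1]
    print(purity(classes, clusters), entropy(classes, clusters))
```

Higher purity and lower entropy indicate clusters that are more homogeneous with respect to the true classes; a perfect clustering gives a purity of 1.0 and an entropy of 0.0.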

