My current focus areas include Information Extraction, Predictive Analytics, Big Data Analytics, Artificial Intelligence, and Machine Learning. Publications are listed by category in reverse chronological order.
Purpose: This study is a comprehensive review of the existing annotated corpora for event extraction, which are limited in number but essential for training and improving event extraction algorithms. In addition to this primary goal, it provides guidelines for preparing an annotated corpus and suggests suitable tools for the annotation task. Design/methodology/approach: This study employs an analytical approach to examine the available corpora that are suitable for event extraction tasks. It offers an in-depth analysis of existing event extraction corpora and provides systematic guidelines for researchers to develop accurate, high-quality corpora. This ensures the reliability of the created corpus and its suitability for training machine learning algorithms. Findings: Our exploration reveals a scarcity of annotated corpora for event extraction tasks. In particular, the English corpora are mainly focused on the biomedical and general domains. Despite this scarcity, several high-quality corpora are available and widely used as benchmark datasets. However, access to some of these corpora may be limited by closed-access policies, or they may have become inaccessible through broken links after maintenance was discontinued following their initial release. Therefore, this study documents the available corpora for event extraction tasks. Research limitations: Our study focuses only on well-known corpora available in English and Chinese. It places a strong emphasis on the English corpora because English is a global lingua franca and is more widely understood than other languages. Practical implications: This study provides knowledge that can serve as a guiding framework for preparing and accurately annotating events from text corpora. It provides comprehensive guidelines for researchers to improve the quality of corpus annotations, especially for event extraction tasks across various domains. Originality/value: This study comprehensively compiles information on the existing annotated corpora for event extraction tasks and provides preparation guidelines.
2023
JKSUCI
A novel bitwise arithmetic optimization algorithm for the rule base optimization of deep neuro-fuzzy system
Noureen Talpur, Said Jadid Abdulkadir, Emelia Akashah Patah Akhir, and 3 more authors
Journal of King Saud University - Computer and Information Sciences, 2023
The novel Deep Neuro-Fuzzy System (DNFS) has attracted significant attention from researchers due to the model’s adaptability and rule-based structure. The model has been successfully implemented in various real-world applications. However, despite its success, the DNFS experiences difficulties when applied to high-dimensional data. The rule base in the DNFS expands exponentially with the number of features, which affects the transparency of the model. Additionally, the typical gradient-descent (GD) technique used in the DNFS rule-base optimization frequently becomes trapped in local minima. This study aims to introduce a modern optimization approach to address these drawbacks. Therefore, the novel Bitwise Arithmetic Optimization Algorithm (BAOA) is proposed in this work. The BAOA method has been implemented as a feature selection approach to solve the large rule base problem caused by high-dimensional data. Moreover, the DNFS rule base optimization was carried out using the proposed BAOA algorithm to escape the local minima issue associated with the GD algorithm. The simulation results obtained on twelve benchmark datasets show that the BAOA was able to select the least number of features from high-dimensional data with an average accuracy of 95.53%. In terms of rule base optimization, the novel BAOA achieved better performance with average training and testing accuracies of 96.87% and 96.25%, respectively, on benchmark datasets compared to the Arithmetic Optimization Algorithm (AOA) (average training and testing accuracies of 95.66% and 94.54%) and GD-based optimization (average training and testing accuracies of 94.07% and 93.19%). Additionally, the Wilcoxon test revealed a significant difference between the performance of the proposed BAOA algorithm and that of the comparative methods. The findings indicate that the proposed BAOA approach is highly effective for high-dimensional real-world problems.
IEEE Access
Systematic Literature Review of Information Extraction From Textual Data: Recent Methods, Applications, Trends, and Challenges
Mohd Hafizul Afifi Abdullah, Norshakirah Aziz, Said Jadid Abdulkadir, and 2 more authors
Information extraction (IE) is a challenging task, particularly when dealing with highly heterogeneous data. State-of-the-art data mining technologies struggle to process information from textual data. Therefore, various IE techniques have been developed to enable the use of IE for textual data. However, these techniques differ from one another because each is designed for different data types and has different target information to be extracted. This study investigated and described the most contemporary methods for extracting information from textual data, emphasizing their benefits and shortcomings. To provide a holistic view of the domain, this comprehensive systematic literature review employed a systematic mapping process to summarize studies published in the last six years (from 2017 to 2022). It covers fundamental concepts, recent approaches, applications, and trends, in addition to challenges and future research prospects in this domain area. Based on an analysis of 161 selected studies, we found that the state-of-the-art models employ deep learning to extract information from textual data. Finally, this study aimed to guide novice and experienced researchers in future research and serve as a foundation for this research area.
IICE2023
Development of Analytical Thinking Using Flipped Classroom Approach for Big Data Analytics Courses
Norshakirah Aziz, Emelia Akashah Patah Akhir, Said Jadid Abdulkadir, and 2 more authors
In The IAFOR International Conference on Education – Hawaii 2023 Official Conference Proceedings, 2023
This research clarified the development of analytical thinking skills through 10 hands-on lab sessions conducted among undergraduates at Universiti Teknologi PETRONAS (UTP). This systematic review study examined 30 publications from 2017 to 2022 that were discovered through a comprehensive systematic mapping process for a more in-depth analysis. Previous studies indicate that most instructors and students believe the flipped classroom approach improved analytical thinking among undergraduates. Therefore, to determine the effectiveness of the flipped classroom approach for the development of analytical thinking, 134 UTP undergraduates enrolled in the Big Data Analytics (BDA) course were considered as the participants in this study. One complete module with detailed teaching and learning activities (TLA) was developed for an immersive learning experience, and students were provided with pre-class instruction. The performance of the flipped classroom approach was evaluated using immersive learning experiences and a student satisfaction survey. The results show that the flipped classroom approach is successful in developing analytical thinking skills among the participants of this study. The research has been reviewed and approved by the university through the Scholarship of the Teaching of Learning (SoTL).
ICETIS2022
Event Detection and Information Extraction Strategies from Text: A Preliminary Study Using GENIA Corpus
Mohd Hafizul Afifi Abdullah, Norshakirah Aziz, Said Jadid Abdulkadir, and 2 more authors
In Proceedings of the 2nd International Conference on Emerging Technologies and Intelligent Systems, 2023
In the world we live in today, data is the new oil. Data can reveal hidden knowledge that gives us an advantage over our competitors. However, data that are present in an unstructured form, such as text documents, are difficult for conventional machine learning algorithms to process. Therefore, in this study, we attempted to perform information extraction from textual data using current and state-of-the-art models to understand their working mechanisms. To perform this study, we chose the GENIA corpus for evaluating the performance of each model. The selected event extraction models are evaluated based on three specific measures: precision, recall, and F-1 measure. The result of our study shows that the DeepEventMine model scored the highest for trigger detection with a precision of 79.17%, recall of 82.93%, and F-1 measure of 81.01%. Similarly, for event detection, the DeepEventMine model scored the highest among the models with a precision of 65.24%, recall of 55.93%, and F-1 measure of 60.23% based on the selected corpus.
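As a quick illustration of how the three measures relate, the F-1 measure is the harmonic mean of precision and recall. The sketch below recomputes it from the precision and recall figures quoted in the abstract; the function name `f1_score` is chosen here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for DeepEventMine (percent).
trigger_f1 = f1_score(79.17, 82.93)
event_f1 = f1_score(65.24, 55.93)

print(f"trigger detection F-1: {trigger_f1:.2f}")
print(f"event detection F-1:   {event_f1:.2f}")
```

Rounded to two decimals, both values agree with the F-1 figures stated in the abstract.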
ICETIS2022
Predictive Analytics for Oil and Gas Asset Maintenance Using XGBoost Algorithm
Norshakirah Aziz, Mohd Hafizul Afifi Abdullah, Nurul Aida Osman, and 2 more authors
In Proceedings of the 2nd International Conference on Emerging Technologies and Intelligent Systems, 2023
One of the most important aspects of the oil and gas industry is asset management at the respective platforms. A lack of proper asset management leads to various unexpected scenarios, including increased plant deterioration, increased chances of accidents and injuries, and the breakdown of assets at unexpected times, which in turn leads to poor and hurried maintenance. Given the significant economic contribution of the oil and gas sector to oil-producing countries like Malaysia, accurate asset maintenance prediction is essential to ensure that the oil and gas platform can manage its operations profitably. This research identifies the parameters affecting asset failure on oil and gas platforms, which are interpreted using the XGBoost gradient boosting model from machine learning libraries. The model is used to predict the asset’s lifetime based on readings collected from the sensors of each machine. The results show that our prediction method using XGBoost for asset maintenance achieved a 6.43% increase in classification accuracy compared to the Random Forest algorithm.
2022
JCS
Optimizing deep neuro-fuzzy classifier with a novel evolutionary arithmetic optimization algorithm
Noureen Talpur, Said Jadid Abdulkadir, Hitham Alhussian, and 2 more authors
Deep Neuro-Fuzzy System has been successfully employed in various applications. However, the model faces two issues: (i) a dataset with many features exponentially increases the fuzzy rule-base, and (ii) parameters in the fuzzy rule-base are optimized using the gradient descent approach, which has the drawback of local minima. Therefore, this study aims to improve the model’s accuracy by proposing the Arithmetic Optimization Algorithm. Using the Arithmetic Optimization Algorithm for feature selection not only reduced the burden of handling a huge dataset, but the Arithmetic Optimization-based deep neuro-fuzzy system also outperformed the standard method, with 95.14% accuracy compared to 94.52%.
2021
Thesis
Improved Personalised Data Modelling Using Parameter Independent Fuzzy Weighted k-Nearest Neighbour for Spatio/Spectro-Temporal Data
Machine learning technologies have been growing rapidly in recent years. Researchers have come up with several data processing architectures, enabling machines to consume, interpret, and produce understandable output from real-world data to improve the quality of our lives. The NeuCube architecture is a data processing architecture for spatio/spectro-temporal data which consists of four main modules: a spike encoding module, a recurrent SNN reservoir, an output module, and an optimization module. Although it has been utilised in many applications, most improvements to the architecture focus on user experience rather than on improving the accuracy of the results. Upon exploration of the architecture, the weighted k-nearest neighbours algorithm used for the classification module was found to be prone to misclassification, as it relies solely on the majority voting rule to determine the class of a new data vector. Additionally, it does not consider the class-specific fuzzy weight information during the classification process. Therefore, a data modelling mechanism which implements the PIfwkNN classifier algorithm to improve the overall classification accuracy of the NeuCube architecture has been proposed. The proposed data modelling applies additional class-specific fuzzy weight information to new data vectors during the classification process. In this research, the optimal parameter set for the experiments has also been identified. The approach has been validated using the Kuala Krai Rainfall Dataset, Dow Jones Index Data Set, and Gold Price and Performance Dataset for the 3-days earlier and 1-day earlier event prediction. In the experiments, the improved personalised data modelling using the PIfwkNN classifier showed a significant increase in overall classification accuracy compared to the conventional MLP, fkNN, and NeuCube with wkNN classifier.
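To make the misclassification risk of pure majority voting concrete, the sketch below contrasts a plain majority vote with a generic inverse-distance weighted vote over the same k neighbours. It is not the PIfwkNN algorithm from the thesis; the weighting scheme, function names, data points, and class labels are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def k_nearest(train, query, k):
    """Return (distance, label) for the k training vectors closest to query."""
    scored = [(math.dist(vec, query), label) for vec, label in train]
    return sorted(scored)[:k]


def majority_vote(neighbours):
    """Each neighbour counts once, regardless of how far away it is."""
    votes = defaultdict(int)
    for _, label in neighbours:
        votes[label] += 1
    return max(votes, key=votes.get)


def weighted_vote(neighbours, eps=1e-9):
    """Each neighbour contributes 1/distance to its class score (generic weighting)."""
    scores = defaultdict(float)
    for dist, label in neighbours:
        scores[label] += 1.0 / (dist + eps)
    return max(scores, key=scores.get)


# Hypothetical 2-D samples: one close neighbour of one class, two distant ones of another.
train = [
    ((0.0, 0.0), "flood"),
    ((5.0, 5.0), "no_flood"),
    ((5.0, 6.0), "no_flood"),
]
query = (1.0, 1.0)
neighbours = k_nearest(train, query, k=3)

print(majority_vote(neighbours))  # two distant neighbours outvote the close one
print(weighted_vote(neighbours))  # the close neighbour dominates the weighted score
```

With these toy values the two rules disagree, which is the kind of case where adding weight information to the vote can change the predicted class.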
2020
SCDM2020
A Spiking Neural Networks Model with Fuzzy-Weighted k-Nearest Neighbour Classifier for Real-World Flood Risk Assessment
Mohd Hafizul Afifi Abdullah, Muhaini Othman, Shahreen Kasim, and 2 more authors
In Recent Advances on Soft Computing and Data Mining, 2020
Inspired by the brain working mechanism, spiking neural networks (SNN) have proven capable of revealing significant associations between the spike behaviour of different variables during an event. Combined with the capability of SNN to produce personalised models, this has allowed high precision in data classification. The existing accuracy of the weighted k-nearest neighbors classifier used in the spiking neural networks architecture can be noticeably improved by implementing fuzzy weights on the features, thereby allowing data to be classified more precisely according to the high-impact features. Simulation was carried out with three classifiers: Multi-layer Perceptron, weighted k-nearest neighbors, and Fuzzy-weighted k-nearest neighbors (FwkNN), using a real-world flood case study dataset and two benchmark datasets. Based on the results using the Kuala Krai Rainfall Dataset, the FwkNN classifier improved accuracy by 3.48% and 3.57% for 3-days earlier and 1-day earlier classification, respectively. Compared with the other two classifiers, the FwkNN classifier has proven capable of reducing misclassification and increasing the accuracy of dataset classification.
ICCI2020
Predictive Analytics for Crude Oil Price Using RNN-LSTM Neural Network
Norshakirah Aziz, Mohd Hafizul Afifi Abdullah, and Ahmad Naqib Zaidi
In 2020 International Conference on Computational Intelligence (ICCI), 2020
Prediction of future crude oil price is considered a significant challenge due to the extremely complex, chaotic, and dynamic nature of the market and stakeholders’ perception. The crude oil price changes every minute, and millions of shares are traded every day. The market price of a commodity such as crude oil is influenced by many factors, including news, the supply-and-demand gap, labour costs, the amount of remaining resources, and stakeholders’ perception. Therefore, various indicators for technical analysis have been utilized to predict the future crude oil price. Recently, many researchers have turned to machine learning approaches to address this problem. This study demonstrated the use of RNN-LSTM networks for predicting the crude oil price based on historical data alongside other technical analysis indicators. This study aims to verify the capability of a prediction model built on the RNN-LSTM network to predict the future price of crude oil. The developed model is trained and evaluated against accuracy metrics to assess the capability of the network to improve the accuracy of crude oil price prediction compared to other strategies. The result obtained from the model shows a promising prediction capability of the RNN-LSTM algorithm for predicting crude oil price movement.
2019
IJEECS
Evolving spiking neural networks methods for classification problem: a case study in flood events risk assessment
Mohd Hafizul Afifi Abdullah, Muhaini Othman, Shahreen Kasim, and 1 more author
Indonesian Journal of Electrical Engineering and Computer Science, 2019
Analysing environmental events, such as predicting the risk of flood, is considered a challenging task due to the dynamic behaviour of the data. One way to correctly predict the risk of such events is to gather as much related historical data as possible and analyse the correlation between the features which contribute to the event occurrences. Inspired by the brain working mechanism, spiking neural networks have proven capable of revealing a significant association between the spike behaviour of different variables during an event. Personalised modelling, on the other hand, allows a personal model to be created for a specific data model and experiment. Therefore, a personalised modelling method incorporating a spiking neural network is used to create a personalised model for assessing a real-world flood case study in Kuala Krai, Kelantan, based on historical data from 2012 to 2016 provided by the Malaysian Meteorological Department. The result shows that the method produces the highest accuracy among the compared algorithms.
IJEECS
A review on data clustering using spiking neural network (SNN) models
Siti Aisyah Mohamed, Muhaini Othman, and Mohd Hafizul Afifi Abdullah
The recent evolution of Artificial Neural Networks has given researchers an interest in exploring deep learning through Spiking Neural Network clustering methods. Spiking Neural Network (SNN) models capture neuronal behaviour more precisely than a traditional neural network, as they incorporate the notion of time into their functioning model [1]. The aim of this paper is to review studies that are related to clustering problems employing Spiking Neural Network models. Even though there are many algorithms used to solve clustering problems, most of the methods are only suitable for static data and fixed windows of time series. Hence, there is a need to analyse complex data types, and there is potential for improvement. Therefore, this paper summarizes the significant results obtained by applying SNN models in different clustering approaches. The findings of this paper could thus demonstrate the purpose of clustering methods using SNN to fellow researchers from various disciplines who wish to discover and understand complex data.
2018
SCDM2018
M-DCocoa: M-Agriculture Expert System for Diagnosing Cocoa Plant Diseases
Munirah Mohd Yusof, Nur Fazliyana Rosli, Muhaini Othman, and 2 more authors
In Recent Advances on Soft Computing and Data Mining, 2018
Major technological advancements have been experienced in various domains, including mobile applications. Mobile applications are no longer used only for daily life and chores; they also serve more specific and technical purposes in domains such as medicine, engineering, agriculture, and education. This paper aims to study the implementation of mobile systems in agriculture and proposes the development of an M-Agriculture application, named M-DCocoa, that helps in diagnosing cocoa plant diseases. This application enables a user to recognize the cocoa diseases afflicting a plant and provides the user with appropriate advice or treatments in a shorter time period. The user answers questions based on the cocoa plant’s condition or symptoms, and the application generates the answer in the form of a disease and its treatments. A rule-based, forward chaining inference engine has been used as part of the system development. The application thus helps the user to recognize cocoa diseases and suggests useful treatments.
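The rule-based, forward chaining inference mentioned in the abstract can be sketched in a few lines: starting from the user's answers, rules whose premises are all satisfied are fired repeatedly until no new conclusion can be derived. This is a minimal sketch only; the rules, symptom names, and disease labels below are hypothetical placeholders and are not taken from M-DCocoa.

```python
def forward_chain(facts, rules):
    """Fire rules whose premises are all known until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rule base: (set of premises, conclusion). Names are placeholders.
rules = [
    (frozenset({"dark_spots_on_pod", "pod_rotting"}), "disease_A"),
    (frozenset({"disease_A"}), "treatment_remove_infected_pods"),
    (frozenset({"leaves_wilting", "stem_swelling"}), "disease_B"),
]

# Symptoms the user confirmed when answering the questions.
answers = {"dark_spots_on_pod", "pod_rotting"}
derived = forward_chain(answers, rules)

print(sorted(derived - answers))
```

Here the first rule fires on the confirmed symptoms, its conclusion then triggers the treatment rule, and the third rule never fires because its premises were not confirmed.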
SCDM2018
A Framework to Cluster Temporal Data Using Personalised Modelling Approach
Muhaini Othman, Siti Aisyah Mohamed, Mohd Hafizul Afifi Abdullah, and 2 more authors
In Recent Advances on Soft Computing and Data Mining, 2018
This research paper focuses on the design of a framework for clustering temporal data using a personalised modelling approach. A real-world problem of flood occurrences, limited to the Malaysia region, is used as a case study. The data are designed according to the criteria needed for temporal data clustering and tested with three clustering techniques: K-means, X-means, and K-medoids. Rapid Miner is used for conducting the clustering processes. Finally, the results from each clustering method are compared to identify and justify the best approach for clustering temporal data.
MUCET2017
Empowering Self-Management through M-Health Applications
Muhaini Othman, Norhafizah Mohd Halil, M Mohd Yusof, and 2 more authors
The advancement in mobile technology has led towards a new frontier of medical intervention that had never been thought possible before. Through the development of the MedsBox Reminder (MBR) application for Android as a pilot project of M-Health, a health care information system for patient self-management is made possible. The application acts as an assistant that reminds users to take their medicine on time by notifying them through their mobile phone. The MedsBox Reminder application aims to facilitate the self-management of patients’ health, allowing them to monitor and schedule their own medicine intake more efficiently. Development of the application is performed using Android Studio 1.4, Android SDK, MySQL database, SQLite, Java language and Netbeans IDE 8.1. The Object-Oriented System Development (OOSD) methodology has been adapted to facilitate the development of the application.
2017
JOIV
Human Resource Management on Cloud
Muhaini Othman, Mastura Arif, Mohd Hafizul Afifi Abdullah, and 2 more authors
JOIV: International Journal on Informatics Visualization, 2017
iHRMS is a diligent method to replace the traditional employee management system used by small and mid-scale companies and businesses for managing employee attendance and payroll. Most traditional methods for managing employees lack the capability to capture employee attendance in real time and are hence unable to track employees’ punctuality. From another point of view, this problem may affect the overall performance of the employees in an organisation. iHRMS has been introduced not only to address this problem but also to assist in delivering a more accurate payroll system, thereby enabling an organisation to better manage its finances.
JOIV
iBid: A Competitive Bidding Environment for Multiscale Tailor
Muhaini Othman, Siti Najiha Shamsudin, Mohd Hafizul Afifi Abdullah, and 2 more authors
JOIV: International Journal on Informatics Visualization, 2017
Nowadays, various online auction web services are available, allowing people to bid on items to be purchased at a competitive price. The same approach allows people to bid on projects on the Freelancer website. Here, we present the iBid system, an environment in which customers publish a project online and marketers are able to bid on it. The iBid system demonstrates an application of a bidding system which is capable of helping customers find local tailors according to three criteria, namely location, type of sewing, and cost. A reverse auction mechanism is used, in which the customer controls the business. The prototyping methodology has been used to develop the system, which runs on a PHP server and a MySQL database.
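In a reverse auction the buyer posts the job and sellers compete by offering lower prices. The sketch below shows one plausible way such a selection could work with the three criteria named in the abstract (location, type of sewing, and cost). It is written in Python for brevity rather than the PHP used by iBid, and the `Bid` fields, the `select_bid` function, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bid:
    tailor: str
    location: str
    sewing_type: str
    price: float


def select_bid(bids, location, sewing_type, budget) -> Optional[Bid]:
    """Keep bids matching the customer's criteria, then pick the cheapest one."""
    eligible = [
        b for b in bids
        if b.location == location
        and b.sewing_type == sewing_type
        and b.price <= budget
    ]
    # In a reverse auction the lowest eligible price wins; None if nobody qualifies.
    return min(eligible, key=lambda b: b.price, default=None)


# Hypothetical bids submitted by tailors for one customer project.
bids = [
    Bid("Tailor A", "Batu Pahat", "baju kurung", 120.0),
    Bid("Tailor B", "Batu Pahat", "baju kurung", 95.0),
    Bid("Tailor C", "Johor Bahru", "baju kurung", 80.0),
    Bid("Tailor D", "Batu Pahat", "curtains", 60.0),
]

winner = select_bid(bids, location="Batu Pahat", sewing_type="baju kurung", budget=150.0)
print(winner.tailor if winner else "no eligible bid")
```

Filtering before taking the minimum keeps cheaper but unsuitable offers (wrong location or sewing type) from winning, and the customer stays in control by setting the criteria and budget.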
JOIV
Cakelicious: Web App for Designing a Customised Wedding Cakes
Muhaini Othman, Mohd Shuqor Nordin, Mohd Hafizul Afifi Abdullah, and 2 more authors
JOIV: International Journal on Informatics Visualization, 2017
In this fast-paced, changing world, the Internet keeps people connected to each other. Online shopping has changed the way people buy things, as well as how people book flight tickets and movie passes. Cakelicious Web App is another interesting story of how we revolutionize the way people book wedding cakes the way they love them. The system is designed to replace the current manual booking methods used by Dr. Munie’s Kitchen for managing cake orders, and is thus more efficient and effective while meeting the user requirements. The prototyping methodology has been used to develop and test the system in a systematic manner, which includes the development phases of planning, design, and testing and implementation. This system is developed using the PHP programming language and a MySQL database, and runs on an Apache web server.
Book Chapter
Comparative Analysis of Spatio/Spectro-Temporal Data Modelling Techniques
Mohd Hafizul Afifi Abdullah, Muhaini Othman, and Shahreen Kasim
Recent advancements in technology and engineering provide opportunities to explore the development of an integrated sensing, prediction, and alert system for natural disaster events. Therefore, several machine learning techniques have been explored by researchers to harvest knowledge from natural occurrences for the prediction of environmental events. The aim of this paper is to investigate, compare, and contrast three data modelling techniques and five data classification training algorithms for spatio-temporal data using real-world flood event case study data. A discussion is presented to identify the strengths and weaknesses of each data modelling technique and training algorithm.
2016
ICAIR-CACRE ’16
An improved computational framework using one stage filtration by incorporating knowledge in gene expression clustering
Shahreen Kasim, Mohd Farhan Md Fudzee, Mohamad Aizi Salamat, and 3 more authors
In Proceedings of the International Conference on Artificial Intelligence and Robotics and the International Conference on Automation, Control and Robotics Engineering, 2016
The field of gene expression data analysis has grown in the past few years from being purely data-centric to integrative, aiming at complementing microarray analysis with data and knowledge from diverse available sources. Given a gene expression dataset, the challenge is to predict gene function given the intensity and redundancy of the data and the lack of similarity of expression profiles without degrading the analysis performance and also without assigning genes at random. At the same time, the computational method must be capable of producing highly compacted clusters with furthest separation, high consistency, and accuracy. In this paper, we present a new computational framework for clustering gene expression data. From this experiment, we can conclude that our new framework is capable of improving the accuracy and thus determining the dominant gene.
UG Thesis
Research on Improving Dominant Yeast Genes for Gene Function Prediction
The field of gene expression data analysis has grown in the past few years from being purely data-centric to integrative, aiming at complementing micro-array analysis with data and knowledge from diverse available sources. Given a gene expression dataset, the challenge is to predict gene function given the intensity and redundancy of the data and the lack of similarity of expression profiles without degrading the analysis performance and also without assigning genes at random. At the same time, the computational method must be capable of producing highly compacted clusters with furthest separation, high consistency, and accuracy. In this paper, we present a new computational framework for clustering gene expression data. From this experiment, we can conclude that our new framework is capable of improving the accuracy and thus determining the dominant gene.