Papers by ARIFAH CHE ALHADI

AIP Conference Proceedings
Short text retrieval is gaining tremendous attention due to the rapid exposure to and widespread usage of social networks. This type of text is widely used in various applications such as spam filtering, tweet classification, question-answering systems, web search and so forth. Effective analysis of these texts is therefore vital. However, short text retrieval often becomes complex because short texts usually contain repeated words, have few features and are overloaded with noise. Different lexical similarity approaches have been developed to tackle the problem of lexical analysis; however, most of them target English, and solutions for problems involving Malay are of limited availability. This paper examined the reliability of lexical similarity approaches, focusing on term-based approaches, for Malay short text analysis. The Cosine, Overlap coefficient and Jaccard similarity methods were used to quantify the similarity of news titles written in Malay. Results show that the selected term-based approaches have the potential to be used to analyze Malay short texts.
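
The three term-based measures named above can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's implementation: tokens come from lowercase whitespace splitting (no Malay stemming or stop-word removal), cosine is computed over raw term-frequency vectors, and Jaccard and the overlap coefficient over term sets. The two example titles are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization, no stemming or stop words
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between raw term-frequency vectors
    ta, tb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    # |A ∩ B| / |A ∪ B| over the sets of distinct terms
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def overlap(a, b):
    # Overlap coefficient: |A ∩ B| / min(|A|, |B|)
    sa, sb = set(tokenize(a)), set(tokenize(b))
    smaller = min(len(sa), len(sb))
    return len(sa & sb) / smaller if smaller else 0.0


# Hypothetical Malay news titles
t1 = "Harga minyak naik lagi"
t2 = "Harga minyak dijangka naik"

for name, fn in (("cosine", cosine), ("jaccard", jaccard), ("overlap", overlap)):
    print(name, round(fn(t1, t2), 3))
```

For these two titles, three of the four terms are shared, so Jaccard gives 3/5 = 0.6 while cosine and the overlap coefficient both give 0.75; the overlap coefficient is the most forgiving when one text is much shorter than the other.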

Universiti Malaysia Terengganu Journal of Undergraduate Research
Many tourism companies provide boat services to the Terengganu Islands, but most of them only advertise their boat services on social media or on the company’s website. Therefore, the Boat Management System to Terengganu Islands (BoatMSTI) was developed, with the main objective of creating a one-stop platform. Tourists can compare boat services provided by different tourism companies and find the best and most affordable services on one website. Tourists can check ticket availability and book tickets directly based on their selected boat service. BoatMSTI also enables tourists to view other customers’ feedback and ratings. At the same time, the system helps tourism companies manage their boat services and customers’ ticket bookings. The system was developed using an Agile method, which makes development more flexible. The general concept is to divide system development into a sequence of repeated cycles known as iterations. The system can b...

The Internet has become the preferred basic infrastructure for obtaining digital information on various topics from around the world. However, most web documents on the Internet are unstructured and lack semantic information about the document. Existing information extraction systems focus more on extracting the important concepts that represent document content, without considering the semantic aspect. Representing information content in a semantically rich form is one of the visions of the Semantic Web. This paper discusses the application of ontology and natural language processing approaches to support the extraction and representation of semantic information from web documents. Since manually annotating semantic information in web documents is impractical, and the development of a fully automatic system is still too early to implement, a semi-automatic approach has been proposed. In this approach, the system guides the user in the semantic modelling of web documents, which in turn produces semantically richer content for a web document or set of web documents. The generated semantic model is represented in XML format.

Bad News Travel More: A Content-based Analysis of Interestingness on Twitter
On the microblogging site Twitter, users can forward any message they receive to all of their followers. This is termed a retweet and is usually done when users find a message particularly interesting and worth sharing with others. Thus, retweets reflect what the Twitter community considers interesting on a global scale, and can be used to generate a model to describe the content-based characteristics of retweets. In this paper, we use this model to deduce what makes a message on Twitter interesting. The question of what causes a message to be retweeted has frequently been addressed, but mainly in a scenario of retweet prediction for a given user and with a focus on the structure of the social network [3, 4, 5]. In this case a typical observation is that a well connected user with active followers is more likely to get retweeted. As the content of a tweet in such a setting is neglected or reduced to a few very simple features, a network-based analysis of retweets may give hints on w...

Existing HTML mark-up is used only to indicate the structure and layout of documents, not their semantics. As a result, web documents are difficult for computer applications to process, retrieve and explore semantically. Existing information extraction systems are mainly concerned with extracting important keywords or key phrases that represent the content of the documents; the semantic aspects of such keywords have not been explored extensively. In this paper we propose an approach meant to assist in extracting and modeling the semantic information content of web documents using natural language analysis techniques and a domain-specific ontology. With the user’s participation, the tool gradually extracts and constructs the semantic document model, which is represented as XML. The semantic models representing each document are then integrated to form a global semantic model. Such a model provides users with a global knowledge model of particular domains.

This paper presents the Institute for Web Science and Technologies' contribution to the TREC2011 Microblog Track. The goal of the Microblog Track is to address the user's information need in which a user wishes to see not only the most recent but also the most interesting and relevant information to a query in Twitter. In this paper we present the LiveTweet system, submitted by the Institute for Web Science and Technologies (WeST) from the University of Koblenz-Landau. The system addresses two issues of microblog media: sparsity and its effect on document length normalization, as well as the problem of assessing content quality. We provide the following approaches to overcome these issues: ignoring length normalization and using interestingness as a static quality measure to find the most recent and interesting tweets related to a given query topic. The results in similar settings have shown that deliberately ignoring length normalization yields better retrieval results in general and that interestingness improves retrieval for underspecified queries.
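
The abstract does not say which ranking function LiveTweet used. Purely as an assumption for illustration, the sketch below uses Okapi BM25, where the parameter `b` controls document length normalization and setting `b=0` is one common way to ignore it. The tweets, query and parameter values (`k1=1.2`, `b=0.75`) are hypothetical.

```python
import math
from collections import Counter


def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized doc for a tokenized query.

    b scales document length normalization; b=0 switches it off entirely.
    """
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in corpus if term in d)
        if df == 0 or tf[term] == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        # Length normalization appears only here, weighted by b
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
    return score


# Hypothetical tokenized tweets: each query term occurs at most once (sparsity)
short = ["earthquake", "japan"]
long_ = ["earthquake", "japan", "news", "live", "updates", "now"]
other = ["football", "score"]
corpus = [short, long_, other]
query = ["earthquake"]

for b in (0.75, 0.0):
    print(b, round(bm25_score(query, short, corpus, b=b), 4),
          round(bm25_score(query, long_, corpus, b=b), 4))
```

With `b=0.75` the shorter tweet scores higher purely because it is shorter; with `b=0` both tweets score the same, since each contains the query term exactly once.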

Computational Approaches in Supporting Special Education Domain: A Review
Journal of Telecommunication, Electronic and Computer Engineering, 2017
Children with learning disabilities or emotional and behavioral problems are unable to cope with standard educational programs. They are known as special children. Special education is therefore needed to support teaching and learning for these children. In recent years, there has been an increase in the use of computational approaches to address various issues in special education. However, a comprehensive review of the extent to which computational approaches are integrated and applied to support the various issues in the special education domain is still lacking. Thus, the objective of this paper is to explore the extent to which existing computational approaches have recently been used to support the field of special education, especially in categorizing children with learning disabilities and recommending appropriate techniques to improve their quality of life. A Systematic Literature Review (SLR) was applied to perform this study. As a consequence, only stu...
Microblogging is a new way of communication that allows people to disseminate messages via the web, mobile phones, email or instant messaging. In 2010, microbloggers generated 65 million messages a day on Twitter alone. Our hypothesis is that tweeting is an activity users perform in order to satisfy certain needs. In this paper, we describe an approach for analysing user purposes in writing single tweets and organize these purposes into a taxonomy. We find that people use microblogging for eight different purposes, e.g. promotion, social interaction and expressing emotions. We aim to classify tweets into categories of purposes.
Semantic Retrieval of Web Documents Based on Domain Ontology (Capaian secara semantik dokumen web berasaskan domain ontologi)

Semantic Extraction and Representation of Web Documents Based on Domain Ontology
Asia-Pacific Journal of Information Technology and Multimedia, 2007
The Internet has become the preferred basic infrastructure for obtaining digital information on various topics from around the world. However, most web documents on the Internet are unstructured and lack semantic information about the document. Existing information extraction systems focus more on extracting the important concepts that represent document content, without considering the semantic aspect. Representing information content in a semantically rich form is one of the visions of the Semantic Web. This paper discusses the application of ontology and natural language processing approaches to support the extraction and representation of semantic information from web documents. Since manually annotating semantic information in web documents is impractical, and the development of a fully automatic system is still too early to implement, a semi-automatic approach has been proposed. In this approach, the system guides the user in the semantic modelling of web docum...
Partitional and incremental clustering are common models for mining data in large databases. However, some models perform better than others depending on the type of data, time complexity, and space requirements. This paper describes the performance of partitional and incremental models based on the number of clusters and threshold values. Experimental studies show that partitional clustering performed better as the number of clusters increased, while incremental clustering performed better as the threshold value decreased. Keywords: Clustering, partitional, incremental, distance.
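
The two model families compared above differ in what they take as input: a partitional method is given the number of clusters, while an incremental method is given a distance threshold. The abstract does not name the exact algorithms, so as an assumption the sketch below uses k-means for the partitional model and a single-pass leader-style algorithm for the incremental one, both with Euclidean distance; the points are hypothetical.

```python
import math
import random


def incremental_leader(points, threshold):
    """Single pass: add each point to the nearest cluster if its centroid is
    within `threshold`, otherwise start a new cluster."""
    centroids, members = [], []
    for p in points:
        if centroids:
            i = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            if math.dist(p, centroids[i]) <= threshold:
                members[i].append(p)
                # Recompute the centroid as the mean of the cluster's members
                centroids[i] = tuple(sum(c) / len(members[i]) for c in zip(*members[i]))
                continue
        centroids.append(tuple(p))
        members.append([p])
    return members


def kmeans(points, k, iters=20, seed=0):
    """Partitional clustering: the number of clusters k is fixed up front."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        # Keep the old centroid if a cluster ends up empty
        centroids = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(len(incremental_leader(points, threshold=2.0)))  # looser threshold, fewer clusters
print(len(incremental_leader(points, threshold=0.5)))  # tighter threshold, more clusters
print([len(c) for c in kmeans(points, k=2)])
```

Lowering the threshold makes the incremental model produce more, tighter clusters, while k-means always returns exactly `k` groups, which is why the two models respond to different tuning parameters.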

An Ensemble Similarity Model for Short Text Retrieval
The rapid growth of the World Wide Web has extended Information Retrieval-related technology, so that queries for information needs have become more easily accessible. One such platform is online question answering (QA). Online communities can post questions and get direct responses to their specific information needs using various platforms. This creates large, unorganized repositories of valuable knowledge resources. Effective QA retrieval is required to make these repositories accessible and to fulfill users' information requests quickly. The repositories might contain questions and answers similar to a user's newly asked question. This paper explores similarity-based models for QA systems to rank search result candidates. We used the Damerau-Levenshtein distance and the cosine similarity model to obtain ranking scores between the question posted by a registered user and similar candidate questions in the repository. Empirical experimental results indicate that our proposed ensemble models are very e...
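
A minimal sketch of ranking candidate questions by combining an edit-distance score with cosine similarity, under stated assumptions. The abstract does not give the combination rule, so the weighted mean with weight `alpha` is hypothetical. The edit distance here is the optimal string alignment (restricted) variant of Damerau-Levenshtein computed over characters, which may differ from the paper's variant, and cosine is computed over word frequencies. The example questions are made up.

```python
import math
from collections import Counter


def osa_distance(a, b):
    """Optimal string alignment (restricted Damerau-Levenshtein) distance:
    insertions, deletions, substitutions and adjacent transpositions cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def edit_similarity(a, b):
    # Normalize the distance into [0, 1] by the longer string's length
    longest = max(len(a), len(b))
    return 1.0 - osa_distance(a, b) / longest if longest else 1.0


def cosine(a, b):
    # Cosine similarity over lowercase word-frequency vectors
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def ensemble_score(query, candidate, alpha=0.5):
    # Hypothetical ensemble: weighted mean of the two similarity scores
    edit = edit_similarity(query.lower(), candidate.lower())
    return alpha * edit + (1 - alpha) * cosine(query, candidate)


def rank(query, candidates, alpha=0.5):
    return sorted(candidates, key=lambda c: ensemble_score(query, c, alpha), reverse=True)


repository = ["best pizza in town", "how do i reset my password"]
print(rank("how to reset password", repository))
```

The character-level edit score tolerates typos and small spelling variations, while the word-level cosine rewards shared vocabulary regardless of word order; the weight `alpha` would need to be tuned on held-out question pairs.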

ICT as a Tool for Screening Student with Specific Learning Disabilities
Screening students who may be at risk for specific learning disabilities (SLD) can be an initial step in helping them overcome barriers to learning, thereby preventing academic failure, school dropout, and peer rejection. Despite concerted guideline frameworks and models for disability recognition, such as the Individuals with Disabilities Education Improvement Act (IDEA), Ability–Achievement Discrepancy, Response to Intervention (RtI) and Pattern of Strengths and Weaknesses (PSW), the integration of information and communication technologies (ICT) to support screening and suitable education plans for students with SLD has not received adequate attention. Thus, in this study, an Ontology-based Specific Learning Disabilities Screening tool has been designed and developed. This ICT-supported tool plays a notable role in helping parents and teachers identify children or students at risk for SLD, as well as in recommending appropriate educational activities to assist their learning...
An Ontological Approach to Semantic Information Extraction and Integration of Web Documents

Short Text Computing Based on Lexical Similarity Model
Communications in Computer and Information Science
Short text similarity deals with determining how closely two texts mean the same thing, lexically or semantically. Various short text similarity approaches have been proposed, based on lexical matching, semantic background knowledge, or combined models. Lexical-based models do not capture the actual meaning behind the words. Semantic approaches, however, rely on background knowledge or a corpus, which cannot be assumed to be available when handling the huge number of new words, the data sparseness, and the noise in short text. This work focuses on lexical-based similarity models for analysing unstructured short text. Term-based and edit distance models are compared for their applicability in computing the similarity value of short texts. The experimental results show that each model has its own key strengths and limitations in computing the similarity value of short texts.

Ont-SLD: A Domain Ontology for Learning Disability
Indian Journal of Computer Science and Engineering

International Journal of Electrical and Computer Engineering (IJECE)
The rapid development of the Internet, along with the wide use of social media applications, produces huge volumes of unstructured data in short text form, such as tweets, text snippets and instant messages. This form of data rarely contains repeated words. This presents a challenge for sentence similarity analysis, as standard text similarity models rely merely on word occurrence counts, often resulting in unreliable similarity values. In addition, the use of abbreviations, acronyms, slang, smileys, jargon, symbols or non-standard short forms also contributes to the difficulty of similarity analysis. Thus, an extended ensemble similarity model approach is proposed. An experimental study was conducted using datasets of English short sentences. The findings are very encouraging for improving the similarity values of short sentences.

International Journal of Electrical and Computer Engineering (IJECE)
Ontology-based knowledge representation is explored in the special education environment, as not much attention has been given to specific learning disabilities such as dyslexia, dysgraphia and dyscalculia. Therefore, this paper aims to capture knowledge in the special education domain, represent it using an ontology-based approach, and make it efficient for early identification of children who might have specific learning disabilities. The paper presents the step-by-step development process of the ontology, following the five phases of the ontological engineering approach: specification, conceptualization, formalization, implementation, and maintenance. The content and structure of the ontological model are built in detail, and the applicability of the ontology for early identification and recommendation is demonstrated.

International Journal of Engineering & Technology
In this paper, operational and complexity analyses are carried out for a proposed model of ensemble Artificial Neural Network (ANN) multiple classifiers. The main idea is to employ more classifiers to obtain more accurate predictions and to enhance classification capability on larger data. Classification results are compared between a single classifier and multiple classifiers, followed by estimates of the upper bounds of the converged functional error under partitioning of the benchmark dataset. The estimates, derived using the Apriori method, show that the proposed ensemble ANN algorithm, with its different approach, is feasible: problems with a high number of inputs and classes can be solved with time complexity O(n^k) for some k, i.e., in polynomial time. This result is in line with the significant performance achieved by applying the diversity rule together with a reordering technique. In conclusion, an ensemble heterogeneous ANN classi...