2021 9th International Japan-Africa Conference on Electronics, Communications, and Computations (JAC-ECC), 2021
Finding semantic similarity between sentences is useful in many NLP applications, such as information retrieval, plagiarism detection, information extraction, and machine translation. Limited Arabic language resources have held back research on Arabic sentence similarity, which makes identifying semantically similar Arabic sentences even more difficult. This paper presents a new Arabic dataset for the sentence similarity task. The dataset can support the development of sentence similarity approaches, but its main purpose is to evaluate them. The data was collected from Wikipedia, an intermediate lexicon, and other web resources. The paper describes how the data was collected and filtered and how the sentence pairs were preprocessed, and it reports statistics about the dataset, with the goal of building a benchmark for semantic textual similarity. The dataset is available for future research in this field. The experiment shows that the dataset is an effective tool for evaluating semantic similarity approaches for Arabic.
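The abstract says the dataset's main purpose is to evaluate sentence similarity approaches, but it does not spell out the evaluation protocol. A common convention for semantic textual similarity benchmarks is to score each sentence pair with the approach under test and report the Pearson correlation between those predictions and the gold scores. The sketch below assumes that convention, a hypothetical `(sentence1, sentence2, gold_score)` record format, and a simple token-overlap (Jaccard) baseline standing in for a real approach. The function names and toy pairs are illustrative only and are not taken from the paper or its dataset.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record format (an assumption, not the paper's schema):
# (sentence1, sentence2, gold_similarity_score)
Pair = Tuple[str, str, float]


def jaccard_similarity(a: str, b: str) -> float:
    """Toy baseline: |A & B| / |A | B| over whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def evaluate(pairs: List[Pair]) -> float:
    """Score every pair with the baseline and correlate against gold scores."""
    predicted = [jaccard_similarity(s1, s2) for s1, s2, _ in pairs]
    gold = [score for _, _, score in pairs]
    return pearson(predicted, gold)


if __name__ == "__main__":
    # Hypothetical toy pairs with made-up gold scores on an assumed 0-5 scale.
    toy_pairs: List[Pair] = [
        ("ذهب الولد إلى المدرسة", "ذهب الطفل إلى المدرسة", 4.2),
        ("القطة نائمة", "السيارة سريعة", 0.5),
        ("الجو جميل اليوم", "الجو جميل اليوم", 5.0),
    ]
    print(f"Pearson r = {evaluate(toy_pairs):.3f}")
```

Whitespace tokenization is a deliberately crude choice here. Arabic attaches clitics such as the article "ال" and conjunctions directly to words, so a real approach would likely normalize and segment text before comparing sentences. Only the baseline function would need to change, because the evaluation step is independent of how similarity is computed.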
Papers by Khaled fathy