Papers by Djamila Romaissa Beddiar

Applied Sciences
Medical image captioning is a very challenging task that has rarely been addressed in the literature on natural image captioning. Some existing image captioning techniques exploit objects present in the image, alongside the visual features, while generating descriptions. However, this is not straightforward for medical image captioning, where descriptions must follow clinician-like explanations of the image content. Motivated by this, this paper proposes using the medical concepts associated with images, together with their visual features, to generate new captions. Our end-to-end trainable network is composed of a semantic feature encoder based on a multi-label classifier that identifies the medical concepts related to an image, a visual feature encoder, and an LSTM model for text generation. Beam search is employed to select the best next word for a given sequence of words based on the merged features of the medical image. We evaluated our proposal on the ImageCLEF medic...
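
As a rough illustration of the architecture described in this abstract, the minimal PyTorch sketch below merges visual features with multi-label concept scores and uses the result to initialize an LSTM decoder. It is not the authors' implementation: the layer sizes, the assumption that visual features come pre-extracted from a CNN backbone, and the class name ConceptAwareCaptioner are all illustrative, and beam-search decoding at inference time is omitted.

```python
import torch
import torch.nn as nn

class ConceptAwareCaptioner(nn.Module):
    def __init__(self, vocab_size, num_concepts=100, feat_dim=512, hidden=256, embed=128):
        super().__init__()
        self.concept_head = nn.Linear(feat_dim, num_concepts)    # multi-label concept classifier
        self.merge = nn.Linear(feat_dim + num_concepts, hidden)  # fuse visual and semantic features
        self.embed = nn.Embedding(vocab_size, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, visual_feats, captions):
        concepts = torch.sigmoid(self.concept_head(visual_feats))      # (B, num_concepts) concept scores
        h0 = torch.tanh(self.merge(torch.cat([visual_feats, concepts], dim=1)))
        state = (h0.unsqueeze(0), torch.zeros_like(h0).unsqueeze(0))   # LSTM state seeded with merged features
        hidden, _ = self.lstm(self.embed(captions), state)             # teacher-forced decoding
        return self.out(hidden), concepts                              # word logits and concept scores
```

At inference time, instead of greedy decoding, the top-k partial captions would be kept and extended word by word, which is what the beam search mentioned in the abstract refers to.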

Artificial Intelligence Review
Automatically understanding the content of medical images and delivering accurate descriptions is an emerging field of artificial intelligence that combines skills from both the computer vision and natural language processing fields. Medical image captioning is involved in various applications related to diagnosis, treatment, report generation and computer-aided diagnosis, where it facilitates decision making and clinical workflows. Unlike generic image captioning, medical image captioning highlights the relationships between image objects and clinical findings, which makes it a very challenging task. Although a few review papers have already been published in this field, their coverage is still quite limited and only particular problems are addressed. This motivates the current paper, where a rapid review protocol was adopted to review the latest achievements in automatic medical image captioning from the medical domain perspective. We aim through this review to provide the reader with an up-to...

2020 Tenth International Conference on Image Processing Theory, Tools and Applications (IPTA)
According to the World Health Organization, falls among the elderly are a major health problem that causes many injuries and thousands of deaths every year. This increases the pressure on health authorities to provide daily health care and reliable medical assistance, reduce fall-related damage and improve the elderly's quality of life. Detecting or predicting falls accurately is therefore a priority. In this paper, we present a fall detection approach based on human body geometry inferred from video sequence frames. We calculate the angle between the vector formed by the head centroid of the identified facial image and the center hip of the body, and the vector aligned with the horizontal axis of the center hip. Similarly, we calculate the distance between the vector formed by the head and the body center hip and the vector formed on its horizontal axis, and then construct distinctive image features. These angles and distances are used to train a two-class SVM classifier, and a Long Short-Term Memory (LSTM) network is trained on the calculated angle sequences, to classify fall and non-fall activities. We perform experiments on the Le2i fall detection dataset. The results demonstrate the effectiveness and efficiency of the developed approach.
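
To make the geometric features concrete, here is a small NumPy sketch of how the per-frame angle and distance between the head centroid and the center hip could be computed; the function name, the toy coordinates and the sliding-window framing are assumptions for illustration, not code from the paper.

```python
import numpy as np

def torso_angle_and_length(head_xy, hip_xy):
    """Angle (degrees) between the head-to-hip vector and the horizontal axis
    through the hip, plus the length of that vector (pixel coordinates)."""
    v = np.asarray(head_xy, dtype=float) - np.asarray(hip_xy, dtype=float)
    angle = np.degrees(np.arctan2(v[1], v[0]))   # ~ -90 deg when upright (image y axis points down), near 0/180 deg when lying
    length = np.linalg.norm(v)
    return angle, length

# Stacking these per-frame values over a window yields the sequences on which the
# SVM and LSTM classifiers described above would be trained.
frames = [((320, 80), (330, 260)), ((300, 200), (340, 230))]   # toy (head, hip) pixel pairs
sequence = np.array([torso_angle_and_length(h, c) for h, c in frames])
```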

2017 8th International Conference on Information Technology (ICIT), 2017
The recognition of human activities in the field of video surveillance is attracting more and more researchers. This has led to various approaches and proposals using different methods and techniques. The growing interest in surveillance has also led researchers to pay attention to abnormal human activities, in order to propose appropriate techniques dedicated to this type of activity. Unfortunately, the proposals made so far in this new field are of limited effectiveness and are mostly adapted, with minor modifications, from those dedicated to normal human activities. They also suffer from several limitations and inadequacies, and the field remains restricted because of the very limited number of works and syntheses. This paper is therefore an overview that provides a synthesis and an analysis of the existing works on the recognition of abnormal activities, in order to give researchers a general view of the state of the art and to help them propose new approaches.
2020 Tenth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2020

2020 25th International Conference on Pattern Recognition (ICPR), 2021
Human activity recognition plays a central role in the development of intelligent systems for video surveillance, public security, health care and home monitoring, where the detection and recognition of activities can improve the quality of life and security of humans. Typically, automated, intuitive and real-time systems are required to recognize human activities and accurately identify unusual behaviors in order to prevent dangerous situations. In this work, we explore the combination of three modalities (RGB, depth and skeleton data) to design a robust multi-modal framework for vision-based human activity recognition. In particular, spatial information, body shape/posture and the temporal evolution of actions are highlighted using illustrative representations obtained from a combination of dynamic RGB images, dynamic depth images and skeleton data representations. Each video is therefore represented by three images that summarize the ongoing action. Our framework takes advantage of transfer learning from pre-trained models to extract significant features from these newly created images. Next, we fuse the extracted features using Canonical Correlation Analysis and train a Long Short-Term Memory network to classify actions from the visual descriptive images. Experimental results demonstrate the reliability of our feature-fusion framework, which captures highly significant features and achieves state-of-the-art performance on the public UTD-MHAD and NTU RGB+D datasets.
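
The fusion step can be sketched with scikit-learn as follows. This only illustrates the Canonical Correlation Analysis idea on two of the three streams, with random arrays standing in for the pooled CNN features; the dimensions, component count and handling of the third (skeleton) stream are assumptions rather than details taken from the paper.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
rgb_feats   = rng.standard_normal((200, 512))   # stand-in for pooled CNN features of dynamic RGB images
depth_feats = rng.standard_normal((200, 512))   # stand-in for pooled CNN features of dynamic depth images

cca = CCA(n_components=64)                      # project both views into a correlated subspace
cca.fit(rgb_feats, depth_feats)
rgb_c, depth_c = cca.transform(rgb_feats, depth_feats)
fused = np.concatenate([rgb_c, depth_c], axis=1)   # (200, 128) fused descriptor per video
```

Standard CCA is defined for two views, so incorporating the skeleton-based stream as well would require either a second CCA stage or a multi-view variant; the paper's exact fusion layout is not reproduced here.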

Pattern Recognition. ICPR International Workshops and Challenges, 2021
Falling is a major health problem that causes thousands of deaths every year, according to the World Health Organization. Fall detection and fall prediction are both important tasks that should be performed efficiently to enable accurate medical assistance to the vulnerable population whenever required. This allows local authorities to plan daily health care resources and reduce fall-related damage accordingly. We present in this paper a fall detection approach that explores the human body geometry available at different frames of the video sequence. In particular, the angle and the distance between the vector formed by the head (the centroid of the identified facial image) and the center hip of the body, and the vector aligned with the horizontal axis of the center hip, are used to construct distinctive image features. A two-class SVM classifier is trained on the newly constructed feature images, while a Long Short-Term Memory (LSTM) network is trained on the calculated angle and distance sequences, to classify fall and non-fall activities. We perform experiments on the Le2i fall detection dataset and the UR FD dataset. The results demonstrate the effectiveness and efficiency of the developed approach.
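
Complementing the per-frame geometry sketch given for the IPTA paper above, the following minimal PyTorch sketch shows how an LSTM could classify the resulting (angle, distance) sequences as fall versus non-fall; the class name, hidden size and window length are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class FallLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)  # (angle, distance) per frame
        self.fc = nn.Linear(hidden, 2)                                            # fall / non-fall logits

    def forward(self, x):                 # x: (batch, frames, 2)
        _, (h, _) = self.lstm(x)
        return self.fc(h[-1])             # classify from the last hidden state

logits = FallLSTM()(torch.randn(4, 30, 2))   # 4 toy clips of 30 frames each
```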
Journal of Visual Communication and Image Representation, 2021

Online Social Networks and Media, 2021
With the proliferation of user-generated content on social media platforms, establishing mechanisms to automatically identify toxic and abusive content has become a prime concern for regulators, researchers and society. Keeping the balance between freedom of speech and respect for each other's dignity is a major concern of social media platform regulators. Although automatic detection of offensive content using deep learning approaches seems to provide encouraging results, training deep learning-based models requires large amounts of high-quality labeled data, which are often missing. In this regard, we present in this paper a new deep learning-based method that fuses a back-translation method and a paraphrasing technique for data augmentation. Our pipeline investigates different word-embedding-based architectures for the classification of hate speech. The back-translation technique relies on an encoder-decoder architecture pre-trained on a large corpus and mostly used for machine translation. In addition, paraphrasing exploits the transformer model and a mixture of experts to generate diverse paraphrases. Finally, LSTM and CNN classifiers are compared to seek enhanced classification results. We evaluate our proposal on five publicly available datasets, namely the AskFm corpus, the Formspring dataset, the Warner and Waseem dataset, OLID, and the Wikipedia toxic comments dataset. The performance of our method, together with comparisons to some related state-of-the-art results, demonstrates the effectiveness and soundness of the proposal.
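
As an illustration of the back-translation idea described above, the snippet below round-trips text through a pre-trained translation model using Hugging Face Transformers. The specific MarianMT checkpoints (English-French) and the helper name back_translate are assumptions for this sketch, not the models used in the paper, and the paraphrasing branch (transformer plus mixture of experts) is not shown.

```python
from transformers import MarianMTModel, MarianTokenizer

def back_translate(texts,
                   src2tgt="Helsinki-NLP/opus-mt-en-fr",
                   tgt2src="Helsinki-NLP/opus-mt-fr-en"):
    """Translate English -> French -> English to obtain paraphrased variants."""
    def translate(batch, name):
        tok = MarianTokenizer.from_pretrained(name)
        model = MarianMTModel.from_pretrained(name)
        inputs = tok(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, max_length=128)
        return tok.batch_decode(outputs, skip_special_tokens=True)
    return translate(translate(texts, src2tgt), tgt2src)

augmented = back_translate(["this comment is toxic and insulting"])  # extra training sample, same label as the original
```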

Understanding and interpreting medical images is a very important task in the generation of medical diagnoses. However, the manual description of medical content is a major bottleneck in clinical diagnosis. Many research studies have been devoted to developing automated alternatives to this process, which would have an enormous impact in terms of efficiency, cost and accuracy in clinical workflows. Different approaches and techniques have been presented in the literature, ranging from traditional machine learning methods to deep learning-based models. Inspired by the outperforming results of the latter techniques, we present in this paper our team's participation (RomiBed) in the ImageCLEF medical caption prediction task. We addressed the challenge of medical image captioning by combining a CNN encoder model with an attention-based GRU language generator model, whereas a multi-label CNN classifier is used for the concept detection task. Using the provided data in the trainin...
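
For intuition only, here is a minimal PyTorch sketch of a single step of an attention-based GRU decoder over CNN grid features, in the spirit of the encoder-decoder setup described above; the layer sizes, the additive-attention form and the class name are assumptions, and this is not the submitted system.

```python
import torch
import torch.nn as nn

class AttnGRUDecoderStep(nn.Module):
    def __init__(self, vocab_size, feat_dim=256, hidden=256, embed=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed)
        self.attn = nn.Linear(feat_dim + hidden, 1)        # scores each image region given the decoder state
        self.gru = nn.GRUCell(embed + feat_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, prev_word, h, feats):                # feats: (B, L, feat_dim) CNN grid features
        rep = h.unsqueeze(1).expand(-1, feats.size(1), -1)
        alpha = torch.softmax(self.attn(torch.cat([feats, rep], dim=2)), dim=1)  # (B, L, 1) attention weights
        context = (alpha * feats).sum(dim=1)               # attention-weighted image context
        h = self.gru(torch.cat([self.embed(prev_word), context], dim=1), h)
        return self.out(h), h                              # next-word logits and updated state
```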
The 2nd International Conference on Artificial Intelligence and Information Technologies (ICAIIT 2019)

Multimedia Tools and Applications, 2020
Human activity recognition (HAR) systems attempt to automatically identify and analyze human activities using information acquired from various types of sensors. Although several extensive review papers have already been published on general HAR topics, the growing technologies in the field, as well as the multi-disciplinary nature of HAR, prompt the need for constant updates. In this respect, this paper attempts to review and summarize the progress of HAR systems from the computer vision perspective. Indeed, most computer vision applications, such as human-computer interaction, virtual reality, security, video surveillance and home monitoring, are highly correlated to HAR tasks, which establishes new trends and milestones in the development cycle of HAR systems. The current survey therefore aims to provide the reader with an up-to-date analysis of vision-based HAR literature and recent progress in the field. At the same time, it will highlight the main challenges...