Publications by Michel Chammas

Multimedia Tools and Applications, 2023
The extraction of paleographical features is an important task in studying the identity of the text in historical manuscripts. One of the major features is the identification of the writer or copyist. Many researchers have worked on automated systems for writer identification, and with the development of deep learning techniques many approaches have been proposed. Most previous studies have built multi-step systems, while very few have taken an end-to-end approach; most systems rely on a pre-processing step to prepare the data and facilitate recognition. This paper presents an end-to-end deep learning system for writer identification, tested on four different datasets: ICDAR19 and ICFHR20 (Latin) and KHATT and Balamand (Arabic). The system is based on the Deep-TEN approach, using a customized ResNet-50 network for feature and local descriptor extraction, with a NetVLAD end-layer integrated to compute and encode the global descriptor. It was compared with our previous state-of-the-art system, winner of the ICFHR20 HisFrag competition, and performed well on all datasets without any pre-processing techniques.

Multimedia Tools and Applications, 2022
Determining the writer or transcriber of historical Arabic manuscripts has always been a major challenge for researchers in the humanities. With the development of advanced techniques in pattern recognition and machine learning, these technologies have been applied to automate the extraction of paleographical features in order to solve this issue. This paper presents a baseline system for writer identification, tested on a historical Arabic dataset of 11,610 single- and double-folio images. These texts were extracted from a unique collection of 567 historical Arabic manuscripts held at the Balamand Digital Humanities Center. A survey was conducted of the available Arabic datasets and previously proposed techniques and algorithms. The Balamand dataset presents an important challenge due to the geo-historical identity of the manuscripts and their physical condition. An advanced deep learning system was developed and tested on three different Latin and Arabic datasets (ICDAR19, ICFHR20 and KHATT) before being tested on the Balamand dataset. Compared with many other systems, it yielded state-of-the-art performance on the new challenging images, with 95.2% mean Average Precision (mAP) and 98.1% accuracy.

2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), 2020
With the growth of artificial intelligence techniques, the problem of writer identification from historical documents has gained increased interest: determining the identity of the writers of these documents. This paper introduces our baseline system for writer identification, tested on the large dataset of Latin historical manuscripts used in the ICDAR 2019 competition. The proposed system yielded the best results using the Scale-Invariant Feature Transform (SIFT) as a single feature extraction method, without any preprocessing stage. The system was compared against four teams who participated in the competition with different feature extraction methods: SRS-LBP, SIFT, Pathlet, Hinge, Co-Hinge, QuadHinge, Quill, TCC and oBIFs. An unsupervised learning system was implemented, in which a deep Convolutional Neural Network (CNN) was trained on patches extracted at SIFT keypoints; the results were then encoded with a multi-Vector of Locally Aggregated Descriptors (VLAD), and an Exemplar Support Vector Machine (E-SVM) was applied at the end to compare the results. Our system achieved the best performance using a single feature extraction method, with 91.2% mean Average Precision (mAP) and 97.0% accuracy.
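The encoding stage can be sketched as follows. This is a generic hard-assignment VLAD encoder in NumPy, a simplification of the multi-VLAD pipeline the abstract names (the paper's own code is not reproduced here); the descriptors would come from the trained CNN, and the visual-word centroids from a clustering step such as k-means.

```python
import numpy as np

def vlad_encode(descriptors, centroids):
    """Encode local descriptors (N, D) against K visual words (K, D)
    into one (K*D,) VLAD vector: per-word sums of residuals to the centroid."""
    K, D = centroids.shape
    # hard-assign each descriptor to its nearest centroid
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(1)
    vlad = np.zeros((K, D))
    for k in range(K):
        members = descriptors[assign == k]
        if len(members):
            vlad[k] = (members - centroids[k]).sum(0)
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))          # power normalization
    norms = np.linalg.norm(vlad, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    vlad = vlad / norms                                   # intra-normalization
    v = vlad.ravel()
    return v / (np.linalg.norm(v) + 1e-12)                # global L2 norm
```

An E-SVM stage would then train one linear SVM per query encoding against a pool of negatives and rank documents by its decision scores.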

CUKUROVA 5th INTERNATIONAL SCIENTIFIC RESEARCHES CONFERENCE, 2020
Since the early 1990s, the TEI (Text Encoding Initiative) has been working to establish standards for encoding transcriptions. TEI uses XML (Extensible Markup Language) as its technology, which employs tags to represent the elements that define the text. The approach is based on a hierarchical, nested and structured representation of elements; the TEI header contains the information describing the whole document. Most digital humanists have found XML easy to learn, human-readable and interoperable. Even so, they still face some constraints: the lack of a user-friendly editing and publishing environment, the limitations of open tools, the demand for an easy-to-learn and human-readable language with a small learning curve, the need for fully marked-up text, the need for fully searchable text, the flexibility to integrate new features (geospatial, OCR, analysis...), interoperability, the heterogeneous form of documents, and the need for a validator and a place to store and manage documents and information. Our proposed solution is a simple, user-friendly environment for editing and publishing digital scholarly editions that responds to all of these constraints. The platform offers a lightweight interface for creating dynamic forms that handle heterogeneous sources of text and data. This paper presents the new approach, based on a comprehensive analysis of current practices and technologies related to digital scholarly editions.
The paper is organized as follows: (1) an introduction to digital scholarly editions and digital publishing, with a literature review defining the use of TEI for standardized, exchangeable content; (2) a methodology based on a discussion of the constraints stated at the Digital Scholarly Editions workshop at the Orient Institute in Beirut, "Establishing a framework for scholarly editing and publishing in the 21st century"; (3) a presentation of our solution based on the latest web technologies and trends, followed by the platform prototype; and (4) a conclusion with recommendations for further development.
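As a concrete illustration of the encoding such a platform has to handle, a minimal valid TEI document looks like the following; the header and body content here are illustrative placeholders, not material from the project.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example transcription</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished draft, for illustration only.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Transcribed from a manuscript witness.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>The transcribed text goes here.</p>
    </body>
  </text>
</TEI>
```

The `teiHeader` carries the document-level description discussed above, while the transcription itself lives under `text/body`.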

Computers & Electrical Engineering, 2019
Energy prediction is of high importance for smart homes and smart cities, since it helps reduce power consumption and provides better energy and cost savings. Many algorithms have been used to predict energy consumption from data collected by Internet of Things (IoT) devices and wireless sensors. In this paper, we propose a system based on a Multilayer Perceptron (MLP) to predict the energy consumption of a building using information (e.g., light energy, day of the week, humidity, temperature) collected from a Wireless Sensor Network (WSN). We compare our system against four other algorithms, namely Linear Regression (LR), Support Vector Machine (SVM), Gradient Boosting Machine (GBM) and Random Forest (RF). We achieve state-of-the-art results on the test set when using weather and temporal data, with a coefficient of determination (R²) of 64%, a Root Mean Square Error (RMSE) of 59.84%, a Mean Absolute Error (MAE) of 27.28% and a Mean Absolute Percentage Error (MAPE) of 27.09%.
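The comparison above is scored with four standard regression metrics. A minimal sketch of how they are computed (NumPy; this is the textbook definition of each score, not the paper's code):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """The four scores reported in the paper: R^2, RMSE, MAE and MAPE (in %)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = (resid ** 2).sum()                      # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": float(np.sqrt((resid ** 2).mean())),
        "MAE": float(np.abs(resid).mean()),
        "MAPE": float(100.0 * np.abs(resid / y_true).mean()),
    }
```

Note that MAPE is undefined when the true consumption is zero, so energy series are usually filtered or offset before applying it.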

ABSTRACT A corpus (plural: corpora) is a collection of pieces of language text in electronic form, selected according to external criteria to represent, as far as possible, a language or language variety as a source of data for linguistic research (statistical analysis, hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory...). This paper highlights the difficulties of building corpora from heterogeneous textual sources and suggests an adequate solution. The textual transmission of a written work may involve various types of sources (manuscripts, printed editions, citations, allusions...). This diversity of media and forms creates a major difficulty in building the corpus, because all of these types must be defined in advance. The operation becomes more complicated when the text is a translation from another language. The main obstacle in studying the history of such texts is their heterogeneous aspect, due to several factors: the wide spectrum of translation dates, the different Vorlagen of the translations and their linguistic origins, and the plurality of compilations and forms. To achieve these objectives, the project proceeds as follows:
• Collecting digital copies of all known textual testimonies.
• Building a digital corpus containing the transcriptions of the identified texts.
• Defining the types and techniques of analysis to be performed on the corpus contents.
• Designing, developing and implementing appropriate tools for textual analysis.
The text may be present in two formats:
• Explicit (direct): the text as the author wrote it.
• Implicit (indirect): allusions to the text or its contents in different types of writings.
These allusions are witnesses to a certain version of the text and may help identify its tradition if they are formally represented and integrated in the corpus. Both explicit and implicit formats are consolidated in the corpus by a formal taxonomy, which allows the corpus, when queried, to return all occurrences of a specific textual passage regardless of its format or type. Given the diversity and heterogeneity of these sources, the main challenge was to conceive a database capable of handling and consolidating these different types of content. Traditionally, relational databases are used in similar projects, and the main issue for decision makers is selecting the appropriate relational database. In our project, we decided to go in the opposite direction: instead of a well-designed relational database schema, we use an unstructured database. This family of databases, known as NoSQL, is non-relational, distributed, open-source and horizontally scalable. Its schema-free approach enables us to anticipate the emergence of new text formats and allows the corpus to host different types of texts. This approach guarantees the following properties:
• Data has a flexible schema. The corpus does not enforce document structure: transcribed texts need not share the same set of fields or structure, and common fields in the corpus's documents may hold different types of data.
• The application shifts from developer-centered to user-centered. In Relational Database Management Systems, the load falls on back-end operations (analysis, design, development, programming...); the end user (the researcher, in our case) is a consumer of the system, and any modification of the data structure requires a back-end intervention that affects the front end on several levels. Our approach lets the user interact with the platform beyond the client side, allowing the scholar, for example, to define a new document type, populate it with appropriate existing fields or even add new fields. Data integrity is guaranteed by a set of validation schemes.
• The platform supports indexes on any field or subfield contained in the documents of the corpus, allowing the scholar to locate word occurrences in all transcribed texts.
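The schema-free property can be made concrete with a small example. Assuming a document store such as MongoDB, an explicit witness and an implicit one can live side by side in the same collection despite having different fields; the field names below are illustrative, not the project's actual schema.

```json
[
  {
    "type": "manuscript",
    "siglum": "A",
    "folio": "12r",
    "transcription": "full transcribed text of the folio"
  },
  {
    "type": "citation",
    "quotedBy": "a later author (illustrative)",
    "transcription": "the quoted passage only",
    "languageOfQuotation": "Greek"
  }
]
```

A query on the shared `transcription` field returns occurrences from both documents, which is exactly the format-independent retrieval described above.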
Using NoSQL databases for building heterogeneous annotated corpora.

Cloud computing is a trending phenomenon in the online education world. It can advance learning and teaching techniques and methods through a creative, collaborative virtual environment. It is a state-of-the-art internet technology that offers users resources and services without spatio-temporal dependency, providing access to large storage space and a wide selection of services (social media, email, web apps, office suites...). Cloud computing has prodigious potential for creating a large, accessible, flexible and interoperable networked environment that grants remarkable added-value features to Electronic Education (E-ducation).
The Massive Open Online Course (MOOC) is a semi-autonomous networked learning environment, distributed on the cloud, in which open resources are produced and shared by participants worldwide. This technology has been among the most notable innovations of the last few years. Different platforms have been developed by several non-profit providers (edX, Coursera, Udacity, etc.) and supported by the world's most prestigious universities and institutions, such as MIT, Stanford and Harvard.
These platforms are used by millions of students around the globe, and this successful innovation has so far helped the development of E-ducation. That success leads us to consider extending the approach to improve research as well: an online platform for research that could, in return, push the development of online research worldwide and increase collaboration between universities and institutions. The platform, defined as Massive Open Online Research (MOOR), consists of a rich, open online environment for research with unlimited user participation.

ABSTRACT Recently, open-source social networking sites have been widely used by institutions as an interactive means to build community skills, provide teachers and students with online interactive learning opportunities, and improve students' academic performance. In Lebanon, some institutions have adopted learning management systems (such as Moodle and Elgg) as cooperative learning platforms in education. Moodle is limited to some basic features (user collaboration, group discussions, file sharing...). In contrast, Elgg does not provide full course management, since it is not Sharable Content Object Reference Model (SCORM) compliant. Elgg could be integrated with the next generation of SCORM to form a complete management system. This paper proposes a solution: developing a Tin Can Client Elgg plug-in that integrates Elgg with the Tin Can API, communicating through a Learning Record Store (LRS).
TIN CAN Client Elgg Plugin: A Proposed Solution to Integrate Social Media with E-Learning Technology.

Human-to-human communication has changed in the last few years. Social networking sites are used by university students and teachers in their daily communication, and these tools provide institutions and organizations with an interactive means to build community skills. In particular, open-source social networks offer many features that allow people to build social relations and supply institutions with enhanced learning capabilities. Some researchers claim that introducing social networks improves students' academic performance and increases institutional revenues. In Lebanon, 80% of higher-education institutions have adopted learning management systems, such as Moodle, to provide both students and teachers with online interactive learning opportunities. Moodle is limited to some basic features (user collaboration, group discussions, file sharing...), which might be complemented by integrating a powerful social networking tool like Elgg. On the other hand, Elgg is not Sharable Content Object Reference Model (SCORM) compliant, so its lack of course management might be made up for by integration with Moodle. This paper discusses the benefits of integrating the next generation of SCORM with Elgg and describes the integration process and the technical procedures involved. It is organized as follows: (1) a literature review to properly define the terms and concepts; (2) an exposition of the requirements for next-generation SCORM; (3) a description of Elgg showing that it does not meet those requirements; (4) a discussion of why Elgg should meet them; (5) a proposed solution integrating the Tin Can API with Elgg; and (6) a conclusion with future work.
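The Tin Can API (xAPI) integration described above records learning activity as JSON "statements" posted to the LRS, each built from an actor, a verb and an object. A minimal example of the kind of statement such an Elgg plug-in might emit; the user, host and activity identifiers here are purely illustrative.

```json
{
  "actor": {
    "mbox": "mailto:student@example.org",
    "name": "Example Student"
  },
  "verb": {
    "id": "http://adlnet.gov/expapi/verbs/commented",
    "display": { "en-US": "commented" }
  },
  "object": {
    "id": "http://elgg.example.org/discussion/42",
    "definition": { "name": { "en-US": "Group discussion post" } }
  }
}
```

Because the LRS stores these statements independently of any course package, social activity in Elgg and SCORM-style course progress can be reported through the same channel.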

Recently, social computing systems such as Social Network Sites (SNSs) have become more powerful in human-to-human interaction. An estimated 80% of university students rely on SNSs in their daily communications: internet surfing, discussions and social activities. In some universities, social networks have been adopted as a communication channel between teachers and students, and some researchers claim that introducing social networks improves students' academic performance and increases institutional revenues. In addition, a number of educational institutions have taken the initiative of using open-source social networking applications; Elgg, in particular, is one of the most widely used social learning platforms. This paper discusses the benefits of using Elgg and assesses its potential to substitute for existing learning management systems in Lebanese higher-education institutions. It is organized as follows: (1) a literature review defining open-source applications and their benefits, and open education and its progress; (2) a review of open-source social networking technologies, their features and their contribution to the educational process; (3) a discussion of the effect of open-source social networking technologies on education systems; (4) an overview of Elgg, pointing out its features and benefits as a social learning platform, followed by a comparison with other social learning platforms (Moodle, Drupal, JomSocial and Chamilo); (5) a case study of implementing Elgg in the Computer Science Department at the University of Balamand, followed by a discussion of the results; and (6) a conclusion recommending Elgg as a substitute for learning management systems.