Papers by Saman Halgamuge

Popular models of AI have two significant deficiencies: 1) they are mostly manually designed using the experience of AI experts; 2) they lack human interpretability, i.e., users cannot make sense of the functionality of neural network architectures either semantically/linguistically or mathematically. This lack of interpretability is a major inhibitor of broader use of 21st-century AI, e.g., Deep Neural Networks (DNNs). The dependence on AI experts to create AI hinders the democratisation of AI and therefore the accessibility of AI. Addressing these deficiencies would provide answers to some of the valid questions about traceability, accountability and the ability to integrate existing knowledge (scientific or linguistically articulated human experience) into the model. This keynote abstract addresses these two deficiencies, which inhibit the democratisation of AI, by developing new methods that can automatically create interpretable neural network models without the help of AI experts. The proposed cross-fertilised innovation will have a profound impact on society through the increased accessibility and trustworthiness of AI, benefiting almost all areas of science, engineering, and the humanities.
SVM for classification is sensitive to noise and multicollinearity between attributes. Correlative component analysis (CCA) was used to eliminate multicollinearity and noise in the original sample data before classification by SVM. To improve SVM performance, a Eugenic Genetic Algorithm (EGA) was used to optimize the parameters of the SVM. Finally, a typical example of two-class natural spearmint essence was employed to verify the effectiveness of the new CCA-EGA-SVM approach. The accuracy is much better than that obtained by SVM alone or by a self-organizing map (SOM) integrated with CCA.
International Journal of Innovative Computing Information and Control, 2010
... classification accuracy of this new method is much better than that obtained by SVM, CCA-SVM, CCA-SOM, and GA-CG-SVM. ... of a great number of chemical components, are so complicated that it is difficult to ascertain their quantitative structure-property relationship (QSPR). ...

This text presents the fundamentals of sensors and signal processing, with emphasis on Mechatronics and applications. Sensors are used to measure signals that present changes in the time domain, e.g. waveforms or digital steps. Different technologies have been developed over the years in order to sense many different physical quantities, such as temperature, flow, force, acceleration, position, sound pressure and intensity of light, among others. Because of their varying nature, all these quantities may be measured in the form of waveforms. However, waveforms, which are analogue signals, are often difficult to interpret in the time domain, and a transformation into the frequency domain is required. The Fourier transform is still the most popular technique used today for converting a time signal into a frequency spectrum. Nevertheless, in signal processing, an analogue-to-digital conversion (ADC) of the time signal is required at some stage, even if the Fourier transform is not used. When proper treatment and filtering approaches are not followed, important features in the signal may be attenuated, and others may be falsely indicated. This text discusses how signals can be measured in order to avoid common pitfalls in signal acquisition and processing. The theoretical background is set out in a comprehensive yet practical way.
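To make the time-to-frequency step above concrete, here is a minimal Python/NumPy sketch (not taken from the book) that builds a synthetic two-tone waveform and converts it into a single-sided amplitude spectrum with the FFT; the 1 kHz sampling rate and tone frequencies are assumptions chosen only for illustration.

```python
import numpy as np

# Assumed example: a signal sampled at 1 kHz containing 50 Hz and 120 Hz tones.
fs = 1000.0                      # sampling rate in Hz (illustrative assumption)
t = np.arange(0, 1.0, 1.0 / fs)  # one second of samples
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Discrete Fourier transform of the (already digitised) time signal.
spectrum = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
amplitude = 2.0 * np.abs(spectrum) / len(x)  # single-sided amplitude spectrum

# The dominant peaks should appear near 50 Hz and 120 Hz.
peaks = freqs[np.argsort(amplitude)[-2:]]
print(sorted(peaks))
```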
CRC Press eBooks, Sep 17, 2019
Journal of The National Science Foundation of Sri Lanka, Nov 10, 2022

BMC Bioinformatics, Apr 28, 2008
Background: In metagenomic studies, a process called binning is necessary to assign contigs that belong to multiple species to their respective phylogenetic groups. Most of the current methods of binning, such as BLAST, k-mer and PhyloPythia, involve assigning sequence fragments by comparing sequence similarity or sequence composition with already-sequenced genomes that are still far from comprehensive. We propose a semi-supervised seeding method for binning that does not depend on knowledge of completed genomes. Instead, it extracts the flanking sequences of highly conserved 16S rRNA from the metagenome and uses them as seeds (labels) to assign other reads based on their compositional similarity. Results: The proposed seeding method is implemented on an unsupervised Growing Self-Organising Map (GSOM), and is called Seeded GSOM (S-GSOM). We compared it with four well-known semi-supervised learning methods in a preliminary test, separating random-length prokaryotic sequence fragments sampled from the NCBI genome database. We identified the flanking sequences of the highly conserved 16S rRNA as suitable seeds that could be used to group the sequence fragments according to their species. S-GSOM showed superior performance compared to the semi-supervised methods tested. Additionally, S-GSOM may also be used to visually identify some species that do not have seeds. The proposed method was then applied to simulated metagenomic datasets using two different confidence threshold settings and compared with PhyloPythia, k-mer and BLAST. At the reference taxonomic level Order, S-GSOM outperformed all k-mer and BLAST results and showed comparable results with PhyloPythia for each of the corresponding confidence settings, where S-GSOM performed better than PhyloPythia in the ≥ 10 reads datasets and comparable in the ≥ 8 kb benchmark tests. Conclusion: In the task of binning using semi-supervised learning methods, results indicate S-GSOM to be the best of the methods tested. Most importantly, the proposed method does not require knowledge from known genomes and uses only very few labels (one per species is sufficient in most cases), which are extracted from the metagenome itself. These advantages make it a very attractive binning method. S-GSOM outperformed the binning methods that depend on already-sequenced genomes, and compares well to the current most advanced binning method, PhyloPythia.
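As a rough, hedged illustration of the compositional-similarity idea behind the seeding step (not the S-GSOM algorithm itself), the following Python sketch computes tetranucleotide frequency profiles and assigns each fragment to the label of the closest 16S-flanking seed; the function names and the use of Euclidean distance are assumptions made for this example.

```python
from itertools import product
import numpy as np

def kmer_profile(seq, k=4):
    """Normalised k-mer frequency vector (tetranucleotide composition by default)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:          # skip k-mers containing ambiguous bases
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts

def assign_to_seeds(fragments, seeds):
    """Assign each fragment to the label of the compositionally closest seed.

    `seeds` maps a species label to its 16S-flanking seed sequence (assumed input).
    """
    seed_profiles = {label: kmer_profile(s) for label, s in seeds.items()}
    labels = []
    for frag in fragments:
        p = kmer_profile(frag)
        labels.append(min(seed_profiles,
                          key=lambda lbl: np.linalg.norm(p - seed_profiles[lbl])))
    return labels
```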

Data Engineering promotes an engineering approach to analyse big "imperfect" data by creating and using new smart algorithms, such as Unsupervised Deep Learning, appropriate electronic hardware platforms, and mechanical/mechatronic/material/chemical-based approaches to interrogate and acquire missing data or information. The talk will focus on this paradigm and provide example algorithms and applications from the health sector, for example, neural engineering research on epilepsy and other brain diseases, and biomechanical and bioinformatics approaches attempting to find solutions to hard problems in cancer genomics, plant metabolomics, virus detection and drug characterization, introducing about 10 ongoing PhD projects. Other applications of data engineering, for example, operational optimization of smart grids and business analytics, will also be briefly presented. At the conclusion of the talk, information about the UGC-University of Melbourne Elite PhD program will also be provided.
arXiv (Cornell University), May 11, 2023
Figure: video with real and fake segments; (b) comparison of our method with classical deepfake detection.
Studies in Computational Intelligence, 2005

ACM SIGEnergy Energy Informatics Review
The rapid uptake of rooftop solar photovoltaic (PV) systems is introducing many challenges in the management of distribution networks, energy markets, and energy storage systems. Many of these problems can be alleviated with accurate short-term solar power forecasts. However, forecasting the power output of distributed rooftop solar PV systems can be challenging, since many complex local factors can affect solar output. A common approach when forecasting such systems is to extract the daily seasonality from the time series using some form of seasonality model, and then forecast only the residuals that remain after seasonality extraction. In this work, we explore in detail the effectiveness of three commonly used seasonality models, and we propose a new one, called the "characteristic profile". We find that when seasonality models are integrated into the forecasting process, significant gains in forecast accuracy may be obtained, particularly for machine learning based approaches.
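The following Python sketch is a hedged illustration of the general seasonality-extraction workflow described above, using a simple mean daily profile as an assumed stand-in for a seasonality model (the paper's "characteristic profile" is not reproduced here); function names are chosen for the example.

```python
import numpy as np

def daily_profile(power, steps_per_day):
    """Mean output at each time-of-day slot (assumed stand-in for a seasonality model)."""
    days = len(power) // steps_per_day
    trimmed = np.asarray(power[:days * steps_per_day], dtype=float)
    return trimmed.reshape(days, steps_per_day).mean(axis=0)

def deseasonalise(power, profile):
    """Residuals left after subtracting the repeating daily profile."""
    reps = int(np.ceil(len(power) / len(profile)))
    seasonal = np.tile(profile, reps)[:len(power)]
    return np.asarray(power, dtype=float) - seasonal

# Usage idea: forecast the residuals with any model of choice, then add the
# daily profile back to recover a power forecast for the target time slots.
```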

Expert Systems with Applications
Distributed, small-scale solar photovoltaic (PV) systems are being installed at a rapidly increasing rate. This can cause major impacts on distribution networks and energy markets. As a result, there is a significant need for improved forecasting of the power generation of these systems at different time resolutions and horizons. However, the performance of forecasting models depends on the resolution and horizon. Forecast combinations (ensembles), which combine the forecasts of multiple models into a single forecast, may be robust in such cases. Therefore, in this paper, we provide comparisons and insights into the performance of five state-of-the-art forecast models and existing forecast combinations at multiple resolutions and horizons. We propose a forecast combination approach based on particle swarm optimization (PSO) that will enable a forecaster to produce accurate forecasts for the task at hand by weighting the forecasts produced by individual models. Furthermore, we compare the performance of the proposed combination approach with existing forecast combination approaches. A comprehensive evaluation is conducted using a real-world residential PV power data set measured at 25 houses located in three locations in the United States. The results across four different resolutions and four different horizons show that the PSO-based forecast combination approach outperforms the use of any individual forecast model and other forecast combination counterparts, with an average Mean Absolute Scaled Error reduction of 3.81% compared to the best performing individual model. Our approach enables a solar forecaster to produce accurate forecasts for their application regardless of the forecast resolution or horizon.
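Below is a hedged, minimal Python sketch of the general idea of weighting individual model forecasts and searching the weights with a basic particle swarm; the MASE scaling, the inertia and acceleration constants, and the weight clipping/normalisation are assumptions made for illustration rather than the paper's implementation.

```python
import numpy as np

def mase(actual, forecast, insample, m=1):
    """Mean Absolute Scaled Error with a naive (lag-m) in-sample scaling term."""
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    return np.mean(np.abs(actual - forecast)) / scale

def combine(forecasts, weights):
    """Weighted combination of individual model forecasts (rows = models)."""
    w = np.clip(weights, 0, None)
    w = w / w.sum() if w.sum() else np.full_like(w, 1.0 / len(w))
    return w @ forecasts

def pso_weights(forecasts, actual, insample, particles=30, iters=100, seed=0):
    """Minimal particle swarm search over combination weights (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = forecasts.shape[0]
    pos, vel = rng.random((particles, n)), np.zeros((particles, n))
    pbest = pos.copy()
    pbest_val = np.array([mase(actual, combine(forecasts, p), insample) for p in pos])
    gbest = pbest[np.argmin(pbest_val)]
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([mase(actual, combine(forecasts, p), insample) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)]
    return gbest
```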

Journal of Biomechanics
Joint angle quantification from inertial measurement units (IMUs) is commonly performed using kinematic modelling, which depends on anatomical sensor placement and/or functional joint calibration; however, accurate three-dimensional joint motion measurement remains challenging to achieve. The aims of this study were firstly to employ deep neural networks to convert IMU data to ankle joint angles that are indistinguishable from those derived from motion capture-based inverse kinematics (IK), the reference standard; and secondly, to validate the robustness of this approach across contrasting walking speeds in healthy individuals. Kinematics data were simultaneously calculated using IMUs and IK from 9 subjects during walking on a treadmill at 0.5 m/s, 1.0 m/s and 1.5 m/s. A generative adversarial network was trained using gait data at two of the walking speeds to predict ankle kinematics from IMU data alone for the third walking speed. There were significant differences between IK and IMU joint angle predictions for ankle eversion and internal rotation during walking at 0.5 m/s and 1.0 m/s (p < 0.001); however, no significant differences in joint angles were observed between the generative adversarial network prediction and IK at any speed or plane of joint motion (p < 0.05). The RMS difference in ankle joint kinematics between the generative adversarial network and IK for walking at 1.0 m/s was 3.8°, 2.1° and 3.5° for dorsiflexion, inversion and axial rotation, respectively. The modelling approach presented for real-time IMU to ankle joint angle conversion, which can be readily expanded to other joints, may provide enhanced IMU capability in applications such as telemedicine, remote monitoring and rehabilitation.
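As a small, hedged illustration of the evaluation metric reported above, the following Python snippet computes the per-plane RMS difference between predicted and IK-derived joint angle trajectories; the column ordering is an assumption for the example.

```python
import numpy as np

def rms_difference(pred_angles, ik_angles):
    """RMS difference (in degrees) between predicted and inverse-kinematics joint
    angles, computed separately for each plane of motion (columns)."""
    pred = np.asarray(pred_angles, dtype=float)
    ref = np.asarray(ik_angles, dtype=float)
    return np.sqrt(np.mean((pred - ref) ** 2, axis=0))

# Usage idea: columns could be [dorsiflexion, inversion, axial rotation]
# sampled over a gait cycle, yielding one RMS value per plane of motion.
```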

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2016
Distributed energy technologies, such as residential energy storage, embedded generation, and microgrids, are likely to play an increasing role in future energy systems. Getting the most value from these distributed assets is often dependent on the ability to optimize their operation in a distributed manner. This distributed optimization, in turn, calls for effective short-term forecasts of the output of small-scale generating assets, and the demand of small-scale aggregations of users. This paper introduces the integration of data-driven forecasting and operational optimization methods into a single model, avoiding the need to explicitly produce forecasts. The method is tested against two empirical energy storage operational optimization problems: the minimization of peak energy drawn by a small aggregation of customers, and the minimization of the energy costs of a collection of households which have rooftop PV systems. The integrated forecasting and operational optimization approach...

IEEE Transactions on Neural Networks and Learning Systems, 2021
Non-parametric dimensionality reduction techniques, such as t-SNE and UMAP, are proficient in providing visualizations for datasets of fixed sizes. However, they cannot incrementally map and insert new data points into an already provided data visualization. We present Self-Organizing Nebulous Growths (SONG), a parametric nonlinear dimensionality reduction technique that supports incremental data visualization, i.e., incremental addition of new data while preserving the structure of the existing visualization. In addition, SONG is capable of handling new data increments, no matter whether they are similar or heterogeneous to the already observed data distribution. We test SONG on a variety of real and simulated datasets. The results show that SONG is superior to Parametric t-SNE, t-SNE and UMAP in incremental data visualization. Specifically, for heterogeneous increments, SONG improves over Parametric t-SNE by 14.98% on the Fashion MNIST dataset and 49.73% on the MNIST dataset regarding the cluster quality measured by the Adjusted Mutual Information scores. On similar or homogeneous increments, the improvements are 8.36% and 42.26% respectively. Furthermore, even when the above datasets are presented all at once, SONG performs better than or comparable to UMAP, and superior to t-SNE. We also demonstrate that the algorithmic foundations of SONG render it more tolerant to noise compared to UMAP and t-SNE, thus providing greater utility for data with high variance, high mixing of clusters, or noise.
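The cluster-quality comparison described above can be illustrated with a hedged Python sketch that scores a 2-D embedding against ground-truth labels using Adjusted Mutual Information; clustering the embedding with k-means (scikit-learn) is an assumption about how cluster assignments are obtained, not the paper's procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score

def embedding_ami(embedding, true_labels, seed=0):
    """Cluster a low-dimensional embedding and score it against ground truth with AMI."""
    k = len(np.unique(true_labels))
    pred = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embedding)
    return adjusted_mutual_info_score(true_labels, pred)

# Usage idea: compare embedding_ami(song_embedding, y) with embedding_ami(umap_embedding, y)
# after an incremental batch of new points has been added to each visualization.
```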

IEEE Transactions on Wireless Communications, 2021
Traffic flows with different quality of service (QoS) requirements are aggregated into different QoS classes to provide differentiated services (Diffserv) and better quality of experience (QoE) for users. The existing aggregation approaches/QoS mapping methods are based on quantitative QoS requirements and static QoS classes. However, QoS requirements are typically qualitative and time-varying at the edge of beyond fifth generation (B5G) networks. Therefore, the artificial intelligence technique of preference logic is applied in this paper to achieve an intelligent method for edge computing, called the preference logic based aggregation model (PLM), which effectively groups flows with qualitative requirements into dynamic classes. First, PLM uses preferences to describe the QoS requirements of flows, and thus can deal with both quantitative and qualitative cases. Next, the potential conflicts in these preferences are eliminated. According to the preferences, traffic flows are finally mapped into dynamic QoS classes by logic reasoning. The experimental results show that PLM presents better performance in terms of QoE satisfaction compared with the existing aggregation methods. Utilizing preference logic to group flows, PLM implements a novel form of edge intelligence to deal with dynamic classes and improves Diffserv for massive B5G traffic with quantitative and qualitative requirements.

Renewable and Sustainable Energy Reviews, 2018
Recent increases in the global demand for IT services have increased the power consumption, total ownership costs and environmental footprint of data centers. Recent efforts to reduce these effects have focused on either their cooling systems or their power systems. In this paper, we have developed an integrated approach to minimize both the total power demand of data centers and their reliance on power imported from the grid. First, the power demand of the data center is reduced by utilizing various air-side economizer-based cooling systems. Since the effectiveness of economizers significantly depends on the local weather conditions, 42 stations in major cities across the world have been considered. A more than 80% reduction in total cooling power consumption is achieved by using the most appropriate air-side economizer at each location. Second, the reliance of data centers on power imported from the grid is minimized by utilizing on-site hybrid renewable power generation and energy storage. The on-site renewable power generation and capacity factors have been calculated for 1 MW wind and solar renewable power plants to identify the location with the highest renewable power generation capability. The optimal size of a hybrid renewable power plant, and associated battery energy storage system, is also determined for each data center using linear programming to minimize total levelized costs. Finally, the optimal location for constructing and operating the most energy efficient, cost-effective and sustainable data center has been identified by calculating its level of independence from the power grid. It is found that the level of grid independence increases as we move away from the equator; for example, more than 50% grid independence can be achieved at the Regina station in Canada.
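As a hedged, much-simplified sketch of the linear-programming sizing step, the following Python example (using scipy.optimize.linprog) chooses wind and solar capacities that cover an assumed average demand at minimum assumed levelized cost; all numbers are placeholders, and the paper's model additionally includes battery storage and site-specific generation profiles.

```python
from scipy.optimize import linprog

# Placeholder assumptions for one site (not values from the paper).
cost = [60.0, 45.0]             # assumed levelized cost per kW of wind, solar capacity
cf_wind, cf_solar = 0.35, 0.20  # assumed average capacity factors at the site
demand_kw = 1000.0              # average data-center power demand to be covered on-site

# Average energy balance as an inequality: cf_wind*W + cf_solar*S >= demand_kw,
# rewritten in the standard "A_ub x <= b_ub" form expected by linprog.
A_ub = [[-cf_wind, -cf_solar]]
b_ub = [-demand_kw]

res = linprog(c=cost, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None)], method="highs")
wind_kw, solar_kw = res.x
print(f"wind: {wind_kw:.0f} kW, solar: {solar_kw:.0f} kW")
```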