The analysis of event sequences with temporal dependencies is important across various domains, including healthcare. This study introduces a novel approach that combines sequential rule mining and survival analysis to uncover significant associations and temporal patterns within event sequences. By integrating these techniques, we address the limitations caused by the loss of temporal information. The methodology extends traditional sequential rule mining by introducing time-dependent confidence functions, which give a fuller picture of the relationships between antecedent and consequent events. The Kaplan–Meier estimator of survival analysis is used to calculate the distribution of the time elapsed between events, which yields the time-dependent confidence functions. These confidence functions show how the probability of a specific event occurrence depends on the temporal context. We demonstrate the method in the healthcare domain: by analyzing ICD-10 codes and laboratory events, we identified relevant sequential rules and their time-dependent confidence functions. This empirical validation underscores the potential of the methodology to uncover clinically significant associations within intricate medical data.
• The study presents a unique methodology that integrates sequential rule mining and survival analysis.
• The methodology extends traditional sequential rule mining by introducing time-dependent confidence functions.
• The application of the method is demonstrated within the healthcare domain.
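The idea of a time-dependent confidence function can be illustrated with a minimal sketch. It assumes, as one plausible reading of the abstract rather than the authors' implementation, that each occurrence of the antecedent contributes a waiting time until the consequent (or until the end of observation, if the consequent never occurred), and that the confidence at time t is one minus the Kaplan–Meier survival estimate. The function names and the toy data are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return the Kaplan-Meier steps as a list of (time, survival) pairs.

    durations: waiting time from the antecedent to the consequent or to censoring.
    observed:  True if the consequent occurred, False if the record was censored.
    """
    events = Counter(t for t, o in zip(durations, observed) if o)
    exits = Counter(durations)  # events and censorings leaving the risk set
    at_risk = len(durations)
    survival = 1.0
    steps = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            steps.append((t, survival))
        at_risk -= exits[t]
    return steps


def confidence_at(steps, t):
    """Assumed time-dependent confidence: estimated P(consequent within t) = 1 - S(t)."""
    survival = 1.0
    for time, s in steps:
        if time > t:
            break
        survival = s
    return 1.0 - survival


# Hypothetical waiting times (in days) for one rule "antecedent -> consequent".
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
steps = kaplan_meier(durations, observed)
print(confidence_at(steps, 4))  # approximately 0.4
```

Censored records stay in the risk set until their own exit time, which is what lets the estimate use incomplete sequences instead of discarding them.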
The Paris Climate Agreement and the 2030 Agenda for Sustainable Development Goals declared by the United Nations set high expectations for the countries of the world to reduce their greenhouse gas (GHG) emissions and to be sustainable. In order to judge the effectiveness of strategies, the evolution of carbon dioxide, methane, and nitrous oxide emissions in countries around the world has been explored based on statistical analysis of time-series data between 1990 and 2018. The empirical distributions of the variables were determined by the Kaplan–Meier method, and improvement-related utility functions have been defined based on the European Green Deal target for 2030, which aims to decrease GHG emissions by at least 55% compared to 1990 levels. This study aims to analyze the energy transition trends at the country and sectoral levels and to support them with literature-based evidence. The transition trajectories of the countries are studied based on the percentile-based time-series analysis of the emission data. We also study the evolution of the sector-wise distributions of the emissions to assess how the development strategies of the countries contributed to climate change mitigation. Furthermore, the countries' location on their transition trajectories is determined based on their individual Kuznets curve. Runs and Leybourne–McCabe statistical tests are also evaluated to study how systematic the changes are. Based on the proposed analysis, the main drivers of climate mitigation and evaluation and their effectiveness were identified and characterized, forming the basis for planning sectoral tasks in the coming years. The case study covers the analysis of two countries, Sweden and Qatar. Sweden has reduced its per capita emissions by almost 40% since 1990, while Qatar has increased its emissions by 20%. Moreover, the defined improvement-related variables can highlight the highest increase and decrease in different aspects. The highest increase was reached by Equatorial Guinea, and the most significant decrease was achieved by Luxembourg. The integration of sustainable development goals, carbon capture, carbon credits and carbon offsets into the databases establishes a better understanding of the sectoral challenges of energy transition and strategy planning, which can be adapted to the proposed method.
The losses associated with changeovers are becoming more significant in manufacturing due to the high variance of products and requirements for just-in-time production. The study is based on the single minute exchange of die (SMED) philosophy, which aims to reduce changeover times. We introduce a method for the analysis of these losses based on models that estimate the product- and operator-dependent changeover times using survival analysis. The root causes of the losses are identified by significance tests of the utilized Cox regression models. The resulting models can be used to design a performance management system that considers the stochastic nature of the work of the operators. An anonymized manufacturing example related to the setup of crimping and wire cutting machines demonstrates the applicability of the method.
The Fourth Industrial Revolution means the digital transformation of production systems. Cyber-physical systems allow for the horizontal and vertical integration of these production systems as well as the exploitation of the benefits via optimization tools. This article reviews the impact of Industry 4.0 solutions concerning optimization tasks and optimization algorithms, in addition to the identification of the new R&D directions driven by new application options. The basic organizing principle of this overview of the literature is to explore the requirements of optimization tasks, which are needed to perform horizontal and vertical integration. This systematic review presents content from 900 articles on Industry 4.0 and optimization as well as 388 articles on Industry 4.0 and scheduling. It is our hope that this work can serve as a starting point for researchers and developers in the field.
This paper presents an algorithm for learning local Weibull models, whose operating regions are represented by fuzzy rules. The applicability of the proposed method is demonstrated by estimating the mortality rate of the COVID-19 pandemic. The reproducible results show that there is a significant difference between the mortality rates of countries due to their economic situation, urbanization, and the state of the health sector. The proposed method is compared with the semi-parametric Cox proportional hazards regression method. The distribution functions obtained by the two methods are close to each other, so the proposed method can estimate the distribution efficiently.
Survival analysis is a widely used method to establish a connection between a time-to-event outcome and a set of variables. The goal of this work is to improve the accuracy of the widely applied parametric survival models. This work highlights that accurate and interpretable survival analysis models can be identified by clustering-based exploration of the operating regions of local survival models. The key idea is that when operating regions of local Weibull distributions are represented by Gaussian mixture models, the parameters of the mixture-of-Weibull model can be identified by a clustering algorithm. The proposed method is utilised in three case studies. The examples cover studying the dropout rate of university students, calculating the remaining useful life of lithium-ion batteries, and determining the chances of survival of prostate cancer patients. The results demonstrate the wide applicability of the method and the benefits of clustering-based identification of local Weibull models.
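To make the mixture-of-Weibull idea concrete, the following sketch evaluates a survival function in which each local Weibull model is weighted by a Gaussian membership of a single covariate x. The one-dimensional covariate, the dictionary layout, and all parameter values are assumptions made for illustration. The sketch only evaluates a given model and does not reproduce the clustering-based identification described in the abstract.

```python
import math


def weibull_survival(t, scale, shape):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_survival(t, x, components):
    """Survival at time t for covariate x under a mixture of local Weibull models.

    Each component is a dict with keys weight, mean, std (the Gaussian that
    describes the operating region) and scale, shape (the local Weibull model).
    """
    memberships = [
        c["weight"] * gaussian_pdf(x, c["mean"], c["std"]) for c in components
    ]
    total = sum(memberships)  # assumed non-zero: x lies within reach of a region
    return sum(
        m / total * weibull_survival(t, c["scale"], c["shape"])
        for m, c in zip(memberships, components)
    )


# Hypothetical two-region model: short lifetimes around x = 0, long ones around x = 5.
components = [
    {"weight": 0.5, "mean": 0.0, "std": 1.0, "scale": 2.0, "shape": 1.5},
    {"weight": 0.5, "mean": 5.0, "std": 1.0, "scale": 10.0, "shape": 1.5},
]
print(mixture_survival(3.0, 0.0, components))
print(mixture_survival(3.0, 5.0, components))
```

Because the memberships are normalised, the result is always a convex combination of the local survival curves, which keeps each local model interpretable within its own operating region.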
A data-driven method is proposed to identify frequent sets of course failures that students should avoid in order to minimize the likelihood of dropping out of their university training. The overall probability distribution of the dropout is determined by survival analysis. This result can only describe the mean dropout rate of the undergraduates. However, the chances of dropout can vary widely depending on which courses are failed, so the traditional survival model should be extended with event analysis. The study paths of students are represented as events that record the required subjects left uncompleted in each semester. Frequent patterns of backlogs are discovered by mining frequent sets of these events. The prediction of dropout is personalised by classifying the success of the transitions between the semesters. Based on the explored frequent item sets and classifiers, association rules are formed that estimate the success of the continuation of the studies in the form of confidence metrics. The results can be used to identify critical study paths and courses. Furthermore, based on the patterns of individual uncompleted subjects, the method can predict the chance of continuation in every semester. The analysis of the critical study paths can be used to design personalised actions that minimize the risk of dropout, or to redesign the curriculum with the aim of reducing the dropout rate. The applicability of the method is demonstrated based on the analysis of the progress of chemical engineering students at the University of Pannonia in Hungary. The method is suitable for the examination of more general problems assuming the occurrence of a set of events whose combinations may trigger a set of critical events.
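The confidence metric attached to such association rules can be sketched as follows. The example assumes that each student record is a set of event labels (failed courses plus an outcome label) and uses the standard definition confidence(A → B) = support(A ∪ B) / support(A). The course names and records are hypothetical and are not taken from the study.

```python
def support(records, itemset):
    """Fraction of records that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(records, antecedent, consequent):
    """Standard rule confidence: support(A and B together) / support(A)."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(records, set(antecedent) | set(consequent)) / base


# Hypothetical per-student event sets: failed courses and the final outcome.
records = [
    frozenset({"math1_failed", "dropout"}),
    frozenset({"math1_failed", "chem1_failed", "dropout"}),
    frozenset({"math1_failed"}),
    frozenset({"chem1_failed"}),
]
print(confidence(records, {"math1_failed"}, {"dropout"}))  # approximately 0.667
```

Here three of the four students failed the first course and two of those three dropped out, so the rule holds for about two thirds of the students it applies to.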
Papers by Róbert Csalódi