Papers by Dinei A Rockenbach

Zenodo (CERN European Organization for Nuclear Research), Apr 25, 2018
NoSQL databases emerged to address limitations of relational databases. The many options within each category, with their distinct characteristics and focus, make this assessment very difficult for decision makers. Most of the time, decisions are taken without the attention and background the related complexities deserve. This article compares the relevant characteristics of each database, abstracting away the marketing that surrounds them. We concluded that although the databases are labeled under a specific category, there is significant disparity in the functionality each of them offers. We also observed that new databases keep emerging even though there are well-established databases in each of the categories studied. Finally, it is very challenging to suggest the best database for each category because each scenario has its own requirements, demanding a careful analysis that our work helps to simplify.
XXVI Brazilian Symposium on Programming Languages

Software: Practice and Experience, 2021
Databases are particularly useful tools for handling data generated by DNA sequencing. This article evaluates the performance of three databases under DNA-sequencing workloads: PostgreSQL and MySQL as relational databases and MongoDB as a NoSQL database. The results show that PostgreSQL outperforms the others.

2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2019
The stream processing paradigm is used in several scientific and enterprise applications to continuously compute results from data items coming from sources such as sensors. Fully exploiting the potential parallelism offered by current heterogeneous multi-cores equipped with one or more GPUs is still a challenge for stream processing applications. In this work, our main goal is to present the parallel programming challenges the programmer faces when exploiting CPU and GPU parallelism at the same time using traditional programming models. We highlight the parallelization methodology in two use cases (the Mandelbrot Streaming benchmark and PARSEC's Dedup application) to demonstrate the issues and benefits of using heterogeneous parallel hardware. The experiments demonstrate how a high-level parallel programming model targeting stream processing, such as the one offered by SPar, can reduce the programming effort while still offering a good level of performance compared with state-of-the-art programming models.
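SPar itself is a C++ annotation-based DSL, and the paper's use cases run on real heterogeneous hardware. As a language-neutral illustration of the underlying farm pattern (replicating a heavy stage over a stream of items while preserving their order), here is a minimal Python sketch; the stage and source names are hypothetical stand-ins, not code from the paper:

```python
from concurrent.futures import ThreadPoolExecutor

def source(n):
    """Emit a stream of data items (stand-in for a sensor or file feed)."""
    yield from range(n)

def heavy_stage(x):
    """Per-item computation (stand-in for Mandelbrot or Dedup work)."""
    return x * x

def farm(stream, workers=4):
    """Replicate the stage across workers; executor.map preserves input order,
    which is what stream-parallel 'farm' patterns typically guarantee."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        yield from pool.map(heavy_stage, stream)

print(list(farm(source(8))))  # squares of 0..7, in order
```

The ordering guarantee of `map` is what distinguishes a farm from an unordered task pool; on a GPU-equipped system the `heavy_stage` body would instead be offloaded to the device.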

International Journal of Forecasting, 2020
We examine the mortality rates due to occupational accidents in the three states of the southern region of Brazil, using the autoregressive integrated moving average (ARIMA), beta autoregressive moving average (βARMA), and Kumaraswamy autoregressive moving average (KARMA) models to fit the data sets, considering monthly observations from 2000 to 2017. We compare them to identify the best predictive model for the southern region of Brazil. We also provide a descriptive analysis, revealing the victims' vulnerability characteristics and comparing them between the states. A clear increase was seen in female participation in the labor market, but the number of deaths from occupational accidents did not increase in the same proportion. Moreover, the state of Paraná stood out for having the highest mortality rate from work-related accidents. The fitted ARIMA and βARMA models using a 6-month time frame presented similar accuracy measurements, while KARMA performed the worst.
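As a minimal illustration of the modeling idea behind this comparison (not the paper's actual βARMA or KARMA estimators, which handle bounded rates), here is an AR(1) fit by ordinary least squares together with a one-step mean absolute error as a simple accuracy measure; all function names are illustrative:

```python
def fit_ar1(series):
    """Estimate x_t = c + phi * x_{t-1} + e_t by ordinary least squares."""
    y = series[1:]          # targets x_t
    x = series[:-1]         # lagged predictors x_{t-1}
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    phi = sum((a - mx) * (b - my) for a, b in zip(x, y)) \
        / sum((a - mx) ** 2 for a in x)
    c = my - phi * mx
    return c, phi

def mae(series, c, phi):
    """Mean absolute one-step-ahead forecast error of the fitted model."""
    errs = [abs(b - (c + phi * a)) for a, b in zip(series[:-1], series[1:])]
    return sum(errs) / len(errs)
```

Comparing candidate models by such out-of-sample error measures is the same selection logic the paper applies to ARIMA, βARMA, and KARMA.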
The new hardware that has emerged in recent years follows the growing trend in the volume of data generated by the use of digital technologies. At the forefront of the search for more performance are Graphics Processing Units (GPUs), massively parallel hardware originally designed for graphics processing but nowadays widely used for many general-purpose tasks. The advent of GPUs has driven applications such as self-driving cars, real-time ray tracing, deep learning in artificial intelligence, and virtual reality (VR). However, this heterogeneous environment of GPUs and parallel Central Processing Units (CPUs) poses an additional challenge for parallel software development.


Concurrency and Computation: Practice and Experience, 2020
Stream processing is a parallel paradigm used in many application domains. With the advance of Graphics Processing Units (GPUs), their usage in stream processing applications has increased as well. The efficient utilization of GPU accelerators in streaming scenarios requires batching input elements into micro-batches, whose computation is offloaded onto the GPU, leveraging data parallelism within the same batch. Since data elements arrive continuously at the input speed, the bigger the micro-batch size, the higher the latency to completely buffer it and to start the processing on the device. Stream processing applications often have strict latency requirements, so the best micro-batch size must be found and adapted dynamically based on the workload conditions as well as the characteristics of the underlying device and network. In this work, we implement latency-aware adaptive micro-batching techniques and algorithms for streaming compression applications targeting GPUs. The evaluation is conducted using the Lempel-Ziv-Storer-Szymanski (LZSS) compression application considering different input workloads. As a general result, we noticed that algorithms with elastic adaptation factors respond better for stable workloads, while algorithms with narrower targets respond better for highly unbalanced workloads.
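The core trade-off described above — bigger batches improve GPU utilization but raise buffering latency — can be sketched as a simple feedback controller over the batch size. This is an illustrative sketch with made-up parameter names, not one of the algorithms evaluated in the paper:

```python
def adapt_batch_size(batch_size, observed_latency_ms, target_latency_ms,
                     factor=1.25, min_size=1, max_size=4096):
    """Shrink the micro-batch when the latency target is violated;
    grow it otherwise to improve GPU utilization.  The multiplicative
    `factor` plays the role of the paper's 'adaptation factor'."""
    if observed_latency_ms > target_latency_ms:
        batch_size = max(min_size, int(batch_size / factor))
    else:
        batch_size = min(max_size, int(batch_size * factor) + 1)
    return batch_size
```

A more elastic `factor` converges faster on stable workloads, while a tighter one tracks bursty workloads more closely — the same qualitative behavior the evaluation reports.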

Proceedings of the 8th Workshop on Parallel Programming Models - Special Edition on IoT and Machine Learning, 2019

Escola Regional de Banco de Dados (ERBD), 2017
Key-value databases emerged to address the limitations of relational databases; with the increasing capacity of RAM, they can offer greater performance and versatility in data storage and processing. The objective is to perform a comparative study of the in-memory key-value databases Redis, Memcached, Voldemort, Aerospike, Hazelcast, and Riak KV. The work thus contributes an analysis of the different databases, with results that qualitatively demonstrate their characteristics and point out their main advantages.
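All six systems share the same core data model; a toy in-memory store with optional per-key expiration (a feature Redis and Memcached expose as TTLs) sketches it. This is a minimal illustration of the model, not any of the compared systems' actual implementations:

```python
import time

class KVStore:
    """Toy in-memory key-value store with optional per-key TTL."""
    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp or None)

    def set(self, key, value, ttl=None):
        expiry = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expiry)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, expiry = item
        if expiry is not None and time.monotonic() > expiry:
            del self._data[key]  # lazy expiration on read
            return default
        return value
```

The real systems differ precisely in what they layer on top of this model: persistence, replication, cluster distribution, and richer value types, which is what the comparative study examines.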