Papers by Maria Clicia Stelling Castro
Cadernos do IME - Série Informática, 2021
Articles submitted for publication must be unpublished, with the exception of abstracts or theses. They are the responsibility of their authors and do not necessarily reflect the opinion of IME. They may be freely reproduced in any other medium, provided the source is cited.
Cadernos Do Ime Serie Informatica, 2005
In this paper we propose a parallel approach to a classic optimization problem, the Travelling Salesman Problem (TSP). The TSP is an NP-complete problem that is hard to solve. One of its main difficulties is the long computation time needed to find a solution. One way to address this is to find an approximate solution using a known heuristic from the literature, for example genetic algorithms or GRASP. The goal of this work is to describe and evaluate a parallel version of the TSP that uses ant colony simulation as its heuristic. Our results were obtained on an SP2 system with 2, 4, and 6 processors, and show good speedups and solutions very close to the optimal value.
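As an illustration of the ant colony heuristic mentioned in this abstract, here is a minimal sequential sketch. It is not the authors' parallel implementation: the function names, the parameters (`alpha`, `beta`, `rho`, the number of ants and iterations), and the pheromone update rule are assumptions chosen for a textbook-style example.

```python
import math
import random

def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def ant_colony_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Hypothetical ant colony sketch; all parameter values are illustrative."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # pheromone on each edge
    best_tour, best_len = None, math.inf
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            start = rng.randrange(n)
            tour = [start]
            unvisited = set(range(n)) - {start}
            while unvisited:
                i = tour[-1]
                cands = sorted(unvisited)
                # Prefer edges with more pheromone and shorter distance.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in cands]
                j = rng.choices(cands, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporate pheromone, then let each ant deposit along its tour.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= (1.0 - rho)
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
    return best_tour, best_len

# Four cities on the corners of a unit square.
points = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(p, q) for q in points] for p in points]
best_tour, best_len = ant_colony_tsp(dist)
```

In a parallel version, one natural choice would be to let each processor run a subset of the ants and exchange pheromone or best-tour information periodically, but the abstract does not say which scheme the paper uses.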
This paper defines the Algo+naEscola environment, which helps teachers use computational logic through games in the classroom. The requirements for this environment were derived from an experiment in teaching Mathematics using algorithms and games. The idea of the field work was to teach angles, in an interdisciplinary way, through algorithms and computer programming. The experiment was carried out during Mathematics classes with Ensino Fundamental II (middle school) students and served as the basis for building the Algo+naEscola environment.

Anais dos Workshops do V Congresso Brasileiro de Informática na Educação (CBIE 2016), 2016
This paper presents an experiment in teaching Mathematics using algorithms and games for Ensino Fundamental II (middle school) students. The idea is to consolidate, in an interdisciplinary way, the contents of angles, algorithms, and computer programming. The research objective is to support the proposal for an environment that helps Mathematics teachers at that level use computational logic through games.
Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001
This work introduces a new technique that enables SDSMs to dynamically and accurately categorize memory sharing patterns in both regular and irregular applications. The categorization is carried out automatically at run-time on a per-page basis, requiring no user or compiler assistance. We evaluate the potential benefits of our technique using execution-driven simulations of 8 applications running on TreadMarks on a network of 8 workstations. Surprisingly, we found that producer-consumer(s) and migratory are the dominant patterns even in irregular applications. Preliminary results suggest that the categorization technique we propose is a promising option to further improve the performance of current adaptive SDSM systems.
2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
The Kronecker product, also called tensor product, is a fundamental matrix algebra operation, used to model complex systems using structured descriptions. This operation needs to be computed efficiently, since it is a critical kernel for iterative algorithms. In this work, we focus on the vector-Kronecker product operation and present an in-depth performance analysis of a previously proposed sequential algorithm and a previously proposed parallel algorithm. Based on this analysis, we propose three optimizations: changing the memory access pattern, reducing load imbalance, and manually vectorizing some portions of the code with Intel SSE4.2 intrinsics. The results show better cache usage and load balance, and therefore better performance, especially for larger matrices.
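To make the vector-Kronecker product concrete, the sketch below compares a naive version that builds the full Kronecker product with one that avoids it by using the identity y = x (A ⊗ B), which equals the flattened matrix AᵀXB when x is reshaped row-major into X. The function names and the row-vector convention are assumptions; the paper's algorithms and data layout may differ.

```python
def kron(A, B):
    """Explicit Kronecker product of two dense matrices given as lists of rows."""
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    return [[A[i][j] * B[k][l] for j in range(n) for l in range(q)]
            for i in range(m) for k in range(p)]

def vec_kron_naive(x, A, B):
    """y = x (A kron B), materializing the full (m*p) x (n*q) matrix."""
    K = kron(A, B)
    return [sum(x[r] * K[r][c] for r in range(len(K))) for c in range(len(K[0]))]

def vec_kron(x, A, B):
    """Same result without forming A kron B: reshape x to X (m x p), compute A^T (X B)."""
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    # T = X B, where X[i][k] = x[i*p + k]
    T = [[sum(x[i * p + k] * B[k][l] for k in range(p)) for l in range(q)]
         for i in range(m)]
    # y[j*q + l] = sum_i A[i][j] * T[i][l]
    return [sum(A[i][j] * T[i][l] for i in range(m))
            for j in range(n) for l in range(q)]

A = [[1, 2], [3, 4]]
B = [[0, 1, 2], [3, 4, 5]]
x = [1, 2, 3, 4]
y = vec_kron(x, A, B)
```

The second version touches only A, B, and a small intermediate matrix, which is why the memory access pattern of the two inner products matters so much for cache behavior.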
Option pricing is one of the main topics in financial markets. Several models exist for this calculation, but Black-Scholes is one of the most widely used today. Its implementation is therefore an interesting case study for optimization. FPGAs (Field Programmable Gate Arrays) are devices commonly used in high performance computing, with promising results in several cases. The main objective of this paper is to implement the Black-Scholes formula on an FPGA and evaluate its efficiency, using HLS (High-Level Synthesis) and running on a Pynq-Z1 board. The results were compared with a Python execution on the ARM processor present on the board. The results show unexpected performance, and the paper discusses some possible explanations for it.
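For reference, the Black-Scholes formula for European options can be written in a few lines of Python. This is a plain software sketch using the standard closed form, not the paper's HLS or Python code; the function and parameter names are assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes(S, K, r, sigma, T):
    """European call and put prices.

    S: spot price, K: strike, r: risk-free rate,
    sigma: volatility, T: time to maturity in years.
    """
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    call = S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    put = K * math.exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)
    return call, put

call, put = black_scholes(S=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)
```

The kernel is dominated by `log`, `exp`, `sqrt`, and the error function, which is the kind of arithmetic an HLS implementation would have to map onto FPGA resources.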

Concurrency and Computation: Practice and Experience, 2018
In a program, a significant number of instructions are usually executed repeatedly with the same inputs. This redundancy allows the reuse of previous computations, potentially reducing the program execution time. The Dynamic Trace Memoization technique (DTM) was proposed to exploit the reuse of a dynamic sequence of redundant instructions for superscalar CPUs. This paper proposes the application of the DTM technique on a GPU architecture. We propose the DTM@GPU model, which adapts the original DTM technique to the NVIDIA GPU architecture by introducing architectural modifications and identifying different trace reuse styles in multithreaded environments. We investigate reuse opportunities in real-world GPU applications and the potential performance gains. We also perform a detailed investigation of the characteristics of the reused traces. This characterization shows the number and size of the reused traces, the influence of the cache size on reuse rates, and the cycles that are saved when all threads in a warp reuse instructions or traces. The results show up to approximately 35.3% reuse, yielding an estimated speedup gain of 10.7%.
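The basic idea of reusing results for repeated inputs can be illustrated in software. The sketch below is a deliberately simplified, instruction-level analogue: it counts how many dynamic instructions repeat an (opcode, operands) combination already held in a table. DTM itself is a hardware mechanism that works on traces of instructions; the table layout, the `OPS` dictionary, and the instruction format here are assumptions.

```python
import operator

# Hypothetical two-operand instruction set for the illustration.
OPS = {"add": operator.add, "mul": operator.mul, "sub": operator.sub}

def run_with_reuse(instructions):
    """Execute (op, a, b) tuples, reusing results for repeated inputs.

    Returns the list of results and the fraction of instructions reused.
    """
    table = {}   # (op, a, b) -> previously computed result
    results = []
    reused = 0
    for op, a, b in instructions:
        key = (op, a, b)
        if key in table:
            reused += 1            # same instruction, same inputs: skip execution
        else:
            table[key] = OPS[op](a, b)
        results.append(table[key])
    return results, reused / len(instructions)

stream = [("add", 1, 2), ("mul", 3, 4), ("add", 1, 2), ("add", 1, 2)]
results, rate = run_with_reuse(stream)
```

A trace-level scheme would key the table on the starting address and live-in values of a whole instruction sequence, so one hit skips many instructions at once.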
Cadernos Do Ime Serie Informatica, 2004
This work introduces a new fault-tolerant, distributed branch-and-bound algorithm, applied to the Steiner Problem in Graphs (SPG), to be run on computational Grids. Many Grids are composed of clusters of processors connected via high-speed links, and the clusters, which are geographically distant, are connected through low-speed links in a hierarchical fashion. The proposed algorithm has the following features: i) it does not employ the usual master-worker paradigm; ii) it considers the hierarchical structure of such Grids in its procedures; and iii) it contains load balancing and fault tolerance mechanisms. Good speedups were obtained, allowing hard instances to be solved in very reasonable times.
Proceedings - Symposium on Computer Architecture and High Performance Computing, 2007
Speedup in distributed executions of Constraint Logic Programming (CLP) applications is directly related to a good constraint partitioning algorithm. In this work we study different mechanisms to distribute constraints to processors, based on straightforward mechanisms such as Round-Robin and Block distribution, and on a more sophisticated automatic distribution method, Grouping-Sink, that takes into account the connectivity of the constraint network graph. This aims at reducing the communication overhead in distributed environments. Our results show that Grouping-Sink is, in general, the best alternative for partitioning constraints, as it produces results as good as or better than Round-Robin or Blocks with a low communication rate.
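The two straightforward distributions named in this abstract can be sketched as follows. The function names and the representation of constraints as a plain list are assumptions; Grouping-Sink is not shown because the abstract does not describe it in enough detail.

```python
def round_robin(constraints, n_procs):
    """Constraint i goes to processor i mod n_procs."""
    return [constraints[p::n_procs] for p in range(n_procs)]

def block(constraints, n_procs):
    """Contiguous chunks; the first (len mod n_procs) processors get one extra."""
    size, extra = divmod(len(constraints), n_procs)
    parts, start = [], 0
    for p in range(n_procs):
        end = start + size + (1 if p < extra else 0)
        parts.append(constraints[start:end])
        start = end
    return parts

constraints = list(range(7))   # stand-ins for seven constraints
rr = round_robin(constraints, 3)
bl = block(constraints, 3)
```

Neither scheme looks at which variables the constraints share, so neighbouring constraints in the network graph can end up on different processors, which is the communication cost a connectivity-aware method tries to reduce.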
Lecture Notes in Computer Science, 2003
In this work we approach the seismic wave propagation problem in two dimensions using the finite element method (FEM). This kind of problem is essential for studying the structure of the earth's interior and for exploring petroleum reservoirs. Using a representative FEM-based application, we propose and evaluate two parallel algorithms, based on inverse mapping and on mesh coloring, respectively.
14th Symposium on Computer Architecture and High Performance Computing, 2002. Proceedings.
Software Distributed Shared Memory (SDSM) systems provide the shared memory abstraction on top of message-passing hardware, simplifying application programming on these architectures. However, some memory references exhibit long latencies due to remotely cached data. In order to hide this latency, many techniques that propagate data speculatively were developed. This requires that the data access behavior of the applications
Proceedings. 15th Symposium on Computer Architecture and High Performance Computing
Scheduling by Edge Reversal (SER) is a fully distributed scheduling mechanism based on the manipulation of acyclic orientations of a graph. This work uses SER to perform constraint partitioning of Constraint Satisfaction Problems (CSP). In order to apply the SER mechanism, the graph representing the constraints must receive an acyclic orientation. Since obtaining an optimal acyclic orientation is an NP-hard problem, this work studies three non-deterministic strategies known in the literature: Alg-Neigh, Alg-Edges, and Alg-Colour. We implemented the three algorithms and the SER scheduling mechanism, applying them to the CSP constraint networks generated from 3 applications. Our results show that SER has a great potential to perform a good partitioning of the constraint graphs.
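The core SER step can be sketched in a few lines: in an acyclic orientation, every sink (a node with no outgoing edges) operates, then reverses its incident edges and becomes a source. The edge representation and function names below are assumptions, and the initial orientation is given by hand rather than produced by Alg-Neigh, Alg-Edges, or Alg-Colour.

```python
def sinks(nodes, edges):
    """Nodes with no outgoing edge; edges are (u, v) pairs meaning u -> v."""
    has_outgoing = {u for u, _ in edges}
    return {n for n in nodes if n not in has_outgoing}

def ser_step(nodes, edges):
    """One SER round: current sinks operate, then reverse all their edges."""
    operating = sinks(nodes, edges)
    # An edge into a sink is flipped so that the sink becomes a source.
    new_edges = {(v, u) if v in operating else (u, v) for u, v in edges}
    return operating, new_edges

nodes = [0, 1, 2]
edges = {(0, 1), (1, 2)}          # path 0 -> 1 -> 2
schedule = []
for _ in range(4):
    operating, edges = ser_step(nodes, edges)
    schedule.append(sorted(operating))
```

Two neighbouring nodes can never be sinks at the same time, so nodes that share an edge never operate in the same round, and the orientation stays acyclic after each reversal.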

Seventh International Conference on Intelligent Systems Design and Applications (ISDA 2007), 2007
In multi-agent systems, blackboard communication provides an easy way for agents to communicate. However, in a distributed architecture, the blackboard is commonly implemented in a single node, the "blackboard server". This approach may work well for a small number of agents. Nevertheless, when the number of agents increases, the blackboard server becomes the bottleneck for system scalability. In this work we propose a novel agent communication system based on distributed-shared mechanisms that allows the implementation of a shared address space on a distributed system, without centralizing the blackboard. Our idea is to distribute the shared data over the nodes and use a message passing subsystem that is totally transparent to the agents. We implemented this idea in a multi-agent system based on social laws, and used a distributed-shared memory system to handle the messaging. We created a conflict simulator, and showed that the distributed-shared mechanisms ease the implementation of a truly distributed blackboard.

In multi-agent systems, agents can communicate directly by exchanging messages or indirectly through a blackboard. Blackboard communication is simpler, but its implementation on a distributed architecture is not efficient, because a single processing element is normally used to maintain the blackboard. In this work we propose using distributed shared memory mechanisms to allow blackboards to be implemented efficiently on distributed systems. Our idea is to distribute the blackboard data across the processing elements and use a subsystem that exchanges messages among them in a way that is fully transparent to the agents. We implemented this proposal in a multi-agent system based on social laws (Tri-coord), using the TreadMarks software DSM system as the subsystem for data distribution and message management. We created a conflict simulator to generate situations in which communication between agents is necessary, and showed that the distributed shared memory subsystem is able to keep the blackboard data coherent.
Colloquium on Implementation of …, 2001
Abstract: This work presents the parallelisation of the AC-5 arc-consistency algorithm for distributed-shared memory platforms. We conducted our experiments using an adapted version of the PCSOS parallel system, over finite domains, running on top of TreadMarks, a ...

Networks in Systems Biology, 2020
Cells in an organism interact with each other and with the environment through a complex set of signals, which trigger responses and activate cellular regulation mechanisms. Models of cellular signaling dynamics obtained by computational mathematics are used to understand the factors and causes of deregulation of internal biological processes, which is relevant knowledge for a disease such as cancer. Gene regulatory networks describe gene interactions and how these relationships control cellular processes such as growth and cell division, which relates this disease to the regulatory network. Despite their simplicity, Boolean networks may accurately model some biological phenomena, such as gene regulatory network dynamics. Indeed, several reports in the literature show that they are accurate enough to build models of regulatory networks of cell lines related to breast cancer. In this chapter, we present a methodology for building cellular regulatory networks based on the Boolean paradigm, which uses entropy as a criterion for selecting the genes that are included in the network. The main objective is to understand dynamical behaviors related to situations that cause breast cancer and tumor lineages, and to suggest experiments to verify the outcome of interventions in networks, in order to support the identification of new therapeutic targets.
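Two ingredients mentioned in this abstract, Boolean network dynamics and an entropy measure over binarized expression data, can be sketched as follows. The two-gene rules, the gene names, and the way entropy would be used to rank genes are illustrative assumptions, not the chapter's actual methodology.

```python
import math

def entropy(bits):
    """Shannon entropy (in bits) of a binarized expression profile."""
    p = sum(bits) / len(bits)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def step(state, rules):
    """Synchronous update: every gene is recomputed from the current state."""
    return {gene: rule(state) for gene, rule in rules.items()}

def find_attractor(state, rules):
    """Iterate until a state repeats and return the cycle that was reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state, rules)
    return seen[seen.index(state):]

# Hypothetical two-gene network: A is repressed by B, B is activated by A.
rules = {
    "A": lambda s: not s["B"],
    "B": lambda s: s["A"],
}
attractor = find_attractor({"A": True, "B": False}, rules)
```

A gene whose binarized profile has entropy near 0 is almost always on or always off across samples, so an entropy criterion would tend to favour genes whose state actually varies.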
This work presents the parallelisation of the AC-5 arc-consistency algorithm for a centralised memory machine (Enterprise). We conducted our experiments using an adapted version of the PCSOS parallel constraint solving system, over finite domains. In the implementation for a centralised memory machine (CMM) we use synchronisation based on atomic read-modify-write primitives supported in hardware. We ran four benchmarks used