Marcela Printista

Papers by Marcela Printista

Oblivious BSP

Lecture Notes in Computer Science, 2000

The BSP model can be extended with a zero-cost synchronization mechanism, which can be used when the number of messages to be received is known. This mechanism, usually known as “oblivious synchronization”, implies that different processors can be in different supersteps at the same time. An unwanted consequence of these software improvements is a loss of prediction accuracy. This paper proposes an extension of the BSP complexity model to deal with oblivious barriers and shows its accuracy.
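To make the difference concrete, here is a minimal sketch under a deliberately simplified cost model. The model is our assumption, not the paper's. Classic BSP charges each superstep the maximum local work plus g·h plus L. In the oblivious variant, a processor starts its next superstep as soon as its own computation is done and every message addressed to it has arrived. A message of h words from processor j is assumed to arrive g·h + L after j finishes computing. The names `g` and `L` follow common BSP notation. The data layout and function names are hypothetical.

```python
def bsp_time(work, comm, g, L):
    """Classic BSP: a global barrier closes every superstep.

    work[s][i]    -- local computation time of processor i in superstep s
    comm[s][i][j] -- words processor i sends to processor j in superstep s
    """
    total = 0.0
    for w, c in zip(work, comm):
        p = len(w)
        # h-relation: the largest number of words any processor sends or receives
        h = max(max(sum(c[i]), sum(c[j][i] for j in range(p))) for i in range(p))
        total += max(w) + g * h + L
    return total


def oblivious_time(work, comm, g, L):
    """Simplified oblivious model (assumption): there is no global barrier, and
    processor i may start superstep s+1 once it has finished computing and all
    of its expected messages from superstep s have arrived."""
    p = len(work[0])
    ready = [0.0] * p  # time at which each processor may start the superstep
    for w, c in zip(work, comm):
        done = [ready[i] + w[i] for i in range(p)]
        nxt = []
        for i in range(p):
            t = done[i]
            for j in range(p):
                if c[j][i] > 0:  # i waits only for processors that write to it
                    t = max(t, done[j] + g * c[j][i] + L)
            nxt.append(t)
        ready = nxt
    return max(ready)


if __name__ == "__main__":
    work = [[1, 5], [5, 1]]          # load imbalance alternates between processors
    no_msgs = [[[0, 0], [0, 0]]] * 2  # no communication at all
    print(bsp_time(work, no_msgs, g=1, L=2))        # 14.0
    print(oblivious_time(work, no_msgs, g=1, L=2))  # 6.0
```

In the example, the global barrier forces each superstep to wait for the slower processor. The oblivious schedule lets the two imbalances overlap. It also hints at why prediction becomes harder: the oblivious time depends on the whole message pattern, not only on per-superstep maxima.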

Indexing and Management of Large Data Volumes (Indexación y administración de grandes volúmenes de datos)

XXIII Workshop de Investigadores en Ciencias de la Computación (WICC 2021, Chilecito, La Rioja), 2021

Information Retrieval in Large Data Volumes (Recuperación de información en grandes volúmenes de datos)

Keywords: massive databases, high-performance computing, information retrieval.

A Tool for Performance Modeling of Parallel Programs

Scientific Programming, 2003

Current analytical performance-prediction models try to characterize the performance behavior of actual machines through a small set of parameters. In practice, substantial deviations are observed. These differences are due to factors such as memory hierarchies or network latency. A natural approach is to associate a different proportionality constant with each basic block and, analogously, to associate different latencies and bandwidths with each "communication block". Unfortunately, this approach implies that the parameters must be evaluated for each algorithm. This is a heavy task involving experiment design, timing, statistics, pattern recognition and multi-parameter fitting algorithms, so software support is required. We present a compiler that takes as source a C program annotated with complexity formulas and produces instrumented code as output. The trace files obtained from executing the resulting code are analyzed with an interactive interpreter,...
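The compiler described above works on annotated C. As a language-neutral illustration, the Python sketch below mimics the idea with a decorator. The name `annotate`, the global `TRACE` list and the record layout are all hypothetical and are not the tool's actual interface. Each call of an annotated block logs the value of its complexity formula next to the measured time. This produces the kind of trace that a later fitting step can consume.

```python
import time
from functools import wraps

# Hypothetical trace: one (block name, formula value, elapsed seconds) record per call.
TRACE = []


def annotate(formula):
    """Hypothetical decorator: attach a complexity formula (a function of the
    call's arguments) to a code block and log one trace record per call."""
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed = time.perf_counter() - start
            TRACE.append((fn.__name__, formula(*args, **kwargs), elapsed))
            return result
        return wrapper
    return deco


@annotate(lambda xs: len(xs))  # the loop below performs len(xs) additions
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        total(list(range(n)))
    for name, ops, secs in TRACE:
        print(name, ops, f"{secs:.6f}")
```

Pairs of (formula value, time) collected this way are what a regression step would use to estimate the block's proportionality constant.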

Load distribution and balancing support in a workstation-based distributed system

ACM SIGOPS Operating Systems Review, 1997

In distributed systems, load distribution and balancing are primary functions aimed at improving system performance and user comfort. Incoming task allocation and remote process execution are the main responsibilities of a well-designed system seeking such performance improvements, and both involve a number of non-trivial tasks. As a basis for further automatic system decisions in a distributed environment, this paper proposes a user-supervised processor allocation scheduler. It shows which information should be collected, and when and how to collect and disseminate it, to support the user's decision. The main problems to be considered when implementing remote process execution are discussed, and a design for an alternative system that attempts to solve these problems is also shown.
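Here is a minimal sketch of the kind of support the abstract describes, built on our own assumptions. Each host periodically publishes a load sample consisting of a load average, its CPU count and a timestamp. Stale samples are discarded. The remaining hosts are ranked by load per CPU so that a user can pick a target. `HostInfo`, `rank_hosts` and the 30-second staleness window are hypothetical choices, not the paper's design.

```python
from dataclasses import dataclass


@dataclass
class HostInfo:
    name: str
    load_avg: float  # e.g. the host's 1-minute load average
    cpus: int
    stamp: float     # time (seconds) at which the sample was taken


def rank_hosts(samples, now, max_age=30.0):
    """Return candidate hosts, least loaded per CPU first, ignoring stale samples.

    The ranking is only advice: in a user-supervised scheme the final
    choice of processor is left to the user."""
    fresh = [h for h in samples if now - h.stamp <= max_age]
    return sorted(fresh, key=lambda h: h.load_avg / h.cpus)


if __name__ == "__main__":
    samples = [
        HostInfo("ws1", load_avg=3.0, cpus=4, stamp=95.0),
        HostInfo("ws2", load_avg=0.5, cpus=1, stamp=98.0),
        HostInfo("ws3", load_avg=0.1, cpus=2, stamp=10.0),  # too old, discarded
    ]
    print([h.name for h in rank_hosts(samples, now=100.0)])  # ['ws2', 'ws1']
```

Timestamps matter because load information goes stale quickly. A lightly loaded host reported long ago can be a worse choice than a moderately loaded host reported a moment ago.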

A low communication overhead parallel implementation of the back-propagation algorithm

The back-propagation algorithm is one of the most widely used training algorithms for neural networks. Training a multilayer perceptron with this algorithm can take a very long time, which makes neural networks difficult to adopt. One approach to this problem is to parallelize the training algorithm. Many different approaches exist, but most of them are tailored to specialized hardware. The idea of using a network of workstations as a general-purpose parallel computer is widely accepted. However, communication overhead constrains the design of parallel algorithms. In this work, we propose a parallel implementation of the back-propagation algorithm that is suitable for a network of workstations. The objective is twofold. The first goal is to improve the performance of the training phase with low communication overhead. The second goal is to provide a dynamic assignment of tasks to processors in order to make the best use of the computational resources.
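One common way to obtain dynamic assignment is self-scheduling. Training patterns are split into chunks, and whichever worker becomes idle first takes the next chunk. The sketch below simulates that policy with a priority queue of worker finish times. The chunk granularity, the relative `speeds` and the function name are assumptions for illustration, not the paper's scheme.

```python
import heapq


def dynamic_assign(num_chunks, speeds, chunk_cost=1.0):
    """Simulate self-scheduling of pattern chunks over heterogeneous workers.

    speeds[w] is worker w's relative speed; a chunk takes chunk_cost / speed.
    Returns (chunks processed per worker, time the last chunk finishes)."""
    heap = [(0.0, w) for w in range(len(speeds))]  # (time worker becomes idle, id)
    heapq.heapify(heap)
    counts = [0] * len(speeds)
    finish = 0.0
    for _ in range(num_chunks):
        t, w = heapq.heappop(heap)  # earliest idle worker grabs the next chunk
        t += chunk_cost / speeds[w]
        counts[w] += 1
        finish = max(finish, t)
        heapq.heappush(heap, (t, w))
    return counts, finish


if __name__ == "__main__":
    print(dynamic_assign(6, [2.0, 1.0]))  # ([4, 2], 2.0)
```

Because the faster worker comes back for work more often, it processes twice as many chunks. The only coordination message needed is a request for the next chunk.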

Predicting the performance of parallel programs

Parallel Computing, 2004

This work presents a new approach to the relation between theoretical complexity models and performance analysis and tuning. The analysis of an algorithm produces a complexity function that approximates the asymptotic number of operations performed by the algorithm. The time spent on these operations depends on the software-hardware platform being used. Such platforms are usually described, from the performance point of view, through a number of parameters, which are evaluated by a benchmarking program. For an available platform, the algorithmic constants associated with the complexity formula can be computed using multidimensional linear regression. The problem of predicting performance when the platform is not available still remains. We introduce the concept of Universal Instruction Class and derive from it a set of equations relating the values of the algorithmic constants to the platform parameters. Due to the hierarchical design of current memory systems, the performance behavior of most algorithms varies across a small number of large regions corresponding to small, medium and large inputs. The constants involved in the complexity formula usually take different values in these regions. Given a complexity formula for the memory resources, it is possible to find a partition of the input-size space and the corresponding values of the algorithmic constants. In this way, although the complexity formula stays the same, the family of constants adapts the formula to the different stationary patterns of memory use.
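For a platform that is available, the fitting step can be illustrated with ordinary least squares. The sketch below uses pure Python with no external libraries. It fits T(n) = a·n + b separately in each input-size region, mirroring the idea that one complexity formula carries a different family of constants in each memory regime. The basis functions, the region boundary and the synthetic timings are hypothetical. The Universal Instruction Class derivation for platforms that are not available is not reproduced here.

```python
def least_squares(rows, y):
    """Solve min ||X c - y||^2 through the normal equations (X^T X) c = X^T y."""
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, k))) / A[i][i]
    return x


def fit_by_region(ns, times, basis, bounds):
    """Fit one set of algorithmic constants per input-size region.

    bounds=[b1, b2, ...] splits sizes into [0, b1), [b1, b2), ..., [bk, inf)."""
    edges = [0.0, *bounds, float("inf")]
    constants = []
    for lo, hi in zip(edges, edges[1:]):
        pts = [(n, t) for n, t in zip(ns, times) if lo <= n < hi]
        rows = [[f(n) for f in basis] for n, _ in pts]
        constants.append(least_squares(rows, [t for _, t in pts]))
    return constants


if __name__ == "__main__":
    basis = [lambda n: n, lambda n: 1.0]  # T(n) = a*n + b
    ns = [100, 200, 400, 800, 2000, 4000, 8000]
    # synthetic timings: a=2, b=5 below n=1000; a=3, b=1 above it
    times = [2 * n + 5 if n < 1000 else 3 * n + 1 for n in ns]
    print(fit_by_region(ns, times, basis, bounds=[1000]))
```

The normal equations are adequate for a couple of well-scaled basis functions. With many terms or widely different magnitudes, a QR-based solver would be the safer choice.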

Efficient Retrieval of Multimedia Information (Recuperación Eficiente de Información Multimedia). Luis Britos, María E. Di Gennaro, Jacqueline Fernandez, Veronica Gil-Costa, Fernando Kasian, Veronica Luduena, Marcela Printista, Nora Reyes, Patricia Roggero. LIDIC, Departamento de Informática, Fac. de Ciencias Físico Matemáticas y Naturales, Unive...

In general, it is as difficult for users trying to retrieve multimedia information to state their interests clearly through a well-defined query as it is for system designers to decide which features of multimedia objects may be relevant. The way multimedia data are represented and stored, and the cost of transferring them between levels of the memory hierarchy or over a network, directly affect the system's responses. Given a query, the ...

Multi-BSP vs. BSP: A Case of Study for Dell AMD Multicores

2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 2018

Computer models have been used as a bridge between parallel algorithms and hardware architectures. Bulk-Synchronous Parallel (BSP) is a well-known computing model originally devised for distributed algorithms running on clusters of single-core processors. The Multi-BSP model, which extends the classic BSP model, was recently proposed for multi-core processors. However, this model, implemented through the MulticoreBSP-for-C library, presents some restrictions, such as explicit synchronization between cores. These restrictions raise challenges because hardware characteristics must be taken into account to model parallel algorithms properly. We therefore explore the suitability of these models for Dell multi-core architectures. The objectives of this contribution are twofold. First, we model two different Dell multi-core architectures. Second, we show that a simple model with few parameters can be easily adapted to each Dell platform, in contrast to complex models that tend to rely on tricky hardware parameters.
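The trade-off between a flat model and a hierarchical one can be sketched with a toy cost estimate. For illustration only, the sketch assumes two levels: cores sharing a socket, and the sockets themselves. Each level has its own bandwidth parameter g and synchronization cost L. A superstep pays the outer level's costs only when it actually exchanges data across sockets. This is a simplification of our own. It is neither Valiant's full Multi-BSP cost recurrence nor the MulticoreBSP-for-C API, and the numeric parameters are invented.

```python
def flat_bsp_cost(w, h, g, L):
    """Classic BSP superstep: computation + g*h + L, with one (g, L) pair per machine."""
    return w + g * h + L


def two_level_cost(w, h_local, h_remote, levels):
    """Toy two-level estimate (assumption): words exchanged inside a socket use
    (g_core, L_core); words crossing sockets additionally pay (g_socket, L_socket)."""
    (g_core, L_core), (g_socket, L_socket) = levels
    cost = w + g_core * h_local + L_core
    if h_remote > 0:
        cost += g_socket * h_remote + L_socket
    return cost


if __name__ == "__main__":
    levels = [(1, 5), (4, 50)]  # hypothetical (g, L) for core and socket levels
    print(two_level_cost(100, 10, 0, levels))   # 115: traffic stays inside a socket
    print(two_level_cost(100, 10, 10, levels))  # 205: some traffic crosses sockets
    print(flat_bsp_cost(100, 20, 4, 50))        # 230: flat model, socket-level g and L
```

A flat model calibrated with the slower inter-socket parameters overestimates supersteps whose traffic stays local, while each extra level adds parameters to measure. The paper evaluates which level of detail pays off on the Dell platforms.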

SCCG 2012 Organizing Committee

Workshop Co-chairs: Sergio Nesmachnow, Universidad de la República, Uruguay; Bernabé Dorronsoro, University of Luxembourg, Luxembourg ... Publicity Chair: Santiago Iturriaga, Universidad de la República, Uruguay ... Program Committee Members: Enrique Alba, Universidad de Málaga, Spain; Jhon Edgar Amaya, Universidad Nacional Experimental del Tachira, Venezuela; Francisco Brasileiro, Universidade Federal de Campina Grande, Brazil; Pascal Bouvry, University of Luxembourg, Luxembourg; Juan Carlos Burguillo, Universidad de Vigo, Spain; Héctor ...

Data Retrieval for Massive Data Processing (Recuperación de datos para el procesamiento de datos masivos)

A parallel approach for backpropagation learning of neural networks

Learning algorithms for neural networks involve CPU-intensive processing, so great effort has gone into developing parallel implementations that reduce learning time. This work briefly describes parallel schemes for a backpropagation algorithm and proposes a distributed system architecture for parallel training with a pattern-partition scheme. Under this approach, weight changes are computed concurrently, exchanged between system components and adjusted accordingly until the whole parallel learning process is completed. Some comparative results are also shown.
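The pattern-partition scheme relies on the fact that, in batch training, the weight change is a sum over training patterns. Each component can therefore compute the sum over its own subset, and the partial results only need to be added. Here is a minimal sketch. It assumes a single sigmoid unit with squared error, which is the output-layer step of backpropagation, rather than a full multilayer network. The function names are hypothetical. The shards are processed one after another here, whereas the real system would process them concurrently.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def partial_gradient(w, patterns):
    """Gradient of 0.5 * sum (y - t)^2 over one subset of the training patterns."""
    g = [0.0] * len(w)
    for x, t in patterns:
        y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        delta = (y - t) * y * (1.0 - y)  # error signal of the sigmoid unit
        for j, xj in enumerate(x):
            g[j] += delta * xj
    return g


def parallel_epoch(w, patterns, workers, lr):
    """One batch epoch with patterns partitioned among `workers` components."""
    shards = [patterns[k::workers] for k in range(workers)]
    partials = [partial_gradient(w, s) for s in shards]  # concurrent in practice
    total = [sum(col) for col in zip(*partials)]          # exchange and add
    return [wi - lr * gi for wi, gi in zip(w, total)]


if __name__ == "__main__":
    w = [0.1, -0.2, 0.05]
    P = [([1.0, 0.5, -1.0], 1.0), ([0.2, -0.3, 1.0], 0.0), ([-1.0, 1.0, 0.5], 1.0)]
    print(parallel_epoch(w, P, workers=3, lr=0.5))
    print(parallel_epoch(w, P, workers=1, lr=0.5))  # same weights, up to rounding
```

Only one gradient vector per component is exchanged per epoch, regardless of how many patterns each component holds. This is what keeps communication low.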

Retrieval and Processing in Large Data Volumes (Recuperación y procesamiento en grandes volúmenes de datos)

Modeling and Simulation Tools for Large-Scale Systems (Herramientas de modelado y simulación para sistemas de gran escala)

The research line presented in this work draws on a project that closely links two topics: Modeling and Simulation. Both have attracted great interest in recent years because of advances in technology and the excessive cost of tests and runs on real platforms. In particular, we focus on modeling large-scale applications for parallel platforms that cannot be tested on real systems and hardware because of their cost. To this end, different tools can be used, such as Petri Nets [Petri62], DEVS [Zeig76], Operational Analysis [Den78] and UML. Another advantage of the modeling and simulation techniques used in this project is that they yield estimates of the metrics used in the applications. These estimates help determine the cost-benefit of implementing and deploying the application on real hardware.
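Operational analysis [Den78] derives performance metrics from directly measurable quantities. The small worked example below uses invented numbers. It applies the utilization law U = X·S and Little's law N = X·R to counts observed over a measurement interval.

```python
def operational_metrics(T, C, B, R):
    """T: observation period (s); C: completed jobs; B: busy time of the device (s);
    R: mean response time (s). Returns throughput, service time, utilization, population."""
    X = C / T  # throughput (jobs/s)
    S = B / C  # mean service time per job (s)
    U = B / T  # utilization; equals X * S (utilization law)
    N = X * R  # mean number of jobs in the system (Little's law)
    return X, S, U, N


if __name__ == "__main__":
    print(operational_metrics(T=60.0, C=120, B=30.0, R=1.5))  # (2.0, 0.25, 0.5, 3.0)
```

Every input is something a monitor can count on a running or simulated system. This is why such laws are handy for estimating metrics before deploying on real hardware.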

A parallel approach for backpropagation learning of neural networks

Journal of Computer Science Technology, 1999

Extending device management in Minix

ACM SIGOPS Operating Systems Review, Apr 1, 1993

Minix is a Unix-clone operating system designed by Tanenbaum ([2],[3]) to let beginners gain practical training in operating systems. In this context, the present paper describes the work done by a group of undergraduates implementing extensions to device management. Problems in the original code, detected during the analysis and development stages, are also reported.

BSP Modeling of Parallel Inverted Lists (Modelización BSP de listas invertidas paralelas)

Analyzing the Buckets Inverted Files

Search engine is a popular term for an information retrieval (IR) system. While researchers and developers take a broad view of IR systems, consumers think of them more in terms of what they want the system to do, namely search the Web, an intranet or a database. This paper is aimed at the study of some alternatives to...
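For readers unfamiliar with the structure, here is a minimal inverted file. Each term maps to a sorted postings list of document identifiers, and a conjunctive query intersects those lists. This is the plain baseline only. The bucket organization the paper analyzes is not described in the available abstract and is not reconstructed here. Whitespace tokenization and lowercasing are simplifying assumptions.

```python
from collections import defaultdict


def build_index(docs):
    """docs: list of strings; returns term -> sorted list of document ids."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(a, b):
    """Merge-style intersection of two sorted postings lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index, terms):
    """Documents containing every term; shortest postings lists are merged first."""
    lists = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result


if __name__ == "__main__":
    docs = ["parallel search engine", "inverted file search", "parallel inverted lists"]
    idx = build_index(docs)
    print(and_query(idx, ["parallel", "inverted"]))  # [2]
    print(and_query(idx, ["search"]))                # [0, 1]
```

Starting with the shortest list keeps intermediate results small. This is the usual reason postings are kept sorted.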

Groups in bulk synchronous parallel computing

Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing, 1999

An extension to the Bulk Synchronous Parallel (BSP) model that allows the use of asynchronous BSP groups of processors is presented. In this model, called Nested BSP, processor groups can be divided. Processors in a group synchronize through group-dependent collective operations that generalize the concept of barrier synchronization. A classification of problems and algorithms according to their parallel input-output...
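Python's `threading.Barrier` is enough to illustrate group-dependent synchronization. Processors are split into groups, and each group has its own barrier. A collective operation (here a sum) is completed within the group without any global barrier. The group layout, the names `split` and `group_reduce`, and the use of threads as stand-ins for processors are assumptions for illustration, not the Nested BSP interface.

```python
import threading


def split(p, group_size):
    """Divide processors 0..p-1 into consecutive groups."""
    return [list(range(s, min(s + group_size, p))) for s in range(0, p, group_size)]


def group_reduce(p, group_size, values):
    """Each processor contributes values[rank]; each group sums its own values."""
    groups = split(p, group_size)
    group_of = {r: gid for gid, g in enumerate(groups) for r in g}
    barriers = [threading.Barrier(len(g)) for g in groups]  # one barrier per group
    slots = [0] * p
    results = [None] * p

    def worker(rank):
        gid = group_of[rank]
        slots[rank] = values[rank]  # local computation phase
        barriers[gid].wait()        # only members of this group synchronize
        results[rank] = sum(slots[r] for r in groups[gid])  # group collective

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(group_reduce(5, 2, [1, 2, 3, 4, 5]))  # [3, 3, 7, 7, 5]
```

A group whose members finish early can proceed without waiting for the other groups. This is the asynchrony between groups that a single global barrier would remove.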

Predicting the time of oblivious programs

Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing, 2001

The BSP model can be extended with a zero-cost synchronization mechanism that can be used when the number of messages to be received is known. This mechanism, usually known as “oblivious synchronization”, implies that different processors can be in different supersteps at the same time. An unwanted consequence of these software improvements is a loss of accuracy in prediction.