Papers by Armin Größlinger
Software for Exascale Computing - SPPEXA 2016-2019
Present-day stencil codes are implemented in general-purpose programming languages, such as Fortran, C, Java, or Python, or derivatives thereof, and harnesses for parallelism, such as OpenMP, OpenCL, or MPI. Project ExaStencils pursued a domain-specific approach with a language, called ExaSlang, that is stratified into four layers of abstraction, the most abstract being the formulation in continuous mathematics and the most concrete a full, automatically generated implementation. At every layer, the corresponding language expresses not only computational directives but also domain knowledge of the problem and platform to be leveraged for optimization. We describe the approach and the software technology.
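As an illustration of the kind of kernel such stencil codes compute, here is a minimal 2D Jacobi sweep in Python. This is a hand-written sketch, not ExaSlang or generated ExaStencils code; grid size, boundary handling, and the constant right-hand side are assumptions made for brevity.

```python
def jacobi_step(u, f, h):
    # One Jacobi sweep for the 2D Poisson equation -Laplace(u) = f
    # on a square grid with mesh width h (illustrative sketch only).
    n = len(u)
    v = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # 5-point stencil: average the four neighbours plus source term
            v[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                              + u[i][j - 1] + u[i][j + 1]
                              + h * h * f[i][j])
    return v

n = 9
h = 1.0 / (n - 1)
u0 = [[0.0] * n for _ in range(n)]   # zero initial guess, zero boundary
f = [[1.0] * n for _ in range(n)]    # constant right-hand side
u1 = jacobi_step(u0, f, h)
```

In a generated implementation, this loop nest is exactly the part that the domain-specific layers would parallelize and optimize automatically.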
PolyJIT: Polyhedral Optimization Just in Time
International Journal of Parallel Programming

ACM Transactions on Architecture and Code Optimization
Iterative program optimization is known to adapt more easily to particular programs and target hardware than model-based approaches. One approach is to generate random program transformations and to evaluate their profitability by applying them and benchmarking the transformed program on the target hardware. This procedure's large computational effort impairs its practicality tremendously, though. To address this limitation, we pursue the guidance of a genetic algorithm for program optimization via feedback from surrogate performance models. We train the models on program transformations that were evaluated during previous iterative optimizations. Our representation of programs and program transformations refers to the polyhedron model. The representation is particularly meaningful for the optimization of loop programs that profit from a coarse-grained parallelization for execution on modern multicore CPUs. Our evaluation reveals that surrogate performance models can be used to…
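A toy sketch of the surrogate idea: past measurements train a cheap predictor that pre-screens random candidates, so only the most promising candidate per batch is actually benchmarked. All names, the candidate encoding (a number in [0, 1]), the nearest-neighbour surrogate, and the stand-in "benchmark" are illustrative assumptions, not the paper's actual models or setup.

```python
import random

def true_runtime(x):
    # stand-in for benchmarking the transformed program on hardware;
    # pretend the optimum lies at 0.7
    return (x - 0.7) ** 2

history = []                        # (candidate, measured runtime) pairs

def surrogate(x):
    # 1-nearest-neighbour prediction from past measurements
    if not history:
        return 0.0
    return min(history, key=lambda p: abs(p[0] - x))[1]

random.seed(0)
best = None
for _ in range(50):
    batch = [random.random() for _ in range(10)]
    cand = min(batch, key=surrogate)   # surrogate picks one candidate
    t = true_runtime(cand)             # only this one is "benchmarked"
    history.append((cand, t))
    if best is None or t < best[1]:
        best = (cand, t)
```

The saving is that 500 candidates are screened while only 50 benchmarks are run; the paper's setting replaces the toy predictor with models trained on previous polyhedral optimizations.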

ACM Transactions on Architecture and Code Optimization
The polyhedron model is a powerful model for systematically identifying and applying loop transformations that improve data locality (e.g., via tiling) and enable parallelization. In the polyhedron model, a loop transformation is, essentially, represented as an affine function. Well-established algorithms for the discovery of promising transformations are based on performance models. These algorithms have the drawback of not being easily adaptable to the characteristics of a specific program or target hardware. An iterative search for promising loop transformations is more easily adaptable and can help to learn better models. We present an iterative optimization method in the polyhedron model that targets tiling and parallelization. The method enables either a sampling of the search space of legal loop transformations at random or a more directed search via a genetic algorithm. For the latter, we propose a set of novel, tailored reproduction operators. We evaluate our approach against existing iterative and model-driven optimization strategies. We compare the convergence rate of our genetic algorithm to that of random exploration. Our approach to iterative optimization outperforms existing optimization techniques in that it finds loop transformations that yield significantly higher performance. If well configured, random exploration turns out to be very effective and reduces the need for a genetic algorithm. CCS Concepts: • Computing methodologies → Genetic programming; • Software and its engineering → Massively parallel systems;
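Tiling, one of the transformations searched for here, can be illustrated with a hand-written sketch (not tool output). The point is that the tiled loop nest enumerates exactly the same iteration points as the original, just in a cache-friendlier order; the sizes N and B are assumptions, with B dividing N only to keep the example short.

```python
# Illustrative tiling of a 2D iteration space.
N, B = 16, 4

original = [(i, j) for i in range(N) for j in range(N)]

tiled = [(i, j)
         for ti in range(0, N, B)       # loops over tiles
         for tj in range(0, N, B)
         for i in range(ti, ti + B)     # loops over points within a tile
         for j in range(tj, tj + B)]

# same iteration points, different order; legality in the polyhedron
# model additionally requires that all dependences are preserved
assert sorted(tiled) == sorted(original)
```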
Dagstuhl Seminars, 2007
The model-based transformation of loop programs is a way of detecting fine-grained parallelism in sequential programs. One of the challenges is to agglomerate the parallelism to a coarser grain in order to map the operations of the program to the available cores in a multicore architecture. We consider shared-memory multicores as the target architecture for space-time mapped loop programs and make some observations concerning code generation, load balancing, and cache effects.
On Computing Solutions of Linear Diophantine Equations with One Non-linear Parameter
2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, 2008
We present an algorithm for solving Diophantine equations which are linear in the variables but non-linear in one parameter. We are looking for the pointwise solutions, i.e., the solutions for the unknowns in dependence of the value of the parameter. Solving Diophantine equations is central to computing the data dependences of certain codes (loops with certain array accesses) which often occur in scientific computing. Our algorithm enables the computation of data dependences in more general situations than is possible with current algorithms.
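The classical, non-parametric base case (a linear Diophantine equation with constant coefficients) can be solved with the extended Euclidean algorithm. The sketch below shows only that base case; the paper's contribution, handling a non-linear parameter, is not reproduced here.

```python
def ext_gcd(a, b):
    # extended Euclid: returns (g, x, y) with a*x + b*y == g == gcd(a, b)
    if b == 0:
        return (a, 1, 0)
    g, x, y = ext_gcd(b, a % b)
    return (g, y, x - (a // b) * y)

def solve_lde(a, b, c):
    # One integer solution of a*x + b*y == c, or None if none exists.
    # A solution exists iff gcd(a, b) divides c.
    g, x, y = ext_gcd(a, b)
    if c % g != 0:
        return None
    k = c // g
    return (x * k, y * k)

sol = solve_lde(6, 10, 8)     # gcd(6, 10) = 2 divides 8, so solvable
```

In dependence analysis, such equations arise from equating array subscripts of two accesses; the existence of an integer solution inside the loop bounds signals a dependence.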
Lecture Notes in Computer Science, 2009
Unlike desktop and server CPUs, special-purpose processors found in embedded systems and on graphics cards often do not have a cache memory which is managed automatically by hardware logic. Instead, they offer a so-called scratchpad memory which is fast like a cache but, unlike a cache, has to be managed explicitly, i.e., the burden of its efficient use is imposed on the software. We present a method for computing precisely which memory cells are reused due to temporal locality in a certain class of codes, namely codes which can be modelled in the well-known polyhedron model. We present some examples demonstrating the effectiveness of our method for scientific codes.
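A toy illustration of temporal reuse: for a 1D 3-point stencil loop touching A[i-1], A[i], A[i+1], the cells touched by both iteration i and iteration i+1 are candidates for keeping in scratchpad. This hand-enumerated sketch is an assumption-laden stand-in; the paper derives such reuse sets symbolically in the polyhedron model rather than by enumeration.

```python
def accessed(i, n):
    # cells of array A touched by iteration i of the 3-point stencil,
    # clipped to the array bounds 0..n-1
    return {j for j in (i - 1, i, i + 1) if 0 <= j < n}

n = 8
reused = [sorted(accessed(i, n) & accessed(i + 1, n))
          for i in range(n - 1)]
# cells in reused[i] could stay in scratchpad between iterations i and i+1
```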
A Comparison of GPGPU Computing Frameworks on Embedded Systems
IFAC-PapersOnLine, 2015
Parallel Processing Letters, 2014
Type-safe feature-oriented product lines / Sven Apel, Christian Kästner, Armin Größlinger, and Christian Lengauer. - Passau, 2009. - 49 pp. - (Universität / Fakultät für Informatik und Mathematik: Technical report MIP ; 0909)
Entry for the university bibliography.

With the rise of manycore processors, parallelism is becoming a mainstream necessity. Unfortunately, parallel programming is inherently more difficult than sequential programming; therefore, techniques for automatic parallelisation will become indispensable. We aim at extending the well-known polyhedron model, which promises this automation, beyond some of its current restrictions. Up to now, loop bounds and array subscripts in the modelled codes must be expressions linear in both the variables and the parameters. We lift this restriction and allow certain polynomial expressions instead of linear ones. With our extensions, we are able to handle more programs in all phases of the parallelisation process (dependence analysis, transformation of the program model, code generation). We extend Banerjee's classical dependence analysis to handle one non-linear parameter p, i.e., we are able to determine precisely the solutions of the system of conflict equalities for input programs with…
Automatic, model-based program transformation relies on the ability to generate code from a model description of the program. In the context of automatic parallelisation, cache optimisation and similar transformations, the task is to generate loop nests which enumerate the iteration points within given domains. Several approaches to code generation from polyhedral descriptions of iteration sets have been proposed and are in use. We present an approach to generating loop nests for index sets with arbitrary polynomials as bounds, using cylindrical algebraic decomposition. The generated loops are efficient in the sense that no integer superset is enumerated. We also state where this technique is useful, i.e., where non-linearities in the loop bounds arise in loop program transformations, and show some examples for our approach with polyhedral and non-polyhedral input.
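For illustration, here is a loop nest for an index set with a polynomial bound, together with a check that it enumerates exactly the integer points of the set and no superset. The bounds here are written by hand for a fixed toy set; the paper derives such bounds automatically via cylindrical algebraic decomposition.

```python
# Index set {(i, j) : 0 <= i <= n, 0 <= j <= i*i} -- the upper bound
# on j is a polynomial in i, which affine-only code generators
# cannot express directly.

def enumerate_points(n):
    points = []
    for i in range(0, n + 1):
        for j in range(0, i * i + 1):   # polynomial loop bound
            points.append((i, j))
    return points

pts = enumerate_points(3)

# exactly the integer points of the set, in lexicographic order
reference = [(i, j) for i in range(4) for j in range(10) if j <= i * i]
assert pts == reference
```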
Lecture Notes in Computer Science, 2009
The separation of concerns is a fundamental principle in software engineering. Crosscutting concerns are concerns that do not align with the hierarchical and block decomposition supported by mainstream programming languages. In the past, crosscutting concerns have been studied mainly in the context of object orientation. Feature orientation is a novel programming paradigm that supports the (de)composition of crosscutting concerns in a system with a hierarchical block structure. In two case studies we explore the problem of crosscutting concerns in functional programming and propose two solutions based on feature orientation.

The potential of polyhedral optimization: An empirical study
2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2013
Present-day automatic optimization relies on powerful static (i.e., compile-time) analysis and transformation methods. One popular platform for automatic optimization is the polyhedron model. Yet, after several decades of development, there remains a lack of empirical evidence of the model's benefits for real-world software systems. We report on an empirical study in which we analyzed a set of popular software systems, distributed across various application domains. We found that polyhedral analysis at compile time often lacks the information necessary to exploit the potential for optimization of a program's execution. However, when conducted also at run time, polyhedral analysis shows greater relevance for real-world applications. On average, the share of the execution time amenable to polyhedral optimization is increased by a factor of nearly 3. Based on our experimental results, we discuss the merits and potential of polyhedral optimization at compile time and run time.

The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computation is the most expensive component, but we are rapidly moving to an era in which computation is cheap and massively parallel while data movement dominates energy and performance costs. In order to respond to exascale systems (the next generation of high-performance computing systems), the scientific computing community needs to refactor its applications to align with the emerging data-centric paradigm. Our applications must be evolved to express information about data locality. Unfortunately, current programming environments offer few ways to do so. They ignore the incurred cost of communication and simply rely on hardware cache coherency to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models have to support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume that all processing elements are equidistant from each other. In order to take advantage of emerging technologies, application developers need a set of programming abstractions to describe data locality for the new computing ecosystem. The new programming paradigm should be more data-centric and should allow developers to describe how to decompose data and how to lay it out in memory. Fortunately, there are many emerging concepts, such as constructs for tiling, data layout, array views, task and thread affinity, and topology-aware communication libraries for managing data locality.
There is an opportunity to identify commonalities in strategy to enable us to combine the best of these concepts to develop a comprehensive approach to expressing and managing data locality on exascale programming systems. These programming model abstractions can expose crucial information about data locality to the compiler and runtime system to enable performance-portable code. The research question is to identify the right level of abstraction, which includes techniques that range from template libraries all the way to completely new languages to achieve this goal.

2013 35th International Conference on Software Engineering (ICSE), 2013
Product-line technology is increasingly used in mission-critical and safety-critical applications. Hence, researchers are developing verification approaches that follow different strategies to cope with the specific properties of product lines. While the research community is discussing the mutual strengths and weaknesses of the different strategies, mostly at a conceptual level, there is a lack of evidence in terms of case studies, tool implementations, and experiments. We have collected and prepared six product lines as subject systems for experimentation. Furthermore, we have developed a model-checking tool chain for C-based and Java-based product lines, called SPLVERIFIER, which we use to compare sample-based and family-based strategies with regard to verification performance and the ability to find defects. Based on the experimental results and an analytical model, we revisit the discussion of the strengths and weaknesses of product-line-verification strategies.
• We provide the tool chain SPLVERIFIER for conducting experiments with product-based, sample-based, and family-based model checking of product lines written in C and Java.
• We collected and prepared six case studies, written in the general-purpose languages C and Java, to be used as benchmarks for product-line verification.
• Based on the case studies, we conducted experiments comparing the three verification strategies (including three different sampling heuristics for feature-interaction…

Lecture Notes in Computer Science, 2014
Project ExaStencils pursues a radically new approach to stencil-code engineering. Present-day stencil codes are implemented in general-purpose programming languages, such as Fortran, C, or Java, or derivatives thereof, and harnesses for parallelism, such as OpenMP, OpenCL, or MPI. ExaStencils favors a much more domain-specific approach with languages at several layers of abstraction, the most abstract being the mathematical formulation, the most concrete the optimized target code. At every layer, the corresponding language expresses not only computational directives but also domain knowledge of the problem and platform to be leveraged for optimization. This approach will enable highly automated code generation at all layers and has been demonstrated successfully before in the U.S. projects FFTW and SPIRAL for certain linear transforms.

1 The Challenges of Exascale Computing

The performance of supercomputers is on the way from petascale to exascale. Software technology for high-performance computing has been struggling to keep up with the advances in computing power, from terascale in 1997 to petascale in 2008 on to exascale, now being only a factor of 30 away and predicted for the end of the present decade. So far, traditional host languages, such as Fortran and C, being equipped with harnesses for parallelism, such as MPI and OpenMP, have taken most of the burden, and they are being developed further with some new abstractions, notably the partitioned global address space (PGAS) memory model [1] in the languages Coarray Fortran [30], Chapel [9], Fortress [38], Unified Parallel C [8], and X10 [10]. Yet, the sequential host languages remain general-purpose: Fortran or C or, if object orientation is desired, C++ or Java.
The step from petascale to exascale performance challenges present-day software technology much more than the advances from gigascale to terascale and from terascale to petascale have. The reason is that the explicit treatment of the massive parallelism inside one node of a high-performance cluster can no longer be avoided (http://www.top500.org).
Journal of Symbolic Computation, 2006
We present an application of quantifier elimination techniques in the automatic parallelization of nested loop programs. The technical goal is to simplify affine inequalities whose coefficients may be unevaluated symbolic constants. The values of these so-called structure parameters are determined at run time and reflect the problem size. Our purpose here is to make the research community of quantifier elimination aware, in a tutorial style, of our application domain, loop parallelization, and to highlight the rôle of quantifier elimination, as opposed to alternative techniques, in this domain. Technically, we focus on the elimination method of Weispfenning.

Automated Software Engineering, 2010
A feature-oriented product line is a family of programs that share a common set of features. A feature implements a stakeholder's requirement and represents a design decision or configuration option. When added to a program, a feature involves the introduction of new structures, such as classes and methods, and the refinement of existing ones, such as extending methods. A feature-oriented decomposition enables a generator to create an executable program by composing feature code solely on the basis of the feature selection of a user; no other information is needed. A key challenge of product line engineering is to guarantee that only well-typed programs are generated. As the number of valid feature combinations grows combinatorially with the number of features, it is not feasible to type check all programs individually. The only feasible approach is to have a type system check the entire code base of the feature-oriented product line. We have developed such a type system on the basis of a formal model of a feature-oriented Java-like language. The type system guarantees type safety for feature-oriented product lines. That is, it ensures that every valid program of a well-typed product line is well-typed. Our formal model, including the type system, is sound and complete.

1 Introduction

Feature-oriented programming (FOP) aims at the modularization of programs in terms of features [59, 15]. A feature implements a stakeholder's requirement and represents a design decision…

A mathematical model based on polyhedra (the so-called "polyhedron model") serves as a foundation for model-based loop program transformations such as automatic parallelization. One of the restrictions present in the current polyhedron model is the requirement that the coefficients of variables must be numeric constants. This has been hindering some recent developments which require parametric coefficients of variables. We show how such non-linear parameters can be introduced into the polyhedron model, using quantifier elimination in the real numbers as our main mathematical tool. We describe two approaches to obtaining algorithms for the generalized model. First, we point out how existing algorithms can be implemented for the generalized model. Quantifier elimination is employed in this approach to simplify arising case distinctions. We give Fourier-Motzkin elimination and the simplex algorithm as examples of this approach. Second, we show how quantifier elimination can be used to solve some problems directly, e.g., by computing lexicographic maxima. We also demonstrate how to apply our methods to the frequently appearing case of tiling an index space with parametric tile size, and we present some performance results of the generalized algorithms we have implemented.
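The classical Fourier-Motzkin step with numeric coefficients can be sketched in a few lines. The paper's actual contribution, handling parametric coefficients by resolving the sign case distinctions with quantifier elimination over the reals, is not shown here; this is only the constant-coefficient baseline it generalizes.

```python
from fractions import Fraction

def fm_eliminate(ineqs, k):
    # One Fourier-Motzkin step: eliminate variable k from a system of
    # inequalities a . x <= b, each given as (coefficient list, bound).
    # Numeric coefficients only; with a parametric coefficient the sign
    # test below would require a case distinction.
    lower, upper, rest = [], [], []
    for a, b in ineqs:
        c = Fraction(a[k])
        if c > 0:    # normalize to  x_k + ... <= b'
            upper.append(([Fraction(x) / c for x in a], Fraction(b) / c))
        elif c < 0:  # normalize to -x_k + ... <= b'
            lower.append(([Fraction(x) / -c for x in a], Fraction(b) / -c))
        else:
            rest.append((a, b))
    # pair every lower bound on x_k with every upper bound
    for la, lb in lower:
        for ua, ub in upper:
            coeffs = [x + y for x, y in zip(la, ua)]
            coeffs[k] = Fraction(0)
            rest.append((coeffs, lb + ub))
    return rest

# project {(x, y) : x <= 4, -x <= 0, x - y <= 0} onto y
proj = fm_eliminate([([1, 0], 4), ([-1, 0], 0), ([1, -1], 0)], 0)
```

Here the projection yields the (trivially true) constraint 0 <= 4 together with -y <= 0, i.e., y >= 0, which is exactly the shadow of the original set on the y-axis.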