Klaus Waldschmidt

Papers by Klaus Waldschmidt

Lösung rechenaufwendiger Probleme

Automatic scheduling for cache only memory architectures

For parallel and distributed systems to gain wider acceptance than they have to date, they must become significantly easier to program. Fundamentally, parallel programming is more difficult than sequential programming as long as data and computation must be distributed by the programmer. Cache Only Memory Architectures (COMAs) provide a Distributed Shared Memory (DSM) where data distribution is performed automatically and transparently. This paper generalizes this idea to achieve the same distribution for computation, thus arriving at an automatic and transparent form of scheduling. Where COMA literature normally makes no assumptions concerning the parallel programs which use the DSM, we use special compiler techniques originally developed for multithreaded and dataflow architectures. Having done so, we can specify ways of significantly simplifying the basic COMA coherency protocols, while at the same time enabling automatic, transparent, adaptive run-time scheduling.

Pipelining and parallel training of neural networks on distributed-memory multiprocessors

This paper presents a parallel neural network simulator, implemented on a Parsytec Multicluster2 transputer system. In practical use, neural networks often employ the backpropagation learning rule, as this supervised learning method can be applied to a wide field of recognition problems. The authors focus on the acceleration of backpropagation learning by combining pipelining and parallel training methods. The pipelining model

Genetic processing of the traveling salesman problem with the associative architecture AM³

This paper presents a genetic solving approach to the Traveling Salesman Problem (TSP), which can be significantly accelerated by using an associative processor architecture, called the AM³. To compile the genetic TSP algorithm, a C++ programming environment containing an associative object library is needed, as well as an AM³ code interpreter to count machine instructions [10]. Further, a recombination operator, known in the literature as "Partially Mapped Crossover" (PMX), is employed [?]. The associative character of this operator makes it possible to reduce its time complexity from quadratic to linear (from O(n^2) to O(n)). This reduction is noticeable in practice, since genetic recombination demands an increasing portion of the total run-time with growing problem size. As the TSP can be seen as a typical representative of permutation problems, it is assumed that the combination of genetic and associative processing is suitable for similar applications.
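
To illustrate how a lookup by content can make PMX linear, here is a minimal Python sketch. It is not the AM3 implementation from the paper: the function name `pmx`, the cut points `lo` and `hi`, and the use of a Python dict as a stand-in for associative search are all assumptions made for illustration. A naive version searches the copied segment for every conflicting city, which costs O(n^2); with expected constant-time lookups the repair chains are disjoint, so the total work is O(n).

```python
def pmx(parent1, parent2, lo, hi):
    """Partially Mapped Crossover (illustrative sketch).

    The child keeps parent1[lo:hi] and fills all other positions from
    parent2, repairing duplicates through the segment mapping.
    """
    n = len(parent1)
    child = list(parent2)
    child[lo:hi] = parent1[lo:hi]

    # Content-addressed mapping: a city inside parent1's segment maps to
    # the city parent2 holds at the same position. The dict stands in for
    # the associative lookup (assumption: expected O(1) per query).
    mapping = {parent1[i]: parent2[i] for i in range(lo, hi)}

    for i in list(range(0, lo)) + list(range(hi, n)):
        city = parent2[i]
        # Follow the mapping until the city no longer collides with the
        # copied segment. Chains started from different positions never
        # share an entry, so the total work over all i stays linear.
        while city in mapping:
            city = mapping[city]
        child[i] = city
    return child


p1 = [1, 2, 3, 4, 5, 6, 7, 8]
p2 = [3, 7, 5, 1, 6, 8, 2, 4]
print(pmx(p1, p2, 3, 6))  # [3, 7, 8, 4, 5, 6, 2, 1]
```

The child is again a valid tour: every city appears exactly once, and the segment `[4, 5, 6]` is inherited unchanged from the first parent.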

What computer architecture can learn from computational intelligence-and vice versa

This paper considers whether the seemingly disparate fields of Computational Intelligence (CI) and computer architecture can profit from each other's principles, results and experience. In the process, we identify important common issues, such as parallelism, distribution of data and control, granularity and regularity. We present two novel computer architectures which have profited from principles found in CI, and identify two constraints on CI to eliminate the hidden influence of the von Neumann model of computation.

ADARC: a fine grain dataflow architecture with associative communication network

... Current VLSI technologies support the design of VLSI building-blocks for a modular con-struct...

The AM3 Associative Processor

Semi-Symbolic Modeling and Analysis of Noise in Heterogeneous Systems

Forum on Specification and Design Languages, 2004

The article describes semi-symbolic methods for the analysis of control and signal processing systems, including static and dynamic uncertainties. This semi-symbolic description of uncertainties is based upon affine arithmetic. A short introduction to affine arithmetic is given. As affine arithmetic is only able to describe static uncertainties, an extension for effects of dynamic uncertainties is described and its feasibility is demonstrated by an example that delivers the frequency dependent noise of an output stage of a delta-sigma converter.
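
For readers unfamiliar with affine arithmetic, the following Python sketch shows the basic idea for static uncertainties only. It is not the authors' tool, and it does not cover their extension for dynamic uncertainties; the class name `Affine`, its methods, and the global noise-symbol counter are hypothetical choices for illustration. A quantity is stored as a center value plus a linear combination of noise symbols, each ranging over [-1, 1]. Because shared symbols are tracked, correlated uncertainties cancel, which plain interval arithmetic cannot do. Floating-point rounding is ignored here.

```python
import itertools

_fresh = itertools.count()  # source of new noise-symbol indices (assumption)


class Affine:
    """x = center + sum(coeff_i * eps_i), with each eps_i in [-1, 1]."""

    def __init__(self, center, terms=None):
        self.center = float(center)
        self.terms = dict(terms or {})  # noise-symbol index -> coefficient

    @classmethod
    def from_interval(cls, lo, hi):
        # An independent uncertain quantity gets its own noise symbol.
        return cls((lo + hi) / 2, {next(_fresh): (hi - lo) / 2})

    @staticmethod
    def lift(value):
        return value if isinstance(value, Affine) else Affine(value)

    def __neg__(self):
        return Affine(-self.center, {k: -v for k, v in self.terms.items()})

    def __add__(self, other):
        other = Affine.lift(other)
        terms = dict(self.terms)
        for k, v in other.terms.items():
            terms[k] = terms.get(k, 0.0) + v
        return Affine(self.center + other.center, terms)

    def __sub__(self, other):
        return self + (-Affine.lift(other))

    def __mul__(self, other):
        other = Affine.lift(other)
        terms = {}
        for k in set(self.terms) | set(other.terms):
            terms[k] = (self.center * other.terms.get(k, 0.0)
                        + other.center * self.terms.get(k, 0.0))
        # The quadratic remainder is bounded conservatively and attached
        # to a fresh noise symbol, a common textbook approximation.
        remainder = self.radius() * other.radius()
        if remainder:
            terms[next(_fresh)] = remainder
        return Affine(self.center * other.center, terms)

    def radius(self):
        return sum(abs(v) for v in self.terms.values())

    def interval(self):
        r = self.radius()
        return (self.center - r, self.center + r)


x = Affine.from_interval(1, 3)
print((x - x).interval())  # (0.0, 0.0): the shared noise symbol cancels
print((x * x).interval())  # (-1.0, 9.0): a safe but not tight enclosure
```

Interval arithmetic would evaluate `x - x` for x in [1, 3] as [-2, 2]. The affine form returns exactly zero because both operands carry the same noise symbol.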

The SDAARC architecture

While traditional parallel computing systems are still struggling to gain a wider acceptance, the largest parallel computer that has ever been available is currently growing with the communication resource Internet. Unfortunately it is also rarely used in the parallel computation field. The reason for the rejection of parallel computers is mainly the difficulty of parallel programming. In this paper we propose the Self Distributing Associative ARChitecture (SDAARC). It has been derived from the Cache Only Memory Architecture (COMA). COMAs provide a distributed shared memory (DSM) with automatic distribution of data. We show how this paradigm of data distribution can be extended to the automatic distribution of instruction sequences (microthreads). We show how microthreads can be extracted from legacy C code to produce code that can automatically be parallelized by SDAARC at run time. We also discuss how SDAARC can be implemented on a tightly coupled multiprocessor system, on heterogeneous LAN based computer networks (Intranet) and on WANs of computing resources.

Physikalischer Entwurf

Großintegration

VHDL-Synthese

Analog/Digital- und Digital/Analog-Umsetzung

In the preceding chapters we became acquainted with concepts and building blocks for implementing digital control units, operation units, and processors. In many cases these sequential circuits are used to process measured values. However, the physical quantities occurring in nature (e.g. pressure, voltage, temperature) are generally analog signals that cannot be taken over directly by digitally operating systems. The analog signals must therefore either be digitized for processing or, in the reverse direction, converted back into analog signals. The digital representation of analog measured values, as well as their digital processing, offers a number of significant advantages in many applications.

Special section on associative processors and memories

IEE proceedings, 1989

Modellierung des Implementierungsraumes im Analog/Digital Co-Design

MBMV, 2000

Top-Down Design analog/digitaler Systeme mit SystemC-AMS

MBMV, 2007

Because of their large share of software, embedded hardware/software systems are increasingly developed on the basis of C/C++. Hardware modeling is supported by language extensions such as SystemC. In addition to digital components, embedded systems increasingly include analog components. This contribution gives an overview of the extensions for modeling analog/digital systems developed within the OSCI SystemC-AMS Working Group. It also shows how polymorphic signals support top-down design and mixed-level simulation in a heterogeneous top-down design flow. This work was supported within the BMBF/edacentrum project SAMS under grant number 01M3070D and the EU project ANDRES (IST-5-033511).

Aktivierung und Zuordnung von Kooperierenden Prozessen im Assko-Mehrprozessorsystem

Informatik-Fachberichte, 1980

Processes that truly run in parallel in a modular, symmetric multiprocessor environment require conflict-free mutual coordination. By separating out all the global synchronization variables needed for this and storing them in a modular associative-memory coordinator system (ASSKO), the processes and resources of the multiprocessor system can be made to work together effectively. This contribution describes the use of these synchronization mechanisms in programming concurrent, cooperating processes that start asynchronously with respect to one another.

A Virtual Layer for FPGA Based Parallel Systems (MP-SoCs)

Dagstuhl Seminar Proceedings, 2008

Die Organisation der Selbstverwaltung der Prozessormodule

The processor modules of the ASSKO multiprocessor system are to be regarded as autonomous, cooperation-capable, active resources that can participate on an equal footing in the execution of a decomposable computational problem. For a computational problem to run on a multiprocessor system, it must not only be translated into machine-suitable form but also be broken down into subtasks that can be executed sequentially and independently of one another.

Technische Realisierung des ASSKO-Systems

This section covers the circuit-level structure of the ASSKO system with the aid of several photogra...
