1999, Lecture Notes in Computer Science
This paper describes a system (ParTS) for automatic hardware/software partitioning of applications described in the concurrent programming language occam. Based on algebraic transformations of occam programs, the strategy guarantees, by construction, that the partitioning process preserves the semantics of the original description. ParTS has been developed as an extension of OTS, a tool implemented at Oxford University which allows one to apply basic algebraic laws to an occam program in an interactive way. ParTS extends OTS with elaborate transformation rules which are necessary for carrying out partitioning automatically. To illustrate the partitioning methodology and our system, a convolution program is used as a case study.
Formal Methods in System Design, 2004
A crucial point in hardware/software co-design is how to partition a system into hardware and software components. Although several partitioning algorithms have recently been proposed, the formal verification of the partitioning procedure is an emerging research topic. In this paper we present an innovative and automatic approach to partitioning with emphasis on correctness. The formalism used is occam and the algebraic laws that define its semantics. In the proposed approach, the partitioning procedure is characterised as a program transformation task, and the partitioned system is derived from the original description of the system by applying transformation rules, all of them proved from the basic laws of occam. A tool has been developed to allow the partitioning to be carried out automatically. The entire approach is illustrated here through a small case study.
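The abstracts above treat partitioning as semantics-preserving rewriting with occam's algebraic laws. A minimal sketch of that idea (not the authors' tool; the tuple encoding of processes is an assumption for illustration) applies one such law, the associativity of SEQ, i.e. SEQ(P, SEQ(Q, R)) = SEQ(P, Q, R), as a rewrite rule over a program tree:

```python
# Sketch only: occam processes modelled as nested tuples, with one
# algebraic law of occam applied as a meaning-preserving rewrite rule.

def flatten_seq(proc):
    """Rewrite SEQ(P, SEQ(Q, R)) into SEQ(P, Q, R), recursively."""
    if not isinstance(proc, tuple):
        return proc                      # a primitive process, e.g. 'x := 1'
    op, *args = proc
    args = [flatten_seq(a) for a in args]
    if op == 'SEQ':
        flat = []
        for a in args:
            if isinstance(a, tuple) and a[0] == 'SEQ':
                flat.extend(a[1:])       # splice nested SEQ bodies in place
            else:
                flat.append(a)
        return ('SEQ', *flat)
    return (op, *args)

prog = ('SEQ', 'x := 1', ('SEQ', 'y := 2', 'z := x + y'))
print(flatten_seq(prog))  # ('SEQ', 'x := 1', 'y := 2', 'z := x + y')
```

Because each rule is an algebraic law, a partitioned program derived by a chain of such rewrites is correct by construction, which is the point the abstracts emphasise.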
2000
Hardware and software co-design is a design technique which delivers computer systems comprising hardware and software components. A critical phase of the co-design process is to decompose a program into hardware and software. This paper proposes an algebraic partitioning method whose correctness is verified in the algebra of programs. We introduce a program analysis phase before program partitioning and develop a collection of syntax-based splitting rules,
Anais do IV Simpósio Brasileiro de Arquitetura de Computadores e Processamento de Alto Desempenho (SBAC-PAD 1992)
For many years software engineers have been providing engineers in other fields with advanced design tools, yet the tools used by software designers themselves look quite primitive in comparison. Only the CASE-like systems developed in recent years can provide reasonable help to software developers. The experimental software development system ALADDIN/LAMP is oriented to the creation of distributed computer control systems (DCCS). The proposed approach to developing distributed software configurations (DSCs) is based on the model of a virtual distributed software configuration (VDSC) with information-transport ports (ITPs) as interconnecting servers. Transputer networks and OCCAM-written software are often used for the creation of DCCS. An approach to extending the ALADDIN/LAMP tools to handle OCCAM-written DSCs is discussed in this paper. A survey of the tools is presented (the OCCAM-oriented structure editor OSE, the OCCAM structure extractor OSX, the ALADDIN/OOB translator, and the deadlock locator and analyser DLA)....
Correct Hardware Design and Verification Methods, Proc. IFIP WG10.2 Advanced Research Working Conference, CHARME '93, 1993
This paper shows how to compile a program written in a subset of occam into a normal form suitable for further processing into a netlist of components which may be loaded into a Field-Programmable Gate Array (FPGA). A simple state-machine model is adopted for specifying the behaviour of a synchronous circuit, in which the observables include the state of the control path and the data path of the circuit. We identify the behaviour of a circuit with a program consisting of a very restricted subset of occam. Algebraic laws are used to facilitate the transformation from a program into a normal form. The compiling specification is presented as a set of theorems that must be proved correct with respect to these laws. A rapid prototype compiler in the form of a logic program may be implemented from these theorems.
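The state-machine normal form described above can be pictured with a small sketch (the encoding and names here are illustrative assumptions, not the paper's notation): a program is reduced to a loop that, at each step, looks up the current control state and fires the corresponding data-path update.

```python
# Hedged sketch of a state-machine normal form: control path = an integer
# state, data path = a dictionary of registers, one update per transition.

def run_normal_form(transitions, state, data, halt_state):
    """Iterate a map from control state to (next_state, update) pairs
    until the halt state is reached; return the final data path."""
    while state != halt_state:
        next_state, update = transitions[state]
        data = update(data)
        state = next_state
    return data

# A two-state machine computing y = x * 2 + 1 in two clocked steps.
transitions = {
    0: (1, lambda d: {**d, 'y': d['x'] * 2}),
    1: (2, lambda d: {**d, 'y': d['y'] + 1}),
}
print(run_normal_form(transitions, 0, {'x': 5}, 2))  # {'x': 5, 'y': 11}
```

A normal form of this shape is what makes the final step to a netlist mechanical: each transition becomes combinational logic between registers.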
Electronic Notes in Theoretical Computer Science, 2004
The focus of this work is hardware/software partitioning verification. The approach uses occam as specification and reasoning language. The partitioned system is derived from the original description of the system by applying transformation rules, all of them proved from the basic laws of occam. The aim of this work is to show how the rewriting system CafeOBJ can be used to automatically prove the partitioning rules, as well as to implement the reduction strategy that guides the application of these rules. In this way, rewriting systems can be regarded as supporting tools for the construction of partitioning environments, whose emphasis is correctness.
Proceedings of the 6th ACM conference on Computing frontiers - CF '09, 2009
There is a trend towards using accelerators to increase performance and energy efficiency of general-purpose processors. Adoption of accelerators, however, depends on the availability of tools to facilitate programming these devices.
2003
We propose in this paper an algebraic approach to hardware/software partitioning in Verilog Hardware Description Language (HDL). We explore a collection of algebraic laws for Verilog programs, from which we design a set of syntax-based algebraic rules to conduct hardware/software partitioning. The co-specification language and the target hardware and software description languages are specific subsets of Verilog.
In many application environments there is a need to be able to allocate resources dynamically. The occam programming language has no concept of such dynamic allocation. This paper shows how dynamic allocation can be incorporated into occam without sacrificing the benefits which are obtained using static allocation strategies. This capability has been achieved by a simple extension to occam3 to include a TASK mechanism which is defined using existing occam3 features. The main component of the dynamic mechanism is a library which manages the allocation and de-allocation of tasks. The paper includes an example showing how the dynamic allocation mechanism is employed. 1. Introduction The occam programming language is inherently a static language that does not permit the dynamic allocation of data structures and/or processes. From a philosophical standpoint, where real-time systems are concerned, this is a justifiable position. However, from other viewpoints this position is less tenable. The ...
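The abstract's central trick, dynamic behaviour layered on top of static allocation, can be sketched as follows (class and method names are assumptions for illustration, not occam3 syntax): a fixed pool of task slots is reserved statically, and allocate/free requests are served from that pool, so the dynamic mechanism can never exceed the static bound.

```python
# Illustrative sketch: dynamic task allocation served from a pool whose
# size is fixed at build time, mirroring the paper's library-based design.

class TaskPool:
    def __init__(self, size):
        self.free = list(range(size))    # statically reserved slots
        self.in_use = {}

    def allocate(self, task):
        if not self.free:
            return None                  # pool exhausted: request denied
        slot = self.free.pop()
        self.in_use[slot] = task
        return slot

    def release(self, slot):
        del self.in_use[slot]
        self.free.append(slot)

pool = TaskPool(2)
a = pool.allocate('logger')
b = pool.allocate('sampler')
print(pool.allocate('extra'))              # None: the static bound holds
pool.release(a)
print(pool.allocate('extra') is not None)  # True: freed slot is reused
```

Bounding dynamic requests by a static reservation is what preserves the predictability that real-time occam systems rely on.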
… , 1993, with the European Event in …, 1994
Efficient utilization of available resources is a key concept in embedded systems. This paper is focused on providing support for managing dynamic reconfiguration of computing resources in the programming model. We present an approach to map occam-pi programs to a manycore architecture, Platform 2012 (P2012). We describe the techniques used to translate the salient features of the occam-pi language to the native programming model of the P2012 architecture. We present initial results from a case study of matrix multiplication. Our results show the simplicity of the occam-pi program, with a six-fold reduction in lines of code.
2007
We describe a compiler which maps programs expressed in a subset of occam into netlist descriptions of parallel hardware. Using Field-Programmable Gate Arrays to implement such netlists, problem-specific hardware can be generated entirely by a software process. Inner loops of time-consuming programs can be implemented as hardware and the less intensively-used parts of the program can be mapped into machine code by a conventional compiler. Software investment is protected since the same program can run entirely in software, entirely in hardware, or in a mixture of both. A single program can thus result in many implementations across a potentially wide cost-performance range. The compilation system has been used to generate inner loops, hardware interfaces to real-world devices, systolic arrays, and complete microprocessors. In the near future we hope to have a proven version of the compiler, enabling us automatically to generate provably correct hardware implementations, including micr...
2010
The automatic parallelization of sequential applications is a great challenge for current compiler technology. The partitioning of a sequential application into parallel programs that can be executed concurrently on a given parallel architecture is a complex and time-consuming undertaking. In addition, the programmer is often responsible for defining a good partitioning that takes into account the properties of both the program and the architecture. This paper proposes a new fully automated partitioning algorithm driven by an intermediate representation of the sequential application in terms of the domain-independent concept-level kernels (e.g., induction, reduction, recurrence) recognized by the XARK compiler framework. Such a kernel-centric view of the application hides the complexity of the implementation details (e.g., procedure calls, pointers, global variables, complex control flows) and provides robustness against different codification styles. For illustrative purposes, we use inter-procedural implementations of the Sobel edge filter and the EQUAKE application of SPEC CPU2000.
Parallel Computing, 1999
The OASys (Or/And SYStem) is a software implementation designed for AND/OR-parallel execution of logic programs. In order to combine these two types of parallelism, OASys considers each alternative path as a totally independent computation (leading to OR-parallelism) which consists of a conjunction of determinate subgoals (leading to AND-parallelism). This computation model is motivated by the need for the elimination of communication between processing elements (PEs). OASys aims towards a distributed memory architecture in which the PEs performing the OR-parallel computation possess their own address space while other simple processing units are assigned with AND-parallel computation and share the same address space. OASys execution is based on distributed scheduling which allows either recomputation of paths or environment copying. We discuss in detail the OASys execution scheme and we demonstrate OASys effectiveness by presenting the results obtained by a prototype implementation, running on a network of workstations. The results show that speedup obtained by AND/OR-parallelism is greater than the speedups obtained by exploiting AND or OR-parallelism alone. In addition, comparative performance measurements show that copying has a minor advantage over recomputation. © 1999 Elsevier Science B.V. All rights reserved.
1995
The software crisis is defined as the inability to meet the demands for new software systems, due to the slow rate at which systems can be developed. To address the crisis, object-based design and implementation techniques and domain models have been developed. However, object-based techniques do not address an additional problem that plagues systems engineers: the effective utilization of distributed and parallel hardware platforms. This problem is partly addressed by program partitioning languages that allow engineers to specify how software components should be partitioned and assigned to the nodes of concurrent computers. However, very little has been done to automate the tasks of partitioning and assignment at the task and object level of granularity. Thus, this paper describes automated techniques for distributed/parallel configuration of object-based applications, and demonstrates the technique on Ada programs. The granularity of partitioning is at the level of the Ada program unit (a program unit is an object, a class, a task, a package (possibly a generic template) or a subprogram). The partitioning is performed by constructing a call-rendezvous graph (CRG) for an application program. The nodes of the graph represent the program units, and the edges denote call and task interaction/rendezvous relationships. The CRG is augmented with edge weights depicting inter-program-unit communication relationships and concurrency relationships, resulting in a weighted CRG (WCRG). The partitioning algorithm repeatedly "cuts" edges of the WCRG with the goal of producing a set of partitions among which (1) there is a small amount of communication and (2) there is a large degree of potential for concurrent execution. Following the partitioning of the WCRG into tightly coupled clusters, a random neural network is employed to assign clusters to physical processors.
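The cutting step described above can be illustrated with a toy sketch (not the paper's algorithm; the graph and threshold are invented for illustration): edges of a weighted call-rendezvous graph whose communication weight falls below a threshold are cut, and the connected components that remain become the candidate clusters, so little traffic crosses cluster boundaries.

```python
# Toy partitioning sketch: drop low-weight communication edges, then
# take connected components of what remains as the clusters.

def clusters(nodes, edges, cut_below):
    """Drop edges with weight < cut_below; return connected components."""
    adj = {n: set() for n in nodes}
    for u, v, w in edges:
        if w >= cut_below:
            adj[u].add(v)
            adj[v].add(u)
    seen, parts = set(), []
    for n in nodes:                      # depth-first component search
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        parts.append(sorted(comp))
    return parts

nodes = ['A', 'B', 'C', 'D']
edges = [('A', 'B', 9), ('B', 'C', 1), ('C', 'D', 8)]  # weight = traffic
print(clusters(nodes, edges, cut_below=5))  # [['A', 'B'], ['C', 'D']]
```

Only the light B-C edge is cut, so the heavily communicating pairs stay together, which is exactly the objective the abstract states for the partitioning algorithm.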
Lecture Notes in Computer Science, 2004
We present a new implementation of the old Occam language, using Microsoft .NET as the target platform. We show how Occam can be used to develop cluster and grid applications, and how such applications can be deployed. In particular, we discuss automatic placement of Occam processes onto processing nodes.
Microprocessors and Microsystems, 1989
OCCAM is a message-based parallel language that allows programs to be written with a large number of living processes. The transputer, which supports OCCAM very efficiently, allows parallel systems to be built in a simple and modular way. OCCAM and the transputer were chosen for the implementation and evaluation of a parallel PROLOG interpreter made up of a set of parallel processes communicating according to a message-passing protocol. The parallel execution model adopted exploits full OR parallelism and pipeline AND parallelism, while preserving the depth-first search technique of the classical sequential model. The aim of the implementation was to evaluate the actual degree of parallelism exploited by the execution model, and the efficiency of the resolution algorithm used. The number of living processes was found to be consistent with the number of processors in the architecture used (four), and the additional workload produced by the message exchanges between processes was found not to be excessively time-consuming.
ArXiv, 2021
The generalizability of PBE solvers is the key to the empirical synthesis performance. Despite the importance of generalizability, related studies on PBE solvers are still limited. In theory, few existing solvers provide theoretical guarantees on generalizability, and in practice, there is a lack of PBE solvers with satisfactory generalizability on important domains such as conditional linear integer arithmetic (CLIA). In this paper, we adopt a concept from the computational learning theory, Occam learning, and perform a comprehensive study on the framework of synthesis through unification (STUN), a state-of-the-art framework for synthesizing programs with nested if-then-else operators. We prove that Eusolver, a state-of-the-art STUN solver, does not satisfy the condition of Occam learning, and then we design a novel STUN solver, PolyGen, of which the generalizability is theoretically guaranteed by Occam learning. We evaluate PolyGen on the domains of CLIA and demonstrate that PolyG...
1997
The need to declare channels as global objects in applications implemented according to the distributed-programming Occam-CSP model limits the desirable reusability of the software modules of those applications. Integrating other languages that follow the OO paradigm is not easy either. In this paper, simple solutions are proposed to increase the reusability of languages that follow the existing model, based on a class of modules called ODAs and on a previously existing language that implements the above-mentioned modules on a Transputer-based multicomputer platform.
PROBLEMS IN PROGRAMMING, 2020
Methods and software tools for automated design and generation of OpenCL programs based on the algebra of algorithms are proposed. OpenCL is a framework for developing parallel software that executes across heterogeneous platforms consisting of general-purpose processors and/or hardware accelerators. The proposed approach consists in using high-level algebra-algorithmic specifications of programs represented in natural linguistic form and rewriting rules. The developed software tools provide the automated design of algorithm schemes based on a superposition of Glushkov algebra constructs that are considered as reusable components. The tools automatically generate code in a target programming language on the basis of the specifications. In most computing problems, a large part of hardware resources is utilized by computations inside loops, therefore the use of automatic parallelization of cyclic operators is most efficient for them. However, the existing automatic code parallelizing ...
2011
Recently we proposed occam-pi as a high-level language for programming coarse grained reconfigurable architectures. The constructs of occam-pi combine ideas from CSP and pi-calculus to facilitate expressing parallelism, communication, and reconfigurability. The feasibility of this approach was illustrated by developing a compiler framework to compile occam-pi implementations to the Ambric architecture. In this paper, we demonstrate the applicability of occam-pi for programming an array of functional units, the eXtreme Processing Platform (XPP). This is made possible by extending the compiler framework to target the XPP architecture, including automatic floating- to fixed-point conversion. Different implementations of a FIR filter and a DCT algorithm were developed and evaluated on the basis of performance and resource consumption. The reported results reveal that the approach of using occam-pi to program the category of coarse grained reconfigurable architectures appears to be promising. The resulting implementations are generally much superior to those programmed in C and comparable to those hand-coded in the low-level native language NML.
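The floating- to fixed-point conversion mentioned above can be sketched in miniature (the Q1.15 format and helper names are assumptions for illustration, not the compiler framework's actual scheme): real values are scaled into integers with a fixed number of fractional bits, and a multiply becomes an integer product followed by a re-scaling shift.

```python
# Minimal fixed-point sketch: values held as Q1.15 integers, with a
# multiply implemented as an integer product plus a right shift.

Q = 15  # fractional bits of the assumed Q1.15 format

def to_fixed(x):
    return round(x * (1 << Q))   # scale a real value into the format

def fixed_mul(a, b):
    return (a * b) >> Q          # re-scale the double-width product

def to_float(x):
    return x / (1 << Q)          # convert back for inspection

a, b = to_fixed(0.5), to_fixed(0.25)
print(to_float(fixed_mul(a, b)))  # 0.125
```

Conversions of this kind matter on architectures such as XPP whose functional units favour integer arithmetic over floating point.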