Parallel and Distributed Computing Complete Notes
Distributed Shared Memory simplifies programming by hiding the complexities of message passing and offers a larger virtual memory space by combining the memory of all nodes. However, it adds overhead due to the need to maintain data consistency across nodes and provides limited control over data placement and communication.
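As an illustration of the idea (not an actual DSM system), the sketch below uses Python's multiprocessing.Manager to give several processes a single dictionary they can read and write as if it were shared memory, with the underlying message passing hidden behind the proxy object:

```python
from multiprocessing import Manager, Process

def worker(shared, key):
    # Each process reads and writes the "shared" dictionary as if it were
    # local memory; the Manager proxy hides the underlying message passing.
    shared[key] = sum(range(1_000))

if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict()          # shared-memory-like abstraction
        procs = [Process(target=worker, args=(shared, i)) for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(dict(shared))              # {0: 499500, 1: 499500, ...}
```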
The Aurora Distributed Shared Data System is a software system designed to support shared-data programming on distributed hardware, emphasizing the use of abstract data types without requiring special hardware. Conversely, the Aurora supercomputer, developed for Argonne National Laboratory, targets exascale computing, focusing on extreme computational power and performance to handle vast scientific workloads.
Scoped behavior optimizes communication patterns for specific needs, reducing communication overhead, while ADTs abstract implementation details, ensuring data integrity and security. This combination allows fine-tuned control of data access and sharing in parallel systems, enhancing both performance and security by preventing unauthorized modifications and accidental data corruption.
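A minimal, generic ADT sketch in Python (not Aurora's actual interface) illustrates the second half of that idea: callers can only go through the type's methods, so integrity checks are enforced in one place and the hidden representation cannot be corrupted accidentally:

```python
class BoundedBuffer:
    """Hypothetical ADT sketch: the internal list is hidden, and every
    mutation goes through methods that preserve the buffer's invariants."""

    def __init__(self, capacity):
        self._items = []          # hidden representation
        self._capacity = capacity

    def put(self, item):
        if len(self._items) >= self._capacity:
            raise ValueError("buffer full")   # integrity check in one place
        self._items.append(item)

    def get(self):
        if not self._items:
            raise ValueError("buffer empty")
        return self._items.pop(0)
```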
Synchronous message passing requires the sending process to wait for the receiving process to acknowledge receipt of the message before continuing, facilitating coordinated communication but possibly increasing wait times. In contrast, asynchronous message passing allows the sending process to continue execution without waiting for acknowledgment, promoting independence and potentially improving performance but making synchronization more complex.
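A small sketch using mpi4py (assuming it is installed and the script is launched with something like `mpirun -n 2 python messages.py`) contrasts the two styles: ssend blocks until the receiver has begun receiving, while isend returns immediately and completion is checked later:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = list(range(5))
    # Synchronous send: does not complete until the receiver has started
    # to receive the message, so the two ranks stay coordinated.
    comm.ssend(data, dest=1, tag=11)

    # Asynchronous (non-blocking) send: returns immediately; the sender
    # keeps working and checks for completion later.
    req = comm.isend(data, dest=1, tag=22)
    # ... other useful work could happen here ...
    req.wait()
elif rank == 1:
    sync_msg = comm.recv(source=0, tag=11)
    async_msg = comm.recv(source=0, tag=22)
    print("received:", sync_msg, async_msg)
```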
Challenges in using parallel search algorithms include managing coordination between tasks, dealing with load balancing among processors, ensuring correctness and completeness of search results, and efficiently handling data dependencies. These factors can complicate implementation and affect algorithm performance, particularly when the search space is large or irregularly structured.
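A rough sketch of the divide-and-combine pattern, using Python's multiprocessing.Pool, shows where some of that coordination work appears: the data is split into chunks, each worker searches independently, and the partial results must be merged back into global positions:

```python
from multiprocessing import Pool

def search_chunk(args):
    chunk, target = args
    # Each worker searches its own slice independently; combining the
    # results afterwards is part of the coordination cost.
    return [i for i, value in enumerate(chunk) if value == target]

if __name__ == "__main__":
    data = list(range(1_000_000))
    target = 123_456
    workers = 4
    size = len(data) // workers          # assumes an even split
    chunks = [(data[i * size:(i + 1) * size], target) for i in range(workers)]

    with Pool(workers) as pool:
        partial = pool.map(search_chunk, chunks)

    # Translate chunk-local indices back to global positions.
    hits = [i * size + idx for i, idxs in enumerate(partial) for idx in idxs]
    print(hits)   # [123456]
```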
Threads are lightweight units of execution within a process that share the same memory space and resources, allowing efficient communication and data access. Because threads access shared data directly, they are well suited to tasks that need fast execution and low-overhead communication. In contrast, processes have separate memory spaces and must communicate through message passing, which is less efficient but provides greater isolation and independence among concurrent tasks.
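The sketch below contrasts the two models with Python's standard threading and multiprocessing modules: the thread updates shared memory directly, while the process must send its result back through a Queue (message passing):

```python
import threading
from multiprocessing import Process, Queue

shared = {"count": 0}

def thread_worker():
    # Threads share the parent's memory, so they can update "shared" directly.
    shared["count"] += 1

def process_worker(queue):
    # Processes have separate address spaces, so results are sent back
    # explicitly, here through a Queue.
    queue.put(1)

if __name__ == "__main__":
    t = threading.Thread(target=thread_worker)
    t.start()
    t.join()
    print("thread sees shared memory:", shared["count"])   # 1

    q = Queue()
    p = Process(target=process_worker, args=(q,))
    p.start()
    print("process result via message:", q.get())          # 1
    p.join()
```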
MPI provides benefits such as increased speed by utilizing multiple processors, allowing complex problems to be broken into manageable tasks and solved more efficiently. However, potential drawbacks include programming complexity and the performance overhead associated with managing inter-process communication.
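As a small illustration (assuming mpi4py is available and the number of ranks divides the data evenly), the sketch below scatters pieces of a summation across processes and reduces the partial results back to rank 0:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Rank 0 splits the overall problem into one manageable piece per process.
chunks = None
if rank == 0:
    numbers = list(range(1, 101))
    step = len(numbers) // size          # assumes size divides 100 evenly
    chunks = [numbers[i * step:(i + 1) * step] for i in range(size)]

chunk = comm.scatter(chunks, root=0)     # distribute the pieces
local_sum = sum(chunk)                   # each process works on its piece
total = comm.reduce(local_sum, op=MPI.SUM, root=0)

if rank == 0:
    print("total:", total)               # 5050 when the split is even
```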
Shared-memory programming addresses synchronization and concurrency issues using synchronization tools like mutexes and semaphores to prevent race conditions. The memory model dictates how operations are perceived by threads, ensuring proper visibility and ordering, which is crucial for maintaining consistency and preventing data corruption in concurrent environments.
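A minimal Python sketch shows a mutex in action: without the lock, the read-modify-write on the shared counter can interleave across threads and lose updates:

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        # The lock makes the read-modify-write atomic; without it, two
        # threads could read the same value and one update would be lost.
        with lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 400000 with the lock; possibly less without it
```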
Parallel algorithms break down computational tasks into smaller, independent tasks that can be executed simultaneously. This increases efficiency by leveraging multiple processors to handle portions of the task concurrently, reducing total computation time and effectively utilizing hardware resources.
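A simple sketch with multiprocessing.Pool shows the pattern: the range is split into independent pieces, each worker computes a partial result, and the pieces are combined at the end:

```python
from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    # Each worker computes an independent piece of the overall sum.
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    n, workers = 1_000_000, 4
    step = n // workers
    pieces = [(i * step, (i + 1) * step) for i in range(workers)]

    with Pool(workers) as pool:
        total = sum(pool.map(partial_sum, pieces))

    print(total)   # same result as the sequential sum, computed in parallel
```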
Frameworks like MapReduce and Hadoop facilitate distributed computing by providing structured methods for data distribution, task scheduling, and communication patterns. These frameworks abstract many of the complexities involved in managing distributed data processing, enabling efficient handling of large data sets on clusters and streamlining the development of applications in big data environments.
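The sketch below mimics the MapReduce pattern in plain Python (it is not Hadoop's API): a map phase emits key-value pairs per document, a shuffle groups them by key, and a reduce phase combines each group; a framework would run the map and reduce steps in parallel across a cluster:

```python
from collections import defaultdict
from itertools import chain

documents = ["the quick brown fox", "the lazy dog", "the fox"]

# Map phase: each document is turned into (word, 1) pairs independently.
mapped = chain.from_iterable(((w, 1) for w in doc.split()) for doc in documents)

# Shuffle phase: group intermediate pairs by key.
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce phase: combine the values for each key.
word_counts = {word: sum(counts) for word, counts in grouped.items()}
print(word_counts)   # {'the': 3, 'quick': 1, ...}
```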