A Study of Heuristically-based Parametric Performance Optimization
Algorithms for Big Data Computing
Jongyeop Kim
Department of Mathematics and Computer Science, Southern Arkansas University
100 E University, Magnolia, AR 71753, USA
[email protected]

Abstract. Performance optimization for MapReduce computing on the Hadoop platform is a tedious yet challenging problem due to the complexity of the system organization and the extensive list of configuration parameters to be considered. To address this problem, this research proposes several parameter optimization algorithms, ranging from a naive exhaustive method to a random method and a couple of heuristically-based greedy methods, in order to cope with the exponential nature of the search for the best possible parameter setting. In exercising these algorithms, a few variables must be taken into consideration to make each algorithm a viable option, such as the arity of each parameter, which determines the base of the search time, and the sampling method, which keeps the complexity of the search process under control or within budget.
Keywords: Big Data, Hadoop, Configuration, Performance tuning.
1. Introduction
This research concerns the performance optimization of computations running on top of a big data platform, Apache Hadoop in particular. However, the methods and algorithms proposed in this research are expected to be applicable to other platforms as well without loss of generality. The diagram below [1] shows the computation flow on Hadoop, which primarily consists of one name node and multiple slave nodes.
A MapReduce process [2] is composed of two primary functions: 1) the Map function takes a set of input key/value pairs and produces a set of intermediate output key/value pairs; 2) the Reduce function takes an intermediate key2 and the set of values for that key, as illustrated in the following.
Map(Key1, Value1) -> List(Key2, Value2)
Reduce(Key2, List(Value2)) -> List(Value2)
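The Map/Reduce contract above can be sketched as a minimal word-count example. This is an illustrative sketch, not the paper's implementation: the framework normally performs the shuffle between phases, which is simulated inline here.

```python
from collections import defaultdict

def map_fn(key1, value1):
    """Map(Key1, Value1) -> List(Key2, Value2): emit a (word, 1) pair per word."""
    return [(word, 1) for word in value1.split()]

def reduce_fn(key2, values2):
    """Reduce(Key2, List(Value2)) -> List(Value2): sum the counts for one key."""
    return [sum(values2)]

def run_job(documents):
    # Map phase: apply map_fn to every (doc_id, text) record.
    intermediate = defaultdict(list)
    for doc_id, text in documents.items():
        for k2, v2 in map_fn(doc_id, text):
            intermediate[k2].append(v2)  # shuffle: group values by key
    # Reduce phase: apply reduce_fn once per intermediate key.
    return {k2: reduce_fn(k2, vals)[0] for k2, vals in intermediate.items()}

counts = run_job({"d1": "big data big compute"})
```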
2. Naive Exhaustive Method
A naïve exhaustive method [7] has been proposed as an initial attempt at the problem of this research, and its pros and cons are summarized in the following table. The pro of this approach is that it is simple and exhaustive. The cons are that, as more parameters are considered, the benchmark time grows exponentially, and that the method is incapable of further optimization due to the limited range of parameter values. The cost of this approach is asymptotically a^p, where a is the number of values per parameter under consideration and p is the number of parameters. Thus, the cost of the method is upper bounded by exponential time.
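The a^p cost is easy to see in a sketch of the exhaustive search. The parameter names, value lists, and the benchmark stub below are hypothetical stand-ins for the real Hadoop configuration parameters and TeraSort runs.

```python
import itertools

# Illustrative parameter space: a = 3 values for each of p = 3 parameters.
PARAM_VALUES = {
    "io.sort.mb": [50, 100, 200],
    "io.sort.factor": [10, 50, 100],
    "mapred.reduce.tasks": [4, 8, 16],
}

def benchmark(setting):
    # Placeholder cost model; a real run would launch the benchmark job
    # with `setting` applied and return its measured CPU time.
    return sum(abs(v - 60) for v in setting.values())

def exhaustive_search(param_values):
    """Benchmark all a^p combinations and keep the best-performing setting."""
    names = list(param_values)
    best_setting, best_time = None, float("inf")
    for combo in itertools.product(*(param_values[n] for n in names)):
        setting = dict(zip(names, combo))
        t = benchmark(setting)
        if t < best_time:
            best_setting, best_time = setting, t
    return best_setting, best_time

best, _ = exhaustive_search(PARAM_VALUES)  # evaluates 3^3 = 27 settings
```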
3. 2-Phased Exhaustive Sorting
Building on the naïve exhaustive method, a 2-Phased Exhaustive Sorting-based algorithm has been developed and implemented [7]. The algorithm consists of two phases of sorting: the first phase is performed to obtain an extended list (e.g., the top 50 parameter settings) of top-performing parameter settings via a rough sorting against a small data size; the second phase then performs a detailed sorting against a large data size to produce a short list (e.g., the top 5 out of the above 50) of top-performing parameter settings. The flow of the approach [8] is shown in the figure below.
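The two phases above can be sketched as follows. The candidate settings and both benchmark functions are stubs: the small-data benchmark is cheap but noisy, the large-data benchmark expensive but accurate, which is the trade-off the 2-phase design exploits.

```python
import random

random.seed(0)
CANDIDATES = list(range(200))  # stand-ins for candidate parameter settings

def bench_small(c):
    # Cheap, rough measurement on a small dataset (noisy placeholder).
    return (c * 7919) % 1000 + random.uniform(-5, 5)

def bench_large(c):
    # Expensive, accurate measurement on a large dataset (placeholder).
    return (c * 7919) % 1000

def two_phase(candidates, wide=50, narrow=5):
    # Phase 1: rough sort of all candidates against small data; keep top 50.
    shortlist = sorted(candidates, key=bench_small)[:wide]
    # Phase 2: detailed sort of the shortlist against large data; keep top 5.
    return sorted(shortlist, key=bench_large)[:narrow]

top5 = two_phase(CANDIDATES)
```

Only the shortlisted candidates ever pay the large-data benchmark cost.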
4. Random Method
The random method [8] performs r rounds of random sampling, and each round samples a number of values from the given range, which is supposed to be wider than the a values used in the naïve exhaustive method. The following is pseudocode for the main routine implementing the proposed random algorithm. Each of the p parameters samples one value from its range during each of the r rounds of selection.
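A minimal sketch of such a main routine is given below. The parameter names, ranges, and the benchmark stub are illustrative assumptions, not the paper's actual configuration; the structure (r rounds, one sampled value per parameter per round) follows the description above.

```python
import random

# Wider value ranges than the exhaustive method's fixed a values.
PARAM_RANGES = {
    "io.sort.mb": range(50, 401),
    "io.sort.factor": range(10, 201),
    "mapred.reduce.tasks": range(1, 33),
}

def benchmark(setting):
    # Placeholder; a real implementation would run the benchmark job.
    return sum(setting.values())

def random_search(param_ranges, rounds):
    """r rounds; each round samples one value per parameter and benchmarks."""
    best_setting, best_time = None, float("inf")
    for _ in range(rounds):
        setting = {p: random.choice(list(r)) for p, r in param_ranges.items()}
        t = benchmark(setting)
        if t < best_time:
            best_setting, best_time = setting, t
    return best_setting, best_time

best, _ = random_search(PARAM_RANGES, rounds=20)
```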
5. Heuristically-based Greedy Method
The flow diagram of the proposed heuristically-based algorithm is shown below. During the parameter generation step (B), p parameters are selected. With those parameters, the TeraSort benchmark program is executed; during the (C) Get results function, the results are collected and saved into the Omega space Ω(i). During (D), the best-performing parameter setting is selected and fixed in a greedy manner. This iterates for p rounds.
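The greedy loop above can be sketched as follows. Parameter names, candidate values, defaults, and the benchmark stub are hypothetical; the structure (p rounds, per-round results collected into an Omega space, best value fixed each round) mirrors steps (B)-(D).

```python
# Illustrative candidate values and defaults for p = 3 parameters.
PARAM_VALUES = {
    "io.sort.mb": [50, 100, 200],
    "io.sort.factor": [10, 50, 100],
    "mapred.reduce.tasks": [4, 8, 16],
}
DEFAULTS = {"io.sort.mb": 100, "io.sort.factor": 10, "mapred.reduce.tasks": 4}

def benchmark(setting):
    # Placeholder cost; a real run would execute the TeraSort benchmark.
    return sum(abs(v - 60) for v in setting.values())

def greedy_tune(param_values, defaults):
    setting = dict(defaults)
    for param, values in param_values.items():   # p rounds total
        omega = {}                               # results space for this round
        for v in values:                         # a benchmark runs per round
            omega[v] = benchmark({**setting, param: v})
        setting[param] = min(omega, key=omega.get)  # fix the best value, greedily
    return setting

tuned = greedy_tune(PARAM_VALUES, DEFAULTS)
```

Each round performs a benchmark runs and fixes one parameter, giving roughly a * p runs in total rather than a^p.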
6. Experimental Results
Experimental Results Ω(0)
• i.s.r.p decreased by 20% and fixed
• CPU time reduced by 17.68%
• Cost: a * p (a = 18, p = 7)
Experimental Results Ω(1)
• m.j.s.m.p decreased by 40% and fixed
• CPU time reduced by 16.73%
• Cost: a * (p - 1) (a = 18, p = 7)
Ω(1) %   ParIdx  P(0)  P(1)  P(2)  P(3)  P(4)  P(5)  P(6)  CPU    diff
+40      4       100   0.04  0.8   10    0.42  0.66  0.7   40140  1290.71
-40      5       100   0.04  0.8   10    0.1   0.48  0.7   32350  -6499.3
+40      5       100   0.04  0.8   10    0.1   0.64  0.7   41220  2370.71
Ω(1)          -90%    -80%    -70%    -60%    -50%    -40%    -30%    -20%    -10%
i.s.m          2.86    6.8     3.4     6.44    6.82    6.31    8.57  -13.64    5.54
i.s.s.p      -13.87    6.31    8.68    4.25    2.53    7.7     5.69    6.95  -14.28
i.s.f        -12.28    6.93    8.34    0.93    5.15    4.76    1.14    6.62    5.79
m.j.r.i.b.f    3.99    6.28    2.6     2.32    6.31    4.66    2.06    6.62    9.4
m.j.s.m.p      7.21  -14.26    5.92    6.8     7.6   -16.73    4.48    3.04   -9.39
m.j.s.i.b.p    5.77    2.91    8.09  -12.56    6.64  -10.35    6.98  -10.4     6.8

Ω(1)           10%     20%     30%     40%     50%     60%     70%     80%     90%
i.s.m          4.4     9.11  -13.2     5.54    3.94    4.71  -14.67    7.67    8.24
i.s.s.p        6.72  -16.68    2.96    5.25    9.19  -11.58    8.75  -16.47    3.61
i.s.f        -12.76    6.08    2.06    6.64    4.07    5.85  -12.82    7.13  -10.42
m.j.r.i.b.f    5.54    6.77  -13.05    3.32    6.54    5.56    8.37    3.5     5.31
m.j.s.m.p      7.34  -10.42    2.78    6.1     7.47    6.8   -11.86  -13.07    6.15
m.j.s.i.b.p    3.4     5.82    8.73    2.04    7.42    5.92    5.38    7.11  -10.71
7. Conclusion
This research has proposed various parametric performance optimization algorithms to achieve the best possible performance in MapReduce computing on a Hadoop platform. There are three representative algorithms, whose pros and cons are concluded as follows:
- Naive Exhaustive: O(a^p); the best possible performance, yet infeasibly expensive
- Random: O(1 * p * r) ≈ O(p^2); moderate performance, feasible and inexpensive
- Heuristically-based: O(p^3); great performance, feasible and reasonably inexpensive
The naive exhaustive parametric optimization takes O(a^p) time; the resulting performance of the program is supposed to be the best, but the benchmarking is infeasible and very expensive. Thus, it is not considered a viable solution.
The random parametric optimization takes approximately quadratic time and is considered the least expensive approach. However, the resulting performance of the benchmark is limited to roughly average.
Lastly, the proposed heuristically-based parametric optimization takes slightly more than quadratic time, or at most cubic time, and the resulting performance is also promising with a feasible optimization time. The proposed heuristic employs a greedy approach, so the resulting performance is great rather than optimal, since it does not guarantee the optimal performance.
References
[1] "Hadoop Distributed File System Architecture." http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/images/hdfsarchitecture.png. Web. 20 Jan. 2015.
[2] Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. "The Google file system." ACM SIGOPS Operating Systems Review. Vol. 37. No. 5. ACM, 2003.
[3] "The Fresh Open Source Software Archive." http://fossies.org/linux/misc/hadoop-1.2.1.tar.gz/hadoop-1.2.1/docs/mapred-default.html. Web. 20 Jan. 2015.
[4] Noll, Michael G. "Benchmarking and stress testing an Hadoop cluster with TeraSort, TestDFSIO & co." http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench (2011).
[5] Hadoop 1.2.1 Documentation, mapred-default configuration parameters. https://hadoop.apache.org/docs/r1.2.1/mapred-default.htm. Web. 20 Jan. 2015.
[6] T. White, Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2012, pp. 166-168.
[7] Kim, Jongyeop, et al. "Performance evaluation and tuning for MapReduce computing in Hadoop distributed file system." Industrial Informatics (INDIN), 2015 IEEE 13th International Conference on. IEEE, 2015.
[8] Kim, Jongyeop, and Nohpill Park. "Identification of the Optimal Hadoop Configuration Parameter Sets for MapReduce Computing." 3rd International Conference on Applied Computing & Information Technology, July 2015.