Google Architecture
Saturday, November 22, 2008 at 10:01AM
Todd Hoff in BigTable, C, Cluster File System, Example, Geo-distributed Clusters, Java,
Linux, Map Reduce, Python
Update 2: Sorting 1 PB with MapReduce. PB is not peanut-butter-and-
jelly misspelled. It's 1 petabyte or 1000 terabytes or 1,000,000 gigabytes.
It took six hours and two minutes to sort 1PB (10 trillion 100-byte
records) on 4,000 computers and the results were replicated thrice on
48,000 disks.
Update: Greg Linden points to a new Google article MapReduce:
simplified data processing on large clusters. Some interesting stats:
100k MapReduce jobs are executed each day; more than 20 petabytes of
data are processed per day; more than 10k MapReduce programs have
been implemented; machines are dual processor with gigabit ethernet and
4-8 GB of memory.
Google is the King of scalability. Everyone knows Google for their large, sophisticated, and fast searching, but they don't just shine in search. Their platform approach to building scalable applications allows them to roll out internet-scale applications at an alarming, competition-crushing rate. Their goal is always to build a higher-performing, higher-scaling infrastructure to support their products. How do they do that?
Information Sources
1. Video: Building Large Systems at Google
2. Google Lab: The Google File System
3. Google Lab: MapReduce: Simplified Data Processing on
Large Clusters
4. Google Lab: BigTable.
5. Video: BigTable: A Distributed Structured Storage System.
6. Google Lab: The Chubby Lock Service for Loosely-Coupled
Distributed Systems.
7. How Google Works by David Carr in Baseline Magazine.
8. Google Lab: Interpreting the Data: Parallel Analysis with
Sawzall.
9. Dare Obasanjo's Notes on the scalability conference.
Platform
1. Linux
2. A large diversity of languages: Python, Java, C++
What's Inside?
The Stats
1. Estimated 450,000 low-cost commodity servers in 2006
2. In 2005 Google indexed 8 billion web pages. By now, who knows?
3. Currently there are over 200 GFS clusters at Google. A cluster can have
1000 or even 5000 machines. Pools of tens of thousands of
machines retrieve data from GFS clusters that run as large as 5
petabytes of storage. Aggregate read/write throughput can be as
high as 40 gigabytes/second across the cluster.
4. Currently there are 6000 MapReduce applications at Google and
hundreds of new applications are being written each month.
5. BigTable scales to store billions of URLs, hundreds of terabytes of
satellite imagery, and preferences for hundreds of millions of users.
The Stack
Google visualizes their infrastructure as a three layer stack:
1. Products: search, advertising, email, maps, video, chat, blogger
2. Distributed Systems Infrastructure: GFS, MapReduce, and BigTable.
3. Computing Platforms: a bunch of machines in a bunch of different
data centers
4. Make it easy for folks in the company to deploy at a low cost.
5. Look at price performance data on a per application basis. Spend
more money on hardware to not lose log data, but spend less on
other types of data. Having said that, they don't lose data.
Reliable Storage Mechanism with GFS (Google File System)
1. Reliable scalable storage is a core need of any application. GFS is
their core storage platform.
2. Google File System - a large distributed log-structured file system into which they throw a lot of data.
3. Why build it instead of using something off the shelf? Because they
control everything and it's the platform that distinguishes them from
everyone else. They required:
- high reliability across data centers
- scalability to thousands of network nodes
- huge read/write bandwidth requirements
- support for large blocks of data which are gigabytes in size.
- efficient distribution of operations across nodes to reduce
bottlenecks
4. System has master and chunk servers.
- Master servers keep metadata on the various data files. Data are stored in the file system in 64MB chunks. Clients talk to the master servers to perform metadata operations on files and to locate the chunk server that contains the data they need on disk.
- Chunk servers store the actual data on disk. Each chunk is replicated across three different chunk servers to create redundancy in case of server crashes. Once directed by a master server, a client application retrieves files directly from chunk servers (a minimal sketch of this read path follows the list).
5. A new application coming on line can use an existing GFS cluster or they can make their own. It would be interesting to understand the
provisioning process they use across their data centers.
6. Key is enough infrastructure to make sure people have choices for
their application. GFS can be tuned to fit individual application
needs.
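To make the master/chunk-server split concrete, here is a minimal Python sketch of a GFS-style read path. The class names and in-memory tables are invented for illustration; the sketch only mirrors the division of labor described above, where the master answers metadata questions and the chunk servers serve the bytes.

    CHUNK_SIZE = 64 * 1024 * 1024  # files are split into 64MB chunks

    class Master:
        """Keeps only metadata: which chunk servers hold which chunk."""
        def __init__(self):
            # (file_name, chunk_index) -> list of chunk server ids (3 replicas)
            self.chunk_locations = {}

        def locate(self, file_name, offset):
            chunk_index = offset // CHUNK_SIZE
            return chunk_index, self.chunk_locations[(file_name, chunk_index)]

    class ChunkServer:
        """Stores the actual chunk data on disk (a dict stands in here)."""
        def __init__(self):
            self.chunks = {}  # (file_name, chunk_index) -> bytes

        def read(self, file_name, chunk_index):
            return self.chunks[(file_name, chunk_index)]

    def client_read(master, servers, file_name, offset):
        # 1. Metadata operation: ask the master where the chunk lives.
        chunk_index, replicas = master.locate(file_name, offset)
        # 2. Data operation: fetch bytes directly from one replica;
        #    the master never sits on the data path.
        chunk = servers[replicas[0]].read(file_name, chunk_index)
        return chunk[offset % CHUNK_SIZE:]

Keeping the master off the data path is what lets a single master scale: it answers small metadata questions while the heavy read/write bandwidth flows straight between clients and chunk servers.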
Do Something With the Data Using MapReduce
1. Now that you have a good storage system, how do you do anything with so much data? Let's say you have many TBs of data stored across 1000 machines. Databases don't scale, or don't scale cost-effectively, to those levels. That's where MapReduce comes in.
2. MapReduce is a programming model and an associated
implementation for processing and generating large data sets. Users
specify a map function that processes a key/value pair to generate a
set of intermediate key/value pairs, and a reduce function that
merges all intermediate values associated with the same
intermediate key. Many real world tasks are expressible in this
model. Programs written in this functional style are automatically
parallelized and executed on a large cluster of commodity machines.
The run-time system takes care of the details of partitioning the
input data, scheduling the program's execution across a set of
machines, handling machine failures, and managing the required
inter-machine communication. This allows programmers without
any experience with parallel and distributed systems to easily utilize
the resources of a large distributed system.
3. Why use MapReduce?
- Nice way to partition tasks across lots of machines.
- Handle machine failure.
- Works across different application types, like search and ads.
Almost every application has map reduce type operations. You can
precompute useful data, find word counts, sort TBs of data, etc.
- Computation can automatically move closer to the IO source.
4. The MapReduce system has three different types of servers.
- The Master server assigns user tasks to map and reduce servers. It
also tracks the state of the tasks.
- The Map servers accept user input and perform map operations on it. The results are written to intermediate files.
- The Reduce servers accept intermediate files produced by map servers and perform reduce operations on them.
5. For example, you want to count the number of words in all web
pages. You would feed all the pages stored on GFS into
MapReduce. This would all be happening on 1000s of machines
simultaneously and all the coordination, job scheduling, failure
handling, and data transport would be done automatically.
- The steps look like: GFS -> Map -> Shuffle -> Reduction -> Store
Results back into GFS.
- In MapReduce a map maps one view of data to another, producing
a key value pair, which in our example is word and count.
- Shuffling aggregates values by key.
- The reduction sums up all the key/value pairs and produces the final answer (see the word-count sketch after this list).
6. The Google indexing pipeline has about 20 different map reductions. A pipeline looks at data as a whole bunch of records and aggregates keys. A second MapReduce comes along, takes that result, and does something else. And so on.
7. Programs can be very small. As little as 20 to 50 lines of code.
8. One problem is stragglers. A straggler is a computation that is going slower than others, which holds up everyone. Stragglers may happen because of slow IO (say a bad controller) or from a temporary CPU spike. The solution is to run multiple copies of the same computation and, when one is done, kill all the rest.
9. Data transferred between map and reduce servers is compressed. The idea is that because servers aren't CPU bound it makes sense to spend CPU on data compression and decompression in order to save on bandwidth and I/O (a small sketch of this trade follows the list).
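As promised in step 5, here is a toy, single-process version of the word-count data flow in Python. The function names are ours, and a real run would spread each phase across thousands of machines with GFS on both ends, but the map -> shuffle -> reduce flow is the same.

    from collections import defaultdict

    def map_phase(page):
        # Map: emit one (word, 1) pair per word on the page.
        for word in page.split():
            yield word, 1

    def shuffle(pairs):
        # Shuffle: group all intermediate values by key.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(key, values):
        # Reduce: sum the counts for one word.
        return key, sum(values)

    pages = ["the quick brown fox", "the lazy dog"]
    intermediate = [pair for page in pages for pair in map_phase(page)]
    counts = dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
    print(counts)  # {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}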
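And the CPU-for-bandwidth trade from step 9 can be demonstrated with Python's standard zlib module; the payload and compression level below are illustrative choices, not Google's actual settings.

    import zlib

    # Intermediate map output tends to be repetitive, so it compresses well.
    payload = b"word\t1\n" * 100_000
    compressed = zlib.compress(payload, level=1)  # low level: cheap CPU, big savings
    print(len(payload), "->", len(compressed))    # far fewer bytes on the wire
    assert zlib.decompress(compressed) == payload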
Storing Structured Data in BigTable
1. BigTable is a large scale, fault tolerant, self managing system that
includes terabytes of memory and petabytes of storage. It can handle
millions of reads/writes per second.
2. BigTable is a distributed hash mechanism built on top of GFS. It is
not a relational database. It doesn't support joins or SQL type
queries.
3. It provides a lookup mechanism to access structured data by key. GFS stores opaque data, and many applications need data with structure.
4. Commercial databases simply don't scale to this level and they don't work across 1000s of machines.
5. By controlling their own low level storage system Google gets more
control and leverage to improve their system. For example, if they
want features that make cross data center operations easier, they can
build it in.
6. Machines can be added and deleted while the system is running and
the whole system just works.
7. Each data item is stored in a cell which can be accessed using a row key, column key, or timestamp (a toy sketch of this addressing follows the list).
8. Each row is stored in one or more tablets. A tablet is a sequence of
64KB blocks in a data format called SSTable.
9. BigTable has three different types of servers:
- The Master servers assign tablets to tablet servers. They track where tablets are located and redistribute tasks as needed.
- The Tablet servers process read/write requests for tablets. They split tablets when they exceed size limits (usually 100MB - 200MB). When a tablet server fails, 100 tablet servers each pick up 1 new tablet and the system recovers.
- The Lock servers form a distributed lock service. Operations like opening a tablet for writing, Master arbitration, and access control checking require mutual exclusion.
10. A locality group can be used to physically store related bits of data
together for better locality of reference.
11. Tablets are cached in RAM as much as possible.
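A toy model of the cell addressing from steps 7 and 8, with a plain Python dict standing in for tablets and SSTables. The class name and the example row key are invented for illustration; the point is just the (row key, column key, timestamp) access pattern.

    import time

    class ToyBigTable:
        """Cells keyed by (row, column); each cell keeps timestamped versions."""
        def __init__(self):
            self.cells = {}  # (row_key, column_key) -> {timestamp: value}

        def write(self, row_key, column_key, value, ts=None):
            ts = ts if ts is not None else time.time()
            self.cells.setdefault((row_key, column_key), {})[ts] = value

        def read(self, row_key, column_key, ts=None):
            versions = self.cells[(row_key, column_key)]
            if ts is None:
                ts = max(versions)  # newest version by default
            return versions[ts]

    table = ToyBigTable()
    table.write("com.cnn.www", "contents:", "<html>...</html>")
    print(table.read("com.cnn.www", "contents:"))

Note what is missing compared to a relational database: no joins, no SQL, just fast keyed lookups over structured cells.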
Hardware
1. When you have a lot of machines how do you build them to be cost
efficient and use power efficiently?
2. Use ultra cheap commodity hardware and build software on top to handle their death.
3. A 1,000-fold computer power increase can be had for a 33 times lower cost if you use a failure-prone infrastructure rather than an infrastructure built on highly reliable components. You must build reliability on top of unreliability for this strategy to work (a worked reading of this ratio follows the list).
4. Linux, in-house rack design, PC class mother boards, low end
storage.
5. Price per watt on a performance basis isn't getting better. They have huge power and cooling issues.
6. Use a mix of colocation and their own data centers.
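One way to read the ratio in step 3, worked out as arithmetic; this interpretation is ours, and only the two ratios come from the text.

    # On this reading: failure-prone commodity gear delivers ~1,000x the
    # compute at ~1/33 the cost of highly reliable components, so
    # price/performance improves by roughly 1,000 * 33 = 33,000x.
    compute_gain = 1_000
    cost_reduction = 33
    price_performance_gain = compute_gain * cost_reduction
    print(price_performance_gain)  # 33000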
Misc
1. Push changes out quickly rather than wait for QA.
2. Libraries are the predominant way of building programs.
3. Some applications are provided as services, like crawling.
4. An infrastructure handles versioning of applications so they can be released without a fear of breaking things.
Future Directions for Google
1. Support geo-distributed clusters.
2. Create a single global namespace for all data. Currently data is
segregated by cluster.
3. More and better automated migration of data and computation.
4. Solve consistency issues that happen when you couple wide area
replication with network partitioning (e.g. keeping services up even
if a cluster goes offline for maintenance or due to some sort of
outage).
Lessons Learned
1. Infrastructure can be a competitive advantage. It certainly is for Google. They can roll out new internet services faster, cheaper, and at a scale few others can compete with. Many companies take a completely different approach, treating infrastructure as an expense. Each group will use completely different technologies and there will be little planning and commonality in how to build systems. Google thinks of themselves as a systems engineering company, which is a very refreshing way to look at building software.
2. Spanning multiple data centers is still an unsolved problem.
Most websites are in one and at most two data centers. How to fully
distribute a website across a set of data centers is, shall we say,
tricky.
3. Take a look at Hadoop if you don't have the time to rebuild all this
infrastructure from scratch yourself. Hadoop is an open source
implementation of many of the same ideas presented here.
4. An underappreciated advantage of a platform approach is that junior developers can quickly and confidently create robust applications on top of the platform. If every project needs to reinvent the same distributed infrastructure wheel you'll run into difficulty, because the people who know how to do this are relatively rare.
5. Synergy isn't always crap. By making all parts of a system work
together an improvement in one helps them all. Improve the file
system and everyone benefits immediately and transparently. If
every project uses a different file system then there's no continual
incremental improvement across the entire stack.
6. Build self-managing systems that work without having to take
the system down. This allows you to more easily rebalance
resources across servers, add more capacity dynamically, bring
machines off line, and gracefully handle upgrades.
7. Create a Darwinian infrastructure. Perform time consuming operations in parallel and take the winner (a sketch follows this list).
8. Don't ignore the Academy. Academia has a lot of good ideas that
don't get translated into production environments. Most of what
Google has done has prior art, just not prior large scale deployment.
9. Consider compression. Compression is a good option when you
have a lot of CPU to throw around and limited IO.
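Lesson 7 (and the straggler fix in the MapReduce section) is speculative execution: launch redundant copies of the same work and keep the first finisher. A minimal sketch using Python's standard concurrent.futures module; the task itself is a stand-in.

    import random
    import time
    from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

    def attempt(task_id):
        # The same computation running at different speeds; a straggler
        # might be stuck behind a bad disk controller or a CPU spike.
        time.sleep(random.uniform(0.1, 2.0))
        return f"result from attempt {task_id}"

    pool = ThreadPoolExecutor(max_workers=3)
    futures = [pool.submit(attempt, i) for i in range(3)]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    print(done.pop().result())                      # take the winner...
    pool.shutdown(wait=False, cancel_futures=True)  # ...drop the rest (Python 3.9+)

The redundant work costs a few extra machine-hours; what it buys is a tight tail latency, since no single slow machine can hold up the whole job.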
Article originally appeared on High Scalability (http://highscalability.com/).
See website for complete article licensing information.