This document provides an overview of peer-to-peer (P2P) networks. It discusses the limitations of traditional client-server architectures and how P2P systems address these limitations through decentralized architectures. Several popular unstructured P2P networks are described, including Napster, Gnutella, Kazaa, Freenet, and BitTorrent. Napster used a centralized server for indexing files while Gnutella employed decentralized query flooding. Kazaa introduced superpeers for improved scalability. Freenet routes queries along reverse paths to preserve anonymity. BitTorrent addresses free-riding and improves file distribution performance.


Peer-to-Peer (P2P) Networks

Dr. Yingwu Zhu


Overview
• CS Architecture
• P2P Architecture
• Unstructured P2P Networks
– Napster, Gnutella, KaZaA, Freenet
– BitTorrent
• Structured P2P Networks
– Chord, Pastry, Tapestry, CAN
– Won’t be covered here!
Client/Server Architecture
• Well-known, powerful, reliable server is a data source
• Clients request data from the server
• Very successful model
– WWW (HTTP), FTP, Web services, etc.

As more clients are added, the demand on the server increases!
Client/Server Limitations
• Scalability is hard to achieve
• Presents a single point of failure
• Requires administration
• Unused resources at the network edge
– CPU cycles, storage, etc.
• P2P systems try to address these
limitations
Why Study P2P?
• Huge fraction of traffic on networks today
>=50%!
• Exciting new applications
• Next level of resource sharing
– Vs. timesharing, client-server, P2P
– E.g. Access 10’s-100’s of TB at low cost.
Users and Usage
• 60M users of file-sharing in US
– 8.5M logged in at a given time on average
• 814M units of music sold in US last year
– 140M digital tracks sold by music companies
• As of Nov, 35% of all Internet traffic was for
BitTorrent, a single file-sharing system
• Major legal battles underway between
recording industry and file-sharing companies
Share of Internet Traffic / Number of Users
[Chart: P2P applications' share of Internet traffic and numbers of users. "Others" include BitTorrent, eDonkey, iMesh, Overnet, Gnutella. BitTorrent (and others) gaining share from FastTrack (Kazaa).]
P2P Computing
• P2P computing is the sharing of computer
resources and services by direct exchange between
systems.
• These resources and services include the exchange
of information, processing cycles, cache storage,
and disk storage for files.
• P2P computing takes advantage of existing
computing power, computer storage and
networking connectivity, allowing users to leverage
their collective power to the “benefit” of all.
P2P Architecture
• All nodes are both
clients and servers
– Provide and consume data
– Any node can initiate a
connection
• No centralized data
source
– “The ultimate form of democracy on the Internet”
– “The ultimate threat to copyright protection on the Internet”
What is P2P?
• A distributed system
architecture
– No centralized control
– Typically many nodes, but
unreliable and heterogeneous
– Nodes are symmetric in function
– Take advantage of distributed,
shared resources (bandwidth,
CPU, storage) on peer-nodes
– Fault-tolerant, self-organizing
– Operate in dynamic
environment, frequent join and
leave is the norm
P2P Network Characteristics
• Clients are also servers and routers
– Nodes contribute content, storage, memory, CPU
• Nodes are autonomous (no administrative authority)
• Network is dynamic: nodes enter and leave the
network “frequently”
• Nodes collaborate directly with each other (not
through well-known servers)
• Nodes have widely varying capabilities
P2P vs. Client/Server
• Pure P2P:
– No central server
– For certain requests any peer can function as a client,
as a router, or as a server
– The information is not located in a central location but
is distributed among all peers
– A peer may need to communicate with multiple peers
to locate a piece of information

As more peers are added, both demand and capacity of the network increase!
P2P Benefits
• Efficient use of resources
– Unused bandwidth, storage, processing power at the edge of the
network
• Scalability
– Consumers of resources also donate resources
– Aggregate resources grow naturally with utilization
• Reliability
– Replicas
– Geographic distribution
– No single point of failure
• Ease of administration
– Nodes self organize
– No need to deploy servers to satisfy demand (c.f. scalability)
– Built-in fault tolerance, replication, and load balancing
P2P Traffic
• P2P networks generate more traffic than any other Internet application
• 2/3 of all bandwidth on some backbones
P2P Data Flow
[Chart: CacheLogic P2P file format analysis (2005); Streamsight used for Layer-7 Deep Packet Inspection]
Category of P2P Systems
• Unstructured
– No restriction on overlay structures and data
placement
– Napster, Gnutella, KaZaA, Freenet, BitTorrent
• Structured
– Distributed hash tables (DHTs)
– Place restrictions on overlay structures and
data placement
– Chord, Pastry, Tapestry, CAN
Napster
• Share music files, MP3 data
• Nodes register their contents (list of files) and IPs with the server
• Centralized server for searches
– The client sends queries to the centralized server for files of interest
– Keyword search (artist, song, album, bitrate, etc.)
• Napster server replies with the IP addresses of users with matching files
• File download done on a peer-to-peer basis
• Poor scalability
• Single point of failure
• Legal issues → shutdown
[Figure: clients send a Query to the central Server and receive a Reply; the File Transfer then happens directly between peers]
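As a minimal sketch of the centralized-index idea (hypothetical class and method names, not Napster's actual wire protocol), the server only maps file names to the addresses of the peers that registered them; the file transfer itself never touches the server:

```python
# Minimal sketch of a Napster-style centralized index (illustrative only).
# The server maps file names to the peers that hold them; downloads happen
# directly between peers.

class CentralIndex:
    def __init__(self):
        self.index = {}          # file name -> set of peer addresses

    def publish(self, peer_addr, files):
        """A peer registers its IP and the list of files it shares."""
        for name in files:
            self.index.setdefault(name, set()).add(peer_addr)

    def search(self, keyword):
        """Return peers whose file names contain the keyword."""
        return {name: sorted(peers)
                for name, peers in self.index.items()
                if keyword.lower() in name.lower()}

# Usage: publish, then search; the reply is just addresses, never file data.
server = CentralIndex()
server.publish("123.2.21.23", ["X.mp3", "Y.mp3", "Z.mp3"])
server.publish("123.2.0.18", ["A.mp3"])
print(server.search("A"))        # {'A.mp3': ['123.2.0.18']}
```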
Napster: Publish
[Figure: peer 123.2.21.23 publishes “I have X, Y, and Z!” to the central server, which records insert(X, 123.2.21.23), ...]
Napster: Search
[Figure: a peer sends the query “Where is file A?” as search(A) to the central server; the reply names 123.2.0.18, and the peer fetches the file directly from 123.2.0.18]
Napster
• Central Napster server
– Can ensure correct results
– Bottleneck for scalability
– Single point of failure
– Susceptible to denial of service
• Malicious users
• Lawsuits, legislation
• Search is centralized
• File transfer is direct (peer-to-peer)
Gnutella: Query Flooding
Breadth-First Search (BFS)
[Figure legend: source, forwarded query, processed query, found result, forwarded response]
Gnutella: Query Flooding
• A node/peer connects to a set of Gnutella neighbors
• Forward queries to neighbors
• The client that has the information responds
• Flood the network, with a TTL for termination
+ Results are complete
– Bandwidth wastage
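A toy sketch of flooding with a TTL over an in-memory overlay (illustrative only; real Gnutella exchanges Query/QueryHit descriptors over TCP and suppresses duplicates by message ID):

```python
# Sketch of TTL-limited query flooding over a toy overlay graph.

def flood(overlay, files, start, wanted, ttl):
    """Breadth-first flood: each node forwards the query to its neighbors,
    dropping duplicates, until the TTL expires; nodes holding the item hit."""
    hits, seen, frontier = [], {start}, [start]
    while frontier and ttl > 0:
        nxt = []
        for node in frontier:
            for nb in overlay[node]:
                if nb in seen:
                    continue                   # this node already saw the query
                seen.add(nb)
                if wanted in files.get(nb, set()):
                    hits.append(nb)            # QueryHit routed back toward `start`
                nxt.append(nb)
        frontier, ttl = nxt, ttl - 1
    return hits

overlay = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
files = {"d": {"song.mp3"}}
print(flood(overlay, files, "a", "song.mp3", ttl=3))   # ['d']
```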
Gnutella vs. Napster
• Decentralized
– No single point of failure
– Not as susceptible to denial of service
– Cannot ensure correct results
• Flooding queries
– Search is now distributed but still not scalable
Gnutella: Random Walk
• Improvement over query flooding
• Same overlay structure as Gnutella
• Forward the query to a random subset of its neighbors
+ Reduced bandwidth requirements
– Incomplete results
– High latency
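A matching sketch of the random-walk variant, on the same toy overlay as the flooding example above (real implementations typically launch several walkers in parallel and bound them with a TTL):

```python
# Sketch of a single random walker: hand the query to one randomly chosen
# neighbor per hop instead of flooding; cheaper, but may miss existing results.
import random

def random_walk(overlay, files, start, wanted, max_hops):
    node = start
    for _ in range(max_hops):
        node = random.choice(overlay[node])     # forward to one random neighbor
        if wanted in files.get(node, set()):
            return node                         # hit: reply routed back to start
    return None                                 # walk ended without finding it

overlay = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
files = {"d": {"song.mp3"}}
print(random_walk(overlay, files, "a", "song.mp3", max_hops=8))  # 'd' or None
```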
KaZaA (FastTrack Networks)
• Hybrid of centralized Napster and decentralized
Gnutella
• Super-peers act as local search hubs
– Each super-peer is similar to a Napster server for a small portion
of the network
– Super-peers are automatically chosen by the system based on
their capacities (storage, bandwidth, etc.) and availability
(connection time)
• Users upload their list of files to a super-peer
• Super-peers periodically exchange file lists
• You send queries to a super-peer for files of interest
– The local super-peer may flood the queries to other super-peers
for the files of interest, if it cannot satisfy the queries.
• Exploit the heterogeneity of peer nodes
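A rough two-level search sketch in the spirit of this super-peer design (hypothetical structure; the real FastTrack protocol is closed and encrypted): leaf peers upload their file lists to a super-peer, and queries are answered locally or forwarded only among super-peers:

```python
# Two-level (super-peer) search sketch: ordinary peers register file lists
# with a super-peer; queries are flooded only among super-peers.

class SuperPeer:
    def __init__(self, name):
        self.name = name
        self.local_index = {}        # file name -> set of leaf-peer addresses
        self.neighbors = []          # other super-peers

    def register(self, peer_addr, file_names):
        for f in file_names:
            self.local_index.setdefault(f, set()).add(peer_addr)

    def search(self, keyword, forwarded=False):
        hits = {f: peers for f, peers in self.local_index.items()
                if keyword in f}
        if not hits and not forwarded:
            for sp in self.neighbors:            # never queries leaf peers
                hits.update(sp.search(keyword, forwarded=True))
        return hits

sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.neighbors, sp2.neighbors = [sp2], [sp1]
sp2.register("10.0.0.7", ["trailer.avi"])
print(sp1.search("trailer"))     # found via sp2; leaf peers are never contacted
```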
KaZaA
• Uses supernodes to improve scalability, establish hierarchy
– Uptime, bandwidth
• Closed-source
• Uses HTTP to carry out downloads
• Encrypted protocol; queuing, QoS
KaZaA: Network Design
[Figure: ordinary peers attach to “Super Nodes”, which interconnect with one another]
KaZaA: File Insert
[Figure: peer 123.2.21.23 publishes “I have X!” to its super-peer, which records insert(X, 123.2.21.23), ...]
KaZaA: File Search
[Figure: a peer asks its super-peer “Where is file A?”; search(A) is forwarded among super-peers, and the query replies point to 123.2.22.50 and 123.2.0.18]
Freenet
• Data flows in reverse path of query
– Impossible to know if a user is initiating or forwarding a
query
– Impossible to know if a user is consuming or forwarding
data
• “Smart” queries
– Requests get routed to the correct peer by incremental discovery
BitTorrent
• A popular P2P application for file exchange!
Problems to Address
• Traditional Client/Server Sharing
– Performance deteriorates rapidly as the
number of clients increases
• Free-riding in P2P network
– Free riders only download without
contributing to the network.
Basic Idea
• Chop file into many pieces
– A piece is broken into sub-pieces ... typically 16KB in size
– Policy: Until a piece is assembled, only download sub-pieces for
that piece
– This policy lets complete pieces assemble quickly

• Replicate DIFFERENT pieces on different peers as soon as possible
• As soon as a peer has a complete piece, it can trade it
with other peers
• Hopefully, we will be able to assemble the entire file at
the end
File Organization
[Figure: the file is divided into pieces 1 2 3 4; each piece is 256KB and each block (sub-piece) is 16KB; an incomplete piece, with only some blocks present, is also shown]
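A small sketch of the bookkeeping implied by this layout, using the 256KB/16KB figures above (the helper name and output format are made up for illustration):

```python
# Sketch of piece/block layout: a file is split into 256 KB pieces, each
# piece into 16 KB blocks (sub-pieces); only whole pieces are ever traded.

PIECE_LEN = 256 * 1024
BLOCK_LEN = 16 * 1024

def layout(file_size):
    """Return {piece index: number of blocks} for a file of file_size bytes."""
    pieces = {}
    for p in range((file_size + PIECE_LEN - 1) // PIECE_LEN):
        plen = min(PIECE_LEN, file_size - p * PIECE_LEN)   # last piece may be short
        pieces[p] = (plen + BLOCK_LEN - 1) // BLOCK_LEN
    return pieces

print(layout(1_000_000))   # {0: 16, 1: 16, 2: 16, 3: 14} -- 4 pieces, last one partial
```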
Overall Architecture
[Figure sequence: a Web Server hosts a web page with a link to the .torrent file; a Tracker and peers A, B, C (seeds and leeches) are already in the swarm. The downloader “US” (a leech) (1) fetches the .torrent from the web server, (2) sends a get-announce to the tracker, (3) receives a response with a peer list, (4) shakes hands with peers on that list, (5) exchanges pieces with them, and (6) periodically repeats the get-announce/response exchange with the tracker while pieces keep flowing.]
Critical Elements
• 1 A web server
– To provide the ‘metainfo’ file by HTTP
– For example:
• http://bt.btchina.net
• http://bt.ydy.com/
[Figure: a web server hosting The Lord of Ring.torrent and Troy.torrent]
Critical Elements
• 2 The .torrent file
– Static ‘metainfo’ file containing the necessary information:
• Name
• Size
• Checksum
• IP address (URL) of the Tracker
• Pieces <hash1, hash2, …, hashn>
• Piece length
[Figure: example Matrix.torrent]
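A sketch of the same fields assembled in code; real .torrent files are bencoded binary, but the structure below mirrors the standard announce/info layout (the file contents and names here are made up):

```python
# Sketch of the fields a .torrent 'metainfo' file carries, built as a plain
# dict for readability (real torrents are bencoded).
import hashlib

PIECE_LEN = 256 * 1024

def make_metainfo(data: bytes, name: str, announce_url: str):
    # concatenated 20-byte SHA-1 digests, one per piece
    pieces = b"".join(
        hashlib.sha1(data[i:i + PIECE_LEN]).digest()
        for i in range(0, len(data), PIECE_LEN))
    return {
        "announce": announce_url,                 # tracker URL
        "info": {
            "name": name,                         # suggested file name
            "length": len(data),                  # total size in bytes
            "piece length": PIECE_LEN,
            "pieces": pieces,                     # <hash1, hash2, ..., hashn>
        },
    }

meta = make_metainfo(b"x" * 600_000, "Matrix.avi", "http://bt.cnxp.com:8080/announce")
print(len(meta["info"]["pieces"]) // 20, "piece hashes")   # 3
```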
Critical Elements
• 3 A BitTorrent tracker
– Non-content-sharing node
– Track peers
– For example:
• http://bt.cnxp.com:8080/announce
• http://btfans.3322.org:6969/announce
• Peer cache
– IP, port, peer id
• State information
– Completed
– Downloading
• Returns a random list of peers
Critical Elements
• 4 An end user (peer)
– Users who want to use BitTorrent must install the corresponding software or a plug-in for their web browser.
– Downloader (leecher): a peer that has only a part (or none) of the file.
– Seeder: a peer that has the complete file and chooses to stay in the system to allow other peers to download.
Messages
• Peer – Peer messages
– TCP Sockets
• Peer – Tracker messages
– HTTP Request/Response
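A sketch of the peer-to-tracker announce as an HTTP GET with the usual query parameters (the tracker URL, info_hash, and peer_id values below are placeholders):

```python
# Sketch of the peer -> tracker announce: an HTTP GET against the tracker's
# announce URL with the standard query parameters.
from urllib.parse import urlencode

def announce_url(tracker, info_hash, peer_id, port, left, event="started"):
    params = {
        "info_hash": info_hash,     # 20-byte SHA-1 of the bencoded info dict
        "peer_id": peer_id,         # 20-byte id chosen by this client
        "port": port,               # port we accept peer connections on
        "uploaded": 0,
        "downloaded": 0,
        "left": left,               # bytes still needed
        "event": event,
    }
    return tracker + "?" + urlencode(params)

print(announce_url("http://mytracker.com:6969/announce",
                   b"\x12" * 20, b"-PY0001-012345678901", 6881, left=600_000))
# The tracker replies (bencoded) with a re-announce interval and a random list of peers.
```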
BitTorrent – joining a torrent
[Figure: a new leecher (1) obtains the .torrent metadata file from a website, (2) joins by contacting the tracker, (3) receives a peer list, and (4) requests data from a seed/leecher]

Peers are divided into:
• seeds: have the entire file
• leechers: still downloading

1. obtain the metadata file
2. contact the tracker
3. obtain a peer list (contains seeds & leechers)
4. contact peers from that list for data
BitTorrent – exchanging data
[Figure: leecher A announces “I have <piece>!” to leecher B, leecher C, and the seed]

● Verify pieces using hashes
● Download sub-pieces in parallel
● Advertise received pieces to the entire peer list
● Look for the rarest pieces
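A sketch of the hash check: once all blocks of a piece have arrived, the assembled piece is hashed (SHA-1 in standard BitTorrent) and compared against the corresponding 20-byte entry in the metainfo's pieces string before the piece is advertised:

```python
# Sketch of piece verification against the metainfo "pieces" field.
import hashlib

def piece_ok(piece_index: int, piece_data: bytes, pieces_field: bytes) -> bool:
    expected = pieces_field[piece_index * 20:(piece_index + 1) * 20]
    return hashlib.sha1(piece_data).digest() == expected

data = b"hello world" * 1000
pieces_field = hashlib.sha1(data).digest()           # single-piece file
print(piece_ok(0, data, pieces_field))               # True -> advertise with HAVE
print(piece_ok(0, data + b"corrupt", pieces_field))  # False -> re-download the piece
```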
BitTorrent – unchoking
[Figure: leecher A exchanges data with leechers B, C, D and the seed]

● Periodically calculate data-receiving rates
● Upload to (unchoke) the fastest downloaders
● Optimistic unchoking
• periodically select a peer at random and upload to it
• continuously look for the fastest partners
Demo
[Figure: the user issues HTTP GET MYFILE.torrent to the webserver and receives MYFILE.torrent, which carries the piece hashes and names the tracker http://mytracker.com:6969/; the user “registers” with the tracker and gets back a list of peers (ID1 169.237.234.1:6881, ID2 190.50.34.6:5692, ..., ID50), then exchanges data with Peer 1, Peer 2, ..., Peer 40]
Swarming Pieces and Sub-pieces
• A piece, typically 256KB, is broken into 16KB sub-pieces.
• Until a piece is assembled, only sub-pieces for that piece are downloaded.
• This ensures that complete pieces assemble quickly.
• When transferring data over TCP, it is critical to always have several requests pending at once, to avoid a delay between pieces being sent.
• At any point in time, some number of requests, typically 5, are outstanding simultaneously.
• On piece completion, notify all (neighbor) peers.
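A sketch of that pipelining rule: keep a fixed number of block requests (typically 5) outstanding per connection and top the set up as soon as a block arrives (`send_request` below stands in for the real request message):

```python
# Sketch of request pipelining: keep PIPELINE_DEPTH block requests in flight
# per connection so the TCP pipe never drains between blocks.
from collections import deque

PIPELINE_DEPTH = 5

def pump(pending_blocks: deque, outstanding: set, send_request):
    """Top up the outstanding-request set to PIPELINE_DEPTH."""
    while pending_blocks and len(outstanding) < PIPELINE_DEPTH:
        blk = pending_blocks.popleft()
        outstanding.add(blk)
        send_request(blk)

def on_block_received(blk, pending_blocks, outstanding, send_request):
    outstanding.discard(blk)
    pump(pending_blocks, outstanding, send_request)   # immediately refill

# Usage: queue the 16 blocks of one piece; 5 requests stay in flight.
pending = deque((0, b) for b in range(16))
outstanding = set()
pump(pending, outstanding, send_request=lambda blk: print("request", blk))
```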
Piece Selection
• The order of pieces is very important for good
performance.
• A bad algorithm could result in all peers
waiting for the same missing piece.
• Random Piece First policy
– Initially a peer has no pieces to trade, so it is important to get a piece ASAP.
– Policy: Peer starts with a random piece to
download.
• Rarest Piece First policy
– Policy: Download the pieces which are most rare
among your peers.
– Ensures most common pieces are left for last.
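A sketch of rarest-first selection, assuming we know each neighbor's piece set from its bitfield/HAVE messages; the neighbor piece sets match the Rarest First Policy figure below (names and tie-breaking are illustrative):

```python
# Sketch of rarest-first piece selection: count copies of each missing piece
# among the neighbors and pick the least-replicated one.
import random
from collections import Counter

def rarest_first(missing, neighbor_pieces):
    """missing: set of piece indices we still need.
    neighbor_pieces: dict peer -> set of piece indices that peer has."""
    counts = Counter(p for pieces in neighbor_pieces.values()
                     for p in pieces if p in missing)
    if not counts:
        return None                          # nobody has anything we need
    rarest = min(counts.values())
    return random.choice([p for p, c in counts.items() if c == rarest])

neighbors = {"A": {12, 7, 36}, "B": {12, 7, 14}, "C": {14}}
print(rarest_first({7, 14, 36}, neighbors))   # 36 (only one copy among neighbors)
```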
Rarest First Policy
[Figure: one neighbor with pieces {12, 7, 36} advertises HAVE <12,7,36>, another with {12, 7, 14} advertises HAVE <12,7,14>, and another with {14} advertises HAVE <14>; piece 36 is the rarest among the neighbors and is picked first]
End Game mode
• When all the sub-pieces that a peer doesn’t
have are requested, a request is sent to
every peer.
• When the sub-piece arrives, duplicate
requests are canceled.
• This ensures that completion is not prevented by a slow peer.
Tit-for-Tat Strategy
“Give and ye shall receive”

• Cooperate if the other peer cooperates.
• Choking mechanism.
• Choke all peers except the top 4 uploaders.
• Optimistic unchoke for eventual cooperation and recovery.
Tit-for-Tat
[Figure: initially Peer 1 is unchoked and Peer 2 is choked; after Peer 1 turns out to be a slow uploader, Peer 1 is choked and Peer 2 is unchoked]
Choking
• Ensures every node cooperates and prevents the free-riding problem.
• Goal is to have several bidirectional connections running continuously.
• Choking is a temporary refusal to upload; downloading occurs as normal.
• The connection is kept open so that setup costs are not borne again and again.
• At a given time only the 4 best peers are unchoked.
• Evaluation on whom to choke/unchoke is performed every 10 seconds.
• Optimistic unchoke every 30 seconds.
– Gives a chance for a newly joined peer to get data to download
Choking Algorithm
• Goal is to have several bidirectional
connections running continuously
• Upload to peers who have uploaded to
you recently
• Unutilized connections are uploaded to on
a trial basis to see if better transfer rates
could be found using them
Choking Specifics
• A peer always unchokes a fixed number of its peers
(default of 4)
• Decisions to choke/unchoke are based on current download rates, evaluated on a rolling 20-second average
• Evaluation on who to choke/unchoke is performed
every 10 seconds
– This prevents wastage of resources by rapidly choking/unchoking
peers
– Supposedly enough for TCP to ramp up transfers to their full capacity
• Which peer is the optimistic unchoke is rotated every
30 seconds
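A simplified sketch of these rules combined: every 10-second round, rank peers by their 20-second average download rate, unchoke the top 4, and roughly every 30 seconds rotate one extra optimistic unchoke (the slot count and timing follow the text above; everything else is illustrative):

```python
# Sketch of the periodic choke/unchoke decision (simplified).
import random

UNCHOKE_SLOTS = 4

def choose_unchoked(rates, round_no, current_optimistic=None):
    """rates: dict peer -> bytes/s received from that peer (20 s average).
    round_no counts 10-second evaluation rounds."""
    by_rate = sorted(rates, key=rates.get, reverse=True)
    unchoked = set(by_rate[:UNCHOKE_SLOTS])            # fastest uploaders to us
    if round_no % 3 == 0 or current_optimistic is None:  # rotate ~every 30 s
        candidates = [p for p in rates if p not in unchoked]
        current_optimistic = random.choice(candidates) if candidates else None
    if current_optimistic:
        unchoked.add(current_optimistic)                # one optimistic unchoke
    return unchoked, current_optimistic

rates = {"A": 90_000, "B": 75_000, "C": 60_000, "D": 50_000, "E": 10_000, "F": 5_000}
print(choose_unchoked(rates, round_no=0))   # top 4 plus one optimistic unchoke
```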
Anti-Snubbing
• Policy: When over a minute has gone by without
receiving a single sub-piece from a particular peer,
do not upload to it except as an optimistic unchoke
• A peer might find itself being simultaneously
choked by all its peers that it was just
downloading from
• Download will lag until optimistic unchoke finds
better peers
• Policy: If choked by everyone, increase the number
of simultaneous optimistic unchokes to more than
one
Upload-only or Seeding Mode
• Once the download is complete, a peer has no download rates to compare, nor does it need them.
• Which peers should it upload to?
• Policy: Upload to the top 4 peers with the maximum upload rate.
– Ensures faster replication.
Structured P2P Networks
• Distributed Hash Tables (DHTs)
– put(k,v) and v=get(k) interface
• Representative systems
– Chord, Pastry, Tapestry, and CAN
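A toy sketch of the put(k,v)/get(k) interface on a consistent-hashing ring, where a key is stored at its successor node as in Chord (real DHTs add routing tables, replication, and node join/leave handling; the class and small id space here are purely illustrative):

```python
# Minimal sketch of a DHT-style put/get interface over a hash ring.
import hashlib
from bisect import bisect_left

def h(x: str) -> int:
    return int(hashlib.sha1(x.encode()).hexdigest(), 16) % 2**16   # small id space

class ToyDHT:
    def __init__(self, node_names):
        self.ring = sorted((h(n), n) for n in node_names)   # (node id, name)
        self.store = {name: {} for _, name in self.ring}

    def _successor(self, key_id):
        i = bisect_left(self.ring, (key_id, ""))
        return self.ring[i % len(self.ring)][1]     # wrap around the ring

    def put(self, key, value):
        self.store[self._successor(h(key))][key] = value

    def get(self, key):
        return self.store[self._successor(h(key))].get(key)

dht = ToyDHT(["n1", "n2", "n3", "n4"])
dht.put("song.mp3", "123.2.21.23")
print(dht.get("song.mp3"))     # '123.2.21.23', served by the responsible node
```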
