Linux Virtual Server Tutorial
Horms (Simon Horman) horms@[Link]
VA Linux Systems Japan, [Link]
July 2003. Revised March 2004
[Link]
with assistance from
Abstract:
The Linux Virtual Server Project (LVS) allows load balancing of networked services such as web and
[Link]
[Link]
demonstrate how to use various features of LVS to load balance Internet services, and how this can be
[Link]
advanced topics which have been the subject of recent development including maintaining active
connections in a highly available environment and using active feedback to better distribute load.
Introduction
The Linux Virtual Server Project (LVS) [Link]
[Link]
[Link]
[Link], from email to
the X Window System.
LVS itself runs on Linux; however, it is able to load balance connections from end users running any
[Link]
UDP, LVS can be used.
LVS is very high performance. It is able to handle upwards of 100,[Link]
[Link]
is also able to load balance a saturated 1Gbit link and beyond using higher-end commodity hardware.
LVS Basics
[Link], and how to
[Link]
and UDP services.
Terminology
Linux Director: Host with Linux and LVS installed which receives packets from end users and
forwards them to real servers.
End User: Host that originates a connection.
Real Server: [Link]
Apache.
A single host may act in more than one of the above roles at the same time.
Virtual IP Address (VIP): The IP address assigned to a service that a Linux Director will handle.
Real IP Address (RIP): The IP address of a Real Server.
Layer 4 Switching
Figure 1: LVS NAT
Layer 4 Switching works by multiplexing incoming TCP/IP connections and UDP/IP datagrams to real
[Link]
[Link]
[Link], the integrity of the connection is maintained.
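The bookkeeping that keeps a connection intact can be sketched in a few lines of Python. This is an illustration only, not LVS code: the class name Layer4Switch, the round-robin choice for new connections and the addresses are assumptions made for the example.

```python
from itertools import cycle

class Layer4Switch:
    """Toy layer 4 switch: new connections are scheduled, known ones are looked up."""

    def __init__(self, real_servers):
        self._next_server = cycle(real_servers)   # simple round-robin choice
        self.connections = {}                     # (proto, client_ip, client_port) -> real server

    def forward(self, proto, client_ip, client_port):
        key = (proto, client_ip, client_port)
        if key not in self.connections:           # first packet of a connection
            self.connections[key] = next(self._next_server)
        return self.connections[key]              # later packets reuse the mapping

switch = Layer4Switch(["192.168.6.4", "192.168.6.5"])
first = switch.forward("tcp", "10.2.3.4", 34802)
again = switch.forward("tcp", "10.2.3.4", 34802)
other = switch.forward("tcp", "10.2.3.4", 34803)
```

Every packet that belongs to the same connection reaches the same real server, while a new connection from the same end user may be sent elsewhere.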
Forwarding Packets
The Linux Virtual Server has three different ways of forwarding packets: network address translation
(NAT), IP-IP encapsulation (tunnelling) and direct routing.
Network Address Translation (NAT): A method of manipulating the source and/or destination
port and/[Link]
used to enable RFC 1918 [2] private networks to access the Internet. In the context of layer 4
switching, packets are received from end users and the destination port and IP address are
[Link]
which time the mapping is undone so the end user sees replies from the expected source.
Direct Routing: [Link]
is not modified, so the real servers must be configured to accept traffic for the virtual server's IP
[Link]
addressed to the virtual server's [Link]
[Link], the linux director does not need to be in the return path.
IP-IP Encapsulation (Tunnelling): Allows packets addressed to an IP address to be redirected to
another address, possibly on a different network. In the context of layer 4 switching the
behaviour is very similar to that of direct routing, except that when packets are forwarded they
are encapsulated in an IP packet, [Link]
advantage of using tunnelling is that real servers can be on different networks.
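The NAT case is the only one of the three in which the linux director rewrites addresses in both directions. A minimal Python sketch of that rewrite follows. The Packet type and the function names are hypothetical, and a single fixed real server stands in for the scheduler.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src: tuple   # (ip, port)
    dst: tuple   # (ip, port)

VIP = ("172.17.60.201", 80)
RIP = ("192.168.6.4", 80)   # the real server chosen for this connection

def nat_inbound(pkt):
    """End user -> VIP: rewrite the destination to the chosen real server."""
    return replace(pkt, dst=RIP) if pkt.dst == VIP else pkt

def nat_outbound(pkt):
    """Real server -> end user: undo the mapping so the reply appears to come from the VIP."""
    return replace(pkt, src=VIP) if pkt.src == RIP else pkt

request = Packet(src=("10.2.3.4", 34802), dst=VIP)
to_real = nat_inbound(request)
reply = nat_outbound(Packet(src=RIP, dst=request.src))
```

With direct routing and tunnelling the IP addresses are left alone, which is why the real servers must accept traffic for the VIP themselves.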
Figure 2: LVS Direct Routing
Virtual Services
On the Linux Director a virtual service is defined by either an IP address, port and protocol, or a
[Link]
set and a connection is received from the same IP address before the timeout has expired, then the
connection will be forwarded to the same real server as the original connection.
IP Address, Port and Protocol: A virtual server may be specified by:
An IP Address: The IP address that end users will use to access the service.
A port: The port that end users will connect to.
[Link].
Firewall Mark: Packets may be marked with a 32-bit unsigned value using ipchains or iptables.
The Linux Virtual Server is able to use this mark to designate packets destined for a virtual
[Link]
[Link]
[Link]
server for both HTTP and HTTPS.
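How persistence and firewall marks combine can be sketched as follows. This is a toy model rather than LVS internals: the PersistenceTable class, the 300 second timeout and the rule that ports 80 and 443 both carry mark 1 are assumptions chosen for illustration.

```python
class PersistenceTable:
    """Remember which real server a client was sent to, per virtual service."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._entries = {}          # (service, client_ip) -> (real_server, expires_at)

    def lookup(self, service, client_ip, now):
        entry = self._entries.get((service, client_ip))
        if entry and now < entry[1]:
            return entry[0]
        return None                 # no entry, or the timeout has expired

    def remember(self, service, client_ip, real_server, now):
        self._entries[(service, client_ip)] = (real_server, now + self.timeout)

# Hypothetical marking rule: ports 80 and 443 both carry firewall mark 1,
# so they form a single virtual service.
FWMARK_FOR_PORT = {80: 1, 443: 1}

table = PersistenceTable(timeout=300)
table.remember(FWMARK_FOR_PORT[80], "10.2.3.4", "192.168.6.4", now=0)
https_hit = table.lookup(FWMARK_FOR_PORT[443], "10.2.3.4", now=120)
expired = table.lookup(FWMARK_FOR_PORT[443], "10.2.3.4", now=301)
```

Because the table is keyed by the mark rather than by the port, an end user who starts on HTTP and moves to HTTPS within the timeout reaches the same real server.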
Scheduling
The virtual service is assigned a scheduling algorithm that is used to allocate incoming connections to
[Link]
schedulers can be implemented without modifying the core LVS code.
[Link]
[Link]
real server in turn and allocating connections to the real server with the least number of connections
[Link]
the weighting of the real server, more powerful real servers can be set with a higher weight and thus,
will be allocated more connections.
[Link]
[Link]
LVS to load balance transparent proxies.
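The basic scheduling ideas are small enough to show directly. The functions below are a simplified Python sketch, not the kernel schedulers. The connection counts and weights are invented, and the weighted variant assumes the common rule of choosing the lowest ratio of connections to weight.

```python
from itertools import cycle

def round_robin(servers):
    """Yield real servers in turn, ignoring load."""
    return cycle(servers)

def least_connections(active):
    """Pick the real server with the fewest active connections."""
    return min(active, key=active.get)

def weighted_least_connections(active, weight):
    """Pick the server with the lowest connections-to-weight ratio (weight > 0)."""
    candidates = [s for s in active if weight[s] > 0]
    return min(candidates, key=lambda s: active[s] / weight[s])

active = {"192.168.6.4": 6, "192.168.6.5": 8}
weight = {"192.168.6.4": 1, "192.168.6.5": 2}

rr = round_robin(list(active))
rr_picks = [next(rr) for _ in range(3)]
lc_pick = least_connections(active)
wlc_pick = weighted_least_connections(active, weight)
```

With these figures the unweighted scheduler prefers 192.168.6.4, while the weighted one prefers 192.168.6.5 because its higher weight outweighs its larger connection count.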
Installing LVS
Some distributions, [Link]
[Link]
Ultra Monkey provides packages built against Debian Sid (Unstable) and Woody (Stable/3.0) and
[Link]
[Link]
useful to understand how this process works.
[Link]
[Link], each version of LVS was closely tied to a version of the Kernel.
The netfilter packet filtering architecture [4], which is part of the 2.4 kernels, has allowed LVS to be
[Link]
[Link],
this discussion will focus on using LVS as a module as this approach is easier and more flexible.
1. Obtain and Unpack Kernel
[Link]
[Link]
should unpack the kernel into the linux-2.4.20 directory.
[Link].bz2
2. Obtain and Unpack LVS
[Link]
unpacked using the following command which should unpack LVS into the ipvs-1.0.9
directory.
[Link]
3. Apply LVS Patches to Kernel
[Link]
patches use the following:
cd linux-2.4.20/
patch -p1 < ../ipvs-1.0.9/linuxkernel_ksyms_c.diff
patch -p1 < ../ipvs-1.0.9/linuxnet_netsyms_c.diff
[Link]
ARP requests and are used on real servers with LVS direct routing.
patch -p1 < ../ipvs-1.0.9/contrib/patches/[Link]
4. Configure the kernel
First ensure that the tree is clean:
make mrproper
[Link]
make menuconfig, [Link]
you use, be sure to compile in netfilter support, [Link]
suggested that where possible these options are built as modules.
Networking options --->
  Network packet filtering (replaces ipchains)
  <m> IP: tunnelling
  IP: Netfilter Configuration --->
    <m> Connection tracking (required for masq/NAT)
    <m> FTP protocol support
    <m> IP tables support (required for filtering/masq/NAT)
    <m> Packet filtering
    <m> REJECT target support
    <m> Full NAT
    <m> MASQUERADE target support
    <m> REDIRECT target support
    <m> NAT of local connections (READ HELP) (NEW)
    <m> Packet mangling
    <m> MARK target support
    <m> LOG target support
5. Build and Install the Kernel
As the kernel has been reconfigured the build dependencies need to be reconstructed.
make dep
The kernel and modules may now be built using:
make bzImage modules
[Link]
the modules under /lib/modules/2.4.20/ and the kernel in /boot/vmlinuz-2.4.20
make install modules_install
6. Update boot loader
If grub is used as the boot loader then a new entry should be added to
/etc/[Link] /boot partition is /dev/[Link]
entries in /etc/[Link].
title 2.4.20 LVS
root (hd0,0)
kernel /vmlinuz-2.4.20 ro root=/dev/hda3
If the boot loader is lilo then a new entry should be added to /etc/[Link]
example assumes that the / partition is /dev/[Link] /etc/[Link]
should be used as a guide.
image=/boot/vmlinuz-2.4.20
label=2.4.20-lvs
read-only
root=/dev/hda2
Once /etc/[Link].
lilo
Added LinuxLVS *
Added Linux
Added LinuxOLD
7. Reboot the system.
At your boot loader's prompt be sure to boot the newly created kernel.
8. Build and Install LVS
The commands to build LVS should be run from the ipvs-1.0.9/ipvs/ [Link]
and install use the following commands. /kernel/source/linux-2.4.20 should be the
root directory that the kernel was just built in.
make KERNELSOURCE=/kernel/source/linux-2.4.20 all
make KERNELSOURCE=/kernel/source/linux-2.4.20 modules_install
9. Build and Install Ipvsadm
[Link]
ipvs-1.0.9/ipvs/ipvsadm/ [Link].
make all
make install
LVS NAT
[Link]
[Link]
packets from the real server have their source IP address changed from that of the real server to the
VIP.
Figure 3: LVS NAT Example
Linux Director
[Link] /etc/[Link]
then running sysctl -p.
net.ipv4.ip_forward=1
Bring up 172.17.60.201 on eth0:[Link]
[Link].
ifconfig eth0:0 172.17.60.201 netmask 255.255.0.0 broadcast 172.17.255.255
Configure LVS
ipvsadm -A -t 172.17.60.201:80
ipvsadm -a -t 172.17.60.201:80 -r 192.168.6.4:80 -m
ipvsadm -a -t 172.17.60.201:80 -r 192.168.6.5:80 -m
Real Servers
[Link]
VIP on the server network the default gateway.
Make sure that the desired daemon is listening on port 80 to handle connections from end users.
Testing and Debugging
Testing can be done by connecting to 172.17.60.201:80 from outside the server network.
Running a packet tracing tool on the linux directors and real servers is very useful for debugging
[Link]
[Link], there are
a variety of tools available for various operating systems.
The following trace shows a connection being opened by an end user 10.2.3.4 to the VIP 172.17.60.201
[Link]
[Link]
still have the end user's [Link]
[Link]
[Link]
it is the VIP.
tcpdump -n -i any port 80
[Link].965499 10.2.3.4.34802 > [Link].80:
    S 2555236140:2555236140(0) win 5840
    <mss 1460,sackOK,timestamp 16690997 0,nop,wscale 0>
[Link].967645 10.2.3.4.34802 > [Link].80:
    S 2555236140:2555236140(0) win 5840
    <mss 1460,sackOK,timestamp 16690997 0,nop,wscale 0>
[Link].966976 192.[Link] > [Link].34802:
    S 2733565972:2733565972(0) ack 2555236141 win 5792
    <mss 1460,sackOK,timestamp 128711091 16690997,nop,wscale 0> (DF)
[Link].968653 172.[Link] > [Link].34802:
    S 2733565972:2733565972(0) ack 2555236141 win 5792
    <mss 1460,sackOK,timestamp 128711091 16690997,nop,wscale 0> (DF)
[Link].971241 10.2.3.4.34802 > [Link].80:
    . ack 1 win 5840 <nop,nop,timestamp 16690998 128711091>
[Link].971387 10.2.3.4.34802 > [Link].80:
    . ack 1 win 5840 <nop,nop,timestamp 16690998 128711091>
ctrl-c
ipvsadm -L -n can be used to show the number of active connections.
ipvsadm -L -n
IP Virtual Server version 1.0.9 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port    Forward Weight ActiveConn InActConn
TCP  172.17.60.201:80 rr
  -> [Link]:80             Masq    1      7          3
  -> [Link]:80             Masq    1      8          4
ipvsadm -L --stats will show the total number of connections, packets and bytes sent and received.
ipvsadm -L -n --stats
IP Virtual Server version 1.0.9 (size=4096)
Prot LocalAddress:Port     Conns  InPkts  OutPkts  InBytes  OutBytes
  -> RemoteAddress:Port
TCP  172.17.60.201:80      114    1716    1153     193740   112940
  -> [Link]:80             57     821     567      94642    55842
  -> [Link]:80             57     895     586      99098    57098
ipvsadm -L --rate will show the number of connections, packets and bytes sent and received per second.
ipvsadm -L -n --rate
IP Virtual Server version 1.0.9 (size=4096)
Prot LocalAddress:Port     CPS    InPPS   OutPPS   InBPS    OutBPS
  -> RemoteAddress:Port
TCP  172.17.60.201:80      56     275     275      18739    41283
  -> [Link]:80             28     137     137      9344     20634
  -> [Link]:80             28     138     137      9395     20649
ipvsadm -Z (--zero) will zero all the statistics counters.
LVS Direct Routing
Figure 4: LVS Direct Routing Example
LVS Direct Routing works by forwarding packets, unchanged, to the MAC addresses of real servers.
As the packet is unmodified the real servers need to be configured to accept traffic addressed to the
[Link].
As the incoming packets are not modified by the linux director the return packets do not need to pass
[Link], [Link]
services for end users on the same local network as the return packets can be sent directly to the end
user rather than forcing them to go through the linux director.
Linux Director
[Link] /etc/[Link]
then running sysctl -p.
net.ipv4.ip_forward=1
Bring up 172.17.60.201 on eth0:[Link]
[Link].
ifconfig eth0:0 172.17.60.201 netmask 255.255.0.0 broadcast 172.17.255.255
Configure LVS
ipvsadm -A -t 172.17.60.201:80
ipvsadm -a -t 172.17.60.201:80 -r 172.17.60.199:80 -g
ipvsadm -a -t 172.17.60.201:80 -r 172.17.60.200:80 -g
The real servers can send reply packets directly to the end users without them needing to be
[Link], the linux director does not need to be the gateway for the
real servers.
However, in some situations, for instance because the linux director really is the gateway to the
real server's network, it is desirable to route return packets from the real servers via the linux
[Link]
[Link], it will drop the packets as being bogus.
[Link]
supplied by Julian Anastasov which add proc entries that allow this packet dropping behaviour
[Link]
[Link]
Real Servers
Make sure return packets are not routed through the linux director unless you have patched the
kernel as described above.
Make sure that the desired daemon to handle connections from end users is listening on port 80
[Link]
[Link]
using the following command.
ifconfig lo:0 172.17.60.201 netmask 255.255.255.255
Note that the netmask should be 255.255.255.255, regardless of the actual netmask of the
[Link]
addresses covered by the netmask are bound to the interface. The typical case is 127.0.0.1 with
a netmask of 255.0.0.0 which sets up the loopback interface to accept all of 127.0.0.0/[Link],
as we only want lo:0 to accept packets for 172.17.60.201 the netmask must be 255.255.255.255.
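The effect of the netmask can be checked with Python's standard ipaddress module. The snippet below only does address arithmetic and does not configure any interface. The 255.255.0.0 case is included purely to show what a wider mask on lo:0 would cover.

```python
import ipaddress

# 127.0.0.1 with netmask 255.0.0.0 makes the loopback accept the whole 127.0.0.0/8 block.
loopback = ipaddress.ip_interface("127.0.0.1/255.0.0.0")

# A 255.255.0.0 netmask on lo:0 would claim every address in 172.17.0.0/16.
too_wide = ipaddress.ip_interface("172.17.60.201/255.255.0.0")

# 255.255.255.255 restricts lo:0 to the VIP alone.
vip_only = ipaddress.ip_interface("172.17.60.201/255.255.255.255")

for iface in (loopback, too_wide, vip_only):
    print(iface.network, iface.network.num_addresses)
```

A real server such as 172.17.60.200 falls inside the /16 but not inside the /32, which is why the host mask is required here.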
[Link]
[Link]
[Link], add the following
lines to /etc/[Link].
# Enable configuration of hidden devices
[Link]=1
# Make the loopback interface hidden
[Link]=1
Testing and Debugging
Testing can be done by connecting to 172.17.60.201:80 from any network.
[Link], note that when
[Link]
handled by LVS (they are sent directly to the end user by the real server), the outgoing packet and byte
statistics will be zero.
LVS Tunnel
Figure 5: LVS Tunnel Example. Same Topology as the LVS Direct Routing Example
[Link]
are forwarded to the real servers using IP encapsulated in IP, rather than just sending a new ethernet
[Link]
director.
Linux Director
[Link] /etc/[Link]
then running sysctl -p.
net.ipv4.ip_forward=1
Bring up 172.17.60.201 on eth0:[Link], this is best done as part of the networking
[Link].
ifconfig eth0:0 172.17.60.201 netmask 255.255.0.0 broadcast 172.17.255.255
Configure LVS
ipvsadm -A -t 172.17.60.201:80
ipvsadm -a -t 172.17.60.201:80 -r 172.17.60.199:80 -i
ipvsadm -a -t 172.17.60.201:80 -r 172.17.60.200:80 -i
If you wish to use the linux director as a gateway router for the real servers, which is not
necessary, please see information on how to patch the kernel to do this in the direct routing
section.
Real Servers
Make sure return packets are not routed through the linux director unless you have patched the
kernel as described in the direct routing section.
Make sure that the desired daemon is running on port 80 to accept connections from the end
users.
[Link], this is best done as part of the networking
[Link].
ifconfig tunl0 172.17.60.201 netmask 255.255.255.255
[Link]
/etc/[Link].
net.ipv4.ip_forward=1
# Enable configuration of hidden devices
[Link]=1
# Make the tunl0 interface hidden
[Link]=1
Testing and Debugging
Testing can be done by connecting to 172.17.60.201:[Link]
direct routing.
High Availability
[Link]
will act, as far as end users are concerned, [Link], the more
servers that are in the system, [Link], it is important
to make use of high availability techniques to ensure that the virtual service is maintained even if
individual servers fail.
Heartbeat
Heartbeat can be used to monitor a pair of linux directors and ensure that one of them owns the VIP at any
[Link]
[Link]
[Link]
defined.
[Link]
[Link]
[Link]
receive these ARP packets and thus send subsequent packets for the VIP to the new linux director.
[Link]
provided or built from source using the following commands.
./ConfigureMe build
make
make install
Sample Configuration
Figure 6: Heartbeat Example
Configuration is done using three files that can be found in /etc/ha.d.
[Link]: This configures the base parameters for heartbeat such as which interfaces to use for
communication, [Link]
names used must match the output of uname -n on the member nodes.
logfacility local0
keepalive 2
deadtime 10
warntime 10
initdead 10
nice_failback on
mcast eth0 225.0.0.7 694 1 1
node walter
node wendy
haresources: Sets the resources that are managed by heartbeat.
walter 172.17.60.201/24/eth0
authkeys: [Link]
mode 600.
auth 2
2 sha1 ultramonkey
[Link]
[Link].
As the VIP, 172.17.60.201, is managed by heartbeat it should not be brought up on the linux directors
by other means.
[Link]
on whichever linux director is the master.
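The role of the keepalive and deadtime values in the sample configuration can be illustrated with a small Python sketch. It models only the timing decision, not heartbeat itself. The function names are hypothetical, and the rule that a peer is dead once deadtime seconds pass without a heartbeat is a simplifying assumption.

```python
KEEPALIVE = 2    # seconds between heartbeats, as in the sample configuration
DEADTIME = 10    # seconds of silence before the peer is declared dead

def peer_is_dead(last_heard, now, deadtime=DEADTIME):
    """True once no heartbeat has been heard for at least `deadtime` seconds."""
    return now - last_heard >= deadtime

def owner_of_vip(master_last_heard, now):
    """Hypothetical standby-side decision for the two nodes in the sample setup."""
    return "wendy" if peer_is_dead(master_last_heard, now) else "walter"

healthy = owner_of_vip(master_last_heard=100, now=100 + 2 * KEEPALIVE)
failed = owner_of_vip(master_last_heard=100, now=100 + DEADTIME)
```

Missing a heartbeat or two leaves the VIP on walter. Only sustained silence moves it to wendy.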
Testing and Debugging
[Link]
the progress of the takeover by examining the logs sent to syslog, typically found in
/var/log/messages. As nice_failback is on, the currently active linux director will now act
as the master and when the failed linux director comes back online it will act as a standby.
Ipfail
The design of heartbeat is such that if any communication channel is available to a host, then it will be
[Link]
links on the internal and external network, it may be desirable for failover to occur if either link fails
[Link].
Figure 7: Heartbeat without IPfail
The ipfail plugin for heartbeat makes this possible by monitoring one or more external hosts known as
[Link]
[Link], if a host cannot access a ping node, [Link], if an
interface fails on the active linux director, then one of the ping nodes should become unavailable and
failover will occur.
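The decision can be sketched as a comparison of how many ping nodes each linux director can reach. This is an assumed reading of the behaviour described above, not the ipfail source. The function names are hypothetical and the ping itself is replaced by a callable so the example runs without a network.

```python
def reachable(ping_nodes, can_ping):
    """Count the ping nodes this director can currently reach."""
    return sum(1 for node in ping_nodes if can_ping(node))

def should_fail_over(active_count, standby_count):
    """Assumed rule: hand over when the standby sees strictly more ping nodes."""
    return standby_count > active_count

ping_nodes = ["172.17.0.254"]
active_view = reachable(ping_nodes, lambda node: False)   # external link down
standby_view = reachable(ping_nodes, lambda node: True)
decision = should_fail_over(active_view, standby_view)
```

If both directors lose sight of the ping node at once, for example because the ping node itself is down, the counts are equal and no failover is triggered.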
Figure 8: Heartbeat with IPfail
[Link]
[Link]
heartbeat documentation.
Sample Configuration
To use ipfail with the heartbeat setup discussed previously, the following should be added to heartbeat's
[Link].
ping 172.17.0.254
respawn hacluster /usr/lib/heartbeat/ipfail
[Link],
as there are no suitable quorum devices on the internal network in this demonstration.
The respawn directive tells heartbeat to run /usr/lib/heartbeat/[Link]
exits with a status other than 100, and to kill it when heartbeat exits.
After adding these options heartbeat needs to be restarted.
/etc/init.d/heartbeat restart
Testing and Debugging
Testing and debugging can be done as per Heartbeat itself.
Ldirectord
[Link]
[Link]
used in tandem to create a high availability LVS cluster.
Ldirectord checks services on the real servers by connecting to them, making a known request and
[Link], HTTPS, FTP, IMAP, POP,
SMTP, [Link], which is usually
[Link]
patches by users.
[Link], the connect check,
[Link]
if there is not a check for the protocol supplied by ldirectord.
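The difference between the two kinds of check can be sketched in Python. This is not ldirectord's implementation: the function names are hypothetical, the network operations are passed in as callables so the example runs offline, and the page name index.html and its content are made up.

```python
def connect_check(open_connection):
    """Connect check: the service is up if a connection can be opened."""
    try:
        open_connection()
        return True
    except OSError:
        return False

def negotiate_check(fetch, request, receive):
    """Negotiate check: send a known request and look for the expected string."""
    try:
        return receive in fetch(request)
    except OSError:
        return False

def fake_fetch(path):
    # Stand-in for a real HTTP client; the page content is made up.
    pages = {"index.html": "<html>Test Page</html>"}
    if path not in pages:
        raise OSError("no such page")
    return pages[path]

def refused():
    # Stand-in for a connection attempt to a closed port.
    raise ConnectionRefusedError("port closed")

ok = negotiate_check(fake_fetch, "index.html", "Test Page")
wrong_content = negotiate_check(fake_fetch, "index.html", "Other Page")
unreachable = negotiate_check(fake_fetch, "missing.html", "Test Page")
down = connect_check(refused)
```

A negotiate check catches a daemon that accepts connections but serves the wrong content, which a connect check alone would miss.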
Sample Configuration
[Link]
options, such as where to log errors to, [Link]
[Link]
checked.
# Global Directives
checktimeout=10
checkinterval=2
autoreload=no
logfile="local0"
quiescent=yes
# Virtual Server for HTTP
virtual=[Link]:80
        fallback=[Link]:80
        real=[Link]:80 masq
        real=[Link]:80 masq
        service=http
        request="[Link]"
        receive="Test Page"
        scheduler=rr
        protocol=tcp
        checktype=negotiate
Ldirectord may be started by running the ldirectord command, the ldirectord init script or by adding it
[Link]
the master and standby linux directors at the same time.
Once ldirectord has started the LVS kernel table will be populated.
ipvsadm -L -n
IP Virtual Server version 1.0.7 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port    Forward Weight ActiveConn InActConn
TCP  172.17.60.201:80 rr
  -> [Link]:80             Masq    1      0          0
  -> [Link]:80             Masq    1      0          0
  -> [Link]:80             Local   0      0          0
[Link], when a
[Link]
the effect that existing connections to the real server may continue, but no new connections will be
[Link]
changed to remove the real server from the virtual service by setting the global configuration option
quiescent=no.
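The two behaviours can be contrasted with a toy table that maps each real server to its weight. The function below is a hypothetical illustration of the quiescent option, not ldirectord code.

```python
def apply_check_result(table, server, healthy, quiescent=True, weight=1):
    """Update a toy LVS table (server -> weight) after a health check."""
    if healthy:
        table[server] = weight          # (re)add the real server
    elif quiescent:
        if server in table:
            table[server] = 0           # keep existing connections, accept no new ones
    else:
        table.pop(server, None)         # remove the real server outright
    return table

quiesced = apply_check_result({"192.168.6.4:80": 1, "192.168.6.5:80": 1},
                              "192.168.6.5:80", healthy=False, quiescent=True)
removed = apply_check_result({"192.168.6.4:80": 1, "192.168.6.5:80": 1},
                             "192.168.6.5:80", healthy=False, quiescent=False)
```

With quiescent=yes the failed server stays in the table with weight 0. With quiescent=no it disappears from the table.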
Testing and Debugging
[Link]
[Link]
that serves end users' [Link].
In each case ldirectord should update the LVS kernel table accordingly which can be examined using
[Link], the configuration above sets these logs to be
written to syslog; typically they will show up in /var/log/syslog.
For extra debugging information ldirectord can be run in debugging mode, in which case it will log
[Link]
[Link]
[Link], which should be in /etc/ha.d/. Debugging can be terminated using ctrl-c.
[Link]
Keepalived
Keepalived provides an implementation of the VRRPv2 protocol which is specified in RFC 2338 [1]. It
is an alternative method of managing a VIP on a network so that it is owned by only one host at any
[Link].
[Link]
[Link].
There is another implementation of VRRPv2 for Linux from [Link]
time of writing the keepalived implementation appears to be much more complete.
Keepalived also features service level monitoring of real servers and manipulates the LVS kernel table
[Link]:
TCP_CHECK: Check to make sure a connection can be opened to the service on the real server.
HTTP_GET: Fetch a known URL from the real server and compare the checksum of the page to
the expected checksum.
SSL_GET: SSL version of HTTP_GET.
MISC_CHECK: Check using an external script.
It also provides an API to implement new checks.
The VRRPD and LVS/Health Check features can be used individually or in combination.
[Link]
./[Link]
available in the main Debian tree.
To configure keepalived /etc/keepalived/[Link]
divided up into sections.
global_defs: Global definitions such as where to send email alerts, if at all, and the name of the
cluster.
vrrp_instance: Encapsulates a set of virtual IP addresses associated with a particular interface.
Each instance should have a unique id.
vrrp_sync_group: Groups together vrrp_instances such that all the instances will be owned by a
[Link]
interfaces always end up on the same machine.
virtual_server: A virtual service handled by LVS.
real_server: A real server to check. Contained within a virtual_server.
Note that the VRRP implementation works on a master/slave system. So each vrrp_instance should be
marked as a "MASTER" on one node and a "SLAVE" [Link], it did not
appear possible to configure keepalived to have behaviour analogous to heartbeat's nice_failback. That
is, a node will hold a resource until it fails, in which case another node will take it over until it in turn
[Link]
spurious failovers.
Sample Configuration
For the sake of brevity, the example configuration files are in Appendix A.
To create the checksums for the configuration file, [Link]
[Link], showing you how the
[Link]
[Link], to generate the hash for the URL [Link]
following command is used.
genhash -s 192.168.6.5 -p 80 -u /
[lots of output omitted]
90bfbce6bc089a41f1fddca9aeaba452
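The comparison performed by an HTTP_GET style check can be sketched with Python's hashlib. The sketch assumes that the digest is an MD5 hash of the page body, which is consistent with the 32 hexadecimal digits shown above but is not confirmed here. The page content and function names are invented for the example.

```python
import hashlib

def page_digest(body: bytes) -> str:
    """MD5 hex digest of a page body (assumed to be what HTTP_GET compares)."""
    return hashlib.md5(body).hexdigest()

def http_get_check(body: bytes, expected_digest: str) -> bool:
    """The real server passes if the fetched page hashes to the configured digest."""
    return page_digest(body) == expected_digest.lower()

sample_body = b"<html><body>Test Page</body></html>\n"   # made-up page content
configured = page_digest(sample_body)                     # what would go in the config
unchanged = http_get_check(sample_body, configured)
modified = http_get_check(sample_body + b"<!-- edited -->", configured)
```

Any change to the page, however small, changes the digest, so the configured value has to be regenerated whenever the checked page is edited.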
[Link]
typically can be found in /var/log/[Link]
[Link].
ipvsadm -L -n
IP Virtual Server version 1.0.7 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port    Forward Weight ActiveConn InActConn
TCP  172.17.60.201:80 lc
  -> [Link]:80             Masq    1      0          0
  -> [Link]:80             Masq    1      0          0
[Link]
ip command.
ip addr sh
[lo: omitted]
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether [Link]r[Link]
    inet 172.17.60.207/16 brd 172.17.255.255 scope global eth0
    inet 172.17.60.201/32 scope global eth0
3: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether [Link]r[Link]
    inet 192.168.6.3/24 brd 192.168.6.255 scope global eth1
    inet 192.168.6.1/32 scope global eth1
If a failover occurs the same addresses should appear on the slave, and then back on the master once it
is restored.
New Developments
Active Feedback
Ldirectord, [Link]
[Link], these tools do not monitor the
real time serving capacity of the real servers and do not allocate connections proportional to this.
This can be particularly problematic in situations where some connections require significantly more
[Link], if some connections are a plain HTML file fetched
from disk, [Link], such as
scaling an image or retrieving part of the page from a database.
Feedbackd implements a framework that allows real time information from the real servers to
[Link], feedbackd
[Link]
[Link]
Feedbackd has two key components, feedbackd-agent which runs on the real servers and monitors their
[Link]
supplied simply monitors CPU load using /proc/[Link], feedbackd-master runs on
[Link]'s which connect, and manipulates
the weights of the real servers in the LVS kernel table accordingly.
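The feedback loop can be sketched as a mapping from reported load to LVS weight. The linear formula, the maximum weight of 100 and the function names are assumptions made for illustration and are not taken from feedbackd.

```python
MAX_WEIGHT = 100   # hypothetical weight given to an idle real server

def weight_from_load(cpu_busy_fraction, max_weight=MAX_WEIGHT):
    """Map CPU utilisation in [0, 1] to an LVS weight; busier servers get less."""
    busy = min(max(cpu_busy_fraction, 0.0), 1.0)
    return max(1, round(max_weight * (1.0 - busy)))

def update_weights(table, reports):
    """Apply agent reports (server -> busy fraction) to a toy weight table."""
    for server, busy in reports.items():
        table[server] = weight_from_load(busy)
    return table

weights = update_weights({}, {"192.168.6.4": 0.25, "192.168.6.5": 0.9})
```

The weight is clamped to a minimum of 1 so that a fully loaded server is throttled rather than quiesced. Whether that is desirable depends on the service.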
[Link]
enhancements were made to allow feedbackd-master to be restarted without giving "address in use"
[Link]
feedbackd to work with Active/[Link]
to the author and will hopefully show up in the next version.
The only configuration required for feedbackd-master is to establish the LVS virtual services that will
[Link]
feedbackd-master by matching the protocol and port information sent by the feedback agents running
[Link]
[Link]:
ipvsadm -A -t 172.17.60.201:80
[Link]
the current distribution.
Feedbackd-agent is configured by modifying /etc/[Link]
Linux Director running feedbackd-master is specified as are the services that the real server should
join.
director=[Link]
service=http
protocol=TCP
port=80
module=[Link]
forwarding=NAT
Again, to run feedbackd-agent simply run the command on the command line.
Testing
As a primitive test, one of the real servers can be loaded manually and the effects of this on the LVS
[Link]
feedbackd can be found in Jeremy Kerr's paper on feedbackd [3].
Connection Synchronisation - Existing Solution
Configuring two linux directors in an active/standby configuration is a useful way to provide high
[Link], the standby can automatically take over the IP address of
[Link], when such a failover occurs
connections that are currently in progress are terminated.
[Link]
synchronising connection information between the active and standby linux directors this problem can
[Link], when a standby linux director becomes the active linux director, it will have
information about the currently active connections and will be able to continue to forward their packets.
The critical piece of information required is which real server to forward packets for a given connection
[Link].
[Link]
master/slave system whereby the linux director configured as the master sends synchronisation
[Link]
update their LVS connection table accordingly.
A connection is synchronised once the number of packets passes a threshold (3) and then every
frequency (50) [Link]
periodically flushed. The synchronisation information for up to 50 connections can be packed into a
[Link].
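The threshold and frequency rule can be sketched as follows. This is one assumed reading of the description above (synchronise at the threshold, then once every 50 packets after it) and not the kernel code. The function names are hypothetical.

```python
THRESHOLD = 3    # packets seen before a connection is first synchronised
FREQUENCY = 50   # packets between later synchronisations
BATCH = 50       # maximum connections packed into one synchronisation packet

def should_sync(packets_seen):
    """Assumed reading: sync at the threshold, then once every FREQUENCY packets."""
    return packets_seen >= THRESHOLD and (packets_seen - THRESHOLD) % FREQUENCY == 0

def batches(connections, size=BATCH):
    """Split queued connections into groups that each fit in one packet."""
    return [connections[i:i + size] for i in range(0, len(connections), size)]

sync_points = [n for n in range(1, 120) if should_sync(n)]
packets = batches(list(range(120)))
```

Under this reading a short connection is synchronised once, a long-lived one is refreshed periodically, and 120 queued connections fit in three synchronisation packets.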
Sending and receiving synchronisation information by the master and slaves respectively is done by a
[Link]
commands.
ipvsadm --start-daemon master # Run on the Master Linux Director
ipvsadm --start-daemon backup # Run on the Slave Linux Director
Testing and Debugging
The synchronisation of connections can be monitored using ipvsadm -L -c -n, which lists LVS
[Link]
moments, when synchronisation has occurred, they should also appear on the slaves.
ipvsadm -L -c -n # On the Master Linux Director
IPVS connection entries
pro expire state       source             virtual          destination
TCP 01:00  TIME_WAIT   172.16.4.222:34939 172.17.60.201:80 192.168.6.5:80
TCP 01:01  TIME_WAIT   172.16.4.222:34940 172.17.60.201:80 192.168.6.4:80
TCP 15:00  ESTABLISHED 172.16.4.222:34941 172.17.60.201:80 192.168.6.5:80
ipvsadm -L -c -n # On the Slave Linux Director
IPVS connection entries
pro expire state       source             virtual          destination
TCP 01.20  ESTABLISHED 172.16.4.222:34939 172.17.60.201:80 192.168.6.5:80
TCP 01.23  ESTABLISHED 172.16.4.222:34940 172.17.60.201:80 192.168.6.4:80
TCP 08.99  ESTABLISHED 172.16.4.222:34941 172.17.60.201:80 192.168.6.5:80
The output shows two connections on the master linux director that are in the TIME_WAIT state, that
[Link],
that is the end user and the real server still have an open connection to each other.
[Link], all the
[Link]
[Link]
unnecessary synchronisation overhead as the state of the connections on the slave is not critical.
You can further monitor which linux director is handling connections by adding the following iptables
rule to each linux director.
iptables -A INPUT -d 172.17.60.201 -j ACCEPT
This can be monitored using iptables -L INPUT -v -n.
iptables -L INPUT -v -n # On the Active Linux Director
Chain INPUT (policy ACCEPT 1553 packets, 211K bytes)
pkts bytes target prot opt in out source     destination
   5  1551 ACCEPT all  --  *  *   [Link]/0   172.17.60.201
iptables -L INPUT -v -n # On the Stand-By Linux Director
Chain INPUT (policy ACCEPT 2233 packets, 328K bytes)
pkts bytes target prot opt in out source     destination
   0     0 ACCEPT all  --  *  *   [Link]/0   172.17.60.201
To test that connection synchronisation is working correctly open a connection to the virtual service
[Link], this can be done by a variety of
[Link].
Once the VIP has failed over to the slave linux director the connection should continue.
Streaming is a useful way to test this, as streaming connections by their nature are open for a long time.
It also provides intuitive feedback as the video and/[Link]
by increasing the buffer size of the streaming client software the pause can be eliminated.
Problems
The main problem with this implementation is the master/[Link]
Director fails and then comes back online, then connections to the slave will not be synchronised to the
[Link], [Link]
[Link]
peer to peer relationship between the synchronisation daemons would be a cleaner approach.
Connection Synchronisation - New Solution
[Link]
peer basis where any node may send or receive synchronisation information.
[Link]
[Link]
daemon receives information over multicast it reverses this process by sending the information into
LVS in the kernel via a netlink socket.
The idea of moving the code to the user space was to allow more sophisticated synchronisation
[Link]
[Link], there is no
particular advantage to keeping it in the kernel.
The code comprises:
Modified LVS kernel modules to allow the synchronisation daemon to get information about
[Link]
[Link]/slave in-kernel
daemons.
Kernel Patch to register the new netlink socket.
libip_vs_user_sync: Convenience library for communicating using the netlink socket.
ip_vs_user_sync_simple: Simple synchronisation daemon implemented using this framework.
[Link]
Running
Installing and compiling is a bit tricky as this is new code and there are a number of support libraries
[Link], make sure that the LVS kernel synchronisation daemons are not running using
ipvsadm --stop-daemon and start the user space daemon from the command line or using the
ip_vs_user_sync_simple init script.
Testing and Debugging
Debugging messages for ip_vs_user_sync_simple are sent to syslog by default and are typically written
to /var/log/[Link], it is recommended to run it
[Link]
modifying ip_vs_user_sync_simple.conf or on the command line.
ip_vs_user_sync_simple --debug --log_facility -
[Link], as there
is no master/backup relationship connections can be maintained through multiple failovers.
Active-Active
Active/[Link], if one assumes
that a linux director does not fail or get taken down for maintenance very often, then most of the time
[Link]
throughput of the network is limited to that of one linux director.
Having Active-Active linux directors addresses this problem by allowing more than one linux director
to load balance connections, for the same virtual services, at the same time.
Figure 9: Active-Active Block Diagram
I have made an implementation of this which works as follows:
Each linux director is given the same hardware and IP address.
  This means that all the linux directors will receive packets for connections for the virtual
  service.
  It also means that there is no longer any need for IP address failover or VRRPv2.
A heartbeat helper, Saru, runs with heartbeat on each linux director.
  Heartbeat doesn't allocate any resources, just provides a mechanism to determine which
  linux directors are available.
  Saru uses this information to divide the space of all possible incoming connections
  between the linux directors.
    This is done by electing a master which will make the allocations.
    The allocations are done by dividing up blocks of source or destination ports or
    addresses.
A netfilter kernel module is used to only accept packets as dictated by Saru.
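The allocation scheme can be sketched in Python using source ports as the space to divide. The block count of 512, the turn-by-turn assignment and the function names are assumptions for illustration and are not taken from Saru.

```python
BLOCKS = 512   # hypothetical number of blocks the source-port space is cut into

def allocate_blocks(directors, blocks=BLOCKS):
    """The elected master hands out blocks to the available directors in turn."""
    return {block: directors[block % len(directors)] for block in range(blocks)}

def block_of(src_port, blocks=BLOCKS):
    """Map a source port (0-65535) to its block."""
    return src_port * blocks // 65536

def accepts(director, src_port, allocation):
    """Filter decision on one director: accept only packets in its own blocks."""
    return allocation[block_of(src_port)] == director

both = allocate_blocks(["A", "B"])
owners = [[d for d in ("A", "B") if accepts(d, port, both)] for port in (0, 34802, 65535)]
after_failure = allocate_blocks(["A"])
```

Every source port is accepted by exactly one linux director, and when a director drops out the master reallocates its blocks to the survivors.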
Running
[Link]
syslog after startup and these typically appear in /var/log/[Link]
verbosely by setting the debug option, [Link]
debugging purposes this option is recommended in conjunction with having saru log to the terminal.
saru --debug --log_facility -
[Link]
[Link]
time, [Link].
The MAC and IP address of an interface can be set using the ip command.
ip link set eth0 down
ip link set eth0 address [Link]
ip link set eth0 up
ip route add default via 172.16.0.254
ip addr add dev eth0 192.168.20.40/24 broadcast 192.168.20.255
Rules to filter out all traffic to the VIP that is not accepted by Saru are inserted using the iptables
[Link], if this is not the case then
netfilter's connection tracking should be used to ensure that a given connection will always be handled
by the same linux director.
iptables -F
iptables -A INPUT -d 172.17.60.201 -p tcp -m saru --id 1 -j ACCEPT
iptables -A INPUT -d 172.17.60.201 -p udp -m saru --id 1 -j ACCEPT
iptables -A INPUT -d 172.17.60.201 -p icmp -m icmp --icmp-type echo-request \
    -m saru --id 1 --sense src-addr -j ACCEPT
iptables -A INPUT -d 172.17.60.201 -p icmp -m icmp --icmp-type ! echo-request \
    -j ACCEPT
iptables -A INPUT -d 172.17.60.201 -j DROP
If LVS NAT is being used then the following rules are also required to prevent all the Linux Directors
sending replies on behalf of the real servers.
iptables -t nat -A POSTROUTING -s 192.168.6.0/24 -d 192.168.6.0/24 -j ACCEPT
iptables -t nat -A POSTROUTING -s 192.168.6.0/24 -m state --state INVALID \
    -j DROP
iptables -t nat -A POSTROUTING -s 192.168.6.0/24 -m state --state ESTABLISHED \
    -j ACCEPT
iptables -t nat -A POSTROUTING -s 192.168.6.0/24 -m state --state RELATED \
    -j ACCEPT
iptables -t nat -A POSTROUTING -s 192.168.6.0/24 -p tcp -m state --state NEW \
    --tcp-flags SYN,ACK,FIN,RST SYN -m saru --id 1 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 192.168.6.0/24 -p udp -m state --state NEW \
    -m saru --id 1 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 192.168.6.0/24 -p icmp \
    -m icmp --icmp-type echo-request -m state --state NEW \
    -m saru --id 1 --sense dst-addr -j MASQUERADE
iptables -t nat -A POSTROUTING -s 192.168.6.0/24 -p icmp \
    -m icmp --icmp-type ! echo-request -m state --state NEW \
    -j MASQUERADE
iptables -t nat -A POSTROUTING -s 192.168.6.0/24 -j DROP
Testing and Debugging
Which linux director is accepting packets for an individual connection can be monitored using
[Link]
Linux Director A.
iptables -L INPUT -n -v # On Linux Director A
Chain INPUT (policy ACCEPT 92541 packets, 14M bytes)
pkts bytes target prot opt in out source     destination
   5  1551 ACCEPT tcp  --  *  *   [Link]/0   172.17.60.201 saru id 1 sense src-port
   0     0 ACCEPT udp  --  *  *   [Link]/0   172.17.60.201 saru id 1 sense src-port
   0     0 ACCEPT icmp --  *  *   [Link]/0   172.17.60.201 icmp type 8 saru id 1 sense src-addr
   0     0 ACCEPT icmp --  *  *   [Link]/0   172.17.60.201 icmp !type 8
   0     0 DROP   all  --  *  *   [Link]/0   172.17.60.201
iptables -L INPUT -n -v # On Linux Director B
Chain INPUT (policy ACCEPT 92700 packets, 15M bytes)
pkts bytes target prot opt in out source     destination
   0     0 ACCEPT tcp  --  *  *   [Link]/0   172.17.60.201 saru id 1 sense src-port
   0     0 ACCEPT udp  --  *  *   [Link]/0   172.17.60.201 saru id 1 sense src-port
   0     0 ACCEPT icmp --  *  *   [Link]/0   172.17.60.201 icmp type 8 saru id 1 sense src-addr
   0     0 ACCEPT icmp --  *  *   [Link]/0   172.17.60.201 icmp !type 8
   5  1551 DROP   all  --  *  *   [Link]/0   172.17.60.201
Conclusion
[Link],
[Link]
techniques that can be used to further enhance LVS clusters including using active feedback to
[Link]
synchronisation and active-active techniques to allow multiple linux directors to better work together.
LVS itself is a very powerful tool and has many features that were not within the scope of this
[Link]: firewall marks to group virtual services, specialised scheduling algorithms
[Link]
of LVS to meet the new needs of users and to reflect the ever increasing complexity of the Internet.
Sample Configuration files for keepalived
Sample configuration file for keepalived (Master)
global_defs {
    notification_email {
        admin@[Link]
    }
    notification_email_from admin@[Link]
    smtp_server 210.128.90.2
    smtp_connect_timeout 30
    lvs_id LVS_DEVEL
}
vrrp_sync_group VG1 {
    group {
        VI_1
        VI_2
    }
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        [Link]
    }
}
vrrp_instance VI_2 {
    state MASTER
    interface eth1
    virtual_router_id 52
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        [Link]
    }
}
virtual_server 172.17.60.201 80 {
    delay_loop 6
    lb_algo lc
    lb_kind NAT
    nat_mask 255.255.255.0
    ! persistence_timeout 50
    protocol TCP
    real_server 192.168.6.4 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                digest 55fd843c4e99e96c1ef28e7dbb10c51b
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
    real_server 192.168.6.5 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                digest 90bfbce6bc089a41f1fddca9aeaba452
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
    sorry_server 127.0.0.1 80
}
Sample configuration file for keepalived (Slave)
global_defs {
    notification_email {
        admin@[Link]
    }
    notification_email_from admin@[Link]
    smtp_server 210.128.90.2
    smtp_connect_timeout 30
    lvs_id LVS_DEVEL
}
vrrp_sync_group VG1 {
    group {
        VI_1
        VI_2
    }
}
vrrp_instance VI_1 {
    state SLAVE
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        [Link]
    }
}
vrrp_instance VI_2 {
    state SLAVE
    interface eth1
    virtual_router_id 52
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        [Link]
    }
}
virtual_server 172.17.60.201 80 {
    delay_loop 6
    lb_algo lc
    lb_kind NAT
    nat_mask 255.255.255.0
    ! persistence_timeout 50
    protocol TCP
    real_server 192.168.6.4 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                digest 55fd843c4e99e96c1ef28e7dbb10c51b
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
    real_server 192.168.6.5 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                digest 90bfbce6bc089a41f1fddca9aeaba452
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
    sorry_server 127.0.0.1 80
}
Bibliography
1
[Link].
RFC 2338: Virtual router redundancy protocol.
[Link]
2
[Link].
RFC 1918: Address allocation for private internets.
[Link]
3
Jeremy Kerr.
Using dynamic feedback to optimise load balancing decisions.
[Link]
4
Netfilter Core Team.
Netfilter: firewalling, NAT and packet mangling for Linux 2.4.
[Link]
Horms 2004-06-23