Title | Netflix Linux Perf Analysis 60s |
---|---|
Course | Algorithms And Computational Complexity |
Institution | University of Washington |
Pages | 7 |
File Size | 358.3 KB |
File Type | |
Total Downloads | 32 |
Total Views | 132 |
Application Performance Analysis...
67
More NextBlog»
[email protected] NewPost SignOut
Monday,November30,2015
Links
LinuxPerformanceAnalysisin60,000Milliseconds
NetflixUS&CanadaBlog NetflixAmericaLatinaBlog
YoulogintoaLinuxserverwithaperformanceissue:whatdoyoucheckinthefirstminute? NetflixBrasilBlog
AtNetflixwehaveamassiveEC2Linuxcloud,andnumerousperformanceanalysistoolstomonitorand investigateitsperformance.TheseincludeAtlasforcloudwidemonitoring,andVectorforondemand instanceanalysis.Whilethosetoolshelpussolvemostissues,wesometimesneedtologintoaninstance andrunsomestandardLinuxperformancetools.
NetflixBeneluxBlog
Inthispost,theNetflixPerformanceEngineeringteamwillshowyouthefirst60secondsofanoptimized performanceinvestigationatthecommandline,usingstandardLinuxtoolsyoushouldhaveavailable.
NetflixNordicsBlog
NetflixFranceBlog
NetflixUK&IrelandBlog NetflixISPSpeedIndex
First60Seconds:Summary In60secondsyoucangetahighlevelideaofsystemresourceusageandrunningprocessesbyrunningthe followingtencommands.Lookforerrorsandsaturationmetrics,astheyarebotheasytointerpret,andthen resourceutilization.Saturationiswherearesourcehasmoreloadthanitcanhandle,andcanbeexposed eitherasthelengthofarequestqueue,ortimespentwaiting.
OpenpositionsatNetflix NetflixWebsite FacebookNetflixPage NetflixUIEngineering
uptime dmesg | tail vmstat 1 mpstat -P ALL 1 pidstat 1 iostat -xz 1 free -m sar -n DEV 1 sar -n TCP,ETCP 1 top
RSSFeed
AbouttheNetflixTechBlog
Someofthesecommandsrequirethesysstatpackageinstalled.Themetricsthesecommandsexposewill helpyoucompletesomeoftheUSEMethod:amethodologyforlocatingperformancebottlenecks.This involvescheckingutilization,saturation,anderrormetricsforallresources(CPUs,memory,disks,e.t.c.). Alsopayattentiontowhenyouhavecheckedandexoneratedaresource,asbyprocessofeliminationthis narrowsthetargetstostudy,anddirectsanyfollowoninvestigation. Thefollowingsectionssummarizethesecommands,withexamplesfromaproductionsystem.Formore informationaboutthesetools,seetheirmanpages.
ThisisaNetflixblogfocusedon technologyandtechnologyissues. We'llshareourperspectives, decisionsandchallengesregarding thesoftwarewebuildanduseto createtheNetflixservice.
BlogArchive ▼2015(43) ▼November(5) LinuxPerformance Analysisin60,000 Milliseconds
1.uptime $ uptime 23:51:26 up 21:31,
NetflixDACHBlog
1 user,
load average: 30.02, 26.43, 19.02
Thisisaquickwaytoviewtheloadaverages,whichindicatethenumberoftasks(processes)wantingto run.OnLinuxsystems,thesenumbersincludeprocesseswantingtorunonCPU,aswellasprocesses blockedinuninterruptibleI/O(usuallydiskI/O).Thisgivesahighlevelideaofresourceload(ordemand),but can’tbeproperlyunderstoodwithoutothertools.Worthaquicklookonly. Thethreenumbersareexponentiallydampedmovingsumaverageswitha1minute,5minute,and15 minuteconstant.Thethreenumbersgiveussomeideaofhowloadischangingovertime.Forexample,if you’vebeenaskedtocheckaproblemserver,andthe1minutevalueismuchlowerthanthe15minute value,thenyoumighthaveloggedintoolateandmissedtheissue.
CreatingYourOwn EC2SpotMarket Part2 SleepyPuppy ExtensionforBurp Suite GlobalContinuous Deliverywith Spinnaker NetflixHackDay Autumn2015 ►October(5) ►September(6)
Intheexampleabove,theloadaveragesshowarecentincrease,hitting30forthe1minutevalue,compared to19forthe15minutevalue.Thatthenumbersarethislargemeansalotofsomething:probablyCPU demand;vmstatormpstatwillconfirm,whicharecommands3and4inthissequence.
►August(6) ►July(3) ►June(2)
2.dmesg|tail
►May(2) ►April(3)
$ dmesg | tail [1880957.563150] [...] [1880957.563400] [1880957.563408] ss:0kB [2320864.954447] nters.
►March(3) perl invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
►February(5)
Out of memory: Kill process 18694 (perl) score 246 or sacrifice child Killed process 18694 (perl) total-vm:1972392kB, anon-rss:1953348kB, file-r
►January(3)
TCP: Possible SYN flooding on port 7001. Dropping request.
Check SNMP cou
►2014(37) ►2013(52) ►2012(37) ►2011(17)
Thisviewsthelast10systemmessages,ifthereareany.Lookforerrorsthatcancauseperformance issues.Theexampleaboveincludestheoomkiller,andTCPdroppingarequest.
►2010(8)
Don’tmissthisstep!dmesgisalwaysworthchecking.
Labels
3.vmstat1
acceleratedcompositing(2) $ vmstat 1 procs ---------memory---------- ---swap-- -----io---- -system-- ------cpu----r b swpd free buff cache si so bi bo in cs us sy id wa st 34 0 0 200889792 73708 591828 0 0 0 5 6 10 96 1 3 0 0 32 0 0 200889920 73708 591860 0 0 0 592 13284 4282 98 1 1 0 0 32 0 0 200890112 73708 591860 0 0 0 0 9501 2154 99 1 0 0 0 32 0 0 200889568 73712 591856 0 0 0 48 11900 2459 99 0 0 0 0 32 0 0 200890208 73712 591860 0 0 0 0 15898 4840 98 1 1 0 0 ^C
Shortforvirtualmemorystat,vmstat(8)isacommonlyavailabletool(firstcreatedforBSDdecadesago).It printsasummaryofkeyserverstatisticsoneachline. vmstatwasrunwithanargumentof1,toprintonesecondsummaries.Thefirstlineofoutput(inthisversion ofvmstat)hassomecolumnsthatshowtheaveragesinceboot,insteadoftheprevioussecond.Fornow, skipthefirstline,unlessyouwanttolearnandrememberwhichcolumniswhich. Columnstocheck:
adwords(1) Aegisthus(1) algorithms(2) aminator(1) analytics(4) Android(2) angular(1) api(16) appender(1) Archaius(2) architecturaldesign(1)
r:NumberofprocessesrunningonCPUandwaitingforaturn.Thisprovidesabettersignalthan loadaveragesfordeterminingCPUsaturation,asitdoesnotincludeI/O.Tointerpret:an“r”value greaterthantheCPUcountissaturation.
architecture(1)
free:Freememoryinkilobytes.Iftherearetoomanydigitstocount,youhaveenoughfree memory.The“freem”command,includedascommand7,betterexplainsthestateoffree memory.
Astyanax(3)
Asgard(1)
authentication(1) automation(2)
si,so:Swapinsandswapouts.Ifthesearenonzero,you’reoutofmemory.
autoscaling(3)
us,sy,id,wa,st:ThesearebreakdownsofCPUtime,onaverageacrossallCPUs.Theyare usertime,systemtime(kernel),idle,waitI/O,andstolentime(byotherguests,orwithXen,the guest'sownisolateddriverdomain).
availability(4) AWS(29)
TheCPUtimebreakdownswillconfirmiftheCPUsarebusy,byaddinguser+systemtime.Aconstant degreeofwaitI/Opointstoadiskbottleneck;thisiswheretheCPUsareidle,becausetasksareblocked waitingforpendingdiskI/O.YoucantreatwaitI/OasanotherformofCPUidle,onethatgivesaclueasto whytheyareidle.
benchmark(2) bigdata(10) billing(1)
SystemtimeisnecessaryforI/Oprocessing.Ahighsystemtimeaverage,over20%,canbeinterestingto explorefurther:perhapsthekernelisprocessingtheI/Oinefficiently.
Blitz4j(1)
Intheaboveexample,CPUtimeisalmostentirelyinuserlevel,pointingtoapplicationlevelusageinstead. TheCPUsarealsowellover90%utilizedonaverage.Thisisn’tnecessarilyaproblem;checkforthedegree ofsaturationusingthe“r”column.
Cable(1)
build(3)
caching(2) Cassandra(13)
4.mpstatPALL1 chaosengineering(1) $ mpstat -P ALL 1 Linux 3.13.0-49-generic (titanclusters-xxxxx)
07/14/2015
chaosmonkey(5)
_x86_64_ (32 CPU)
ci(1) 07:38:49 07:38:50 07:38:50 07:38:50 07:38:50 07:38:50 [...]
PM PM PM PM PM PM
CPU all 0 1 2 3
%usr 98.47 96.04 97.00 98.00 96.97
%nice 0.00 0.00 0.00 0.00 0.00
%sys %iowait 0.75 0.00 2.97 0.00 1.00 0.00 1.00 0.00 0.00 0.00
%irq 0.00 0.00 0.00 0.00 0.00
%soft 0.00 0.00 0.00 0.00 0.00
%steal 0.00 0.00 0.00 0.00 0.00
%guest 0.00 0.00 0.00 0.00 0.00
%gnice 0.00 0.00 0.00 0.00 0.00
%idle 0.78 0.99 2.00 1.00 3.03
classloaders(1) Clojure(1) cloud(23) cloudarchitecture(15)
ThiscommandprintsCPUtimebreakdownsperCPU,whichcanbeusedtocheckforanimbalance.A singlehotCPUcanbeevidenceofasinglethreadedapplication.
cloudprize(3) CO2(1)
5.pidstat1
collection(1) computervision(1)
$ pidstat 1 Linux 3.13.0-49-generic (titanclusters-xxxxx)
07/14/2015
_x86_64_
(32 CPU)
07:41:02 07:41:03 07:41:03 07:41:03 07:41:03 07:41:03 07:41:03
PM UID PM 0 PM 0 PM 0 PM 0 PM 0 PM 60004
PID %usr %system 9 0.00 0.94 4214 5.66 5.66 4354 0.94 0.94 6521 1596.23 1.89 6564 1571.70 7.55 60154 0.94 4.72
%guest %CPU 0.00 0.94 0.00 11.32 0.00 1.89 0.00 1598.11 0.00 1579.25 0.00 5.66
CPU 1 15 8 27 28 9
Command rcuos/0 mesos-slave java java java pidstat
07:41:03 07:41:04 07:41:04 07:41:04 07:41:04 07:41:04 ^C
PM UID PM 0 PM 0 PM 0 PM 108 PM 60004
PID %usr %system 4214 6.00 2.00 6521 1590.00 1.00 6564 1573.00 10.00 6718 1.00 0.00 60154 1.00 4.00
%guest %CPU 0.00 8.00 0.00 1591.00 0.00 1583.00 0.00 1.00 0.00 5.00
CPU 15 27 28 0 9
Command mesos-slave java java snmp-pass pidstat
concurrency(1) configuration(2) configurationmanagement(2) conformitymonkey(1) contentplatformengineering(2) continuousdelivery(4) coordination(2) costmanagement(1) crypto(1) Cryptography(2)
Pidstatisalittleliketop’sperprocesssummary,butprintsarollingsummaryinsteadofclearingthescreen. Thiscanbeusefulforwatchingpatternsovertime,andalsorecordingwhatyousaw(copynpaste)intoa recordofyourinvestigation.
CSS(2) CUDA(1) dart(1)
TheaboveexampleidentifiestwojavaprocessesasresponsibleforconsumingCPU.The%CPUcolumnis thetotalacrossallCPUs;1591%showsthatthatjavaprocessesisconsumingalmost16CPUs.
database(4) datamigration(1)
6.iostatxz1
datapipeline(4)
$ iostat -xz 1 Linux 3.13.0-49-generic (titanclusters-xxxxx) avg-cpu:
%user 73.96
Device: xvda xvdb xvdc dm-0 dm-1 dm-2 [...] ^C
rrqm/s 0.00 0.01 0.01 0.00 0.00 0.00
%nice %system %iowait 0.00 3.73 0.03
datascience(6) 07/14/2015
%steal 0.06
_x86_64_ (32 CPU)
DataStax(2)
%idle 22.21
datavisualization(1) deadlock(1)
wrqm/s 0.23 0.00 0.00 0.00 0.00 0.00
r/s 0.21 1.02 1.02 0.69 0.00 0.09
w/s 0.18 8.94 8.86 2.32 0.94 0.07
rkB/s 4.52 127.97 127.79 10.47 0.01 1.35
wkB/s avgrq-sz avgqu-sz 2.08 34.37 0.00 598.53 145.79 0.00 595.94 146.50 0.00 31.69 28.01 0.01 3.78 8.00 0.33 0.36 22.50 0.00
await 9.98 0.43 0.45 3.23 345.84 2.55
deeplearning(1) Denominator(2) dependencyinjection(1) device(3) deviceproliferation(1) devops(2)
Thisisagreattoolforunderstandingblockdevices(disks),boththeworkloadappliedandtheresulting performance.Lookfor: r/s,w/s,rkB/s,wkB/s:Thesearethedeliveredreads,writes,readKbytes,andwriteKbytesper secondtothedevice.Usetheseforworkloadcharacterization.Aperformanceproblemmay simplybeduetoanexcessiveloadapplied.
distributed(10) DNS(1) Docker(1) Dockerhub(1)
await:TheaveragetimefortheI/Oinmilliseconds.Thisisthetimethattheapplicationsuffers, asitincludesbothtimequeuedandtimebeingserviced.Largerthanexpectedaveragetimescan beanindicatorofdevicesaturation,ordeviceproblems.
DSL(1)
avgqusz:Theaveragenumberofrequestsissuedtothedevice.Valuesgreaterthan1canbe evidenceofsaturation(althoughdevicescantypicallyoperateonrequestsinparallel,especially virtualdeviceswhichfrontmultiplebackenddisks.)
DynECT(1)
Dyn(1)
efficiency(1) ElasticLoadBalancer(1)
%util:Deviceutilization.Thisisreallyabusypercent,showingthetimeeachsecondthatthe devicewasdoingwork.Valuesgreaterthan60%typicallyleadtopoorperformance(which shouldbeseeninawait),althoughitdependsonthedevice.Valuescloseto100%usually indicatesaturation. Ifthestoragedeviceisalogicaldiskdevicefrontingmanybackenddisks,then100%utilizationmayjust meanthatsomeI/Oisbeingprocessed100%ofthetime,however,thebackenddisksmaybefarfrom saturated,andmaybeabletohandlemuchmorework. BearinmindthatpoorperformingdiskI/Oisn’tnecessarilyanapplicationissue.Manytechniquesare
elasticsearch(1) ELB(1) EMR(2) encoding(2) energy(1) eucalyptus(1)
typicallyusedtoperformI/Oasynchronously,sothattheapplicationdoesn’tblockandsufferthelatency directly(e.g.,readaheadforreads,andbufferingforwrites).
eureka(2) evcache(1)
7.freem
failover(2) falcor(2)
$ free -m total Mem: 245998 -/+ buffers/cache: Swap: 0
used 24545 23944 0
free 221453 222053 0
shared 83
buffers 59
cached 541
faulttolerance(12) flamegraphs(1) Flow(1)
Therighttwocolumnsshow:
footprint(1)
buffers:Forthebuffercache,usedforblockdeviceI/O.
FRP(1)
cached:Forthepagecache,usedbyfilesystems.
functionalreactive(1)
Wejustwanttocheckthatthesearen’tnearzeroinsize,whichcanleadtohigherdiskI/O(confirmusing iostat),andworseperformance.Theaboveexamplelooksfine,withmanyMbytesineach.
garbage(1) garbagecollection(1)
The“/+buffers/cache”provideslessconfusingvaluesforusedandfreememory.Linuxusesfreememory forthecaches,butcanreclaimitquicklyifapplicationsneedit.Soinawaythecachedmemoryshouldbe includedinthefreememorycolumn,whichthislinedoes.There’sevenawebsite,linuxatemyram,aboutthis confusion.
gc(1) Genie(4) Governator(1)
ItcanbeadditionallyconfusingifZFSonLinuxisused,aswedoforsomeservices,asZFShasitsownfile systemcachethatisn’treflectedproperlybythefreemcolumns.Itcanappearthatthesystemislowon freememory,whenthatmemoryisinfactavailableforusefromtheZFScacheasneeded.
GPU(2) green(1) Groovy(1)
8.sarnDEV1
HackDay(3) $ sar -n DEV 1 Linux 3.13.0-49-generic (titanclusters-xxxxx)
07/14/2015
_x86_64_
Hadoop(12)
(32 CPU)
HBase(1) 12:16:48 12:16:49 12:16:49 12:16:49
AM AM AM AM
IFACE eth0 lo docker0
rxpck/s 18763.00 14.00 0.00
txpck/s 5032.00 14.00 0.00
rxkB/s 20686.42 1.36 0.00
txkB/s 478.30 1.36 0.00
rxcmp/s 0.00 0.00 0.00
txcmp/s 0.00 0.00 0.00
rxm
12:16:49 12:16:50 12:16:50 12:16:50 ^C
AM AM AM AM
IFACE eth0 lo docker0
rxpck/s 19763.00 20.00 0.00
txpck/s 5101.00 20.00 0.00
rxkB/s 21999.10 3.25 0.00
txkB/s 482.56 3.25 0.00
rxcmp/s 0.00 0.00 0.00
txcmp/s 0.00 0.00 0.00
rxm
highvolume(4) highvolumedistributedsystems(8) Hive(2) HTML5(7) https(1) Hystrix(5) IBM(1)
Usethistooltochecknetworkinterfacethroughput:rxkB/sandtxkB/s,asameasureofworkload,andalso tocheckifanylimithasbeenreached.Intheaboveexample,eth0receiveisreaching22Mbytes/s,whichis 176Mbits/sec(wellunder,say,a1Gbit/seclimit). Thisversionalsohas%ifutilfordeviceutilization(maxofbothdirectionsforfullduplex),whichissomething wealsouseBrendan’snicstattooltomeasure.Andlikewithnicstat,thisishardtogetright,andseemsto notbeworkinginthisexample(0.00).
ice(1) initialization(1) innovation(3) insights(1) interprocesscommunication(1)
9.sarnTCP,ETCP1
Ipv6(2)
$ sar -n TCP,ETCP 1 Linux 3.13.0-49-generic (titanclusters-xxxxx) 12:17:19 AM 12:17:20 AM
active/s passive/s 1.00 0.00
12:17:19 AM 12:17:20 AM
atmptf/s 0.00
12:17:20 AM 12:17:21 AM
active/s passive/s 1.00 0.00
12:17:20 AM 12:17:21 AM ^C
atmptf/s 0.00
iseg/s 10233.00
isolation(1) 07/14/2015
oseg/s 18846.00
_x86_64_
(32 CPU)
ISP(1) java(5) JavaScript(17)
estres/s retrans/s isegerr/s 0.00 0.00 0.00 iseg/s 8359.00
orsts/s 0.00
jenkins(1)
oseg/s 6039.00
estres/s retrans/s isegerr/s 0.00 0.00 0.00
ThisisasummarizedviewofsomekeyTCPmetrics.Theseinclude:
jclouds(1)
kafka(1) orsts/s 0.00
Karyon(2) lifecycle(1) linux(2) lipstick(2)
active/s:NumberoflocallyinitiatedTCPconnectionspersecond(e.g.,viaconnect()).
loadbalancing(3)
passive/s:NumberofremotelyinitiatedTCPconnectionspersecond(e.g.,viaaccept()).
localization(1)
retrans/s:NumberofTCPretransmitspersecond.
localizationplatformengineering(1)
Theactiveandpassivecountsareoftenusefulasaroughmeasureofserverload:numberofnewaccepted connections(passive),andnumberofdownstreamconnections(active).Itmighthelptothinkofactiveas outbound,andpassiveasinbound,butthisisn’tstrictlytrue(e.g.,consideralocalhosttolocalhost connection). Retransmitsareasignofanetworkorserverissue;itmaybeanunreliablenetwork(e.g.,thepublic Internet),oritmaybedueaserverbeingoverloadedanddroppingpackets.Theexampleaboveshowsjust onenewTCPconnectionpersecond.
locking(1) locks(1) log4j(1) logging(2) machinelearning(5) MapReduce(1)
10.top
meetup(3)
$ top top - 00:15:40 up 21:56, 1 user, load average: 31.09, 29.87, 29.92 Tasks: 871 total, 1 running, 868 sleeping, 0 stopped, 2 zombie %Cpu(s): 96.8 us, 0.4 sy, 0.0 ni, 2.7 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem: 25190241+total, 24921688 used, 22698073+free, 60448 buffers KiB Swap: 0 total, 0 used, 0 free. 554208 cached Mem PID 20248 4213 66128 5235 4299 1 2 3 5 6 8
USER root root titancl+ root root root root root root root root
PR NI VIRT RES 20 0 0.227t 0.012t 20 0 2722544 64640 20 0 24344 2332 20 0 38.227g 547004 20 0 20.015g 2.682g 20 0 33620 2920 20 0 0 0 20 0 0 0 0 -20 0 0 20 0 0 0 20 0 0 0
SHR 18748 4423...