
Linux System and Performance Monitoring (Summary)


Date: 2009.07.21 
Author: Darren Hoch 
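Translator: Tonnyom[AT]hotmail.com
Closing note: This is the final installment of the translation. Here the author presents a case-study environment and uses the theory and tools covered in the previous installments to run an end-to-end system performance check. It serves as a hands-on exercise to help readers better understand system performance monitoring.
 
BTW: Most articles with similar content on Chinese technical sites derive from the paper the author wrote in 2006-07. This translation is based on the version the author rewrote for OSCON 2009, so some of the content may overlap with those articles.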
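Appendix A: Case Study - Performance Monitoring Step by Step
One day a customer called for technical help, complaining that a web page that normally took 15 seconds to load was now taking 20 minutes.
The system configuration was as follows:
RedHat Enterprise Linux 3 update 7 
Dell 1850 Dual Core Xeon Processors, 2 GB RAM, 75GB 15K Drives 
Custom LAMP software stack (translator's note: a Linux + Apache + MySQL + PHP environment)
Performance Analysis Procedure
1. First, use vmstat to get an overall picture of system performance: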
# vmstat 1 10 
procs memory swap io system cpu 
r b swpd free buff cache si so bi bo in cs us sy id wa 
1 0 249844 19144 18532 1221212 0 0 7 3 22 17 25 8 17 18 
0 1 249844 17828 18528 1222696 0 0 40448 8 1384 1138 13 7 65 14 
0 1 249844 18004 18528 1222756 0 0 13568 4 623 534 3 4 56 37 
2 0 249844 17840 18528 1223200 0 0 35200 0 1285 1017 17 7 56 20 
1 0 249844 22488 18528 1218608 0 0 38656 0 1294 1034 17 7 58 18 
0 1 249844 21228 18544 1219908 0 0 13696 484 609 559 5 3 54 38 
0 1 249844 17752 18544 1223376 0 0 36224 4 1469 1035 10 6 67 17 
1 1 249844 17856 18544 1208520 0 0 28724 0 950 941 33 12 49 7 
1 0 249844 17748 18544 1222468 0 0 40968 8 1266 1164 17 9 59 16 
1 0 249844 17912 18544 1222572 0 0 41344 12 1237 1080 13 8 65 13
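Analysis: 
1. A memory shortage is not the cause: there is no swapping activity (si and so stay at 0). Free memory (free) is low, but swpd does not change either. 
2. The CPU is not a serious problem either: there are some processes in the run queue (procs r), but the processors remain roughly 50% or more idle (CPU id). 
3. There are a large number of context switches (cs) and a large number of disk blocks being read in from disk (bi). 
4. On average, the CPU spends about 20% of its time waiting on I/O (wa).
Conclusion: 
From the above, this is an I/O bottleneck.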
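The reasoning above can be checked numerically. The following is a minimal Python sketch, not part of the original paper, that averages the relevant columns of the vmstat sample from step 1. The helper names `parse_vmstat` and `mean` are illustrative, and the first data line is skipped on the assumption that it reports averages since boot, as the first line of vmstat output normally does.

```python
# Sketch: summarize the vmstat sample from step 1.
# parse_vmstat and mean are illustrative helper names, not from the original text.

VMSTAT_SAMPLE = """\
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 249844 19144 18532 1221212 0 0 7 3 22 17 25 8 17 18
0 1 249844 17828 18528 1222696 0 0 40448 8 1384 1138 13 7 65 14
0 1 249844 18004 18528 1222756 0 0 13568 4 623 534 3 4 56 37
2 0 249844 17840 18528 1223200 0 0 35200 0 1285 1017 17 7 56 20
1 0 249844 22488 18528 1218608 0 0 38656 0 1294 1034 17 7 58 18
0 1 249844 21228 18544 1219908 0 0 13696 484 609 559 5 3 54 38
0 1 249844 17752 18544 1223376 0 0 36224 4 1469 1035 10 6 67 17
1 1 249844 17856 18544 1208520 0 0 28724 0 950 941 33 12 49 7
1 0 249844 17748 18544 1222468 0 0 40968 8 1266 1164 17 9 59 16
1 0 249844 17912 18544 1222572 0 0 41344 12 1237 1080 13 8 65 13
"""


def parse_vmstat(text):
    """Turn vmstat output (one header line plus data lines) into a list of dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, (int(value) for value in row))) for row in rows]


def mean(rows, column):
    """Arithmetic mean of one column across all samples."""
    return sum(row[column] for row in rows) / len(rows)


# Assumption: the first data line holds since-boot averages, so it is dropped.
samples = parse_vmstat(VMSTAT_SAMPLE)[1:]

swapping = any(row["si"] or row["so"] for row in samples)
avg_idle = mean(samples, "id")
avg_iowait = mean(samples, "wa")
avg_blocks_in = mean(samples, "bi")

print(f"swapping observed: {swapping}")
print(f"average idle: {avg_idle:.1f}%, average I/O wait: {avg_iowait:.1f}%")
print(f"average blocks read in per second: {avg_blocks_in:.0f}")
```

No swap-in or swap-out, substantial idle time, and a high I/O wait with heavy block reads together point at the disk rather than at memory or CPU.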
2. Next, use iostat to determine where the I/O requests are coming from:
# iostat -x 1 
Linux 2.4.21-40.ELsmp (mail.example.com) 03/26/2007
avg-cpu: %user %nice %sys %idle 
30.00 0.00 9.33 60.67
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util 
/dev/sda 7929.01 30.34 1180.91 14.23 7929.01 357.84 3964.50 178.92 6.93 0.39 0.03 0.06 6.69 
/dev/sda1 2.67 5.46 0.40 1.76 24.62 57.77 12.31 28.88 38.11 0.06 2.78 1.77 0.38 
/dev/sda2 0.00 0.30 0.07 0.02 0.57 2.57 0.29 1.28 32.86 0.00 3.81 2.64 0.03 
/dev/sda3 7929.01 24.58 1180.44 12.45 7929.01 297.50 3964.50 148.75 6.90 0.32 0.03 0.06 6.68
avg-cpu: %user %nice %sys %idle 
9.50 0.00 10.68 79.82
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util 
/dev/sda 0.00 0.00 1195.24 0.00 0.00 0.00 0.00 0.00 0.00 43.69 3.60 0.99 117.86 
/dev/sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
/dev/sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
/dev/sda3 0.00 0.00 1195.24 0.00 0.00 0.00 0.00 0.00 0.00 43.69 3.60 0.99 117.86
avg-cpu: %user %nice %sys %idle 
9.23 0.00 10.55 79.22
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util 
/dev/sda 0.00 0.00 1200.37 0.00 0.00 0.00 0.00 0.00 0.00 41.65 2.12 0.99 112.51 
/dev/sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
/dev/sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
/dev/sda3 0.00 0.00 1200.37 0.00 0.00 0.00 0.00 0.00 0.00 41.65 2.12 0.99 112.51
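Analysis: 
1. Only the /dev/sda3 partition is active; the other partitions are idle. 
2. There are roughly 1,200 read IOPS, whereas the disk itself supports only about 200 IOPS (translator's note: see the IOPS formula from the earlier installment). 
3. For more than two seconds, nothing was actually read from the disk (rkB/s). This correlates with the high I/O wait seen in vmstat. 
4. The high read IOPS (r/s) matches the high number of context switches seen in vmstat. This suggests that a large number of read system calls are being issued.
Conclusion: 
From the above, read requests from some application have exceeded what the I/O subsystem can handle.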
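As a rough illustration of the comparison made in the analysis, the sketch below parses the second iostat interval from step 2 and reports how far each device exceeds the roughly 200 IOPS the text attributes to the disk. It is not from the original paper; `parse_iostat`, `overloaded`, and `DISK_IOPS_CAPACITY` are illustrative names, and the capacity figure is taken from the analysis rather than measured.

```python
# Sketch: flag devices whose request rate exceeds what the disk can sustain.
# DISK_IOPS_CAPACITY is the approximate figure quoted in the analysis (an assumption).

DISK_IOPS_CAPACITY = 200

# Second iostat interval from step 2, copied verbatim.
IOSTAT_SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
/dev/sda 0.00 0.00 1195.24 0.00 0.00 0.00 0.00 0.00 0.00 43.69 3.60 0.99 117.86
/dev/sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
/dev/sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
/dev/sda3 0.00 0.00 1195.24 0.00 0.00 0.00 0.00 0.00 0.00 43.69 3.60 0.99 117.86
"""


def parse_iostat(text):
    """Map each device name to a dict of its extended statistics."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0][1:], lines[1:]  # drop the leading "Device:" label
    return {row[0]: dict(zip(header, map(float, row[1:]))) for row in rows}


def overloaded(devices, capacity):
    """Return {device: ratio} for devices issuing more IOPS than capacity."""
    result = {}
    for name, stats in devices.items():
        iops = stats["r/s"] + stats["w/s"]
        if iops > capacity:
            result[name] = round(iops / capacity, 2)
    return result


devices = parse_iostat(IOSTAT_SAMPLE)
hot = overloaded(devices, DISK_IOPS_CAPACITY)
for name, ratio in hot.items():
    print(f"{name}: about {ratio}x the assumed disk capacity")
```

Both the whole disk and /dev/sda3 are flagged, while /dev/sda1 and /dev/sda2 are not, which matches the observation that only one partition is active.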
3. Use top to find the most active application on the system:
# top -d 1 
11:46:11 up 3 days, 19:13, 1 user, load average: 1.72, 1.87, 1.80 
176 processes: 174 sleeping, 2 running, 0 zombie, 0 stopped 
CPU states: cpu user nice system irq softirq iowait idle 
total 12.8% 0.0% 4.6% 0.2% 0.2% 18.7% 63.2% 
cpu00 23.3% 0.0% 7.7% 0.0% 0.0% 36.8% 32.0% 
cpu01 28.4% 0.0% 10.7% 0.0% 0.0% 38.2% 22.5% 
cpu02 0.0% 0.0% 0.0% 0.9% 0.9% 0.0% 98.0% 
cpu03 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 100.0% 
Mem: 2055244k av, 2032692k used, 22552k free, 0k shrd, 18256k buff 
1216212k actv, 513216k in_d, 25520k in_c 
Swap: 4192956k av, 249844k used, 3943112k free 1218304k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
14939 mysql 25 0 379M 224M 1117 R 38.2 25.7% 15:17.78 mysqld 
4023 root 15 0 2120 972 784 R 2.0 0.3 0:00.06 top 
1 root 15 0 2008 688 592 S 0.0 0.2 0:01.30 init 
2 root 34 19 0 0 0 S 0.0 0.0 0:22.59 ksoftirqd/0 
3 root RT 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/0 
4 root 10 -5 0 0 0 S 0.0 0.0 0:00.05 events/0
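Analysis: 
1. The mysql process appears to be consuming the most resources; everything else is essentially idle. 
2. The I/O wait reported by top (iowait) corresponds to the wa value reported by vmstat.
Conclusion: 
From the above, mysql appears to be the only process requesting resources, so it is most likely the source of the problem.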
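Picking out the busiest process from a saved top snapshot can also be scripted. The sketch below is an illustration only: `busiest` is a hypothetical helper, it works on the first few process lines shown in step 3, and it assumes the command name contains no spaces.

```python
# Sketch: rank the processes in a top snapshot by CPU usage.
# busiest is an illustrative helper, not part of top or of the original text.

TOP_PROCESSES = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14939 mysql 25 0 379M 224M 1117 R 38.2 25.7% 15:17.78 mysqld
4023 root 15 0 2120 972 784 R 2.0 0.3 0:00.06 top
1 root 15 0 2008 688 592 S 0.0 0.2 0:01.30 init
2 root 34 19 0 0 0 S 0.0 0.0 0:22.59 ksoftirqd/0
"""


def busiest(text, count=1):
    """Return (pid, command, cpu_percent) for the top `count` CPU consumers."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    processes = [dict(zip(header, row)) for row in rows]
    processes.sort(key=lambda proc: float(proc["%CPU"]), reverse=True)
    return [
        (proc["PID"], proc["COMMAND"], float(proc["%CPU"]))
        for proc in processes[:count]
    ]


for pid, command, cpu in busiest(TOP_PROCESSES, count=2):
    print(f"{command} (pid {pid}): {cpu}% CPU")
```

The PID returned for the busiest process is the one to hand to strace in the next step.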
4. Now that mysql has been identified as the source of the read requests, use strace to examine what it is reading:
# strace -p 14939
Process 14939 attached - interrupt to quit 
read(29, "\3\1\237\1\366\337\1\222%\4\2\0\0\0\0\0012P/d", 20) = 20 
read(29, "ata1/strongmail/log/strongmail-d"..., 399) = 399 
_llseek(29, 2877621036, [2877621036], SEEK_SET) = 0 
read(29, "\1\1\241\366\337\1\223%\4\2\0\0\0\0\0012P/da", 20) = 20 
read(29, "ta1/strongmail/log/strongmail-de"..., 400) = 400 
_llseek(29, 2877621456, [2877621456], SEEK_SET) = 0 
read(29, "\1\1\235\366\337\1\224%\4\2\0\0\0\0\0012P/da", 20) = 20 
read(29, "ta1/strongmail/log/strongmail-de"..., 396) = 396 
_llseek(29, 2877621872, [2877621872], SEEK_SET) = 0 
read(29, "\1\1\245\366\337\1\225%\4\2\0\0\0\0\0012P/da", 20) = 20 
read(29, "ta1/strongmail/log/strongmail-de"..., 404) = 404 
_llseek(29, 2877622296, [2877622296], SEEK_SET) = 0 
read(29, "\3\1\236\2\366\337\1\226%\4\2\0\0\0\0\0012P/d", 20) = 20
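Analysis: 
1. The many reads are interleaved with constant seeks, which indicates that the mysql process is generating random I/O. 
2. A particular SQL query appears to be causing the reads.
Conclusion: 
From the above, all of the read IOPS are generated by the mysql process while it executes some kind of read query.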
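A trace like this is easier to judge once it is tallied. The following sketch, which is not from the original paper, counts the system calls in the first six strace lines from step 4 and totals the bytes returned by read. The regular expression and the `summarize` helper are illustrative and only handle simple single-line entries of the form shown here.

```python
# Sketch: tally syscalls and bytes read from a captured strace excerpt.
# summarize and SYSCALL are illustrative names, not part of strace.
import re
from collections import Counter

# First six trace lines from step 4, copied verbatim (raw string keeps the backslashes).
STRACE_SAMPLE = r"""
read(29, "\3\1\237\1\366\337\1\222%\4\2\0\0\0\0\0012P/d", 20) = 20
read(29, "ata1/strongmail/log/strongmail-d"..., 399) = 399
_llseek(29, 2877621036, [2877621036], SEEK_SET) = 0
read(29, "\1\1\241\366\337\1\223%\4\2\0\0\0\0\0012P/da", 20) = 20
read(29, "ta1/strongmail/log/strongmail-de"..., 400) = 400
_llseek(29, 2877621456, [2877621456], SEEK_SET) = 0
"""

# Matches "name(args...) = return_value" on a single line.
SYSCALL = re.compile(r"^(\w+)\(.*\)\s*=\s*(-?\d+)")


def summarize(text):
    """Return (Counter of syscall names, total bytes returned by read)."""
    calls = Counter()
    bytes_read = 0
    for line in text.strip().splitlines():
        match = SYSCALL.match(line)
        if not match:
            continue  # skip banners such as "Process ... attached"
        name, result = match.group(1), int(match.group(2))
        calls[name] += 1
        if name == "read" and result > 0:
            bytes_read += result
    return calls, bytes_read


calls, bytes_read = summarize(STRACE_SAMPLE)
print(dict(calls))
print(f"average bytes per read: {bytes_read / calls['read']:.2f}")
```

Many small reads with a seek after every pair of them is the pattern the analysis describes: the disk spends its time repositioning rather than transferring data.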
5. Use the mysqladmin command to find out which slow query is responsible:
# ./mysqladmin -pstrongmail processlist
+----+------+-----------+------------+---------+------+----------+---------------------------------------- 
| Id | User | Host | db | Command | Time | State | Info 
+----+------+-----------+------------+---------+------+----------+---------------------------------------- 
| 1 | root | localhost | strongmail | Sleep | 10 | | 
| 2 | root | localhost | strongmail | Sleep | 8 | | 
| 3 | root | localhost | root | Query | 94 | Updating | update `failures` set 
`update_datasource`='Y' where database_id='32' and update_datasource='N' and | 
| 14 | root | localhost | | Query | 0 | | show processlist
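Analysis: 
1. The MySQL database appears to be running table update queries continuously. 
2. As a result of this update query, the database is indexing all of the tables.
Conclusion: 
From the above, these update queries in MySQL are trying to index all of the tables. The read requests they generate are what is degrading system performance.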
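Spotting the long-running statement in a process list can be automated along the following lines. This is a sketch under stated assumptions, not part of the original paper: `slow_queries` is a hypothetical helper, the 30-second threshold is arbitrary, and the sample is the output from step 5 with the wrapped query kept to its first line.

```python
# Sketch: pick long-running queries out of `mysqladmin processlist` output.
# slow_queries and SLOW_SECONDS are illustrative; the threshold is an arbitrary assumption.

SLOW_SECONDS = 30

PROCESSLIST = """\
+----+------+-----------+------------+---------+------+----------+------------------------
| Id | User | Host | db | Command | Time | State | Info
+----+------+-----------+------------+---------+------+----------+------------------------
| 1 | root | localhost | strongmail | Sleep | 10 | |
| 2 | root | localhost | strongmail | Sleep | 8 | |
| 3 | root | localhost | root | Query | 94 | Updating | update `failures` set
| 14 | root | localhost | | Query | 0 | | show processlist
"""


def slow_queries(text, threshold):
    """Return (id, seconds, statement) for queries running at least `threshold` seconds."""
    slow = []
    for line in text.strip().splitlines():
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        # Limit the split so a "|" inside the SQL text stays in the Info field.
        fields = [field.strip() for field in line.split("|", 8)[1:]]
        if len(fields) < 8 or not fields[0].isdigit():
            continue  # skip the column header
        ident, _user, _host, _db, command, seconds, _state, info = fields
        if command == "Query" and int(seconds) >= threshold:
            slow.append((int(ident), int(seconds), info))
    return slow


for ident, seconds, info in slow_queries(PROCESSLIST, SLOW_SECONDS):
    print(f"thread {ident} has been running for {seconds}s: {info}")
```

Only the update against `failures` is reported. The sleeping connections and the `show processlist` call itself fall below the threshold.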
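Follow-up
The performance data above was handed to the developers responsible so that they could analyze their PHP code. One developer made a temporary optimization to the code. The query in question had been written on the assumption that the table would hold at most 100K records, whereas the database was expected to hold up to 4 million. After the change, the query no longer overloaded the database.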