The last several posts covered the tuning of a simulator component called log_generator; this time let's look at another component, client_simulator.
The internal logic of client_simulator is very simple: it simulates a large number of clients, each requesting encoded chunk data from an HTTP server at a fixed rate. "Fixed rate" here means each client sends 10 requests per second, and every request hits a constantly changing URL. The whole point of client_simulator is to verify that the HTTP server's memory-eviction policy actually works.
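To make that concrete, here is a minimal sketch (Python 2, to match the project) of what a single simulated client does; run_client, client_id and BASE_URL are made-up names, and the real component of course does more bookkeeping:

    # One simulated client: ~10 GET requests per second, each to a new URL.
    import time
    import requests

    BASE_URL = "http://cache-server:8080/chunk/%d"    # hypothetical endpoint

    def run_client(client_id):
        seq = 0
        while True:
            start = time.time()
            for _ in xrange(10):                      # 10 requests per tick
                # the URL keeps changing, so the server cannot keep everything hot
                url = BASE_URL % (client_id * 1000000 + seq)
                requests.get(url)
                seq += 1
            elapsed = time.time() - start
            if elapsed < 1.0:                         # hold the rate at ~10 req/s
                time.sleep(1.0 - elapsed)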
The teammate responsible for this component wrote the Python code, ran it on a machine, and the performance was painful to look at: it could simulate only about 100 clients. Mind you, this was on an 8-core, 4 GB VM, and it pushed only about 130M of bandwidth in total, a long way from our target of 700 nodes.
A look at the component's code revealed the same old trap: plain multithreading. I rewrote it directly as a multi-process + multi-thread model: one process per CPU core, and threads per process = desired node count / process count.
After the refactor, another run showed some improvement: the CPUs were pretty much saturated, but the overall gain was small, roughly 300 simulated clients.
Now what, read the code yet again? The code is trivial: two nested loops, the outer one spawning processes, the inner one spawning threads, and each thread fetching data with the ever-popular requests library. There is no other complex logic at all.
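The shape of the refactor is roughly the following (a sketch under the same assumptions; run_client is the per-client loop from the earlier snippet, and TARGET_CLIENTS is a made-up constant):

    # Outer loop: one process per CPU core; inner loop:
    # threads per process = target client count / process count.
    import threading
    from multiprocessing import Process, cpu_count

    TARGET_CLIENTS = 700

    def worker(first_id, n_threads):
        threads = []
        for i in xrange(n_threads):
            t = threading.Thread(target=run_client, args=(first_id + i,))
            t.daemon = True
            t.start()
            threads.append(t)
        for t in threads:
            t.join()

    if __name__ == "__main__":
        n_procs = cpu_count()
        per_proc = TARGET_CLIENTS // n_procs
        procs = [Process(target=worker, args=(i * per_proc, per_proc))
                 for i in xrange(n_procs)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()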
Since this is yet another I/O-bound program, system resources are naturally being spent on network data transfer. Am I supposed to rewrite requests itself? That's beyond me.
As a last-ditch effort, I put yappi on it again and got the results below (how yappi was wired up is sketched after the output):
Clock type: CPU
Ordered by: totaltime, desc
name ncall tsub ttot tavg
/usr/lib64/python2.7/threading.py:754 Thread.run 10 0.000704 14.011758 1.401176
/home/admin/comrade/comrade/faker.py:23 fake_sdk 10 0.124799 14.010940 1.401094
/usr/lib/python2.7/site-packages/requests/api.py:61 get 2070 0.045381 13.767441 0.006651
/usr/lib/python2.7/site-packages/requests/api.py:16 request 2070 0.041756 13.719202 0.006628
/usr/lib/python2.7/site-packages/requests/sessions.py:441 Session.request 2070 0.059906 11.679531 0.005642
/usr/lib/python2.7/site-packages/requests/sessions.py:589 Session.send 2070 0.089826 7.909129 0.003821
/usr/lib/python2.7/site-packages/requests/adapters.py:388 HTTPAdapter.send 2070 0.064437 6.610118 0.003193
/usr/lib/python2.7/site-packages/urllib3/connectionpool.py:447 HTTPConnectionPool.urlopen 2070 0.085589 3.781414 0.001827
/usr/lib/python2.7/site-packages/requests/sessions.py:401 Session.prepare_request 2070 0.078741 3.018275 0.001458
/usr/lib/python2.7/site-packages/urllib3/connectionpool.py:322 HTTPConnectionPool._make_request 2070 0.077418 2.430796 0.001174
/usr/lib/python2.7/site-packages/requests/models.py:299 PreparedRequest.prepare 2070 0.038006 1.528752 0.000739
/usr/lib/python2.7/site-packages/requests/adapters.py:290 HTTPAdapter.get_connection 2070 0.024181 1.526116 0.000737
/usr/lib/python2.7/site-packages/urllib3/poolmanager.py:266 PoolManager.connection_from_url 2070 0.017157 1.346354 0.000650
/usr/lib/python2.7/site-packages/urllib3/poolmanager.py:206 PoolManager.connection_from_host 2070 0.015797 1.236867 0.000598
/usr/lib/python2.7/site-packages/requests/structures.py:42 CaseInsensitiveDict.__init__ 12420 0.108715 1.226514 0.000099
/usr/lib/python2.7/site-packages/urllib3/poolmanager.py:229 PoolManager.connection_from_context 2070 0.018828 1.213145 0.000586
/usr/lib64/python2.7/httplib.py:1053 HTTPConnection.getresponse 2070 0.034232 1.161452 0.000561
/usr/lib/python2.7/site-packages/requests/sessions.py:340 Session.__init__ 2070 0.060470 1.106475 0.000535
/usr/lib/python2.7/site-packages/requests/sessions.py:50 merge_setting 14490 0.103324 1.102030 0.000076
/usr/lib/python2.7/site-packages/urllib3/poolmanager.py:242 PoolManager.connection_from_pool_key 2070 0.034751 1.082293 0.000523
/usr/lib64/python2.7/httplib.py:1015 HTTPConnection.request 2070 0.010323 1.073503 0.000519
/usr/lib64/python2.7/httplib.py:1036 HTTPConnection._send_request 2070 0.074046 1.063180 0.000514
/usr/lib64/python2.7/httplib.py:437 HTTPResponse.begin 2070 0.056034 1.041176 0.000503
/usr/lib64/python2.7/_abcoll.py:526 update 22770 0.277791 1.022363 0.000045
/usr/lib/python2.7/site-packages/requests/adapters.py:253 HTTPAdapter.build_response 2070 0.050085 1.008311 0.000487
/usr/lib/python2.7/site-packages/requests/sessions.py:398 Session.__exit__ 2070 0.007221 0.889536 0.000430
/usr/lib/python2.7/site-packages/requests/sessions.py:705 Session.close 2070 0.016573 0.882314 0.000426
/usr/lib/python2.7/site-packages/urllib3/poolmanager.py:170 PoolManager._new_pool 2070 0.038002 0.854147 0.000413
/usr/lib/python2.7/site-packages/requests/adapters.py:313 HTTPAdapter.close 4140 0.021252 0.841980 0.000203
/usr/lib/python2.7/site-packages/requests/models.py:810 Response.content 2070 0.023343 0.821442 0.000397
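For reference, a profile like the one above can be collected along these lines (a minimal sketch; in the real run the worker threads driving faker.fake_sdk are already busy while the profiler is on):

    import time
    import yappi

    yappi.set_clock_type("cpu")      # matches "Clock type: CPU" above
    yappi.start()

    # ... start the client processes/threads here and let them run for a while ...
    time.sleep(60)

    yappi.stop()
    stats = yappi.get_func_stats()
    stats.sort("ttot", "desc")       # "Ordered by: totaltime, desc"
    stats.print_all()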
Sure enough, just as expected, the biggest CPU consumer is the requests library. But hold on, why are there so many calls into sessions.py when all we need is a plain HTTP GET? The reason is that requests.get() builds and tears down a full Session for every single call (note that Session.__init__ shows the same 2070 calls as get), along with CaseInsensitiveDict headers and merge_setting work, all pure overhead for one-shot requests.
Ah, we have been using a sledgehammer to crack a nut. Quick, swap requests out for urllib2.
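The swap itself boils down to something like this (a Python 2 sketch; fetch_chunk is a made-up helper name):

    # Plain urllib2: no Session, no header-dict merging, just a GET and a read.
    import urllib2

    def fetch_chunk(url, timeout=5):
        resp = urllib2.urlopen(url, timeout=timeout)
        try:
            return resp.read()
        finally:
            resp.close()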
After the swap, another run: performance was rock solid, and the bandwidth was instantly maxed out.