前言
昨天在ptipython下输入urllib代码补全时出现提示有一个urllib3的python包,可能是某些应用使用的依赖包,于是在pip搜索了一下,github地址,看到官方介绍,貌似很酷炫的样子:称该包是一个高性能,线程安全,且能重用tcp连接的库。
测试
根据官方的benchmark test我添加了urllib2的测试,为了体现urllib3的重用连接优点,我特意找了一个接口,能够让urllib3最大限度重用tcp连接,测试demo如下:
#!/usr/bin/env python
from __future__ import print_function
import sys
import time
import urllib
import urllib2
sys.path.append('../')
import urllib3
# URLs to download. Doesn't matter as long as they're from the same host, so we
# can take advantage of connection re-using.
TO_DOWNLOAD = [
'http://www.gandong.com/xml/132.xml',
'http://www.gandong.com/xml/135.xml',
'http://www.gandong.com/xml/134.xml',
'http://www.gandong.com/xml/136.xml',
'http://www.gandong.com/xml/137.xml',
'http://www.gandong.com/xml/138.xml',
'http://www.gandong.com/xml/139.xml',
'http://www.gandong.com/xml/151.xml',
'http://www.gandong.com/xml/152.xml',
'http://www.gandong.com/xml/153.xml',
'http://www.gandong.com/xml/154.xml',
'http://www.gandong.com/xml/156.xml',
'http://www.gandong.com/xml/158.xml',
'http://www.gandong.com/xml/163.xml',
'http://www.gandong.com/xml/164.xml',
]
def urllib_get(url_list):
assert url_list
for url in url_list:
now = time.time()
r = urllib.urlopen(url)
elapsed = time.time() - now
print("Got in %0.3f: %s" % (elapsed, url))
def urllib2_get(url_list):
assert url_list
for url in url_list:
now = time.time()
r = urllib2.urlopen(url)
elapsed = time.time() -now
print("Got in %0.3f: %s" % (elapsed, url))
def pool_get(url_list):
assert url_list
pool = urllib3.PoolManager()
for url in url_list:
now = time.time()
r = pool.request('GET', url, assert_same_host=False)
elapsed = time.time() - now
print("Got in %0.3fs: %s" % (elapsed, url))
if __name__ == '__main__':
print("Running pool_get ...")
now = time.time()
pool_get(TO_DOWNLOAD)
pool_elapsed = time.time() - now
print("Running urllib_get ...")
now = time.time()
urllib_get(TO_DOWNLOAD)
urllib_elapsed = time.time() - now
print("Running urllib2_get ...")
now = time.time()
urllib2_get(TO_DOWNLOAD)
urllib2_elapsed = time.time() - now
print("Completed pool_get in %0.3fs" % pool_elapsed)
print("Completed urllib_get in %0.3fs" % urllib_elapsed)
print("Completed urllib2_get in %0.3fs" % urllib_elapsed)
第一次结果:
Completed pool_get in 0.504s
Completed urllib_get in 0.323s
Completed urllib2_get in 0.323s
第二次结果:
Completed pool_get in 0.317s
Completed urllib_get in 0.457s
Completed urllib2_get in 0.457s
我的测试网络稳定,反而在第一次时urllib3慢于urllib2和urllib,明显urllib和urllib2在抓取上机制相同。我多次测试后,上述结果基本保持稳定。可见,urllib3并没有官方说的那么高性能。只能说明该包对于重用tcp连接的重用率极低。在速度上面并没有明显的提升。
总结
相对于urllib,urllib3并没有在速度上有所提升,反而更慢,不过其官方特性说使用了安全连接,这点没有去测试,如果描述真实的话,对于学校的一些模拟登录来说,可能会更适用一点。对一些基本的网络抓取,还是使用urllib更快速。不过对于urllib的设计来说,还是有一个可取之处,类风格使用新式类,语义描述更明确直观,返回的对象是response等。