Notes from 《Python高性能(第2版)》 (Python High Performance, Second Edition)
1. Data Structures and Algorithms
Lists and deques
- A list is backed by an array: inserting or deleting at the front is O(n), inserting or deleting at the end is O(1), and accessing any element by index is O(1).
- A deque is implemented as a doubly-linked list (of blocks in CPython): operations at either end are O(1), at the cost of O(n) access to elements in the middle.
- To find an element in a sorted list, use binary search via the bisect module, which is O(log n) (see the sketch right after this list).
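A minimal sketch of both points (the variable names are illustrative):

from collections import deque
from bisect import bisect_left, insort

# O(1) appends/pops at either end of a deque
d = deque([1, 2, 3])
d.appendleft(0)  # O(1); the same operation on a list would be O(n)
d.pop()          # O(1)

# O(log n) search in a sorted list with bisect
sorted_nums = [1, 3, 5, 7, 9]
i = bisect_left(sorted_nums, 5)                        # insertion point for 5
found = i < len(sorted_nums) and sorted_nums[i] == 5   # True
insort(sorted_nums, 4)                                 # insert while keeping the list sorted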
Dictionaries and inverted indexes
To find all documents containing the word X among N documents: [doc for doc in docs if 'X' in doc]
When N is very large, this linear scan on every query is inefficient.
Instead, build the mapping word -> [ids of the docs containing that word] once; afterwards each query is a single dictionary lookup, O(1) on average (as sketched below).
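A minimal sketch of such an inverted index (the documents and the query word are illustrative):

from collections import defaultdict

docs = ["the cat sat", "the dog ran", "a cat ran"]

# build the index once: word -> set of ids of the docs containing it
index = defaultdict(set)
for doc_id, doc in enumerate(docs):
    for word in doc.split():
        index[word].add(doc_id)

# every later query is a single dict lookup, O(1) on average
matching_ids = index["cat"]              # {0, 2}
matches = [docs[i] for i in matching_ids]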
Sets
Also hash-based under the hood, so membership tests are O(1) on average (example below).
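A quick illustration of what the hashing buys (values are illustrative):

words = ["spam", "eggs", "spam", "ham"]
unique = set(words)        # deduplication: {'spam', 'eggs', 'ham'}
"eggs" in unique           # membership test is O(1) on average (O(n) on a list)

a, b = {1, 2, 3}, {2, 3, 4}
a & b                      # intersection: {2, 3}
a | b                      # union: {1, 2, 3, 4}
a - b                      # difference: {1}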
Heaps
Inserting and removing elements are both O(log n).
heapq
>>> import heapq
>>> c = [10,1,3,2,5]
>>> heapq.heapify(c)
>>> c
[1, 2, 3, 10, 5]
>>> heapq.heappop(c)  # pop and return the smallest value; O(log n), since the heap must be restored afterwards
1
>>> heapq.heappush(c, 9)
>>> c
[2, 5, 3, 10, 9]
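Peeking at the smallest element without removing it is just index 0 of the heapified list, which is O(1):
>>> c[0]  # smallest element stays in place
2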
PriorityQueue is a thread-safe priority queue (it is not shareable across processes; multiprocessing provides its own queues)
>>> from queue import PriorityQueue
>>> q = PriorityQueue()
>>> for x in [2, 5, 10, 9, 1]:
...     q.put(x)
>>> q
<queue.PriorityQueue object at 0x0000017DD91EED08>
>>> while not q.empty():
...     q.get()  # retrieve the smallest value
...
1
2
5
9
10
For tuples (or lists), entries are ordered by their first element; ties are broken by the second, and so on
>>> q.put([1,2])
>>> q.put([1,3])
>>> q.put([1,-1])
>>> q.get()
[1, -1]
>>> q.get()
[1, 2]
>>> q.get()
[1, 3]
Tries
There is no trie in the standard library, but third-party packages provide one.
A trie supports fast lookup of strings by prefix, which makes it useful for text autocompletion.
pip install patricia-trie
There are also optimized trie libraries written in C: datrie and marisa-trie
https://pypi.org/project/datrie/
 https://pypi.org/project/marisa-trie/0.7.1/
from random import choice
from patricia import trie
from string import ascii_lowercase
import cProfile

def random_string(length):
    return ''.join(choice(ascii_lowercase) for i in range(length))

def gen_strings():
    strings = [random_string(32) for _ in range(100000)]
    return strings

strings = gen_strings()
strings_dict = {s: 0 for s in strings}
strings_trie = trie(**strings_dict)

def func1():
    matches = [s for s in strings if s.startswith('abc')]
    print(matches)

def func2():
    matches = list(strings_trie.iter('abc'))
    print(matches)

if __name__ == "__main__":
    cProfile.run('func1()', sort='tottime')
    cProfile.run('func2()', sort='tottime')
D:\ProgramData\Anaconda3\envs\cv\python.exe D:/gitcode/Python_learning/Python-High-Performance-Second-Edition-master/Chapter02/trie.py 
['abcimostjvrhrhuswjhshhsnoyawtuaq', 'abcymrvuyenkyuppjuwnstzpaxarjpjo', 'abcbltxrvuktanyoiezntijmaryojbvg', 'abckxrzwjmdjaekmiqquezgwjabdhohe', 'abcecoluitahquyqxoidjiwaupatgszd', 'abcppzcrqjxgzuoiuskethybnlgbougt', 'abcbaznirxvyhnvabmbhwmsefdtultpj']

         100006 function calls in 0.036 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100000    0.019    0.000    0.019    0.000 {method 'startswith' of 'str' objects}
        1    0.017    0.017    0.036    0.036 trie.py:18(<listcomp>)
        1    0.000    0.000    0.036    0.036 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.036    0.036 trie.py:17(func1)
        1    0.000    0.000    0.036    0.036 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

['abcimostjvrhrhuswjhshhsnoyawtuaq', 'abcymrvuyenkyuppjuwnstzpaxarjpjo', 'abcbltxrvuktanyoiezntijmaryojbvg', 'abcbaznirxvyhnvabmbhwmsefdtultpj', 'abckxrzwjmdjaekmiqquezgwjabdhohe', 'abcecoluitahquyqxoidjiwaupatgszd', 'abcppzcrqjxgzuoiuskethybnlgbougt']

         83 function calls (66 primitive calls) in 0.000 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
     25/8    0.000    0.000    0.000    0.000 patricia.py:57(_items)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        3    0.000    0.000    0.000    0.000 patricia.py:161(_next)
        1    0.000    0.000    0.000    0.000 trie.py:21(func2)
        1    0.000    0.000    0.000    0.000 patricia.py:354(iter)
        8    0.000    0.000    0.000    0.000 patricia.py:51(_keys)
        7    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
        9    0.000    0.000    0.000    0.000 {method 'values' of 'dict' objects}
        8    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
        1    0.000    0.000    0.000    0.000 patricia.py:366(_accumulate)
        8    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        3    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
        5    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
As the profiles show, the trie lookup is dramatically faster: the linear scan makes 100006 calls in 0.036 s, while the trie query needs only 83 calls and finishes in well under a millisecond.
2. Caching
lru_cache
Example: computing Fibonacci numbers, where caching the results of subproblems turns the naive recursion into dynamic programming.
Python provides the functools.lru_cache decorator; the maximum cache size can be set with maxsize.
>>> from functools import lru_cache
>>> @lru_cache()
... def sum2(a, b):
...     print('calculate')
...     return a+b
...
>>> sum2(1,2)
calculate
3
>>> sum2(1,2)  # no computation this time; the cached result is returned directly
3
>>> sum2(1,3)
calculate
4
Inspecting cache performance
>>> sum2.cache_info()
CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
>>> sum2.cache_clear()
>>> sum2.cache_info()
CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)
from functools import lru_cache
import cProfile
@lru_cache(maxsize=None)
def fibonacci_mem(n):
    if n < 1:
        return 1
    else:
        return fibonacci_mem(n - 1) + fibonacci_mem(n - 2)

def fibonacci(n):
    if n < 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

if __name__ == '__main__':
    cProfile.run('fibonacci(30)', sort='tottime')
    print(fibonacci_mem)
    cProfile.run('fibonacci_mem(30)', sort='tottime')
Output:
D:\ProgramData\Anaconda3\envs\cv\python.exe D:/gitcode/Python_learning/Python-High-Performance-Second-Edition-master/Chapter02/cache.py

         4356620 function calls (4 primitive calls) in 1.695 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
4356617/1    1.695    0.000    1.695    1.695 cache.py:8(fibonacci)
        1    0.000    0.000    1.695    1.695 {built-in method builtins.exec}
        1    0.000    0.000    1.695    1.695 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

<functools._lru_cache_wrapper object at 0x000001882C23E550>

         35 function calls (4 primitive calls) in 0.000 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
     32/1    0.000    0.000    0.000    0.000 cache.py:3(fibonacci_mem)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
joblib
Provides simple disk-based caching, so results persist across runs
pip install joblib
import os
from joblib import Memory
# cachedir= matches the book's joblib version; newer joblib releases use location= instead
mem = Memory(cachedir=os.path.join(os.path.dirname(__file__), 'cache_dir'))

@mem.cache
def fibonacci(n):
    if n < 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

if __name__ == '__main__':
    fibonacci(32)

3. Comprehensions and Generators
Both can replace explicit for loops and are generally faster than the equivalent loop.
When iterating over a generator, only one result is produced at a time, which keeps memory usage low.
In Jupyter:
%load_ext memory_profiler
import os

numbers = range(1000000)

def map_comprehension(numbers):
    a = [n * 2 for n in numbers]
    b = [n ** 2 for n in a]
    c = [n ** 0.33 for n in b]
    return max(c)

%memit map_comprehension(numbers)
peak memory: 236.46 MiB, increment: 112.98 MiB
def map_normal(numbers):
    a = map(lambda n: n * 2, numbers)
    b = map(lambda n: n ** 2, a)
    c = map(lambda n: n ** 0.33, b)
    return max(c)

%memit map_normal(numbers)
peak memory: 123.73 MiB, increment: 0.00 MiB
The lazy map-based version uses far less memory: no intermediate lists are materialized, so the memory increment is 0.00 MiB versus about 113 MiB for the list-comprehension version.
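Generator expressions give the same laziness without lambda; a minimal equivalent of map_normal (the function name is illustrative) that can be profiled with %memit in the same way:

def map_genexpr(numbers):
    a = (n * 2 for n in numbers)
    b = (n ** 2 for n in a)
    c = (n ** 0.33 for n in b)
    return max(c)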