1. Install the required packages
```
# Read .docx files
!pip install python-docx
# Or install through the Tsinghua University PyPI mirror
!pip install -i https://pypi.tuna.tsinghua.edu.cn/simple python-docx
# Chinese and English word segmentation
!pip install jieba
!pip install -i https://pypi.tuna.tsinghua.edu.cn/simple jieba
# Export to Excel
!pip install pandas
!pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pandas
```
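Before reinstalling, it can help to check which packages are already importable. The sketch below is one way to do that with the standard library. The `REQUIRED` mapping and the `missing_packages` helper are illustrative names, not part of the original notes. Note that `python-docx` is imported as `docx`.

```python
from importlib.util import find_spec

# Maps the pip package name to the module name used in `import`.
# (Illustrative mapping for the three packages installed above.)
REQUIRED = {"python-docx": "docx", "jieba": "jieba", "pandas": "pandas"}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


# An empty list means everything is already installed.
print(missing_packages())
```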
2. Read the docx file into one large string
```python
from docx import Document

# Load the document and join the text of every paragraph into one string
document = Document("Python.docx")
content = " ".join(para.text for para in document.paragraphs)
 ```
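Word documents often contain empty paragraphs, which leave runs of extra spaces in the joined string. A minimal sketch of a joiner that skips them follows. `join_paragraph_texts` is a hypothetical helper, and the sample list stands in for the paragraph texts so the example runs without a docx file.

```python
def join_paragraph_texts(texts, sep=" "):
    """Join paragraph strings, skipping empty or whitespace-only ones."""
    cleaned = (text.strip() for text in texts)
    return sep.join(text for text in cleaned if text)


# Sample data standing in for [para.text for para in document.paragraphs]
paragraphs = ["Python 简介", "", "  ", "Python 是一种解释型语言。 "]
content = join_paragraph_texts(paragraphs)
print(repr(content))  # 'Python 简介 Python 是一种解释型语言。'
```

With python-docx, the same helper could be called as `join_paragraph_texts(para.text for para in document.paragraphs)`.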
3. Chinese word segmentation
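```python
import jieba

# Precise mode (cut_all=False); jieba.cut returns a generator
seg_list = jieba.cut(content, cut_all=False)
print(type(seg_list))

# Filter out punctuation and meaningless single characters
seg_list = [
    word
    for word in seg_list
    if len(word) > 1
]
print(seg_list[:30])
```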
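The length filter only removes single-character tokens. A stricter filter might also drop multi-character punctuation and common stop words. The sketch below assumes a small hand-written `STOPWORDS` set and `PUNCTUATION` set. Both are placeholders to replace with lists suited to your corpus, and it runs on a sample token list rather than real jieba output.

```python
import string

# Hypothetical stop-word list; replace with one suited to your corpus.
STOPWORDS = {"我们", "可以", "一个", "the", "and"}

# ASCII punctuation plus a few common full-width marks (not exhaustive).
PUNCTUATION = set(string.punctuation) | set(",。、;:?!“”‘’()《》【】…")


def keep_token(token, stopwords=STOPWORDS):
    """Return True if the token looks like a meaningful word."""
    token = token.strip()
    if len(token) <= 1:
        return False  # empty, whitespace, or a single character
    if all(ch in PUNCTUATION for ch in token):
        return False  # made up entirely of punctuation, e.g. "……"
    return token not in stopwords


def filter_tokens(tokens, stopwords=STOPWORDS):
    """Strip each token and keep only the meaningful ones."""
    return [t.strip() for t in tokens if keep_token(t, stopwords)]


sample = ["我们", "使用", "Python", ",", "……", " ", "分词", "的"]
print(filter_tokens(sample))  # ['使用', 'Python', '分词']
```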
4. Count word frequencies
```python
from collections import Counter

counter = Counter(seg_list)
# Peek at the first 10 entries (insertion order, not sorted by count)
for key, count in list(counter.items())[:10]:
    print(key, count)
```
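`counter.items()` yields entries in insertion order, so the first ten are not necessarily the most frequent. To see the top words directly, `Counter.most_common` can be used, as in this small sketch. The token list is made-up sample data, and the relative-frequency column is an optional extra.

```python
from collections import Counter

# Made-up sample tokens standing in for seg_list
tokens = ["python", "代码", "python", "函数", "代码", "python"]
counter = Counter(tokens)

# most_common(n) returns (word, count) pairs, highest count first
top = counter.most_common(2)
print(top)  # [('python', 3), ('代码', 2)]

# Optional: show each word's share of all tokens
total = sum(counter.values())
for word, count in top:
    print(f"{word}\t{count}\t{count / total:.1%}")
```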
5. Build a pandas DataFrame and sort it
```python
import pandas as pd

df = pd.DataFrame(list(counter.items()), columns=['word', 'count'])
# Sort in place, most frequent words first
df.sort_values(by="count", ascending=False, inplace=True)
df.head()
```
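The install step mentions exporting to Excel, but the notes stop at sorting. Below is a possible sketch of a helper that builds the sorted frame with a stable tie-break on the word, with the export shown only as a comment. `counts_to_frame` and the file name `word_count.xlsx` are illustrative. Writing `.xlsx` with pandas also needs an engine such as `openpyxl`, which the install step above does not cover.

```python
from collections import Counter

import pandas as pd


def counts_to_frame(counter):
    """Turn a Counter into a DataFrame sorted by count, then by word."""
    df = pd.DataFrame(list(counter.items()), columns=["word", "count"])
    df = df.sort_values(by=["count", "word"], ascending=[False, True])
    # Renumber the rows so the index matches the rank
    return df.reset_index(drop=True)


# Made-up sample data standing in for the real word counts
frame = counts_to_frame(Counter(["b", "a", "a", "c", "b", "a"]))
print(frame)

# To write the result to Excel (assumes openpyxl is installed):
# frame.to_excel("word_count.xlsx", index=False)
```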
Convert a list to a dict
```python
a = ['hello', 'world', '1', '2']
# Even-indexed items become keys, odd-indexed items become values
b = dict(zip(a[0::2], a[1::2]))
b
```
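`zip` stops at the shorter input, so a list with an odd number of items silently loses its last element. A small wrapper can make that case explicit. `pairs_to_dict` is a hypothetical helper name used here for illustration.

```python
def pairs_to_dict(items):
    """Build a dict from a flat [key, value, key, value, ...] list."""
    if len(items) % 2 != 0:
        raise ValueError("expected an even number of items")
    # Slice out keys (even positions) and values (odd positions), then pair them
    return dict(zip(items[0::2], items[1::2]))


a = ['hello', 'world', '1', '2']
print(pairs_to_dict(a))  # {'hello': 'world', '1': '2'}
```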