Abstract: Using a BERT-based dialogue emotion recognition example, this article demonstrates the principles, practical usage and steps of large language models in the MindSpore AI framework.
I. Environment Setup
%%capture captured_output
# The environment comes with mindspore 2.2.14 pre-installed; to switch MindSpore versions, change the version number below
!pip uninstall mindspore -y
!pip install -i https://pypi.mirrors.ustc.edu.cn/simple mindspore==2.2.14

# This case was validated against mindnlp 0.3.1; if it fails to run, pin the version with: !pip install mindnlp==0.3.1
!pip install mindnlp
Output:
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting mindnlpDownloading https://pypi.tuna.tsinghua.edu.cn/packages/72/37/ef313c23fd587c3d1f46b0741c98235aecdfd93b4d6d446376f3db6a552c/mindnlp-0.3.1-py3-none-any.whl (5.7 MB)━━━━━━━━━━━━━━━━ 5.7/5.7 MB 14.2 MB/s eta 0:00:0000:0100:01
Requirement already satisfied: mindspore in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from mindnlp) (2.2.14)
Requirement already satisfied: tqdm in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from mindnlp) (4.66.4)
Requirement already satisfied: requests in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from mindnlp) (2.32.3)
Collecting datasets (from mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/60/2d/963b266bb8f88492d5ab4232d74292af8beb5b6fdae97902df9e284d4c32/datasets-2.20.0-py3-none-any.whl (547 kB)━━━━━━━━━━━━━━━━ 547.8/547.8 kB 21.2 MB/s eta 0:00:00
Collecting evaluate (from mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c2/d6/ff9baefc8fc679dcd9eb21b29da3ef10c81aa36be630a7ae78e4611588e1/evaluate-0.4.2-py3-none-any.whl (84 kB)━━━━━━━━━━━━━━━━ 84.1/84.1 kB 24.8 MB/s eta 0:00:00
Collecting tokenizers (from mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ba/26/139bd2371228a0e203da7b3e3eddcb02f45b2b7edd91df00e342e4b55e13/tokenizers-0.19.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.6 MB)━━━━━━━━━━━━━━━━ 3.6/3.6 MB 14.7 MB/s eta 0:00:00a 0:00:01
Collecting safetensors (from mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c6/02/28e6280ed0f1bde89eed644b80f2ece4e5ae212dc9ee70d7f56fadc93602/safetensors-0.4.3-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.2 MB)━━━━━━━━━━━━━━━━ 1.2/1.2 MB 17.8 MB/s eta 0:00:00a 0:00:01
Collecting sentencepiece (from mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a3/69/e96ef68261fa5b82379fdedb325ceaf1d353c6e839ec346d8244e0da5f2f/sentencepiece-0.2.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.3 MB)━━━━━━━━━━━━━━━━ 1.3/1.3 MB 14.4 MB/s eta 0:00:00a 0:00:01
Collecting regex (from mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/70/70/fea4865c89a841432497d1abbfd53878513b55c6543245fabe31cf8df0b8/regex-2024.5.15-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (774 kB)━━━━━━━━━━━━━━━━ 774.7/774.7 kB 15.3 MB/s eta 0:00:00a 0:00:01
Collecting addict (from mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/6a/00/b08f23b7d7e1e14ce01419a467b583edbb93c6cdb8654e54a9cc579cd61f/addict-2.4.0-py3-none-any.whl (3.8 kB)
Collecting ml-dtypes (from mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/50/96/13d7c3cc82d5ef597279216cf56ff461f8b57e7096a3ef10246a83ca80c0/ml_dtypes-0.4.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.2 MB)━━━━━━━━━━━━━━━━ 2.2/2.2 MB 11.9 MB/s eta 0:00:00a 0:00:01
Collecting pyctcdecode (from mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a5/8a/93e2118411ae5e861d4f4ce65578c62e85d0f1d9cb389bd63bd57130604e/pyctcdecode-0.5.0-py2.py3-none-any.whl (39 kB)
Collecting jieba (from mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c6/cb/18eeb235f833b726522d7ebed54f2278ce28ba9438e3135ab0278d9792a2/jieba-0.42.1.tar.gz (19.2 MB)━━━━━━━━━━━━━━━━ 19.2/19.2 MB 16.5 MB/s eta 0:00:0000:0100:01Preparing metadata (setup.py) ... done
Collecting pytest==7.2.0 (from mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/67/68/a5eb36c3a8540594b6035e6cdae40c1ef1b6a2bfacbecc3d1a544583c078/pytest-7.2.0-py3-none-any.whl (316 kB)━━━━━━━━━━━━━━━━ 316.8/316.8 kB 16.7 MB/s eta 0:00:00
Requirement already satisfied: attrs>=19.2.0 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from pytest==7.2.0->mindnlp) (23.2.0)
Requirement already satisfied: iniconfig in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from pytest==7.2.0->mindnlp) (2.0.0)
Requirement already satisfied: packaging in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from pytest==7.2.0->mindnlp) (23.2)
Requirement already satisfied: pluggy<2.0,>=0.12 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from pytest==7.2.0->mindnlp) (1.5.0)
Requirement already satisfied: exceptiongroup>=1.0.0rc8 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from pytest==7.2.0->mindnlp) (1.2.0)
Requirement already satisfied: tomli>=1.0.0 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from pytest==7.2.0->mindnlp) (2.0.1)
Requirement already satisfied: filelock in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from datasets->mindnlp) (3.15.3)
Requirement already satisfied: numpy>=1.17 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from datasets->mindnlp) (1.26.4)
Collecting pyarrow>=15.0.0 (from datasets->mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/87/60/cc0645eb4ef73f88847e40a7f9d238bae6b7409d6c1f6a5d200d8ade1f09/pyarrow-16.1.0-cp39-cp39-manylinux_2_28_aarch64.whl (38.1 MB)━━━━━━━━━━━━━━━━ 38.1/38.1 MB 14.2 MB/s eta 0:00:00
Collecting pyarrow-hotfix (from datasets->mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/e4/f4/9ec2222f5f5f8ea04f66f184caafd991a39c8782e31f5b0266f101cb68ca/pyarrow_hotfix-0.6-py3-none-any.whl (7.9 kB)
Requirement already satisfied: dill<0.3.9,>=0.3.0 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from datasets->mindnlp) (0.3.8)
Requirement already satisfied: pandas in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from datasets->mindnlp) (2.2.2)
Collecting xxhash (from datasets->mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/7c/b9/93f860969093d5d1c4fa60c75ca351b212560de68f33dc0da04c89b7dc1b/xxhash-3.4.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (220 kB)━━━━━━━━━━━━━━━━ 220.6/220.6 kB 15.6 MB/s eta 0:00:00
Collecting multiprocess (from datasets->mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/da/d9/f7f9379981e39b8c2511c9e0326d212accacb82f12fbfdc1aa2ce2a7b2b6/multiprocess-0.70.16-py39-none-any.whl (133 kB)━━━━━━━━━━━━━━━━ 133.4/133.4 kB 15.8 MB/s eta 0:00:00
Collecting fsspec<=2024.5.0,>=2023.1.0 (from fsspec[http]<=2024.5.0,>=2023.1.0->datasets->mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ba/a3/16e9fe32187e9c8bc7f9b7bcd9728529faa725231a0c96f2f98714ff2fc5/fsspec-2024.5.0-py3-none-any.whl (316 kB)━━━━━━━━━━━━━━━━ 316.1/316.1 kB 16.8 MB/s eta 0:00:00
Collecting aiohttp (from datasets->mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/eb/45/eebe8d2215328434f33ccb44a05d2741ff7ed4b96b56ca507e2ecf598b73/aiohttp-3.9.5-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.2 MB)━━━━━━━━━━━━━━━━ 1.2/1.2 MB 17.1 MB/s eta 0:00:00
Requirement already satisfied: huggingface-hub>=0.21.2 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from datasets->mindnlp) (0.23.4)
Requirement already satisfied: pyyaml>=5.1 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from datasets->mindnlp) (6.0.1)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from requests->mindnlp) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from requests->mindnlp) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from requests->mindnlp) (2.2.2)
Requirement already satisfied: certifi>=2017.4.17 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from requests->mindnlp) (2024.6.2)
Requirement already satisfied: protobuf>=3.13.0 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from mindspore->mindnlp) (5.27.1)
Requirement already satisfied: asttokens>=2.0.4 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from mindspore->mindnlp) (2.0.5)
Requirement already satisfied: pillow>=6.2.0 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from mindspore->mindnlp) (10.3.0)
Requirement already satisfied: scipy>=1.5.4 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from mindspore->mindnlp) (1.13.1)
Requirement already satisfied: psutil>=5.6.1 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from mindspore->mindnlp) (5.9.0)
Requirement already satisfied: astunparse>=1.6.3 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from mindspore->mindnlp) (1.6.3)
Collecting pygtrie<3.0,>=2.1 (from pyctcdecode->mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ec/cd/bd196b2cf014afb1009de8b0f05ecd54011d881944e62763f3c1b1e8ef37/pygtrie-2.5.0-py3-none-any.whl (25 kB)
Collecting hypothesis<7,>=6.14 (from pyctcdecode->mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ae/ea/526a7a629fcf6c78a1a6d37f988ca7e02e5b5785ec4de8a194deb40529f4/hypothesis-6.104.2-py3-none-any.whl (462 kB)━━━━━━━━━━━━━━━━ 462.4/462.4 kB 14.4 MB/s eta 0:00:00
Requirement already satisfied: six in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from asttokens>=2.0.4->mindspore->mindnlp) (1.16.0)
Requirement already satisfied: wheel<1.0,>=0.23.0 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from astunparse>=1.6.3->mindspore->mindnlp) (0.43.0)
Collecting aiosignal>=1.1.2 (from aiohttp->datasets->mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/76/ac/a7305707cb852b7e16ff80eaf5692309bde30e2b1100a1fcacdc8f731d97/aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting frozenlist>=1.1.1 (from aiohttp->datasets->mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/57/15/172af60c7e150a1d88ecc832f2590721166ae41eab582172fe1e9844eab4/frozenlist-1.4.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (239 kB)━━━━━━━━━━━━━━━━ 239.4/239.4 kB 17.1 MB/s eta 0:00:00
Collecting multidict<7.0,>=4.5 (from aiohttp->datasets->mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d0/10/2ff646c471e84af25fe8111985ffb8ec85a3f6e1ade8643bfcfcc0f4d2b1/multidict-6.0.5-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (125 kB)━━━━━━━━━━━━━━━━ 125.9/125.9 kB 31.0 MB/s eta 0:00:00
Collecting yarl<2.0,>=1.0 (from aiohttp->datasets->mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c6/d6/5b30ae1d8a13104ee2ceb649f28f2db5ad42afbd5697fd0fc61528bb112c/yarl-1.9.4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (300 kB)━━━━━━━━━━━━━━━━ 300.9/300.9 kB 20.5 MB/s eta 0:00:00
Collecting async-timeout<5.0,>=4.0 (from aiohttp->datasets->mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a7/fa/e01228c2938de91d47b307831c62ab9e4001e747789d0b05baf779a6488c/async_timeout-4.0.3-py3-none-any.whl (5.7 kB)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from huggingface-hub>=0.21.2->datasets->mindnlp) (4.11.0)
Collecting sortedcontainers<3.0.0,>=2.1.0 (from hypothesis<7,>=6.14->pyctcdecode->mindnlp)Downloading https://pypi.tuna.tsinghua.edu.cn/packages/32/46/9cb0e58b2deb7f82b84065f37f3bffeb12413f947f9388e4cac22c4621ce/sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from pandas->datasets->mindnlp) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from pandas->datasets->mindnlp) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages (from pandas->datasets->mindnlp) (2024.1)
Building wheels for collected packages: jiebaBuilding wheel for jieba (setup.py) ... doneCreated wheel for jieba: filename=jieba-0.42.1-py3-none-any.whl size=19314459 sha256=352f23b7dc8b4bade2f918165e055bc707601544400a4918136ba69f220ce9f6Stored in directory: /home/nginx/.cache/pip/wheels/1a/76/68/b6d79c4db704bb18d54f6a73ab551185f4711f9730c0c15d97
Successfully built jieba
Installing collected packages: sortedcontainers, sentencepiece, pygtrie, jieba, addict, xxhash, safetensors, regex, pytest, pyarrow-hotfix, pyarrow, multiprocess, multidict, ml-dtypes, hypothesis, fsspec, frozenlist, async-timeout, yarl, pyctcdecode, aiosignal, tokenizers, aiohttp, datasets, evaluate, mindnlpAttempting uninstall: pytestFound existing installation: pytest 8.0.0Uninstalling pytest-8.0.0:Successfully uninstalled pytest-8.0.0Attempting uninstall: fsspecFound existing installation: fsspec 2024.6.0Uninstalling fsspec-2024.6.0:Successfully uninstalled fsspec-2024.6.0
Successfully installed addict-2.4.0 aiohttp-3.9.5 aiosignal-1.3.1 async-timeout-4.0.3 datasets-2.20.0 evaluate-0.4.2 frozenlist-1.4.1 fsspec-2024.5.0 hypothesis-6.104.2 jieba-0.42.1 mindnlp-0.3.1 ml-dtypes-0.4.0 multidict-6.0.5 multiprocess-0.70.16 pyarrow-16.1.0 pyarrow-hotfix-0.6 pyctcdecode-0.5.0 pygtrie-2.5.0 pytest-7.2.0 regex-2024.5.15 safetensors-0.4.3 sentencepiece-0.2.0 sortedcontainers-2.4.0 tokenizers-0.19.1 xxhash-3.4.1 yarl-1.9.4

[notice] A new release of pip is available: 24.1 -> 24.1.1
[notice] To update, run: python -m pip install --upgrade pip
Show basic information about the installed mindspore package:
!pip show mindspore
Output:
Name: mindspore
Version: 2.2.14
Summary: MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.
Home-page: https://www.mindspore.cn
Author: The MindSpore Authors
Author-email: contact@mindspore.cn
License: Apache 2.0
Location: /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages
Requires: asttokens, astunparse, numpy, packaging, pillow, protobuf, psutil, scipy
Required-by: mindnlp
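As a quick sanity check, the installed version can also be read from Python; a minimal sketch (assuming the kernel has been restarted after the reinstall above):

import mindspore

# Print the runtime version of MindSpore; it should match the pinned 2.2.14.
print(mindspore.__version__)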
II. Model Overview

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model released by Google in 2018. It is used for natural language processing tasks such as question answering (QA), named entity recognition, natural language inference (NLI) and text classification.

BERT is built on the Transformer encoder with a bidirectional structure. Its main innovation is the pre-training method, which combines two tasks:

- Masked Language Model (MLM), which captures word-level information. During pre-training, 15% of the tokens in the corpus are randomly selected for the mask operation. Of the selected tokens, 80% are replaced with [MASK], 10% are replaced with a random token, and 10% are kept unchanged.
- Next Sentence Prediction (NSP), which captures sentence-level information. Its goal is to teach the model the relationship between two sentences: the input is a sentence pair A and B, where B is the real next sentence of A half of the time, and the model predicts whether B follows A.

Pre-training produces the embedding table and the Transformer weights (12 layers for BERT-BASE, or 24 layers for BERT-LARGE). The pre-trained model is then fine-tuned on downstream tasks such as text classification, similarity judgement and reading comprehension.

Dialogue emotion detection (EmoTect) takes a dialogue utterance as input, classifies its emotion as positive, negative or neutral, and computes a confidence score.
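To make the 15% / 80-10-10 masking rule above concrete, here is a toy sketch in plain Python. It is illustrative only: the tiny token list and vocabulary are made up, and this is not the pre-training code used by BERT or MindSpore.

import random

def mask_tokens(tokens, vocab, mask_rate=0.15):
    # For each token, decide whether it is selected for masking (15% of tokens).
    masked = []
    for tok in tokens:
        if random.random() < mask_rate:
            r = random.random()
            if r < 0.8:                      # 80%: replace with [MASK]
                masked.append("[MASK]")
            elif r < 0.9:                    # 10%: replace with a random token
                masked.append(random.choice(vocab))
            else:                            # 10%: keep the original token
                masked.append(tok)
        else:
            masked.append(tok)
    return masked

print(mask_tokens(["我", "很", "喜", "欢", "兔", "子"], vocab=["我", "你", "好", "兔"]))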
Example: import mindspore and the dataset, nn and context modules, together with the mindnlp training utilities:
import os
import mindspore
from mindspore.dataset import text, GeneratorDataset, transforms
from mindspore import nn, context
from mindnlp._legacy.engine import Trainer, Evaluator
from mindnlp._legacy.engine.callbacks import CheckpointCallback, BestModelCallback
from mindnlp._legacy.metrics import Accuracy
Output:
Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 1.037 seconds.
Prefix dict has been built successfully.

III. Preparing the Dataset
1. Dataset description
The experiment uses Baidu PaddlePaddle's chatbot conversation data, which is already labelled and word-segmented.
Each line has two tab-separated (\t) columns: the emotion label (0 = negative, 1 = neutral, 2 = positive) and the Chinese text, with words separated by spaces and encoded in UTF-8.
Sample data:
label--text_a
0--谁骂人了我从来不骂人我骂的都不是人你是人吗
1--我有事等会儿就回来和你聊
2--我见到你很高兴谢谢你帮我
2. Download the dataset
# download dataset
!wget https://baidu-nlp.bj.bcebos.com/emotion_detection-dataset-1.0.0.tar.gz -O emotion_detection.tar.gz
!tar xvf emotion_detection.tar.gz
Output:
--2024-07-01 13:38:50-- https://baidu-nlp.bj.bcebos.com/emotion_detection-dataset-1.0.0.tar.gz
Resolving baidu-nlp.bj.bcebos.com (baidu-nlp.bj.bcebos.com)... 119.249.103.5, 113.200.2.111, 2409:8c04:1001:1203:0:ff:b0bb:4f27
Connecting to baidu-nlp.bj.bcebos.com (baidu-nlp.bj.bcebos.com)|119.249.103.5|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1710581 (1.6M) [application/x-gzip]
Saving to: ‘emotion_detection.tar.gz’

emotion_detection.t 100%[===================>]   1.63M  8.04MB/s    in 0.2s

2024-07-01 13:38:50 (8.04 MB/s) - ‘emotion_detection.tar.gz’ saved [1710581/1710581]

data/
data/test.tsv
data/infer.tsv
data/dev.tsv
data/train.tsv
data/vocab.txt
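As a quick check of the format described above (a label, a tab, then space-segmented text), the first few lines of the downloaded training file can be printed; a minimal sketch:

# Print the header line and the first three samples of the training split.
with open("data/train.tsv", "r", encoding="utf-8") as f:
    for line in f.readlines()[:4]:
        print(line.rstrip())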
3. Define the dataset class
# prepare dataset
class SentimentDataset:
    """Sentiment Dataset"""

    def __init__(self, path):
        self.path = path
        self._labels, self._text_a = [], []
        self._load()

    def _load(self):
        with open(self.path, "r", encoding="utf-8") as f:
            dataset = f.read()
        lines = dataset.split("\n")
        for line in lines[1:-1]:
            label, text_a = line.split("\t")
            self._labels.append(int(label))
            self._text_a.append(text_a)

    def __getitem__(self, index):
        return self._labels[index], self._text_a[index]

    def __len__(self):
        return len(self._labels)
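A small usage sketch of the class (the exact count and sample printed depend on the downloaded data):

# Instantiate the dataset on the training split and inspect it.
train_data = SentimentDataset("data/train.tsv")
print(len(train_data))    # number of labelled utterances
print(train_data[0])      # a (label, text) tuple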
IV. Data Loading and Preprocessing

The data loading and preprocessing logic is wrapped in a single function, process_dataset():
import numpy as np

def process_dataset(source, tokenizer, max_seq_len=64, batch_size=32, shuffle=True):
    is_ascend = mindspore.get_context("device_target") == "Ascend"

    column_names = ["label", "text_a"]

    dataset = GeneratorDataset(source, column_names=column_names, shuffle=shuffle)
    # transforms
    type_cast_op = transforms.TypeCast(mindspore.int32)

    def tokenize_and_pad(text):
        if is_ascend:
            tokenized = tokenizer(text, padding="max_length",
                                  truncation=True, max_length=max_seq_len)
        else:
            tokenized = tokenizer(text)
        return tokenized["input_ids"], tokenized["attention_mask"]

    # map dataset
    dataset = dataset.map(operations=tokenize_and_pad, input_columns="text_a",
                          output_columns=["input_ids", "attention_mask"])
    dataset = dataset.map(operations=[type_cast_op], input_columns="label",
                          output_columns="labels")

    # batch dataset
    if is_ascend:
        dataset = dataset.batch(batch_size)
    else:
        dataset = dataset.padded_batch(batch_size,
                                       pad_info={"input_ids": (None, tokenizer.pad_token_id),
                                                 "attention_mask": (None, 0)})
    return dataset
The preprocessing uses static shapes (every sample is padded to max_seq_len), because dynamic shapes are not yet supported in the Ascend NPU environment.
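Which branch process_dataset() takes depends on the current device target; it can be checked directly (a one-line sketch):

# Prints "Ascend", "GPU" or "CPU"; padding to max_seq_len is only applied on Ascend.
print(mindspore.get_context("device_target"))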
from mindnlp.transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
Output:
100%━━━━━━━━━━━━━━━━━━━━━ 49.0/49.0 [00:00<00:00, 3.05kB/s]
107k/0.00 [00:05<00:00, 36.3kB/s]
263k/0.00 [00:15<00:00, 10.2kB/s]
624/? [00:00<00:00, 56.0kB/s]

tokenizer.pad_token_id
Output:
0
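To see what the tokenizer feeds into process_dataset(), a single sentence can be tokenized by hand; a minimal sketch (the exact ids depend on the bert-base-chinese vocabulary):

sample = tokenizer("我 要 客观", padding="max_length", truncation=True, max_length=16)
print(sample["input_ids"])       # starts with 101 ([CLS]) and ends with padding ids 0
print(sample["attention_mask"])  # 1 for real tokens, 0 for padded positions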
Process the train, validation and test splits, then check the column names of the training dataset:
dataset_train = process_dataset(SentimentDataset("data/train.tsv"), tokenizer)
dataset_val = process_dataset(SentimentDataset("data/dev.tsv"), tokenizer)
dataset_test = process_dataset(SentimentDataset("data/test.tsv"), tokenizer, shuffle=False)
dataset_train.get_col_names()
Output:
['input_ids', 'attention_mask', 'labels']
Iterate over the training dataset and display the first batch:
print(next(dataset_train.create_tuple_iterator()))
Output:
[Tensor(shape=[32, 64], dtype=Int64, value=
[[ 101, 2769, 4638 ...    0,    0,    0],
 [ 101, 2769, 3221 ...    0,    0,    0],
 [ 101,  758, 1282 ...    0,    0,    0],
 ...
 [ 101, 1217,  678 ...    0,    0,    0],
 [ 101,  872,  679 ...    0,    0,    0],
 [ 101,  872, 3766 ...    0,    0,    0]]),
Tensor(shape=[32, 64], dtype=Int64, value=
[[1, 1, 1 ... 0, 0, 0],
 [1, 1, 1 ... 0, 0, 0],
 [1, 1, 1 ... 0, 0, 0],
 ...
 [1, 1, 1 ... 0, 0, 0],
 [1, 1, 1 ... 0, 0, 0],
 [1, 1, 1 ... 0, 0, 0]]),
Tensor(shape=[32], dtype=Int32, value= [1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1])]
V. Model Construction

The model is built with the BertForSequenceClassification module: load the pre-trained bert-base-chinese weights, configure three emotion classes, wrap the model with automatic mixed precision, instantiate the optimizer and the evaluation metric, set the checkpoint-saving strategy, build the trainer, and start training.
from mindnlp.transformers import BertForSequenceClassification, BertModel
from mindnlp._legacy.amp import auto_mixed_precision
# set bert config and define parameters for training
model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=3)
model = auto_mixed_precision(model, "O1")

optimizer = nn.Adam(model.trainable_params(), learning_rate=2e-5)
Output:
100%━━━━━━━━━━━━━━━━━━ 392M/392M [00:53<00:00, 6.82MB/s]

The following parameters in checkpoint files are not loaded:
[cls.predictions.bias, cls.predictions.transform.dense.bias, cls.predictions.transform.dense.weight, cls.seq_relationship.bias, cls.seq_relationship.weight, cls.predictions.transform.LayerNorm.bias, cls.predictions.transform.LayerNorm.weight]
The following parameters in models are missing parameter:
[classifier.weight, classifier.bias]
metric = Accuracy()

# define callbacks to save checkpoints
ckpoint_cb = CheckpointCallback(save_path="checkpoint", ckpt_name="bert_emotect", epochs=1, keep_checkpoint_max=2)
best_model_cb = BestModelCallback(save_path="checkpoint", ckpt_name="bert_emotect_best", auto_load=True)

# build the trainer
trainer = Trainer(network=model, train_dataset=dataset_train,
                  eval_dataset=dataset_val, metrics=metric,
                  epochs=5, optimizer=optimizer, callbacks=[ckpoint_cb, best_model_cb])

%%time
# start training
trainer.run(tgt_columns="labels")
Output:
The train will start from the checkpoint saved in checkpoint.
Epoch 0: 100%━━━━━━━━━━━━━━ 302/302 [04:07<00:00, 2.25s/it, loss=0.3460012]
Checkpoint: bert_emotect_epoch_0.ckpt has been saved in epoch: 0.
Evaluate: 100%━━━━━━━━━━━━━━ 34/34 [00:07<00:00, 1.07it/s]
Evaluate Score: {'Accuracy': 0.9351851851851852}
---------------Best Model: bert_emotect_best.ckpt has been saved in epoch: 0.---------------
Epoch 1: 100%━━━━━━━━━━━━━━ 302/302 [02:38<00:00, 1.95it/s, loss=0.19017023]
Checkpoint: bert_emotect_epoch_1.ckpt has been saved in epoch: 1.
Evaluate: 100%━━━━━━━━━━━━━━ 34/34 [00:05<00:00, 7.48it/s]
Evaluate Score: {'Accuracy': 0.9564814814814815}
---------------Best Model: bert_emotect_best.ckpt has been saved in epoch: 1.---------------
Epoch 2: 100%━━━━━━━━━━━━━━ 302/302 [02:40<00:00, 1.92it/s, loss=0.12662967]
The maximum number of stored checkpoints has been reached.
Checkpoint: bert_emotect_epoch_2.ckpt has been saved in epoch: 2.
Evaluate: 100%━━━━━━━━━━━━━━ 34/34 [00:04<00:00, 7.59it/s]
Evaluate Score: {'Accuracy': 0.9740740740740741}
---------------Best Model: bert_emotect_best.ckpt has been saved in epoch: 2.---------------
Epoch 3: 100%━━━━━━━━━━━━━━ 302/302 [02:40<00:00, 1.92it/s, loss=0.08593981]
The maximum number of stored checkpoints has been reached.
Checkpoint: bert_emotect_epoch_3.ckpt has been saved in epoch: 3.
Evaluate: 100%━━━━━━━━━━━━━━ 34/34 [00:04<00:00, 7.51it/s]
Evaluate Score: {'Accuracy': 0.9833333333333333}
---------------Best Model: bert_emotect_best.ckpt has been saved in epoch: 3.---------------
Epoch 4: 100%━━━━━━━━━━━━━━ 302/302 [02:41<00:00, 1.92it/s, loss=0.05900709]
The maximum number of stored checkpoints has been reached.
Checkpoint: bert_emotect_epoch_4.ckpt has been saved in epoch: 4.
Evaluate: 100%━━━━━━━━━━━━━━ 34/34 [00:04<00:00, 7.39it/s]
Evaluate Score: {'Accuracy': 0.9879629629629629}
---------------Best Model: bert_emotect_best.ckpt has been saved in epoch: 4.---------------
Loading best model from checkpoint with [Accuracy]: [0.9879629629629629]...
---------------The model is already load the best model from bert_emotect_best.ckpt.---------------
CPU times: user 22min 58s, sys: 13min 25s, total: 36min 24s
Wall time: 15min 30s
VI. Model Validation

Evaluate the fine-tuned model on the test dataset and report its accuracy:
evaluator = Evaluator(network=model, eval_dataset=dataset_test, metrics=metric)
evaluator.run(tgt_columns="labels")
Output:
Evaluate: 100%━━━━━━━━━━━━━━ 33/33 [00:08<00:00, 1.20s/it]
Evaluate Score: {'Accuracy': 0.8822393822393823}
VII. Model Inference

Iterate over the inference dataset and display the predictions together with the reference labels.
from mindspore import Tensor

dataset_infer = SentimentDataset("data/infer.tsv")

def predict(text, label=None):
    label_map = {0: "消极", 1: "中性", 2: "积极"}

    text_tokenized = Tensor([tokenizer(text).input_ids])
    logits = model(text_tokenized)
    predict_label = logits[0].asnumpy().argmax()
    info = f"inputs: {text}, predict: {label_map[predict_label]}"
    if label is not None:
        info += f" , label: {label_map[label]}"
    print(info)

for label, text in dataset_infer:
    predict(text, label)

Output:
inputs: 我 要 客观, predict: 中性 , label: 中性
inputs: 靠 你 真是 说 废话 吗, predict: 消极 , label: 消极
inputs: 口嗅 会, predict: 中性 , label: 中性
inputs: 每次 是 表妹 带 窝 飞 因为 窝路痴, predict: 中性 , label: 中性
inputs: 别说 废话 我 问 你 个 问题, predict: 消极 , label: 消极
inputs: 4967 是 新加坡 那 家 银行, predict: 中性 , label: 中性
inputs: 是 我 喜欢 兔子, predict: 积极 , label: 积极
inputs: 你 写 过 黄山 奇石 吗, predict: 中性 , label: 中性
inputs: 一个一个 慢慢来, predict: 中性 , label: 中性
inputs: 我 玩 过 这个 一点 都 不 好玩, predict: 消极 , label: 消极
inputs: 网上 开发 女孩 的 QQ, predict: 中性 , label: 中性
inputs: 背 你 猜 对 了, predict: 中性 , label: 中性
inputs: 我 讨厌 你 哼哼 哼 。 。, predict: 消极 , label: 消极
inputs: 我 讨厌 你 哼哼 哼 。 。, predict: 消极 , label: 消极
VIII. Inference on Custom Data
predict("家人们咱就是说一整个无语住了 绝绝子叠buff")
Output:
inputs: 家人们咱就是说一整个无语住了 绝绝子叠buff, predict: 中性
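Any other utterance can be scored the same way; a short sketch with a few made-up inputs (no reference labels, so only the prediction is printed):

for s in ["今天 天气 真好", "你 怎么 这么 烦 人", "好的 我 知道 了"]:
    predict(s)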