SKEP: A Tutorial on the Sentiment Analysis Pretrained Model

This project demonstrates how to use the sentiment analysis pretrained model SKEP to complete sentence-level sentiment analysis, aspect-level sentiment analysis, and opinion extraction. In addition, starting from the sentiment analysis task, it introduces traditional text classification models such as TextCNN as well as the pretrained model SKEP, and shows how to use them in PaddleNLP. The project consists of four parts: task introduction, common datasets, the traditional sentiment analysis model TextCNN, and the sentiment analysis pretrained model SKEP.

In [ ]
!pip install --upgrade paddlenlp
The Sentiment Analysis Task
It is well known that human natural language carries rich emotional color: it expresses emotions (such as sadness or happiness), moods (such as weariness or melancholy), preferences (such as like or dislike), personality traits, stances, and so on. Sentiment analysis is used in scenarios such as product preference mining, purchase decision support, and public opinion analysis. Automatically analyzing these sentiment tendencies not only helps companies understand how consumers feel about their products and provides a basis for product improvement, but also helps them gauge the attitudes of business partners and make better business decisions. Sentiment analysis is usually treated as a three-class problem:

Positive: positive, upbeat emotions such as happiness, joy, surprise, and anticipation.
Negative: negative emotions such as sadness, grief, anger, and fear.
Other: any other type of emotion.
Sentiment Analysis Data
The ChnSentiCorp dataset is a public Chinese sentiment analysis dataset with two classes. PaddleNLP ships with this dataset built in, and it can be loaded with a single call.

In [ ]
from paddlenlp.datasets import load_dataset

train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])

# Print the first 3 training examples.
idx = 0
for data in train_ds:
    print(data)
    idx += 1
    if idx >= 3:
        break
The Traditional Sentiment Classification Model: TextCNN
Traditional sentiment classification models use networks such as CNN, RNN, LSTM, or GRU to encode a text into a single vector. Recurrent networks such as RNN, LSTM, and GRU cannot be parallelized, whereas CNNs are far faster and, thanks to their parallelism, are widely favored in industry. In 2014, Yoon Kim proposed the TextCNN network for text classification and achieved good results. Not every token in a text depends on every other token; n-gram information can capture the local correlations within the text. This is exactly what a CNN does: convolution kernels capture local correlation features, and multiple kernels of different sizes capture several n-gram granularities at once. PaddleNLP provides the sequence modeling module paddlenlp.seq2vec, which turns a text into a vector that carries its semantics. For more on seq2vec, see "paddlenlp.seq2vec是什么?快来看看如何用它完成情感分析任务": https://aistudio.baidu.com/aistudio/projectdetail/1283423

Next, let's see how to implement the TextCNN model:

paddle.nn.Embedding builds the word-embedding layer
paddlenlp.seq2vec.CNNEncoder builds the sentence-encoding layer
paddle.nn.Linear builds the binary classifier
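As a quick orientation (a sketch, not part of the original notebook), the size of the vector produced by CNNEncoder with the default configuration used below can be worked out by hand: each ngram filter size contributes a max-pooled vector of length num_filter, and the results are concatenated.

# Output size of CNNEncoder with the defaults used in TextCNNModel below.
num_filter = 128
ngram_filter_sizes = (1, 2, 3)
encoder_output_dim = num_filter * len(ngram_filter_sizes)
print(encoder_output_dim)  # 384, the input size of the first Linear (fc) layer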
In [ ]
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

import paddlenlp as nlp


class TextCNNModel(nn.Layer):
    """
    This class implements the Text Convolution Neural Network model.

    At a high level, the model starts by embedding the tokens and running them through
    a word embedding. Then, we encode these representations with a `CNNEncoder`.
    The CNN has one convolution layer for each ngram filter size. Each convolution operation gives
    out a vector of size num_filter. The number of times a convolution layer will be used
    is `num_tokens - ngram_size + 1`. The corresponding maxpooling layer aggregates all these
    outputs from the convolution layer and outputs the max.

    Lastly, we take the output of the encoder to create a final representation,
    which is passed through some feed-forward layers to output a logits (`output_layer`).
    """

    def __init__(self,
                 vocab_size,
                 num_classes,
                 emb_dim=128,
                 padding_idx=0,
                 num_filter=128,
                 ngram_filter_sizes=(1, 2, 3),
                 fc_hidden_size=96):
        super().__init__()
        self.embedder = nn.Embedding(vocab_size, emb_dim, padding_idx=padding_idx)
        self.encoder = nlp.seq2vec.CNNEncoder(
            emb_dim=emb_dim,
            num_filter=num_filter,
            ngram_filter_sizes=ngram_filter_sizes)
        self.fc = nn.Linear(self.encoder.get_output_dim(), fc_hidden_size)
        self.output_layer = nn.Linear(fc_hidden_size, num_classes)

    def forward(self, text):
        # Shape: (batch_size, num_tokens, embedding_dim)
        embedded_text = self.embedder(text)
        # Shape: (batch_size, len(ngram_filter_sizes)*num_filter)
        encoder_out = self.encoder(embedded_text)
        encoder_out = paddle.tanh(encoder_out)
        # Shape: (batch_size, fc_hidden_size)
        fc_out = paddle.tanh(self.fc(encoder_out))
        # Shape: (batch_size, num_classes)
        logits = self.output_layer(fc_out)
        return logits


# Note: `vocab` is built in the next section ("Building the Vocabulary");
# run that cell first so the vocabulary is available here.
model = TextCNNModel(
    len(vocab.idx_to_token),
    len(train_ds.label_list),
    padding_idx=vocab.to_indices('[PAD]'))
model = paddle.Model(model)
Building the Vocabulary
Since the TextCNN model takes word tokens as input, we first need to segment the text into words and build a vocabulary over the whole corpus: segment the text, count word frequencies, and drop low-frequency words. We use jieba as the Chinese word segmenter. The stopword list is taken directly from the web: https://github.com/goto456/stopwords/blob/master/baidu_stopwords.txt

In [ ]
import os
from collections import Counter
from itertools import chain

import jieba


def sort_and_write_words(all_words, file_path):
    words = list(chain(*all_words))
    words_vocab = Counter(words).most_common()
    with open(file_path, "w", encoding="utf8") as f:
        f.write('[UNK]\n[PAD]\n')
        # Filter out low-frequency words (frequency < 5).
        for word, num in words_vocab:
            if num < 5:
                continue
            f.write(word + "\n")


all_texts = [data['text'] for data in train_ds]
all_texts += [data['text'] for data in dev_ds]
all_texts += [data['text'] for data in test_ds]

all_words = []
for text in all_texts:
    words = jieba.lcut(text)
    words = [word for word in words if word.strip() != '']
    all_words.append(words)

# Write the vocabulary file.
sort_and_write_words(all_words, "work/vocab.txt")
In [ ]
# Vocabulary size
!wc -l work/vocab.txt
# Stopword list size
!wc -l work/stopwords.txt
The data still needs to be converted into a format the model can read: first segment each text with jieba, then map the segmented words to their ids in the vocabulary, and finally load the data asynchronously with multiple workers via the paddle.io.DataLoader API.

In [ ]
from functools import partial

from paddlenlp.data import JiebaTokenizer, Pad, Stack, Tuple, Vocab
from utils import create_dataloader, convert_example

vocab = Vocab.load_vocabulary("work/vocab.txt", unk_token='[UNK]', pad_token='[PAD]')
tokenizer = JiebaTokenizer(vocab)
trans_fn = partial(convert_example, tokenizer=tokenizer, is_test=False)

# Group examples into batches so the model can run batched computation.
# Every sentence in a batch is padded to the length of the longest text in
# that batch (batch_max_seq_len).
batch_size = 64
batchify_fn = lambda samples, fn=Tuple(
    Pad(axis=0, pad_val=vocab.token_to_idx.get('[PAD]', 1)),  # word_ids
    Stack(dtype="int64")  # label
): [data for data in fn(samples)]

train_loader = create_dataloader(
    train_ds,
    trans_fn=trans_fn,
    batch_size=batch_size,
    mode='train',
    batchify_fn=batchify_fn)
dev_loader = create_dataloader(
    dev_ds,
    trans_fn=trans_fn,
    batch_size=batch_size,
    mode='validation',
    batchify_fn=batchify_fn)
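As a quick sanity check (not part of the original notebook), you can peek at one batch from train_loader; the first dimension is batch_size and the second is the padded length of that particular batch.

# Inspect the shapes of a single batch produced by train_loader.
for batch in train_loader:
    word_ids, labels = batch
    print(word_ids.shape, labels.shape)  # e.g. [64, batch_max_seq_len] and [64]
    break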
Training the TextCNN Model
After the data is prepared, we still need to define an optimizer and a loss function. Accuracy is used as the evaluation metric.

In [ ]
# Define the optimizer, loss function, and evaluation metric.
optimizer = paddle.optimizer.Adam(parameters=model.parameters(), learning_rate=5e-5)
criterion = paddle.nn.CrossEntropyLoss()
metric = paddle.metric.Accuracy()

model.prepare(optimizer, criterion, metric)

# Start training and evaluation.
model.fit(train_loader, dev_loader, epochs=5, save_dir='./textcnn_ckpt')
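If you want a final number after training, the high-level paddle.Model API can re-run evaluation on the dev loader (an optional check, not in the original notebook):

# Optional: evaluate the trained TextCNN on the dev set once more.
eval_result = model.evaluate(dev_loader)
print(eval_result)  # dict with the loss and accuracy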
The Sentiment Analysis Pretrained Model SKEP
In recent years, a large body of research has shown that pretrained models (Pretrained Models, PTM) trained on large corpora learn general language representations that benefit downstream NLP tasks and avoid training models from scratch. With growing compute, deeper architectures (notably the Transformer), and better training techniques, PTMs have kept evolving from shallow to deep.

SKEP (Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis) is a sentiment pretraining algorithm proposed by Baidu's research team. It enhances pretraining with sentiment knowledge: sentiment knowledge is mined automatically with unsupervised methods and then used to construct pretraining objectives, so the model learns to understand sentiment semantics and provides a unified, strong sentiment representation for all kinds of sentiment analysis tasks. SKEP surpassed the previous SOTA on 14 typical Chinese and English sentiment analysis tasks, and the work was accepted at ACL 2020. Paper: https://arxiv.org/abs/2005.05635

Baidu's research team further verified SKEP on three typical sentiment analysis tasks, sentence-level sentiment classification, aspect-level sentiment classification, and opinion role labeling, across 14 Chinese and English datasets. Initialized from the general pretrained model ERNIE (an internal version), SKEP improves over ERNIE by about 1.2% on average and over the previous SOTA by about 2% on average.

As before, let's take sentence-level sentiment classification on ChnSentiCorp as an example and see how SKEP performs.

Loading the SKEP Model
PaddleNLP already implements the SKEP pretrained model, which can be loaded with a single line of code.

In [ ]
from paddlenlp.transformers import SkepForSequenceClassification, SkepTokenizer

model = SkepForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path="skep_ernie_1.0_large_ch",
    num_classes=2)  # equivalently: num_classes=len(train_ds.label_list)
tokenizer = SkepTokenizer.from_pretrained(
    pretrained_model_name_or_path="skep_ernie_1.0_large_ch")
SkepForSequenceClassification can be used for both sentence-level and aspect-level sentiment analysis. It obtains a representation of the input text from the pretrained SKEP model and then classifies that representation.

pretrained_model_name_or_path: the model name. Supported values are "skep_ernie_1.0_large_ch", "skep_ernie_2.0_large_en", and "skep_roberta_large_en":

"skep_ernie_1.0_large_ch": the Chinese model obtained by continuing SKEP pretraining on massive Chinese data, starting from the pretrained ernie_1.0_large_ch;
"skep_ernie_2.0_large_en": the English model obtained by continuing SKEP pretraining on massive English data, starting from the pretrained ernie_2.0_large_en;
"skep_roberta_large_en": the English model obtained by continuing SKEP pretraining on massive English data, starting from the pretrained roberta_large_en.

num_classes: the number of classes in the dataset.

For details of the SKEP implementation, see https://github.com/PaddlePaddle/PaddleNLP/tree/develop/paddlenlp/transformers/skep

Data Processing
As before, the raw ChnSentiCorp data must be converted into a format the model can read. SKEP processes Chinese text at character granularity, and PaddleNLP's built-in SkepTokenizer handles this in one step.

In [ ]
def convert_example(example,
                    tokenizer,
                    max_seq_length=512,
                    is_test=False):
    """
    Builds model inputs from a sequence or a pair of sequence for sequence classification tasks
    by concatenating and adding special tokens. And creates a mask from the two sequences passed
    to be used in a sequence-pair classification task.

    A skep_ernie_1.0_large_ch/skep_ernie_2.0_large_en sequence has the following format::

    - single sequence: ``[CLS] X [SEP]``
    - pair of sequences: ``[CLS] A [SEP] B [SEP]``

    A skep_ernie_1.0_large_ch/skep_ernie_2.0_large_en sequence pair mask has the following format::

        0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
        | first sequence    | second sequence |

    If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).

    Args:
        example(obj:`list[str]`): List of input data, containing text and label if it have label.
        tokenizer(obj:`PretrainedTokenizer`): This tokenizer inherits from
            :class:`~paddlenlp.transformers.PretrainedTokenizer` which contains most of the methods.
            Users should refer to the superclass for more information regarding methods.
        max_seq_len(obj:`int`): The maximum total input sequence length after tokenization.
            Sequences longer than this will be truncated, sequences shorter will be padded.
        is_test(obj:`False`, defaults to `False`): Whether the example contains label or not.

    Returns:
        input_ids(obj:`list[int]`): The list of token ids.
        token_type_ids(obj: `list[int]`): List of sequence pair mask.
        label(obj:`int`, optional): The input label if not is_test.
    """
    encoded_inputs = tokenizer(text=example["text"], max_seq_len=max_seq_length)

    input_ids = encoded_inputs["input_ids"]
    token_type_ids = encoded_inputs["token_type_ids"]

    if not is_test:
        label = example["label"]
        return input_ids, token_type_ids, label
    else:
        return input_ids, token_type_ids


train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])

batch_size = 32
max_seq_length = 128

trans_func = partial(
    convert_example,
    tokenizer=tokenizer,
    max_seq_length=max_seq_length)
batchify_fn = lambda samples, fn=Tuple(
    Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input_ids
    Pad(axis=0, pad_val=tokenizer.pad_token_type_id),  # token_type_ids
    Stack(dtype="int64")  # labels
): [data for data in fn(samples)]

train_data_loader = create_dataloader(
    train_ds,
    mode='train',
    batch_size=batch_size,
    batchify_fn=batchify_fn,
    trans_fn=trans_func)
dev_data_loader = create_dataloader(
    dev_ds,
    mode='dev',
    batch_size=batch_size,
    batchify_fn=batchify_fn,
    trans_fn=trans_func)
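To see what SkepTokenizer actually produces, here is a small illustrative check (not part of the original notebook; the exact ids depend on the model vocabulary). Chinese text is split at character granularity, and a single sentence gets all-zero token_type_ids.

# Encode one short Chinese sentence with the SkepTokenizer loaded above.
sample = tokenizer(text="这家酒店很干净", max_seq_len=32)
print(sample["input_ids"])       # [CLS] + one id per character + [SEP]
print(sample["token_type_ids"])  # all zeros for a single sentence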
Model Training and Evaluation
After defining the loss function, optimizer, and evaluation metric, training can begin.

In [13]
import time

from utils import evaluate

epochs = 1
ckpt_dir = "skep_ckpt"
num_training_steps = len(train_data_loader) * epochs

# Apply weight decay to all parameters except bias and LayerNorm parameters.
decay_params = [
    p.name for n, p in model.named_parameters()
    if not any(nd in n for nd in ["bias", "norm"])
]

optimizer = paddle.optimizer.AdamW(
    learning_rate=3e-6,
    parameters=model.parameters(),
    weight_decay=0.01,
    apply_decay_param_fun=lambda x: x in decay_params)
criterion = paddle.nn.loss.CrossEntropyLoss()
metric = paddle.metric.Accuracy()

global_step = 0
tic_train = time.time()
for epoch in range(1, epochs + 1):
    for step, batch in enumerate(train_data_loader, start=1):
        input_ids, token_type_ids, labels = batch
        logits = model(input_ids, token_type_ids)
        loss = criterion(logits, labels)
        probs = F.softmax(logits, axis=1)
        correct = metric.compute(probs, labels)
        metric.update(correct)
        acc = metric.accumulate()

        global_step += 1
        if global_step % 10 == 0:
            print(
                "global step %d, epoch: %d, batch: %d, loss: %.5f, accu: %.5f, speed: %.2f step/s"
                % (global_step, epoch, step, loss, acc,
                   10 / (time.time() - tic_train)))
            tic_train = time.time()
        loss.backward()
        optimizer.step()
        optimizer.clear_grad()
        if global_step % 100 == 0:
            save_dir = os.path.join(ckpt_dir, "model_%d" % global_step)
            if not os.path.exists(save_dir):
                os.makedirs(save_dir)
            evaluate(model, criterion, metric, dev_data_loader)
            model.save_pretrained(save_dir)
            tokenizer.save_pretrained(save_dir)
Model Prediction
The trained model can also be used to predict the sentiment of new texts.

In [ ]
from utils import predict

data = [
    '这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般',
    '怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片',
    '作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。',
]
label_map = {0: 'negative', 1: 'positive'}

results = predict(model, data, tokenizer, label_map, batch_size, max_seq_length)
for idx, text in enumerate(data):
    print('Data: {} \t Label: {}'.format(text, results[idx]))
Aspect-Level Sentiment Analysis
In sentiment analysis, researchers go beyond classifying the sentiment of a whole sentence and also analyze sentiment towards specific "aspects" mentioned in the sentence (aspect-level sentiment analysis). For example:

这个薯片口味有点咸,太辣了,不过口感很脆。
The evaluation of the chips' flavor (口味) is negative (salty, too spicy), while the evaluation of their texture (口感) is positive (very crispy).

我很喜欢夏威夷,就是这边的海鲜太贵了。
The evaluation of Hawaii itself is positive (like it), while the evaluation of Hawaii's seafood is negative (too expensive).

SKEP also supports aspect-level sentiment analysis. Running the following commands completes the aspect-level sentiment analysis task.

In [ ]
# Train the aspect-level sentiment classification model
!python train_aspect.py --save_dir skep_aspect
In [ ]
# Predict with the aspect-level sentiment classification model
!python predict_aspect.py --params_path skep_aspect/model_900/model_state.pdparams
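Under the hood, train_aspect.py and predict_aspect.py (source listed in the appendix below) encode each example as a sentence pair: the convert_example there calls the tokenizer with both text and text_pair, so the model sees "[CLS] text [SEP] text_pair [SEP]". A minimal illustration with a made-up example dict (which field holds the aspect and which holds the review follows the seabsa16 dataset's convention):

# Illustrative only: pair encoding as used by the aspect-level scripts.
example = {"text": "口感", "text_pair": "这个薯片口味有点咸,太辣了,不过口感很脆。"}
encoded = tokenizer(text=example["text"], text_pair=example["text_pair"], max_seq_len=128)
print(encoded["input_ids"])
print(encoded["token_type_ids"])  # 0s for the first segment, 1s for the second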
Opinion Extraction
Given a user review, extract the opinion triples it expresses: (aspect term, opinion term, sentiment polarity). Example:

这家旅店服务还是不错的,但是房间比较简陋

Opinion 1: <服务 (service), 不错 (good), positive>
Opinion 2: <房间 (room), 简陋 (shabby), negative>
In [ ]
# Train the opinion extraction model
!python train_opinion.py --save_dir skep_opinion
In [ ]
# Predict with the opinion extraction model
!python predict_opinion.py --params_path skep_opinion/model_900/model_state.pdparams
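The opinion extraction model tags every character with a B/I/O label (see the label_map in predict_opinion.py in the appendix below); consecutive B and I tags form an extracted term. A small sketch of that decoding step, using a made-up tag sequence rather than real model output:

# Decode a BIO tag sequence into text spans (illustrative only).
tokens = list("房间比较简陋")
tags = ["B", "I", "O", "O", "B", "I"]  # hypothetical model output
spans, cur = [], []
for ch, tag in zip(tokens, tags):
    if tag == "B":
        if cur:
            spans.append("".join(cur))
        cur = [ch]
    elif tag == "I" and cur:
        cur.append(ch)
    else:
        if cur:
            spans.append("".join(cur))
        cur = []
if cur:
    spans.append("".join(cur))
print(spans)  # ['房间', '简陋']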

utils.py

# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import paddle
import paddle.nn.functional as F

from paddlenlp.data import Pad, Tuple


def read_vocab(vocab_path):
    vocab = {}
    with open(vocab_path, "r", encoding="utf8") as f:
        for idx, line in enumerate(f):
            word = line.strip("\n")
            vocab[word] = idx
    return vocab


def create_dataloader(dataset,
                      trans_fn=None,
                      mode='train',
                      batch_size=1,
                      batchify_fn=None):
    """
    Creats dataloader.

    Args:
        dataset(obj:`paddle.io.Dataset`): Dataset instance.
        trans_fn(obj:`callable`, optional, defaults to `None`): function to convert a data sample to input ids, etc.
        mode(obj:`str`, optional, defaults to obj:`train`): If mode is 'train', it will shuffle the dataset randomly.
        batch_size(obj:`int`, optional, defaults to 1): The sample number of a mini-batch.
        batchify_fn(obj:`callable`, optional, defaults to `None`): function to generate mini-batch data by merging
            the sample list, None for only stack each fields of sample in axis
            0 (same as :attr::`np.stack(..., axis=0)`).

    Returns:
        dataloader(obj:`paddle.io.DataLoader`): The dataloader which generates batches.
    """
    if trans_fn:
        dataset = dataset.map(trans_fn)

    shuffle = True if mode == 'train' else False
    if mode == "train":
        sampler = paddle.io.DistributedBatchSampler(
            dataset=dataset, batch_size=batch_size, shuffle=shuffle)
    else:
        sampler = paddle.io.BatchSampler(
            dataset=dataset, batch_size=batch_size, shuffle=shuffle)
    dataloader = paddle.io.DataLoader(
        dataset, batch_sampler=sampler, collate_fn=batchify_fn)
    return dataloader


def convert_example(example, tokenizer, is_test=False):
    """
    Builds model inputs from a sequence for sequence classification tasks.
    It use `jieba.cut` to tokenize text.

    Args:
        example(obj:`list[str]`): List of input data, containing text and label if it have label.
        tokenizer(obj: paddlenlp.data.JiebaTokenizer): It use jieba to cut the chinese string.
        is_test(obj:`False`, defaults to `False`): Whether the example contains label or not.

    Returns:
        input_ids(obj:`list[int]`): The list of token ids.
        valid_length(obj:`int`): The input sequence valid length.
        label(obj:`numpy.array`, data type of int64, optional): The input label if not is_test.
    """
    input_ids = tokenizer.encode(example["text"])
    input_ids = np.array(input_ids, dtype='int64')

    if not is_test:
        label = np.array(example["label"], dtype="int64")
        return input_ids, label
    else:
        return input_ids


@paddle.no_grad()
def evaluate(model, criterion, metric, data_loader):
    """
    Given a dataset, it evals model and computes the metric.

    Args:
        model(obj:`paddle.nn.Layer`): A model to classify texts.
        criterion(obj:`paddle.nn.Layer`): It can compute the loss.
        metric(obj:`paddle.metric.Metric`): The evaluation metric.
        data_loader(obj:`paddle.io.DataLoader`): The dataset loader which generates batches.
    """
    model.eval()
    metric.reset()
    losses = []
    for batch in data_loader:
        input_ids, token_type_ids, labels = batch
        logits = model(input_ids, token_type_ids)
        loss = criterion(logits, labels)
        losses.append(loss.numpy())
        correct = metric.compute(logits, labels)
        metric.update(correct)
    accu = metric.accumulate()
    print("eval loss: %.5f, accu: %.5f" % (np.mean(losses), accu))
    model.train()
    metric.reset()


# Note: `predict` below expects a SKEP-style `convert_example` (one that accepts
# `max_seq_length` and returns (input_ids, token_type_ids) for a dict-like
# example), such as the one defined in the notebook, rather than the
# jieba-based `convert_example` above.
@paddle.no_grad()
def predict(model, data, tokenizer, label_map, batch_size=1, max_seq_length=128):
    examples = []
    for text in data:
        input_ids, token_type_ids = convert_example(
            text,
            tokenizer,
            max_seq_length=max_seq_length,
            is_test=True)
        examples.append((input_ids, token_type_ids))

    # Seperates data into some batches.
    batches = [
        examples[idx:idx + batch_size]
        for idx in range(0, len(examples), batch_size)
    ]
    batchify_fn = lambda samples, fn=Tuple(
        Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input ids
        Pad(axis=0, pad_val=tokenizer.pad_token_type_id),  # token type ids
    ): [data for data in fn(samples)]

    results = []
    model.eval()
    for batch in batches:
        input_ids, token_type_ids = batchify_fn(batch)
        input_ids = paddle.to_tensor(input_ids)
        token_type_ids = paddle.to_tensor(token_type_ids)
        logits = model(input_ids, token_type_ids)
        probs = F.softmax(logits, axis=1)
        idx = paddle.argmax(probs, axis=1).numpy()
        idx = idx.tolist()
        labels = [label_map[i] for i in idx]
        results.extend(labels)
    return results

train_aspect.py

# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from functools import partial
import argparse
import os
import random
import time

import numpy as np
import paddle
import paddle.nn.functional as F
from paddlenlp.data import Stack, Tuple, Pad
from paddlenlp.datasets import load_dataset
from paddlenlp.transformers import SkepForSequenceClassification, SkepTokenizer

# yapf: disable
parser = argparse.ArgumentParser()
parser.add_argument("--save_dir", default='./checkpoint', type=str, help="The output directory where the model checkpoints will be written.")
parser.add_argument("--max_seq_length", default=400, type=int, help="The maximum total input sequence length after tokenization. ""Sequences longer than this will be truncated, sequences shorter will be padded.")
parser.add_argument("--batch_size", default=6, type=int, help="Batch size per GPU/CPU for training.")
parser.add_argument("--learning_rate", default=3e-6, type=float, help="The initial learning rate for Adam.")
parser.add_argument("--weight_decay", default=0.0, type=float, help="Weight decay if we apply some.")
parser.add_argument("--epochs", default=50, type=int, help="Total number of training epochs to perform.")
parser.add_argument("--init_from_ckpt", type=str, default=None, help="The path of checkpoint to be loaded.")
parser.add_argument("--seed", type=int, default=1000, help="random seed for initialization")
parser.add_argument('--device', choices=['cpu', 'gpu', 'xpu'], default="gpu", help="Select which device to train model, defaults to gpu.")
args = parser.parse_args()
# yapf: enable


def set_seed(seed):
    """Sets random seed."""
    random.seed(seed)
    np.random.seed(seed)
    paddle.seed(seed)


def convert_example(example,
                    tokenizer,
                    max_seq_length=512,
                    is_test=False,
                    dataset_name="chnsenticorp"):
    """
    Builds model inputs from a sequence or a pair of sequence for sequence classification tasks
    by concatenating and adding special tokens. And creates a mask from the two sequences passed
    to be used in a sequence-pair classification task.

    A skep_ernie_1.0_large_ch/skep_ernie_2.0_large_en sequence has the following format::

    - single sequence: ``[CLS] X [SEP]``
    - pair of sequences: ``[CLS] A [SEP] B [SEP]``

    A skep_ernie_1.0_large_ch/skep_ernie_2.0_large_en sequence pair mask has the following format::

        0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
        | first sequence    | second sequence |

    If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).

    note: There is no need token type ids for skep_roberta_large_ch model.

    Args:
        example(obj:`list[str]`): List of input data, containing text and label if it have label.
        tokenizer(obj:`PretrainedTokenizer`): This tokenizer inherits from
            :class:`~paddlenlp.transformers.PretrainedTokenizer` which contains most of the methods.
            Users should refer to the superclass for more information regarding methods.
        max_seq_len(obj:`int`): The maximum total input sequence length after tokenization.
            Sequences longer than this will be truncated, sequences shorter will be padded.
        is_test(obj:`False`, defaults to `False`): Whether the example contains label or not.
        dataset_name((obj:`str`, defaults to "chnsenticorp"): The dataset name, "chnsenticorp" or "sst-2".

    Returns:
        input_ids(obj:`list[int]`): The list of token ids.
        token_type_ids(obj: `list[int]`): List of sequence pair mask.
        label(obj:`numpy.array`, data type of int64, optional): The input label if not is_test.
    """
    encoded_inputs = tokenizer(
        text=example["text"],
        text_pair=example["text_pair"],
        max_seq_len=max_seq_length)

    input_ids = encoded_inputs["input_ids"]
    token_type_ids = encoded_inputs["token_type_ids"]

    if not is_test:
        label = np.array([example["label"]], dtype="int64")
        return input_ids, token_type_ids, label
    else:
        return input_ids, token_type_ids


def create_dataloader(dataset,
                      mode='train',
                      batch_size=1,
                      batchify_fn=None,
                      trans_fn=None):
    if trans_fn:
        dataset = dataset.map(trans_fn)

    shuffle = True if mode == 'train' else False
    if mode == 'train':
        batch_sampler = paddle.io.DistributedBatchSampler(
            dataset, batch_size=batch_size, shuffle=shuffle)
    else:
        batch_sampler = paddle.io.BatchSampler(
            dataset, batch_size=batch_size, shuffle=shuffle)

    return paddle.io.DataLoader(
        dataset=dataset,
        batch_sampler=batch_sampler,
        collate_fn=batchify_fn,
        return_list=True)


if __name__ == "__main__":
    set_seed(args.seed)
    paddle.set_device(args.device)
    rank = paddle.distributed.get_rank()
    if paddle.distributed.get_world_size() > 1:
        paddle.distributed.init_parallel_env()

    train_ds = load_dataset("seabsa16", "phns", splits=["train"])

    model = SkepForSequenceClassification.from_pretrained(
        'skep_ernie_1.0_large_ch', num_classes=len(train_ds.label_list))
    tokenizer = SkepTokenizer.from_pretrained('skep_ernie_1.0_large_ch')

    trans_func = partial(
        convert_example,
        tokenizer=tokenizer,
        max_seq_length=args.max_seq_length)
    batchify_fn = lambda samples, fn=Tuple(
        Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input_ids
        Pad(axis=0, pad_val=tokenizer.pad_token_type_id),  # token_type_ids
        Stack(dtype="int64")  # labels
    ): [data for data in fn(samples)]
    train_data_loader = create_dataloader(
        train_ds,
        mode='train',
        batch_size=args.batch_size,
        batchify_fn=batchify_fn,
        trans_fn=trans_func)

    if args.init_from_ckpt and os.path.isfile(args.init_from_ckpt):
        state_dict = paddle.load(args.init_from_ckpt)
        model.set_dict(state_dict)
    model = paddle.DataParallel(model)

    num_training_steps = len(train_data_loader) * args.epochs

    # Generate parameter names needed to perform weight decay.
    # All bias and LayerNorm parameters are excluded.
    decay_params = [
        p.name for n, p in model.named_parameters()
        if not any(nd in n for nd in ["bias", "norm"])
    ]
    optimizer = paddle.optimizer.AdamW(
        learning_rate=args.learning_rate,
        parameters=model.parameters(),
        weight_decay=args.weight_decay,
        apply_decay_param_fun=lambda x: x in decay_params)

    criterion = paddle.nn.loss.CrossEntropyLoss()
    metric = paddle.metric.Accuracy()

    global_step = 0
    tic_train = time.time()
    for epoch in range(1, args.epochs + 1):
        for step, batch in enumerate(train_data_loader, start=1):
            input_ids, token_type_ids, labels = batch
            logits = model(input_ids, token_type_ids)
            loss = criterion(logits, labels)
            probs = F.softmax(logits, axis=1)
            correct = metric.compute(probs, labels)
            metric.update(correct)
            acc = metric.accumulate()

            global_step += 1
            if global_step % 10 == 0 and rank == 0:
                print(
                    "global step %d, epoch: %d, batch: %d, loss: %.5f, accu: %.5f, speed: %.2f step/s"
                    % (global_step, epoch, step, loss, acc,
                       10 / (time.time() - tic_train)))
                tic_train = time.time()
            loss.backward()
            optimizer.step()
            optimizer.clear_grad()
            if global_step % 100 == 0 and rank == 0:
                save_dir = os.path.join(args.save_dir, "model_%d" % global_step)
                if not os.path.exists(save_dir):
                    os.makedirs(save_dir)
                # Need better way to get inner model of DataParallel
                model._layers.save_pretrained(save_dir)
                tokenizer.save_pretrained(save_dir)

predict_aspect.py

# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from functools import partial
import argparse
import os
import random
import time

import numpy as np
import paddle
import paddle.nn.functional as F
from paddlenlp.data import Stack, Tuple, Pad
from paddlenlp.datasets import load_dataset
from paddlenlp.transformers import SkepForSequenceClassification, SkepTokenizer

# yapf: disable
parser = argparse.ArgumentParser()
parser.add_argument("--params_path", type=str, required=True, help="The path to model parameters to be loaded.")
parser.add_argument("--max_seq_length", default=400, type=int, help="The maximum total input sequence length after tokenization. ""Sequences longer than this will be truncated, sequences shorter will be padded.")
parser.add_argument("--batch_size", default=6, type=int, help="Batch size per GPU/CPU for prediction.")
parser.add_argument('--device', choices=['cpu', 'gpu', 'xpu'], default="gpu", help="Select which device to train model, defaults to gpu.")
args = parser.parse_args()
# yapf: enable


@paddle.no_grad()
def predict(model, data_loader, label_map):
    """
    Given a prediction dataset, it gives the prediction results.

    Args:
        model(obj:`paddle.nn.Layer`): A model to classify texts.
        data_loader(obj:`paddle.io.DataLoader`): The dataset loader which generates batches.
        label_map(obj:`dict`): The label id (key) to label str (value) map.
    """
    model.eval()
    results = []
    for batch in data_loader:
        input_ids, token_type_ids = batch
        logits = model(input_ids, token_type_ids)
        probs = F.softmax(logits, axis=1)
        idx = paddle.argmax(probs, axis=1).numpy()
        idx = idx.tolist()
        labels = [label_map[i] for i in idx]
        results.extend(labels)
    return results


def convert_example(example,
                    tokenizer,
                    max_seq_length=512,
                    is_test=False,
                    dataset_name="chnsenticorp"):
    """
    Builds model inputs from a sequence or a pair of sequence for sequence classification tasks
    by concatenating and adding special tokens. And creates a mask from the two sequences passed
    to be used in a sequence-pair classification task.

    A skep_ernie_1.0_large_ch/skep_ernie_2.0_large_en sequence has the following format::

    - single sequence: ``[CLS] X [SEP]``
    - pair of sequences: ``[CLS] A [SEP] B [SEP]``

    A skep_ernie_1.0_large_ch/skep_ernie_2.0_large_en sequence pair mask has the following format::

        0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
        | first sequence    | second sequence |

    If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).

    note: There is no need token type ids for skep_roberta_large_ch model.

    Args:
        example(obj:`list[str]`): List of input data, containing text and label if it have label.
        tokenizer(obj:`PretrainedTokenizer`): This tokenizer inherits from
            :class:`~paddlenlp.transformers.PretrainedTokenizer` which contains most of the methods.
            Users should refer to the superclass for more information regarding methods.
        max_seq_len(obj:`int`): The maximum total input sequence length after tokenization.
            Sequences longer than this will be truncated, sequences shorter will be padded.
        is_test(obj:`False`, defaults to `False`): Whether the example contains label or not.
        dataset_name((obj:`str`, defaults to "chnsenticorp"): The dataset name, "chnsenticorp" or "sst-2".

    Returns:
        input_ids(obj:`list[int]`): The list of token ids.
        token_type_ids(obj: `list[int]`): List of sequence pair mask.
        label(obj:`numpy.array`, data type of int64, optional): The input label if not is_test.
    """
    encoded_inputs = tokenizer(
        text=example["text"],
        text_pair=example["text_pair"],
        max_seq_len=max_seq_length)

    input_ids = encoded_inputs["input_ids"]
    token_type_ids = encoded_inputs["token_type_ids"]

    if not is_test:
        label = np.array([example["label"]], dtype="int64")
        return input_ids, token_type_ids, label
    else:
        return input_ids, token_type_ids


def create_dataloader(dataset,
                      mode='train',
                      batch_size=1,
                      batchify_fn=None,
                      trans_fn=None):
    if trans_fn:
        dataset = dataset.map(trans_fn)

    shuffle = True if mode == 'train' else False
    if mode == 'train':
        batch_sampler = paddle.io.DistributedBatchSampler(
            dataset, batch_size=batch_size, shuffle=shuffle)
    else:
        batch_sampler = paddle.io.BatchSampler(
            dataset, batch_size=batch_size, shuffle=shuffle)

    return paddle.io.DataLoader(
        dataset=dataset,
        batch_sampler=batch_sampler,
        collate_fn=batchify_fn,
        return_list=True)


if __name__ == "__main__":
    test_ds = load_dataset("seabsa16", "phns", splits=["test"])
    label_map = {0: 'negative', 1: 'positive'}

    model = SkepForSequenceClassification.from_pretrained(
        'skep_ernie_1.0_large_ch', num_classes=len(label_map))
    tokenizer = SkepTokenizer.from_pretrained('skep_ernie_1.0_large_ch')

    trans_func = partial(
        convert_example,
        tokenizer=tokenizer,
        max_seq_length=args.max_seq_length,
        is_test=True)
    batchify_fn = lambda samples, fn=Tuple(
        Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input_ids
        Pad(axis=0, pad_val=tokenizer.pad_token_type_id),  # token_type_ids
    ): [data for data in fn(samples)]
    test_data_loader = create_dataloader(
        test_ds,
        mode='test',
        batch_size=args.batch_size,
        batchify_fn=batchify_fn,
        trans_fn=trans_func)

    if args.params_path and os.path.isfile(args.params_path):
        state_dict = paddle.load(args.params_path)
        model.set_dict(state_dict)
        print("Loaded parameters from %s" % args.params_path)

    results = predict(model, test_data_loader, label_map)
    for idx, text in enumerate(test_ds.data):
        print('Data: {} \t Label: {}'.format(text, results[idx]))

train_opinion.py

# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from functools import partial
import argparse
import os
import random
import time

import numpy as np
import paddle
import paddle.nn.functional as F
from paddlenlp.data import Stack, Tuple, Pad
from paddlenlp.datasets import load_dataset
from paddlenlp.metrics import ChunkEvaluator
from paddlenlp.transformers import SkepCrfForTokenClassification, SkepModel, SkepTokenizer

# yapf: disable
parser = argparse.ArgumentParser()
parser.add_argument("--save_dir", default='./checkpoint', type=str, help="The output directory where the model checkpoints will be written.")
parser.add_argument("--max_seq_length", default=128, type=int, help="The maximum total input sequence length after tokenization. ""Sequences longer than this will be truncated, sequences shorter will be padded.")
parser.add_argument("--batch_size", default=32, type=int, help="Batch size per GPU/CPU for training.")
parser.add_argument("--learning_rate", default=5e-7, type=float, help="The initial learning rate for Adam.")
parser.add_argument("--weight_decay", default=0.0, type=float, help="Weight decay if we apply some.")
parser.add_argument("--epochs", default=10, type=int, help="Total number of training epochs to perform.")
parser.add_argument("--init_from_ckpt", type=str, default=None, help="The path of checkpoint to be loaded.")
parser.add_argument("--seed", type=int, default=1000, help="random seed for initialization")
parser.add_argument('--device', choices=['cpu', 'gpu', 'xpu'], default="gpu", help="Select which device to train model, defaults to gpu.")
args = parser.parse_args()
# yapf: enable


def set_seed(seed):
    """Sets random seed."""
    random.seed(seed)
    np.random.seed(seed)
    paddle.seed(seed)


def convert_example_to_feature(example,
                               tokenizer,
                               max_seq_len=512,
                               no_entity_label="O",
                               is_test=False):
    """
    Builds model inputs from a sequence or a pair of sequence for sequence classification tasks
    by concatenating and adding special tokens. And creates a mask from the two sequences passed
    to be used in a sequence-pair classification task.

    A skep_ernie_1.0_large_ch/skep_ernie_2.0_large_en sequence has the following format::

    - single sequence: ``[CLS] X [SEP]``
    - pair of sequences: ``[CLS] A [SEP] B [SEP]``

    A skep_ernie_1.0_large_ch/skep_ernie_2.0_large_en sequence pair mask has the following format::

        0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
        | first sequence    | second sequence |

    If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).

    Args:
        example(obj:`list[str]`): List of input data, containing text and label if it have label.
        tokenizer(obj:`PretrainedTokenizer`): This tokenizer inherits from
            :class:`~paddlenlp.transformers.PretrainedTokenizer` which contains most of the methods.
            Users should refer to the superclass for more information regarding methods.
        max_seq_len(obj:`int`): The maximum total input sequence length after tokenization.
            Sequences longer than this will be truncated, sequences shorter will be padded.
        no_entity_label(obj:`str`, defaults to "O"): The label represents that the token isn't an entity.
        is_test(obj:`False`, defaults to `False`): Whether the example contains label or not.

    Returns:
        input_ids(obj:`list[int]`): The list of token ids.
        token_type_ids(obj: `list[int]`): List of sequence pair mask.
        label(obj:`list[int]`, optional): The input label if not test data.
    """
    tokens = example['tokens']
    labels = example['labels']
    tokenized_input = tokenizer(
        tokens,
        return_length=True,
        is_split_into_words=True,
        max_seq_len=max_seq_len)

    input_ids = tokenized_input['input_ids']
    token_type_ids = tokenized_input['token_type_ids']
    seq_len = tokenized_input['seq_len']

    if is_test:
        return input_ids, token_type_ids, seq_len
    else:
        labels = labels[:(max_seq_len - 2)]
        encoded_label = np.array(
            [no_entity_label] + labels + [no_entity_label], dtype="int64")
        return input_ids, token_type_ids, seq_len, encoded_label


def create_dataloader(dataset,
                      mode='train',
                      batch_size=1,
                      batchify_fn=None,
                      trans_fn=None):
    if trans_fn:
        dataset = dataset.map(trans_fn)

    shuffle = True if mode == 'train' else False
    if mode == 'train':
        batch_sampler = paddle.io.DistributedBatchSampler(
            dataset, batch_size=batch_size, shuffle=shuffle)
    else:
        batch_sampler = paddle.io.BatchSampler(
            dataset, batch_size=batch_size, shuffle=shuffle)

    return paddle.io.DataLoader(
        dataset=dataset,
        batch_sampler=batch_sampler,
        collate_fn=batchify_fn,
        return_list=True)


if __name__ == "__main__":
    paddle.set_device(args.device)
    rank = paddle.distributed.get_rank()
    if paddle.distributed.get_world_size() > 1:
        paddle.distributed.init_parallel_env()

    train_ds = load_dataset("cote", "dp", splits=['train'])

    # The COTE_DP dataset labels with "BIO" schema.
    label_map = {label: idx for idx, label in enumerate(train_ds.label_list)}
    # `no_entity_label` represents that the token isn't an entity.
    no_entity_label_idx = label_map.get("O", 2)
    # `ignore_label` is using to pad input labels.
    ignore_label = -1

    set_seed(args.seed)

    skep = SkepModel.from_pretrained('skep_ernie_1.0_large_ch')
    model = SkepCrfForTokenClassification(skep, num_classes=len(train_ds.label_list))
    tokenizer = SkepTokenizer.from_pretrained('skep_ernie_1.0_large_ch')

    trans_func = partial(
        convert_example_to_feature,
        tokenizer=tokenizer,
        max_seq_len=args.max_seq_length,
        no_entity_label=no_entity_label_idx,
        is_test=False)
    batchify_fn = lambda samples, fn=Tuple(
        Pad(axis=0, pad_val=tokenizer.vocab[tokenizer.pad_token]),  # input ids
        Pad(axis=0, pad_val=tokenizer.vocab[tokenizer.pad_token]),  # token type ids
        Stack(dtype='int64'),  # sequence lens
        Pad(axis=0, pad_val=ignore_label)  # labels
    ): [data for data in fn(samples)]
    train_data_loader = create_dataloader(
        train_ds,
        mode='train',
        batch_size=args.batch_size,
        batchify_fn=batchify_fn,
        trans_fn=trans_func)

    if args.init_from_ckpt and os.path.isfile(args.init_from_ckpt):
        state_dict = paddle.load(args.init_from_ckpt)
        model.set_dict(state_dict)
    model = paddle.DataParallel(model)

    num_training_steps = len(train_data_loader) * args.epochs

    # Generate parameter names needed to perform weight decay.
    # All bias and LayerNorm parameters are excluded.
    decay_params = [
        p.name for n, p in model.named_parameters()
        if not any(nd in n for nd in ["bias", "norm"])
    ]
    optimizer = paddle.optimizer.AdamW(
        learning_rate=args.learning_rate,
        parameters=model.parameters(),
        weight_decay=args.weight_decay,
        apply_decay_param_fun=lambda x: x in decay_params)

    metric = ChunkEvaluator(label_list=train_ds.label_list, suffix=True)

    global_step = 0
    tic_train = time.time()
    for epoch in range(1, args.epochs + 1):
        for step, batch in enumerate(train_data_loader, start=1):
            input_ids, token_type_ids, seq_lens, labels = batch
            loss = model(input_ids, token_type_ids, seq_lens=seq_lens, labels=labels)
            avg_loss = paddle.mean(loss)

            global_step += 1
            if global_step % 10 == 0 and rank == 0:
                print(
                    "global step %d, epoch: %d, batch: %d, loss: %.5f, speed: %.2f step/s"
                    % (global_step, epoch, step, avg_loss,
                       10 / (time.time() - tic_train)))
                tic_train = time.time()
            loss.backward()
            optimizer.step()
            optimizer.clear_grad()
            if global_step % 100 == 0 and rank == 0:
                save_dir = os.path.join(args.save_dir, "model_%d" % global_step)
                if not os.path.exists(save_dir):
                    os.makedirs(save_dir)
                file_name = os.path.join(save_dir, "model_state.pdparam")
                # Need better way to get inner model of DataParallel
                paddle.save(model._layers.state_dict(), file_name)

predict_opinion.py

# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import os
from functools import partial

import numpy as np
import paddle
import paddle.nn.functional as F
from paddlenlp.data import Stack, Tuple, Pad
from paddlenlp.datasets import load_dataset
from paddlenlp.transformers import SkepCrfForTokenClassification, SkepModel, SkepTokenizer

# yapf: disable
parser = argparse.ArgumentParser()
parser.add_argument("--params_path", type=str, required=True, help="The path to model parameters to be loaded.")
parser.add_argument("--max_seq_length", default=128, type=int, help="The maximum total input sequence length after tokenization. ""Sequences longer than this will be truncated, sequences shorter will be padded.")
parser.add_argument("--batch_size", default=32, type=int, help="Batch size per GPU/CPU for training.")
parser.add_argument('--device', choices=['cpu', 'gpu', 'xpu'], default="gpu", help="Select which device to train model, defaults to gpu.")
args = parser.parse_args()
# yapf: enable


def convert_example(example, tokenizer, max_seq_length=512, is_test=False):
    """
    Builds model inputs from a sequence or a pair of sequence for sequence classification tasks
    by concatenating and adding special tokens. And creates a mask from the two sequences passed
    to be used in a sequence-pair classification task.

    A skep_ernie_1.0_large_ch/skep_ernie_2.0_large_en sequence has the following format::

    - single sequence: ``[CLS] X [SEP]``
    - pair of sequences: ``[CLS] A [SEP] B [SEP]``

    A skep_ernie_1.0_large_ch/skep_ernie_2.0_large_en sequence pair mask has the following format::

        0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
        | first sequence    | second sequence |

    If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).

    Args:
        example(obj:`list[str]`): List of input data, containing text and label if it have label.
        tokenizer(obj:`PretrainedTokenizer`): This tokenizer inherits from
            :class:`~paddlenlp.transformers.PretrainedTokenizer` which contains most of the methods.
            Users should refer to the superclass for more information regarding methods.
        max_seq_len(obj:`int`): The maximum total input sequence length after tokenization.
            Sequences longer than this will be truncated, sequences shorter will be padded.

    Returns:
        input_ids(obj:`list[int]`): The list of token ids.
        token_type_ids(obj: `list[int]`): List of sequence pair mask.
    """
    tokens = example["tokens"]
    encoded_inputs = tokenizer(
        tokens,
        return_length=True,
        is_split_into_words=True,
        max_seq_len=max_seq_length)

    input_ids = encoded_inputs["input_ids"]
    token_type_ids = encoded_inputs["token_type_ids"]
    seq_len = encoded_inputs["seq_len"]

    return input_ids, token_type_ids, seq_len


@paddle.no_grad()
def predict(model, data_loader, label_map):
    """
    Given a prediction dataset, it gives the prediction results.

    Args:
        model(obj:`paddle.nn.Layer`): A model to classify texts.
        data_loader(obj:`paddle.io.DataLoader`): The dataset loader which generates batches.
        label_map(obj:`dict`): The label id (key) to label str (value) map.
    """
    model.eval()
    results = []
    for input_ids, token_type_ids, seq_lens in data_loader:
        preds = model(input_ids, token_type_ids, seq_lens=seq_lens)
        tags = parse_predict_result(preds.numpy(), seq_lens.numpy(), label_map)
        results.extend(tags)
    return results


def parse_predict_result(predictions, seq_lens, label_map):
    """
    Parses the prediction results to the label tag.
    """
    pred_tag = []
    for idx, pred in enumerate(predictions):
        seq_len = seq_lens[idx]
        # drop the "[CLS]" and "[SEP]" token
        tag = [label_map[i] for i in pred[1:seq_len - 1]]
        pred_tag.append(tag)
    return pred_tag


def create_dataloader(dataset,
                      mode='train',
                      batch_size=1,
                      batchify_fn=None,
                      trans_fn=None):
    if trans_fn:
        dataset = dataset.map(trans_fn)

    shuffle = True if mode == 'train' else False
    if mode == 'train':
        batch_sampler = paddle.io.DistributedBatchSampler(
            dataset, batch_size=batch_size, shuffle=shuffle)
    else:
        batch_sampler = paddle.io.BatchSampler(
            dataset, batch_size=batch_size, shuffle=shuffle)

    return paddle.io.DataLoader(
        dataset=dataset,
        batch_sampler=batch_sampler,
        collate_fn=batchify_fn,
        return_list=True)


if __name__ == "__main__":
    paddle.set_device(args.device)

    test_ds = load_dataset("cote", "dp", splits=['test'])

    # The COTE_DP dataset labels with "BIO" schema.
    label_map = {0: "B", 1: "I", 2: "O"}
    # `no_entity_label` represents that the token isn't an entity.
    no_entity_label_idx = 2

    skep = SkepModel.from_pretrained('skep_ernie_1.0_large_ch')
    model = SkepCrfForTokenClassification(skep, num_classes=len(test_ds.label_list))
    tokenizer = SkepTokenizer.from_pretrained('skep_ernie_1.0_large_ch')

    if args.params_path and os.path.isfile(args.params_path):
        state_dict = paddle.load(args.params_path)
        model.set_dict(state_dict)
        print("Loaded parameters from %s" % args.params_path)

    trans_func = partial(
        convert_example,
        tokenizer=tokenizer,
        max_seq_length=args.max_seq_length)
    batchify_fn = lambda samples, fn=Tuple(
        Pad(axis=0, pad_val=tokenizer.vocab[tokenizer.pad_token]),  # input ids
        Pad(axis=0, pad_val=tokenizer.vocab[tokenizer.pad_token]),  # token type ids
        Stack(dtype='int64'),  # sequence lens
    ): [data for data in fn(samples)]
    test_data_loader = create_dataloader(
        test_ds,
        mode='test',
        batch_size=args.batch_size,
        batchify_fn=batchify_fn,
        trans_fn=trans_func)

    results = predict(model, test_data_loader, label_map)
    for idx, example in enumerate(test_ds.data):
        print(len(example['tokens']), len(results[idx]))
        print('Data: {} \t Label: {}'.format(example, results[idx]))

