The following error appears when using spaCy for NLP:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-164-8ef00790b0bb> in <module>
      2 opt = nlp.begin_training()
      3 for i in range(n):
----> 4     loss = train(nlp, train_data, opt)
      5     acc = evaluate(nlp, valid_text, valid_label)
      6     print(f"Loss: {loss['textcat']:.3f} \t Accuracy: {accuracy:.3f}")

<ipython-input-155-47db869d5b7c> in train(model, train, optimizer, batch_size)
      8     for batch in batches:
      9         text, label = zip(*batch)
---> 10         model.update(text, label, sgd=optimizer, losses=loss)
     11     return loss

~\AppData\Roaming\Python\Python37\site-packages\spacy\language.py in update(self, docs, golds, drop, sgd, losses, component_cfg)
    508             sgd = self._optimizer
    509         # Allow dict of args to GoldParse, instead of GoldParse objects.
--> 510         docs, golds = self._format_docs_and_golds(docs, golds)
    511         grads = {}
    512 

~\AppData\Roaming\Python\Python37\site-packages\spacy\language.py in _format_docs_and_golds(self, docs, golds)
    480                     err = Errors.E151.format(unexp=unexpected, exp=expected_keys)
    481                     raise ValueError(err)
--> 482                 gold = GoldParse(doc, **gold)
    483             doc_objs.append(doc)
    484             gold_objs.append(gold)

gold.pyx in spacy.gold.GoldParse.__init__()

TypeError: object of type 'float' has no len()

Cause:
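The data contains NaN values. NaN is a float, so a missing entry reaches spaCy as a float instead of a string, and the NaN values have to be handled before training.
Solution:
- Drop the affected rows: train = train.dropna()
- Or replace the missing values with a blank string (a single space): train = train.fillna(" ")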
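
The two fixes above can be sketched on a small pandas DataFrame. This is only an illustration under assumptions: the toy data and the column names "text" and "label" are hypothetical and do not come from the original notebook, where `train` is assumed to be a pandas DataFrame.

```python
import pandas as pd

# Hypothetical toy data; the column names "text" and "label" are assumptions.
train = pd.DataFrame({
    "text": ["good movie", float("nan"), "bad movie"],
    "label": ["pos", "neg", "neg"],
})

# Count the missing entries before deciding how to handle them.
n_missing = int(train["text"].isna().sum())

# A missing cell is a float NaN, so calling len() on it fails
# in the same way as in the traceback.
missing_value = train["text"].iloc[1]
try:
    len(missing_value)
    message = ""
except TypeError as exc:
    message = str(exc)

# Option 1: drop every row that contains a missing value.
dropped = train.dropna()

# Option 2: keep all rows and replace missing values with a single space.
filled = train.fillna(" ")

print(n_missing, message)
print(len(dropped), len(filled))
```

Dropping rows is usually the safer choice for a text classifier, since a blank text carries no signal for its label; filling is useful when the row count has to stay unchanged.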