1. Introduction
This article is a user guide to Hugging Face.
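2. Transformers
State-of-the-art machine learning for PyTorch, TensorFlow, and JAX.
Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs and carbon footprint, and save the time and resources required to train a model from scratch. These models support common tasks in different modalities, such as: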
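    Natural language processing: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.
    Computer vision: image classification, object detection, and segmentation.
    Audio: automatic speech recognition and audio classification.
    Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.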
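As a rough illustration of how the tasks listed above are reached in code, the sketch below maps a few of them to task identifiers for the library's `pipeline` function. The identifiers are assumptions based on the Transformers 4.x releases and may differ in other versions; `TASK_TO_PIPELINE` and `build_pipeline` are hypothetical helper names introduced here, not part of the library.

```python
# A minimal sketch, assuming Transformers 4.x task identifiers.
# TASK_TO_PIPELINE and build_pipeline are illustrative names, not library APIs.
TASK_TO_PIPELINE = {
    "text classification": "text-classification",
    "named entity recognition": "token-classification",
    "question answering": "question-answering",
    "summarization": "summarization",
    "text generation": "text-generation",
    "image classification": "image-classification",
    "object detection": "object-detection",
    "automatic speech recognition": "automatic-speech-recognition",
    "audio classification": "audio-classification",
    "visual question answering": "visual-question-answering",
}


def build_pipeline(task_name: str, **kwargs):
    """Return a Transformers pipeline for a human-readable task name."""
    try:
        task_id = TASK_TO_PIPELINE[task_name.lower()]
    except KeyError:
        raise ValueError(f"No pipeline mapping for task: {task_name!r}") from None
    # Imported lazily so the mapping can be inspected without the library installed.
    from transformers import pipeline

    # Extra keyword arguments (for example model="...") are passed through unchanged.
    return pipeline(task_id, **kwargs)
```

Calling `build_pipeline("text classification")` would download a default checkpoint on first use, so passing an explicit `model=` argument is usually the safer choice when results need to be reproducible.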
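Transformers supports framework interoperability between PyTorch, TensorFlow, and JAX. This gives you the flexibility to use a different framework at each stage of a model's life: train a model in three lines of code in one framework, then load it for inference in another. Models can also be exported to formats such as ONNX and TorchScript for deployment in production environments.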
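The cross-framework workflow described above might look roughly like the following sketch, which saves a PyTorch model to disk and reloads the same weights as a TensorFlow model. It assumes a Transformers 4.x installation with both PyTorch and TensorFlow available, and an architecture marked as TensorFlow-supported; `roundtrip_pt_to_tf` is a hypothetical helper name, and the checkpoint name in the usage note is only an example.

```python
def roundtrip_pt_to_tf(checkpoint: str, save_dir: str):
    """Load a checkpoint in PyTorch, save it, and reload it in TensorFlow.

    A sketch under the assumptions stated above, not an official recipe.
    """
    # Imported lazily because both frameworks are heavy, optional dependencies.
    from transformers import AutoModel, TFAutoModel

    # Stage 1: work with the model in PyTorch (fine-tuning would happen here).
    pt_model = AutoModel.from_pretrained(checkpoint)

    # Write classic PyTorch weights so the conversion path below is explicit.
    pt_model.save_pretrained(save_dir, safe_serialization=False)

    # Stage 2: reload the saved weights as a TensorFlow model for inference.
    tf_model = TFAutoModel.from_pretrained(save_dir, from_pt=True)
    return tf_model
```

A call such as `roundtrip_pt_to_tf("bert-base-uncased", "./bert-tf")` would only work for architectures with TensorFlow support.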
Join the growing community on the Hub, the forum, or Discord today!
3. Contents
The documentation is organized into five sections:
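-  GET STARTED provides a quick tour of the library and installation instructions to get up and running.
-  TUTORIALS are a great place to start if you are a beginner. This section will help you gain the basic skills you need to start using the library.
-  HOW-TO GUIDES show you how to achieve a specific goal, such as fine-tuning a pretrained model for language modeling or writing and sharing a custom model.
-  CONCEPTUAL GUIDES offer more discussion and explanation of the underlying concepts and ideas behind the models, tasks, and design philosophy of 🤗 Transformers.
-  API describes all classes and functions:
    -  MAIN CLASSES details the most important classes, such as configuration, model, tokenizer, and pipeline.
    -  MODELS details the classes and functions related to each model implemented in the library.
    -  INTERNAL HELPERS details utility classes and functions used internally.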
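4. Supported models and frameworks
The table below shows the current support in the library for each model: whether it has a Python tokenizer (called "slow"), a "fast" tokenizer backed by the 🤗 Tokenizers library, and whether it is supported in JAX (via Flax), PyTorch, and/or TensorFlow.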
| Model | PyTorch support | TensorFlow support | Flax support |
|---|---|---|---|
| ALBERT | ✅ | ✅ | ✅ | 
| ALIGN | ✅ | ❌ | ❌ | 
| AltCLIP | ✅ | ❌ | ❌ | 
| Audio Spectrogram Transformer | ✅ | ❌ | ❌ | 
| Autoformer | ✅ | ❌ | ❌ | 
| Bark | ✅ | ❌ | ❌ | 
| BART | ✅ | ✅ | ✅ | 
| BARThez | ✅ | ✅ | ✅ | 
| BARTpho | ✅ | ✅ | ✅ | 
| BEiT | ✅ | ❌ | ✅ | 
| BERT | ✅ | ✅ | ✅ | 
| Bert Generation | ✅ | ❌ | ❌ | 
| BertJapanese | ✅ | ✅ | ✅ | 
| BERTweet | ✅ | ✅ | ✅ | 
| BigBird | ✅ | ❌ | ✅ | 
| BigBird-Pegasus | ✅ | ❌ | ❌ | 
| BioGpt | ✅ | ❌ | ❌ | 
| BiT | ✅ | ❌ | ❌ | 
| Blenderbot | ✅ | ✅ | ✅ | 
| BlenderbotSmall | ✅ | ✅ | ✅ | 
| BLIP | ✅ | ✅ | ❌ | 
| BLIP-2 | ✅ | ❌ | ❌ | 
| BLOOM | ✅ | ❌ | ✅ | 
| BORT | ✅ | ✅ | ✅ | 
| BridgeTower | ✅ | ❌ | ❌ | 
| BROS | ✅ | ❌ | ❌ | 
| ByT5 | ✅ | ✅ | ✅ | 
| CamemBERT | ✅ | ✅ | ❌ | 
| CANINE | ✅ | ❌ | ❌ | 
| Chinese-CLIP | ✅ | ❌ | ❌ | 
| CLAP | ✅ | ❌ | ❌ | 
| CLIP | ✅ | ✅ | ✅ | 
| CLIPSeg | ✅ | ❌ | ❌ | 
| CLVP | ✅ | ❌ | ❌ | 
| CodeGen | ✅ | ❌ | ❌ | 
| CodeLlama | ✅ | ❌ | ✅ | 
| Conditional DETR | ✅ | ❌ | ❌ | 
| ConvBERT | ✅ | ✅ | ❌ | 
| ConvNeXT | ✅ | ✅ | ❌ | 
| ConvNeXTV2 | ✅ | ✅ | ❌ | 
| CPM | ✅ | ✅ | ✅ | 
| CPM-Ant | ✅ | ❌ | ❌ | 
| CTRL | ✅ | ✅ | ❌ | 
| CvT | ✅ | ✅ | ❌ | 
| Data2VecAudio | ✅ | ❌ | ❌ | 
| Data2VecText | ✅ | ❌ | ❌ | 
| Data2VecVision | ✅ | ✅ | ❌ | 
| DeBERTa | ✅ | ✅ | ❌ | 
| DeBERTa-v2 | ✅ | ✅ | ❌ | 
| Decision Transformer | ✅ | ❌ | ❌ | 
| Deformable DETR | ✅ | ❌ | ❌ | 
| DeiT | ✅ | ✅ | ❌ | 
| DePlot | ✅ | ❌ | ❌ | 
| DETA | ✅ | ❌ | ❌ | 
| DETR | ✅ | ❌ | ❌ | 
| DialoGPT | ✅ | ✅ | ✅ | 
| DiNAT | ✅ | ❌ | ❌ | 
| DINOv2 | ✅ | ❌ | ❌ | 
| DistilBERT | ✅ | ✅ | ✅ | 
| DiT | ✅ | ❌ | ✅ | 
| DonutSwin | ✅ | ❌ | ❌ | 
| DPR | ✅ | ✅ | ❌ | 
| DPT | ✅ | ❌ | ❌ | 
| EfficientFormer | ✅ | ✅ | ❌ | 
| EfficientNet | ✅ | ❌ | ❌ | 
| ELECTRA | ✅ | ✅ | ✅ | 
| EnCodec | ✅ | ❌ | ❌ | 
| Encoder decoder | ✅ | ✅ | ✅ | 
| ERNIE | ✅ | ❌ | ❌ | 
| ErnieM | ✅ | ❌ | ❌ | 
| ESM | ✅ | ✅ | ❌ | 
| FairSeq Machine-Translation | ✅ | ❌ | ❌ | 
| Falcon | ✅ | ❌ | ❌ | 
| FastSpeech2Conformer | ✅ | ❌ | ❌ | 
| FLAN-T5 | ✅ | ✅ | ✅ | 
| FLAN-UL2 | ✅ | ✅ | ✅ | 
| FlauBERT | ✅ | ✅ | ❌ | 
| FLAVA | ✅ | ❌ | ❌ | 
| FNet | ✅ | ❌ | ❌ | 
| FocalNet | ✅ | ❌ | ❌ | 
| Funnel Transformer | ✅ | ✅ | ❌ | 
| Fuyu | ✅ | ❌ | ❌ | 
| GIT | ✅ | ❌ | ❌ | 
| GLPN | ✅ | ❌ | ❌ | 
| GPT Neo | ✅ | ❌ | ✅ | 
| GPT NeoX | ✅ | ❌ | ❌ | 
| GPT NeoX Japanese | ✅ | ❌ | ❌ | 
| GPT-J | ✅ | ✅ | ✅ | 
| GPT-Sw3 | ✅ | ✅ | ✅ | 
| GPTBigCode | ✅ | ❌ | ❌ | 
| GPTSAN-japanese | ✅ | ❌ | ❌ | 
| Graphormer | ✅ | ❌ | ❌ | 
| GroupViT | ✅ | ✅ | ❌ | 
| HerBERT | ✅ | ✅ | ✅ | 
| Hubert | ✅ | ✅ | ❌ | 
| I-BERT | ✅ | ❌ | ❌ | 
| IDEFICS | ✅ | ❌ | ❌ | 
| ImageGPT | ✅ | ❌ | ❌ | 
| Informer | ✅ | ❌ | ❌ | 
| InstructBLIP | ✅ | ❌ | ❌ | 
| Jukebox | ✅ | ❌ | ❌ | 
| KOSMOS-2 | ✅ | ❌ | ❌ | 
| LayoutLM | ✅ | ✅ | ❌ | 
| LayoutLMv2 | ✅ | ❌ | ❌ | 
| LayoutLMv3 | ✅ | ✅ | ❌ | 
| LayoutXLM | ✅ | ❌ | ❌ | 
| LED | ✅ | ✅ | ❌ | 
| LeViT | ✅ | ❌ | ❌ | 
| LiLT | ✅ | ❌ | ❌ | 
| LLaMA | ✅ | ❌ | ✅ | 
| Llama2 | ✅ | ❌ | ✅ | 
| LLaVa | ✅ | ❌ | ❌ | 
| Longformer | ✅ | ✅ | ❌ | 
| LongT5 | ✅ | ❌ | ✅ | 
| LUKE | ✅ | ❌ | ❌ | 
| LXMERT | ✅ | ✅ | ❌ | 
| M-CTC-T | ✅ | ❌ | ❌ | 
| M2M100 | ✅ | ❌ | ❌ | 
| MADLAD-400 | ✅ | ✅ | ✅ | 
| Marian | ✅ | ✅ | ✅ | 
| MarkupLM | ✅ | ❌ | ❌ | 
| Mask2Former | ✅ | ❌ | ❌ | 
| MaskFormer | ✅ | ❌ | ❌ | 
| MatCha | ✅ | ❌ | ❌ | 
| mBART | ✅ | ✅ | ✅ | 
| mBART-50 | ✅ | ✅ | ✅ | 
| MEGA | ✅ | ❌ | ❌ | 
| Megatron-BERT | ✅ | ❌ | ❌ | 
| Megatron-GPT2 | ✅ | ✅ | ✅ | 
| MGP-STR | ✅ | ❌ | ❌ | 
| Mistral | ✅ | ❌ | ❌ | 
| Mixtral | ✅ | ❌ | ❌ | 
| mLUKE | ✅ | ❌ | ❌ | 
| MMS | ✅ | ✅ | ✅ | 
| MobileBERT | ✅ | ✅ | ❌ | 
| MobileNetV1 | ✅ | ❌ | ❌ | 
| MobileNetV2 | ✅ | ❌ | ❌ | 
| MobileViT | ✅ | ✅ | ❌ | 
| MobileViTV2 | ✅ | ❌ | ❌ | 
| MPNet | ✅ | ✅ | ❌ | 
| MPT | ✅ | ❌ | ❌ | 
| MRA | ✅ | ❌ | ❌ | 
| MT5 | ✅ | ✅ | ✅ | 
| MusicGen | ✅ | ❌ | ❌ | 
| MVP | ✅ | ❌ | ❌ | 
| NAT | ✅ | ❌ | ❌ | 
| Nezha | ✅ | ❌ | ❌ | 
| NLLB | ✅ | ❌ | ❌ | 
| NLLB-MOE | ✅ | ❌ | ❌ | 
| Nougat | ✅ | ✅ | ✅ | 
| Nyströmformer | ✅ | ❌ | ❌ | 
| OneFormer | ✅ | ❌ | ❌ | 
| OpenAI GPT | ✅ | ✅ | ❌ | 
| OpenAI GPT-2 | ✅ | ✅ | ✅ | 
| OpenLlama | ✅ | ❌ | ❌ | 
| OPT | ✅ | ✅ | ✅ | 
| OWL-ViT | ✅ | ❌ | ❌ | 
| OWLv2 | ✅ | ❌ | ❌ | 
| PatchTSMixer | ✅ | ❌ | ❌ | 
| PatchTST | ✅ | ❌ | ❌ | 
| Pegasus | ✅ | ✅ | ✅ | 
| PEGASUS-X | ✅ | ❌ | ❌ | 
| Perceiver | ✅ | ❌ | ❌ | 
| Persimmon | ✅ | ❌ | ❌ | 
| Phi | ✅ | ❌ | ❌ | 
| PhoBERT | ✅ | ✅ | ✅ | 
| Pix2Struct | ✅ | ❌ | ❌ | 
| PLBart | ✅ | ❌ | ❌ | 
| PoolFormer | ✅ | ❌ | ❌ | 
| Pop2Piano | ✅ | ❌ | ❌ | 
| ProphetNet | ✅ | ❌ | ❌ | 
| PVT | ✅ | ❌ | ❌ | 
| QDQBert | ✅ | ❌ | ❌ | 
| Qwen2 | ✅ | ❌ | ❌ | 
| RAG | ✅ | ❌ | ❌ | 
| REALM | ✅ | ❌ | ❌ | 
| Reformer | ✅ | ❌ | ❌ | 
| RegNet | ✅ | ✅ | ✅ | 
| RemBERT | ✅ | ✅ | ❌ | 
| ResNet | ✅ | ✅ | ✅ | 
| RetriBERT | ✅ | ❌ | ❌ | 
| RoBERTa | ✅ | ✅ | ✅ | 
| RoBERTa-PreLayerNorm | ✅ | ✅ | ✅ | 
| RoCBert | ✅ | ❌ | ❌ | 
| RoFormer | ✅ | ✅ | ✅ | 
| RWKV | ✅ | ❌ | ❌ | 
| SAM | ✅ | ✅ | ❌ | 
| SeamlessM4T | ✅ | ❌ | ❌ | 
| SeamlessM4Tv2 | ✅ | ❌ | ❌ | 
| SegFormer | ✅ | ✅ | ❌ | 
| SEW | ✅ | ❌ | ❌ | 
| SEW-D | ✅ | ❌ | ❌ | 
| SigLIP | ✅ | ❌ | ❌ | 
| Speech Encoder decoder | ✅ | ❌ | ✅ | 
| Speech2Text | ✅ | ✅ | ❌ | 
| SpeechT5 | ✅ | ❌ | ❌ | 
| Splinter | ✅ | ❌ | ❌ | 
| SqueezeBERT | ✅ | ❌ | ❌ | 
| SwiftFormer | ✅ | ❌ | ❌ | 
| Swin Transformer | ✅ | ✅ | ❌ | 
| Swin Transformer V2 | ✅ | ❌ | ❌ | 
| Swin2SR | ✅ | ❌ | ❌ | 
| SwitchTransformers | ✅ | ❌ | ❌ | 
| T5 | ✅ | ✅ | ✅ | 
| T5v1.1 | ✅ | ✅ | ✅ | 
| Table Transformer | ✅ | ❌ | ❌ | 
| TAPAS | ✅ | ✅ | ❌ | 
| TAPEX | ✅ | ✅ | ✅ | 
| Time Series Transformer | ✅ | ❌ | ❌ | 
| TimeSformer | ✅ | ❌ | ❌ | 
| Trajectory Transformer | ✅ | ❌ | ❌ | 
| Transformer-XL | ✅ | ✅ | ❌ | 
| TrOCR | ✅ | ❌ | ❌ | 
| TVLT | ✅ | ❌ | ❌ | 
| TVP | ✅ | ❌ | ❌ | 
| UL2 | ✅ | ✅ | ✅ | 
| UMT5 | ✅ | ❌ | ❌ | 
| UniSpeech | ✅ | ❌ | ❌ | 
| UniSpeechSat | ✅ | ❌ | ❌ | 
| UnivNet | ✅ | ❌ | ❌ | 
| UPerNet | ✅ | ❌ | ❌ | 
| VAN | ✅ | ❌ | ❌ | 
| VideoMAE | ✅ | ❌ | ❌ | 
| ViLT | ✅ | ❌ | ❌ | 
| VipLlava | ✅ | ❌ | ❌ | 
| Vision Encoder decoder | ✅ | ✅ | ✅ | 
| VisionTextDualEncoder | ✅ | ✅ | ✅ | 
| VisualBERT | ✅ | ❌ | ❌ | 
| ViT | ✅ | ✅ | ✅ | 
| ViT Hybrid | ✅ | ❌ | ❌ | 
| VitDet | ✅ | ❌ | ❌ | 
| ViTMAE | ✅ | ✅ | ❌ | 
| ViTMatte | ✅ | ❌ | ❌ | 
| ViTMSN | ✅ | ❌ | ❌ | 
| VITS | ✅ | ❌ | ❌ | 
| ViViT | ✅ | ❌ | ❌ | 
| Wav2Vec2 | ✅ | ✅ | ✅ | 
| Wav2Vec2-BERT | ✅ | ❌ | ❌ | 
| Wav2Vec2-Conformer | ✅ | ❌ | ❌ | 
| Wav2Vec2Phoneme | ✅ | ✅ | ✅ | 
| WavLM | ✅ | ❌ | ❌ | 
| Whisper | ✅ | ✅ | ✅ | 
| X-CLIP | ✅ | ❌ | ❌ | 
| X-MOD | ✅ | ❌ | ❌ | 
| XGLM | ✅ | ✅ | ✅ | 
| XLM | ✅ | ✅ | ❌ | 
| XLM-ProphetNet | ✅ | ❌ | ❌ | 
| XLM-RoBERTa | ✅ | ✅ | ✅ | 
| XLM-RoBERTa-XL | ✅ | ❌ | ❌ | 
| XLM-V | ✅ | ✅ | ✅ | 
| XLNet | ✅ | ✅ | ❌ | 
| XLS-R | ✅ | ✅ | ✅ | 
| XLSR-Wav2Vec2 | ✅ | ✅ | ✅ | 
| YOLOS | ✅ | ❌ | ❌ | 
| YOSO | ✅ | ❌ | ❌ |
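To query a table like the one above programmatically, the rows can be parsed into a dictionary that lists the supporting frameworks per model. The sketch below uses only the standard library and a four-row excerpt copied from the table; `parse_support_table` is a hypothetical helper written for this illustration.

```python
from typing import Dict, List

# A short excerpt of the table above, used as sample input.
TABLE_EXCERPT = """\
| Model | PyTorch support | TensorFlow support | Flax support |
|---|---|---|---|
| ALBERT | ✅ | ✅ | ✅ |
| ALIGN | ✅ | ❌ | ❌ |
| BEiT | ✅ | ❌ | ✅ |
| BLIP | ✅ | ✅ | ❌ |
"""

# Column order of the three support columns in the table.
FRAMEWORKS = ("PyTorch", "TensorFlow", "Flax")


def parse_support_table(markdown: str) -> Dict[str, List[str]]:
    """Map each model name to the list of frameworks marked with a check."""
    support: Dict[str, List[str]] = {}
    for line in markdown.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip anything that is not a data row: wrong width, header, separator.
        if len(cells) != 4 or cells[0] == "Model" or set(cells[0]) <= {"-"}:
            continue
        support[cells[0]] = [
            framework
            for framework, mark in zip(FRAMEWORKS, cells[1:])
            if mark == "✅"
        ]
    return support


support = parse_support_table(TABLE_EXCERPT)
print(support["BEiT"])  # ['PyTorch', 'Flax']
```

The same function should work on the full table, since every row follows the same four-column layout.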