LlamaIndex

1. What Is the Value of an LLM Development Framework?

SDK: Software Development Kit — a collection of software tools and resources designed to help developers create, test, deploy, and maintain applications.

The core value of any development framework (SDK) is reducing development and maintenance cost.

An LLM development framework makes it easier for developers to build applications on top of large language models. It mainly provides three kinds of help:

  1. Abstraction over third-party capabilities, e.g. LLMs, vector databases, search APIs
  2. Encapsulation of common tools and solution patterns
  3. Encapsulation of low-level plumbing, e.g. streaming interfaces, timeouts and retries, async and parallel execution

A good development framework should be:

  1. Reliable and robust
  2. Maintainable
  3. Extensible
  4. Easy to learn

Some concrete examples:

  • Decoupled from external dependencies
    • e.g. you can swap out the LLM without large-scale refactoring
    • the same goes for swapping third-party tools
  • Frequently changing parts live outside the code, not inside it
    • e.g. prompt templates
  • Works in all environments
    • e.g. thread safety
  • Easy to debug and test
    • at minimum, using it should feel easier than not using it
    • valid input should never trigger errors inside the framework
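To make the decoupling point concrete, here is a minimal framework-agnostic sketch. The `LLM` protocol, `EchoLLM` class, and `answer` function are hypothetical illustrations, not LlamaIndex APIs:

```python
from typing import Protocol

class LLM(Protocol):
    # the minimal interface the application codes against
    def complete(self, prompt: str) -> str: ...

class EchoLLM:
    # stand-in backend; a real one would call an actual model API
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def answer(llm: LLM, question: str) -> str:
    # application logic depends only on the interface,
    # so the backend can be swapped without refactoring
    return llm.complete(question)
```

Because `answer` depends only on the `complete` interface, replacing `EchoLLM` with any other backend requires no change to application code — which is exactly what a good framework's LLM abstraction buys you.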
Key point: pick the right framework and you get twice the result for half the effort; pick the wrong one and it's the reverse.

For example, with the SDK, a minimal RAG system takes 4 lines of code:

!pip install --upgrade llama-index

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("How many parameters does Llama 2 have?")
print(response)

2. Introducing LlamaIndex

"LlamaIndex is a framework for building context-augmented LLM applications. Context augmentation refers to any use case that applies LLMs on top of your private or domain-specific data."

LlamaIndex is a framework (i.e. an SDK) for building "context-augmented" LLM applications. Context augmentation broadly refers to any use case that applies LLMs on top of private or domain-specific data. For example:


  • Question-answering chatbots (i.e. RAG)

  • Document understanding and extraction

  • Autonomous agents that can perform research and take actions
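To show what "applying an LLM on top of your data" means mechanically, here is a toy retrieve-then-generate sketch. The `retrieve` and `build_prompt` helpers are hypothetical; a real RAG pipeline would use embeddings and a vector index rather than word overlap:

```python
def retrieve(corpus, query, k=2):
    # toy retrieval: rank passages by word overlap with the query
    def score(passage):
        return len(set(passage.lower().split()) & set(query.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]

def build_prompt(passages, question):
    # ground the LLM's answer in the retrieved context
    context = "\n".join(passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

corpus = [
    "Llama 2 ranges from 7 billion to 70 billion parameters.",
    "Paris is the capital of France.",
    "Llama 2-Chat is optimized for dialogue use cases.",
]
top = retrieve(corpus, "how many parameters does llama 2 have", k=1)
prompt = build_prompt(top, "How many parameters does Llama 2 have?")
```

The prompt would then be sent to an LLM; LlamaIndex packages this retrieve-augment-generate loop behind its query engine abstraction.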

LlamaIndex is available in Python and TypeScript; the Python documentation is more complete.

  • Python docs: https://docs.llamaindex.ai/en/stable/

  • Python API reference: https://docs.llamaindex.ai/en/stable/api_reference/

  • TS docs: https://ts.llamaindex.ai/

  • TS API reference: https://ts.llamaindex.ai/api/

LlamaIndex is an open-source framework. GitHub: https://github.com/run-llama

Core Modules of LlamaIndex


Installing LlamaIndex

  1. Python

pip install llama-index

  2. TypeScript

# install via npm
npm install llamaindex

# install via yarn
yarn add llamaindex

# install via pnpm
pnpm add llamaindex

This post uses the Python version throughout.

3. Data Loading

3.1. SimpleDirectoryReader

SimpleDirectoryReader is a simple local file loader. It traverses a directory and automatically loads files (as text) based on their extensions.

Supported file types:

  • .csv - comma-separated values
  • .docx - Microsoft Word
  • .epub - EPUB ebook format
  • .hwp - Hangul Word Processor
  • .ipynb - Jupyter Notebook
  • .jpeg, .jpg - JPEG image
  • .mbox - MBOX email archive
  • .md - Markdown
  • .mp3, .mp4 - audio and video
  • .pdf - Portable Document Format
  • .png - Portable Network Graphics
  • .ppt, .pptm, .pptx - Microsoft PowerPoint
import json
from pydantic.v1 import BaseModel

def show_json(data):
    """Pretty-print JSON data"""
    if isinstance(data, str):
        obj = json.loads(data)
        print(json.dumps(obj, indent=4))
    elif isinstance(data, dict) or isinstance(data, list):
        print(json.dumps(data, indent=4))
    elif issubclass(type(data), BaseModel):
        print(json.dumps(data.dict(), indent=4, ensure_ascii=False))

def show_list_obj(data):
    """Pretty-print a list of objects"""
    if isinstance(data, list):
        for item in data:
            show_json(item)
    else:
        raise ValueError("Input is not a list")

from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(
    input_dir="./data",      # target directory
    recursive=False,         # whether to recurse into subdirectories
    required_exts=[".pdf"]   # (optional) only load files with these extensions
)
documents = reader.load_data()

show_json(documents[0])
print(documents[0].text)
{
    "id_": "358482ee-4232-45eb-a5ae-8f595f16c8cd",
    "embedding": null,
    "metadata": {
        "page_label": "1",
        "file_name": "llama2-extracted.pdf",
        "file_path": "/home/jovyan/lecture-notes/07-llamaindex/data/llama2-extracted.pdf",
        "file_type": "application/pdf",
        "file_size": 401338,
        "creation_date": "2024-06-14",
        "last_modified_date": "2024-06-14"
    },
    "excluded_embed_metadata_keys": [
        "file_name",
        "file_type",
        "file_size",
        "creation_date",
        "last_modified_date",
        "last_accessed_date"
    ],
    "excluded_llm_metadata_keys": [
        "file_name",
        "file_type",
        "file_size",
        "creation_date",
        "last_modified_date",
        "last_accessed_date"
    ],
    "relationships": {},
    "text": "Llama 2: OpenFoundation andFine-Tuned ChatModels\nHugo Touvron∗Louis Martin†Kevin Stone†\nPeter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov SoumyaBatra\nPrajjwal Bhargava Shruti Bhosale Dan Bikel LukasBlecher Cristian CantonFerrer MoyaChen\nGuillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu BrianFuller\nCynthia Gao VedanujGoswami NamanGoyal AnthonyHartshorn Saghar Hosseini RuiHou\nHakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa IsabelKloumann ArtemKorenev\nPunit Singh Koura Marie-AnneLachaux ThibautLavril Jenya Lee Diana Liskovich\nYinghai Lu YuningMao Xavier Martinet Todor Mihaylov PushkarMishra\nIgor Molybog Yixin Nie AndrewPoulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi\nAlan Schelten Ruan Silva EricMichael Smith Ranjan Subramanian XiaoqingEllenTan BinhTang\nRoss Taylor AdinaWilliams JianXiang Kuan PuxinXu ZhengYan Iliyan Zarov YuchenZhang\nAngela Fan MelanieKambadur SharanNarang Aurelien Rodriguez RobertStojnic\nSergey Edunov ThomasScialom∗\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and fine-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur fine-tuned LLMs, called Llama 2-Chat , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosed-\nsource models. We provide a detailed description of our approach to fine-tuning and safety\nimprovements of Llama 2-Chat in order to enable the community to build on our work and\ncontribute to the responsibledevelopmentof LLMs.\n∗Equal contribution, corresponding authors: {tscialom, htouvron}@meta.com\n†Second author\nContributions for all the authors can be found in Section A.1.arXiv:2307.09288v2  [cs.CL]  19 Jul 2023",
    "mimetype": "text/plain",
    "start_char_idx": null,
    "end_char_idx": null,
    "text_template": "{metadata_str}\n\n{content}",
    "metadata_template": "{key}: {value}",
    "metadata_seperator": "\n",
    "class_name": "Document"
}
Llama 2: OpenFoundation andFine-Tuned ChatModels
Hugo Touvron∗Louis Martin†Kevin Stone†
Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov SoumyaBatra
Prajjwal Bhargava Shruti Bhosale Dan Bikel LukasBlecher Cristian CantonFerrer MoyaChen
Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu BrianFuller
Cynthia Gao VedanujGoswami NamanGoyal AnthonyHartshorn Saghar Hosseini RuiHou
Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa IsabelKloumann ArtemKorenev
Punit Singh Koura Marie-AnneLachaux ThibautLavril Jenya Lee Diana Liskovich
Yinghai Lu YuningMao Xavier Martinet Todor Mihaylov PushkarMishra
Igor Molybog Yixin Nie AndrewPoulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi
Alan Schelten Ruan Silva EricMichael Smith Ranjan Subramanian XiaoqingEllenTan BinhTang
Ross Taylor AdinaWilliams JianXiang Kuan PuxinXu ZhengYan Iliyan Zarov YuchenZhang
Angela Fan MelanieKambadur SharanNarang Aurelien Rodriguez RobertStojnic
Sergey Edunov ThomasScialom∗
GenAI, Meta
Abstract
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned
large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Our fine-tuned LLMs, called Llama 2-Chat , are optimized for dialogue use cases. Our
models outperform open-source chat models on most benchmarks we tested, and based on
ourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosed-
source models. We provide a detailed description of our approach to fine-tuning and safety
improvements of Llama 2-Chat in order to enable the community to build on our work and
contribute to the responsibledevelopmentof LLMs.
∗Equal contribution, corresponding authors: {tscialom, htouvron}@meta.com
†Second author
Contributions for all the authors can be found in Section A.1.arXiv:2307.09288v2  [cs.CL]  19 Jul 2023
Note: for image, video, and audio files, text is not extracted automatically by default. To extract it, see the Data Connectors section below.

The default PDFReader does not produce great results, so we can swap in a different file loader:

# !pip install pymupdf
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import PyMuPDFReader

reader = SimpleDirectoryReader(
    input_dir="./data",                       # target directory
    recursive=False,                          # whether to recurse into subdirectories
    required_exts=[".pdf"],                   # (optional) only load files with these extensions
    file_extractor={".pdf": PyMuPDFReader()}  # use a specific loader for .pdf files
)
documents = reader.load_data()
print(documents[0].text)
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron∗
Louis Martin†
Kevin Stone†
Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra
Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen
Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller
Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou
Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev
Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich
Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra
Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi
Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang
Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang
Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic
Sergey Edunov
Thomas Scialom∗
GenAI, Meta
Abstract
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned
large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our
models outperform open-source chat models on most benchmarks we tested, and based on
our human evaluations for helpfulness and safety, may be a suitable substitute for closed-
source models. We provide a detailed description of our approach to fine-tuning and safety
improvements of Llama 2-Chat in order to enable the community to build on our work and
contribute to the responsible development of LLMs.
∗Equal contribution, corresponding authors: {tscialom, htouvron}@meta.com
†Second author
Contributions for all the authors can be found in Section A.1.
arXiv:2307.09288v2  [cs.CL]  19 Jul 2023

Other PDF loaders include SmartPDFLoader and LlamaParse. Both offer richer parsing, including section and paragraph structure, but neither is 100% accurate: text is occasionally dropped or misplaced, so test them carefully against your own requirements.

3.2. Data Connectors

Data connectors handle richer data sources and read them into Document form (text + metadata).

More data connectors:
  • Built-in file loaders
  • Loaders that connect to third-party services, such as databases
  • Many more loaders are available on LlamaHub
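A connector's job reduces to producing text-plus-metadata records. Here is a minimal sketch of that shape in plain Python; the `Doc` class and `rows_to_docs` helper are hypothetical stand-ins for LlamaIndex's Document and a real database connector:

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    # mirrors the Document shape: a text body plus a metadata dict
    text: str
    metadata: dict = field(default_factory=dict)

def rows_to_docs(rows):
    # hypothetical connector: turn database rows into Doc objects
    return [Doc(text=r["body"], metadata={"source": r["id"]}) for r in rows]

docs = rows_to_docs([{"id": 1, "body": "hello"}, {"id": 2, "body": "world"}])
```

Whatever the source (files, APIs, databases), the downstream pipeline only ever sees this uniform Document shape, which is what makes connectors interchangeable.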

4. Text Splitting and Parsing (Chunking)

To make retrieval easier, we usually split each Document into Nodes.

In LlamaIndex, a Node is defined as a "chunk" of text.

4.1. Splitting Text with TextSplitters

For example, TokenTextSplitter splits text into chunks of a specified token count:

from llama_index.core.node_parser import TokenTextSplitter
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(
    input_dir="./data",      # target directory
    recursive=False,         # whether to recurse into subdirectories
    required_exts=[".pdf"]   # (optional) only load files with these extensions
)
documents = reader.load_data()

node_parser = TokenTextSplitter(
    chunk_size=100,    # maximum length of each chunk
    chunk_overlap=50   # overlap between adjacent chunks
)
nodes = node_parser.get_nodes_from_documents(
    documents, show_progress=False
)
for node in nodes:
    print(node)
Node ID: be6157bd-acd5-419b-903b-fb335ebf1805
Text: Llama 2: Open Foundation and Fine-Tuned Chat Models Hugo
Touvron∗ Louis Martin† Kevin Stone† Peter Albert Amjad Almahairi
Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti
Bhosale Dan Bikel Lukas Blecher
Node ID: b76c809f-16b6-4988-94e2-7feafcfdd506
Text: Louis Martin† Kevin Stone† Peter Albert Amjad Almahairi Yasmine
Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale
Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen Guillem
Cucurull David
Node ID: 578cb67c-1bff-4dd8-9329-af4c2f8f881b
Text: Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti
Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen
Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian
Fuller Cynthia Gao
Node ID: dbb31d51-67ad-4d21-93d5-5062275a66c1
Text: Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer
Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu
Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony
Hartshorn Saghar Hosseini Rui
Node ID: 2d6466e9-a4d4-454e-a2d3-631364d0a126
Text: Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian
Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn
Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian
Khabsa Isabel Kloumann
Node ID: 3dcaf924-e52e-4d93-98bc-9292a1884a40
Text: Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar
Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa
Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux
Thibaut Lavril Jenya
Node ID: 8fa64cb9-c510-4223-b0cb-e223178ebdb9
Text: Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa
Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux
Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier
Martinet Todor
Node ID: 9f5c2d90-ffe9-4efe-a5b1-68eef0119b5d
Text: Madian Khabsa Isabel Kloumann Artem Korenev Punit Singh Koura
Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu
Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog
Yixin Nie Andrew
Node ID: fff15a52-a0da-40c8-a09a-0fe1ea2d6ea1
Text: Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich
Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra
Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta
Kalyan Saladi Alan
Node ID: 5fa0508f-3ca7-4425-a48c-a9cde1112060
Text: Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor
Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta
Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan
Subramanian Xiaoqing Ellen Tan Binh Tang Ross
Node ID: ee895744-82ae-49ad-94c0-8368c0790dea
Text: Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta
Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan
Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams
Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov
Node ID: 0138d1ac-6358-47c3-8f52-d0054b80a568
Text: Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan
Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams
Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela
Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert
Node ID: 73429f67-589e-442d-89af-1513be6bb88f
Text: Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams Jian
Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan
Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic
Sergey Edunov Thomas Scialom∗ GenAI,
Node ID: 9d37dbc7-16ef-4036-bb8f-5f3091dba100
Text: Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie
Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic Sergey Edunov
Thomas Scialom∗ GenAI, Meta Abstract In this work, we develop and
release Llama 2, a collection of pretrained and
Node ID: 87459f44-90eb-43c8-9e4c-329777efe449
Text: Sharan Narang Aurelien Rodriguez Robert Stojnic Sergey Edunov
Thomas Scialom∗ GenAI, Meta Abstract In this work, we develop and
release Llama 2, a collection of pretrained and fine-tuned large
language models (LLMs) ranging in scale from 7 billion to
Node ID: 8f8a5262-03a3-4e0d-9c5f-f6830bd2cf27
Text: Scialom∗ GenAI, Meta Abstract In this work, we develop and
release Llama 2, a collection of pretrained and fine-tuned large
language models (LLMs) ranging in scale from 7 billion to 70 billion
parameters. Our fine-tuned LLMs, calledLlama
Node ID: fc5b70c8-4e78-40bf-9d77-911391e5ea1c
Text: we develop and release Llama 2, a collection of pretrained and
fine-tuned large language models (LLMs) ranging in scale from 7
billion to 70 billion parameters. Our fine-tuned LLMs, calledLlama
2-Chat, are optimized for dialogue use cases. Our models outperform
open-source chat models
Node ID: 6d36968c-de01-4afa-97dc-7987a192bf30
Text: models (LLMs) ranging in scale from 7 billion to 70 billion
parameters. Our fine-tuned LLMs, calledLlama 2-Chat, are optimized for
dialogue use cases. Our models outperform open-source chat models on
most benchmarks we tested, and based on our human evaluations for
helpfulness and safety, may
Node ID: e7ef4823-6937-4d80-824e-693bd81dc149
Text: LLMs, calledLlama 2-Chat, are optimized for dialogue use cases.
Our models outperform open-source chat models on most benchmarks we
tested, and based on our human evaluations for helpfulness and safety,
may be a suitable substitute for closed- source models. We provide a
detailed description of our approach to fine-tuning
Node ID: d1c044a6-02e6-4d88-a92f-a96b65da6de0
Text: outperform open-source chat models on most benchmarks we tested,
and based on our human evaluations for helpfulness and safety, may be
a suitable substitute for closed- source models. We provide a detailed
description of our approach to fine-tuning and safety improvements
ofLlama 2-Chatin order to enable the community to build on our
Node ID: de25c573-8d6f-4d81-8cc0-2df0b8706f6d
Text: helpfulness and safety, may be a suitable substitute for closed-
source models. We provide a detailed description of our approach to
fine-tuning and safety improvements ofLlama 2-Chatin order to enable
the community to build on our work and contribute to the responsible
development of LLMs. ∗Equal contribution, corresponding
Node ID: 366723b1-8227-4c8e-9cdf-fb6596c6b607
Text: description of our approach to fine-tuning and safety
improvements ofLlama 2-Chatin order to enable the community to build
on our work and contribute to the responsible development of LLMs.
∗Equal contribution, corresponding authors: {tscialom,
htouvron}@meta.com †Second
Node ID: b1d0a429-f88c-4eaf-a5e2-438395063dd1
Text: 2-Chatin order to enable the community to build on our work and
contribute to the responsible development of LLMs. ∗Equal
contribution, corresponding authors: {tscialom, htouvron}@meta.com
†Second author Contributions for all the authors can be found in
Section
Node ID: 3bd1cf71-efdd-476c-ae63-f6813cd426ff
Text: work and contribute to the responsible development of LLMs.
∗Equal contribution, corresponding authors: {tscialom,
htouvron}@meta.com †Second author Contributions for all the authors
can be found in Section A.1. arXiv:2307.09288v2  [cs.CL]
Node ID: a419b2de-57a0-4d13-8316-89ca3c388038
Text: authors: {tscialom, htouvron}@meta.com †Second author
Contributions for all the authors can be found in Section A.1.
arXiv:2307.09288v2  [cs.CL]  19 Jul 2023
Node ID: c14bc803-2b66-4791-acf1-40c3c3824248
Text: Contents 1 Introduction 3 2 Pretraining 5 2.1 Pretraining Data .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . 5 2.2
Node ID: cd88caab-f720-44d0-b60f-52d3f5a8c47a
Text: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . 5 2.2 Training Details . . . . . . . . . . . .
. . . . . .
Node ID: 060e2c44-e137-48fc-997a-fef9e1253af1
Text: . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Training
Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . .
Node ID: 84a3b614-fa9a-4a10-933f-4ea49eb42da5
Text: . . . . 5 2.2 Training Details . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.3 Llama
2Pretrained Model
Node ID: d147d85b-a6f9-4c26-b7d7-61e897c67c67
Text: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . 5 2.3 Llama 2Pretrained Model Evaluation . . . . . . . . . .
. . . . . . . . .
Node ID: 74ecb943-d7b1-479e-b280-8a94be4c97f5
Text: . . . . . . . . . . . . . . . . . 5 2.3 Llama 2Pretrained Model
Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . 7 3 Fine-tuning
Node ID: d6860cfa-8486-41c4-80bd-4bc53788b13a
Text: Llama 2Pretrained Model Evaluation . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . 7 3 Fine-tuning 8 3.1 Supervised
Fine-Tuning (SFT) . . . . . . . .
Node ID: 3bd146c0-a554-4e4d-94a9-24cba2a3dbc2
Text: . . . . . . . . . . . . . . . . . . . . 7 3 Fine-tuning 8 3.1
Supervised Fine-Tuning (SFT) . . . . . . . . . . . . . . . . . . . . .
. . . . . . .
Node ID: b84e1605-0ac9-4c9f-9ff5-
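Notice in the output above that adjacent chunks share roughly half their content. The sliding-window mechanics behind chunk_size and chunk_overlap can be sketched in plain Python; this illustrates the idea only and is not TokenTextSplitter's actual implementation:

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    # each window advances by (chunk_size - chunk_overlap) tokens,
    # so consecutive chunks share chunk_overlap tokens
    assert 0 <= chunk_overlap < chunk_size
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# with chunk_size=4 and chunk_overlap=2, each chunk repeats
# the last 2 tokens of the previous one
chunks = split_with_overlap(list(range(10)), chunk_size=4, chunk_overlap=2)
```

The overlap exists so that a sentence cut at a chunk boundary still appears intact in at least one chunk, at the cost of some duplicated storage and embedding work.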
