GRAPH RAG
https://github.com/fanqingsong/graphrag/blob/main/docs/GRAPH_RAG_EXPLAINED.md
Suppose you upload three documents:

Document 1: Microsoft investment news
Microsoft announces a $10 billion investment in OpenAI, one of the largest investments in the AI field.
OpenAI's CEO Sam Altman said the partnership will accelerate the development of AI technology.

Document 2: OpenAI technical report
OpenAI has developed the GPT-4 model, achieving breakthroughs in multiple areas.
The company is headquartered in San Francisco and led by Sam Altman.

Document 3: AI industry analysis
Investment in the AI field is growing rapidly. Tech giants such as Microsoft and Google are all increasing their spending.
The San Francisco Bay Area is the center of AI innovation.
When the system processes these documents, it will:

- Extract entities (Entity):
  - Microsoft (COMPANY, importance: 0.9)
  - OpenAI (COMPANY, importance: 0.9)
  - Sam Altman (PERSON, importance: 0.8)
  - San Francisco (LOCATION, importance: 0.6)
  - GPT-4 (TECHNOLOGY, importance: 0.7)
- Build relationships (Relationship):
  - Microsoft --[INVESTS_IN]--> OpenAI (strength: 0.9)
  - Sam Altman --[LEADS]--> OpenAI (strength: 0.95)
  - OpenAI --[LOCATED_IN]--> San Francisco (strength: 0.8)
  - OpenAI --[DEVELOPED]--> GPT-4 (strength: 0.85)
- Create the graph structure (in Neo4j):

  (Microsoft:COMPANY) --[INVESTS_IN]--> (OpenAI:COMPANY)
  (OpenAI:COMPANY) <--[LEADS]-- (Sam Altman:PERSON)
  (OpenAI:COMPANY) --[LOCATED_IN]--> (San Francisco:LOCATION)
  (OpenAI:COMPANY) --[DEVELOPED]--> (GPT-4:TECHNOLOGY)
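As a concrete illustration, the nodes and relationships above could be written to Neo4j with the official Python driver roughly as follows; the MERGE statements, property names, and the merge_entity_relation helper are assumptions for this sketch, not code from the repository:

```python
from neo4j import GraphDatabase  # official Neo4j Python driver

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Hypothetical helper: upsert two entities and the relation between them.
def merge_entity_relation(tx, src, src_label, rel, dst, dst_label, strength):
    tx.run(
        f"MERGE (a:{src_label} {{name: $src}}) "
        f"MERGE (b:{dst_label} {{name: $dst}}) "
        f"MERGE (a)-[r:{rel}]->(b) SET r.strength = $strength",
        src=src, dst=dst, strength=strength,
    )

with driver.session() as session:
    session.execute_write(merge_entity_relation,
                          "Microsoft", "COMPANY", "INVESTS_IN", "OpenAI", "COMPANY", 0.9)
    session.execute_write(merge_entity_relation,
                          "Sam Altman", "PERSON", "LEADS", "OpenAI", "COMPANY", 0.95)
    session.execute_write(merge_entity_relation,
                          "OpenAI", "COMPANY", "LOCATED_IN", "San Francisco", "LOCATION", 0.8)
    session.execute_write(merge_entity_relation,
                          "OpenAI", "COMPANY", "DEVELOPED", "GPT-4", "TECHNOLOGY", 0.85)
driver.close()
```

Using MERGE rather than CREATE keeps the upsert idempotent, so re-ingesting a document does not duplicate entities.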
The user asks: "What important investments has the company led by Sam Altman made in the AI field?"
A plain vector-similarity RAG system might return only:
- The snippet from Document 1 that directly mentions "Sam Altman" and "investment"
- Score: 0.75

The problem: indirectly related information can be missed.
The GraphRAG system instead will:

- Initial retrieval: find the chunks that contain "Sam Altman"

  Chunk A: "OpenAI's CEO Sam Altman said this partnership..."
- Graph traversal expansion:

  # Start from the Sam Altman entity
  Sam Altman --[LEADS]--> OpenAI (found!)
  # Continue expanding from OpenAI
  OpenAI <--[INVESTS_IN]-- Microsoft (found!)
  # Collect the chunks reached through these graph relationships
  Chunk B: "Microsoft announces a $10 billion investment in OpenAI..."
  Chunk C: "OpenAI developed the GPT-4 model..."
- Multi-hop reasoning paths:

  Query: investments of the company led by Sam Altman
  ↓
  Path 1: Sam Altman → leads → OpenAI → is invested in by → Microsoft
  Path 2: Sam Altman → leads → OpenAI → developed → GPT-4
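A two-hop expansion like these paths maps naturally onto a single Cypher query. This is a hedged sketch against the example schema above; the MENTIONED_IN relationship and Chunk label are assumptions about how chunks are linked to entities, not the project's confirmed schema:

```python
from neo4j import GraphDatabase

# Expand up to 2 hops out from "Sam Altman", then collect the chunks
# that mention any entity reached along the way.
CYPHER = """
MATCH path = (p:PERSON {name: $name})-[*1..2]-(e)
UNWIND nodes(path) AS entity
MATCH (entity)-[:MENTIONED_IN]->(c:Chunk)
RETURN DISTINCT c.text AS chunk, [r IN relationships(path) | type(r)] AS hops
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for record in session.run(CYPHER, name="Sam Altman"):
        print(record["hops"], "->", record["chunk"][:60])
```

The returned `hops` list is exactly the explainable reasoning path discussed below (e.g. `["LEADS", "INVESTS_IN"]`).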
The enhanced context that is finally returned:

- Chunk A (direct match): "OpenAI's CEO Sam Altman..."
- Chunk B (graph expansion): "Microsoft announces a $10 billion investment in OpenAI..."
- Chunk C (graph expansion): "OpenAI developed the GPT-4 model..."
With this graph-connected context, the LLM can generate a more complete answer:

"OpenAI, the company led by Sam Altman, received a $10 billion investment from Microsoft, one of the largest investments in the AI field. OpenAI has also developed important AI technologies such as GPT-4."
- Cross-document connection: even when "Microsoft's investment" and "Sam Altman" appear in different documents, the graph can link them
- Semantic understanding: the system understands the "leads" relationship rather than just matching keywords
- Explainability: the reasoning path is visible: Sam Altman → OpenAI → Microsoft
- Context expansion: indirectly related but important information gets found
Indexing flow:

1. Document upload
2. Split the document into chunks
3. Generate a vector embedding for each chunk
4. Create Document and Chunk nodes
5. Extract entities (Entity)
6. Build entity relationships (Relationship)
7. Create Entity nodes and relationships
8. Compute pairwise similarity between chunks
9. Create SIMILAR_TO relationships
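Steps 2 through 9 of this flow can be outlined in code. The sketch below is illustrative only: the embed/extract_entities/extract_relations callables are assumed to be provided by the caller, and the fixed-size chunker is a deliberate simplification:

```python
import itertools
import numpy as np

def split_into_chunks(text, size=200):
    # naive fixed-size chunking; real systems split on sentence/semantic boundaries
    return [text[i:i + size] for i in range(0, len(text), size)]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def index_document(doc_text, embed, extract_entities, extract_relations,
                   sim_threshold=0.8):
    """Steps 2-9 of the indexing flow, returning what a graph writer would store."""
    chunks = split_into_chunks(doc_text)                      # step 2
    embeddings = [embed(c) for c in chunks]                   # step 3
    entities, relations = [], []
    for chunk in chunks:
        ents = extract_entities(chunk)                        # step 5
        entities.extend(ents)
        relations.extend(extract_relations(chunk, ents))      # step 6
    similar_pairs = [                                         # steps 8-9
        (i, j)
        for (i, a), (j, b) in itertools.combinations(enumerate(embeddings), 2)
        if cosine(a, b) >= sim_threshold
    ]
    return chunks, embeddings, entities, relations, similar_pairs
```

Steps 4 and 7 (node creation) would reuse MERGE writes like the Neo4j sketch earlier; SIMILAR_TO edges come from the `similar_pairs` list.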
Query flow:

1. User query
2. Query analysis (query type, complexity, etc.)
3. Generate the query's vector embedding
4. Vector similarity search (find the initial chunks)
5. Graph expansion (find related chunks through graph relationships)
6. Multi-hop reasoning (optional: multi-hop traversal over entity relationships)
7. Merge and rank the results
8. Pass the context to the LLM to generate the answer
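Steps 4-7 of the query flow amount to a hybrid retrieval function. A minimal sketch, assuming caller-supplied vector_search and expand_via_graph helpers and an illustrative score discount for expanded chunks:

```python
def hybrid_retrieve(query_embedding, vector_search, expand_via_graph,
                    top_k=5, expansion_weight=0.7):
    """Steps 4-7: vector search, graph expansion, then merge and rank."""
    # Step 4: initial candidates from vector similarity, as (chunk_id, score) pairs.
    scored = dict(vector_search(query_embedding, top_k=top_k))

    # Step 5: follow graph relationships out of the initial hits; expanded chunks
    # inherit a discounted score so direct matches still rank first.
    for chunk_id, score in list(scored.items()):
        for neighbor in expand_via_graph(chunk_id):
            candidate = score * expansion_weight
            scored[neighbor] = max(scored.get(neighbor, 0.0), candidate)

    # Step 7: merge and rank (step 6, multi-hop reasoning, would recurse here
    # with a further-discounted weight per hop).
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```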
| Mode | Vector search | Graph expansion | Multi-hop reasoning | Typical use |
|---|---|---|---|---|
| Chunk-only | ✅ | ❌ | ❌ | Simple queries, fast responses |
| Entity-only | ✅ | ✅ | ❌ | Entity-centric queries |
| Hybrid | ✅ | ✅ | ❌ | Balances speed and accuracy |
| Graph-enhanced | ✅ | ✅ | ✅ | Complex queries that need deep reasoning |
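Since the four modes differ only in which stages are enabled, they can be modeled as a small configuration switch. A hedged sketch (the mode names come from the table; the dataclass and flags are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetrievalMode:
    vector_search: bool
    graph_expansion: bool
    multi_hop: bool

# One entry per row of the table above. Entity-only and Hybrid share the same
# flags here; presumably they differ in how graph expansion is seeded and weighted.
MODES = {
    "chunk-only":     RetrievalMode(True, False, False),
    "entity-only":    RetrievalMode(True, True,  False),
    "hybrid":         RetrievalMode(True, True,  False),
    "graph-enhanced": RetrievalMode(True, True,  True),
}
```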
The core strengths of GraphRAG:

- Dual retrieval: combines vector similarity search with graph relationship traversal
- Semantic understanding: semantics are captured through vector embeddings, not simple keyword matching
- Relational reasoning: multi-hop reasoning over the graph structure uncovers indirect relationships
- Context expansion: the retrieved context is widened automatically, yielding more complete information
- Explainability: the graph structure exposes a clear reasoning path

This is why the project is called GraphRAG: it uses a graph to enhance Retrieval-Augmented Generation (RAG), so that the system understands relationships between entities rather than relying on text similarity alone.
References:
https://github.com/fanqingsong/GustoBot
This project aims to be a portable, extensible intelligent customer-service template for vertical domains. Thanks to its clear three-layer architecture (main routing layer → multi-tool subgraph layer → atomic tool layer), you can easily port it to other vertical domains (such as a Pokémon encyclopedia, a traditional Chinese medicine compendium, legal consulting, or government services) to build a domain-specific assistant. Just swap the knowledge sources and graph schema to get:
- Intelligent intent understanding: automatically recognizes the question type and routes it to the best processing module
- Multi-tool collaboration: dynamically combines Neo4j graph queries, MySQL statistical analysis, vector retrieval, external search, and other tools
- PostgreSQL-first fallback strategy: structured data first → vector fallback → external search, to guarantee answer quality (see the sketch after this list)
- Multimodal interaction: supports text Q&A, image recognition/generation, file parsing, and other interaction modes
- Traceable knowledge sources: every answer is annotated with its sources, with support for multi-source fusion
- Safety guardrails: a Guardrails layer keeps questions within the service scope and rejects out-of-scope queries
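The "structured first → vector fallback → external search" strategy above can be sketched as a simple cascade. The function names, source tags, and the truthiness check below are illustrative assumptions rather than GustoBot's actual code:

```python
from typing import Callable, Optional

def answer_with_fallback(question: str,
                         query_structured: Callable[[str], Optional[str]],
                         query_vectors: Callable[[str], Optional[str]],
                         search_web: Callable[[str], Optional[str]]) -> str:
    """Structured data first -> vector fallback -> external search."""
    for source, retrieve in (("structured", query_structured),
                             ("vector", query_vectors),
                             ("web", search_web)):
        answer = retrieve(question)
        if answer:  # a real system would also check a confidence score here
            return f"[source: {source}] {answer}"
    return "Sorry, this question is outside the scope of this service."
```

Tagging each answer with its source is what makes the "traceable knowledge sources" bullet cheap to implement, and a guardrail check would run before this cascade.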
https://github.com/HKUDS/LightRAG?tab=readme-ov-file
https://arxiv.org/abs/2410.05779
Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user needs. However, existing RAG systems have significant limitations, including reliance on flat data representations and inadequate contextual awareness, which can lead to fragmented answers that fail to capture complex inter-dependencies. To address these challenges, we propose LightRAG, which incorporates graph structures into text indexing and retrieval processes. This innovative framework employs a dual-level retrieval system that enhances comprehensive information retrieval from both low-level and high-level knowledge discovery. Additionally, the integration of graph structures with vector representations facilitates efficient retrieval of related entities and their relationships, significantly improving response times while maintaining contextual relevance. This capability is further enhanced by an incremental update algorithm that ensures the timely integration of new data, allowing the system to remain effective and responsive in rapidly changing data environments. Extensive experimental validation demonstrates considerable improvements in retrieval accuracy and efficiency compared to existing approaches. We have made LightRAG open source at https://github.com/HKUDS/LightRAG
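For orientation, a minimal usage sketch in the spirit of the LightRAG README (the exact constructor arguments, model helper import path, and the async initialization required by recent releases may differ; treat the details as assumptions to verify against the repo):

```python
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete  # bundled OpenAI helper in older releases

rag = LightRAG(
    working_dir="./rag_storage",          # where the index and graph are persisted
    llm_model_func=gpt_4o_mini_complete,  # LLM used for extraction and answering
)

rag.insert("Microsoft announces a $10 billion investment in OpenAI ...")

# Dual-level retrieval: "local" = entity-level, "global" = theme-level,
# "hybrid" combines both; "naive" is plain vector RAG.
print(rag.query("Who invested in OpenAI?", param=QueryParam(mode="hybrid")))
```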
https://github.com/HKUDS/RAG-Anything
System Overview
Next-Generation Multimodal Intelligence
Modern documents increasingly contain diverse multimodal content—text, images, tables, equations, charts, and multimedia—that traditional text-focused RAG systems cannot effectively process. RAG-Anything addresses this challenge as a comprehensive All-in-One Multimodal Document Processing RAG system built on LightRAG.
As a unified solution, RAG-Anything eliminates the need for multiple specialized tools. It provides seamless processing and querying across all content modalities within a single integrated framework. Unlike conventional RAG approaches that struggle with non-textual elements, our all-in-one system delivers comprehensive multimodal retrieval capabilities.
Users can query documents containing interleaved text, visual diagrams, structured tables, and mathematical formulations through one cohesive interface. This consolidated approach makes RAG-Anything particularly valuable for academic research, technical documentation, financial reports, and enterprise knowledge management where rich, mixed-content documents demand a unified processing framework.

- 🔄 End-to-End Multimodal Pipeline - Complete workflow from document ingestion and parsing to intelligent multimodal query answering
- 📄 Universal Document Support - Seamless processing of PDFs, Office documents, images, and diverse file formats
- 🧠 Specialized Content Analysis - Dedicated processors for images, tables, mathematical equations, and heterogeneous content types
- 🔗 Multimodal Knowledge Graph - Automatic entity extraction and cross-modal relationship discovery for enhanced understanding
- ⚡ Adaptive Processing Modes - Flexible MinerU-based parsing or direct multimodal content injection workflows
- 📋 Direct Content List Insertion - Bypass document parsing by directly inserting pre-parsed content lists from external sources
- 🎯 Hybrid Intelligent Retrieval - Advanced search capabilities spanning textual and multimodal content with contextual understanding
RAG-Anything implements an effective multi-stage multimodal pipeline that fundamentally extends traditional RAG architectures to seamlessly handle diverse content modalities through intelligent orchestration and cross-modal understanding.
The system provides high-fidelity document extraction through adaptive content decomposition. It intelligently segments heterogeneous elements while preserving contextual relationships. Universal format compatibility is achieved via specialized optimized parsers.
Key Components:
- ⚙️ MinerU Integration: Leverages MinerU for high-fidelity document structure extraction and semantic preservation across complex layouts.
- 🧩 Adaptive Content Decomposition: Automatically segments documents into coherent text blocks, visual elements, structured tables, mathematical equations, and specialized content types while preserving contextual relationships.
- 📁 Universal Format Support: Provides comprehensive handling of PDFs, Office documents (DOC/DOCX/PPT/PPTX/XLS/XLSX), images, and emerging formats through specialized parsers with format-specific optimization.
The system automatically categorizes and routes content through optimized channels. It uses concurrent pipelines for parallel text and multimodal processing. Document hierarchy and relationships are preserved during transformation.
Key Components:
- 🎯 Autonomous Content Categorization and Routing: Automatically identifies, categorizes, and routes different content types through optimized execution channels.
- ⚡ Concurrent Multi-Pipeline Architecture: Implements concurrent execution of textual and multimodal content through dedicated processing pipelines. This approach maximizes throughput efficiency while preserving content integrity.
- 🏗️ Document Hierarchy Extraction: Extracts and preserves the original document hierarchy and inter-element relationships during content transformation.
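The categorize-and-route behavior described above can be illustrated with a small asyncio sketch; the content-type names, the trivial classifier, and the placeholder pipelines are assumptions, not RAG-Anything internals:

```python
import asyncio

async def process_text(block):   # placeholder textual pipeline
    return ("text", block)

async def process_image(block):  # placeholder multimodal pipeline
    return ("image", block)

ROUTES = {"text": process_text, "image": process_image}

def categorize(block: str) -> str:
    # trivial stand-in for the real content classifier
    return "image" if block.startswith("<img") else "text"

async def run_pipelines(blocks):
    # route each block to its channel, then execute all channels concurrently
    tasks = [ROUTES[categorize(b)](b) for b in blocks]
    return await asyncio.gather(*tasks)

print(asyncio.run(run_pipelines(["Intro paragraph", "<img src='fig1.png'>"])))
```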
The system deploys modality-aware processing units for heterogeneous data modalities:
Specialized Analyzers:
- 🔍 Visual Content Analyzer:
  - Integrates vision models for image analysis.
  - Generates context-aware descriptive captions based on visual semantics.
  - Extracts spatial relationships and hierarchical structures between visual elements.
- 📊 Structured Data Interpreter:
  - Performs systematic interpretation of tabular and structured data formats.
  - Implements statistical pattern recognition algorithms for data trend analysis.
  - Identifies semantic relationships and dependencies across multiple tabular datasets.
- 📐 Mathematical Expression Parser:
  - Parses complex mathematical expressions and formulas with high accuracy.
  - Provides native LaTeX support for seamless integration with academic workflows.
  - Establishes conceptual mappings between mathematical equations and domain-specific knowledge bases.
- 🔧 Extensible Modality Handler:
  - Provides a configurable processing framework for custom and emerging content types.
  - Enables dynamic integration of new modality processors through a plugin architecture.
  - Supports runtime configuration of processing pipelines for specialized use cases.
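A plugin architecture like the Extensible Modality Handler describes is often implemented as a handler registry. A minimal sketch, with the decorator and handler names invented for illustration:

```python
from typing import Callable, Dict

HANDLERS: Dict[str, Callable[[bytes], dict]] = {}

def register_modality(name: str):
    """Decorator that plugs a new modality processor into the pipeline."""
    def wrap(fn: Callable[[bytes], dict]):
        HANDLERS[name] = fn
        return fn
    return wrap

@register_modality("image")
def handle_image(payload: bytes) -> dict:
    return {"modality": "image", "caption": "<vision-model caption here>"}

@register_modality("table")
def handle_table(payload: bytes) -> dict:
    return {"modality": "table", "summary": "<interpreted rows/columns here>"}

def process(modality: str, payload: bytes) -> dict:
    handler = HANDLERS.get(modality)
    if handler is None:
        raise ValueError(f"no processor registered for modality '{modality}'")
    return handler(payload)
```

New content types then require only one decorated function, which is what makes runtime extension cheap.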
The multi-modal knowledge graph construction module transforms document content into structured semantic representations. It extracts multimodal entities, establishes cross-modal relationships, and preserves hierarchical organization. The system applies weighted relevance scoring for optimized knowledge retrieval.
Core Functions:
- 🔍 Multi-Modal Entity Extraction: Transforms significant multimodal elements into structured knowledge graph entities. The process includes semantic annotations and metadata preservation.
- 🔗 Cross-Modal Relationship Mapping: Establishes semantic connections and dependencies between textual entities and multimodal components. This is achieved through automated relationship inference algorithms.
- 🏗️ Hierarchical Structure Preservation: Maintains original document organization through "belongs_to" relationship chains. These chains preserve logical content hierarchy and sectional dependencies.
- ⚖️ Weighted Relationship Scoring: Assigns quantitative relevance scores to relationship types. Scoring is based on semantic proximity and contextual significance within the document structure.
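To make the data model concrete, here is a hedged sketch of what such entities and weighted edges could look like; the field names and the belongs_to encoding are assumptions derived from the description above, not RAG-Anything's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class KGEntity:
    entity_id: str
    modality: str            # "text", "image", "table", "equation", ...
    description: str         # semantic annotation produced by the analyzer
    metadata: dict = field(default_factory=dict)

@dataclass
class KGEdge:
    source: str
    target: str
    relation: str            # e.g. "belongs_to", "illustrates", "derived_from"
    weight: float            # relevance score in [0, 1]

# "belongs_to" chains preserve the document hierarchy;
# cross-modal edges carry weighted relevance scores.
edges = [
    KGEdge("figure_3", "section_2.1", "belongs_to", 1.0),
    KGEdge("section_2.1", "doc_report", "belongs_to", 1.0),
    KGEdge("figure_3", "table_1", "illustrates", 0.72),  # cross-modal link
]
```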
The hybrid retrieval system combines vector similarity search with graph traversal algorithms for comprehensive content retrieval. It implements modality-aware ranking mechanisms and maintains relational coherence between retrieved elements to ensure contextually integrated information delivery.
Retrieval Mechanisms:
- 🔀 Vector-Graph Fusion: Integrates vector similarity search with graph traversal algorithms. This approach leverages both semantic embeddings and structural relationships for comprehensive content retrieval.
- 📊 Modality-Aware Ranking: Implements adaptive scoring mechanisms that weight retrieval results based on content type relevance. The system adjusts rankings according to query-specific modality preferences.
- 🔗 Relational Coherence Maintenance: Maintains semantic and structural relationships between retrieved elements. This ensures coherent information delivery and contextual integrity.
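A fused ranking of this kind could combine embedding similarity, a graph-proximity term, and a modality preference. A minimal sketch, with all weights and the per-modality boost table chosen arbitrarily for illustration:

```python
def fused_score(vector_sim: float, graph_distance: int, modality: str,
                modality_prefs: dict, alpha: float = 0.6, beta: float = 0.3) -> float:
    """Combine embedding similarity, graph proximity, and modality preference."""
    graph_term = 1.0 / (1.0 + graph_distance)       # closer in the graph => higher
    modality_boost = modality_prefs.get(modality, 1.0)
    return (alpha * vector_sim + beta * graph_term) * modality_boost

# Example: a query about a chart favors image/table results over plain text.
prefs = {"image": 1.2, "table": 1.1, "text": 1.0}
candidates = [
    ("fig_3 caption", fused_score(0.82, 1, "image", prefs)),
    ("body paragraph", fused_score(0.88, 3, "text", prefs)),
]
print(sorted(candidates, key=lambda kv: kv[1], reverse=True))
```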