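[Must-Learn for Programmers] A Hands-On Gemini File Search Tutorial: Build a RAG System from Scratch, Even as a Beginner

The File Search tool in the Gemini API is a fully managed RAG (retrieval-augmented generation) system built directly into the API. It automatically manages file storage, chunks your data, creates embeddings, and injects the most relevant content into your prompts.

Next, we will walk through the complete File Search workflow in JavaScript/TypeScript. Before you start, make sure you have an API key from Google AI Studio and have installed the latest SDK: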

npm install @google/genai

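Initialize the client in your JavaScript environment:

import { GoogleGenAI } from '@google/genai';
import fs from 'fs';
import path from 'path';

// With an empty options object, the SDK reads the API key from the GEMINI_API_KEY environment variable
const ai = new GoogleGenAI({});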

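Creating a File Search Store

A File Search store is a persistent container for your document chunks and embeddings. It is separate from raw file storage and can hold several gigabytes of data.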

const fileStoreName = 'my-example-store';

const createStoreOp = await ai.fileSearchStores.create({
  config: { displayName: fileStoreName },
});
console.log(`Store created with Name: ${createStoreOp.name}`);

Finding a Store by Display Name

In practice, a store is often created in one application session and used in another. Because the API assigns each store a unique ID (fileSearchStores/xyz...), you need to look it up by its human-readable displayName.

let fileStore = null;

// Limit the number of stores returned per page
const pager = await ai.fileSearchStores.list({ config: { pageSize: 10 } });
let page = pager.page;

// Walk through the pages until a match is found
searchLoop: while (true) {
  for (const store of page) {
    if (store.displayName === fileStoreName) {
      fileStore = store;
      break searchLoop; // Match found, exit both loops
    }
  }
  if (!pager.hasNextPage()) break; // No more pages
  page = await pager.nextPage(); // Move to the next page
}

if (!fileStore) {
  throw new Error(`Store with display name '${fileStoreName}' not found.`);
}
console.log(`Found store: ${fileStore.name}`);
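The same page-walking loop comes up again later for documents, so it may be worth factoring out. The sketch below is one possible way to do that. The helper name findByDisplayName is hypothetical, and it only assumes the pager shape already used in this tutorial (a page array, hasNextPage(), and nextPage() resolving to the next page).

```javascript
// Hypothetical helper: returns the first item whose displayName matches, or null.
// Assumes a pager-like object with `page`, `hasNextPage()` and `nextPage()`,
// which is how the snippet above uses the SDK pager.
async function findByDisplayName(pager, displayName) {
  let page = pager.page;
  while (true) {
    const match = page.find((item) => item.displayName === displayName);
    if (match) return match;
    if (!pager.hasNextPage()) return null; // Ran out of pages without a match
    page = await pager.nextPage();
  }
}

// Possible usage with the client from this tutorial (not executed here):
//   const stores = await ai.fileSearchStores.list({ config: { pageSize: 10 } });
//   const fileStore = await findByDisplayName(stores, 'my-example-store');
```

Returning null instead of throwing leaves it to the caller to decide whether a missing store is an error or a cue to create one.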

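Uploading Multiple Files Concurrently

Speed matters. When ingesting a folder with many documents, do not process them sequentially. The API supports concurrent operations, so we can use Promise.all to upload and process multiple files at the same time.

We will use the helper method uploadToFileSearchStore, which handles the raw file upload and kicks off indexing in a single step. We then poll operation.done to make sure processing has finished before moving on.

const docsDir = "docs"; // Make sure you have a 'docs' folder containing text files
const files = fs.readdirSync(docsDir).map((file) => path.join(docsDir, file));

await Promise.all(files.map(async (filePath) => {
  // 1. Start the upload and indexing
  let operation = await ai.fileSearchStores.uploadToFileSearchStore({
    file: filePath,
    fileSearchStoreName: fileStore.name,
    config: {
      displayName: path.basename(filePath),
    },
  });

  // 2. Poll until the document is fully processed
  while (!operation.done) {
    await new Promise((resolve) => setTimeout(resolve, 1000)); // Wait 1 second
    operation = await ai.operations.get({ operation });
  }
  console.log(`Processing complete for: ${path.basename(filePath)}`);
  return operation;
}));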
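Promise.all starts every upload at once. For a folder with hundreds of files, that could plausibly run into rate limits or memory pressure, although this depends on your workload and quota. The sketch below shows one common way to cap the number of tasks in flight. mapWithConcurrency is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical helper: runs `worker` over `items` with at most `limit` tasks in flight.
// Results are returned in the same order as the input items.
async function mapWithConcurrency(items, limit, worker) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array(items.length);
  let next = 0; // Index of the next item to claim

  // Each lane repeatedly claims the next unprocessed item until none remain
  async function runLane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}

// Possible usage with the upload logic above (not executed here):
//   await mapWithConcurrency(files, 5, async (filePath) => { /* upload and poll */ });
```

The limit of 5 in the usage comment is an arbitrary illustration; a sensible value depends on your quota and file sizes.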

Advanced Upload with a Custom Chunking Strategy

By default, Gemini handles chunking intelligently. For some use cases, however, you may want tighter control over how your data is split.

You can define a chunkingConfig at upload time to set parameters such as maxTokensPerChunk and maxOverlapTokens. You can also attach key-value pairs to a document with customMetadata.

const specialDocPath = 'special-docs/technical-manual.txt';

let advancedUploadOp = await ai.fileSearchStores.uploadToFileSearchStore({
  file: specialDocPath,
  fileSearchStoreName: fileStore.name,
  config: {
    displayName: 'technical-manual.txt',
    customMetadata: [
      { key: "doc_type", stringValue: "manual" }, // Attach a metadata tag
    ],
    chunkingConfig: {
      whiteSpaceConfig: {
        maxTokensPerChunk: 500, // Smaller chunks for more precise retrieval
        maxOverlapTokens: 50, // Keeps context from being lost between chunks
      },
    },
  },
});

// Wait for the file to finish processing
while (!advancedUploadOp.done) {
  await new Promise((resolve) => setTimeout(resolve, 1000));
  advancedUploadOp = await ai.operations.get({ operation: advancedUploadOp });
}
console.log("Advanced file processed.");
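To build intuition for what maxTokensPerChunk and maxOverlapTokens control, here is a toy sketch of overlapping chunking. It is an illustration only and not how Gemini actually splits documents: it treats each whitespace-separated word as one token, and chunkWithOverlap is a hypothetical function name.

```javascript
// Toy illustration: words stand in for tokens; the real tokenizer and
// splitting logic used by File Search are not shown in this tutorial.
function chunkWithOverlap(text, maxTokensPerChunk, maxOverlapTokens) {
  if (maxOverlapTokens >= maxTokensPerChunk) {
    throw new RangeError('maxOverlapTokens must be smaller than maxTokensPerChunk');
  }
  const tokens = text.split(/\s+/).filter(Boolean);
  const step = maxTokensPerChunk - maxOverlapTokens; // How far each new chunk advances
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + maxTokensPerChunk).join(' '));
    if (start + maxTokensPerChunk >= tokens.length) break; // Last chunk reached the end
  }
  return chunks;
}

console.log(chunkWithOverlap('a b c d e f g', 4, 1));
// → [ 'a b c d', 'd e f g' ]
```

The word "d" appears in both chunks. That shared tail is the overlap, which gives a retrieved chunk some of the context that preceded it.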

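Running a Generation Query with File Search (RAG)

We do not need to retrieve document chunks by hand. We simply tell the Gemini model to use the fileSearch tool and point it at our store name. Gemini recognizes that it needs more information, searches the store automatically, and grounds its answer in the retrieved results.

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "What is Gemini and what is the File API?",
  config: {
    tools: [{
      fileSearch: {
        fileSearchStoreNames: [fileStore.name], // Enable the File Search tool
      },
    }],
  },
});
console.log("Model response:", response.text);
// (Optional) Inspect response.candidates[0].groundingMetadata to see the cited sources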
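The snippet above points to groundingMetadata for citations. Below is a minimal sketch of listing the source titles from it. It assumes each entry in groundingMetadata.groundingChunks carries a retrievedContext.title, which you should confirm against a real response. extractSources is a hypothetical helper name.

```javascript
// Hypothetical helper: collects unique source titles from a response.
// Assumption: citations live under candidates[0].groundingMetadata.groundingChunks
// and each chunk may expose retrievedContext.title.
function extractSources(response) {
  const chunks = response?.candidates?.[0]?.groundingMetadata?.groundingChunks ?? [];
  const titles = chunks
    .map((chunk) => chunk?.retrievedContext?.title)
    .filter((title) => typeof title === 'string' && title.length > 0);
  return [...new Set(titles)]; // De-duplicate while keeping first-seen order
}

// Possible usage with the response above (not executed here):
//   console.log('Sources:', extractSources(response));
```

The optional chaining means the helper returns an empty array rather than throwing when a response has no grounding data.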

Because we tagged the technical manual with metadata in the advanced upload step, we can now use metadataFilter to force Gemini to look only at documents matching that tag.

const responseFiltered = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "How do I reset the device according to the manual?",
  config: {
    tools: [{
      fileSearch: {
        fileSearchStoreNames: [fileStore.name],
        metadataFilter: 'doc_type="manual"', // Only retrieve documents whose doc_type is manual
      },
    }],
  },
});
console.log("Filtered response:", responseFiltered.text);
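Writing filter strings by hand gets error-prone once there are several conditions. The sketch below builds one from a plain object. It reproduces the key="value" form used above. Joining conditions with AND and writing numbers unquoted are assumptions based on common Google API filter syntax, not something this tutorial confirms. buildMetadataFilter is a hypothetical helper.

```javascript
// Hypothetical helper: turns { doc_type: 'manual' } into 'doc_type="manual"'.
// Assumptions: multiple conditions may be joined with AND, and numeric values
// are written without quotes. Verify both against the File Search documentation.
function buildMetadataFilter(conditions) {
  return Object.entries(conditions)
    .map(([key, value]) => {
      if (typeof value === 'number' && Number.isFinite(value)) {
        return `${key}=${value}`;
      }
      // Values containing double quotes are rejected rather than guessing an escape rule
      if (typeof value === 'string' && !value.includes('"')) {
        return `${key}="${value}"`;
      }
      throw new TypeError(`Unsupported filter value for key '${key}'`);
    })
    .join(' AND ');
}

console.log(buildMetadataFilter({ doc_type: 'manual' }));
// → doc_type="manual"
```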

Finding a Specific Document in a Store

You will often need to manage individual documents in a store. You can look up a specific document by its display name.

const docToFind = 'doc1.txt';
let targetDoc = null;
const documentPager = await ai.fileSearchStores.documents.list({
  parent: fileStore.name,
});
let documentPage = documentPager.page;

// Walk through the store's document list
searchDocsLoop: while (true) {
  for (const doc of documentPage) {
    if (doc.displayName === docToFind) {
      targetDoc = doc;
      break searchDocsLoop;
    }
  }
  if (!documentPager.hasNextPage()) break;
  // nextPage() resolves to the next page of items, so keep the pager itself intact
  documentPage = await documentPager.nextPage();
}

if (!targetDoc) throw new Error(`Document '${docToFind}' not found.`);

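Deleting a Document

Currently, the standard way to update a document in File Search is to delete the old version and then upload the new one.

await ai.fileSearchStores.documents.delete({
  name: targetDoc.name,
  config: { force: true }, // 'force: true' is required to permanently delete an indexed document from the store
});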

Updating a Document

Once indexed, File Search documents are immutable. To "update" a document, you have to find it, delete it, and upload the new version. In this step we automate that full cycle and refresh doc1.txt with new information.

const docToUpdate = 'doc1.txt'; // Assume it has new content
const localDocPath = path.join(docsDir, docToUpdate);

// 1. Find the existing document in the store by its display name
const updatePager = await ai.fileSearchStores.documents.list({ parent: fileStore.name });
let updatePage = updatePager.page;
let foundDoc = null;
findLoop: while (true) {
  for (const doc of updatePage) {
    if (doc.displayName === docToUpdate) {
      foundDoc = doc;
      break findLoop;
    }
  }
  if (!updatePager.hasNextPage()) break;
  // nextPage() resolves to the next page of items, so keep the pager itself intact
  updatePage = await updatePager.nextPage();
}

// 2. If found, delete it
if (foundDoc) {
  await ai.fileSearchStores.documents.delete({
    name: foundDoc.name,
    config: { force: true }, // 'force' is required to delete an indexed document
  });
}

// 3. Upload the new version
let updateOp = await ai.fileSearchStores.uploadToFileSearchStore({
  file: localDocPath,
  fileSearchStoreName: fileStore.name,
  config: { displayName: docToUpdate },
});

// Wait for the upload and indexing to finish
while (!updateOp.done) {
  await new Promise((resolve) => setTimeout(resolve, 1000));
  updateOp = await ai.operations.get({ operation: updateOp });
}
console.log("Revision uploaded and indexed successfully.");
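Every upload in this tutorial repeats the same polling loop, and none of those loops stops if an operation never completes. The sketch below wraps the loop in a helper with a timeout. waitForOperation and its option names are hypothetical, and the check on operation.error assumes that failed operations expose an error field.

```javascript
// Hypothetical helper: polls until `operation.done` is true or the timeout elapses.
// `refresh` is a caller-supplied function that returns the latest operation state.
async function waitForOperation(operation, refresh, { intervalMs = 1000, timeoutMs = 300000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let current = operation;
  while (!current.done) {
    if (Date.now() >= deadline) {
      throw new Error(`Operation did not finish within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    current = await refresh(current);
  }
  // Assumption: a finished operation that failed carries an `error` field
  if (current.error) {
    throw new Error(`Operation failed: ${JSON.stringify(current.error)}`);
  }
  return current;
}

// Possible usage with the client from this tutorial (not executed here):
//   updateOp = await waitForOperation(updateOp, (op) => ai.operations.get({ operation: op }));
```

Passing refresh in as a function keeps the helper independent of the SDK client, which also makes it easy to exercise with a fake operation.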

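Cleanup: Deleting the File Search Store

At the time of writing, each project can have at most 10 File Search stores, so it is important to clean up resources once development is done.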

await ai.fileSearchStores.delete({
  name: fileStore.name,
  config: { force: true },
});
