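Implement an AI smart-formatting feature: given messy text as input, automatically adjust fonts, line spacing, and paragraph breaks to produce a clean, good-looking document.

Below is a complete Python program, the "AI Smart Formatting Assistant" (SmartFormatter).

Project overview: SmartFormatter - AI Smart Formatting Assistant

Core functionality: the user supplies a Markdown file containing messy text (for example, content copied from a web page). The program analyzes the structure of the content, maps it to heading levels (titles, subtitles), paragraph spacing, and list formatting, and generates a cleanly laid out, consistently styled HTML document. The result can be previewed directly in a browser or converted to PDF.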

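1. Real-world use cases and pain points

* Target users: independent content creators, students, researchers, product managers, and founders (writing business plans or white papers).

* Scenario: you have gathered a lot of material on "AI ethics" from the web and pasted it into a text file. Now you need to turn these scattered notes into a clearly structured, well-formatted article for publication or a presentation.

* Traditional pain points:

1. Manual formatting is slow: heading levels, paragraph indentation, font sizes, and colors all have to be set by hand, which is tedious and error-prone.

2. Inconsistent style: different people, or the same person at different times, format differently, so documents end up looking messy.

3. Poor readability: unformatted text tires the reader and makes the key points hard to find.

4. Painful format conversion: formatting often breaks when going from Word to PDF or when pasting from a web page into PowerPoint.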

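2. Core logic

The workflow resembles that of an experienced editor and breaks down into the following steps:

1. Input and preprocessing: read the Markdown file supplied by the user. Markdown is a lightweight markup language that uses symbols (such as "#", "##", "-") to define structure, which gives us a good basis for analysis.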

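2. Structure analysis and inference: the program analyzes the text line by line and applies a set of heuristic rules to infer its structure:

* Heading inference: a line that starts with one to six "#" characters followed by a space is treated as a heading. For lines without "#", the line length, keywords (such as "Introduction", "Conclusion", "Part One"), and the line's position in the document can additionally be used to flag likely headings; the code in this article implements only the "#" rule.

* List inference: a line that starts with "-", "*", "+", or a number followed by a period is treated as a list item.

* Paragraph splitting: consecutive lines of plain text are merged into a single paragraph.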

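3. Style mapping and generation: based on the inferred structure, each part is mapped to a predefined HTML/CSS style. For example, a level-one heading "<h1>" gets a large bold font, and a list item "<li>" gets a bulleted style.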

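4. Content enhancement and polishing:

* Automatic list tagging: detected lists are wrapped in the correct HTML ordered or unordered list tags.

* Blockquote recognition: lines that start with ">" are rendered as blockquotes.

* (Optional) AI polishing: a more advanced version could call a large language model (LLM) API to rewrite paragraphs for fluency and a more professional tone.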

5. Rendering and output: the Jinja2 template engine fills the structured data into a styled HTML template, and the result is saved as an ".html" file.

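The heading heuristic for lines without a leading "#" (length, keywords, position) can be sketched as follows. This is only an illustration under stated assumptions: `looks_like_heading`, the keyword list, the section pattern, and the 30-character limit are all hypothetical and are not part of the article's code.

```python
import re

# Hypothetical keyword list and thresholds; tune them for your own documents.
HEADING_KEYWORDS = ("Introduction", "Conclusion", "Summary", "引言", "结论", "总结")
SECTION_PATTERN = re.compile(
    r"^(第[一二三四五六七八九十\d]+(部分|章|节)|Part\s+\w+|Chapter\s+\d+)",
    re.IGNORECASE,
)
MAX_HEADING_LENGTH = 30
SENTENCE_ENDINGS = "。.!?！？;；,，:："


def looks_like_heading(line, previous_line_blank=True):
    """Guess whether a line without a leading '#' is probably a heading."""
    text = line.strip()
    if not text or len(text) > MAX_HEADING_LENGTH:
        return False  # headings are short
    if text[-1] in SENTENCE_ENDINGS:
        return False  # headings rarely end with sentence punctuation
    if not previous_line_blank:
        return False  # position cue: headings usually start a new block
    if SECTION_PATTERN.match(text):
        return True   # "Part One", "Chapter 3", "第一部分", ...
    return any(text.startswith(keyword) for keyword in HEADING_KEYWORDS)
```

A line flagged this way could be emitted as a level-2 heading block; whether that default level is appropriate depends on the document.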
3. Modular implementation

The code is split into four clear modules.

"config.py" (configuration file)

Holds the project's basic settings.

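# config.py

# Paths of the input and output files
INPUT_MD_FILE = "input.md"
OUTPUT_HTML_FILE = "formatted_output.html"

# Preset theme; the renderer loads templates/<theme>_template.html.
# Other themes such as 'modern' or 'academic' need their own template file.
STYLE_THEME = "professional"

"content_analyzer.py" (content analysis module)

Parses the Markdown and infers its structure.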

# content_analyzer.py
import re


class ContentAnalyzer:
    def analyze_structure(self, md_lines):
        """
        Analyze Markdown lines and infer structured content blocks.

        Args:
            md_lines (list): The lines of the Markdown file.

        Returns:
            list: A list of dicts, each describing one content block,
                e.g. [{'type': 'heading', 'level': 1, 'content': 'Main Title'}, ...]
        """
        structured_blocks = []
        # A simple state machine: which list (if any) is open,
        # and whether the last block is a paragraph that can still grow
        open_list = None  # None, 'ul' or 'ol'
        in_paragraph = False

        def close_list():
            """Emit the end marker for the currently open list, if any."""
            nonlocal open_list
            if open_list:
                structured_blocks.append({'type': f'{open_list}_end'})
                open_list = None

        for line in md_lines:
            stripped_line = line.strip()

            # A blank line ends the current paragraph and any open list
            if not stripped_line:
                close_list()
                in_paragraph = False
                continue

            # 1. Heading
            heading_match = re.match(r'^(#{1,6})\s+(.*)', stripped_line)
            if heading_match:
                close_list()
                in_paragraph = False
                level = len(heading_match.group(1))
                content = heading_match.group(2)
                structured_blocks.append({'type': 'heading', 'level': level, 'content': content})
                continue

            # 2. Blockquote
            if stripped_line.startswith('>'):
                close_list()
                in_paragraph = False
                content = stripped_line[1:].strip()
                structured_blocks.append({'type': 'blockquote', 'content': content})
                continue

            # 3. List item
            unordered_match = re.match(r'^[*+-]\s+(.*)', stripped_line)
            ordered_match = re.match(r'^\d+\.\s+(.*)', stripped_line)
            list_match = unordered_match or ordered_match
            if list_match:
                list_type = 'ul' if unordered_match else 'ol'
                # Switching list type closes the old list before opening the new one
                if open_list != list_type:
                    close_list()
                    structured_blocks.append({'type': f'{list_type}_start'})
                    open_list = list_type
                in_paragraph = False
                structured_blocks.append({'type': 'li', 'content': list_match.group(1)})
                continue

            # 4. Plain text: extend the current paragraph or start a new one
            close_list()
            if in_paragraph:
                structured_blocks[-1]['content'] += " " + stripped_line
            else:
                structured_blocks.append({'type': 'p_start', 'content': stripped_line})
                in_paragraph = True

        # Close a list that runs to the end of the file
        close_list()
        return structured_blocks
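Because the template turns every `ul_start`/`ol_start` and `ul_end`/`ol_end` marker into an opening or closing tag, the generated HTML is only well formed when those markers are balanced. The sketch below is one way to check that, for example in a unit test; `find_unbalanced_lists` is a hypothetical helper, not part of the project, and it assumes lists are never nested (the analyzer does not produce nested lists).

```python
def find_unbalanced_lists(blocks):
    """Return human-readable problems found in the list start/end markers."""
    problems = []
    open_list = None  # 'ul', 'ol', or None (lists are assumed not to nest)
    for index, block in enumerate(blocks):
        block_type = block['type']
        if block_type in ('ul_start', 'ol_start'):
            if open_list:
                problems.append(f"block {index}: {block_type} while <{open_list}> is still open")
            open_list = block_type[:2]
        elif block_type in ('ul_end', 'ol_end'):
            if open_list != block_type[:2]:
                problems.append(f"block {index}: {block_type} without a matching start")
            open_list = None
        elif block_type == 'li':
            if not open_list:
                problems.append(f"block {index}: li outside any list")
        elif open_list:
            problems.append(f"block {index}: {block_type} inside open <{open_list}>")
    if open_list:
        problems.append(f"end of input: <{open_list}> never closed")
    return problems
```

An empty return value means every list that was opened is closed before any other block type appears.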

"renderer.py" (renderer module)

Renders the structured data to HTML.

# renderer.py
from jinja2 import Environment, FileSystemLoader, select_autoescape


class Renderer:
    # The default theme matches the only template shown in this article
    def __init__(self, theme='professional'):
        self.theme = theme
        # Templates are assumed to live in the 'templates' folder,
        # resolved relative to the current working directory
        self.env = Environment(
            loader=FileSystemLoader('templates'),
            autoescape=select_autoescape(['html', 'xml'])
        )
        self.template = self.env.get_template(f'{theme}_template.html')

    def render(self, blocks):
        """Render the structured content blocks to HTML."""
        return self.template.render(blocks=blocks)
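`get_template` raises `jinja2.TemplateNotFound` when the theme has no matching file, so a configured theme such as 'modern' fails unless `templates/modern_template.html` exists. One possible way to degrade gracefully is to resolve the file name first. The sketch below uses only the standard library; `resolve_template_name` and its `fallback` parameter are hypothetical and are not part of the project.

```python
from pathlib import Path


def resolve_template_name(theme, templates_dir="templates", fallback="professional"):
    """Return '<theme>_template.html' if that file exists, else the fallback theme's file."""
    for name in (theme, fallback):
        candidate = f"{name}_template.html"
        if (Path(templates_dir) / candidate).is_file():
            return candidate
    raise FileNotFoundError(
        f"No template found for theme '{theme}' or fallback '{fallback}' in '{templates_dir}'"
    )
```

The returned name could then be passed to `self.env.get_template(...)` in place of the hard-coded f-string.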

"main.py" (program entry point)

Ties all the modules together.

# main.py
import os

# The settings live in config.py and must be imported explicitly
from config import INPUT_MD_FILE, OUTPUT_HTML_FILE, STYLE_THEME
from content_analyzer import ContentAnalyzer
from renderer import Renderer


def main():
    print("=" * 50)
    print(" Welcome to SmartFormatter - AI Typography Assistant ")
    print("=" * 50)

    # 1. Read the input file
    if not os.path.exists(INPUT_MD_FILE):
        print(f"Error: Input file '{INPUT_MD_FILE}' not found.")
        return

    with open(INPUT_MD_FILE, 'r', encoding='utf-8') as f:
        md_lines = f.readlines()
    print(f"Read {len(md_lines)} lines from '{INPUT_MD_FILE}'.")

    # 2. Analyze the content structure
    analyzer = ContentAnalyzer()
    structured_blocks = analyzer.analyze_structure(md_lines)
    print(f"Analyzed content into {len(structured_blocks)} logical blocks.")

    # 3. Render to HTML
    renderer = Renderer(theme=STYLE_THEME)
    html_output = renderer.render(structured_blocks)

    # 4. Write the result
    with open(OUTPUT_HTML_FILE, 'w', encoding='utf-8') as f:
        f.write(html_output)

    print(f"\nSuccess! Formatted document generated: '{OUTPUT_HTML_FILE}'")
    print("You can open it in your web browser to view the result.")


if __name__ == "__main__":
    main()
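The paths and theme are fixed in `config.py`, so formatting a different file means editing the configuration. A minimal sketch of command-line overrides with `argparse` is shown below; the flag names and `parse_args` are assumptions for illustration, and the defaults simply repeat the values from `config.py`.

```python
import argparse

# Defaults repeat the values from config.py so the sketch is self-contained
DEFAULT_INPUT = "input.md"
DEFAULT_OUTPUT = "formatted_output.html"
DEFAULT_THEME = "professional"


def parse_args(argv=None):
    """Parse optional overrides; argv=None makes argparse read sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="SmartFormatter: Markdown to styled HTML")
    parser.add_argument("-i", "--input", default=DEFAULT_INPUT, help="Markdown file to format")
    parser.add_argument("-o", "--output", default=DEFAULT_OUTPUT, help="HTML file to write")
    parser.add_argument("-t", "--theme", default=DEFAULT_THEME, help="template theme name")
    return parser.parse_args(argv)
```

With this in place, `main()` could read `args.input`, `args.output`, and `args.theme` instead of the module-level constants, for example `python main.py -i notes.md -o notes.html`.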

"templates/professional_template.html" (Jinja2 template)

This is where the typography is actually defined.

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>Formatted Document</title>
    <style>
        body { font-family: "Helvetica Neue", Arial, sans-serif; line-height: 1.7; color: #333; max-width: 800px; margin: 40px auto; padding: 20px; }
        h1, h2, h3, h4, h5, h6 { color: #1a1a1a; font-weight: 600; margin-top: 1.5em; margin-bottom: 0.5em; }
        h1 { font-size: 2.2em; border-bottom: 2px solid #eee; padding-bottom: 0.3em; }
        h2 { font-size: 1.8em; border-bottom: 1px solid #eee; padding-bottom: 0.3em; }
        h3 { font-size: 1.5em; }
        p { margin-bottom: 1.2em; text-align: justify; }
        ul, ol { margin-bottom: 1.2em; padding-left: 2em; }
        li { margin-bottom: 0.5em; }
        blockquote { border-left: 4px solid #ccc; padding-left: 1em; margin-left: 0; color: #555; font-style: italic; }
        hr { border: none; border-top: 1px dashed #ddd; margin: 2em 0; }
    </style>
</head>
<body>
    {% for block in blocks %}
        {% if block.type == 'heading' %}
            <h{{ block.level }}>{{ block.content }}</h{{ block.level }}>
        {% elif block.type == 'p_start' %}
            <p>{{ block.content }}</p>
        {% elif block.type == 'ul_start' %}
            <ul>
        {% elif block.type == 'ol_start' %}
            <ol>
        {% elif block.type == 'li' %}
            <li>{{ block.content }}</li>
        {% elif block.type == 'ul_end' %}
            </ul>
        {% elif block.type == 'ol_end' %}
            </ol>
        {% elif block.type == 'blockquote' %}
            <blockquote>{{ block.content }}</blockquote>
        {% endif %}
    {% endfor %}
</body>
</html>
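To see how the block stream maps onto tags without installing Jinja2, the template's loop can be approximated in plain Python. This is only a sketch: `blocks_to_html` is a hypothetical helper, it produces a fragment without the page shell or CSS, and `html.escape` stands in for Jinja2's autoescaping (the two escape quotes slightly differently).

```python
from html import escape

# Block types that map to a fixed tag and carry no content
FIXED_TAGS = {
    'ul_start': '<ul>', 'ul_end': '</ul>',
    'ol_start': '<ol>', 'ol_end': '</ol>',
}
# Block types whose content is wrapped in a single element
WRAPPING_TAGS = {'p_start': 'p', 'li': 'li', 'blockquote': 'blockquote'}


def blocks_to_html(blocks):
    """Convert analyzer blocks to an HTML fragment, escaping all text content."""
    lines = []
    for block in blocks:
        block_type = block['type']
        if block_type in FIXED_TAGS:
            lines.append(FIXED_TAGS[block_type])
            continue
        if block_type == 'heading':
            tag = f"h{block['level']}"
        elif block_type in WRAPPING_TAGS:
            tag = WRAPPING_TAGS[block_type]
        else:
            continue  # unknown block types are skipped, as in the template
        lines.append(f"<{tag}>{escape(block.get('content', ''))}</{tag}>")
    return "\n".join(lines)
```

Escaping matters here because pasted web content often contains characters such as `<` and `&` that would otherwise be interpreted as markup.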

Installing dependencies:

Before running the program, install the "jinja2" library.

pip install jinja2

4. README.md and usage

Create a file named "README.md".

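# SmartFormatter - AI Smart Formatting Assistant

## 🚀 Introduction

SmartFormatter is a formatting tool built on a rule engine and templates. It converts messy Markdown text into a clearly structured, consistently styled, readable HTML document, which makes it a handy assistant for content creators and office workers.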

## 🛠️ Installation and setup

1. **Clone the repository**

        git clone https://github.com/your_username/SmartFormatter.git
        cd SmartFormatter

2. **Install dependencies**

        pip install -r requirements.txt

   *Contents of `requirements.txt`:*

        jinja2

3. **Prepare templates**: modify the existing HTML/CSS template or add new ones under the `templates/` directory as needed.

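## 🏃 How to use

1. **Prepare your text**: copy the text you want to format into `input.md`. Simple Markdown syntax such as `#`, `##`, and `-` is enough to give the program structural cues.

2. **Run the program**:

        python main.py

3. **View the result**: the program writes a file named `formatted_output.html` to the current directory. Open it in a browser to see the formatted document.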

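## 📝 Key concept cards

### 1. Rule-Based Systems

**What it is**: a software system that reasons and makes decisions from a set of predefined rules and facts.

**How this project uses it**: this project is a typical rule-based system. Regular expressions and a series of conditional checks ("if ... then ...") let the program recognize and format text structure much as a person would.

### 2. Semantic Analysis

**What it is**: in natural language processing, semantic analysis is the process of understanding the meaning of words, phrases, and sentences.

**How this project uses it**: the project uses no deep-learning model, but it performs a rudimentary form of semantic analysis: it relies on context and structural cues (such as heading levels and list prefixes) to work out the role each piece of text plays, and formats it accordingly.

### 3. Templating Engines

**What it is**: a tool that combines data with a static template to generate dynamic content.

**How this project uses it**: the Jinja2 template engine separates the content (Python variables) from the presentation (HTML/CSS). Switching to a different theme only requires a different template file, which makes the code much easier to maintain.

### 4. Separation of Concerns

**What it is**: a software design principle that splits a program into parts, each responsible for one clearly defined task.

**How this project uses it**: the project follows this principle. `content_analyzer.py` is only concerned with "reading" the text, `renderer.py` is only concerned with "styling" it, and `main.py` coordinates the two. This clear division of labor is the basis for robust, extensible software.

### 5. Minimum Viable Product (MVP)

**What it is**: the earliest working version of a product, sufficient to meet the needs of early users and to collect feedback that guides further development.

**How this project uses it**: SmartFormatter is itself an MVP. Rather than aiming to be an all-in-one Office plug-in or a complex AI model, it focuses on one core value, automated formatting, which makes it a low-cost way to test whether there is demand for automated content-beautification tools.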

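5. Summary

The SmartFormatter project is an example of combining logical thinking, programming, and user-experience design.

1. A bridge between technology and aesthetics: it shows that programming is more than cold logic; it can also be a creative tool for solving the aesthetic and efficiency problems that come up in everyday life and work.

2. From chaos to order: the project illustrates how technical means can turn unordered input into ordered, valuable output, which is the essence of information processing and data science.

3. Extensibility and commercial potential: as an MVP, it leaves plenty of room to grow. For example, more sophisticated AI models could identify content types automatically, or a web service could let users upload text online and get a formatted document back.

In short, the program is not only a useful small tool but also a complete product prototype that brings together market insight, technology selection, and architecture design, and a hands-on exercise for the "AI and Entrepreneurial Wisdom" course.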
