Tinkering Notes [40]: Running the qwen3-30b-a3b model on an ancient A100 GPU

2026/1/18 11:10:53 / Source: https://www.cnblogs.com/qsbye/p/19498034

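Abstract

Running the qwen3-30b-a3b model through ollama on an ancient A100-SXM4-40GB GPU. "Answering a one-sentence self-introduction with the 30B-Q8 quantized model on the GPU yields 267 tokens in a 28 s measurement window, at an average power of 55 W and a total energy of 0.44 Wh; at 0.6 ¥/kWh the electricity cost is roughly one millionth of a yuan per token, and the energy efficiency is about 6 J/token."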

Key information

  • Image: ollama/ollama:0.6.6-rc2
  • GPU: A100-SXM4-40GB
  • GPU driver: NVIDIA-SMI 460.106.00 Driver Version: 460.106.00 CUDA Version: 11.2
  • docker: Docker version 24.0.4, build 3713ee1
  • Model: modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf
  • Host system: Linux Tesla 5.10.0-60.18.0.50.oe2203.x86_64 #1 SMP Wed Mar 30 03:12:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Implementation

1. Set up ollama in Docker (GPU driver already configured)

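docker pull ollama/ollama:0.6.6-rc2
# Add --gpus=all to the command below unless the NVIDIA runtime is already Docker's default runtime
docker run --restart=always --name ollama -v /lvm-group1/qsbye/ollama:/root/.ollama -p 11435:11434 -e "OLLAMA_HOST=0.0.0.0" -d ollama/ollama:0.6.6-rc2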

2. Change ollama's default directory (to keep the system disk from filling up)

## One-step update of the system ollama (in effect, reinstalling the latest version)
curl -fsSL https://ollama.com/install.sh | sh
## Verify after updating
ollama --version
## Create the ollama data directory on the data disk
sudo mkdir -p /lvm-group1/qsbye/ollama
sudo chmod 777 -R /lvm-group1/qsbye/ollama
sudo cp -r /usr/share/ollama/.ollama/models /lvm-group1/qsbye/ollama
## Change ollama's default directory
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo vim /etc/systemd/system/ollama.service.d/override.conf

Contents:

[Service]
Environment="OLLAMA_MODELS=/lvm-group1/qsbye/ollama/models"
User=ollama
Group=ollama

Then:

sudo systemctl daemon-reload
sudo systemctl restart ollama
sudo systemctl status ollama

3. Download the model

# Use a mirror inside China (ModelScope)
ollama pull modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf

3. Run the model

docker exec -d -e OLLAMA_GPU_LAYERS=999 -e OLLAMA_KEEP_ALIVE=-1 -e CUDA_VISIBLE_DEVICES=0 ollama bash -c "ollama run modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf"
# In another terminal, watch GPU memory
watch -n1 nvidia-smi

Output:

Every 1.0s: nvidia-smi                           Tesla: Sun Jan 18 08:59:50 2026

Sun Jan 18 08:59:51 2026
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00   Driver Version: 460.106.00   CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB	   Off  | 00000000:82:00.0 Off |                    0 |
| N/A   39C    P0    43W / 400W |  36742MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   3751529	 C   /usr/bin/ollama                 36737MiB |
+-----------------------------------------------------------------------------+
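
The table above shows 36742 MiB of 40536 MiB in use once the model is loaded. As a minimal sketch, the same two numbers can be read programmatically with nvidia-smi's CSV query mode instead of watching the table. The helper names `parse_memory` and `query_memory` are hypothetical, and the sketch assumes a single GPU (only the first output line is read).

```python
import subprocess


def parse_memory(csv_line):
    """Parse one 'used, total' line (MiB) and return (used, total, fraction used)."""
    used, total = (int(field) for field in csv_line.split(","))
    return used, total, used / total


def query_memory():
    """Ask nvidia-smi for used/total memory of the first GPU (requires nvidia-smi)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_memory(out.splitlines()[0])


if __name__ == "__main__":
    # Values copied from the nvidia-smi table above; no GPU is needed for this line.
    used, total, fraction = parse_memory("36742, 40536")
    print(f"{used} MiB / {total} MiB ({fraction:.1%})")
```

Calling `query_memory()` on the GPU host returns live values in the same shape.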

4. Thinking-mode Q&A test

python -c "import requests,json,sys;[sys.stdout.write(json.loads(l)['response']) for l in requests.post('http://10.8.8.130:11435/api/generate',json={'model':'modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf','prompt':'一行python代码打印hello ollama;','stream':True},stream=True).iter_lines(decode_unicode=True) if l]"

Output (the model's verbatim reply, in Chinese like the prompt):

<think>
好的,用户让我用一行Python代码打印“hello ollama”。这看起来挺简单的,但我要仔细想想有没有什么需要注意的地方。首先,Python中打印字符串的基本语法是print("内容")。所以最直接的方式就是print("hello ollama")。不过用户可能有其他需求吗?比如是否需要考虑大小写?不过例子中的“hello ollama”是小写的,所以应该没问题。有没有可能用户想用其他方法?比如使用变量或者转义字符?不过题目明确说是一行代码,所以应该直接使用print函数。另外,是否需要考虑Python版本?比如Python 2和3的区别,但现在的环境大多数是Python 3,所以没问题。还有可能用户想用更复杂的表达式,比如拼接字符串?比如print("hello" + " ollama"),但这样反而更复杂,不如直接写字符串简单。不过用户可能只是想确认基本用法,所以直接写最简单的形式最好。另外,检查是否有拼写错误,比如“ollama”是否正确?用户可能打错了,但按照问题描述,应该按照给出的字符串来处理。所以正确的代码应该是print("hello ollama")。有没有其他可能?比如使用格式化字符串,比如print(f"hello ollama"),但同样,这和直接写字符串没有区别,而且更复杂。所以还是直接使用print("hello ollama")最简洁。总结一下,用户的需求明确,只需要一行代码,所以直接使用print函数输出字符串即可。没有其他隐藏的要求,所以答案应该是这个。
</think>

···python
print("hello ollama")
···
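
The reply above mixes the model's reasoning (inside the think tags) with the final answer. The following is a minimal sketch for separating the two, assuming the streamed pieces have already been joined into one string and that at most one think block is present. `split_thinking` is a hypothetical helper name, not part of ollama.

```python
import re
from typing import Tuple


def split_thinking(reply: str) -> Tuple[str, str]:
    """Return (thinking, answer) from a reply that may contain one <think> block."""
    match = re.search(r"<think>(.*?)</think>", reply, flags=re.DOTALL)
    if match is None:
        # No think block: the whole reply is the answer.
        return "", reply.strip()
    thinking = match.group(1).strip()
    answer = (reply[:match.start()] + reply[match.end():]).strip()
    return thinking, answer


if __name__ == "__main__":
    # Shortened stand-in for the reply shown above.
    sample = "<think>\nreasoning goes here\n</think>\n\nprint(\"hello ollama\")"
    thinking, answer = split_thinking(sample)
    print(answer)
```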

5. Print the token rate

python -c "
import requests, json, sys

url = 'http://10.8.8.130:11435/api/generate'
payload = {
    'model': 'modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf',
    'prompt': '一行python代码打印hello ollama;',
    'stream': True,
}
try:
    r = requests.post(url, json=payload, stream=True, timeout=30)
    for line in r.iter_lines(decode_unicode=True):
        if not line:
            continue
        chunk = json.loads(line)
        sys.stdout.write(chunk.get('response', ''))
        sys.stdout.flush()
        # eval_count and eval_duration (ns) only appear in the final chunk,
        # so the rate is printed once, at the end of the reply
        cnt = chunk.get('eval_count', 0)
        dur_ns = chunk.get('eval_duration', 0)
        if dur_ns:
            rate = cnt / (dur_ns / 1e9)
            sys.stdout.write('\r[%.1f token/s]     ' % rate)
            sys.stdout.flush()
except Exception as e:
    print('\nError:', e, file=sys.stderr)
"

Output:

[16.0 token/s]
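
The rate above is computed from two fields in the final streamed chunk: `eval_count` (generated tokens) and `eval_duration` (nanoseconds). A small sketch of that arithmetic as a standalone function follows. `token_rate` is a hypothetical helper name, and the example numbers are illustrative, not measured.

```python
def token_rate(final_chunk: dict) -> float:
    """Tokens per second from the final streamed chunk of /api/generate."""
    count = final_chunk.get("eval_count", 0)
    duration_ns = final_chunk.get("eval_duration", 0)
    if not duration_ns:
        # Intermediate chunks carry no timing fields.
        return 0.0
    return count / (duration_ns / 1e9)


if __name__ == "__main__":
    # Illustrative values: 160 tokens generated in 10 s.
    example = {"done": True, "eval_count": 160, "eval_duration": 10_000_000_000}
    print(f"[{token_rate(example):.1f} token/s]")
```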

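6. Keep ollama from releasing GPU memory

  1. Set the environment variable OLLAMA_KEEP_ALIVE=-1 (this is a server-side setting, so set it where the ollama server starts, for example with -e on docker run)
  2. Call the model every 3 minutes (heartbeat)
## If the Go compiler is not installed
pip install go-bin -i https://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com
## Go code
vim ollama_heartbeat.go
go build ollama_heartbeat.go
chmod +x ollama_heartbeat
nohup ./ollama_heartbeat &
## Check the output
tail nohup.out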

Code:

// ollama_heartbeat.go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

const defaultHost = "http://127.0.0.1:11435"
const defaultModel = "modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf"

// once sends a single generate request and reports how many bytes came back.
func once() {
	host := os.Getenv("OLLAMA_HOST")
	if host == "" {
		host = defaultHost
	}
	model := os.Getenv("OLLAMA_MODEL")
	if model == "" {
		model = defaultModel
	}
	body, _ := json.Marshal(map[string]interface{}{
		"model":  model,
		"prompt": "你好",
		"stream": true,
	})
	req, err := http.NewRequest("POST", host+"/api/generate", bytes.NewReader(body))
	if err != nil {
		fmt.Printf("[%s] heartbeat fail: %v\n", time.Now().Format("01-02 15:04:05"), err)
		return
	}
	req.Header.Set("Content-Type", "application/json")
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Printf("[%s] heartbeat fail: %v\n", time.Now().Format("01-02 15:04:05"), err)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		fmt.Printf("[%s] heartbeat fail: status=%d\n", time.Now().Format("01-02 15:04:05"), resp.StatusCode)
		return
	}
	// Read the stream line by line and add up the byte count
	reader := bufio.NewReader(resp.Body)
	total := 0
	for {
		line, err := reader.ReadBytes('\n')
		if err == io.EOF {
			break
		}
		if err != nil {
			fmt.Printf("[%s] heartbeat fail while reading: %v\n", time.Now().Format("01-02 15:04:05"), err)
			return
		}
		total += len(line)
	}
	fmt.Printf("[%s] heartbeat ok, %d bytes\n", time.Now().Format("01-02 15:04:05"), total)
}

func main() {
	for {
		once()
		time.Sleep(3 * time.Minute)
	}
}

Output:

[01-18 10:17:50] heartbeat ok, 17346 bytes
[01-18 10:20:59] heartbeat ok, 18120 bytes
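
Each heartbeat above generates a full reply (about 17 KB) just to keep the model resident. As a lighter alternative sketch, and not what the heartbeat program does, the generate endpoint also accepts a `keep_alive` field in the request body, and a request without a prompt only loads the model. The helper names `keep_alive_payload` and `send` are hypothetical, and the host below is the one used elsewhere in these notes.

```python
import json
import urllib.request

MODEL = "modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf"


def keep_alive_payload(model: str, keep_alive=-1) -> str:
    """JSON body that loads the model without a prompt; -1 asks to keep it loaded indefinitely."""
    return json.dumps({"model": model, "keep_alive": keep_alive})


def send(host: str, body: str) -> int:
    """POST the body to /api/generate and return the HTTP status code."""
    request = urllib.request.Request(
        host + "/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


if __name__ == "__main__":
    # Only prints the body; call send("http://127.0.0.1:11435", body) on the host to apply it.
    body = keep_alive_payload(MODEL)
    print(body)
```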

7. Observe the power fluctuation during a Q&A, plus the total tokens and energy consumed by a single Q&A

Code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Sample GPU power in real time during Ollama inference, compute the total
token count and energy consumption, save the samples as CSV, and plot
them to a JPG.
"""
import subprocess
import csv
import time
import datetime as dt
import requests
import sys
from PIL import Image, ImageDraw

# -------------------- Parameters --------------------
URL = "http://10.8.8.130:11435/api/generate"
MODEL = "modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf"
PROMPT = "Please introduce yourself in one sentence."

# Timestamp used in the output file names
ts = dt.datetime.now().strftime("%Y%m%d_%H%M%S")
csv_file = f"ollama_statistics_{ts}.csv"
jpg_file = f"ollama_statistics_{ts}.jpg"

# -------------------- Power sampling --------------------
def get_gpu_power():
    """Return the current GPU power draw (W)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        text=True,
    )
    return float(out.strip())

# -------------------- Inference + sampling --------------------
power_samples = []  # [(timestamp, power_W), ...]
total_tokens = 0

payload = {
    "model": MODEL,
    "prompt": PROMPT,
    "stream": True,
}

print("Starting inference and power sampling...")
start_time = dt.datetime.now()

# Take 50 samples before inference
for _ in range(50):
    power_samples.append((dt.datetime.now().isoformat(timespec="milliseconds"), get_gpu_power()))
    time.sleep(0.1)

# Streaming inference
try:
    resp = requests.post(URL, json=payload, stream=True, timeout=60)
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue
        # Rough count: every streamed line is counted as one token
        total_tokens += 1
        # Take one power sample per streamed line
        power_samples.append((dt.datetime.now().isoformat(timespec="milliseconds"), get_gpu_power()))
        time.sleep(0.01)
except Exception as e:
    print("Inference error:", e, file=sys.stderr)

# Take another 50 samples after inference
for _ in range(50):
    power_samples.append((dt.datetime.now().isoformat(timespec="milliseconds"), get_gpu_power()))
    time.sleep(0.1)

# elapsed covers the whole window, including the idle sampling before and after
elapsed = (dt.datetime.now() - start_time).total_seconds()
avg_power = sum(p[1] for p in power_samples) / len(power_samples)
energy_wh = avg_power * elapsed / 3600  # Wh

# -------------------- Save CSV --------------------
with open(csv_file, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "power_W"])
    writer.writerows(power_samples)
    writer.writerow([])
    writer.writerow(["total_tokens", total_tokens])
    writer.writerow(["elapsed_s", elapsed])
    writer.writerow(["avg_power_W", avg_power])
    writer.writerow(["energy_Wh", energy_wh])
print(f"Saved {csv_file}")

# -------------------- Plot --------------------
W, H = 800, 400
img = Image.new("RGB", (W, H), "white")
draw = ImageDraw.Draw(img)

powers = [p[1] for p in power_samples]
times = [p[0] for p in power_samples]

# Axis range
margin = 60
x0, y0 = margin, margin
x1, y1 = W - margin, H - margin

min_p, max_p = min(powers), max(powers)
pad = (max_p - min_p) * 0.1
min_p, max_p = min_p - pad, max_p + pad

# Polyline coordinates
coords = [
    (
        x0 + (i / (len(powers) - 1)) * (x1 - x0),
        y1 - ((p - min_p) / (max_p - min_p)) * (y1 - y0),
    )
    for i, p in enumerate(powers)
]

# Frame
draw.rectangle([x0, y0, x1, y1], outline="black")

# Polyline
for i in range(len(coords) - 1):
    draw.line([coords[i], coords[i + 1]], fill="blue", width=2)

# Title
title = f"Ollama Power Sampling  tokens={total_tokens}  energy={energy_wh:.2f} Wh"
draw.text((W // 2, 10), title, fill="black", anchor="mt")

# Axis labels
draw.text((x0, y0 - 10), f"{max_p:.1f} W", fill="black", anchor="lt")
draw.text((x0, y1 + 5), f"{min_p:.1f} W", fill="black", anchor="lt")
draw.text((x0 - 5, y1), times[0][-8:], fill="black", anchor="rt")
draw.text((x1, y1), times[-1][-8:], fill="black", anchor="lt")

img.save(jpg_file)
print(f"Plotted {jpg_file}")

View the data:

python -m http.server 8888
sudo firewall-cmd --permanent --add-port=8888/tcp
sudo firewall-cmd --reload
sudo firewall-cmd --list-all

Visit: http://10.8.8.130:8888.

Data:

Power
ollama_statistics_20260118_104449
timestamp,power_W
2026-01-18T10:44:49.482,48.52
2026-01-18T10:44:49.594,50.4
2026-01-18T10:44:49.713,48.52
2026-01-18T10:44:49.823,48.52
2026-01-18T10:44:49.934,48.52
2026-01-18T10:44:50.044,48.52
2026-01-18T10:44:50.157,48.52
2026-01-18T10:44:50.267,48.52
2026-01-18T10:44:50.378,48.52
2026-01-18T10:44:50.489,48.52
2026-01-18T10:44:50.600,48.52
2026-01-18T10:44:50.710,50.37
2026-01-18T10:44:50.834,48.52
2026-01-18T10:44:50.944,48.52
2026-01-18T10:44:51.055,48.52
2026-01-18T10:44:51.167,48.52
2026-01-18T10:44:51.277,48.52
2026-01-18T10:44:51.388,48.52
2026-01-18T10:44:51.499,48.52
2026-01-18T10:44:51.610,48.52
2026-01-18T10:44:51.720,48.52
2026-01-18T10:44:51.832,50.37
2026-01-18T10:44:51.957,48.52
2026-01-18T10:44:52.068,48.52
2026-01-18T10:44:52.178,48.52
2026-01-18T10:44:52.289,48.52
2026-01-18T10:44:52.399,48.52
2026-01-18T10:44:52.509,48.52
2026-01-18T10:44:52.620,48.52
2026-01-18T10:44:52.730,48.52
2026-01-18T10:44:52.842,48.52
2026-01-18T10:44:52.952,50.37
2026-01-18T10:44:53.078,49.03
2026-01-18T10:44:53.189,48.52
2026-01-18T10:44:53.299,48.52
2026-01-18T10:44:53.409,48.52
2026-01-18T10:44:53.519,48.52
2026-01-18T10:44:53.630,48.52
2026-01-18T10:44:53.740,48.52
2026-01-18T10:44:53.849,48.52
2026-01-18T10:44:53.959,48.52
2026-01-18T10:44:54.070,50.4
2026-01-18T10:44:54.198,49.03
2026-01-18T10:44:54.308,48.52
2026-01-18T10:44:54.420,48.52
2026-01-18T10:44:54.530,48.52
2026-01-18T10:44:54.640,48.52
2026-01-18T10:44:54.750,48.52
2026-01-18T10:44:54.859,48.52
2026-01-18T10:44:54.969,48.52
2026-01-18T10:44:55.185,50.37
2026-01-18T10:44:55.247,65.36
2026-01-18T10:44:55.308,65.36
2026-01-18T10:44:55.368,57.65
2026-01-18T10:44:55.427,51.72
2026-01-18T10:44:55.488,51.72
2026-01-18T10:44:55.548,62.63
2026-01-18T10:44:55.608,62.63
2026-01-18T10:44:55.668,60.33
2026-01-18T10:44:55.728,51.3
2026-01-18T10:44:55.788,51.3
2026-01-18T10:44:55.848,62.21
2026-01-18T10:44:55.908,62.21
2026-01-18T10:44:55.975,61.27
2026-01-18T10:44:56.035,52.63
2026-01-18T10:44:56.095,52.63
2026-01-18T10:44:56.164,66.71
2026-01-18T10:44:56.225,50.79
2026-01-18T10:44:56.315,50.37
2026-01-18T10:44:56.406,50.37
2026-01-18T10:44:56.478,57.65
2026-01-18T10:44:56.544,52.63
2026-01-18T10:44:56.606,52.63
2026-01-18T10:44:56.666,66.29
2026-01-18T10:44:56.726,66.29
2026-01-18T10:44:56.786,50.79
2026-01-18T10:44:56.848,54.42
2026-01-18T10:44:56.908,54.42
2026-01-18T10:44:56.968,68.59
2026-01-18T10:44:57.028,68.59
2026-01-18T10:44:57.088,53.05
2026-01-18T10:44:57.148,54.42
2026-01-18T10:44:57.206,54.42
2026-01-18T10:44:57.277,69.48
2026-01-18T10:44:57.338,51.72
2026-01-18T10:44:57.399,59.91
2026-01-18T10:44:57.490,59.91
2026-01-18T10:44:57.521,59.91
2026-01-18T10:44:57.589,58.07
2026-01-18T10:44:57.651,53.48
2026-01-18T10:44:57.711,53.48
2026-01-18T10:44:57.777,69.01
2026-01-18T10:44:57.843,51.3
2026-01-18T10:44:57.912,51.3
2026-01-18T10:44:57.973,68.59
2026-01-18T10:44:58.033,68.59
2026-01-18T10:44:58.108,50.79
2026-01-18T10:44:58.184,68.08
2026-01-18T10:44:58.244,51.3
2026-01-18T10:44:58.306,51.3
2026-01-18T10:44:58.375,59.44
2026-01-18T10:44:58.439,59.44
2026-01-18T10:44:58.500,55.35
2026-01-18T10:44:58.620,55.35
2026-01-18T10:44:58.642,55.35
2026-01-18T10:44:58.681,67.23
2026-01-18T10:44:58.743,67.23
2026-01-18T10:44:58.804,53.98
2026-01-18T10:44:58.867,53.05
2026-01-18T10:44:58.948,53.05
2026-01-18T10:44:59.027,49.0
2026-01-18T10:44:59.102,65.87
2026-01-18T10:44:59.170,53.09
2026-01-18T10:44:59.242,53.09
2026-01-18T10:44:59.333,54.93
2026-01-18T10:44:59.404,64.42
2026-01-18T10:44:59.468,52.16
2026-01-18T10:44:59.537,52.16
2026-01-18T10:44:59.606,63.57
2026-01-18T10:44:59.669,53.05
2026-01-18T10:44:59.754,53.05
2026-01-18T10:44:59.791,63.57
2026-01-18T10:44:59.853,63.57
2026-01-18T10:44:59.922,49.42
2026-01-18T10:44:59.983,56.28
2026-01-18T10:45:00.046,56.28
2026-01-18T10:45:00.107,63.57
2026-01-18T10:45:00.170,50.79
2026-01-18T10:45:00.231,50.79
2026-01-18T10:45:00.291,60.37
2026-01-18T10:45:00.351,60.37
2026-01-18T10:45:00.412,62.63
2026-01-18T10:45:00.471,51.3
2026-01-18T10:45:00.533,51.3
2026-01-18T10:45:00.593,59.95
2026-01-18T10:45:00.653,59.95
2026-01-18T10:45:00.714,62.63
2026-01-18T10:45:00.774,52.63
2026-01-18T10:45:00.877,62.67
2026-01-18T10:45:00.901,62.67
2026-01-18T10:45:00.960,62.67
2026-01-18T10:45:01.020,60.77
2026-01-18T10:45:01.081,51.72
2026-01-18T10:45:01.157,51.72
2026-01-18T10:45:01.221,61.27
2026-01-18T10:45:01.283,51.72
2026-01-18T10:45:01.343,51.72
2026-01-18T10:45:01.402,61.74
2026-01-18T10:45:01.469,61.74
2026-01-18T10:45:01.545,49.84
2026-01-18T10:45:01.607,62.25
2026-01-18T10:45:01.669,62.25
2026-01-18T10:45:01.729,57.65
2026-01-18T10:45:01.789,52.16
2026-01-18T10:45:01.849,52.16
2026-01-18T10:45:01.910,63.57
2026-01-18T10:45:02.002,60.33
2026-01-18T10:45:02.032,60.33
2026-01-18T10:45:02.092,52.16
2026-01-18T10:45:02.151,52.16
2026-01-18T10:45:02.212,62.25
2026-01-18T10:45:02.273,62.25
2026-01-18T10:45:02.333,60.37
2026-01-18T10:45:02.393,51.3
2026-01-18T10:45:02.453,51.3
2026-01-18T10:45:02.513,61.31
2026-01-18T10:45:02.575,61.31
2026-01-18T10:45:02.638,60.37
2026-01-18T10:45:02.700,52.67
2026-01-18T10:45:02.767,52.67
2026-01-18T10:45:02.836,62.21
2026-01-18T10:45:02.917,61.74
2026-01-18T10:45:02.997,50.79
2026-01-18T10:45:03.128,66.71
2026-01-18T10:45:03.153,66.71
2026-01-18T10:45:03.203,49.84
2026-01-18T10:45:03.267,49.84
2026-01-18T10:45:03.338,66.71
2026-01-18T10:45:03.409,53.05
2026-01-18T10:45:03.481,53.05
2026-01-18T10:45:03.553,50.79
2026-01-18T10:45:03.619,57.14
2026-01-18T10:45:03.690,57.14
2026-01-18T10:45:03.760,49.42
2026-01-18T10:45:03.825,59.44
2026-01-18T10:45:03.889,59.44
2026-01-18T10:45:03.970,49.42
2026-01-18T10:45:04.031,64.42
2026-01-18T10:45:04.102,50.37
2026-01-18T10:45:04.202,50.79
2026-01-18T10:45:04.267,50.79
2026-01-18T10:45:04.330,59.02
2026-01-18T10:45:04.390,59.02
2026-01-18T10:45:04.451,63.06
2026-01-18T10:45:04.512,50.79
2026-01-18T10:45:04.573,50.79
2026-01-18T10:45:04.634,61.31
2026-01-18T10:45:04.695,61.31
2026-01-18T10:45:04.757,60.81
2026-01-18T10:45:04.817,51.72
2026-01-18T10:45:04.878,51.72
2026-01-18T10:45:04.941,65.36
2026-01-18T10:45:05.002,65.36
2026-01-18T10:45:05.063,57.14
2026-01-18T10:45:05.125,53.09
2026-01-18T10:45:05.190,53.09
2026-01-18T10:45:05.254,64.94
2026-01-18T10:45:05.340,52.16
2026-01-18T10:45:05.437,56.28
2026-01-18T10:45:05.503,56.28
2026-01-18T10:45:05.565,60.37
2026-01-18T10:45:05.639,53.98
2026-01-18T10:45:05.703,53.98
2026-01-18T10:45:05.765,62.21
2026-01-18T10:45:05.836,51.72
2026-01-18T10:45:05.897,51.72
2026-01-18T10:45:05.957,67.66
2026-01-18T10:45:06.022,49.84
2026-01-18T10:45:06.083,49.84
2026-01-18T10:45:06.153,57.14
2026-01-18T10:45:06.214,57.14
2026-01-18T10:45:06.274,58.07
2026-01-18T10:45:06.337,53.05
2026-01-18T10:45:06.398,68.08
2026-01-18T10:45:06.524,68.08
2026-01-18T10:45:06.546,56.72
2026-01-18T10:45:06.581,56.72
2026-01-18T10:45:06.643,53.48
2026-01-18T10:45:06.705,53.48
2026-01-18T10:45:06.766,67.66
2026-01-18T10:45:06.827,67.66
2026-01-18T10:45:06.889,50.37
2026-01-18T10:45:06.953,56.28
2026-01-18T10:45:07.015,56.28
2026-01-18T10:45:07.086,53.48
2026-01-18T10:45:07.146,53.48
2026-01-18T10:45:07.207,53.48
2026-01-18T10:45:07.264,68.08
2026-01-18T10:45:07.333,68.08
2026-01-18T10:45:07.390,52.16
2026-01-18T10:45:07.459,54.42
2026-01-18T10:45:07.518,54.42
2026-01-18T10:45:07.579,52.16
2026-01-18T10:45:07.668,52.16
2026-01-18T10:45:07.715,52.16
2026-01-18T10:45:07.776,67.23
2026-01-18T10:45:07.838,67.23
2026-01-18T10:45:07.899,50.37
2026-01-18T10:45:07.960,55.35
2026-01-18T10:45:08.021,55.35
2026-01-18T10:45:08.083,69.01
2026-01-18T10:45:08.145,52.67
2026-01-18T10:45:08.206,52.67
2026-01-18T10:45:08.267,60.81
2026-01-18T10:45:08.328,60.81
2026-01-18T10:45:08.388,67.23
2026-01-18T10:45:08.466,55.35
2026-01-18T10:45:08.527,55.35
2026-01-18T10:45:08.588,67.66
2026-01-18T10:45:08.648,52.67
2026-01-18T10:45:08.710,53.05
2026-01-18T10:45:08.830,53.05
2026-01-18T10:45:08.890,69.53
2026-01-18T10:45:08.958,52.67
2026-01-18T10:45:09.014,52.67
2026-01-18T10:45:09.071,58.59
2026-01-18T10:45:09.137,58.59
2026-01-18T10:45:09.221,52.12
2026-01-18T10:45:09.283,67.23
2026-01-18T10:45:09.348,67.23
2026-01-18T10:45:09.411,54.93
2026-01-18T10:45:09.476,58.07
2026-01-18T10:45:09.549,58.07
2026-01-18T10:45:09.615,52.16
2026-01-18T10:45:09.676,58.07
2026-01-18T10:45:09.736,58.07
2026-01-18T10:45:09.797,51.72
2026-01-18T10:45:09.935,51.72
2026-01-18T10:45:09.957,51.72
2026-01-18T10:45:09.984,58.07
2026-01-18T10:45:10.048,58.07
2026-01-18T10:45:10.110,62.25
2026-01-18T10:45:10.171,51.72
2026-01-18T10:45:10.242,51.72
2026-01-18T10:45:10.302,67.23
2026-01-18T10:45:10.364,50.37
2026-01-18T10:45:10.424,50.37
2026-01-18T10:45:10.486,56.28
2026-01-18T10:45:10.547,56.28
2026-01-18T10:45:10.613,64.94
2026-01-18T10:45:10.679,51.72
2026-01-18T10:45:10.743,51.72
2026-01-18T10:45:10.804,67.66
2026-01-18T10:45:10.878,50.37
2026-01-18T10:45:10.941,50.37
2026-01-18T10:45:11.070,69.01
2026-01-18T10:45:11.130,49.42
2026-01-18T10:45:11.192,55.35
2026-01-18T10:45:11.263,55.35
2026-01-18T10:45:11.325,56.72
2026-01-18T10:45:11.386,53.09
2026-01-18T10:45:11.448,53.09
2026-01-18T10:45:11.510,68.08
2026-01-18T10:45:11.573,68.08
2026-01-18T10:45:11.635,50.37
2026-01-18T10:45:11.722,63.57
2026-01-18T10:45:11.784,51.3
2026-01-18T10:45:11.846,51.3
2026-01-18T10:45:11.908,64.94
2026-01-18T10:45:11.973,64.94
2026-01-18T10:45:12.034,53.98
2026-01-18T10:45:12.097,55.35
2026-01-18T10:45:12.203,67.23
2026-01-18T10:45:12.232,67.23
2026-01-18T10:45:12.284,50.79
2026-01-18T10:45:12.346,50.79
2026-01-18T10:45:12.416,59.44
2026-01-18T10:45:12.437,59.44
2026-01-18T10:45:12.547,49.03
2026-01-18T10:45:12.658,49.03
2026-01-18T10:45:12.768,49.03
2026-01-18T10:45:12.878,49.03
2026-01-18T10:45:12.988,49.03
2026-01-18T10:45:13.099,49.03
2026-01-18T10:45:13.209,50.82
2026-01-18T10:45:13.420,49.03
2026-01-18T10:45:13.532,49.03
2026-01-18T10:45:13.641,48.52
2026-01-18T10:45:13.752,48.52
2026-01-18T10:45:13.861,48.52
2026-01-18T10:45:13.971,48.52
2026-01-18T10:45:14.082,48.52
2026-01-18T10:45:14.193,48.52
2026-01-18T10:45:14.303,49.03
2026-01-18T10:45:14.413,50.4
2026-01-18T10:45:14.543,48.52
2026-01-18T10:45:14.653,48.52
2026-01-18T10:45:14.763,48.52
2026-01-18T10:45:14.874,48.52
2026-01-18T10:45:14.985,48.52
2026-01-18T10:45:15.095,49.03
2026-01-18T10:45:15.205,49.03
2026-01-18T10:45:15.315,48.52
2026-01-18T10:45:15.425,48.52
2026-01-18T10:45:15.534,50.82
2026-01-18T10:45:15.665,48.52
2026-01-18T10:45:15.775,48.52
2026-01-18T10:45:15.885,48.52
2026-01-18T10:45:15.995,48.52
2026-01-18T10:45:16.106,48.52
2026-01-18T10:45:16.217,49.03
2026-01-18T10:45:16.327,48.52
2026-01-18T10:45:16.437,48.52
2026-01-18T10:45:16.546,48.52
2026-01-18T10:45:16.657,50.4
2026-01-18T10:45:16.788,48.52
2026-01-18T10:45:16.898,48.52
2026-01-18T10:45:17.009,48.52
2026-01-18T10:45:17.119,48.52
2026-01-18T10:45:17.229,49.03
2026-01-18T10:45:17.339,48.52
2026-01-18T10:45:17.449,48.52
2026-01-18T10:45:17.558,48.52
2026-01-18T10:45:17.670,48.52
2026-01-18T10:45:17.781,50.37
2026-01-18T10:45:17.908,49.03
2026-01-18T10:45:18.017,48.52

total_tokens,267
elapsed_s,28.644992
avg_power_W,55.060354223433244
energy_Wh,0.4381120572909476
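
The energy figure above is the plain mean of all samples multiplied by the elapsed time. The samples are unevenly spaced (roughly every 110 ms while idle and every 60 ms during inference), so a time-weighted integral is a reasonable cross-check. The sketch below is one way to do that with the trapezoidal rule. It is not part of the original script, `load_samples` and `energy_joules` are hypothetical names, and the three-row CSV in the demo is made up for illustration.

```python
import csv
import io
from datetime import datetime


def load_samples(text):
    """Parse (timestamp, power_W) rows; stop at the blank row before the summary."""
    rows = csv.reader(io.StringIO(text))
    next(rows)  # skip the "timestamp,power_W" header
    samples = []
    for row in rows:
        if not row:
            break
        samples.append((datetime.fromisoformat(row[0]), float(row[1])))
    return samples


def energy_joules(samples):
    """Integrate power over time with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0).total_seconds()
    return total


if __name__ == "__main__":
    # Made-up data in the same layout as the CSV above.
    demo = (
        "timestamp,power_W\n"
        "2026-01-18T10:44:49.000,50.0\n"
        "2026-01-18T10:44:50.000,60.0\n"
        "2026-01-18T10:44:52.000,60.0\n"
        "\n"
        "total_tokens,3\n"
    )
    joules = energy_joules(load_samples(demo))
    print(f"{joules:.1f} J = {joules / 3600:.4f} Wh")
```

To apply it to a real run, pass the contents of the saved ollama_statistics_*.csv file to `load_samples`.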

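8. Analysis

The following conclusions can be drawn from this data (all figures are for a single Q&A):

  1. Time

    • Total measurement window ≈ 28.6 s. About 17 s of it (10:44:55 to 10:45:12 in the CSV) is actual generation; the rest is the idle sampling before and after inference.
    • 367 sample points (50 + 267 + 50) → 12.8 samples/s on average, fully covering the before, during, and after phases of inference.
  2. Token efficiency

    • 267 tokens in total (output only; the script counts each streamed line as one token)
    • Throughput ≈ 267 / 28.6 ≈ 9.3 token/s over the whole window; over the ~17 s of generation it is about 15 token/s, consistent with the 16.0 token/s measured in the token-rate test.
    • Per-token latency ≈ 107 ms over the whole window (about 65 ms during generation)
  3. Power

    • Idle baseline 48–50 W
    • Peak during inference 69.5 W; average 55.1 W over the whole window (about 57 W during generation, roughly 9 W above idle)
    • Dynamic range 21 W (48 W → 69 W)
  4. Energy

    • Total energy 0.438 Wh (≈ 1.58 kJ), idle draw before and after inference included
    • Energy per token 0.438 Wh / 267 ≈ 1.64 mWh
    • At 0.6 ¥/kWh, the electricity cost of the whole Q&A ≈ 0.00026 ¥ (0.026 fen)
  5. Energy efficiency

    • 9.3 token/s ÷ 55 W ≈ 0.17 token/J
    • Or 6 J/token, equivalent to lighting a 6 W LED bulb for 1 second.
  6. Comparison

    • Quantized models of this size running purely on GPU typically reach 10–20 token/s. The 9.3 token/s here is lower mainly because the window includes idle sampling; the generation-phase rate of about 15 token/s falls inside that range.
    • 1.64 mWh per token is in line with the 1–3 mWh reported in the literature for 30B-class quantized models, which is a normal level.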
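
The derived figures in this analysis all follow from three numbers in the CSV summary: total_tokens, elapsed_s and avg_power_W. The sketch below reproduces that arithmetic so it can be rerun on another measurement. `summarize` is a hypothetical helper, and the 0.6 ¥/kWh price is the assumption already used in the analysis.

```python
def summarize(tokens, elapsed_s, avg_power_w, price_per_kwh=0.6):
    """Derive throughput, energy, and cost figures from the CSV summary rows."""
    energy_j = avg_power_w * elapsed_s   # W x s = J
    energy_wh = energy_j / 3600          # 1 Wh = 3600 J
    return {
        "tokens_per_s": tokens / elapsed_s,
        "energy_wh": energy_wh,
        "joules_per_token": energy_j / tokens,
        "mwh_per_token": energy_wh * 1000 / tokens,
        "cost_yuan": energy_wh / 1000 * price_per_kwh,
    }


if __name__ == "__main__":
    # Values taken from the summary rows at the end of the CSV.
    result = summarize(267, 28.644992, 55.060354223433244)
    for name, value in result.items():
        print(f"{name}: {value:.6g}")
```

All of these are whole-window figures, so they include the idle samples taken before and after inference.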

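One-sentence summary
"Answering a one-sentence self-introduction with the 30B-Q8 quantized model on an ancient GPU yields 267 tokens in a 28 s measurement window, at an average power of 55 W and a total energy of 0.44 Wh; at 0.6 ¥/kWh the electricity cost is roughly one millionth of a yuan per token, and the energy efficiency is about 6 J/token."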
