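Contents
- Check the model path
- Load-testing commands
- Install a load-testing tool
- Or use the official example Python script
- Monitor GPU memory and utilization in real time
- Or query a specific GPU
- Watch live with top or htop
- Or, for a more precise view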
Check the model path
curl http://127.0.0.1:8000/v1/models
{"object":"list","data":[{"id":"/data/models/Qwen1.5-14B-Chat-AWQ","object":"model","created":1768828444,"owned_by":"vllm","root":"/data/models/Qwen1.5-14B-Chat-AWQ","parent":null,"max_model_len":4096,"permission":[{"id":"modelperm-954558153c0727e8","object":"model_permission","created":1768828444,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}
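The same lookup can be scripted when the model id has to be fed into later requests. The following is a minimal sketch, assuming the server from the curl example is reachable at http://127.0.0.1:8000; the helper names summarize_models and fetch_models are made up for illustration and are not part of vLLM.

```python
import json
from typing import List, Optional, Tuple
from urllib.request import urlopen


def summarize_models(payload: str) -> List[Tuple[str, Optional[int]]]:
    """Return (model id, max_model_len) pairs from a /v1/models response body."""
    data = json.loads(payload).get("data", [])
    # max_model_len is present in the response shown above; use .get() in
    # case another server version omits it.
    return [(model["id"], model.get("max_model_len")) for model in data]


def fetch_models(base_url: str = "http://127.0.0.1:8000") -> List[Tuple[str, Optional[int]]]:
    """Query the running server (assumed address) and summarize its models."""
    with urlopen(f"{base_url}/v1/models", timeout=10) as resp:
        return summarize_models(resp.read().decode("utf-8"))
```

Calling fetch_models() against the server above should yield the id /data/models/Qwen1.5-14B-Chat-AWQ, which is the value to pass as "model" in completion requests.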
curl -X POST http://127.0.0.1:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/data/models/Qwen1.5-14B-Chat-AWQ",
    "prompt": "Say hello",
    "max_tokens": 10
  }'
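For scripted checks, the same request can be sent from Python with only the standard library. This is a sketch rather than a reference client: it assumes the server address and model id from the curl example, and that the reply follows the OpenAI-style completions schema (choices[0].text). The function names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8000"           # assumed: same server as the curl example
MODEL = "/data/models/Qwen1.5-14B-Chat-AWQ"  # the "id" reported by /v1/models


def build_completion_request(prompt, max_tokens=10, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) a POST request for /v1/completions."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    ).encode("utf-8")
    return Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=10, timeout=60):
    """Send the request and return the generated text.

    Assumes an OpenAI-style response body with choices[0]["text"].
    """
    with urlopen(build_completion_request(prompt, max_tokens), timeout=timeout) as resp:
        reply = json.loads(resp.read().decode("utf-8"))
    return reply["choices"][0]["text"]
```

With the server running, complete("Say hello") returns the generated text for the same prompt as the curl call.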
Load-testing commands
Install a load-testing tool
pip install locust
Or use the official example Python script
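Note: vLLM does not appear to ship a vllm.entrypoints.benchmark module, and flags such as --use-8bit are not among its documented options, so treat the command below as a placeholder. The benchmark scripts that vLLM publishes have lived under benchmarks/ in its repository (for example benchmark_latency.py and benchmark_serving.py); check their --help output for the supported flags.
python -m vllm.entrypoints.benchmark \
  --model Qwen/Qwen-14B-2.5 \
  --dtype float16 \
  --batch-size 1 \
  --num-batches 10 \
  --max-seq-len 512 \
  --use-8bit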
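When neither tool is at hand, a rough load test against the running server can be put together from the standard library. The sketch below is a stand-in, not Locust and not a vLLM benchmark: it fires a fixed number of completion requests from a thread pool and reports nearest-rank latency percentiles. The URL, model id, and all function names are assumptions taken from the curl example above.

```python
import json
import math
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

URL = "http://127.0.0.1:8000/v1/completions"  # assumed server address
MODEL = "/data/models/Qwen1.5-14B-Chat-AWQ"   # model id from /v1/models


def send_completion():
    """Send one small completion request and read the full response."""
    body = json.dumps({"model": MODEL, "prompt": "Say hello", "max_tokens": 10})
    req = Request(URL, data=body.encode("utf-8"),
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req, timeout=120) as resp:
        resp.read()


def percentile(samples, q):
    """Nearest-rank percentile of a non-empty list; q is in [0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def run_load(send, total_requests=20, concurrency=4):
    """Call send() total_requests times on `concurrency` threads.

    Returns per-request latencies in seconds. An exception raised by
    send() propagates and aborts the run.
    """
    def timed(_):
        start = time.perf_counter()
        send()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, range(total_requests)))


def report(latencies):
    """Summarize latencies as request count, p50, p95, and max (seconds)."""
    return {
        "requests": len(latencies),
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
        "max_s": max(latencies),
    }
```

A run such as report(run_load(send_completion, total_requests=50, concurrency=8)) gives a first impression of latency under concurrency; running it alongside the GPU monitoring commands shows how memory and utilization respond.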
Monitor GPU memory and utilization in real time
watch -n 1 nvidia-smi
Or query a specific GPU (index 0) for memory and utilization details
nvidia-smi -i 0 -q -d MEMORY,UTILIZATION
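To log these numbers during a load test instead of watching them, nvidia-smi can emit machine-readable CSV through its --query-gpu and --format options. The following is a minimal sketch assuming nvidia-smi is on PATH; parse_gpu_csv and query_gpus are hypothetical helper names.

```python
import subprocess

# Fields requested from nvidia-smi; with "nounits", memory values are in MiB
# and utilization is a percentage.
QUERY = "index,memory.used,memory.total,utilization.gpu"


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU.

    Assumes every field is numeric; a GPU that reports "[N/A]" for a
    field would raise ValueError here.
    """
    rows = []
    for line in text.strip().splitlines():
        index, used, total, util = (field.strip() for field in line.split(","))
        rows.append({
            "gpu": int(index),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
            "util_pct": int(util),
        })
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed per-GPU readings."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling query_gpus() once per second in a loop and appending the rows to a file gives a record that can be lined up with request latencies afterwards.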
Watch CPU and memory live with top or htop
htop
Or, for a more precise view, list the top 20 processes by CPU usage
watch -n 1 "ps -eo pid,cmd,%cpu,%mem --sort=-%cpu | head -20"