This guide uses netdata v1.47 running on ubuntu-server 24.04 as the example.
I. Driver installation
1. Install the GPU driver
https://www.nvidia.com/en-us/drivers/
Select your GPU model and download the driver installer, for example: NVIDIA-Linux-x86_64-580.126.09.run
Make the installer executable, then run it:
chmod +x NVIDIA-Linux-x86_64-580.126.09.run
./NVIDIA-Linux-x86_64-580.126.09.run
2. Verify that the driver installed successfully
Run nvidia-smi and check that the terminal prints output like the following:
root@ubuntu:~# nvidia-smi
Fri Jan 23 14:33:13 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09             Driver Version: 580.126.09     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        Off |   00000000:01:00.0 Off |                  N/A |
| 30%   24C    P8             10W /   70W |       0MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
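For a scripted version of this check, the sketch below wraps nvidia-smi in a small function. The function name check_gpu_driver and its optional argument are hypothetical and not part of the original steps; the --query-gpu and --format options are standard nvidia-smi flags.

```shell
# Hypothetical helper: prints one "name, driver_version" line per GPU and
# fails if the driver tool is missing or reports no device.
# The optional first argument overrides which nvidia-smi binary is called.
check_gpu_driver() {
    smi_bin="${1:-nvidia-smi}"
    if ! command -v "$smi_bin" >/dev/null 2>&1; then
        echo "driver check: $smi_bin not found" >&2
        return 1
    fi
    gpu_lines="$("$smi_bin" --query-gpu=name,driver_version --format=csv,noheader 2>/dev/null)" || return 1
    # An empty answer means the tool ran but saw no GPU.
    [ -n "$gpu_lines" ] || return 1
    printf '%s\n' "$gpu_lines"
}
```

Calling check_gpu_driver with no argument after the installer finishes should succeed only when the driver is loaded and a GPU is visible.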
3. Install the NVIDIA Container Toolkit (follow the official install guide):
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
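Once the toolkit is installed per the guide above, a quick sanity check is to confirm that the two binaries used in the following sections are on PATH. This is a minimal sketch; the function name toolkit_installed is hypothetical.

```shell
# Hypothetical post-install check: nvidia-ctk is used to configure docker,
# and nvidia-container-runtime is the runtime that daemon.json points to.
toolkit_installed() {
    for tool in nvidia-ctk nvidia-container-runtime; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            return 1
        fi
    done
    return 0
}
```

If the function reports a missing tool, revisit the install guide before moving on to the docker configuration.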
II. Docker configuration
1. Configure daemon.json
nvidia-ctk runtime configure --runtime=docker --set-as-default
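When the command finishes, it adds the "default-runtime" and "runtimes" entries to /etc/docker/daemon.json (shown in red bold in the original post):
{
  "bip": "192.168.222.1/24",
  "default-runtime": "nvidia",
  "log-driver": "json-file",
  "log-opts": {
    "max-file": "5",
    "max-size": "10m"
  },
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
Run cat /etc/docker/daemon.json and confirm that the "default-runtime" and "runtimes" entries are present.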
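The visual check of daemon.json can also be scripted. The sketch below is a plain grep test rather than a full JSON parse, and the function name has_nvidia_runtime is hypothetical; it assumes each key and its value sit on the same line, as in the file shown above.

```shell
# Hypothetical helper: succeeds when the given daemon.json sets nvidia as the
# default runtime and points a runtime entry at nvidia-container-runtime.
# Defaults to /etc/docker/daemon.json when no path is passed.
has_nvidia_runtime() {
    cfg="${1:-/etc/docker/daemon.json}"
    [ -r "$cfg" ] || return 1
    grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$cfg" &&
        grep -Eq '"path"[[:space:]]*:[[:space:]]*"nvidia-container-runtime"' "$cfg"
}
```

For example, has_nvidia_runtime && echo "daemon.json looks right" prints the message only when both entries are found.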
2. Restart the docker service
systemctl daemon-reload
systemctl restart docker
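After the restart, it may be worth confirming that the daemon actually picked up the new default runtime. The sketch below reads the DefaultRuntime field from docker info; the function name default_runtime_is_nvidia and its optional argument are hypothetical.

```shell
# Hypothetical helper: succeeds when the docker daemon reports "nvidia"
# as its default runtime. The optional first argument overrides the
# docker binary to call.
default_runtime_is_nvidia() {
    docker_bin="${1:-docker}"
    runtime="$("$docker_bin" info --format '{{.DefaultRuntime}}' 2>/dev/null)" || return 1
    [ "$runtime" = "nvidia" ]
}
```

If the check fails right after the restart, the daemon.json change likely did not take effect, and step 1 of this section is the place to look.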
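III. Container configuration
1. Add the deploy configuration to the compose file (the deploy block under the netdata service, shown in red bold in the original post)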
services:
  netdata:
    image: netdata/netdata:v1.47
    container_name: netdata
    hostname: ubuntu-netdata
    pid: host
    ports:
      - 19999:19999
    restart: unless-stopped
    cap_add:
      - SYS_PTRACE
      - SYS_ADMIN
    security_opt:
      - apparmor:unconfined
    environment:
      - DEFAULT_LANGUAGE=zh_CN
    volumes:
      - ./netdataconfig/netdata:/etc/netdata
      - netdatalib:/var/lib/netdata
      - netdatacache:/var/cache/netdata
      - /:/host/root:ro,rslave
      - /etc/passwd:/host/etc/passwd:ro
      - /etc/group:/host/etc/group:ro
      - /etc/localtime:/etc/localtime:ro
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /var/log:/host/var/log:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  netdatalib:
  netdatacache:
2. Start the compose stack, then verify nvidia-smi inside the container
docker exec -it netdata nvidia-smi
Check that the terminal prints output like the following:
root@ubuntu:~# docker exec -it netdata nvidia-smi
Fri Jan 23 14:34:28 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09             Driver Version: 580.126.09     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        Off |   00000000:01:00.0 Off |                  N/A |
| 30%   24C    P8             10W /   70W |       0MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
If this output appears, the container can use the GPU.
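The same container check can be run non-interactively. The sketch below assumes the stack was started with docker compose up -d (the Compose v2 plugin; the original does not show the start command) and uses nvidia-smi -L, which lists one device per line. The function name container_sees_gpu and its arguments are hypothetical.

```shell
# Hypothetical helper: succeeds when nvidia-smi inside the named container
# lists at least one GPU. Arguments: container name (default "netdata")
# and docker binary (default "docker").
container_sees_gpu() {
    container="${1:-netdata}"
    docker_bin="${2:-docker}"
    gpu_list="$("$docker_bin" exec "$container" nvidia-smi -L 2>/dev/null)" || return 1
    # nvidia-smi -L prints one "GPU <index>: ..." line per visible device.
    printf '%s\n' "$gpu_list" | grep -q '^GPU [0-9]'
}
```

A typical sequence would be docker compose up -d followed by container_sees_gpu netdata, with a non-zero exit status indicating that the GPU is not visible inside the container.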
References:
https://learn.netdata.cloud/docs/netdata-agent/installation/docker
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html