trevorndodds/gpu-metrics

gpu-metrics

Grafana dashboard: https://grafana.com/dashboards/7320

Build the Docker image:

docker build -t <user>/gpu-elastic-metrics:latest .

To run it:

docker run -d --runtime=nvidia --restart always --name cuda-gpu-metrics \
-e GPU_METRICS_CLUSTER_URL='http://elasticURL:9200' <user>/gpu-elastic-metrics:latest
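
The run command above takes two site-specific values: the Elasticsearch URL and the image name. As a minimal sketch, the helper below checks that the URL at least starts with a scheme and then prints the command instead of running it. The function names `valid_cluster_url` and `gpu_metrics_run_cmd` are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper: checks only that the URL has an http(s) scheme
# followed by at least one character. Reachability is not tested.
valid_cluster_url() {
    case "$1" in
        http://?*|https://?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: prints the docker run command from this README for a
# given URL ($1) and image ($2). Printing keeps this a dry run.
gpu_metrics_run_cmd() {
    url=$1
    image=$2
    if ! valid_cluster_url "$url"; then
        echo "invalid GPU_METRICS_CLUSTER_URL: $url" >&2
        return 1
    fi
    printf '%s\n' "docker run -d --runtime=nvidia --restart always --name cuda-gpu-metrics -e GPU_METRICS_CLUSTER_URL='$url' $image"
}
```

Calling `gpu_metrics_run_cmd 'http://elasticURL:9200' '<user>/gpu-elastic-metrics:latest'` prints the command for review. Once the container is started, `docker logs cuda-gpu-metrics` shows its output.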

Alternative: Create Systemd service

Create the folder and copy the script:

sudo mkdir -p /data/scripts/gpu/
sudo cp /tmp/gpu_elastic.py /data/scripts/gpu/

Create Service

sudo vi /etc/systemd/system/gpu_elastic.service

Copy the following into the service file:

[Unit]
Description=GPU Metric Service
After=multi-user.target

[Service]
Environment=PYTHONUNBUFFERED=true
Type=simple
ExecStart=/data/scripts/gpu/gpu_elastic.py
User=root
WorkingDirectory=/data/scripts/gpu
Restart=on-failure

[Install]
WantedBy=multi-user.target
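
The unit file above does not set `GPU_METRICS_CLUSTER_URL`, which the Docker instructions pass to the container. Assuming the script reads the same variable when run under systemd (not confirmed here), one way to supply it is a systemd drop-in. The sketch below only prints the drop-in content; the function name `gpu_elastic_dropin` and the file name `override.conf` are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: prints a systemd drop-in that sets the cluster URL
# ($1) for the service. Assumes gpu_elastic.py reads GPU_METRICS_CLUSTER_URL.
gpu_elastic_dropin() {
    printf '[Service]\nEnvironment=GPU_METRICS_CLUSTER_URL=%s\n' "$1"
}

# Possible usage (needs root, so it is left commented out):
#   sudo mkdir -p /etc/systemd/system/gpu_elastic.service.d
#   gpu_elastic_dropin 'http://elasticURL:9200' | \
#       sudo tee /etc/systemd/system/gpu_elastic.service.d/override.conf
```

A drop-in takes effect after `sudo systemctl daemon-reload` and a restart of the service.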

Set permissions on the service file and the script:

sudo chmod 664 /etc/systemd/system/gpu_elastic.service
sudo chmod +x /data/scripts/gpu/gpu_elastic.py

Register and Start the Service:

sudo systemctl daemon-reload
sudo systemctl enable gpu_elastic.service
sudo systemctl start gpu_elastic.service

To view the status and logs, run:

sudo systemctl status gpu_elastic.service -l
sudo journalctl -u gpu_elastic.service -xn -l
sudo journalctl -u gpu_elastic.service
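
The status and log commands above can be combined into a scripted check, for example for a cron job or a monitoring probe. This is a minimal sketch: the function name `check_gpu_service` and the `SYSTEMCTL` and `JOURNALCTL` override variables are hypothetical and exist only so the commands can be swapped out for testing.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a unit is active and prints the last
# 20 journal lines to stderr when it is not.
# SYSTEMCTL and JOURNALCTL default to the real commands.
check_gpu_service() {
    unit=${1:-gpu_elastic.service}
    if "${SYSTEMCTL:-systemctl}" is-active --quiet "$unit"; then
        echo "$unit is active"
    else
        echo "$unit is not active; recent log lines:" >&2
        "${JOURNALCTL:-journalctl}" -u "$unit" -n 20 --no-pager >&2
        return 1
    fi
}
```

Running `check_gpu_service` with no argument checks `gpu_elastic.service`. Reading the journal may require root or membership in a group with journal access.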
