@ai-ml/nvidia-container-toolkit

Project ID: 97743

Description

Warning: This is not guaranteed to work, be useful in any way, or not eat your machine.

Installation Instructions

Set up the NVIDIA kernel module (kmod) via RPM Fusion. Ensure it is working before continuing.

Install the CUDA libraries:

$ sudo dnf install xorg-x11-drv-nvidia-cuda

Set up the COPR repository:

$ sudo dnf copr enable @ai-ml/nvidia-container-toolkit

Install the NVIDIA container toolkit:

$ sudo dnf install nvidia-container-toolkit nvidia-container-toolkit-selinux

Set up the CDI configuration:

$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
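
Before testing with a container, it can help to confirm the spec was actually generated; `nvidia-ctk cdi list` prints the device names the spec exposes. A sketch, guarded so it degrades gracefully on machines without the toolkit installed:

```shell
# Confirm the generated spec enumerates your GPU(s); on a working setup this
# prints device names such as nvidia.com/gpu=0 and nvidia.com/gpu=all.
if command -v nvidia-ctk >/dev/null 2>&1; then
    nvidia-ctk cdi list
    cdi_check="ran"
else
    echo "nvidia-ctk not installed; skipping CDI check"
    cdi_check="skipped"
fi
```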

Test that everything is working:

$ podman run --device nvidia.com/gpu=all --rm fedora:latest nvidia-smi

If everything is working, you should see something like:

Fri Mar 22 03:18:29 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti    Off  |   00000000:08:00.0  On |                  N/A |
|  0%   46C    P8             38W /  350W |     482MiB /  12288MiB |     12%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
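
If you only need the headline numbers out of that banner, the driver and CUDA versions can be scraped with sed. A sketch, fed with the sample header line from the output above (in practice, pipe real `nvidia-smi` output in):

```shell
# Scrape the headline versions out of the nvidia-smi banner. The sample line
# below is taken from the output above.
banner='| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |'
driver=$(printf '%s\n' "$banner" | sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p')
cuda=$(printf '%s\n' "$banner" | sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p')
echo "driver=$driver cuda=$cuda"   # driver=550.67 cuda=12.4
```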

Test the Difference Between CPU and GPU Performance

Download a PyTorch environment and ensure it runs:

$ podman run --device nvidia.com/gpu=all --rm docker.io/pytorch/pytorch:latest

Test CPU:

$ time podman run --device nvidia.com/gpu=all --rm \
    docker.io/pytorch/pytorch:latest bash -c \
    'pip install pytorch-benchmark && python -c "import torch; import json; from torchvision.models import efficientnet_b0; from pytorch_benchmark import benchmark; model = efficientnet_b0().to(\"cpu\"); sample = torch.randn(8, 3, 224, 224); results = benchmark(model, sample, num_runs=200); print(json.dumps(results, indent=4))"'

Test GPU:

$ time podman run --device nvidia.com/gpu=all --rm \
    docker.io/pytorch/pytorch:latest bash -c \
    'pip install pytorch-benchmark && python -c "import torch; import json; from torchvision.models import efficientnet_b0; from pytorch_benchmark import benchmark; model = efficientnet_b0().to(\"cuda\"); sample = torch.randn(8, 3, 224, 224); results = benchmark(model, sample, num_runs=200); print(json.dumps(results, indent=4))"'
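
The two `time` readings can be reduced to a single speedup figure. A quick sketch with placeholder wall-clock times (substitute the `real` values from your own runs):

```shell
# Placeholder wall-clock times; substitute the `real` values from your two
# `time` runs above.
cpu_secs=142.3
gpu_secs=58.7
# Shell arithmetic is integer-only, so use awk for the floating-point ratio.
awk -v c="$cpu_secs" -v g="$gpu_secs" 'BEGIN { printf "speedup: %.2fx\n", c / g }'
```

With these placeholder values the sketch prints `speedup: 2.42x`.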

Test SELinux Policy

Test whether the shipped NVIDIA DGX SELinux policy is working (the nvidia-container-toolkit-selinux package enables container_use_devices by default):

$ sudo setsebool container_use_devices 0
$ podman run --device nvidia.com/gpu=all \
    --security-opt label=type:nvidia_container_t \
    --rm fedora:latest nvidia-smi

Re-run the earlier tests, then reset the SELinux boolean:

$ sudo setsebool container_use_devices 1
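
The toggle/test/restore sequence above can be wrapped in one sketch that always restores the boolean, even if the container test fails. It assumes an SELinux host in enforcing mode with the toolkit installed, and skips otherwise:

```shell
# A sketch of the full SELinux test cycle; assumes an SELinux host in
# enforcing mode with the toolkit installed, and skips otherwise.
if command -v setsebool >/dev/null 2>&1 && [ "$(getenforce 2>/dev/null)" = "Enforcing" ]; then
    sudo setsebool container_use_devices 0
    # Restore the boolean on exit, even if the container test fails.
    trap 'sudo setsebool container_use_devices 1 || true' EXIT
    podman run --device nvidia.com/gpu=all \
        --security-opt label=type:nvidia_container_t \
        --rm fedora:latest nvidia-smi
    selinux_test="ran"
else
    echo "SELinux not enforcing or tools missing; skipping"
    selinux_test="skipped"
fi
```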

Monitoring CPU vs GPU Usage

Use the nvtop utility to monitor the load on your CPU and GPU:

$ sudo dnf install nvtop
$ nvtop

Something Fun: InstructLab (ilab)

Use InstructLab to run a pre-trained Large Language Model (LLM) and chat with it.

Clone InstructLab and build the container:

$ git clone https://github.com/instructlab/instructlab.git ~/instructlab
$ podman build ~/instructlab/containers/cuda -t instructlab
$ podman run -it --rm --device nvidia.com/gpu=all instructlab

Initialize InstructLab inside the container and start chatting with the LLM:

$ ilab init
$ ilab download
$ ilab chat

Please note that the models, config, etc. are downloaded into the container and will be deleted when the container stops.
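
If you want the models and config to survive container restarts, a host directory can be bind-mounted over InstructLab's data directory. A sketch; the in-container path below is an assumption to verify against your ilab version:

```shell
# Sketch: keep InstructLab's models and config across runs by bind-mounting a
# host directory. The in-container path /root/.local/share/instructlab is an
# assumption; check where your ilab version actually stores its data.
if command -v podman >/dev/null 2>&1 && podman image exists instructlab 2>/dev/null; then
    mkdir -p ~/instructlab-data
    podman run -it --rm --device nvidia.com/gpu=all \
        -v ~/instructlab-data:/root/.local/share/instructlab:Z \
        instructlab
    ilab_persist="ran"
else
    echo "instructlab image not built yet; skipping"
    ilab_persist="skipped"
fi
```

The `:Z` suffix relabels the host directory so SELinux allows the container to write to it.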

Something Fun: Image Generation

Use imaginAIry to generate prompt-based images using multiple models and the StableStudio web UI.

Build a Fedora 40 based container with imaginAIry installed:

$ mkdir ~/imaginary
$ tee ~/imaginary/Dockerfile << EOF
FROM registry.fedoraproject.org/fedora:40
RUN dnf -y install python3-pip rust cargo openssl-devel gcc-c++ libglvnd-glx
RUN pip install imaginairy && pip check
EXPOSE 8000
CMD ["aimg", "server"]
EOF
$ podman build ~/imaginary -t fedora-imaginairy

Run the server:

$ podman run --device nvidia.com/gpu=all \
    -p 8000:8000 \
    --rm fedora-imaginairy

Visit the web UI at localhost:8000 and start generating images based on prompts.
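
A quick way to confirm the server came up is to probe port 8000 from a second terminal while the container from the previous step is running; a sketch:

```shell
# Probe the imaginAIry server; run this from a second terminal while the
# container from the previous step is running.
if curl -fsS -o /dev/null http://localhost:8000/ 2>/dev/null; then
    server_state="up"
else
    server_state="down"
fi
echo "imaginAIry server is ${server_state}"
```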

Please note that the models are downloaded into the container and will be deleted when the container stops.

Active Releases

The following unofficial repositories are provided as-is by the owner of this project. Contact the owner directly for bugs or issues (i.e., not Bugzilla).

Release         Architectures                   Repo Download
Fedora 38       aarch64 (35)*, x86_64 (59)*     Fedora 38 (0 downloads)
Fedora 39       aarch64 (46)*, x86_64 (169)*    Fedora 39 (83 downloads)
Fedora 40       aarch64 (22)*, x86_64 (194)*    Fedora 40 (69 downloads)
Fedora rawhide  aarch64 (42)*, x86_64 (40)*     Fedora rawhide (35 downloads)

* Total number of downloaded packages.