Installation
Prerequisites
- NVIDIA driver v565+
- Kubernetes v1.32+
- ACP v4.1+
- Cluster administrator access to your ACP cluster
- CDI must be enabled in the underlying container runtime (such as containerd; see Enable CDI).
- DRA and the corresponding API groups must be enabled (see Enable DRA).
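One way to check the DRA prerequisite is to ask the API server which resources it serves under the resource.k8s.io API group. The sketch below assumes kubectl is already configured for the target cluster; the helper name check_dra_api is hypothetical and not part of the product.

```shell
# Hypothetical helper: succeeds if the cluster serves the ResourceSlice kind
# under the resource.k8s.io API group, which indicates that DRA is enabled.
check_dra_api() {
  kubectl api-resources --api-group=resource.k8s.io -o name | grep -q '^resourceslices'
}

# Usage on a machine with cluster access:
#   check_dra_api && echo "DRA API group is served"
```

If the check fails, revisit the Enable DRA instructions before continuing.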
Procedure
Installing the NVIDIA driver on your GPU node
Refer to the installation guide on the official NVIDIA website.
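After the driver is installed, the version requirement from the prerequisites (v565+) can be checked on the GPU node. The following is a minimal sketch; driver_version_ok is a hypothetical helper name, and the version strings used with it are only illustrative.

```shell
# Hypothetical helper: succeeds if a driver version string such as
# "565.57.01" has a major version of at least 565.
driver_version_ok() {
  major="${1%%.*}"   # keep everything before the first dot
  [ "$major" -ge 565 ]
}

# Usage on a GPU node (nvidia-smi reports the installed driver version):
#   driver_version_ok "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" \
#     && echo "driver version is sufficient"
```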
Installing the NVIDIA Container Runtime
Refer to the installation guide for the NVIDIA Container Toolkit.
Downloading the Cluster plugin
The Alauda Build of NVIDIA DRA Driver for GPUs cluster plugin can be retrieved from the Customer Portal.
Please contact Consumer Support for more information.
Uploading the Cluster plugin
For more information on uploading the cluster plugin, refer to Uploading Cluster Plugins.
Installing Alauda Build of NVIDIA DRA Driver for GPUs
- Add the label "nvidia-device-enable=pgpu-dra" to your GPU node so that nvidia-dra-driver-gpu-kubelet-plugin can be scheduled on it.
  Note: On the same node, you can only set one of the following labels: gpu=on, nvidia-device-enable=pgpu, or nvidia-device-enable=pgpu-dra.
- Go to the Administrator -> Marketplace -> Cluster Plugin page, switch to the target cluster, and then deploy the Alauda Build of NVIDIA DRA Driver for GPUs Cluster plugin.
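The labeling step can be sketched with kubectl as follows. The helper names are hypothetical, and the node name passed to them is whatever `kubectl get nodes` reports for your GPU node.

```shell
# Hypothetical helper: label a GPU node so that the DRA kubelet plugin
# is scheduled on it. $1 is the node name.
label_gpu_node_for_dra() {
  kubectl label node "$1" nvidia-device-enable=pgpu-dra
}

# Hypothetical helper: show the values of the "gpu" and "nvidia-device-enable"
# label keys for a node, to confirm that only one of the mutually exclusive
# labels is set.
show_gpu_labels() {
  kubectl get node "$1" -L gpu -L nvidia-device-enable
}

# Usage:
#   label_gpu_node_for_dra <node-name>
#   show_gpu_labels <node-name>
```

If the node already carries gpu=on or nvidia-device-enable=pgpu, remove or replace that label first, since only one of the three may be set on a node.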
Verify DRA setup
- Check DRA driver and DRA controller pods:
  You should get results similar to:
- Verify ResourceSlice objects:
  For GPU nodes, you should see output similar to:
- Deploy workloads with DRA.
  Note: Fill in the selector field of the following ResourceClaimTemplate resource according to your specific GPU model. You can use Common Expression Language (CEL) to select devices based on specific attributes.
  Create spec file:
Apply spec:
  Obtain the output of the container in the pod:
  The output is expected to show the GPU UUID as seen from inside the container.
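The verification steps above can be sketched as the following shell snippet. It is an illustration under several assumptions, not the product's reference manifest: the resource.k8s.io/v1beta1 API version matches Kubernetes v1.32 (newer clusters may serve a later version), the DeviceClass is assumed to be named gpu.nvidia.com, the productName attribute in the CEL expression should be checked against the attributes in your own ResourceSlice objects, and the file, pod, and template names (dra-gpu-test.yaml, dra-gpu-test, single-gpu) as well as the helper function names are hypothetical.

```shell
# Steps 1 and 2: list the driver pods and the ResourceSlice objects.
# The grep pattern is taken from the kubelet plugin name used on this page.
list_dra_pods() {
  kubectl get pods -A -o wide | grep nvidia-dra-driver-gpu
}
list_resource_slices() {
  kubectl get resourceslices
}

# Step 3: write a spec file with a ResourceClaimTemplate and a pod that uses it.
# Replace the placeholder in the CEL expression with your GPU's product name.
cat > dra-gpu-test.yaml <<'EOF'
apiVersion: resource.k8s.io/v1beta1   # assumption: matches Kubernetes v1.32
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com   # assumption: default DeviceClass name
        selectors:
        - cel:
            expression: 'device.attributes["gpu.nvidia.com"].productName == "REPLACE-WITH-YOUR-GPU-PRODUCT-NAME"'
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: ctr
    image: ubuntu:22.04
    command: ["bash", "-c", "nvidia-smi -L"]   # prints the visible GPUs with their UUIDs
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
EOF

# Apply the spec, wait for the pod to finish, then print the container output.
run_dra_test() {
  kubectl apply -f dra-gpu-test.yaml &&
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/dra-gpu-test --timeout=120s &&
  kubectl logs dra-gpu-test
}

# Usage:
#   list_dra_pods
#   list_resource_slices
#   run_dra_test
```

The pod runs once and exits, so waiting for the Succeeded phase before reading the logs avoids querying a container that has not started yet.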