NvRmGpuLibOpen failure in NVIDIA Docker on a Yocto image

From RidgeRun Developer Wiki



Introduction

Docker containers are platform-agnostic, but they are also hardware-agnostic. This is a problem when using specialized hardware such as NVIDIA GPUs, which require kernel modules and user-level libraries to operate; as a result, Docker does not natively support NVIDIA GPUs within containers. nvidia-docker is essentially a wrapper around the docker command that transparently provisions a container with the components needed to execute code on the GPU. It is strictly required only when using nvidia-docker run to execute a container that uses GPUs. (Information obtained from the NVIDIA Developer documentation.)

Overview

After adding the NVIDIA Docker recipe to the Yocto build:

IMAGE_INSTALL:append = " nvidia-docker"

To verify that Docker can create a container with GPU support, the following command was executed.

docker run --rm --runtime=nvidia --gpus all ubuntu:22.04 bash -lc 'echo NVIDIA_RUNTIME_OK'

However, it returns the following error:

root@hadron-gmsl:~# docker run --rm --runtime=nvidia --gpus all ubuntu:22.04 bash -lc 'echo NVIDIA_RUNTIME_OK'
[ 1894.040552] docker0: port 1(vethaf61436) entered blocking state
[ 1894.040562] docker0: port 1(vethaf61436) entered disabled state
[ 1894.040701] device vethaf61436 entered promiscuous mode
[ 1894.040743] kauditd_printk_skb: 10 callbacks suppressed
[ 1894.040747] audit: type=1700 audit(1772658752.001:117): dev=vethaf61436 prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
[ 1894.040852] audit: type=1300 audit(1772658752.001:117): arch=c00000b7 syscall=206 success=yes exit=40 a0=c a1=40011786f0 a2=28 a3=0 items=0 ppid=1 pid=1026 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="dockerd" exe="/usr/bin/dockerd" subj=kernel key=(null)
[ 1894.040858] audit: type=1327 audit(1772658752.001:117): proctitle=2F7573722F62696E2F646F636B657264002D480066643A2F2F002D2D636F6E7461696E6572643D2F72756E2F636F6E7461696E6572642F636F6E7461696E6572642E736F636B
[ 1894.389813] audit: type=1334 audit(1772658752.350:118): prog-id=43 op=LOAD
[ 1894.390398] audit: type=1334 audit(1772658752.350:119): prog-id=44 op=LOAD
[ 1894.390420] audit: type=1300 audit(1772658752.350:119): arch=c00000b7 syscall=280 success=yes exit=16 a0=5 a1=400018d8b0 a2=78 a3=0 items=0 ppid=1764 pid=1775 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="docker-runc" exe="/usr/bin/runc" subj=kernel key=(null)
[ 1894.390434] audit: type=1327 audit(1772658752.350:119): proctitle=2F7573722F62696E2F646F636B65722D72756E63002D2D726F6F74002F7661722F72756E2F646F636B65722F72756E74696D652D72756E632F6D6F6279002D2D6C6F67002F72756E2F636F6E7461696E6572642F696F2E636F6E7461696E6572642E72756E74696D652E76322E7461736B2F6D6F62792F643238613062643935
[ 1894.390505] audit: type=1334 audit(1772658752.350:120): prog-id=45 op=LOAD
[ 1894.390516] audit: type=1300 audit(1772658752.350:120): arch=c00000b7 syscall=280 success=yes exit=18 a0=5 a1=400018d640 a2=78 a3=0 items=0 ppid=1764 pid=1775 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="docker-runc" exe="/usr/bin/runc" subj=kernel key=(null)
[ 1894.390520] audit: type=1327 audit(1772658752.350:120): proctitle=2F7573722F62696E2F646F636B65722D72756E63002D2D726F6F74002F7661722F72756E2F646F636B65722F72756E74696D652D72756E632F6D6F6279002D2D6C6F67002F72756E2F636F6E7461696E6572642F696F2E636F6E7461696E6572642E72756E74696D652E76322E7461736B2F6D6F62792F643238613062643935
RmDeInit completed successfully
RmDeInit completed successfully
[ 1895.021925] docker0: port 1(vethaf61436) entered disabled state
[ 1895.023030] device vethaf61436 left promiscuous mode
[ 1895.023050] docker0: port 1(vethaf61436) entered disabled state
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: NvRmMemInitNvmap failed with No such file or directory
356: Memory Manager Not supported



****NvRmMemMgrInit failed**** error type: 196626


libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626
NvRmMemInitNvmap failed with No such file or directory
356: Memory Manager Not supported



****NvRmMemMgrInit failed**** error type: 196626


RmDeInit completed successfully
libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626
NvRmMemInitNvmap failed with No such file or directory
356: Memory Manager Not supported



****NvRmMemMgrInit failed**** error type: 196626


libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626
Auto-detected mode as 'legacy'
NvRmMemInitNvmap failed with No such file or directory
356: Memory Manager Not supported



****NvRmMemMgrInit failed**** error type: 196626


libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626
NvRmMemInitNvmap failed with No such file or directory
356: Memory Manager Not supported



****NvRmMemMgrInit failed**** error type: 196626


libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626
NvRmMemInitNvmap failed with No such file or directory
356: Memory Manager Not supported



****NvRmMemMgrInit failed**** error type: 196626


libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626
nvidia-container-cli: detection error: nvml error: unknown error: unknown.

The error shows the system failing in NvRmMemInitNvmap, the call that initializes the NVMap memory manager in the NVIDIA Resource Manager. In simpler terms, it is the initialization of the NVIDIA memory allocator (nvmap). This component is a kernel-level driver used on NVIDIA Tegra (Jetson) platforms to manage shared memory between the CPU, GPU, and other hardware engines.

This issue occurs because the nvmap device node is not present on the board:

root@hadron-gmsl:~# ls -l /dev/nvmap
ls: /dev/nvmap: No such file or directory
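The check above can be scripted as a quick diagnostic that is safe to run on any Linux system (a sketch; the NVMAP_STATUS variable name is only for illustration):

```shell
# Report whether the nvmap character device needed by the NVIDIA
# container hooks is present on this board.
if [ -e /dev/nvmap ]; then
    NVMAP_STATUS="present"
else
    NVMAP_STATUS="missing"
fi
echo "nvmap device: $NVMAP_STATUS"
```

On an affected board this prints "nvmap device: missing", matching the ls output above.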

Solution

It is important to verify that the required device tree overlays are included in the PLUGIN_MANAGER_OVERLAYS variable used during the flashing process.

In the flashvars file, ensure that the PLUGIN_MANAGER_OVERLAYS variable is defined correctly so that it expands to the overlays required by the board configuration:

PLUGIN_MANAGER_OVERLAYS="@PLUGIN_MANAGER_OVERLAYS@"

If this definition is not included, it must be added to the flashvars file.

After confirming that the variable is present in flashvars, the overlays should also be defined in the machine configuration. This configuration can typically be found in a path such as:

meta-<custom-machine>/conf/machine/<machine>.conf

or in the machine configuration corresponding to the Jetson platform being used.

Add the following configuration:

TEGRA_PLUGIN_MANAGER_OVERLAYS = "\
    tegra234-carveouts.dtbo \
    tegra-optee.dtbo \
    tegra234-p3768-0000+p3767-0000-dynamic.dtbo \
"
PLUGIN_MANAGER_OVERLAYS = "${@','.join(d.getVar('TEGRA_PLUGIN_MANAGER_OVERLAYS').split())}"
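The inline Python in the second assignment simply turns the whitespace-separated overlay list into the comma-separated form consumed by the flashing scripts. Outside BitBake, the same transformation can be sketched in plain shell (variable names mirror the recipe, but this is only an illustration):

```shell
TEGRA_PLUGIN_MANAGER_OVERLAYS="tegra234-carveouts.dtbo tegra-optee.dtbo tegra234-p3768-0000+p3767-0000-dynamic.dtbo"
# Equivalent of BitBake's ${@','.join(d.getVar('TEGRA_PLUGIN_MANAGER_OVERLAYS').split())}:
# replace runs of whitespace with single commas.
PLUGIN_MANAGER_OVERLAYS=$(echo "$TEGRA_PLUGIN_MANAGER_OVERLAYS" | tr -s ' ' ',')
echo "$PLUGIN_MANAGER_OVERLAYS"
# tegra234-carveouts.dtbo,tegra-optee.dtbo,tegra234-p3768-0000+p3767-0000-dynamic.dtbo
```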

In JetPack 6 / L4T r36, the device tree used by the kernel is not just the base DTB, but the DTB combined with several overlays applied by the Plugin Manager (typically stored in QSPI and applied by the bootloader). If those overlays are not applied, the resulting device tree may be incomplete—especially regarding reserved memory, carveouts, and security-related memory regions. The nvmap driver relies on this memory layout information to initialize correctly, so if these definitions are missing, nvmap may fail during initialization and /dev/nvmap may not be created.

The required overlays provide key pieces of this configuration. tegra234-carveouts.dtbo defines reserved memory carveouts used by NVIDIA components such as GPU and multimedia engines. tegra-optee.dtbo describes secure memory regions used by OP-TEE, ensuring the secure world memory does not conflict with other reservations. Finally, tegra234-p3768-0000+p3767-0000-dynamic.dtbo contains board-specific dynamic adjustments applied by the plugin manager for the specific SOM and carrier board combination. Together, these overlays ensure the final device tree correctly describes the system memory layout required for nvmap and other platform components to function properly.
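After reflashing, one way to sanity-check that the carveouts overlay actually reached the running kernel is to look for the reserved-memory node in the live device tree (a hedged sketch; exact node names vary between L4T releases and boards, and the DT_RESULT variable is only for illustration):

```shell
# If the carveouts overlay was applied at flash time, the kernel exposes
# a reserved-memory node in the live device tree.
if [ -d /proc/device-tree/reserved-memory ]; then
    DT_RESULT="applied"
    ls /proc/device-tree/reserved-memory
else
    DT_RESULT="absent"
    echo "no /proc/device-tree/reserved-memory node found"
fi
```

Once the node is present, /dev/nvmap should appear again and the docker run verification command shown earlier should print NVIDIA_RUNTIME_OK instead of the NvRmGpuLibOpen error.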



For direct inquiries, please refer to the contact information available on our Contact page. Alternatively, you may complete and submit the form provided at the same link. We will respond to your request at our earliest opportunity.


Links to RidgeRun Resources and RidgeRun Artificial Intelligence Solutions can be found in the footer below.