NVIDIA Jetson TX2 - Argus vs V4L2 Latency Analysis

From RidgeRun Developer Wiki


Introduction to NVIDIA Jetson TX2 - Argus vs V4L2 Latency Analysis

This wiki shows a practical example of measuring latency when capturing with libargus and V4L2. It assumes that the reader is familiar with the Jetson TX2 VI (Video Input) concepts outlined in NVIDIA Jetson TX2 - Video Input Timing Concepts and with the measurement techniques proposed in NVIDIA Jetson TX2 - VI Latency Measurement Techniques.

Jetson TX2 latency measurement

Experiment setup

The experiment used an OV5693 sensor capturing at 1280x720@120 fps. Each captured frame was saved as metadata_timestamp-system_timestamp.jpg, where metadata_timestamp is the timestamp extracted with getSensorTimestamp() and system_timestamp is the system timestamp (clock monotonic) taken right after the frame metadata is retrieved.
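The two timestamps can be recovered from a frame's file name with a small helper. The following is a minimal sketch, assuming both fields are decimal nanosecond counts separated by a single hyphen; the function name parse_frame_name and the sample values are hypothetical and not part of the original tooling.

```python
from pathlib import Path
from typing import Tuple


def parse_frame_name(path: str) -> Tuple[int, int]:
    """Split '<metadata_timestamp>-<system_timestamp>.jpg' into two integers.

    Assumption: both fields are plain decimal integers (nanoseconds).
    """
    stem = Path(path).stem  # drop the directory and the .jpg extension
    metadata_str, system_str = stem.split("-", 1)
    return int(metadata_str), int(system_str)


# Made-up values, for illustration only.
t1, t2 = parse_frame_name("frames/1000-2500.jpg")
```

Here t1 is the metadata timestamp and t2 is the system timestamp of that frame.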


During camera capture, a kernel module captures the system timestamp (clock monotonic) and then turns on a GPIO (initially off) that drives an LED placed within the camera's field of view. The timestamp is printed only after the GPIO has been turned on, to avoid adding delay between capturing the timestamp and turning on the LED.


The board was power-cycled for every iteration of the experiment.


The first frame in which the LED appears lit carries the metadata timestamp (t1) and the system timestamp (t2) in its file name, so it can be associated with the timestamp printed by the kernel module (t0), as shown in Figure 1. In summary:

  • t0 is the system timestamp acquired in the kernel module when turning on the LED.
  • t1 is the metadata timestamp, that is, the timestamp obtained with getSensorTimestamp() and coming out of Argus (from either the ISP or the RTCPU).
  • t2 is the system timestamp obtained within the Argus application right at the moment t1 is extracted.


Before performing the test, the following assumptions were made:

  • t1 is a timestamp set during ISP processing, after image capture. It should be greater than the RTCPU timestamp (frame acquisition time).
  • t1 should be greater than t0, since it includes post-capture processing.
  • t0 - t1 should always be negative.
  • t2 - t0 should always be positive and measure the total time from sensor integration until the frame is acquired at the user-space level.
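These assumptions can be checked mechanically for each measurement. The sketch below is illustrative only: the function name violated_assumptions is hypothetical, and it assumes t0, t1 and t2 are integer nanosecond values that can be compared directly.

```python
from typing import List


def violated_assumptions(t0: int, t1: int, t2: int) -> List[str]:
    """Return the initial assumptions that a (t0, t1, t2) triple breaks."""
    violations = []
    if not t1 > t0:
        # Equivalent to requiring t0 - t1 < 0.
        violations.append("t1 should be greater than t0")
    if not t2 - t0 > 0:
        violations.append("t2 - t0 should be positive")
    return violations
```

An empty list means the measurement is consistent with the assumptions. A non-empty list flags a triple that contradicts them, which may hint that the timestamps do not share a clock source.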


Figure 1: Latency measurement setup

The values of the deltas were computed as described below:

  • delta0: t0 - t1 (ns)
  • delta1: t1 - t2 (ns)
  • delta2: t0 - t2 (ns)
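The delta definitions above translate directly into code. This is a minimal sketch, assuming all three timestamps are integers in nanoseconds; the names Deltas and compute_deltas are hypothetical.

```python
from typing import NamedTuple


class Deltas(NamedTuple):
    delta0: int  # t0 - t1 (ns)
    delta1: int  # t1 - t2 (ns)
    delta2: int  # t0 - t2 (ns)


def compute_deltas(t0: int, t1: int, t2: int) -> Deltas:
    """Compute the three deltas exactly as listed, keeping their sign."""
    return Deltas(delta0=t0 - t1, delta1=t1 - t2, delta2=t0 - t2)


# Made-up timestamps, for illustration only.
d = compute_deltas(t0=100, t1=250, t2=400)
```

By construction delta2 equals delta0 + delta1, which gives a quick sanity check on any recorded triple.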


Some of the initial assumptions had to be revisited because the frame timestamps were inconsistent, which suggests that different clock sources may be used to timestamp the frame as it passes through the different modules of the capture interface (starting at the VI and ending at the user-space application).

Description of the experiment

The experiment consisted of enabling tracing and using the RTCPU clock as the reference for the LED timestamp (t0) and the frame timestamp (t1), without system clock offset adjustment, so that the VI tracing information could be associated with the frame of interest (the one in which the LED turns on). To extract the frame ID, the getCaptureId() function turned out to be misleading; instead, the value was read directly from status->frame in the vi_notify.c file.


Some additional VI concepts to keep in mind:

  • According to the Parker Technical Reference Manual, the ISP is physically separate from the VI and works in a different clock domain, often running faster, so the VI must buffer data for the ISP. The manual also mentions a FIFO in the ISP that stores a small number of VI packets.
  • According to the L4T Multimedia API reference, getSensorTimestamp() (t1) returns the time at which the first data from the capture arrives from the sensor.
  • According to NVIDIA, getSensorTimestamp() (t1) returns the software shutter event time.
  • The VI has a notification engine that logs events timestamped with the RTCPU clock; these events can be accessed from the CPU by enabling tracing. There are four events of interest for this experiment:
  1. ISPBUF_FS: Occurs when the Frame Start event that was detected in CHANSEL reaches the ISPBUF.
  2. CHANSEL_PXL_SOF: First pixel of frame.
  3. CHANSEL_PXL_EOF: Last pixel of frame.
  4. ISPBUF_FE: Occurs when the Frame End event that was detected in CHANSEL reaches the ISPBUF.

Results

V4L2 path vs ISP path timing comparison

An additional delta, delta3, is defined as the time elapsed from the ATOMP_FE event (V4L2 case) or the ISPBUF_FE event (Argus case) to the moment when the frame data is available in user space. With this delta, it should be possible to approximate the delay of the ISP path with respect to the V4L2 path.

Finally, delta4 is defined as the time elapsed from the PXL_SOF event to the ATOMP_FE event (V4L2 case) or the ISPBUF_FE event (Argus case). It gives an idea of the time between the arrival of the first pixel packet at the VI and the moment the frame data has finished arriving at the ISP (Argus case) or at ATOMP (V4L2 case).
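The two extra deltas can be sketched as follows. This is an illustration under stated assumptions, not the tooling used in the experiment: the events dictionary, the path_deltas function and the sample values are hypothetical, and all timestamps are assumed to have already been converted to a common clock, in nanoseconds.

```python
from typing import Dict, Tuple

# Frame-end event that closes each capture path, as described above.
FRAME_END_EVENT = {"argus": "ISPBUF_FE", "v4l2": "ATOMP_FE"}


def path_deltas(events: Dict[str, int], userspace_ts: int, path: str) -> Tuple[int, int]:
    """Return (delta3, delta4) in ns for one frame.

    events maps a VI event name to its timestamp; userspace_ts is the time
    at which the frame data became available in user space.
    """
    frame_end = events[FRAME_END_EVENT[path]]
    delta3 = userspace_ts - frame_end              # frame end -> user space
    delta4 = frame_end - events["CHANSEL_PXL_SOF"]  # first pixel -> frame end
    return delta3, delta4


# Made-up values, for illustration only.
sample_events = {"CHANSEL_PXL_SOF": 1_000, "ISPBUF_FE": 25_000}
delta3, delta4 = path_deltas(sample_events, userspace_ts=40_000, path="argus")
```

Selecting the frame-end event by path keeps the same computation usable for both Argus and V4L2 captures.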

A total of 10 iterations of the experiment were done with a resolution of 2592x1458@30fps. The average results are summarized below:

  • Summary of Argus results

Delta     Value (ms)
delta1    40.9
delta2    44.2
delta3    16.4
delta4    24.5

  • Summary of V4L2 results

Delta     Value (ms)
delta1    34.4
delta2    48.2
delta3    9.9
delta4    24.5
Conclusion

The difference between the delta3 values of the two paths is approximately 6.5 ms, which may correspond to the actual ISP latency. The latency added before ISP processing is 24.5 ms.
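The 6.5 ms figure follows from the averaged delta3 values reported in the results. A short check, using only those two numbers (the variable names are illustrative):

```python
# Averaged delta3 values from the results summary, in milliseconds.
ARGUS_DELTA3_MS = 16.4
V4L2_DELTA3_MS = 9.9

# Extra time the Argus (ISP) path takes after the frame-end event.
isp_latency_estimate_ms = round(ARGUS_DELTA3_MS - V4L2_DELTA3_MS, 1)
print(isp_latency_estimate_ms)  # 6.5
```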

Related links

  1. NVIDIA Jetson TX2 - Video Input Timing Concepts
  2. NVIDIA Jetson TX2 - VI Latency Measurement Techniques

