NVIDIA Jetson TX2 - Video Input Timing Concepts
This wiki explains the main concepts needed to understand the inner workings of the Jetson TX2 video input system, with a focus on camera image capture latency. To read more about latency measurement techniques, see NVIDIA Jetson TX2 - VI Latency Measurement Techniques.
Jetson TX2 Video Input
Also called VI or VI4, it is the module in charge of capturing the pixel data provided by the CSI and routing it to main memory, where it can be accessed by user space applications. The Parker technical reference manual [1] dedicates chapter 27 to the VI.
Latency Relevant Blocks Description
Figure 1 shows the top level diagram of the VI.
For latency evaluation purposes, the CHANSEL, ATOMP, NOTIFY, RTCPU, ISP and ISPBUF blocks are important.
CHANSEL
The VI has 12 channels. A channel is a virtual pipeline on the VI. The CHANSEL block is responsible for assigning each pixel packet to the corresponding channel, and it also keeps a per-frame state for each channel. It sends notifications to the NOTIFY block on certain channel events. For latency, we are interested in the following events:
- CHANSEL_PXL_SOF: occurs at the beginning of channel-delimited frames.
- CHANSEL_PXL_EOF: occurs at the end of channel-delimited frames.
- CHANSEL_LOAD_FRAMED: [1] states that this event is emitted when a LOAD command is received for a channel while that channel is currently in a frame. In addition, [2] mentions that this event is emitted to process the frame, and [3] mentions that this event means that the frame has been loaded by the VI engine.
ATOMP
The memory atom packer. It takes pixel packets and produces memory requests. It also sends the write acknowledgments for end of frame and end of line events to NOTIFY. This block triggers the following events:
- ATOMP_FS: occurs when the NVCSI (the TX2 CSI-2 host implementation) FS that was matched in CHANSEL for a channel reaches the ATOMP.
- ATOMP_FE: occurs when the NVCSI FE that was matched in CHANSEL for a channel reaches the ATOMP.
NOTIFY
The notification engine used throughout the VI. It receives event packets and passes them to the real-time CPU (RTCPU), where they get timestamped. Each event packet carries its corresponding channel number and frame number.
RTCPU
Receives notifications from various sources and timestamps them. These timestamped notifications are then enqueued into an internal buffer that is accessible to other processors through DMA. The RTCPU uses the TSC clock time to set the timestamps. More information on the TSC clock can be found in the Timestamping System Clock section of this wiki. According to [1], the RTCPU is a Cortex-R5 camera processor.
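For analysis scripts, each timestamped notification can be modelled as a small record. The sketch below is illustrative only: the `ViEvent` class, its field names, the nanosecond unit, and the sample values are assumptions made for this example and do not reflect the actual layout of the RTCPU buffer. It groups events by channel and frame number, the two identifiers that each event packet carries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ViEvent:
    """Hypothetical record for one timestamped VI notification."""
    tag: str       # event name, e.g. "CHANSEL_PXL_SOF"
    channel: int   # VI channel number (the TX2 VI has 12 channels)
    frame: int     # frame number carried by the event packet
    tsc_ns: int    # RTCPU timestamp, assumed here to be in nanoseconds


def group_by_frame(events):
    """Group events by (channel, frame), sorted by timestamp within each group."""
    groups = defaultdict(list)
    for ev in events:
        groups[(ev.channel, ev.frame)].append(ev)
    return {key: sorted(evs, key=lambda e: e.tsc_ns) for key, evs in groups.items()}


# Made-up sample values, for illustration only.
sample_events = [
    ViEvent("CHANSEL_PXL_EOF", channel=0, frame=1, tsc_ns=33_000_000),
    ViEvent("CHANSEL_PXL_SOF", channel=0, frame=1, tsc_ns=1_000_000),
    ViEvent("CHANSEL_PXL_SOF", channel=0, frame=2, tsc_ns=34_000_000),
]
frames = group_by_frame(sample_events)
```

Keying on the (channel, frame) pair keeps events from concurrent channels apart, which matters when more than one sensor is streaming.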
ISP
The image signal processor, also called ISP or ISP4, can be used to perform debayering and/or image correction operations. This block can be bypassed depending on the capture mechanism used. According to [1], the ISP is physically distant from the VI and runs on a different clock domain (the ISP clock runs faster), so the VI has to buffer the data using the ISPBUF.
ISPBUF
Buffers the pixel data from the VI to the ISP. This block emits two events to NOTIFY:
- ISPBUF_FS: occurs when the Frame Start event that was detected in CHANSEL reaches the ISPBUF.
- ISPBUF_FE: occurs when the Frame End event that was detected in CHANSEL reaches the ISPBUF.
In addition to the blocks described above, the memory controller is very important when the ISP is bypassed.
Memory Controller (MC)
Runs on a different clock domain from the VI, and is the module through which the VI writes formatted pixel data to main memory.
Timestamping System Clock
The clock used to timestamp the events emitted by the VI is the TSC (TimeStamp Counter). According to [4], the TSC is started during boot, around 7 seconds before the kernel starts. It is often desirable to measure the latency between the start of frame event on the VI and the moment when the frame is available to a userspace application.
From userspace, applications usually obtain the current system time by calling clock_gettime() with the desired clock as an argument [5]. Since the events generated on the VI are timestamped using TSC, it is a good idea to use the same clock to timestamp the moment when the frame becomes available in userspace. According to [4], clock_gettime() with CLOCK_MONOTONIC should be used for this purpose, since CLOCK_MONOTONIC is based on TSC and runs at the same rate. However, there is a known issue with this approach [6]: because TSC starts before the kernel boots and CLOCK_MONOTONIC starts during kernel boot, CLOCK_MONOTONIC is usually behind TSC by around 5 s, and this offset changes from one boot to the next. This issue has been observed on JetPack 3.2.1, and a workaround was added in JetPack 4.2 that adjusts the CHANSEL_PXL_SOF timestamp to be in sync with CLOCK_MONOTONIC.
Since this offset adjustment workaround might add some error to the latency estimation, an ideal measurement technique would read the TSC directly from userspace and avoid any adjustment.
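The effect of the offset on a latency measurement can be made concrete with a little arithmetic. The sketch below assumes that the per-boot offset (TSC minus CLOCK_MONOTONIC) is already known and that all timestamps are expressed in nanoseconds; how the offset is obtained is platform-specific and not shown. The function names and the numbers are hypothetical.

```python
def tsc_to_monotonic_ns(tsc_ns, offset_ns):
    """Map a TSC timestamp to the CLOCK_MONOTONIC domain.

    Assumes both values are in nanoseconds and that offset_ns is
    (TSC - CLOCK_MONOTONIC) for the current boot.
    """
    return tsc_ns - offset_ns


def latency_ns(sof_tsc_ns, userspace_mono_ns, offset_ns):
    """Latency from the start of frame event to frame arrival in userspace."""
    return userspace_mono_ns - tsc_to_monotonic_ns(sof_tsc_ns, offset_ns)


# Made-up numbers: TSC is 5 s ahead of CLOCK_MONOTONIC on this boot.
offset = 5_000_000_000
sof_tsc = 105_000_000_000        # start of frame timestamp, TSC domain
arrival_mono = 100_040_000_000   # userspace timestamp, monotonic domain
example_latency = latency_ns(sof_tsc, arrival_mono, offset)  # 40 ms in ns
```

Any error in `offset_ns` is added directly to the computed latency, which is why reading both timestamps from the same clock is preferable when it is possible.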
More information on TSC can be found in [1] chapter 7.
Capture Paths
Depending on the mechanism used to capture images, the path of the pixel data is different. There are two paths often used:
Path with ISP
- CSI->VI->ISP->MC: the VI passes the image data to the image signal processor, which in turn passes it to the memory controller, which writes the data into main memory. This path is used when an application uses libargus, by the nvgstcapture application, and by GStreamer pipelines with nvcamerasrc or nvarguscamerasrc. Traces for this path show CHANSEL and ISPBUF notifications.
Path without ISP
- CSI->VI->MC: the VI passes the image data directly to the memory controller, and from there it can be accessed in main memory. As the ISP is bypassed on this path, the image reaches main memory as it comes from the sensor, without further processing.
This path is used when the capture is started with v4l2-ctl, a V4L2 application, or a GStreamer pipeline with v4l2src.
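When writing trace analysis scripts, it can help to encode which notifications to expect for a given capture mechanism. The lookup below is a minimal sketch that only restates the mapping described in this section; the dictionary and function names are hypothetical.

```python
# Capture mechanisms named in this section, mapped to the path they use.
CAPTURE_PATHS = {
    "libargus": "CSI->VI->ISP->MC",
    "nvgstcapture": "CSI->VI->ISP->MC",
    "nvcamerasrc": "CSI->VI->ISP->MC",
    "nvarguscamerasrc": "CSI->VI->ISP->MC",
    "v4l2-ctl": "CSI->VI->MC",
    "v4l2src": "CSI->VI->MC",
}

# Notification families that show up in the traces of each path.
EXPECTED_NOTIFICATIONS = {
    "CSI->VI->ISP->MC": ("CHANSEL", "ISPBUF"),
    "CSI->VI->MC": ("CHANSEL", "ATOMP"),
}


def expected_notifications(mechanism):
    """Return the notification families expected for a capture mechanism."""
    return EXPECTED_NOTIFICATIONS[CAPTURE_PATHS[mechanism]]


print(expected_notifications("v4l2src"))
print(expected_notifications("nvarguscamerasrc"))
```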
Events
The path without ISP shows CHANSEL and ATOMP notifications. The order of the events triggered on the VI and reported in the RTCPU traces is the following:
- ATOMP_FS
- CHANSEL_PXL_SOF
- CHANSEL_LOAD_FRAMED
- CHANSEL_PXL_EOF
- ATOMP_FE
It is currently unclear whether CHANSEL_LOAD_FRAMED indicates that the VI has received the frame, since according to [2] the CHANSEL_LOAD_FRAMED event is supposed to be emitted only after an FE event. However, the traces consistently show the order above for successful video capture. A related open question, tied to the actual meaning of CHANSEL_LOAD_FRAMED, is whether CHANSEL_PXL_EOF and ATOMP_FE are generated by the arrival of the last pixel packet or by a timeout, since the delay between the FS and FE events has been observed to be related to the frame rate.
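Both observations can be checked on a per-frame basis once the trace has been parsed. The sketch below assumes events are available as (tag, timestamp) tuples in nanoseconds for a single frame; the helper names, the timestamps, and the 30 fps frame rate are made up for illustration. It verifies the event order listed above and compares the FS to FE delay against the frame period.

```python
EXPECTED_ORDER = [
    "ATOMP_FS",
    "CHANSEL_PXL_SOF",
    "CHANSEL_LOAD_FRAMED",
    "CHANSEL_PXL_EOF",
    "ATOMP_FE",
]


def follows_expected_order(events):
    """events: list of (tag, timestamp_ns) tuples for a single frame."""
    ordered = sorted(events, key=lambda e: e[1])
    return [tag for tag, _ in ordered] == EXPECTED_ORDER


def fs_to_fe_ns(events):
    """Delay between ATOMP_FS and ATOMP_FE for a single frame."""
    stamps = dict(events)
    return stamps["ATOMP_FE"] - stamps["ATOMP_FS"]


# Made-up timestamps for one frame of a hypothetical 30 fps stream.
frame_events = [
    ("ATOMP_FS", 0),
    ("CHANSEL_PXL_SOF", 10_000),
    ("CHANSEL_LOAD_FRAMED", 20_000),
    ("CHANSEL_PXL_EOF", 33_000_000),
    ("ATOMP_FE", 33_010_000),
]
frame_period_ns = 1_000_000_000 // 30
order_ok = follows_expected_order(frame_events)
fs_fe_ratio = fs_to_fe_ns(frame_events) / frame_period_ns
```

A ratio that stays close to 1 while the sensor frame rate is changed would be consistent with the observation that the FS to FE delay follows the frame rate. It does not by itself tell whether the FE events come from the last pixel packet or from a timeout.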
With a V4L2 application, each captured buffer carries a timestamp along with the frame data. The buffer timestamp corresponds to the vi_timestamp of the CHANSEL_PXL_SOF event.
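Given that buffer timestamp, a userspace latency estimate reduces to a subtraction against the current CLOCK_MONOTONIC time. The sketch below assumes the buffer timestamp is delivered as the seconds and microseconds of a V4L2 `struct timeval` and that it is already in the CLOCK_MONOTONIC domain (for example, with the JetPack 4.2 workaround described earlier); otherwise the boot-time offset shows up in the result. The function names are hypothetical.

```python
import time


def timeval_to_ns(tv_sec, tv_usec):
    """Convert a (seconds, microseconds) timestamp to nanoseconds."""
    return tv_sec * 1_000_000_000 + tv_usec * 1_000


def capture_latency_ns(buf_tv_sec, buf_tv_usec, now_ns=None):
    """Time elapsed between the buffer timestamp and 'now'.

    now_ns defaults to the current CLOCK_MONOTONIC reading; it can be
    passed explicitly so the timestamp is taken right after dequeuing.
    """
    if now_ns is None:
        now_ns = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    return now_ns - timeval_to_ns(buf_tv_sec, buf_tv_usec)


# Made-up values: buffer stamped at 1.000000 s, dequeued at 1.040 s.
example = capture_latency_ns(1, 0, now_ns=1_040_000_000)  # 40 ms in ns
```

Taking the `now_ns` reading immediately after the buffer is dequeued keeps application processing time out of the measurement.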
See also
[1] NVIDIA Parker Series SoC Technical Reference Manual. Version v1.0p, June 21, 2017.
[2] https://devtalk.nvidia.com/default/topic/1037809/jetson-tx2/jetpack3-2-1-tx2-csi-mipi-can-only-get-the-the-first-frame-of-image-but-tx1-works-fine-/post/5272711/#5272711
[3] https://devtalk.nvidia.com/default/topic/1066027/jetson-tx2/2-lane-csi-custom-fpga-with-rgb888-output-for-l4t-r28-2-1/post/5398656/#5398656
[4] https://devtalk.nvidia.com/default/topic/1062511/jetson-agx-xavier/clocks-linux-system-time-vs-tsc-vs-rtc/post/5381131/#5381131
[5] https://linux.die.net/man/3/clock_gettime
[6] https://devtalk.nvidia.com/default/topic/1038131/jetson-tx2/vi-capture-system-time-offset/1