GstInference - Supported backends - NCSDK
The Intel® Movidius™ Neural Compute SDK (NCSDK) enables deployment of deep neural networks (DNNs) on compatible devices such as the Intel® Movidius™ Neural Compute Stick. The NCSDK includes a set of software tools to compile, profile, and validate DNNs, as well as C/C++ and Python APIs for application development.
The NCSDK has two general usages:
- Profiling, tuning, and compiling DNN models.
- Prototyping user applications that run accelerated on neural compute device hardware, using the NCAPI.
Installation
You can install the NCSDK directly on a system running Linux, in a Docker container, on a virtual machine, or in a Python virtual environment. All possible installation paths are documented in the official installation guide.
We also provide an installation guide with troubleshooting tips on the Intel Movidius Installation wiki page.
Tools
mvNCCheck
Checks the validity of a Caffe or TensorFlow model on a neural compute device. The check is done by running an inference both on the device and in software, and then comparing the results to determine whether the network passes or fails. This tool works best with image classification networks. You can check all the available options in the official documentation.
For example, let's test the GoogLeNet Caffe model downloaded by the ncappzoo repo:
mvNCCheck -w bvlc_googlenet.caffemodel -i ../../data/images/nps_electric_guitar.png -s 12 -id 546 deploy.prototxt -S 255 -M 110
- -w indicates the weights file
- -i the input image
- -s the number of SHAVE cores to use
- -id the expected label id for the input image (you can find the id for any ImageNet model here)
- -S the scaling size
- -M the mean subtracted from the input after scaling
Most of these parameters are available from the model documentation. The command produces the following result:
Blob generated
USB: Transferring Data...
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Result: (1000,)
1) 546 0.99609
2) 402 0.0038853
3) 420 8.9228e-05
4) 327 0.0
5) 339 0.0
Expected: (1000,)
1) 546 0.99609
2) 402 0.0039177
3) 420 9.0837e-05
4) 889 1.2875e-05
5) 486 5.3644e-06
------------------------------------------------------------
Obtained values
------------------------------------------------------------
Obtained Min Pixel Accuracy: 0.0032552085031056777% (max allowed=2%), Pass
Obtained Average Pixel Accuracy: 7.264380030846951e-06% (max allowed=1%), Pass
Obtained Percentage of wrong values: 0.0% (max allowed=0%), Pass
Obtained Pixel-wise L2 error: 0.00011369892179413199% (max allowed=1%), Pass
Obtained Global Sum Difference: 7.236003875732422e-05
------------------------------------------------------------
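To get an intuition for what these pass/fail metrics measure, the sketch below approximates the kind of element-wise comparison mvNCCheck performs between the device result and the software reference. NCSDK does not document its exact formulas, so the normalization by the peak expected value here is an assumption for illustration only.

```python
def compare_outputs(obtained, expected):
    """Rough approximation of mvNCCheck-style comparison metrics.

    NOTE: the normalization used below is an assumption; NCSDK's
    exact formulas may differ.
    """
    diffs = [abs(o - e) for o, e in zip(obtained, expected)]
    peak = max(abs(e) for e in expected) or 1.0

    return {
        # Worst-case single-element error, as a percentage of the peak value
        "max_error_pct": 100.0 * max(diffs) / peak,
        # Mean element-wise error, as a percentage of the peak value
        "avg_error_pct": 100.0 * sum(diffs) / (len(diffs) * peak),
        # Relative L2 error between the two result vectors
        "l2_error_pct": 100.0 * (sum(d * d for d in diffs) ** 0.5)
                        / (sum(e * e for e in expected) ** 0.5),
        # Sum of absolute differences ("Global Sum Difference")
        "sum_difference": sum(diffs),
    }

# Identical vectors score zero error on every metric
metrics = compare_outputs([0.99609, 0.0038853], [0.99609, 0.0038853])
```

Small per-element differences between device and software inference are expected, since the Myriad runs in FP16 while the software reference typically runs in FP32; the tool only fails the check when the thresholds listed in the report are exceeded.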
mvNCCompile
Compiles a network and weights files from Caffe or TensorFlow models into a graph file that is compatible with the NCAPI.
For example, given a Caffe model (bvlc_googlenet.caffemodel) and a network description (deploy.prototxt):
mvNCCompile -w bvlc_googlenet.caffemodel -s 12 deploy.prototxt
This command outputs the graph and output_expected.npy files, which will be used later with the API.
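The output_expected.npy file is a standard NumPy array file, so a quick sanity check of the compiler output is possible with a few lines of Python (an illustrative helper, not part of the NCSDK tooling):

```python
def is_npy_file(path):
    """Return True if the file starts with the NumPy .npy magic string.

    Every .npy file begins with the 6-byte magic b"\\x93NUMPY",
    followed by a format version and a header describing the array.
    """
    with open(path, "rb") as f:
        return f.read(6) == b"\x93NUMPY"
```

If the check fails, the compile step most likely did not finish successfully and the graph should be regenerated before moving on to the API.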
mvNCProfile
Compiles a network, runs it on a connected neural compute device, and outputs profiling information in the terminal and in an HTML file. The profiling data contains per-layer performance and the execution time of the model. The HTML version of the report also contains a graphical representation of the model. For example, to profile the GoogLeNet network:
mvNCProfile deploy.prototxt -s 12
The output looks like:
mvNCProfile v02.00, Copyright @ Intel Corporation 2017

****** WARNING: using empty weights ******
Layer inception_3b/1x1 forced to im2col_v2, because its output is used in concat
/usr/local/bin/ncsdk/Controllers/FileIO.py:65: UserWarning: You are using a large type. Consider reducing your data sizes for best performance
Blob generated
USB: Transferring Data...
Time to Execute : 115.95 ms
USB: Myriad Execution Finished
Time to Execute : 98.03 ms
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.

Network Summary

Detailed Per Layer Profile
                                       Bandwidth   time
#    Name                    MFLOPs    (MB/s)      (ms)
=======================================================================
0    data                    0.0       55877.1     0.005
1    conv1/7x7_s2            236.0     2453.0      5.745
2    pool1/3x3_s2            1.8       1346.8      1.137
3    pool1/norm1             0.0       711.3       0.538
4    conv2/3x3_reduce        25.7      471.6       0.828
5    conv2/3x3               693.6     305.9       11.957
6    conv2/norm2             0.0       771.6       1.488
7    pool2/3x3_s2            1.4       1403.3      0.818
8    inception_3a/1x1        19.3      554.6       0.560
9    inception_3a/3x3_reduce 28.9      458.3       0.703
10   inception_3a/3x3        173.4     319.2       4.716
11   inception_3a/5x5_reduce 4.8       1035.8      0.283
12   inception_3a/5x5        20.1      716.0       0.872
13   inception_3a/pool       1.4       648.5       0.443
14   inception_3a/pool_proj  9.6       657.0       0.455
15   inception_3b/1x1        51.4      446.0       0.999
16   inception_3b/3x3_reduce 51.4      445.1       1.001
17   inception_3b/3x3        346.8     261.0       8.228
18   inception_3b/5x5_reduce 12.8      879.9       0.453
19   inception_3b/5x5        120.4     536.8       2.510
20   inception_3b/pool       1.8       678.7       0.564
21   inception_3b/pool_proj  25.7      631.2       0.656
22   pool3/3x3_s2            0.8       1213.8      0.591
23   inception_4a/1x1        36.1      364.0       0.977
24   inception_4a/3x3_reduce 18.1      490.3       0.545
25   inception_4a/3x3        70.4      306.0       2.187
26   inception_4a/5x5_reduce 3.0       763.2       0.254
27   inception_4a/5x5        7.5       455.1       0.414
28   inception_4a/pool       0.8       604.6       0.297
29   inception_4a/pool_proj  12.0      613.0       0.389
30   inception_4b/1x1        32.1      349.6       0.995
31   inception_4b/3x3_reduce 22.5      385.6       0.780
32   inception_4b/3x3        88.5      280.9       2.888
33   inception_4b/5x5_reduce 4.8       576.7       0.373
34   inception_4b/5x5        15.1      339.7       0.885
35   inception_4b/pool       0.9       617.8       0.310
36   inception_4b/pool_proj  12.8      579.5       0.438
37   inception_4c/1x1        25.7      415.5       0.762
38   inception_4c/3x3_reduce 25.7      410.3       0.771
39   inception_4c/3x3        115.6     288.2       3.462
40   inception_4c/5x5_reduce 4.8       574.7       0.374
41   inception_4c/5x5        15.1      339.7       0.885
42   inception_4c/pool       0.9       615.3       0.311
43   inception_4c/pool_proj  12.8      577.3       0.440
44   inception_4d/1x1        22.5      382.9       0.786
45   inception_4d/3x3_reduce 28.9      489.2       0.679
46   inception_4d/3x3        146.3     402.9       2.981
47   inception_4d/5x5_reduce 6.4       728.9       0.305
48   inception_4d/5x5        20.1      408.5       0.979
49   inception_4d/pool       0.9       629.5       0.304
50   inception_4d/pool_proj  12.8      630.8       0.403
51   inception_4e/1x1        53.0      297.7       1.531
52   inception_4e/3x3_reduce 33.1      277.0       1.294
53   inception_4e/3x3        180.6     290.3       4.902
54   inception_4e/5x5_reduce 6.6       492.8       0.466
55   inception_4e/5x5        40.1      378.6       1.322
56   inception_4e/pool       0.9       633.0       0.312
57   inception_4e/pool_proj  26.5      446.8       0.731
58   pool4/3x3_s2            0.4       1245.4      0.250
59   inception_5a/1x1        20.9      616.4       0.786
60   inception_5a/3x3_reduce 13.0      569.7       0.582
61   inception_5a/3x3        45.2      570.7       1.786
62   inception_5a/5x5_reduce 2.6       329.2       0.391
63   inception_5a/5x5        10.0      459.6       0.601
64   inception_5a/pool       0.4       531.7       0.146
65   inception_5a/pool_proj  10.4      514.9       0.546
66   inception_5b/1x1        31.3      607.0       1.133
67   inception_5b/3x3_reduce 15.7      612.0       0.625
68   inception_5b/3x3        65.0      606.1       2.366
69   inception_5b/5x5_reduce 3.9       375.0       0.410
70   inception_5b/5x5        15.1      475.0       0.866
71   inception_5b/pool       0.4       531.7       0.146
72   inception_5b/pool_proj  10.4      513.7       0.547
73   pool5/7x7_s1            0.1       405.5       0.236
74   loss3/classifier        0.0       2559.7      0.764
75   prob                    0.0       10.0        0.192
---------------------------------------------------------------------------------------------
Total inference time                   93.66
---------------------------------------------------------------------------------------------
Generating Profile Report 'output_report.html'...
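Since the per-layer table is plain text, it is easy to post-process. The sketch below (an illustrative helper, not part of the NCSDK) parses the per-layer rows and ranks the most expensive layers, which is a quick way to find optimization targets:

```python
def parse_layers(report_lines):
    """Extract (name, mflops, bandwidth_mbs, time_ms) tuples from
    mvNCProfile per-layer rows."""
    layers = []
    for line in report_lines:
        parts = line.split()
        # A per-layer row looks like: "<index> <name> <MFLOPs> <MB/s> <ms>"
        if len(parts) == 5 and parts[0].isdigit():
            try:
                layers.append((parts[1], float(parts[2]),
                               float(parts[3]), float(parts[4])))
            except ValueError:
                continue  # skip rows whose numeric columns don't parse
    return layers

def slowest(layers, n=3):
    """Return the n layers with the longest execution time."""
    return sorted(layers, key=lambda l: l[3], reverse=True)[:n]

# A few rows copied from the report above
report = [
    "1    conv1/7x7_s2            236.0     2453.0      5.745",
    "5    conv2/3x3               693.6     305.9       11.957",
    "17   inception_3b/3x3        346.8     261.0       8.228",
]
top = slowest(parse_layers(report))
```

In the full GoogLeNet report above, this kind of ranking immediately surfaces conv2/3x3 and inception_3b/3x3 as the dominant layers.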
API
You can find the full documentation of the C API here and the Python API here. GstInference uses only the C API, through R2Inference, which takes care of devices, graphs, models, and FIFOs. Because of this, we will only look at the options that you can change when using the C API through R2Inference.
R2Inference changes the options of the framework via the "IParameters" class. First you need to create an object:
r2i::RuntimeError error;
std::shared_ptr<r2i::IParameters> parameters = factory->MakeParameters (error);
Then call the "Set" or "Get" virtual functions:
parameters->Set (<option>, <value>);
parameters->Get (<option>, <value>);
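To illustrate the Set/Get pattern without the R2Inference headers, here is a minimal Python mock of a parameters interface. The class and option names below are illustrative stand-ins, not the real r2i API; the point is the behavior the tables below describe, where some options are read/write and others read-only:

```python
class MockParameters:
    """Illustrative stand-in for an IParameters-style typed option store."""

    def __init__(self):
        self._options = {"log-level": 0}           # hypothetical R/W option
        self._read_only = {"device-name": "NCS"}   # hypothetical R/O option

    def set(self, option, value):
        if option in self._read_only:
            raise ValueError(f"'{option}' is read-only")
        if option not in self._options:
            raise KeyError(f"unknown option '{option}'")
        self._options[option] = value

    def get(self, option):
        if option in self._read_only:
            return self._read_only[option]
        return self._options[option]

params = MockParameters()
params.set("log-level", 2)        # R/W options can be written
name = params.get("device-name")  # R/O options can only be read
```

In the real API, Set and Get additionally return a RuntimeError object that should be checked after every call.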
Device Options
All the device options from the API are read only.
Option | Value | Description |
---|---|---|
NC_RO_DEVICE_THERMAL_STATS | float array | An array with the temperature history of the device, in degrees Celsius. |
NC_RO_THERMAL_THROTTLING_LEVEL | 0,1,2 | Current temperature throttling level. |
NC_RO_DEVICE_STATE | ncDeviceState_t enum value | Current state of the device. |
NC_RO_DEVICE_CURRENT_MEMORY_USED | positive int | Memory used on the device. |
NC_RO_DEVICE_MEMORY_SIZE | positive int | Total memory available on the device. |
NC_RO_DEVICE_MAX_FIFO_NUM | positive int | Max number of fifos. |
NC_RO_DEVICE_ALLOCATED_FIFO_NUM | positive int | Number of fifos currently allocated. |
NC_RO_DEVICE_MAX_GRAPH_NUM | positive int | Max number of graphs. |
NC_RO_ALLOCATED_GRAPH_NUM | positive int | Number of graphs currently allocated. |
NC_RO_DEVICE_OPTION_CLASS_LIMIT | positive int | Highest option class supported. |
NC_RO_DEVICE_FW_VERSION | [major, minor, hardware type, build number] | Device firmware version. |
NC_RO_DEVICE_HW_VERSION | ncDeviceHwVersion_t enum value | Device hardware version. |
NC_RO_DEVICE_MVTENSOR_VERSION | [major, minor] | mvtensor library version. |
NC_RO_DEVICE_NAME | string | Device name. |
Fifo Options
FIFO options are read-only if they begin with the prefix NC_RO_FIFO and read/write if they begin with NC_RW_FIFO. Most of the R/W options of a FIFO can only be modified between its creation and its allocation, and R2Inference performs both in a single method (Engine->Start()), so these options cannot be written through R2Inference.
Option | Value | Description |
---|---|---|
NC_RW_FIFO_TYPE | ncFifoType_t enum value | Type of the FIFO (host-readable or host-writable). |
NC_RW_FIFO_DATA_TYPE | ncFifoDataType_t enum value | Data type of the FIFO elements. |
NC_RO_FIFO_CAPACITY | positive int | FIFO queue size. |
NC_RO_FIFO_READ_FILL_LEVEL | positive int | Elements on an output FIFO queue. |
NC_RO_FIFO_WRITE_FILL_LEVEL | positive int | Elements on an input FIFO queue. |
NC_RO_FIFO_GRAPH_TENSOR_DESCRIPTOR | ncTensorDescriptor_t struct | Shape of the tensor on the FIFO. |
NC_RO_FIFO_STATE | ncFifoState_t enum value | Current state of the FIFO. |
NC_RO_FIFO_NAME | string | FIFO name. |
NC_RO_FIFO_ELEMENT_DATA_SIZE | positive int | Size in bits of the FIFO elements. |
NC_RW_FIFO_HOST_TENSOR_DESCRIPTOR | ncTensorDescriptor_t struct | Shape of the tensor on application. |
Global Options
Pay special attention to the log level enumeration, because it is ordered counterintuitively: 1 is the highest log level, 4 is the lowest, and 0 is the default.
Option | Value | Description |
---|---|---|
NC_RW_LOG_LEVEL | ncLogLevel_t enum value | Logging level for the API. |
NC_RO_API_VERSION | [major, minor, hotfix, release] | API version |
Graph Options
Option | Value | Description |
---|---|---|
NC_RO_GRAPH_STATE | ncGraphState_t enum value | Current state of the graph. |
NC_RO_GRAPH_TIME_TAKEN | float array | Time per layer for the last inference, in milliseconds. |
NC_RO_GRAPH_INPUT_TENSOR_DESCRIPTORS | ncTensorDescriptor_t struct | Array of graph inputs. |
NC_RO_GRAPH_OUTPUT_TENSOR_DESCRIPTORS | ncTensorDescriptor_t struct | Array of graph outputs. |
NC_RO_GRAPH_DEBUG_INFO | string | Debug information. |
NC_RO_GRAPH_NAME | string | Graph name. |
NC_RO_GRAPH_OPTION_CLASS_LIMIT | positive int | The highest option class supported. |
NC_RO_GRAPH_VERSION | [major, minor] | The version of the compiled graph. |
NC_RO_GRAPH_TIME_TAKEN_ARRAY_SIZE | positive int | Length of the time array (number of layers). |