Exploring Rosetta element for character recognition
Rosetta
The Rosetta element performs the preprocessing and postprocessing required for Rosetta model inference. It prepares the input tensor from the image and processes the output tensor to extract the characters recognized in the image.
Preprocess
The input image must be transformed to match the characteristics that the model expects in its input tensor. The required steps are explained below:
1) Convert the input image into a grayscale one.
2) Normalize the image. The normalization computes each pixel of the output image from the corresponding pixel of the input image, using constants fixed for this particular case.
3) Resize the image to 100x32 pixels.
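The three steps above can be sketched in plain C++ as follows. This is a minimal illustration, not the element's implementation: the `GrayImage` struct and all function names are hypothetical, the grayscale conversion assumes ITU-R BT.601 luma weights, the normalization assumes the common form (x - mean) / scale with `mean` and `scale` left as caller-supplied placeholders, the resize uses nearest-neighbor sampling for brevity, and 100x32 is interpreted as width x height.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for a single-channel float image (row-major).
struct GrayImage {
  int width;
  int height;
  std::vector<float> data;  // width * height values
};

// Step 1: convert interleaved 8-bit RGB to grayscale.
// Assumption: ITU-R BT.601 luma weights; the element may use another formula.
GrayImage toGray(const std::vector<std::uint8_t>& rgb, int width, int height) {
  GrayImage out{width, height,
                std::vector<float>(static_cast<std::size_t>(width) * height)};
  for (std::size_t i = 0; i < out.data.size(); ++i) {
    out.data[i] = 0.299f * rgb[3 * i] + 0.587f * rgb[3 * i + 1] +
                  0.114f * rgb[3 * i + 2];
  }
  return out;
}

// Step 2: normalize every pixel as (x - mean) / scale.
// Assumption: mean and scale are placeholders, not the element's constants.
void normalize(GrayImage& img, float mean, float scale) {
  for (float& v : img.data) {
    v = (v - mean) / scale;
  }
}

// Step 3: resize with nearest-neighbor sampling.
// Assumption: a real pipeline would likely use a smoother interpolation.
GrayImage resizeNearest(const GrayImage& in, int outWidth, int outHeight) {
  GrayImage out{outWidth, outHeight,
                std::vector<float>(static_cast<std::size_t>(outWidth) * outHeight)};
  for (int y = 0; y < outHeight; ++y) {
    const int srcY = y * in.height / outHeight;
    for (int x = 0; x < outWidth; ++x) {
      const int srcX = x * in.width / outWidth;
      out.data[static_cast<std::size_t>(y) * outWidth + x] =
          in.data[static_cast<std::size_t>(srcY) * in.width + srcX];
    }
  }
  return out;
}

// Runs the three steps in the order listed above; the result is 100x32.
GrayImage preprocess(const std::vector<std::uint8_t>& rgb, int width,
                     int height, float mean, float scale) {
  GrayImage gray = toGray(rgb, width, height);
  normalize(gray, mean, scale);
  return resizeNearest(gray, 100, 32);
}
```

Keeping each step in its own function makes it easy to swap one of the assumed pieces (for example, the interpolation) without touching the others.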
Postprocess
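The output tensor of Rosetta is 1x26x37: one phrase of up to 26 character positions, where each position holds 37 probabilities, one per class. The classes are the letters [a - z] of the English alphabet, the digits [0 - 9], and one blank symbol that marks the absence of a character.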
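As an illustration of how this tensor can be reduced to one class index per position, here is a minimal sketch. The function name `argmaxPerPosition` and the constants are hypothetical, and the code assumes that the tensor has been flattened in row-major order into a `std::vector<float>`; the actual memory layout in the element may differ.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr int kPositions = 26;  // character positions in the phrase
constexpr int kClasses = 37;    // probabilities per position

// Returns, for each position, the index of the class with the highest score.
// Assumption: the tensor is flattened row-major, so the score of class c
// at position p is stored at scores[p * kClasses + c].
std::array<int, kPositions> argmaxPerPosition(const std::vector<float>& scores) {
  std::array<int, kPositions> best{};
  for (int p = 0; p < kPositions; ++p) {
    const std::size_t base = static_cast<std::size_t>(p) * kClasses;
    int bestClass = 0;
    for (int c = 1; c < kClasses; ++c) {
      if (scores[base + c] > scores[base + bestClass]) {
        bestClass = c;
      }
    }
    best[p] = bestClass;
  }
  return best;
}
```

On ties the sketch keeps the lowest class index, which is a design choice of this example rather than documented behavior.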
For each of the 26 positions, the index of the class with the maximum probability is extracted from the Rosetta tensor. The result is an array of 26 indices that is then decoded into text.
Suppose that, for a given image, the array of max-probability indices looks like the following list:
maxIndixes = [23 23 0 0 0 0 0 0 0 0 11 0 0 0 24 0 0 19 0 29 0 0 0 0 0 11]
This array is postprocessed by the following algorithm, written in C++:
#include <string>
using std::string;

string concatenateChars(int maxIndixes[26]) {
  // Characters that Rosetta can predict; index 0 ("_") is the blank symbol.
  string chars[37] = {"_", "0", "1", "2", "3", "4", "5", "6", "7", "8",
                      "9", "a", "b", "c", "d", "e", "f", "g", "h", "i",
                      "j", "k", "l", "m", "n", "o", "p", "q", "r", "s",
                      "t", "u", "v", "w", "x", "y", "z"};
  string final_phrase = "";
  // Iterate over the positions where the max probabilities are, skipping
  // blanks (index 0) and consecutive repetitions of the same index.
  for (int i = 0; i < 26; ++i) {
    if (maxIndixes[i] != 0 && !(i > 0 && (maxIndixes[i-1] == maxIndixes[i]))) {
      final_phrase += chars[maxIndixes[i]];
    }
  }
  return final_phrase;
}
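The result of processing the maxIndixes array is manisa: the repeated 23 at the start collapses into a single m, every 0 is skipped, and the remaining indices 11, 24, 19, 29 and 11 map to a, n, i, s and a.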
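The role of the blank index is easier to see with a variant that accepts sequences of any length. The sketch below applies the same rule as concatenateChars (drop index 0, collapse consecutive repeats), which resembles the greedy decoding commonly used with CTC-trained models. The name `collapseAndDecode` and the use of `std::vector` are choices made for this example only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collapses consecutive repeated indices, drops blanks (index 0), and maps
// the remaining indices to characters.
std::string collapseAndDecode(const std::vector<int>& indices) {
  // Index 0 is the blank; 1-10 are the digits; 11-36 are the letters.
  static const std::string alphabet = "_0123456789abcdefghijklmnopqrstuvwxyz";
  std::string text;
  int previous = -1;  // no valid index is negative, so the first entry is never a repeat
  for (int index : indices) {
    if (index != 0 && index != previous) {
      text += alphabet[static_cast<std::size_t>(index)];
    }
    previous = index;
  }
  return text;
}
```

With this function, the sequence {11, 11} decodes to "a", while {11, 0, 11} decodes to "aa": a blank between two equal indices is what allows a doubled letter to survive the collapse. Applied to the example array above, it returns "manisa".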