Exploring Rosetta element for character recognition

Rosetta

The Rosetta element performs the preprocessing and postprocessing required for Rosetta model inference. It prepares the input tensor from the image and decodes the output tensor in order to recover the characters that appear in the image.

Preprocess

The input image must go through the following steps so that it matches what the model expects in the input tensor:

1) Convert the input image into a grayscale one.

2) Normalize the image as follows:

$M(x,y)=\alpha \cdot I(x,y)+\beta$

Where $M(x,y)$ is the output image, $I(x,y)$ is the input image, $\alpha =1/127.5$ and $\beta =-1$ for this particular case.

3) Resize the image to be 100x32 pixels.

Postprocess

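The output tensor of Rosetta has shape 1x26x37: one phrase of up to 26 character positions, where each position holds 37 probabilities, one per class that the model can predict. The classes are a blank symbol (written as "_" in the code below), the digits [0 - 9] and the letters [a - z] of the English alphabet.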

For each of the 26 positions, the index of the class with the maximum probability is extracted from the Rosetta tensor. The resulting array of indices is then decoded into text.

For example, suppose that for a given image the array of positions of the maximum probabilities is the following:

maxIndixes = [23 23 0 0 0 0 0 0 0 0 11 0 0 0 24 0 0 19 0 29 0 0 0 0 0 11]

This array is postprocessed with the following algorithm, written in C++:

#include <string>

using std::string;

string concatenateChars(const int maxIndixes[26])
{
  // Characters that Rosetta can predict; index 0 ("_") is the blank symbol:
  const string chars[37] = {"_", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"};

  // Start from an empty string so that the result has no leading space:
  string final_phrase;

  // Iterate over the positions where the max probabilities are:
  for (int i = 0; i < 26; ++i)
  {
    // Skip blanks (index 0) and consecutive repetitions of the same index:
    if (maxIndixes[i] != 0 && !(i > 0 && (maxIndixes[i-1] == maxIndixes[i])))
    {
      final_phrase += chars[maxIndixes[i]];
    }
  }
  return final_phrase;
}

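The result of processing the maxIndixes array above is "manisa": the repeated index 23 collapses into a single "m", and every 0 is a blank that is discarded.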
