A Brief Exploration of OCR Technology: 4. Text Localization

By 苏剑林 | June 24, 2016

After the first part, we have successfully extracted the text features of the image. Next, we will perform text localization. The main process is divided into two steps: 1. Proximity search, with the goal of circling single lines of text; 2. Text cutting, with the goal of segmenting single lines of text into individual characters.

Proximity Search

We can perform a connected component search on the extracted feature map, treating each resulting connected component as a Chinese character. This works for most Chinese characters, but it is not suitable for some simpler characters such as "小" (small), "旦" (dawn), "八" (eight), and "元" (yuan). Because these characters lack connectivity, they end up being split apart, as shown in Figure 13. Therefore, we need to use a proximity search algorithm to integrate regions that likely form a single character and obtain single-line text regions.

Figure 13: Direct search for connected components will split characters like "元" apart
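The splitting effect is easy to reproduce on a toy example. The following is a minimal sketch of a connected component search, not the implementation used in this series: the function name `connected_components`, the 8-connectivity rule, and the tiny `glyph` array are all assumptions made for illustration.

```python
from collections import deque

def connected_components(grid):
    """Return bounding boxes (left, top, right, bottom), one per
    8-connected group of nonzero pixels in the 2D list `grid`."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            left, top, right, bottom = c, r, c, r
            while queue:
                cr, cc = queue.popleft()
                left, right = min(left, cc), max(right, cc)
                top, bottom = min(top, cr), max(bottom, cr)
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < h and 0 <= nc < w
                                and grid[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            boxes.append((left, top, right, bottom))
    return boxes

# A toy glyph made of two strokes separated by a blank row (hypothetical data).
glyph = [
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
boxes = connected_components(glyph)
print(boxes)  # two separate boxes, although the strokes form one character
```

Because the two strokes never touch, the search reports them as two regions, which is exactly the fragmentation that the proximity search is meant to repair.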

The purpose of a proximity search is to perform dilation to "stick" together regions that likely form a character. If dilation is performed without a search, it happens in all directions simultaneously, which might stick the upper and lower lines together. Therefore, we only allow a region to dilate in one specific direction. We determine the direction of dilation (up, down, left, right) by searching for neighboring regions:

Proximity Search*: Starting from a connected component, find its horizontal bounding box and expand the component to fill the entire rectangle. If the distance from this region to its nearest neighboring region is below a given threshold, dilate the rectangle. The direction of dilation is the direction of the nearest neighbor.

Since proximity is involved, we need a definition of distance. Below is a reasonable definition of distance.

Distance

Figure 14: Two sample regions

As shown in the figure above, a rectangular region can be determined by the top-left coordinate $(x,y)$ and the bottom-right coordinate $(w,z)$, where the coordinates are calculated with the top-left corner of the image as the origin. The center of this region is $\left(\frac{x+w}{2}, \frac{y+z}{2}\right)$. For the two regions $S$ and $S'$ in the figure, we can calculate the difference between their center vectors:

$$(x_c,y_c)=\left(\frac{x'+w'}{2}-\frac{x+w}{2},\frac{y'+z'}{2}-\frac{y+z}{2}\right)\tag{10}$$

It is unreasonable to directly use $\sqrt{x_c^2+y_c^2}$ as the distance, because "proximity" here should be measured between the boundaries rather than the center points. Therefore, we subtract half the width and half the height of each region, taking absolute values so that the result does not depend on which region lies to the left or above:

$$(x'_c,y'_c)=\left(|x_c|-\frac{w-x}{2}-\frac{w'-x'}{2},|y_c|-\frac{z-y}{2}-\frac{z'-y'}{2}\right)\tag{11}$$

The distance is defined as:

$$d(S,S')=\sqrt{[\max(x'_c,0)]^2+[\max(y'_c,0)]^2}\tag{12}$$

The $\max(\cdot,0)$ terms clip a negative component, which occurs when the two regions overlap along that axis. As for the direction, it can be determined by the argument (angle) of $(x_c,y_c)$.
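Equations (10) to (12) translate almost line by line into code. The sketch below is illustrative only: the name `box_distance` and the tuple layout `(left, top, right, bottom)` (corresponding to $x$, $y$, $w$, $z$ in the formulas) are assumptions, and the absolute values are taken on the center offsets so that the distance is the same whichever region is passed first.

```python
import math

def box_distance(a, b):
    """Boundary-to-boundary distance between two axis-aligned boxes.

    Each box is (left, top, right, bottom) in image coordinates,
    with the origin at the top-left corner of the image.
    """
    # Difference between the two centers, as in eq. (10).
    xc = (b[0] + b[2]) / 2 - (a[0] + a[2]) / 2
    yc = (b[1] + b[3]) / 2 - (a[1] + a[3]) / 2
    # Subtract the half-widths and half-heights, as in eq. (11).
    gap_x = abs(xc) - (a[2] - a[0]) / 2 - (b[2] - b[0]) / 2
    gap_y = abs(yc) - (a[3] - a[1]) / 2 - (b[3] - b[1]) / 2
    # Negative gaps mean overlap along that axis; clip them to zero, eq. (12).
    return math.hypot(max(gap_x, 0), max(gap_y, 0))

s = (0, 0, 10, 10)
side = (13, 0, 23, 10)       # same row, 3 pixels to the right
diagonal = (13, 14, 23, 24)  # 3 pixels right and 4 pixels down
print(box_distance(s, side), box_distance(s, diagonal))
```

For boxes on the same row only the horizontal gap contributes, while for diagonally placed boxes both gaps combine as in the Euclidean norm.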

However, the "Proximity Search*" method above can easily stick the upper and lower lines of text together. Therefore, based on our horizontal layout assumption, a better method is to only allow horizontal dilation:

Proximity Search: Starting from a connected component, find its horizontal bounding box and expand the component to fill the entire rectangle. If the distance from this region to its nearest neighboring region is below a given threshold, consider dilating the rectangle. The direction of dilation is the direction of the nearest neighbor, and the dilation is executed if and only if that direction is horizontal.
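The direction test can be sketched from the argument of the center offset $(x_c,y_c)$. The text does not specify how angles map to the four directions, so the 45° sector boundaries below are an assumption, as are the names `neighbor_direction` and `is_horizontal`. Boxes are `(left, top, right, bottom)` tuples, and the y axis points downward as in image coordinates.

```python
import math

def neighbor_direction(a, b):
    """Classify where box b lies relative to box a: left, right, up or down."""
    xc = (b[0] + b[2]) / 2 - (a[0] + a[2]) / 2
    yc = (b[1] + b[3]) / 2 - (a[1] + a[3]) / 2
    # Argument of the offset vector, in degrees within (-180, 180].
    angle = math.degrees(math.atan2(yc, xc))
    # Assumed rule: four 90-degree sectors centered on the axes.
    if -45 <= angle <= 45:
        return "right"
    if 45 < angle < 135:
        return "down"   # y grows downward in image coordinates
    if -135 < angle < -45:
        return "up"
    return "left"

def is_horizontal(direction):
    """Dilation is only carried out for left/right neighbors."""
    return direction in ("left", "right")

s = (0, 0, 10, 10)
print(neighbor_direction(s, (13, 0, 23, 10)))   # neighbor on the same row
print(neighbor_direction(s, (0, 14, 10, 24)))   # neighbor on the next row
```

With this rule, a fragment whose nearest neighbor sits on the line below is left untouched, which is what prevents adjacent lines from merging.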

Results

With the distance defined, we can calculate the distance between every pair of connected components and find the nearest neighbor of each region. We then expand each region toward its nearest neighbor by one-fourth of its size. In this way, neighboring regions may merge into a new region, thereby integrating fragments.
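One pass of this expand-and-merge procedure might look like the sketch below. It is a simplified illustration under several assumptions: boxes are `(left, top, right, bottom)` tuples, "size" is taken to be the box width, a direction counts as horizontal when the horizontal center offset is at least as large as the vertical one, and the threshold `max_dist` and the helper names are hypothetical.

```python
import math

def _distance_and_offset(a, b):
    """Return (boundary distance, x offset, y offset) from box a to box b."""
    xc = (b[0] + b[2]) / 2 - (a[0] + a[2]) / 2
    yc = (b[1] + b[3]) / 2 - (a[1] + a[3]) / 2
    gx = abs(xc) - (a[2] - a[0]) / 2 - (b[2] - b[0]) / 2
    gy = abs(yc) - (a[3] - a[1]) / 2 - (b[3] - b[1]) / 2
    return math.hypot(max(gx, 0), max(gy, 0)), xc, yc

def expand_once(boxes, max_dist):
    """Grow each box by a quarter of its width toward a close horizontal neighbor."""
    out = []
    for i, a in enumerate(boxes):
        others = [b for j, b in enumerate(boxes) if j != i]
        left, top, right, bottom = a
        if others:
            d, xc, yc = min(_distance_and_offset(a, b) for b in others)
            if d < max_dist and abs(xc) >= abs(yc):
                step = (right - left) / 4
                if xc > 0:
                    right += step
                else:
                    left -= step
        out.append((left, top, right, bottom))
    return out

def _overlaps(a, b):
    # Boxes that merely touch along an edge are treated as overlapping.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def merge_overlapping(boxes):
    """Repeatedly replace any two overlapping boxes with their union."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if _overlaps(boxes[i], boxes[j]):
                    a, b = boxes[i], boxes[j]
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return boxes

# Two fragments of one character on the first line, plus a region on the next line.
fragments = [(0, 0, 8, 10), (9, 0, 17, 10), (0, 14, 17, 24)]
result = merge_overlapping(expand_once(fragments, max_dist=3))
print(result)
```

In this toy input the two side-by-side fragments grow toward each other and fuse, while the region on the next line is too far away to be affected. In practice the pass could be repeated until the set of boxes stops changing.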

Experiments show that the proximity search approach effectively integrates text fragments, as shown in Figure 15.

Figure 15: Text regions circled after proximity search

Reproduction of this article must include the original address: https://kexue.fm/archives/3818