
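How to detect subscript numbers in an image using OCR

I am using tesseract for OCR, via the pytesseract bindings. Unfortunately, I run into problems when trying to extract text that includes subscript-style numbers: the subscript number is interpreted as a letter instead.

For example, with this basic image: https://i.stack.imgur.com/B7HHQ.jpg

I want to extract the text as "CH3", i.e. I am not interested in knowing that the number 3 was a subscript in the image.

My attempt at this using tesseract is: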

import cv2
import pytesseract

img = cv2.imread('test.jpeg')

# Note that I have reduced the region of interest to the known 
# text portion of the image
text = pytesseract.image_to_string(
    img[200:300, 200:320], config='-l eng --oem 1 --psm 13'
)
print(text)

Unfortunately, this incorrectly outputs

'CHs'

Depending on the psm parameter, it is also possible to get 'CHa'.
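To see how much the page segmentation mode matters, the same crop can be run through several `--psm` values and the results compared side by side. This is only a minimal sketch: `sweep_psm`, its `ocr` parameter, and the particular list of modes are hypothetical choices for illustration, not part of the original attempt.

```python
def sweep_psm(image, modes=(6, 7, 8, 10, 13), ocr=None):
    """Run OCR on `image` once per page segmentation mode.

    `ocr` is a hypothetical injection point: any callable of the form
    ocr(image, config=...) -> str. It defaults to pytesseract.image_to_string.
    """
    if ocr is None:
        # Imported lazily so the helper can be exercised without Tesseract installed.
        import pytesseract
        ocr = pytesseract.image_to_string

    results = {}
    for psm in modes:
        # Same language and engine settings as the attempt above; only --psm varies.
        config = f"-l eng --oem 1 --psm {psm}"
        results[psm] = ocr(image, config=config).strip()
    return results
```

Calling `sweep_psm(img[200:300, 200:320])` on the cropped region returns a dict mapping each mode to the recognized text, which makes it easy to spot which modes misread the subscript.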

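I suspect this issue is related to the "baseline" of the text being inconsistent across the line, but I'm not certain.

How can I accurately extract the text from this type of image?

Update - 19 May 2020

After seeing Achintha Ihalage's answer, which doesn't pass any configuration options to tesseract, I explored the psm options.

Since the region of interest is known (in this case, I am using EAST detection to locate the bounding box of the text), the psm configuration option for tesseract, which in my original code treats the text as a single line, may not be necessary. Running image_to_string against the region of interest given by the bounding box above gives the output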

CH

3

which can, of course, easily be processed to get CH3.
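That post-processing step can be as small as collapsing all whitespace in the returned string. A minimal sketch, assuming the region of interest contains a single token such as a chemical formula (for multi-word text this would merge the words); `join_ocr_fragments` is a hypothetical helper name:

```python
def join_ocr_fragments(text):
    """Remove all whitespace (spaces, newlines, form feeds) from OCR output."""
    # str.split() with no arguments splits on any run of whitespace
    # and discards empty strings, so joining the pieces removes it all.
    return "".join(text.split())


print(join_ocr_fragments("CH\n\n3\n"))  # CH3
```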

python ocr tesseract python-tesseract

2020/05/16 dspencer

I think this approach is better suited to the general case.

import cv2
import pytesseract
from pathlib import Path

image = cv2.imread('test.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]  # Otsu threshold; suits sharp black-and-white images
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]  # OpenCV 2.4 and 4.x return 2 values, OpenCV 3.x returns 3
result_list = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    area = cv2.contourArea(c)
    if area > 200:
        detect_area = image[y:y + h, x:x + w]
        # detect_area = cv2.GaussianBlur(detect_area, (3, 3), 0)
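        # --oem 0 selects the legacy engine, which needs traineddata that includes the legacy model
        predict_char = pytesseract.image_to_string(detect_area, lang='eng', config='--oem 0 --psm 10')
        result_list.append((x, predict_char.strip()))  # drop any trailing whitespace from the OCR output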
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), thickness=2)

result = ''.join([char for _, char in sorted(result_list, key=lambda _x: _x[0])])
print(result)  # CH3


output_dir = Path('./temp')
output_dir.mkdir(parents=True, exist_ok=True)
cv2.imwrite(str(output_dir / 'image.png'), image)  # input image with the detected boxes drawn on it
cv2.imwrite(str(output_dir / 'clean.png'), thresh)  # thresholded image used for contour detection
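The reassembly step at the end of the script (sort the per-character results by x coordinate, then join them) can be isolated as a small pure function, which makes it easy to check without running OCR. A minimal sketch: `assemble_left_to_right` is a hypothetical name, the sample coordinates are made up, and the approach assumes a single line of text, since ordering by x alone would interleave characters from different lines.

```python
def assemble_left_to_right(detections):
    """Join per-character OCR results in order of their x coordinate.

    `detections` is a list of (x, text) pairs, like `result_list` in the script.
    """
    ordered = sorted(detections, key=lambda item: item[0])
    # Strip each piece, since OCR output may carry trailing newlines or form feeds.
    return "".join(text.strip() for _, text in ordered)


# Contours are not guaranteed to arrive in reading order, hence the sort.
print(assemble_left_to_right([(120, "3\n"), (10, "C\n"), (60, "H\n")]))  # CH3
```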

More references

I strongly recommend looking at the following examples, which are useful references for OCR.

2020/05/25 Carson