Getting the bounding box of the recognized words using python-tesseract

Question

I am using python-tesseract to extract words from an image. It is a Python wrapper for Tesseract, the OCR engine.

I am using the following code to get the words:

    import tesseract

    api = tesseract.TessBaseAPI()
    api.Init(".", "eng", tesseract.OEM_DEFAULT)
    api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
    api.SetPageSegMode(tesseract.PSM_AUTO)

    mImgFile = "test.jpg"
    mBuffer = open(mImgFile, "rb").read()
    result = tesseract.ProcessPagesBuffer(mBuffer, len(mBuffer), api)
    print "result(ProcessPagesBuffer)=",result

This returns only the words in the image, not their location, size, or orientation (in other words, a bounding box containing each word). Is there a way to get that as well?

lennon310 · Accepted Answer

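The tesseract.GetBoxText() method returns the exact position of each character in an array.

There is also a command-line option, tesseract test.jpg result hocr, that generates a result.html file containing the coordinates of each recognized word. I am not sure whether it can be called from a Python script, though.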

stwykd · Answer

Use pytesseract.image_to_data():

    import pytesseract
    from pytesseract import Output
    import cv2

    img = cv2.imread('image.jpg')

    d = pytesseract.image_to_data(img, output_type=Output.DICT)
    n_boxes = len(d['level'])
    for i in range(n_boxes):
        (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imshow('img', img)
    cv2.waitKey(0)

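Among the data returned by pytesseract.image_to_data():

left is the distance from the upper-left corner of the bounding box to the left border of the image.
top is the distance from the upper-left corner of the bounding box to the top border of the image.
width and height are the width and height of the bounding box.
conf is the model's confidence in its prediction for the word inside that bounding box. A conf of -1 means the corresponding bounding box contains a block of text rather than a single word.

The bounding boxes returned by pytesseract.image_to_boxes() enclose individual characters, so I believe pytesseract.image_to_data() is what you are looking for.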
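Since entries with a conf of -1 describe blocks, paragraphs, or lines rather than words, it can help to filter them out before drawing. The following is only a sketch: the helper name word_boxes and the min_conf threshold are made up for illustration, and it assumes a dictionary shaped like the Output.DICT result above. The conf values are passed through float() because their type can differ between pytesseract versions.

```python
def word_boxes(data, min_conf=0.0):
    """Return (text, left, top, width, height) for confident word-level entries.

    `data` is assumed to have the same keys as the Output.DICT result of
    pytesseract.image_to_data(). `word_boxes` is a hypothetical helper name.
    """
    boxes = []
    for i, text in enumerate(data['text']):
        conf = float(data['conf'][i])  # may arrive as str, int, or float
        # Skip non-word entries (conf == -1), low-confidence words, and blanks.
        if conf < min_conf or not text.strip():
            continue
        boxes.append((text, data['left'][i], data['top'][i],
                      data['width'][i], data['height'][i]))
    return boxes


# Hand-written sample standing in for real OCR output.
sample = {
    'text': ['', 'Hello', 'world', ' '],
    'conf': ['-1', '96', 42.5, '95'],
    'left': [0, 10, 70, 130],
    'top': [0, 12, 12, 12],
    'width': [200, 50, 55, 5],
    'height': [40, 20, 20, 20],
}
print(word_boxes(sample, min_conf=60))
```

With real output, the returned tuples can be passed to cv2.rectangle in the same way as in the loop above.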

jtbr · Answer

Python tesseract can do this without writing to a file, using the image_to_boxes function:

    import cv2
    import pytesseract

    filename = 'image.png'

    # read the image and get the dimensions
    img = cv2.imread(filename)
    h, w, _ = img.shape  # assumes color image

    # run tesseract, returning the bounding boxes
    boxes = pytesseract.image_to_boxes(img)  # also include any config options you use

    # draw the bounding boxes on the image
    for b in boxes.splitlines():
        b = b.split(' ')
        img = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (0, 255, 0), 2)

    # show annotated image and wait for keypress
    cv2.imshow(filename, img)
    cv2.waitKey(0)
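The h - int(...) terms in the drawing loop are there because box output measures y from the bottom edge of the image, while OpenCV measures it from the top. As a rough sketch of that conversion on its own, assuming each line has the form "char left bottom right top page" (the helper name box_to_top_left is hypothetical):

```python
def box_to_top_left(line, image_height):
    """Convert one 'char left bottom right top page' line to (char, x, y, w, h).

    Box coordinates use a bottom-left origin; the returned x, y refer to the
    upper-left corner of the box in a top-left-origin image (as OpenCV uses).
    """
    char, left, bottom, right, top = line.split(' ')[:5]
    left, bottom, right, top = int(left), int(bottom), int(right), int(top)
    return (char, left, image_height - top, right - left, top - bottom)


# Hand-written sample line for an image that is 100 pixels tall.
print(box_to_top_left('T 10 20 30 50 0', 100))
```

This gives the same kind of (x, y, w, h) tuple that the image_to_data() fields describe, which makes the two outputs easier to compare.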

khushhall · Answer

Using the code below, you can get the bounding box corresponding to each character.

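    import csv
    import cv2
    from pytesseract import pytesseract as pt

    # Write the character boxes to output.box
    # (call as written for the pytesseract version used here; the signature may differ in other releases)
    pt.run_tesseract('bw.png', 'output', lang=None, boxes=True, config="hocr")

    # To read the coordinates
    boxes = []
    with open('output.box', 'r') as f:  # text mode, which csv.reader expects on Python 3
        reader = csv.reader(f, delimiter=' ')
        for row in reader:
            if len(row) == 6:
                boxes.append(row)

    # Draw the bounding box
    img = cv2.imread('bw.png')
    h, w, _ = img.shape
    for b in boxes:
        img = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (255, 0, 0), 2)

    cv2.imshow('output', img)
    cv2.waitKey(0)  # keep the window open until a key is pressed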

Endyd · Answer

I would comment under lennon310's answer, but I don't have enough reputation to comment...

To run his command-line command, tesseract test.jpg result hocr, from a Python script:

    from subprocess import check_call

    tesseractParams = ['tesseract', 'test.jpg', 'result', 'hocr']
    check_call(tesseractParams)
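Once the hOCR file exists, the word coordinates still have to be read out of it. Here is a minimal sketch using only the standard library. It assumes the usual hOCR layout, in which each word is a span with class ocrx_word and a title attribute beginning with "bbox x0 y0 x1 y1". The names HocrWords and parse_hocr_words are made up for this example, and the output file may be named result.html or result.hocr depending on the Tesseract version.

```python
import re
from html.parser import HTMLParser


class HocrWords(HTMLParser):
    """Collect (text, x0, y0, x1, y1) for every ocrx_word span (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._text = []
        self._depth = 0     # nesting depth of tags inside the open word span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._bbox is not None:
            self._depth += 1  # e.g. <strong> inside a word
        elif tag == 'span' and 'ocrx_word' in (attrs.get('class') or ''):
            m = re.search(r'bbox (\d+) (\d+) (\d+) (\d+)', attrs.get('title') or '')
            if m:
                self._bbox = tuple(int(v) for v in m.groups())
                self._text = []
                self._depth = 0

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        if self._depth > 0:
            self._depth -= 1
        else:
            self.words.append((''.join(self._text).strip(),) + self._bbox)
            self._bbox = None


def parse_hocr_words(hocr_text):
    parser = HocrWords()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Hand-written hOCR fragment standing in for the generated file's contents.
sample_hocr = (
    "<div class='ocr_page' title='bbox 0 0 200 60'>"
    "<span class='ocr_line' title='bbox 10 10 190 40'>"
    "<span class='ocrx_word' id='word_1_1' title='bbox 10 12 60 32; x_wconf 96'>Hello</span> "
    "<span class='ocrx_word' id='word_1_2' title='bbox 70 12 125 32; x_wconf 91'>"
    "<strong>world</strong></span>"
    "</span></div>"
)
print(parse_hocr_words(sample_hocr))
```

For a real run, the string would come from reading the generated file, for example with open('result.hocr', encoding='utf-8').read(). Note that hOCR bounding boxes are (x0, y0, x1, y1) with a top-left origin, so no flip is needed before drawing with OpenCV.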