Image to text in Python

Question

I am using Python 3.x and the following code to convert an image to text:

    from PIL import Image
    from pytesseract import image_to_string

    image = Image.open('image.png', mode='r')
    print(image_to_string(image))

I get the following error:

    Traceback (most recent call last):
      File "C:/Users/hp/Desktop/GII/Image_to_text.py", line 12, in <module>
        print(image_to_string(image))
      File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.AMD64\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
        config=config)
      File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.AMD64\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
        stderr=subprocess.PIPE)
      File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.AMD64\lib\subprocess.py", line 950, in __init__
        restore_signals, start_new_session)
      File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.AMD64\lib\subprocess.py", line 1220, in _execute_child
        startupinfo)
    FileNotFoundError: [WinError 2] The system cannot find the file specified

Note that I placed the image in the same directory as my Python installation. Also, the line image = Image.open('image.png', mode='r') raises no error; the error comes from print(image_to_string(image)).

Any idea what is going wrong here? Thanks.

Łukasz Rogalski · Accepted Answer

You need to have tesseract installed and reachable through your PATH.

According to the source, pytesseract is merely a wrapper around subprocess.Popen that runs the tesseract binary. It does not perform any kind of OCR itself.
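Since pytesseract only launches an external program, a quick diagnostic is to ask Python whether it can find that program at all. The sketch below uses the standard library's shutil.which; the helper name find_tesseract is made up for illustration and is not part of pytesseract.

```python
import shutil


def find_tesseract(command="tesseract"):
    """Return the full path of `command` if it is on PATH, else None.

    Hypothetical helper for illustration; not part of pytesseract.
    """
    return shutil.which(command)


path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH; install it or add it to PATH")
else:
    print("tesseract found at", path)
```

If this reports that tesseract was not found, image_to_string will most likely fail with the same FileNotFoundError as in the question.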

Relevant part of the source:

    def run_tesseract(input_filename, output_filename_base, lang=None, boxes=False, config=None):
        '''
        runs the command:
            `tesseract_cmd` `input_filename` `output_filename_base`

        returns the exit status of tesseract, as well as tesseract's stderr output
        '''
        command = [tesseract_cmd, input_filename, output_filename_base]

        if lang is not None:
            command += ['-l', lang]

        if boxes:
            command += ['batch.nochop', 'makebox']

        if config:
            command += shlex.split(config)

        proc = subprocess.Popen(command, stderr=subprocess.PIPE)
        return (proc.wait(), proc.stderr.read())
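The traceback in the question ends inside subprocess, which fits this code: Popen raises FileNotFoundError when the executable named first in the command list cannot be found. A minimal sketch reproducing that behavior with a deliberately nonexistent program name (both the name and the try_launch helper are invented for this example):

```python
import subprocess


def try_launch(command):
    """Try to start `command`; return None on success or the caught error."""
    try:
        proc = subprocess.Popen(
            command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
    except FileNotFoundError as exc:
        # The executable could not be located, as in the question's traceback.
        return exc
    proc.communicate()  # wait for the process and drain its pipes
    return None


error = try_launch(["no-such-ocr-binary-12345", "input.png", "output"])
print(type(error).__name__)
```

The image file itself plays no part here: the error is raised before the program ever gets to read its arguments.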

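Quoting another part of the source:

    # CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
    tesseract_cmd = 'tesseract'

A quick way of changing the tesseract path is:

    import pytesseract
    from PIL import Image

    # tesseract_cmd lives in the pytesseract.pytesseract module;
    # this should be done only once
    pytesseract.pytesseract.tesseract_cmd = "/absolute/path/to/tesseract"

    img = Image.open('image.png')
    print(pytesseract.image_to_string(img))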

AnkurJangra · Answer

You also need to download the Tesseract OCR installer. Use this link to download it: http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.01.exe

Then add the following line to your code so that it uses the tesseract executable (note the raw string, so that \t is not read as a tab):

    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'

This is the default location where tesseract is installed.
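Hard-coding the path works on one machine but fails if tesseract is installed somewhere else. One possible approach, sketched below with only the standard library, is to check PATH first and then fall back to a list of candidate locations. The function name resolve_tesseract and the candidate list are assumptions for illustration; adjust them to your own installation.

```python
import os
import shutil

# Candidate install locations. These are assumptions for illustration only.
CANDIDATES = [
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]


def resolve_tesseract(candidates=CANDIDATES):
    """Return a usable tesseract path: PATH first, then candidates, else None."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


path = resolve_tesseract()
print(path)
# If a path was found, it can be handed to pytesseract like this:
#   import pytesseract
#   pytesseract.pytesseract.tesseract_cmd = path
```

A result of None means tesseract is neither on PATH nor in any of the listed locations, so it probably still needs to be installed.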

That's it. I followed these steps myself and the code ran afterwards.

I hope this helps.

thrinadhn · Answer

To extract text from an image (PNG/JPEG), install the packages below:

    pip install pytesseract
    pip install Pillow

pytesseract: OCR (Optical Character Recognition) is the process of electronically extracting text from images.

PIL is used for anything from simply reading and writing image files to scientific image processing, geographical information systems, remote sensing, and more.

    from PIL import Image
    from pytesseract import image_to_string

    print(image_to_string(Image.open('/home/ABCD/Downloads/imageABC.png'), lang='eng'))

stonebig · Answer

Your "current" directory is not where you think it is.

==> You can specify the full path to the image, e.g.:

    image = Image.open(r'C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.AMD64\image.png', mode='r')
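To see which directory Python actually treats as current, and to avoid depending on it, the path can be built relative to the script file instead. This is a minimal sketch using only the standard library; the helper name path_next_to is invented for the example.

```python
import os
from pathlib import Path


def path_next_to(script_file, filename):
    """Build an absolute path to `filename` in the same folder as `script_file`."""
    return Path(script_file).absolute().parent / filename


# Relative file names such as 'image.png' are looked up in this directory.
print("Current working directory:", os.getcwd())

# Hypothetical script location, used only to show the resulting path.
example_script = os.path.join(os.getcwd(), "scripts", "image_to_text.py")
print("Image path next to the script:", path_next_to(example_script, "image.png"))
```

Inside a real script, passing __file__ as the first argument gives the folder of the script itself, regardless of where it was started from.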

prabhakar267 · Answer

You can try this Python library: https://github.com/prabhakar267/ocr-convert-image-to-text

As mentioned in the package's README, usage is very simple.

    usage: python main.py [-h] input_dir [output_dir]

    positional arguments:
      input_dir
      output_dir

    optional arguments:
      -h, --help  show this help message and exit
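For readers unfamiliar with this notation: input_dir is required and output_dir, shown in square brackets, may be left out. The argparse sketch below reproduces the same interface to show how such arguments are parsed. It is not the project's actual code, and the directory names are placeholders.

```python
import argparse


def build_parser():
    """Build a parser with one required and one optional positional argument."""
    parser = argparse.ArgumentParser(prog="python main.py")
    parser.add_argument("input_dir")
    parser.add_argument("output_dir", nargs="?")  # optional; None when omitted
    return parser


# Placeholder directory names, passed explicitly instead of via the command line.
args = build_parser().parse_args(["scans", "texts"])
print(args.input_dir, args.output_dir)

args = build_parser().parse_args(["scans"])
print(args.input_dir, args.output_dir)
```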