Get image size without loading the image into memory

Question

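I understand that you can get the image size using PIL in the following way:

    from PIL import Image

    im = Image.open(image_filename)
    width, height = im.size

However, I would like to get the image width and height without having to load the image into memory. Is that possible? I am only doing statistics on image sizes and do not care about the image contents. I just want to make my processing faster.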

Hooked · Accepted Answer

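As the comments point out, PIL does not load the image into memory when you call .open. Looking at the docs for PIL 1.1.7, the docstring for .open says:

    def open(fp, mode="r"):
        "Open an image file, without loading the raster data"

There are a few file operations in the source, such as:

    ...
    prefix = fp.read(16)
    ...
    fp.seek(0)
    ...

But these hardly amount to reading the whole file. In fact, on success .open simply returns an image object that holds the file object and the filename. In addition, the docs say:

open(file, mode="r")

Opens and identifies the given image file.

This is a lazy operation; the function identifies the file, but the actual image data is not read from the file until you try to process the data (or call the load method).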

Digging deeper, we see that .open calls _open, which is an image-format-specific overload. Each implementation of _open lives in its own file; for example, .jpeg files are handled in JpegImagePlugin.py. Let's look at that one in depth.

Here things get a bit tricky: the method contains an infinite loop that is broken out of when the JPEG start-of-scan marker is found:

    while True:

        s = s + self.fp.read(1)
        i = i16(s)

        if i in MARKER:
            name, description, handler = MARKER[i]
            # print hex(i), name, description
            if handler is not None:
                handler(self, i)
            if i == 0xFFDA: # start of scan
                rawmode = self.mode
                if self.mode == "CMYK":
                    rawmode = "CMYK;I" # assume Adobe conventions
                self.tile = [("jpeg", (0,0) + self.size, 0, (rawmode, ""))]
                # self.__offset = self.fp.tell()
                break
            s = self.fp.read(1)
        elif i == 0 or i == 65535:
            # padded marker or junk; move on
            s = "\xff"
        else:
            raise SyntaxError("no marker found")

This looks like it could read the whole file if the file were malformed. If it reads the info markers correctly, however, it should break out early. The handler function is what ultimately sets self.size, the dimensions of the image.
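To make the "break out early" point concrete, here is a minimal sketch of the same idea using only the standard library. It is not PIL's code: the function name jpeg_size and the hand-built byte string are assumptions for illustration. It walks the JPEG segments, stops at the first start-of-frame (SOF) marker, and reports how many bytes were consumed.

```python
import io
import struct

# SOF0..SOF15 carry the frame size; 0xC4, 0xC8 and 0xCC are other segment types
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}

def jpeg_size(stream):
    """Return ((width, height), bytes_consumed) for a JPEG byte stream."""
    if stream.read(2) != b"\xff\xd8":               # SOI marker
        raise ValueError("missing SOI marker")
    while True:
        if stream.read(1) != b"\xff":
            raise ValueError("expected a marker")
        code = stream.read(1)
        while code == b"\xff":                      # skip fill bytes
            code = stream.read(1)
        header = stream.read(2)
        if len(code) != 1 or len(header) != 2:
            raise ValueError("truncated file")
        (length,) = struct.unpack(">H", header)     # length includes these 2 bytes
        if code[0] in SOF_MARKERS:
            frame = stream.read(5)
            if len(frame) != 5:
                raise ValueError("truncated SOF segment")
            _precision, height, width = struct.unpack(">BHH", frame)
            return (width, height), stream.tell()
        if code[0] == 0xDA:                         # start of scan: no SOF seen
            raise ValueError("no SOF marker before image data")
        stream.seek(length - 2, io.SEEK_CUR)        # skip the segment payload

# Hand-built stream for illustration: SOI, one APP0 segment, one SOF0 segment,
# then 100,000 filler bytes standing in for the compressed image data.
fake = (b"\xff\xd8"
        + b"\xff\xe0" + struct.pack(">H", 16) + b"\x00" * 14
        + b"\xff\xc0" + struct.pack(">H", 17)
        + struct.pack(">BHH", 8, 1488, 2240) + b"\x00" * 10
        + b"\x00" * 100000)
print(jpeg_size(io.BytesIO(fake)))
```

On this synthetic stream the dimensions are found after 29 bytes, even though the stream is more than 100,000 bytes long. Real files usually carry larger metadata segments (EXIF, thumbnails) before the SOF marker, so the number of bytes consumed varies.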

Paulo Scardine · Answer

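If you don't care about the image contents, PIL is probably overkill.

I suggest parsing the output of the python magic module:

    >>> import re, magic
    >>> t = magic.from_file('teste.png')
    >>> t
    'PNG image data, 782 x 602, 8-bit/color RGBA, non-interlaced'
    >>> re.search(r'(\d+) x (\d+)', t).groups()
    ('782', '602')

This is a wrapper around libmagic, which reads as few bytes as possible in order to identify a file type signature.

Relevant version of the script: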

https://raw.githubusercontent.com/scardine/image_size/master/get_image_size.py

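[update]

Hmmm, unfortunately, when applied to JPEGs, the above gives "JPEG image data, EXIF standard 2.21". No image size! - Alex Flint

It seems JPEGs are magic-resistant. :-)

I can see why: to get the image dimensions of a JPEG file, you may have to read more bytes than libmagic likes to read.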
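Since the description may or may not contain dimensions, any code that parses it should handle the missing case. A small sketch, using only the two description strings quoted above; the helper name size_from_description is an assumption, not part of python-magic:

```python
import re

DIMENSIONS = re.compile(r"(\d+) x (\d+)")

def size_from_description(description):
    """Return (width, height) parsed from a libmagic-style description, or None."""
    match = DIMENSIONS.search(description)
    if match is None:
        return None          # e.g. JPEG descriptions without a size
    return int(match.group(1)), int(match.group(2))

print(size_from_description(
    "PNG image data, 782 x 602, 8-bit/color RGBA, non-interlaced"))
print(size_from_description("JPEG image data, EXIF standard 2.21"))
```

Returning None rather than raising lets a statistics script count the files it could not measure and fall back to another method for them.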

So I rolled up my sleeves and came up with this very untested snippet (get it from GitHub), which requires no third-party modules.

Look, Ma! No deps!

    #-------------------------------------------------------------------------------
    # Name:        get_image_size
    # Purpose:     extract image dimensions given a file path using just
    #              core modules
    #
    # Author:      Paulo Scardine (based on code from Emmanuel VAÏSSE)
    #
    # Created:     26/09/2013
    # Copyright:   (c) Paulo Scardine 2013
    # Licence:     MIT
    #-------------------------------------------------------------------------------
    #!/usr/bin/env python
    import os
    import struct

    class UnknownImageFormat(Exception):
        pass

    def get_image_size(file_path):
        """
        Return (width, height) for a given img file content - no external
        dependencies except the os and struct modules from core
        """
        size = os.path.getsize(file_path)

        # Open in binary mode so the header can be compared as bytes
        with open(file_path, 'rb') as input:
            height = -1
            width = -1
            data = input.read(25)

            if (size >= 10) and data[:6] in (b'GIF87a', b'GIF89a'):
                # GIFs: little-endian width and height follow the signature
                w, h = struct.unpack("<HH", data[6:10])
                width = int(w)
                height = int(h)
            elif ((size >= 24) and data.startswith(b'\211PNG\r\n\032\n')
                  and (data[12:16] == b'IHDR')):
                # PNGs: big-endian width and height inside the IHDR chunk
                w, h = struct.unpack(">LL", data[16:24])
                width = int(w)
                height = int(h)
            elif (size >= 16) and data.startswith(b'\211PNG\r\n\032\n'):
                # older PNGs?
                w, h = struct.unpack(">LL", data[8:16])
                width = int(w)
                height = int(h)
            elif (size >= 2) and data.startswith(b'\377\330'):
                # JPEG: walk the segments until a SOF0..SOF3 marker is found
                msg = " raised while trying to decode as JPEG."
                input.seek(0)
                input.read(2)
                b = input.read(1)
                try:
                    while (b and ord(b) != 0xDA):
                        while (ord(b) != 0xFF): b = input.read(1)
                        while (ord(b) == 0xFF): b = input.read(1)
                        if (ord(b) >= 0xC0 and ord(b) <= 0xC3):
                            input.read(3)
                            h, w = struct.unpack(">HH", input.read(4))
                            break
                        else:
                            # Skip the payload of any other segment
                            input.read(int(struct.unpack(">H", input.read(2))[0])-2)
                        b = input.read(1)
                    width = int(w)
                    height = int(h)
                except struct.error:
                    raise UnknownImageFormat("StructError" + msg)
                except ValueError:
                    raise UnknownImageFormat("ValueError" + msg)
                except Exception as e:
                    raise UnknownImageFormat(e.__class__.__name__ + msg)
            else:
                raise UnknownImageFormat(
                    "Sorry, don't know how to get information from this file."
                )

        return width, height
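The PNG branch relies on fixed byte offsets, which may be easier to follow in isolation. The sketch below assumes the standard PNG layout (8-byte signature, 4-byte chunk length, the chunk type IHDR, then big-endian width and height) and checks it against a header built by hand; png_size and sample are names made up for this illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"     # same bytes as '\211PNG\r\n\032\n'

def png_size(header):
    """Return (width, height) from the first 24 bytes of a PNG file, or None."""
    if (len(header) < 24
            or not header.startswith(PNG_SIGNATURE)
            or header[12:16] != b"IHDR"):
        return None
    # Bytes 16..23 hold width and height as big-endian unsigned 32-bit integers
    return struct.unpack(">LL", header[16:24])

# Hand-built header: signature, IHDR chunk length (13), chunk type, width, height
sample = (PNG_SIGNATURE + struct.pack(">L", 13) + b"IHDR"
          + struct.pack(">LL", 782, 602))
print(png_size(sample))
```

Only the first 24 bytes are needed, which is why the snippet above can answer for PNG files from its initial 25-byte read.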

[Update 2019]

Rust implementation: https://github.com/scardine/imsz

Jonathan · Answer

There is a package on PyPI called imagesize that currently works for me, although it does not look very active.

Install:

    pip install imagesize

Usage:

    import imagesize

    width, height = imagesize.get("test.png")
    print(width, height)

Homepage: https://github.com/shibukawa/imagesize_py

PyPI: https://pypi.org/project/imagesize/

user2923419 · Answer

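I often fetch image sizes from the Internet. Of course, you can't download the whole image and then load it just to parse this information; it takes too long. My method is to feed chunks to an image parser and test after each chunk whether it can already identify the image. The loop stops as soon as I have the information I want.

I extracted the core of my code and modified it to parse local files.

    from PIL import ImageFile

    ImPar = ImageFile.Parser()
    with open(r"D:\testpic\test.jpg", "rb") as f:
        chunk = f.read(2048)
        count = 2048
        while chunk:  # f.read() returns b"" at end of file
            ImPar.feed(chunk)
            if ImPar.image:
                # The header has been parsed; the size is known
                break
            chunk = f.read(2048)
            count += 2048
        print(ImPar.image.size)
        print(count)

Output:

    (2240, 1488)
    38912

The actual file size is 1,543,580 bytes, and only 38,912 bytes had to be read to get the image size. I hope this helps.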
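The feed-and-test loop does not depend on PIL itself. As a rough illustration of the pattern with only the standard library, the sketch below swaps in a tiny GIF header check for ImageFile.Parser; gif_size and size_from_chunks are hypothetical helpers, and only GIF is handled.

```python
import io
import struct

def gif_size(buffer):
    """Return (width, height) once the buffer holds a full GIF header, else None."""
    if len(buffer) < 10 or buffer[:6] not in (b"GIF87a", b"GIF89a"):
        return None
    return struct.unpack("<HH", buffer[6:10])

def size_from_chunks(stream, chunk_size=2048):
    """Feed chunks until the size can be parsed; return (size, bytes_read)."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:                       # end of stream, nothing recognised
            return None, len(buffer)
        buffer += chunk
        size = gif_size(buffer)
        if size is not None:                # stop as soon as the size is known
            return size, len(buffer)

# Hand-built data: a GIF header followed by filler bytes
data = b"GIF89a" + struct.pack("<HH", 320, 200) + b"\x00" * 5000
print(size_from_chunks(io.BytesIO(data), chunk_size=4))
```

The stream argument can be any object with a read method, so the same loop would work on a network response that supports incremental reads. A smaller chunk size reads fewer bytes but costs more read calls.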

Lenar Hoyt · Answer

Another short way of doing it on Unix systems. It depends on the output of file, which I am not sure is standardized across systems. You should probably not use this in production code. Moreover, most JPEGs don't report the image size.

    import subprocess, re

    image_size = list(map(int, re.findall(r'(\d+)x(\d+)', subprocess.getoutput("file " + filename))[-1]))
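A slightly more defensive variant of the same idea is sketched below. It passes the file name as an argument list rather than through a shell string, and returns None instead of raising IndexError when no size is printed. The helper names and the sample description are assumptions written by hand for illustration; the actual output of file differs between versions.

```python
import re
import subprocess

def last_dimensions(description):
    """Return [width, height] from the last 'WxH' pair in the text, or None."""
    pairs = re.findall(r"(\d+)x(\d+)", description)
    if not pairs:
        return None
    # Earlier pairs can be other fields (such as a pixel density),
    # which is why the last pair is taken.
    return [int(value) for value in pairs[-1]]

def image_size_via_file(path):
    # An argument list avoids shell interpretation of unusual file names;
    # "-b" asks file not to prepend the file name to its output.
    result = subprocess.run(["file", "-b", path],
                            capture_output=True, text=True, check=True)
    return last_dimensions(result.stdout)

# Hypothetical description string, not captured from a real run
sample = ("JPEG image data, JFIF standard 1.01, density 72x72, "
          "baseline, precision 8, 2240x1488, components 3")
print(last_dimensions(sample))
```

The caveat from the answer still applies: this is only as reliable as the text that the installed version of file happens to print.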

Yantao Xie · Answer

This answer has another good solution, but it is missing the pgm format. This answer handles pgm. I have added bmp.

The code is below:

    # Note: imghdr was removed from the standard library in Python 3.13
    import struct, imghdr, re, magic

    def get_image_size(fname):
        '''Determine the image type of fhandle and return its size.
        from draco'''
        with open(fname, 'rb') as fhandle:
            head = fhandle.read(32)
            if len(head) != 32:
                return
            if imghdr.what(fname) == 'png':
                check = struct.unpack('>i', head[4:8])[0]
                if check != 0x0d0a1a0a:
                    return
                width, height = struct.unpack('>ii', head[16:24])
            elif imghdr.what(fname) == 'gif':
                width, height = struct.unpack('<HH', head[6:10])
            elif imghdr.what(fname) == 'jpeg':
                try:
                    fhandle.seek(0) # Read 0xff next
                    size = 2
                    ftype = 0
                    while not 0xc0 <= ftype <= 0xcf:
                        fhandle.seek(size, 1)
                        byte = fhandle.read(1)
                        while ord(byte) == 0xff:
                            byte = fhandle.read(1)
                        ftype = ord(byte)
                        size = struct.unpack('>H', fhandle.read(2))[0] - 2
                    # We are at a SOFn block
                    fhandle.seek(1, 1)  # Skip `precision' byte.
                    height, width = struct.unpack('>HH', fhandle.read(4))
                except Exception: #IGNORE:W0703
                    return
            elif imghdr.what(fname) == 'pgm':
                header, width, height, maxval = re.search(
                    rb"(^P5\s(?:\s*#.*[\r\n])*"
                    rb"(\d+)\s(?:\s*#.*[\r\n])*"
                    rb"(\d+)\s(?:\s*#.*[\r\n])*"
                    rb"(\d+)\s(?:\s*#.*[\r\n]\s)*)", head).groups()
                width = int(width)
                height = int(height)
            elif imghdr.what(fname) == 'bmp':
                # Assumption: the original passed the builtin `str` here, which
                # cannot work; the libmagic description seems to be what was
                # intended, given the otherwise unused `magic` import.
                _, width, height, depth = re.search(
                    r"((\d+)\sx\s"
                    r"(\d+)\sx\s"
                    r"(\d+))", magic.from_file(fname)).groups()
                width = int(width)
                height = int(height)
            else:
                return
            return width, height
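For bmp, the dimensions can also be read directly from the file header with struct, which avoids the libmagic dependency. This is an alternative technique, not what the answer above does. The sketch assumes the common layout in which a 14-byte file header is followed by an info header of at least 40 bytes, with width and height stored as little-endian signed 32-bit integers at offsets 18 and 22; bmp_size is a name made up for this illustration.

```python
import struct

def bmp_size(head):
    """Return (width, height) from the first 26 bytes of a BMP file, or None."""
    if len(head) < 26 or head[:2] != b"BM":
        return None
    (info_header_size,) = struct.unpack("<I", head[14:18])
    if info_header_size < 40:
        return None              # older core header layout, not handled here
    width, height = struct.unpack("<ii", head[18:26])
    return width, abs(height)    # a negative height marks a top-down bitmap

# Hand-built header: "BM", file size, two reserved fields, pixel data offset,
# info header size (40), then width and height
sample = (b"BM" + struct.pack("<IHHI", 0, 0, 0, 54)
          + struct.pack("<I", 40) + struct.pack("<ii", 640, -480))
print(bmp_size(sample))
```

This fits the 32-byte head that the function above already reads, so it could replace the regular expression on the libmagic description if that dependency is unwanted.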