Reusable library to get a human-readable version of a file size?

Question

There are various snippets on the web that give you a function to return a human-readable size from a size in bytes:

    >>> human_readable(2048)
    '2 kilobytes'
    >>>

But is there a Python library that provides this?

Sridhar Ratnakumar · Accepted Answer

Addressing the "too small a task to require a library" objection above with a straightforward implementation:

    def sizeof_fmt(num, suffix='B'):
        for unit in ['', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi']:
            if abs(num) < 1024.0:
                return "%3.1f%s%s" % (num, unit, suffix)
            num /= 1024.0
        return "%.1f%s%s" % (num, 'Yi', suffix)

Supports:

all currently known binary prefixes
negative and positive numbers
numbers larger than 1000 yobibytes
arbitrary units (maybe you like to count in gibibits!)

Example:

    >>> sizeof_fmt(168963795964)
    '157.4GiB'

by Fred Cirera
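
The supported cases listed above can be exercised with a few calls. This is a minimal sketch: the function is repeated so the block runs on its own, and the sample inputs are arbitrary choices for illustration, not part of the original answer.

```python
def sizeof_fmt(num, suffix="B"):
    """Format num with binary (IEC) prefixes; same logic as the answer above."""
    for unit in ["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi"]:
        if abs(num) < 1024.0:
            return "%3.1f%s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f%s%s" % (num, "Yi", suffix)


# The sample inputs below are illustrative assumptions.
print(sizeof_fmt(-2048))              # negative number
print(sizeof_fmt(2**30, suffix="b"))  # arbitrary unit: gibibits
print(sizeof_fmt(2000 * 1024**8))     # more than 1000 yobibytes
```

Anything that outgrows the prefix list falls through to the final `return`, which is why values beyond 1000 YiB still format instead of raising.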

Pyrocater · Answer

A library that has all the functionality you seem to be looking for is humanize. humanize.naturalsize() seems to do everything you are after.

joctee · Answer

Here is my version. It does not use a for loop. It has constant complexity, O(1), and is in theory more efficient than the answers here that use a for loop.

    from math import log

    # (unit, number of decimal places) for each power of 1024
    unit_list = list(zip(['bytes', 'kB', 'MB', 'GB', 'TB', 'PB'],
                         [0, 0, 1, 2, 2, 2]))

    def sizeof_fmt(num):
        """Human friendly file size"""
        if num > 1:
            exponent = min(int(log(num, 1024)), len(unit_list) - 1)
            quotient = float(num) / 1024**exponent
            unit, num_decimals = unit_list[exponent]
            format_string = '{:.%sf} {}' % (num_decimals)
            return format_string.format(quotient, unit)
        if num == 0:
            return '0 bytes'
        if num == 1:
            return '1 byte'

To make it clearer what is going on, we can leave out the string formatting code. These are the lines that actually do the work:

    exponent = int(log(num, 1024))
    quotient = num / 1024**exponent
    unit_list[exponent]

akaIDIOT · Answer

I know this question is ancient, but I recently came up with a version that avoids loops, using log2 to determine the size order:

    from math import log2

    _suffixes = ['bytes', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']

    def file_size(size):
        # determine binary order in steps of size 10
        # (coerce to int, // still returns a float)
        order = int(log2(size) / 10) if size else 0
        # format file size
        # (.4g results in rounded numbers for exact matches and max 3 decimals,
        # should never resort to exponent values)
        return '{:.4g} {}'.format(size / (1 << (order * 10)), _suffixes[order])

It could well be considered unpythonic for its readability, though :)
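
As an aside to the log2 idea: for non-negative integer sizes, the same order can be found without any floating-point logarithm by using `int.bit_length()`. This is a swapped-in technique, not the answer's method, and the names `binary_order` and `file_size_int` are hypothetical.

```python
_SUFFIXES = ["bytes", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]


def binary_order(size):
    """Return how many whole powers of 1024 fit into a non-negative int."""
    # bit_length() - 1 equals floor(log2(size)) for size >= 1;
    # max() keeps the result at 0 when size is 0.
    return max(size.bit_length() - 1, 0) // 10


def file_size_int(size):
    """Hypothetical variant of file_size() that picks the unit without floats."""
    order = min(binary_order(size), len(_SUFFIXES) - 1)
    return "{:.4g} {}".format(size / (1 << (order * 10)), _SUFFIXES[order])


print(file_size_int(1536))
print(file_size_int(0))
```

Because the order comes from integer arithmetic, exact powers of 1024 cannot land in the wrong unit through rounding in the logarithm.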

Mr. Me · Answer

There always has to be one of those guys. Well, today it's me. Here is a one-line solution, or two lines if you count the function signature.

    def human_size(bytes, units=[' bytes','KB','MB','GB','TB', 'PB', 'EB']):
        """ Returns a human readable string representation of bytes """
        return str(bytes) + units[0] if bytes < 1024 else human_size(bytes>>10, units[1:])

    >>> human_size(123)
    '123 bytes'
    >>> human_size(123456789)
    '117MB'

Jon Tirsen · Answer

If you have Django installed, you can also try filesizeformat:

    from django.template.defaultfilters import filesizeformat
    filesizeformat(1073741824)  # => "1.0 GB"

Mr. Me · Answer

The following works in Python 3.6+, is, in my opinion, the easiest answer here to understand, and lets you customize the number of decimal places used.

    def human_readable_size(size, decimal_places=3):
        for unit in ['B', 'KiB', 'MiB', 'GiB', 'TiB']:
            if size < 1024.0:
                break
            size /= 1024.0
        return f"{size:.{decimal_places}f}{unit}"

Sridhar Ratnakumar · Answer

One such library is hurry.filesize.

    >>> from hurry.filesize import size, alternative
    >>> size(1, system=alternative)
    '1 byte'
    >>> size(10, system=alternative)
    '10 bytes'
    >>> size(1024, system=alternative)
    '1 KB'

Giancarlo Sportelli · Answer

Using either powers of 1000 or kibibytes would be more standards-friendly:

    def sizeof_fmt(num, use_kibibyte=True):
        base, suffix = [(1000., 'B'), (1024., 'iB')][use_kibibyte]
        # units: 'B', then 'kB', 'MB', ... or 'kiB', 'MiB', ...
        for x in ['B'] + [prefix + suffix for prefix in 'kMGTP']:
            if -base < num < base:
                return "%3.1f %s" % (num, x)
            num /= base
        return "%3.1f %s" % (num, x)

P.S. Never trust a library that prints thousands with the K (uppercase) suffix :)

markltbaker · Answer

Riffing on the snippet provided as an alternative to hurry.filesize(), here is a snippet that gives numbers of varying precision depending on the prefix used. It is not as terse as some snippets, but I like the results.

    def human_size(size_bytes):
        """
        format a size in bytes into a 'human' file size, e.g. bytes, KB, MB, GB, TB, PB
        Note that bytes/KB will be reported in whole numbers but MB and above will have greater precision
        e.g. 1 byte, 43 bytes, 443 KB, 4.3 MB, 4.43 GB, etc
        """
        if size_bytes == 1:
            # because I really hate unnecessary plurals
            return "1 byte"

        suffixes_table = [('bytes',0),('KB',0),('MB',1),('GB',2),('TB',2), ('PB',2)]

        num = float(size_bytes)
        for suffix, precision in suffixes_table:
            if num < 1024.0:
                break
            num /= 1024.0

        if precision == 0:
            formatted_size = "%d" % num
        else:
            formatted_size = str(round(num, ndigits=precision))

        return "%s %s" % (formatted_size, suffix)

gojomo · Answer

This will do what you need in almost any situation, is customizable with optional arguments, and, as you can see, is pretty much self-documenting:

    from math import log
    def pretty_size(n,pow=0,b=1024,u='B',pre=['']+[p+'i'for p in'KMGTPEZY']):
        pow,n=min(int(log(max(n*b**pow,1),b)),len(pre)-1),n*b**pow
        return "%%.%if %%s%%s"%abs(pow%(-pow-1))%(n/b**float(pow),pre[pow],u)

Example output:

    >>> pretty_size(42)
    '42 B'
    >>> pretty_size(2015)
    '2.0 KiB'
    >>> pretty_size(987654321)
    '941.9 MiB'
    >>> pretty_size(9876543210)
    '9.2 GiB'
    >>> pretty_size(0.5,pow=1)
    '512 B'
    >>> pretty_size(0)
    '0 B'

Advanced customizations:

    >>> pretty_size(987654321,b=1000,u='bytes',pre=['','kilo','mega','giga'])
    '987.7 megabytes'
    >>> pretty_size(9876543210,b=1000,u='bytes',pre=['','kilo','mega','giga'])
    '9.9 gigabytes'

This code is compatible with both Python 2 and Python 3. PEP8 compliance is an exercise for the reader. Remember, it's the output that's pretty.

Update:

If you need thousands separators, just apply the obvious extension:

    def prettier_size(n,pow=0,b=1024,u='B',pre=['']+[p+'i'for p in'KMGTPEZY']):
        r,f=min(int(log(max(n*b**pow,1),b)),len(pre)-1),'{:,.%if} %s%s'
        return (f%(abs(r%(-r-1)),pre[r],u)).format(n*b**pow/b**float(r))

For example:

    >>> prettier_size(987654321098765432109876543210)
    '816,968.5 YiB'

xApple · Answer

Drawing on all the previous answers, here is my take on it. It is an object that stores the file size in bytes as an integer. But when you print the object, you automatically get a human-readable version.

    class Filesize(object):
        """
        Container for a size in bytes with a human readable representation
        Use it like this::

            >>> size = Filesize(123123123)
            >>> print(size)
            117.4 MB
        """

        chunk = 1024
        units = ['bytes', 'KB', 'MB', 'GB', 'TB', 'PB']
        precisions = [0, 0, 1, 2, 2, 2]

        def __init__(self, size):
            self.size = size

        def __int__(self):
            return self.size

        def __str__(self):
            if self.size == 0:
                return '0 bytes'
            from math import log
            unit = self.units[min(int(log(self.size, self.chunk)), len(self.units) - 1)]
            return self.format(unit)

        def format(self, unit):
            if unit not in self.units:
                raise Exception("Not a valid file size unit: %s" % unit)
            if self.size == 1 and unit == 'bytes':
                return '1 byte'
            exponent = self.units.index(unit)
            quotient = float(self.size) / self.chunk**exponent
            precision = self.precisions[exponent]
            format_string = '{:.%sf} {}' % (precision)
            return format_string.format(quotient, unit)

HST · Answer

I like the fixed precision of senderle's decimal version, so here is a sort of hybrid of that with joctee's answer above (did you know you could take logs with non-integer bases?):

    from math import log

    def human_readable_bytes(x):
        # hybrid of https://stackoverflow.com/a/10171475/2595465
        #      with https://stackoverflow.com/a/5414105/2595465
        if x == 0:
            return '0'
        magnitude = int(log(abs(x), 10.24))
        if magnitude > 16:
            format_str = '%iP'
            illion = 5  # cap at peta (1024 ** 5)
        else:
            float_fmt = '%2.1f' if magnitude % 3 == 1 else '%1.2f'
            illion = (magnitude + 1) // 3
            format_str = float_fmt + ['', 'K', 'M', 'G', 'T', 'P'][illion]
        return (format_str % (x * 1.0 / (1024 ** illion))).lstrip('0')

arumuga abinesh · Answer

The HumanFriendly project helps with this.

    import humanfriendly
    humanfriendly.format_size(1024)

The above code gives 1KB as the answer.
Examples can be found here.

Saeed Zahedian Abroodi · Answer

You should use "humanize".

    >>> import humanize
    >>> humanize.naturalsize(1000000)
    '1.0 MB'
    >>> humanize.naturalsize(1000000, binary=True)
    '976.6 KiB'
    >>> humanize.naturalsize(1000000, gnu=True)
    '976.6K'

Reference:

https://pypi.org/project/humanize/

jerrymouse · Answer

How about a simple two-liner:

    import math

    def humanizeFileSize(filesize):
        p = int(math.floor(math.log(filesize, 2)/10))
        return "%.3f%s" % (filesize/math.pow(1024,p), ['B','KiB','MiB','GiB','TiB','PiB','EiB','ZiB','YiB'][p])

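Here is how it works under the hood:

Calculates log₂(filesize).
Divides it by 10 to get the closest unit. (For example, if the size is 5000 bytes, the closest unit is Kb, so the answer should be X KiB.)
Returns file_size/value_of_closest_unit along with the unit.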
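
The steps above can be traced by hand for the 5000-byte example. A small sketch, with the intermediate variable names chosen here for illustration:

```python
import math

size = 5000  # example input taken from the steps above

log2_size = math.log(size, 2)        # step 1: about 12.29
p = int(math.floor(log2_size / 10))  # step 2: 1, i.e. the KiB slot
value = size / math.pow(1024, p)     # step 3: 5000 / 1024

print(p, "%.3f KiB" % value)  # prints: 1 4.883 KiB
```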

However, it does not work if filesize is 0 or negative (because log is undefined for 0 and negative numbers). You can add extra checks for them:

    import math

    def humanizeFileSize(filesize):
        filesize = abs(filesize)
        if (filesize==0):
            return "0 Bytes"
        p = int(math.floor(math.log(filesize, 2)/10))
        return "%0.2f %s" % (filesize/math.pow(1024,p), ['Bytes','KiB','MiB','GiB','TiB','PiB','EiB','ZiB','YiB'][p])

Examples:

    >>> humanizeFileSize(538244835492574234)
    '478.06 PiB'
    >>> humanizeFileSize(-924372537)
    '881.55 MiB'
    >>> humanizeFileSize(0)
    '0 Bytes'

NOTE: There is a difference between Kb and KiB. KB means 1000 bytes, whereas KiB means 1024 bytes. KB, MB and GB are all multiples of 1000, whereas KiB, MiB, GiB and so on are all multiples of 1024. More about it here
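
The gap between the two families of units grows with each prefix, which is easy to check numerically. A minimal sketch; `binary_excess_percent` is a hypothetical helper name, and the prefix list is an illustrative subset:

```python
PREFIXES = ["K", "M", "G", "T"]  # illustrative subset of prefixes


def binary_excess_percent(power):
    """How much larger 1024**power is than 1000**power, in percent."""
    return (1024**power / 1000**power - 1) * 100


for power, prefix in enumerate(PREFIXES, start=1):
    print("%siB is %.1f%% larger than %sB"
          % (prefix, binary_excess_percent(power), prefix))
```

The difference is 2.4% at the kilo level and already about 7.4% at the giga level, so mixing the two conventions matters more for large files.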

Sridhar Ratnakumar · Answer

DiveIntoPython3 also talks about this function.

METAJIJI · Answer

Modern Django has a built-in template filter, filesizeformat:

Formats the value like a human-readable file size (i.e. '13 KB', '4.1 MB', '102 bytes', etc.).

For example:

    {{ value|filesizeformat }}

If value is 123456789, the output would be 117.7 MB.

More info: https://docs.djangoproject.com/en/1.10/ref/templates/builtins/#filesizeformat

ayorgo · Answer

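What you are about to find below is by no means the most performant or the shortest solution among those already posted. Instead, it focuses on one particular issue that many of the other answers miss.

Namely, the case when an input like 999_995 is given:

    Python 3.6.1 ...
    ...
    >>> value = 999_995
    >>> base = 1000
    >>> math.log(value, base)
    1.999999276174054

which, being truncated to the nearest integer and applied back to the input, gives

    >>> order = int(math.log(value, base))
    >>> value/base**order
    999.995

This seems to be exactly what we would expect, until we are required to control the output precision. And this is when things start to get a bit difficult.

With the precision set to 2 digits we get:

    >>> round(value/base**order, 2)
    1000.0  # K

instead of 1M.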

How can we counter that?

Of course, we can check for it explicitly:

    if round(value/base**order, 2) == base:
        order += 1
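
Spelled out as a complete function, the explicit check might look like the sketch below. The name `abbreviate_checked` and the suffix list are assumptions made for illustration; only the `if round(...) == base` test comes from the text above.

```python
import math

SUFFIXES = ["", "K", "M", "B", "T"]  # assumed suffix list for illustration


def abbreviate_checked(value, base=1000, precision=2):
    """Abbreviate value, bumping the order when rounding reaches the base."""
    if value == 0:
        return "0"
    order_max = len(SUFFIXES) - 1
    order = min(int(math.log(abs(value), base)), order_max)
    scaled = round(value / base**order, precision)
    # Explicit check: 999.995 rounds to 1000.0, which belongs to the next unit.
    if abs(scaled) == base and order < order_max:
        order += 1
        scaled = round(value / base**order, precision)
    return f"{scaled:,g}{SUFFIXES[order]}"


print(abbreviate_checked(999_994))
print(abbreviate_checked(999_995))
```

This works, at the cost of rounding twice whenever the boundary case is hit.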

But can we do better? Can we find out which way the order should be cut before we do the final step?

It turns out we can.

Assuming the 0.5 decimal rounding rule, the above if condition translates into:
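
Judging from the condition that the `abbreviate` function tests, the inequality is presumably the following. This is a reconstruction, with the symbols v (value), b (base), p (precision) and o chosen here as assumptions:

```latex
\operatorname{round}\!\left(\frac{v}{b^{\lfloor o \rfloor}},\, p\right) = b
\;\Longleftrightarrow\;
b^{\,o - \lfloor o \rfloor} \ge b - \frac{0.5}{10^{p}}
\;\Longleftrightarrow\;
o - \lfloor o \rfloor \ge \log_{b}\!\left(b - \frac{0.5}{10^{p}}\right),
\qquad o = \log_{b} v
```

In words: the scaled value rounds up to the base exactly when the fractional part of the logarithm reaches the logarithm of the last value that still rounds down.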

resulting in

    from math import log

    def abbreviate(value, base=1000, precision=2, suffixes=None):
        if suffixes is None:
            suffixes = ['', 'K', 'M', 'B', 'T']

        if value == 0:
            return f'{0}{suffixes[0]}'

        order_max = len(suffixes) - 1
        order = log(abs(value), base)
        # True (counts as 1) when rounding would push the value up to the next unit
        order_corr = order - int(order) >= log(base - 0.5/10**precision, base)
        order = min(int(order) + order_corr, order_max)

        factored = round(value/base**order, precision)

        return f'{factored:,g}{suffixes[order]}'

giving

    >>> abbreviate(999_994)
    '999.99K'
    >>> abbreviate(999_995)
    '1M'
    >>> abbreviate(999_995, precision=3)
    '999.995K'
    >>> abbreviate(2042, base=1024)
    '1.99K'
    >>> abbreviate(2043, base=1024)
    '2K'

Matt Joiner · Answer

    def human_readable_data_quantity(quantity, multiple=1024):
        if quantity == 0:
            quantity = +0
        SUFFIXES = ["B"] + [i + {1000: "B", 1024: "iB"}[multiple] for i in "KMGTPEZY"]
        for suffix in SUFFIXES:
            if quantity < multiple or suffix == SUFFIXES[-1]:
                if suffix == SUFFIXES[0]:
                    return "%d%s" % (quantity, suffix)
                else:
                    return "%.1f%s" % (quantity, suffix)
            else:
                quantity /= multiple

crifan · Answer

Referring to Sridhar Ratnakumar's answer, updated to:

    def formatSize(sizeInBytes, decimalNum=1, isUnitWithI=False, sizeUnitSeperator=""):
        """format size to human readable string"""
        # https://en.wikipedia.org/wiki/Binary_prefix#Specific_units_of_IEC_60027-2_A.2_and_ISO.2FIEC_80000
        # K=kilo, M=mega, G=giga, T=tera, P=peta, E=exa, Z=zetta, Y=Yotta
        sizeUnitList = ['','K','M','G','T','P','E','Z']
        largestUnit = 'Y'

        if isUnitWithI:
            sizeUnitListWithI = []
            for curIdx, eachUnit in enumerate(sizeUnitList):
                unitWithI = eachUnit
                if curIdx >= 1:
                    unitWithI += 'i'
                sizeUnitListWithI.append(unitWithI)

            # sizeUnitListWithI = ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']
            sizeUnitList = sizeUnitListWithI

            largestUnit += 'i'

        suffix = "B"
        decimalFormat = "." + str(decimalNum) + "f" # ".1f"
        finalFormat = "%" + decimalFormat + sizeUnitSeperator + "%s%s" # "%.1f%s%s"
        sizeNum = sizeInBytes
        for sizeUnit in sizeUnitList:
            if abs(sizeNum) < 1024.0:
                return finalFormat % (sizeNum, sizeUnit, suffix)
            sizeNum /= 1024.0
        return finalFormat % (sizeNum, largestUnit, suffix)

Example output:

    def testKb():
        kbSize = 3746
        kbStr = formatSize(kbSize)
        print("%s -> %s" % (kbSize, kbStr))

    def testI():
        iSize = 87533
        iStr = formatSize(iSize, isUnitWithI=True)
        print("%s -> %s" % (iSize, iStr))

    def testSeparator():
        seperatorSize = 98654
        seperatorStr = formatSize(seperatorSize, sizeUnitSeperator=" ")
        print("%s -> %s" % (seperatorSize, seperatorStr))

    def testBytes():
        bytesSize = 352
        bytesStr = formatSize(bytesSize)
        print("%s -> %s" % (bytesSize, bytesStr))

    def testMb():
        mbSize = 76383285
        mbStr = formatSize(mbSize, decimalNum=2)
        print("%s -> %s" % (mbSize, mbStr))

    def testTb():
        tbSize = 763832854988542
        tbStr = formatSize(tbSize, decimalNum=2)
        print("%s -> %s" % (tbSize, tbStr))

    def testPb():
        pbSize = 763832854988542665
        pbStr = formatSize(pbSize, decimalNum=4)
        print("%s -> %s" % (pbSize, pbStr))

    def demoFormatSize():
        testKb()
        testI()
        testSeparator()
        testBytes()
        testMb()
        testTb()
        testPb()

    # 3746 -> 3.7KB
    # 87533 -> 85.5KiB
    # 98654 -> 96.3 KB
    # 352 -> 352.0B
    # 76383285 -> 72.84MB
    # 763832854988542 -> 694.70TB
    # 763832854988542665 -> 678.4199PB

Peter F · Answer

This solution might also appeal to you, depending on how your mind works:

    from pathlib import Path

    def get_size(path = Path('.')):
        """ Gets file size, or total directory size """
        if path.is_file():
            size = path.stat().st_size
        elif path.is_dir():
            size = sum(file.stat().st_size for file in path.glob('*.*'))
        return size

    def format_size(path, unit="MB"):
        """ Converts integers to common size units used in computing """
        bit_shift = {"B": 0,
                     "kb": 7,
                     "KB": 10,
                     "mb": 17,
                     "MB": 20,
                     "gb": 27,
                     "GB": 30,
                     "TB": 40,}
        return "{:,.0f}".format(get_size(path) / float(1 << bit_shift[unit])) + " " + unit

    # Tests and test results
    >>> format_size(Path(r"d:\media\bags of fun.avi"))
    '38 MB'
    >>> format_size(Path(r"d:\media\bags of fun.avi"), "KB")
    '38,763 KB'
    >>> format_size(Path(r"d:\media\bags of fun.avi"), "kb")
    '310,104 kb'