How to recursively extract zip files in Python

Question

I have a zip file that contains three zip files, like this:

    zipfile.zip\
        dirA.zip\
            a
        dirB.zip\
            b
        dirC.zip\
            c

I want to extract all of the inner zip files inside the outer zip file into directories with these names (dirA, dirB, dirC).
Basically, I want to end up with the following layout:

    output\
        dirA\
            a
        dirB\
            b
        dirC\
            c

I tried the following:

    import os, re
    from zipfile import ZipFile

    os.makedirs(directory)  # where directory is "\output"
    with ZipFile(self.archive_name, "r") as archive:
        for id, files in data.items():
            if files:
                print("Creating", id)
                dirpath = os.path.join(directory, id)

                os.mkdir(dirpath)

                for file in files:
                    match = pattern.match(filename)
                    new = match.group(2)
                    new_filename = os.path.join(dirpath, new)

                    content = archive.open(file).read()
                    with open(new_filename, "wb") as outfile:
                        outfile.write(content)

But it only extracts the inner zip files themselves, and I end up with:

    output\
        dirA\
            dirA.zip
        dirB\
            dirB.zip
        dirC\
            dirC.zip

Any suggestions, including code segments, would be greatly appreciated. I have tried many different things and read the documentation without success.

Forge · Accepted Answer

When extracting the outer zip file, you want to write the inner zip files to memory instead of to disk. To do this, I used BytesIO.

Check out this code:

    import io
    import os
    import zipfile

    def extract(filename):
        with zipfile.ZipFile(filename) as z:
            for f in z.namelist():
                # derive the directory name from the inner file name (dirA.zip -> dirA)
                dirname = os.path.splitext(f)[0]
                # create the new directory (no error if it already exists)
                os.makedirs(dirname, exist_ok=True)
                # read the inner zip file into an in-memory bytes buffer
                content = io.BytesIO(z.read(f))
                with zipfile.ZipFile(content) as zip_file:
                    for i in zip_file.namelist():
                        zip_file.extract(i, dirname)

If you run extract("zipfile.zip") with zipfile.zip laid out as:

    zipfile.zip/
        dirA.zip/
            a
        dirB.zip/
            b
        dirC.zip/
            c

The output will be:

    dirA/
        a
    dirB/
        b
    dirC/
        c
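The function above handles one level of nesting and assumes every member of the outer archive is itself a zip file. The same in-memory idea can be applied recursively. The following is only a sketch under those assumptions: the helper name `extract_nested_in_memory` and its `dest` parameter are hypothetical and not part of the answer, and inner archives are recognised purely by a `.zip` suffix.

```python
import io
import os
import zipfile


def extract_nested_in_memory(source, dest):
    """Extract `source` into `dest`, expanding inner .zip members in memory.

    `source` may be a path or a file-like object (for example io.BytesIO).
    Each inner archive is unpacked into a directory named after it, minus
    the .zip suffix. Hypothetical helper, not part of the original answer.
    """
    os.makedirs(dest, exist_ok=True)
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if not info.is_dir() and info.filename.lower().endswith(".zip"):
                # Inner archive: keep its bytes in memory and recurse into it.
                inner_dest = os.path.join(dest, os.path.splitext(info.filename)[0])
                extract_nested_in_memory(io.BytesIO(archive.read(info)), inner_dest)
            else:
                # Regular file or directory entry: write it to disk as usual.
                archive.extract(info, dest)
```

Calling `extract_nested_in_memory("zipfile.zip", "output")` on the archive from the question should produce `output/dirA/a`, `output/dirB/b` and `output/dirC/c` without ever writing the inner zip files to disk. Because each inner archive is read fully into memory, this is best suited to archives that fit comfortably in RAM.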

ronnydw · Answer

Here is a function that extracts a nested zip file (any level of nesting) and cleans up the original zip files:

    import zipfile, re, os

    def extract_nested_zip(zippedFile, toFolder):
        """ Extract a zip file including any nested zip files
            Delete the zip file(s) after extraction
        """
        with zipfile.ZipFile(zippedFile, 'r') as zfile:
            zfile.extractall(path=toFolder)
        os.remove(zippedFile)
        for root, dirs, files in os.walk(toFolder):
            for filename in files:
                fileSpec = os.path.join(root, filename)
                # A nested call may already have extracted and deleted this
                # file, so check that it still exists before recursing.
                if re.search(r'\.zip$', filename) and os.path.isfile(fileSpec):
                    extract_nested_zip(fileSpec, root)
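The pattern `\.zip$` matches on the file name only, so an archive named `DATA.ZIP` is skipped, and a file that merely carries a `.zip` suffix makes `zipfile.ZipFile` raise `zipfile.BadZipFile`. One possible way to tighten the detection is sketched below. The helper name `find_zip_files` is hypothetical; it combines a case-insensitive suffix check with the standard library's `zipfile.is_zipfile()`.

```python
import zipfile
from pathlib import Path


def find_zip_files(directory):
    """Return sorted paths under `directory` that look like real zip archives.

    Hypothetical helper: a case-insensitive suffix check is combined with
    zipfile.is_zipfile(), which inspects the file contents rather than the name.
    """
    return sorted(
        path
        for path in Path(directory).rglob("*")
        if path.is_file()
        and path.suffix.lower() == ".zip"
        and zipfile.is_zipfile(path)
    )
```

The suffix check is kept on purpose: `is_zipfile()` alone would also accept other zip-based formats such as `.docx` or `.jar`, which you probably do not want to unpack.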

hertopnerd · Answer

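I tried some of the other solutions but couldn't get them to work "in place", so I'm posting my solution for the in-place case. Note: it deletes each zip file and "replaces" it with a directory of the same name, so back up your zip files first if you want to keep them.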

The strategy is simple: unzip every zip file in the directory (and its subdirectories), then rinse and repeat until no zip files remain. The rinse and repeat is needed when a zip file itself contains zip files.

    import os
    import zipfile
    import re

    def unzip_directory(directory):
        """ This function unzips (and then deletes) all zip files in a directory """
        for root, dirs, files in os.walk(directory):
            for filename in files:
                if re.search(r'\.zip$', filename):
                    # extract into a directory named after the archive (dirA.zip -> dirA)
                    to_path = os.path.join(root, filename.split('.zip')[0])
                    zipped_file = os.path.join(root, filename)
                    os.makedirs(to_path, exist_ok=True)
                    with zipfile.ZipFile(zipped_file, 'r') as zfile:
                        zfile.extractall(path=to_path)
                    # deletes zip file
                    os.remove(zipped_file)

    def exists_zip(directory):
        """ This function returns T/F whether any .zip file exists within the directory, recursively """
        for root, dirs, files in os.walk(directory):
            for filename in files:
                if re.search(r'\.zip$', filename):
                    return True
        return False

    def unzip_directory_recursively(directory, max_iter=1000):
        """ Calls unzip_directory until all contained zip files (and new ones from previous calls)
        are unzipped
        """
        print("Does the directory path exist? ", os.path.exists(directory))
        iterate = 0
        while exists_zip(directory) and iterate < max_iter:
            unzip_directory(directory)
            iterate += 1
        pre = "Did not" if iterate < max_iter else "Did"
        print(pre, "time out based on max_iter limit of", max_iter, ". Took iterations:", iterate)

Assuming your zip files are backed up, you can make all of this work by calling unzip_directory_recursively(your_directory).
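Since the in-place approach deletes archives, it may be safer to try it on throwaway data first. The sketch below builds a nested archive with the layout from the question. The helper name `make_sample_archive` and the file contents are made up for illustration.

```python
import io
import os
import zipfile


def make_sample_archive(directory):
    """Create zipfile.zip in `directory`, containing dirA.zip, dirB.zip, dirC.zip.

    Each inner archive holds one small file (a, b or c), mirroring the layout
    in the question. Returns the path of the outer archive.
    Hypothetical helper for testing only.
    """
    outer_path = os.path.join(directory, "zipfile.zip")
    with zipfile.ZipFile(outer_path, "w") as outer:
        for inner_name, member in (("dirA", "a"), ("dirB", "b"), ("dirC", "c")):
            # Build each inner archive in memory, then store it in the outer one.
            buffer = io.BytesIO()
            with zipfile.ZipFile(buffer, "w") as inner:
                inner.writestr(member, "contents of " + member)
            outer.writestr(inner_name + ".zip", buffer.getvalue())
    return outer_path
```

For example, `make_sample_archive(tempfile.mkdtemp())` writes the sample into a fresh temporary directory (this needs `import tempfile`), and that directory can then be passed to `unzip_directory_recursively` without touching real files.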