How to implement custom indentation when pretty-printing with the JSON module

Question

So, I'm using Python 2.7, and I'm using the json module to encode the following data structure:

'layer1': {
    'layer2': {
        'layer3_1': [ long_list_of_stuff ],
        'layer3_2': 'string'
    }
}

My problem is that I'm printing everything out using pretty printing, as follows:

json.dumps(data_structure, indent=2)

"layer3_1"のコンテンツを除いて、すべてインデントしたいという点を除いて、これはすばらしいことです。これは、膨大な辞書リストの座標であるため、それぞれに単一の値を設定すると、きれいな印刷で数千のファイルが作成されます行の例を以下に示します。

{
  "layer1": {
    "layer2": {
      "layer3_1": [
        {
          "x": 1,
          "y": 7
        },
        {
          "x": 0,
          "y": 4
        },
        {
          "x": 5,
          "y": 3
        },
        {
          "x": 6,
          "y": 9
        }
      ],
      "layer3_2": "string"
    }
  }
}

What I really want is something like the following:

{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x":1,"y":7},{"x":0,"y":4},{"x":5,"y":3},{"x":6,"y":9}],
      "layer3_2": "string"
    }
  }
}

I've heard it is possible to extend the json module: can it be set up so that indentation is turned off only inside the "layer3_1" object? If so, could somebody tell me how?

martineau · Accepted Answer

Update

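Below is a version of my original answer that has been revised several times. The original, which I posted only to show how to get the first idea in J.F. Sebastian's answer to work, returned (like his) a non-indented string representation of the object. The latest version instead returns the Python object formatted as JSON in isolation.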

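The keys of each coordinate dict will appear in sorted order, as per one of the OP's comments, but only if a sort_keys=True keyword argument is specified in the initial json.dumps() call driving the process, and the object's type is no longer changed to a string along the way. In other words, the actual type of the "wrapped" object is now maintained.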

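I think a number of people downvoted my post because they didn't understand its original intent, so, primarily for that reason, I have "fixed" and improved my answer several times. The current version combines my original answer with some of the ideas @Erik Allik used in his answer, plus useful feedback from other users shown in the comments below this answer.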

The following code appears to work unchanged in both Python 2.7.16 and 3.7.4.

from _ctypes import PyObj_FromPtr
import json
import re


class NoIndent(object):
    """ Value wrapper. """
    def __init__(self, value):
        self.value = value


class MyEncoder(json.JSONEncoder):
    FORMAT_SPEC = '@@{}@@'
    regex = re.compile(FORMAT_SPEC.format(r'(\d+)'))

    def __init__(self, **kwargs):
        # Save copy of any keyword argument values needed for use here.
        self.__sort_keys = kwargs.get('sort_keys', None)
        super(MyEncoder, self).__init__(**kwargs)

    def default(self, obj):
        return (self.FORMAT_SPEC.format(id(obj)) if isinstance(obj, NoIndent)
                else super(MyEncoder, self).default(obj))

    def encode(self, obj):
        format_spec = self.FORMAT_SPEC  # Local var to expedite access.
        json_repr = super(MyEncoder, self).encode(obj)  # Default JSON.

        # Replace any marked-up object ids in the JSON repr with the
        # value returned from the json.dumps() of the corresponding
        # wrapped Python object.
        for match in self.regex.finditer(json_repr):
            # see https://stackoverflow.com/a/15012814/355230
            id = int(match.group(1))
            no_indent = PyObj_FromPtr(id)
            json_obj_repr = json.dumps(no_indent.value, sort_keys=self.__sort_keys)

            # Replace the matched id string with json formatted representation
            # of the corresponding Python object.
            json_repr = json_repr.replace(
                            '"{}"'.format(format_spec.format(id)), json_obj_repr)

        return json_repr


if __name__ == '__main__':
    from string import ascii_lowercase as letters

    data_structure = {
        'layer1': {
            'layer2': {
                'layer3_1': NoIndent([{"x":1,"y":7}, {"x":0,"y":4}, {"x":5,"y":3},
                                      {"x":6,"y":9},
                                      {k: v for v, k in enumerate(letters)}]),
                'layer3_2': 'string',
                'layer3_3': NoIndent([{"x":2,"y":8,"z":3}, {"x":1,"y":5,"z":4},
                                      {"x":6,"y":9,"z":8}]),
                'layer3_4': NoIndent(list(range(20))),
            }
        }
    }

    print(json.dumps(data_structure, cls=MyEncoder, sort_keys=True, indent=2))

Output:

{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}, {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5, "g": 6, "h": 7, "i": 8, "j": 9, "k": 10, "l": 11, "m": 12, "n": 13, "o": 14, "p": 15, "q": 16, "r": 17, "s": 18, "t": 19, "u": 20, "v": 21, "w": 22, "x": 23, "y": 24, "z": 25}],
      "layer3_2": "string",
      "layer3_3": [{"x": 2, "y": 8, "z": 3}, {"x": 1, "y": 5, "z": 4}, {"x": 6, "y": 9, "z": 8}],
      "layer3_4": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    }
  }
}
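A note on the design above: PyObj_FromPtr comes from _ctypes, an internal (underscore-prefixed) CPython module, and turning an id() back into an object only works because CPython's id() happens to be the object's memory address. If you would rather not rely on that, here is a minimal sketch of the same placeholder idea in which the encoder simply records each wrapped value in a dict keyed by id() while encoding. IdTableEncoder is a hypothetical name, and the sketch assumes, like the original, that wrapped values need no custom encoding of their own.

```python
import json
import re


class NoIndent(object):
    """Wraps a value that should be written on a single line."""
    def __init__(self, value):
        self.value = value


class IdTableEncoder(json.JSONEncoder):
    """Hypothetical variant of MyEncoder: records each wrapped value in a
    dict keyed by id() instead of dereferencing the id with PyObj_FromPtr."""
    FORMAT_SPEC = '@@{}@@'
    regex = re.compile(FORMAT_SPEC.format(r'(\d+)'))

    def __init__(self, **kwargs):
        super(IdTableEncoder, self).__init__(**kwargs)
        self._wrapped = {}  # id(NoIndent instance) -> wrapped value

    def default(self, obj):
        if isinstance(obj, NoIndent):
            # The wrapper is still referenced by the data being encoded, so
            # its id() cannot be reused by another object during this call.
            self._wrapped[id(obj)] = obj.value
            return self.FORMAT_SPEC.format(id(obj))
        return super(IdTableEncoder, self).default(obj)

    def encode(self, obj):
        json_repr = super(IdTableEncoder, self).encode(obj)
        for match in self.regex.finditer(json_repr):
            key = int(match.group(1))
            if key in self._wrapped:  # skip numbers that are not wrapper ids
                # self.sort_keys is set by JSONEncoder.__init__.
                one_line = json.dumps(self._wrapped[key], sort_keys=self.sort_keys)
                json_repr = json_repr.replace('"{}"'.format(match.group(0)), one_line)
        return json_repr


data = {"layer1": {"layer2": {
    "layer3_1": NoIndent([{"x": 1, "y": 7}, {"x": 0, "y": 4}]),
    "layer3_2": "string"}}}
out = json.dumps(data, cls=IdTableEncoder, indent=2, sort_keys=True)
print(out)
```

The trade-off is a little extra memory for the table in exchange for not touching raw object addresses.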

M Somerville · Answer

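It's a bodge, but once you have the string from dumps(), you can perform a regular expression substitution on it, provided you know the format of its contents. Something along these lines: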

import json
import re

s = json.dumps(data_structure, indent=2)
s = re.sub(r'\s*{\s*"(.)": (\d+),\s*"(.)": (\d+)\s*}(,?)\s*', r'{"\1":\2,"\3":\4}\5', s)
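The substitution above only matches objects with exactly two single-character keys and non-negative integer values. If the data is less uniform, a looser variant of the same post-processing idea is to let json re-serialize every innermost object (one containing no nested object or array). This is a sketch, not part of the answer above: compact_innermost_objects is a hypothetical helper, it assumes no string value in the document contains {, }, [ or ], and it puts one compact object per line rather than the whole list on one line.

```python
import json
import re
from collections import OrderedDict

# An object with no nested object or array inside it. The negated character
# class also matches newlines, so it spans the lines of an indented object.
INNERMOST_OBJECT = re.compile(r'\{[^{}\[\]]*\}')


def compact_innermost_objects(text):
    """Re-serialize every innermost JSON object in `text` on one line."""
    def one_line(match):
        # Parse the matched object (keeping its key order) and dump it compactly.
        value = json.loads(match.group(0), object_pairs_hook=OrderedDict)
        return json.dumps(value, separators=(',', ':'))
    return INNERMOST_OBJECT.sub(one_line, text)


data = {"layer1": {"layer2": {
    "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}],
    "layer3_2": "string"}}}
out = compact_innermost_objects(json.dumps(data, indent=2))
print(out)
```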

Erik Kaplun · Answer

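The following solution seems to work correctly on Python 2.7.x. It uses a workaround taken from a question about a custom JSON encoder in Python 2.7 for inserting plain JavaScript code: a UUID-based replacement scheme keeps custom-encoded objects from ending up as JSON strings in the output.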

import json
import uuid


class NoIndent(object):
    def __init__(self, value):
        self.value = value


class NoIndentEncoder(json.JSONEncoder):
    def __init__(self, *args, **kwargs):
        super(NoIndentEncoder, self).__init__(*args, **kwargs)
        # Same options as this encoder, minus indent, for the nested dumps() calls.
        self.kwargs = dict(kwargs)
        self.kwargs.pop('indent', None)
        self._replacement_map = {}

    def default(self, o):
        if isinstance(o, NoIndent):
            key = uuid.uuid4().hex
            self._replacement_map[key] = json.dumps(o.value, **self.kwargs)
            return "@@%s@@" % (key,)
        else:
            return super(NoIndentEncoder, self).default(o)

    def encode(self, o):
        result = super(NoIndentEncoder, self).encode(o)
        for k, v in self._replacement_map.iteritems():
            result = result.replace('"@@%s@@"' % (k,), v)
        return result

Then this:

obj = {
  "layer1": {
    "layer2": {
      "layer3_2": "string",
      "layer3_1": NoIndent([{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}])
    }
  }
}
print json.dumps(obj, indent=2, cls=NoIndentEncoder)

produces the following output:

{
  "layer1": {
    "layer2": {
      "layer3_2": "string",
      "layer3_1": [{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}]
    }
  }
}

It also correctly passes all options (except indent), e.g. sort_keys=True, down to the nested json.dumps call:

obj = {
  "layer1": {
    "layer2": {
      "layer3_1": NoIndent([{"y": 7, "x": 1, }, {"y": 4, "x": 0}, {"y": 3, "x": 5, }, {"y": 9, "x": 6}]),
      "layer3_2": "string",
    }
  }
}
print json.dumps(obj, indent=2, sort_keys=True, cls=NoIndentEncoder)

correctly outputs:

{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}],
      "layer3_2": "string"
    }
  }
}

It can also be combined with, for example, collections.OrderedDict:

from collections import OrderedDict

obj = {
  "layer1": {
    "layer2": {
      "layer3_2": "string",
      "layer3_3": NoIndent(OrderedDict([("b", 1), ("a", 2)]))
    }
  }
}
print json.dumps(obj, indent=2, cls=NoIndentEncoder)

Output:

{
  "layer1": {
    "layer2": {
      "layer3_3": {"b": 1, "a": 2},
      "layer3_2": "string"
    }
  }
}

SzieberthAdam · Answer

This produces the OP's expected result:

import json

class MyJSONEncoder(json.JSONEncoder):

    def iterencode(self, o, _one_shot=False):
        list_lvl = 0
        for s in super(MyJSONEncoder, self).iterencode(o, _one_shot=_one_shot):
            if s.startswith('['):
                list_lvl += 1
                s = s.replace('\n', '').rstrip()
            elif 0 < list_lvl:
                s = s.replace('\n', '').rstrip()
                if s and s[-1] == ',':
                    s = s[:-1] + self.item_separator
                elif s and s[-1] == ':':
                    s = s[:-1] + self.key_separator
            if s.endswith(']'):
                list_lvl -= 1
            yield s

o = {
    "layer1": {
        "layer2": {
            "layer3_1": [{"y":7,"x":1},{"y":4,"x":0},{"y":3,"x":5},{"y":9,"x":6}],
            "layer3_2": "string",
            "layer3_3": ["aaa\nbbb","ccc\nddd",{"aaa\nbbb":"ccc\nddd"}],
            "layer3_4": "aaa\nbbb",
        }
    }
}

jsonstr = json.dumps(o, indent=2, separators=(',', ':'), sort_keys=True,
                     cls=MyJSONEncoder)
print(jsonstr)
o2 = json.loads(jsonstr)
print('identical objects: {}'.format((o == o2)))

jfs · Answer

You could try:

It seems the above approach doesn't work with the json module:

import json
import sys

class NoIndent(object):
    def __init__(self, value):
        self.value = value

def default(o, encoder=json.JSONEncoder()):
    if isinstance(o, NoIndent):
        return json.dumps(o.value)
    return encoder.default(o)

L = [dict(x=x, y=y) for x in range(1) for y in range(2)]
obj = [NoIndent(L), L]
json.dump(obj, sys.stdout, default=default, indent=4)

It produces invalid output (the list is serialized as a string):

[
    "[{\"y\": 0, \"x\": 0}, {\"y\": 1, \"x\": 0}]",
    [
        {
            "y": 0,
            "x": 0
        },
        {
            "y": 1,
            "x": 0
        }
    ]
]
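The reason the list comes out as a string: whatever default() returns is fed back into the encoder as an ordinary value. A returned str is therefore encoded as a JSON string (quoted and escaped), while a returned list is encoded, and indented, like any other list. So default() by itself cannot switch indentation off for part of the document, which is why the json-based answers here substitute a placeholder afterwards. A minimal demonstration; Unknown is just a hypothetical stand-in class:

```python
import json


class Unknown(object):
    """Any object json cannot serialize by itself, so default() is called."""


# default() returns a str: it is encoded as an ordinary JSON string,
# so the pre-rendered JSON ends up quoted.
as_string = json.dumps([Unknown()], default=lambda o: json.dumps([1, 2]), indent=2)

# default() returns a list: it is encoded like any other list,
# so it is indented along with everything else.
as_list = json.dumps([Unknown()], default=lambda o: [1, 2], indent=2)

print(as_string)
print(as_list)
```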

If you can use yaml, the method works:

import sys
import yaml

class NoIndentList(list):
    pass

def noindent_list_presenter(dumper, data):
    return dumper.represent_sequence(u'tag:yaml.org,2002:seq', data,
                                     flow_style=True)
yaml.add_representer(NoIndentList, noindent_list_presenter)


obj = [
    [dict(x=x, y=y) for x in range(2) for y in range(1)],
    [dict(x=x, y=y) for x in range(1) for y in range(2)],
]
obj[0] = NoIndentList(obj[0])
yaml.dump(obj, stream=sys.stdout, indent=4)

It produces:

- [{x: 0, y: 0}, {x: 1, y: 0}]
-   - {x: 0, y: 0}
    - {x: 0, y: 1}

That is, the first list is serialized using [] with all its items on one line, while the second list uses one line per item.

robm · Answer

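Here's a post-processing solution for when too many types of objects contribute to the JSON to try the JSONEncoder method, and too many varying types for a regex to work. This function collapses whitespace after a specified indent level, without needing to know the specifics of the data itself.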

def collapse_json(text, indent=12):
    """Compacts a string of json data by collapsing whitespace after the
    specified indent level

    NOTE: will not produce correct results when indent level is not a multiple
    of the json indent level
    """
    initial = " " * indent
    out = []  # final json output
    sublevel = []  # accumulation list for sublevel entries
    pending = None  # holder for consecutive entries at exact indent level
    for line in text.splitlines():
        if line.startswith(initial):
            if line[indent] == " ":
                # found a line indented further than the indent level, so add
                # it to the sublevel list
                if pending:
                    # the first item in the sublevel will be the pending item
                    # that was the previous line in the json
                    sublevel.append(pending)
                    pending = None
                item = line.strip()
                sublevel.append(item)
                if item.endswith(","):
                    sublevel.append(" ")
            elif sublevel:
                # found a line at the exact indent level *and* we have sublevel
                # items. This means the sublevel items have come to an end
                sublevel.append(line.strip())
                out.append("".join(sublevel))
                sublevel = []
            else:
                # found a line at the exact indent level but no items indented
                # further, so possibly start a new sub-level
                if pending:
                    # if there is already a pending item, it means that
                    # consecutive entries in the json had the exact same
                    # indentation and that last pending item was not the start
                    # of a new sublevel.
                    out.append(pending)
                pending = line.rstrip()
        else:
            if pending:
                # it's possible that an item will be pending but not added to
                # the output yet, so make sure it's not forgotten.
                out.append(pending)
                pending = None
            if sublevel:
                out.append("".join(sublevel))
            out.append(line)
    return "\n".join(out)

For example, take this structure as input to json.dumps with an indent level of 4:

text = json.dumps({"zero": ["first", {"second": 2, "third": 3, "fourth": 4, "items": [[1,2,3,4], [5,6,7,8], 9, 10, [11, [12, [13, [14, 15]]]]]}]}, indent=4)

Here is the function's output at various indent levels:

>>> print collapse_json(text, indent=0)
{"zero": ["first", {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}]}
>>> print collapse_json(text, indent=4)
{
    "zero": ["first", {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}]
}
>>> print collapse_json(text, indent=8)
{
    "zero": [
        "first",
        {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}
    ]
}
>>> print collapse_json(text, indent=12)
{
    "zero": [
        "first",
        {
            "items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]],
            "second": 2,
            "fourth": 4,
            "third": 3
        }
    ]
}
>>> print collapse_json(text, indent=16)
{
    "zero": [
        "first",
        {
            "items": [
                [1, 2, 3, 4],
                [5, 6, 7, 8],
                9,
                10,
                [11, [12, [13, [14, 15]]]]
            ],
            "second": 2,
            "fourth": 4,
            "third": 3
        }
    ]
}

Polv · Answer

Indeed, this is one of the things YAML does better than JSON.

I can't get NoIndentEncoder to work, but I can use a regex on the JSON string...

import re

def collapse_json(text, list_length=5):
    for length in range(list_length):
        re_pattern = r'\[' + (r'\s*(.+)\s*,' * length)[:-1] + r'\]'
        re_repl = r'[' + ''.join(r'\{}, '.format(i+1) for i in range(length))[:-2] + r']'

        text = re.sub(re_pattern, re_repl, text)

    return text

The question is, how do I do this with nested lists?

Before:

[
  0,
  "any",
  [
    2,
    3
  ]
]

After:

[0, "any", [2, 3]]
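One way to handle the nested case (a sketch, not Polv's code): collapse the innermost arrays first, hide each collapsed array behind a placeholder token that contains no brackets, and repeat until nothing changes, so each enclosing array becomes "innermost" in turn; finally expand the tokens back. collapse_lists and the NUL-delimited token are hypothetical choices. The sketch assumes no string value contains brackets or braces, and it leaves arrays that contain objects spread over several lines (their inner arrays are still collapsed).

```python
import json
import re

# An array containing no '[', ']', '{' or '}' (it may span several lines).
INNERMOST_ARRAY = re.compile(r'\[[^\[\]{}]*\]')
# Placeholder for an array that has already been collapsed. json.dumps always
# escapes control characters, so a raw NUL never occurs in its output.
TOKEN = re.compile(r'\x00(\d+)\x00')


def collapse_lists(text):
    """Put every array that holds no objects on a single line, at any depth."""
    saved = []

    def stash(match):
        # Join the array's non-blank lines with single spaces...
        parts = [line.strip() for line in match.group(0)[1:-1].split('\n')]
        saved.append('[' + ' '.join(p for p in parts if p) + ']')
        # ...and hide it behind a bracket-free token, so the array that
        # encloses it counts as "innermost" on the next pass.
        return '\x00{}\x00'.format(len(saved) - 1)

    while True:
        new_text = INNERMOST_ARRAY.sub(stash, text)
        if new_text == text:  # no arrays left to collapse
            break
        text = new_text

    # Saved arrays can contain tokens themselves, so expand until none remain.
    while TOKEN.search(text):
        text = TOKEN.sub(lambda m: saved[int(m.group(1))], text)
    return text


before = json.dumps([0, "any", [2, 3]], indent=2)
after = collapse_lists(before)
print(after)
```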

kashiraja · Answer

As a side note, this website has built-in JavaScript that avoids line breaks in JSON strings when a line would be shorter than 70 characters:

http://www.csvjson.com/json_beautifier

(implemented using a modified version of JSON-js)

Select "Inline Short Arrays".

It's great for quickly viewing data that's in your copy buffer.
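To get a similar effect offline in Python, here is a rough sketch of the same kind of heuristic. It is not the csvjson or JSON-js implementation: dumps_inline_short is a hypothetical function that pretty-prints recursively and keeps any list or dict on one line when its one-line JSON text is at most max_width characters. As simplifications, only the value's own text is measured (not the key or indentation in front of it), and dict keys are assumed to be strings.

```python
import json


def dumps_inline_short(obj, indent=2, max_width=70, level=0):
    """Pretty-print obj roughly like json.dumps(obj, indent=indent), but keep
    any list or dict on one line when its one-line JSON text is at most
    max_width characters long. Assumes string dict keys."""
    one_line = json.dumps(obj)
    is_container = isinstance(obj, (list, tuple, dict))
    if not is_container or not obj or len(one_line) <= max_width:
        return one_line

    pad = ' ' * (indent * (level + 1))  # indentation of each child
    end_pad = ' ' * (indent * level)    # indentation of the closing bracket
    if isinstance(obj, dict):
        lines = ['{}{}: {}'.format(pad, json.dumps(key),
                                   dumps_inline_short(value, indent, max_width, level + 1))
                 for key, value in obj.items()]
        return '{\n' + ',\n'.join(lines) + '\n' + end_pad + '}'
    lines = [pad + dumps_inline_short(item, indent, max_width, level + 1)
             for item in obj]
    return '[\n' + ',\n'.join(lines) + '\n' + end_pad + ']'


data = {"layer1": {"layer2": {
    "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4},
                 {"x": 5, "y": 3}, {"x": 6, "y": 9}],
    "layer3_2": "string"}}}
print(dumps_inline_short(data, max_width=80))
```

With the default width of 70, the 72-character "layer3_1" list is too long, so it is split into one short object per line instead; raising max_width to 80 keeps the whole list on a single line, as in the question.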