xml.parsers.expat.ExpatError: not well-formed (invalid token)

Question

When I load the XML file below with xmltodict, I get this error: xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 1

Here is my file:

    <?xml version="1.0" encoding="utf-8"?>
    <mydocument has="an attribute">
      <and>
        <many>elements</many>
        <many>more elements</many>
      </and>
      <plus a="complex">
        element as well
      </plus>
    </mydocument>

Code:

    import xmltodict

    with open('fileTEST.xml') as fd:
        xmltodict.parse(fd.read())

I am on Windows 10, using Python 3.6 and xmltodict 0.11.0.

If I use ElementTree, it works:

    import xml.etree.ElementTree as ET

    tree = ET.ElementTree(file='fileTEST.xml')
    for elem in tree.iter():
        print(elem.tag, elem.attrib)

Output:

    mydocument {'has': 'an attribute'}
    and {}
    many {}
    many {}
    plus {'a': 'complex'}

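Note: I may have run into a line-ending problem.
Note 2: I compared two different files with Beyond Compare.
Parsing crashes on the file encoded as UTF-8 with BOM and works on the plain UTF-8 file.
The UTF-8 BOM is a sequence of bytes (EF BB BF) at the start of a file that lets a reader identify the file as UTF-8 encoded.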
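
To check whether a given file starts with those three bytes, something like the following should do. It is only a minimal sketch: `has_utf8_bom` is a hypothetical helper name, not part of xmltodict, and it looks at the raw bytes so the result does not depend on the platform's default text encoding.

```python
import codecs


def has_utf8_bom(path: str) -> bool:
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    # Open in binary mode so no decoding happens before the comparison.
    with open(path, "rb") as fh:
        return fh.read(3) == codecs.BOM_UTF8
```

Calling `has_utf8_bom('fileTEST.xml')` on the failing file would be expected to return `True` if the BOM is the culprit.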

jmunsch · Answer

In my case the file had been saved with a byte order mark, which is the default in Notepad++.

I resaved the file without the BOM, as plain UTF-8.
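
If you would rather do the resave from Python than from an editor, a rough sketch could look like this. `strip_utf8_bom` and its `src`/`dst` parameters are names made up for illustration. It relies on the standard `utf-8-sig` codec, which drops a leading BOM when decoding if one is present.

```python
def strip_utf8_bom(src: str, dst: str) -> None:
    """Copy a UTF-8 text file to dst, dropping a leading BOM if there is one."""
    # newline="" disables newline translation, so line endings are preserved.
    with open(src, "r", encoding="utf-8-sig", newline="") as fh:
        text = fh.read()
    # Plain "utf-8" never writes a BOM.
    with open(dst, "w", encoding="utf-8", newline="") as fh:
        fh.write(text)
```

Files that have no BOM pass through unchanged, so it should be safe to run on a mix of both kinds.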

Renz Paul Del Rosario · Answer

It looks like you forgot to define the encoding type. I suggest loading the XML file into a string variable first:

    import xml.etree.ElementTree as ET
    import xmltodict
    import json

    tree = ET.parse('your_data.xml')
    xml_data = tree.getroot()
    # here you can change the encoding type to the one you need
    xmlstr = ET.tostring(xml_data, encoding='utf-8', method='xml')

    data_dict = dict(xmltodict.parse(xmlstr))

winklerrr · Answer

Python 3

One-liner

    data: dict = xmltodict.parse(ElementTree.tostring(ElementTree.parse(path).getroot()))

Helper for `.json` and `.xml`

I wrote a small helper function that loads `.json` and `.xml` files from a given path. I thought it might be useful to some people here:

    import json
    from xml.etree import ElementTree

    import xmltodict


    def load_json(path: str) -> dict:
        if path.endswith(".json"):
            print(f"> Loading JSON from '{path}'")
            with open(path, mode="r") as open_file:
                content = open_file.read()
            return json.loads(content)
        elif path.endswith(".xml"):
            print(f"> Loading XML as JSON from '{path}'")
            # ElementTree reads the file itself; tostring() returns bytes
            xml = ElementTree.tostring(ElementTree.parse(path).getroot())
            return xmltodict.parse(xml, attr_prefix="@", cdata_key="#text", dict_constructor=dict)

        print(f"> Loading failed for '{path}'")
        return {}

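Notes

If you want to get rid of the `@` and `#text` markers in the JSON output, use the parameters `attr_prefix=""` and `cdata_key=""`.
Normally `xmltodict.parse()` returns an `OrderedDict`, but you can change that with the parameter `dict_constructor=dict`.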

Usage

    path = "my_data.xml"
    data = load_json(path)
    print(json.dumps(data, indent=2))

    # OUTPUT
    #
    # > Loading XML as JSON from 'my_data.xml'
    # {
    #   "mydocument": {
    #     "@has": "an attribute",
    #     "and": {
    #       "many": [
    #         "elements",
    #         "more elements"
    #       ]
    #     },
    #     "plus": {
    #       "@a": "complex",
    #       "#text": "element as well"
    #     }
    #   }
    # }


Prayson W. Daniel · Answer

In my case the problem was in the first 3 characters, so removing them made it work:

    import xmltodict
    from xml.parsers.expat import ExpatError

    with open('your_data.xml') as f:
        data = f.read()

    try:
        doc = xmltodict.parse(data)
    except ExpatError:
        # retry without the first three characters
        doc = xmltodict.parse(data[3:])
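
A possible alternative to slicing off three characters is to let Python drop the BOM while decoding. This is only a sketch, and `read_xml_text` is a hypothetical helper name. It assumes the file really is UTF-8, with or without a BOM. The `utf-8-sig` codec from the standard library skips a leading BOM when one is present and otherwise behaves like `utf-8`.

```python
def read_xml_text(path: str) -> str:
    """Read a UTF-8 XML file as text, without a leading BOM."""
    # "utf-8-sig" removes the BOM if present; files without one are unaffected.
    with open(path, "r", encoding="utf-8-sig") as fh:
        return fh.read()
```

The returned string can then be passed to `xmltodict.parse()` without the try/except fallback.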

Arount · Answer

It seems xmltodict cannot parse `<?xml version="1.0" encoding="utf-8"?>`.

If you remove this line, it works.
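
For what it is worth, the declaration on its own can be checked with the standard library's expat bindings, which is the parser the error message comes from. The sketch below rests on one assumption: that the file was opened in text mode with a locale default such as cp1252, so the three BOM bytes were decoded into three stray characters. `is_well_formed` is a name made up for this check.

```python
from xml.parsers import expat


def is_well_formed(text: str) -> bool:
    """Return True if expat parses the whole string without raising an error."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True marks this as the final chunk
    except expat.ExpatError:
        return False
    return True


doc = '<?xml version="1.0" encoding="utf-8"?><mydocument has="an attribute"/>'

# Assumption: this is what the BOM bytes turn into when read as cp1252.
mojibake_bom = b"\xef\xbb\xbf".decode("cp1252")

print(is_well_formed(doc))                 # declaration alone
print(is_well_formed(mojibake_bom + doc))  # stray characters before it
```

If the first check passes and the second fails, the stray characters in front of the declaration are the more likely cause than the declaration itself.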
