How to replace Unicode characters with ASCII characters in Python (Perl script given)?

Question

I'm trying to learn Python and couldn't figure out how to translate the following Perl script to Python:

    #!/usr/bin/perl -w

    use open qw(:std :utf8);

    while(<>) {
      s/\x{00E4}/ae/;
      s/\x{00F6}/oe/;
      s/\x{00FC}/ue/;
      print;
    }

The script just changes Unicode umlauts to alternative ASCII output. (So the complete output is in ASCII.) I would be grateful for any hints. Thanks!

user3850 · Accepted Answer

Use the fileinput module to loop over standard input or a list of files,
decode the lines you read from UTF-8 into unicode objects,
then map any Unicode characters you want with the translate method.

translit.py would look like this:

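    #!/usr/bin/env python2.6
    # -*- coding: utf-8 -*-

    import fileinput

    # Maps code points to replacements; None deletes the character.
    table = {
        0xe4: u'ae',
        ord(u'ö'): u'oe',
        ord(u'ü'): u'ue',
        ord(u'ß'): None,
    }

    for line in fileinput.input():
        s = line.decode('utf8')       # bytes -> unicode (Python 2)
        print s.translate(table),     # trailing comma: no extra newline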

And you could use it like this:

    $ cat utf8.txt
    sömé täßt
    sömé täßt
    sömé täßt
    $ ./translit.py utf8.txt
    soemé taet
    soemé taet
    soemé taet
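For readers on Python 3, the three steps above can be sketched roughly as follows. This is a minimal sketch, not the answerer's code: the helper name `translit` and the in-memory demo streams are assumptions made for illustration. It relies on Python 3 text streams yielding already-decoded `str` lines, so the explicit `decode` call disappears.

```python
import io

# Same mapping as the Python 2 script: code point -> replacement.
# None deletes the character (here: ß), as in the original table.
TABLE = {
    0xE4: "ae",
    ord("ö"): "oe",
    ord("ü"): "ue",
    ord("ß"): None,
}

def translit(lines, out):
    """Hypothetical helper: write each line to `out` with TABLE applied."""
    for line in lines:
        out.write(line.translate(TABLE))

# Demo on in-memory text streams. For a real file, a stream opened with
# open(path, encoding="utf-8") would supply decoded lines the same way.
src = io.StringIO("sömé täßt\nsömé täßt\n")
dst = io.StringIO()
translit(src, dst)
print(dst.getvalue(), end="")
```

Note that the Perl substitutions have no `/g` flag, so they replace only the first match on each line, whereas `translate` replaces every occurrence.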

Update:

In Python 3, strings are Unicode by default, so no extra encoding or decoding step is needed even when they contain non-ASCII or non-Latin characters. The solution then looks as follows:

    line = 'Verhältnismäßigkeit, Möglichkeit'

    table = {
        ord('ä'): 'ae',
        ord('ö'): 'oe',
        ord('ü'): 'ue',
        ord('ß'): 'ss',
    }

    line.translate(table)
    # 'Verhaeltnismaessigkeit, Moeglichkeit'
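A possible variation on the Python 3 table, assuming uppercase umlauts should be handled too and that any leftover non-ASCII character should be reported rather than silently passed through. The names `GERMAN` and `german_to_ascii` are made up for this sketch. `str.maketrans` accepts single-character keys directly, which saves the `ord()` calls.

```python
# Hypothetical extended table: the original mapping plus uppercase umlauts.
GERMAN = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})

def german_to_ascii(text):
    """Apply the table, then verify that the result is pure ASCII."""
    result = text.translate(GERMAN)
    # Raises UnicodeEncodeError if a non-ASCII character is left over.
    result.encode("ascii")
    return result

print(german_to_ascii("Verhältnismäßigkeit, Möglichkeit"))
print(german_to_ascii("Ärger über Öl"))
```

The strict check is a design choice: a character missing from the table, such as é, raises an error instead of leaking into output that is supposed to be ASCII only.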

Ian Bicking · Answer

To convert to ASCII, you might want to try ASCII, Dammit or this recipe:

    >>> title = u"Klüft skräms inför på fédéral électoral große"
    >>> import unicodedata
    >>> unicodedata.normalize('NFKD', title).encode('ascii','ignore')
    'Kluft skrams infor pa federal electoral groe'
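The same recipe in Python 3 might look like the sketch below; `strip_accents` is a name chosen here for illustration. NFKD splits an accented letter into a base letter plus combining marks, and `encode('ascii', 'ignore')` then drops everything that is not ASCII. Since ß has no decomposition, it disappears entirely unless it is replaced beforehand.

```python
import unicodedata

def strip_accents(text):
    """Decompose characters (NFKD), then drop whatever is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    # In Python 3, encode() returns bytes, so decode back to str.
    return decomposed.encode("ascii", "ignore").decode("ascii")

title = "Klüft skräms inför på fédéral électoral große"
print(strip_accents(title))

# Replacing ß first keeps the word readable ("grosse" instead of "groe").
print(strip_accents(title.replace("ß", "ss")))
```

This approach maps ä to a rather than ae, so it does not reproduce the Perl script's output; it is a generic accent stripper.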

jfs · Answer

Instead of writing regular expressions by hand, you could try unidecode to convert Unicode to ASCII. It is a Python port of the Text::Unidecode Perl module:

    #!/usr/bin/env python
    import fileinput
    import locale
    from contextlib import closing
    from unidecode import unidecode # $ pip install unidecode

    def toascii(files=None, encoding=None, bufsize=-1):
        if encoding is None:
            encoding = locale.getpreferredencoding(False)
        with closing(fileinput.FileInput(files=files, bufsize=bufsize)) as file:
            for line in file:
                print unidecode(line.decode(encoding)),

    if __name__ == "__main__":
        import sys
        toascii(encoding=sys.argv.pop(1) if len(sys.argv) > 1 else None)

It uses the FileInput class to avoid global state.

Example:

    $ echo 'äöüß' | python toascii.py utf-8
    aouss
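In Python 3, `unidecode` can be called directly on a `str`, so the decode step is not needed. The sketch below treats the package as an optional dependency. The wrapper name `to_ascii` and the standard-library fallback are assumptions added here, not part of the answer, and the fallback is cruder: it drops characters such as ß that have no decomposition.

```python
import unicodedata

try:
    from unidecode import unidecode  # third-party: pip install unidecode
except ImportError:
    unidecode = None  # fall back to the standard library below

def to_ascii(text):
    """Hypothetical wrapper: use unidecode when installed, else NFKD stripping."""
    if unidecode is not None:
        return unidecode(text)
    # Fallback (an assumption of this sketch): strip combining marks only.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("äöü"))
```

Either path maps ä to a, not ae, so a `translate` table is still needed to get the exact substitutions of the Perl script.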

Climbs_lika_Spyder · Answer

I use translitcodec

    >>> import translitcodec
    >>> print '\xe4'.decode('latin-1')
    ä
    >>> print '\xe4'.decode('latin-1').encode('translit/long').encode('ascii')
    ae
    >>> print '\xe4'.decode('latin-1').encode('translit/short').encode('ascii')
    a

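You can change the encoding passed to decode (latin-1 here) to whatever your input uses. You may also want a small helper function so that each call site does not have to repeat the whole chain: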

    def fancy2ascii(s):
        return s.decode('latin-1').encode('translit/long').encode('ascii')