Android Java UTF-8 HttpClient problem

Question

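I am having strange character encoding problems with a JSON array that I fetch from a web page. The server sends back this header:

Content-Type: text/javascript; charset=UTF-8

I can also view the JSON output in Firefox or any other browser, and the Unicode characters display correctly. The response sometimes contains words in other languages, with accented characters and the like. However, when I pull it down in Java and put it into a String, I get strange question marks. Here is my code: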

    HttpParams params = new BasicHttpParams();
    HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
    HttpProtocolParams.setContentCharset(params, "utf-8");
    params.setBooleanParameter("http.protocol.expect-continue", false);

    HttpClient httpclient = new DefaultHttpClient(params);

    HttpGet httpget = new HttpGet("http://www.example.com/json_array.php");
    HttpResponse response;
    try {
        response = httpclient.execute(httpget);

        if (response.getStatusLine().getStatusCode() == 200) {
            // Connection was established. Get the content.

            HttpEntity entity = response.getEntity();
            // If the response does not enclose an entity, there is no need
            // to worry about connection release

            if (entity != null) {
                // A Simple JSON Response Read
                InputStream instream = entity.getContent();
                String jsonText = convertStreamToString(instream);

                Toast.makeText(getApplicationContext(), "Response: " + jsonText, Toast.LENGTH_LONG).show();
            }
        }
    } catch (MalformedURLException e) {
        Toast.makeText(getApplicationContext(), "ERROR: Malformed URL - " + e.getMessage(), Toast.LENGTH_LONG).show();
        e.printStackTrace();
    } catch (IOException e) {
        Toast.makeText(getApplicationContext(), "ERROR: IO Exception - " + e.getMessage(), Toast.LENGTH_LONG).show();
        e.printStackTrace();
    } catch (JSONException e) {
        Toast.makeText(getApplicationContext(), "ERROR: JSON - " + e.getMessage(), Toast.LENGTH_LONG).show();
        e.printStackTrace();
    }

    private static String convertStreamToString(InputStream is) {
        /*
         * To convert the InputStream to String we use the BufferedReader.readLine()
         * method. We iterate until the BufferedReader returns null, which means
         * there's no more data to read. Each line is appended to a StringBuilder
         * and returned as a String.
         */
        BufferedReader reader;
        try {
            reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
        } catch (UnsupportedEncodingException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }
        StringBuilder sb = new StringBuilder();

        String line;
        try {
            while ((line = reader.readLine()) != null) {
                sb.append(line + "\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return sb.toString();
    }

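As you can see, I am specifying UTF-8 on the InputStreamReader, but every time I display the returned JSON text via Toast it contains strange question marks. I am thinking that I need to send the InputStream to a byte[] instead?

Thanks in advance for any help.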

Vit Khudenko · Accepted Answer

Try this:

    if (entity != null) {
        // A Simple JSON Response Read
        // InputStream instream = entity.getContent();
        // String jsonText = convertStreamToString(instream);

        String jsonText = EntityUtils.toString(entity, HTTP.UTF_8);

        // ... toast code here
    }
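For readers who want to see the idea without the Apache classes, here is a minimal standard-library sketch of "drain the stream, then decode once with an explicit charset", which also addresses the byte[] question from the original post. The class name StreamDecoding and the method readFully are made up for illustration, and this is not a copy of how EntityUtils is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamDecoding {
    /** Hypothetical helper: reads every byte, closes the stream, decodes once. */
    static String readFully(InputStream in, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try (InputStream source = in) {
            int n;
            while ((n = source.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Decoding the complete byte array avoids splitting a multi-byte
        // sequence and keeps the original line endings untouched.
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // Stand-in for entity.getContent(): a small UTF-8 JSON body.
        byte[] body = "[\"caf\u00e9\"]\n".getBytes(StandardCharsets.UTF_8);
        String jsonText = readFully(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        System.out.print(jsonText);
    }
}
```

In the Android code above, the call would presumably look like readFully(entity.getContent(), StandardCharsets.UTF_8), assuming the server really does send UTF-8.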

Stephen C · Answer

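@Arhimed's answer is the solution. However, I cannot see anything obviously wrong with your convertStreamToString code.

My guesses are:

1. The server is putting a UTF byte order mark (BOM) at the start of the stream. The standard Java UTF-8 character decoder does not remove the BOM, so it may well end up in the resulting String. (However, the code for EntityUtils does not seem to do anything with BOMs either.)
2. Your convertStreamToString reads the character stream one line at a time and reassembles it using a hard-wired '\n' as the end-of-line marker. If you are going to write the result to an external file or application, you should probably use a platform-specific end-of-line marker.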
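The first guess can be checked with a few lines of plain Java. The sketch below is only an illustration: the class name BomDemo and the helper stripBom are invented here, and the byte array stands in for a response body that starts with a UTF-8 BOM.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    /** Hypothetical helper: drops a leading U+FEFF (the decoded BOM) if present. */
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 encoding of the BOM; "[]" is the JSON payload.
        byte[] body = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '[', ']'};
        String decoded = new String(body, StandardCharsets.UTF_8);

        // The decoder keeps the BOM as one char, so the length is 3, not 2.
        System.out.println(decoded.length());
        System.out.println(stripBom(decoded));
    }
}
```

If a BOM turns out to be present, stripping it before handing the text to a JSON parser is one possible workaround. For the second guess, System.lineSeparator() returns the platform-specific marker.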

Win Myo Htet · Answer

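The problem is simply that your convertStreamToString does not honor the encoding set in the HttpResponse. If you look inside EntityUtils.toString(entity, HTTP.UTF_8), you will see that EntityUtils first checks whether an encoding is set in the HttpResponse and, if so, uses that encoding. It falls back to the encoding passed as a parameter (HTTP.UTF_8 in this case) only when no encoding is set in the entity.

So although HTTP.UTF_8 is passed as a parameter, it is only a default and is never used when the entity declares a different encoding. Here is your code updated with the helper method from EntityUtils: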

    HttpEntity entity = response.getEntity();
    String charset = getContentCharSet(entity);
    InputStream instream = entity.getContent();
    String jsonText = convertStreamToString(instream, charset);

    private static String getContentCharSet(final HttpEntity entity) throws ParseException {
        if (entity == null) {
            throw new IllegalArgumentException("HTTP entity may not be null");
        }
        String charset = null;
        if (entity.getContentType() != null) {
            HeaderElement values[] = entity.getContentType().getElements();
            if (values.length > 0) {
                NameValuePair param = values[0].getParameterByName("charset");
                if (param != null) {
                    charset = param.getValue();
                }
            }
        }
        return TextUtils.isEmpty(charset) ? HTTP.UTF_8 : charset;
    }

    private static String convertStreamToString(InputStream is, String encoding) {
        /*
         * To convert the InputStream to String we use the
         * BufferedReader.readLine() method. We iterate until the BufferedReader
         * returns null, which means there's no more data to read. Each line is
         * appended to a StringBuilder and returned as a String.
         */
        BufferedReader reader;
        try {
            reader = new BufferedReader(new InputStreamReader(is, encoding));
        } catch (UnsupportedEncodingException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }
        StringBuilder sb = new StringBuilder();

        String line;
        try {
            while ((line = reader.readLine()) != null) {
                sb.append(line + "\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return sb.toString();
    }

Alan Deep · Answer

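Archimed's answer is correct. However, this can also be done simply by supplying an additional header in the HTTP request:

Accept-Charset: utf-8

There is no need to remove anything or to use another library.

For example: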

    GET / HTTP/1.1
    Host: www.website.com
    Connection: close
    Accept: text/html
    Upgrade-Insecure-Requests: 1
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.10 Safari/537.36
    DNT: 1
    Accept-Encoding: gzip, deflate, sdch
    Accept-Language: en-US,en;q=0.8
    Accept-Charset: utf-8

Most likely, your request does not carry an Accept-Charset header at all.
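As a rough sketch of how such a header could be attached from Java, the example below uses the standard HttpURLConnection rather than the Apache client from the question. The class name AcceptCharsetDemo and the method prepare are invented for illustration, and the request is only prepared, never sent. Whether the server actually takes the header into account depends on the server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class AcceptCharsetDemo {
    /** Hypothetical helper: builds (but does not send) a GET advertising UTF-8. */
    static HttpURLConnection prepare(String url) {
        try {
            // openConnection() only creates the object; no network I/O happens yet.
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Accept-Charset", "utf-8");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare("http://www.example.com/json_array.php");
        System.out.println(conn.getRequestProperty("Accept-Charset"));
    }
}
```

With the Apache client used in the question, the equivalent would presumably be httpget.setHeader("Accept-Charset", "utf-8") before calling execute.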

Alex Goncalves · Answer

Extract the charset from the response's Content-Type field. You can use the following method to do that:

    private static String extractCharsetFromContentType(String contentType) {
        if (TextUtils.isEmpty(contentType)) return null;

        // "\\s" must be escaped twice: once for the Java string, once for the regex.
        Pattern p = Pattern.compile(".*charset=([^\\s^;^,]+)");
        Matcher m = p.matcher(contentType);

        if (m.find()) {
            try {
                return m.group(1);
            } catch (Exception e) {
                return null;
            }
        }

        return null;
    }

Then use the extracted charset to create the InputStreamReader:

    String charsetName = extractCharsetFromContentType(connection.getContentType());

    InputStreamReader inReader = (TextUtils.isEmpty(charsetName)
            ? new InputStreamReader(inputStream)
            : new InputStreamReader(inputStream, charsetName));
    BufferedReader reader = new BufferedReader(inReader);
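The same approach can be tried outside Android with only the standard library. The following is a sketch under a few assumptions, not a drop-in replacement: the class CharsetFromHeader and the method resolve are made-up names, the pattern is a slightly more tolerant variant (case-insensitive, allows spaces around "=" and a quoted value), and an unknown charset name falls back to a caller-supplied default instead of the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetFromHeader {
    // Assumed pattern: matches charset=UTF-8, charset = utf-8, charset="utf-8".
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;,\"]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset, or the fallback if absent or unsupported. */
    static Charset resolve(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return fallback;
        }
    }

    public static void main(String[] args) {
        Charset cs = resolve("text/javascript; charset = UTF-8", StandardCharsets.ISO_8859_1);
        byte[] body = "[\"caf\u00e9\"]".getBytes(StandardCharsets.UTF_8);
        // Decoding with the declared charset keeps the accented character intact.
        System.out.println(new String(body, cs));
    }
}
```

Passing an explicit fallback avoids depending on the platform default charset, which is what the single-argument InputStreamReader constructor uses.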