Fastest way to retrieve a <title> in PHP

Question

I'm building a bookmarking system and am looking for the fastest (easiest) way to retrieve a page's title with PHP.

It would be nice to have something like $title = page_title($url).

Ed Carrel · Accepted Answer

<?php
function page_title($url) {
    $fp = file_get_contents($url);
    if (!$fp)
        return null;

    $res = preg_match("/<title>(.*)<\/title>/siU", $fp, $title_matches);
    if (!$res)
        return null;

    // Clean up title: remove EOL's and excessive whitespace.
    $title = preg_replace('/\s+/', ' ', $title_matches[1]);
    $title = trim($title);
    return $title;
}
?>

I gave it a try with the following input:

print page_title("http://www.google.com/");

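Output: Google

Hopefully that is general enough for your usage. If you need something more powerful, it would not hurt to spend some time looking into HTML parsers.

EDIT: Added a bit of error checking. I rushed the first version out a little, sorry.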

Lukas Liesis · Answer

You can get it without regular expressions:

$title = '';
$dom = new DOMDocument();

if ($dom->loadHTMLFile($urlpage)) {
    $list = $dom->getElementsByTagName("title");
    if ($list->length > 0) {
        $title = $list->item(0)->textContent;
    }
}
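When the HTML is already in a string, the same DOM approach can be sketched as below. This is only an illustration: the helper name title_from_html is made up here, and it assumes the dom extension is available. It switches libxml to internal error handling so that malformed markup does not emit warnings.

```php
<?php
// Hypothetical helper (the name is illustrative, not from the answer above).
function title_from_html($html) {
    if (!is_string($html) || trim($html) === '') {
        return null;
    }

    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $list = $dom->getElementsByTagName('title');
    if ($list->length === 0) {
        return null;
    }

    // Collapse runs of whitespace, as the accepted answer does.
    return trim(preg_replace('/\s+/', ' ', $list->item(0)->textContent));
}

echo title_from_html('<html><head><title> Example  page </title></head><body></body></html>');
```

One caveat worth checking for your own pages: loadHTML() may misread non-ASCII titles when the markup does not declare its character set.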

Alexei Tenitski · Answer

Or make this simple function slightly more bulletproof:

function page_title($url) {
    $page = file_get_contents($url);
    if (!$page) return null;

    $matches = array();
    if (preg_match('/<title>(.*?)<\/title>/', $page, $matches)) {
        return $matches[1];
    } else {
        return null;
    }
}

echo page_title('http://google.com');

alex · Answer

Regex?

Use cURL to get the page contents into the $htmlSource variable.
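The cURL step is not shown in this answer, so here is a minimal sketch of what it could look like. The helper name fetch_html and the timeout value are assumptions for illustration, and the curl extension must be installed.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body, or null on failure.
function fetch_html($url, $timeoutSeconds = 10) {
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow HTTP redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($ch);  // false when the transfer fails

    return ($body === false) ? null : $body;
}
```

With that in place, $htmlSource = fetch_html($url); would supply the variable used in the snippet that follows.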

preg_match('/<title>(.*)<\/title>/iU', $htmlSource, $titleMatches);
print_r($titleMatches);

See what's in that array.

Most people say that for HTML traversal you should use a parser, though, because regexes can be unreliable.

The other answers provide more detail :)

wilks · Answer

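I'm also building a bookmarking system, and I found that since PHP 5 you can use stream_get_line to load the remote page only up to the closing title tag (instead of loading the whole file), and then use explode (instead of a regex) to strip everything before the opening title tag.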

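function page_title($url) {
    $title = false;
    if ($handle = fopen($url, "r")) {
        // Read only up to the closing title tag instead of the whole page.
        $string = stream_get_line($handle, 0, "</title>");
        fclose($handle);
        // Drop everything before the opening title tag.
        $parts = explode("<title", (string) $string, 2);
        if (!empty($parts[1])) {
            // Skip past any attributes on the opening tag.
            $rest = explode(">", $parts[1], 2);
            if (isset($rest[1])) {
                $title = trim($rest[1]);
            }
        }
    }
    return $title;
}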

Thanks to PlugTrade's answer for the final explode, which reminded me that title tags can have attributes.
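To try the stream_get_line idea without touching the network, the sketch below runs it against an in-memory stream. The helper name title_from_stream and the 65536-byte limit are assumptions for illustration. An explicit limit is passed because, as far as I can tell from the PHP manual, a length of 0 falls back to the default chunk size rather than meaning "unlimited".

```php
<?php
// Hypothetical helper: read from an open stream up to the first "</title>"
// and return the title text, or false if no opening tag was seen.
function title_from_stream($handle, $maxBytes = 65536) {
    // stream_get_line() stops at the delimiter and does not include it.
    $head = stream_get_line($handle, $maxBytes, '</title>');
    if ($head === false) {
        return false;
    }

    // Keep what follows the opening tag, then skip past any attributes.
    $parts = explode('<title', $head, 2);
    if (count($parts) < 2) {
        return false;
    }
    $rest = explode('>', $parts[1], 2);

    return isset($rest[1]) ? trim($rest[1]) : false;
}

// An in-memory stream stands in for the remote page.
$handle = fopen('php://memory', 'w+');
fwrite($handle, '<html><head><title lang="en"> Demo </title></head><body>...</body></html>');
rewind($handle);
echo title_from_stream($handle);
fclose($handle);
```

Note that both the delimiter and the explode calls are case-sensitive, so a page that writes <TITLE> in upper case would not be matched by this sketch.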

null · Answer

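I like using SimpleXml with regexes. This comes from a solution I use to grab multiple link headers from a page in an OpenID library I created. I've adapted it to work with the title (even though there is usually only one).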

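function getTitle($sFile) {
    $sData = file_get_contents($sFile);

    if (preg_match('/<head\b[^>]*>.*?<\/head>/is', $sData, $aHead)) {
        // Lowercase every tag so the element names match the lookup below.
        $sDataHtml = preg_replace_callback(
            '/<[^>]*>/',
            function ($aTag) { return strtolower($aTag[0]); },
            $aHead[0]
        );

        // loadHTML() is an instance method; PHP 8 rejects the static call form.
        $xDom = new DOMDocument();
        if (@$xDom->loadHTML($sDataHtml)) {
            $xTitle = simplexml_import_dom($xDom);
            return (string) $xTitle->head->title;
        }
    }

    return null;
}

echo getTitle('http://stackoverflow.com/questions/399332/fastest-way-to-retrieve-a-title-in-php');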

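Ironically, this page has a "title tag" inside its title tag, which can sometimes cause problems for pure regex solutions.

This solution is not perfect: it lowercases the tags, which could cause a problem for nested tags where formatting or case matters (such as XML). There are ways around that problem, but they are a bit more involved.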
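On the remark about this page's own title: in the HTML source the inner tag is presumably written with entities (&lt;title&gt;), so a regex returns that raw entity text, whereas a parser's text value is already decoded. A minimal sketch of decoding after a regex match follows, assuming UTF-8 input. The helper name decoded_title is made up for illustration, and the pattern is the one from the accepted answer.

```php
<?php
// Hypothetical helper: regex extraction followed by entity decoding.
function decoded_title($html) {
    if (!preg_match("/<title>(.*)<\/title>/siU", $html, $matches)) {
        return null;
    }

    $raw = trim(preg_replace('/\s+/', ' ', $matches[1]));

    // Turn &lt;, &amp;, &#39; and similar entities back into characters.
    return html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decoded_title('<title>Fastest way to retrieve a &lt;title&gt; in PHP</title>');
```

If the result is later printed into an HTML page, it should be escaped again with htmlspecialchars().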

PlugTrade.com · Answer

A function that handles title tags with attributes added:

function get_title($html) {
    preg_match("/<title(.+)<\/title>/siU", $html, $matches);
    if( !empty( $matches[1] ) ) {
        $title = $matches[1];

        if( strstr($title, '>') ) {
            $title = explode( '>', $title, 2 );
            $title = $title[1];

            return trim($title);
        }
    }
}

$html = '<tiTle class="aunt">jemima</tiTLE>';
$title = get_title($html);
echo $title;
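The same idea can also be folded into a single pattern. The sketch below is an alternative, not the function above: the name get_title_with_attributes is hypothetical, and it assumes no attribute value contains a ">" character.

```php
<?php
// Hypothetical alternative to get_title(): one pattern that tolerates attributes.
function get_title_with_attributes($html) {
    // \b keeps "<title" from matching longer tag names;
    // [^>]* skips any attributes; (.*?) lazily captures the text up to </title>.
    if (preg_match('/<title\b[^>]*>(.*?)<\/title\s*>/is', $html, $matches)) {
        return trim($matches[1]);
    }

    return null;
}

echo get_title_with_attributes('<tiTle class="aunt">jemima</tiTLE>');
```

Unlike get_title(), this version returns null explicitly when nothing matches, which makes the no-title case easier to test for.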