1つのグリッド検索で複数の推定量を試す

Question

grid-search一度に複数の推定量をSklearnまたは他のライブラリで実行できる方法はありますか？たとえば、1つのグリッド検索でSVMとランダムフォレストを渡すことができますか？.

j-a · Answer

はい。例：

pipeline = Pipeline([ ('vect', CountVectorizer()), ('clf', SGDClassifier()), ]) parameters = [ { 'vect__max_df': (0.5, 0.75, 1.0), 'clf': (SGDClassifier(),), 'clf__alpha': (0.00001, 0.000001), 'clf__penalty': ('l2', 'elasticnet'), 'clf__n_iter': (10, 50, 80), }, { 'vect__max_df': (0.5, 0.75, 1.0), 'clf': (LinearSVC(),), 'clf__C': (0.01, 0.5, 1.0) } ] grid_search = GridSearchCV(pipeline, parameters)

Brian Spiering · Answer

from sklearn.base import BaseEstimator from sklearn.model_selection import GridSearchCV class DummyEstimator(BaseEstimator): def fit(self): pass def score(self): pass # Create a pipeline pipe = Pipeline([('clf', DummyEstimator())]) # Placeholder Estimator # Candidate learning algorithms and their hyperparameters search_space = [{'clf': [LogisticRegression()], # Actual Estimator 'clf__penalty': ['l1', 'l2'], 'clf__C': np.logspace(0, 4, 10)}, {'clf': [DecisionTreeClassifier()], # Actual Estimator 'clf__criterion': ['gini', 'entropy']}] # Create grid search gs = GridSearchCV(pipe, search_space)

montxe · Answer

私はあなたが探していたのはこれだと思います：

from sklearn.svm import LinearSVC from sklearn.linear_model import LogisticRegression from sklearn.ensemble import RandomForestClassifier from sklearn.neural_network import MLPClassifier from sklearn.model_selection import GridSearchCV from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) names = [ "Naive Bayes", "Linear SVM", "Logistic Regression", "Random Forest", "Multilayer Perceptron" ] classifiers = [ MultinomialNB(), LinearSVC(), LogisticRegression(), RandomForestClassifier(), MLPClassifier() ] parameters = [ {'vect__ngram_range': [(1, 1), (1, 2)], 'clf__alpha': (1e-2, 1e-3)}, {'vect__ngram_range': [(1, 1), (1, 2)], 'clf__C': (np.logspace(-5, 1, 5))}, {'vect__ngram_range': [(1, 1), (1, 2)], 'clf__C': (np.logspace(-5, 1, 5))}, {'vect__ngram_range': [(1, 1), (1, 2)], 'clf__max_depth': (1, 2)}, {'vect__ngram_range': [(1, 1), (1, 2)], 'clf__alpha': (1e-2, 1e-3)} ] for name, classifier, params in Zip(names, classifiers, parameters): clf_pipe = Pipeline([ ('vect', TfidfVectorizer(stop_words='english')), ('clf', classifier), ]) gs_clf = GridSearchCV(clf_pipe, param_grid=params, n_jobs=-1) clf = gs_clf.fit(X_train, y_train) score = clf.score(X_test, y_test) print("{} score: {}".format(name, score))

Kota Mori · Answer

TransformedTargetRegressor を使用できます。このクラスは、リグレッサとトランスフォーマーのセットをパラメーターとして使用して、フィッティングする前にターゲット変数を変換するように設計されています。ただし、トランスフォーマーを指定しない場合は、IDトランスフォーマー（つまり、変換なし）が適用されます。リグレッサはクラスパラメータであるため、グリッド検索オブジェクトによって変更できます。

import numpy as np from sklearn.compose import TransformedTargetRegressor from sklearn.linear_model import LinearRegression from sklearn.neural_network import MLPRegressor from sklearn.model_selection import GridSearchCV Y = np.array([1,2,3,4,5,6,7,8,9,10]) X = np.array([0,1,3,5,3,5,7,9,8,9]).reshape((-1, 1))

グリッド検索を行うには、それぞれ異なる推定量のdictのリストとしてparam_gridを指定する必要があります。これは、異なる推定量が異なるパラメータのセットを使用するためです（例：設定fit_intercept with MLPRegressorはエラーを引き起こします）。「リグレッサー」という名前が自動的にリグレッサーに付けられることに注意してください。

model = TransformedTargetRegressor() params = [ { "regressor": [LinearRegression()], "regressor__fit_intercept": [True, False] }, { "regressor": [MLPRegressor()], "regressor__hidden_layer_sizes": [1, 5, 10] } ]

いつものようにフィットできます。

g = GridSearchCV(model, params) g.fit(X, Y) g.best_estimator_, g.best_score_, g.best_params_ # results in like (TransformedTargetRegressor(check_inverse=True, func=None, inverse_func=None, regressor=LinearRegression(copy_X=True, fit_intercept=False, n_jobs=None, normalize=False), transformer=None), -0.419213380219391, {'regressor': LinearRegression(copy_X=True, fit_intercept=False, n_jobs=None, normalize=False), 'regressor__fit_intercept': False})

cgnorthcutt · Answer

できることは、任意の分類子を取り込んだクラスを作成し、各分類子に対してパラメーターの設定を作成することです。

任意の推定量で機能するスイッチャークラスを作成します

_from sklearn.base import BaseEstimator class ClfSwitcher(BaseEstimator): def __init__( self, estimator = SGDClassifier(), ): """ A Custom BaseEstimator that can switch between classifiers. :param estimator: sklearn object - The classifier """ self.estimator = estimator def fit(self, X, y=None, **kwargs): self.estimator.fit(X, y) return self def predict(self, X, y=None): return self.estimator.predict(X) def predict_proba(self, X): return self.estimator.predict_proba(X) def score(self, X, y): return self.estimator.score(X, y) _

これで、tfidfを好きなように事前トレーニングできます。

_from sklearn.feature_extraction.text import TfidfVectorizer tfidf = TfidfVectorizer() tfidf.fit(data, labels) _

次に、この事前トレーニング済みのtfidfを使用してパイプラインを作成します

_from sklearn.pipeline import Pipeline pipeline = Pipeline([ ('tfidf',tfidf), # Already pretrained/fit ('clf', ClfSwitcher()), ]) _

ハイパーパラメータ最適化を実行します

_from sklearn.naive_bayes import MultinomialNB from sklearn.linear_model import SGDClassifier from sklearn.model_selection import GridSearchCV parameters = [ { 'clf__estimator': [SGDClassifier()], # SVM if hinge loss / logreg if log loss 'clf__estimator__penalty': ('l2', 'elasticnet', 'l1'), 'clf__estimator__max_iter': [50, 80], 'clf__estimator__tol': [1e-4], 'clf__estimator__loss': ['hinge', 'log', 'modified_huber'], }, { 'clf__estimator': [MultinomialNB()], 'clf__estimator__alpha': (1e-2, 1e-3, 1e-1), }, ] gscv = GridSearchCV(pipeline, parameters, cv=5, n_jobs=12, verbose=3) # param optimization gscv.fit(train_data, train_labels) _

_`clfestimatorloss`_の解釈方法

_clf__estimator__loss_は、lossが何であれ、estimatorパラメーターとして解釈されます。ここで、一番上の例ではestimator = SGDClassifier()であり、それ自体がclfオブジェクトであるClfSwitcherのパラメーターです。