Class labels in Pandas scattermatrix

Question

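This question has been asked before, in "Multiple data in scatter matrix", but it did not receive an answer.

I'd like to make a scatter matrix like the one in the pandas docs, but with differently colored markers for different classes. For example, I'd like some points to appear in green and others in blue, depending on the value of one of the columns (or of a separate list).

Here is an example using the Iris dataset. The color of each point represents the species of iris (Setosa, Versicolor, or Virginica).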

[Image: iris scattermatrix with class labels]

Does pandas (or matplotlib) have a way of making a chart like that?

bgschiller · Accepted Answer

Update: This functionality is now included in the latest version of Seaborn (the original answer linked to an example).

The following was my stopgap measure:

    def factor_scatter_matrix(df, factor, palette=None):
        '''Create a scatter matrix of the variables in df, with differently
        colored points depending on the value of df[factor].

        inputs:
            df: pandas.DataFrame containing the columns to be plotted, as well
                as factor.
            factor: string or pandas.Series. The column indicating which group
                each row belongs to.
            palette: A list of hex codes, at least as long as the number of
                groups. If omitted, a predefined palette will be used, but it
                only includes 9 colors.
        '''
        import numpy as np
        from pandas.plotting import scatter_matrix
        from scipy.stats import gaussian_kde

        if isinstance(factor, str):
            factor_name = factor  # save off the name
            factor = df[factor]  # extract column
            # remove from df, so it doesn't get a row and col in the plot
            df = df.drop(factor_name, axis=1)

        classes = list(set(factor))

        if palette is None:
            palette = ['#e41a1c', '#377eb8', '#4eae4b',
                       '#994fa1', '#ff8101', '#fdfc33',
                       '#a8572c', '#f482be', '#999999']

        # check the palette size before pairing groups with colors
        if len(classes) > len(palette):
            raise ValueError(
                'Too many groups for the number of colors provided. '
                'We only have {} colors in the palette, but you have {} '
                'groups.'.format(len(palette), len(classes)))

        color_map = dict(zip(classes, palette))
        colors = [color_map[group] for group in factor]  # one color per row

        # diagonal=None leaves the diagonal empty so it can be filled below
        axarr = scatter_matrix(df, figsize=(10, 10), marker='o', c=colors,
                               diagonal=None)

        # draw one density curve per group on each diagonal plot
        for rc in range(len(df.columns)):
            for group in classes:
                y = df[factor == group].iloc[:, rc].values
                gkde = gaussian_kde(y)
                ind = np.linspace(y.min(), y.max(), 1000)
                axarr[rc][rc].plot(ind, gkde.evaluate(ind), c=color_map[group])

        return axarr, color_map

As an example, I'll use the same dataset as in the question (the original answer linked to it here).

    >>> import pandas as pd
    >>> iris = pd.read_csv('iris.csv')
    >>> axarr, color_map = factor_scatter_matrix(iris, 'Name')
    >>> color_map
    {'Iris-setosa': '#377eb8', 'Iris-versicolor': '#4eae4b', 'Iris-virginica': '#e41a1c'}

[Image: iris_scatter_matrix]
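Because the function returns `color_map`, one possible follow-up is a legend built from it. This is only a sketch and not part of the original answer: `add_class_legend` is a hypothetical helper that uses matplotlib proxy patches, and it assumes the class labels are mutually sortable (for example, all strings).

```python
from matplotlib.patches import Patch


def add_class_legend(fig, color_map, title=None):
    """Attach a figure-level legend with one colored patch per class.

    color_map: dict mapping class label -> color, such as the one
    returned by factor_scatter_matrix.
    """
    # Proxy patches stand in for the scatter points of each class.
    handles = [Patch(color=color, label=str(label))
               for label, color in sorted(color_map.items())]
    return fig.legend(handles=handles, loc="upper right", title=title)
```

It could be called as `add_class_legend(axarr[0][0].figure, color_map)`, since every axes in the returned array belongs to the same figure.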

I hope this is helpful!

jrjc · Answer

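You can also call scatter_matrix from pandas directly, as follows (current pandas exposes it as pd.plotting.scatter_matrix; older versions also had pd.scatter_matrix):

    pd.plotting.scatter_matrix(df, color=colors)

where colors is a list of colors of length len(df).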
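One way to build such a `colors` list is to map each row's class label to a palette entry. The sketch below is an illustration only: `class_colors` is a hypothetical helper, and it assumes the labels are sortable so that the label-to-color assignment is deterministic.

```python
import pandas as pd


def class_colors(labels, palette):
    """Return one color per row, chosen by that row's class label.

    labels: pandas.Series of class labels, one per row.
    palette: list of colors, at least as long as the number of classes.
    """
    classes = sorted(labels.unique())
    if len(classes) > len(palette):
        raise ValueError("palette has fewer colors than there are classes")
    color_map = dict(zip(classes, palette))
    return [color_map[label] for label in labels]
```

For the iris CSV used above, this might be called as `pd.plotting.scatter_matrix(iris.drop(columns='Name'), color=class_colors(iris['Name'], ['red', 'green', 'blue']))`.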
