Spatial clustering in R (simple example)

Question

I have this simple data.frame:

lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)
data=data.frame(lat,lon)

The idea is to find spatial clusters based on distance.

First, I plot the map (lon, lat):

plot(data$lon,data$lat)


Clearly, there are three clusters based on the distances between the points.

To find them, I tried this code in R:

d= as.matrix(dist(cbind(data$lon,data$lat))) #Create distance matrix
d=ifelse(d<5,d,0) #keep only distance < 5
d=as.dist(d)
hc<-hclust(d) # hierarchical clustering
plot(hc)
data$clust <- cutree(hc,k=3) # cut the dendrogram to generate 3 clusters

This gives:

[Figure: dendrogram produced by plot(hc)]

Now I try to plot the same points, but colored by cluster:

plot(data$x,data$y, col=c("red","blue","green")[data$clust],pch=19)

Here is the result:

[Figure: the resulting plot]

That is not what I am looking for.

What I actually want is something like this plot:

[Figure: the desired plot]

Thank you for your help.

johannes · Accepted Answer

How about something like this?

lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)
km <- kmeans(cbind(lat, lon), centers = 3)
plot(lon, lat, col = km$cluster, pch = 20)

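One thing worth keeping in mind: kmeans() starts from randomly chosen centers, so cluster labels (and occasionally the partition itself) can differ between runs. A minimal sketch of a repeatable variant follows; the seed and nstart = 25 are assumed values chosen for illustration, not something the answer specifies.

```r
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)

set.seed(42)  # arbitrary seed, only so the run is repeatable
# nstart = 25 (assumed value): try 25 random starts and keep the best fit
km <- kmeans(cbind(lat, lon), centers = 3, nstart = 25)

# Cluster sizes; the label numbers themselves are arbitrary
table(km$cluster)
```

The call still needs the number of clusters up front (centers = 3), which is the main limitation of this approach.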

Omri374 · Answer

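Here is another approach. First, it assumes the coordinates are WGS-84 rather than UTM (flat). Then it uses hierarchical clustering to put all neighbors within a given radius into the same cluster (with method = "single", which amounts to a "friends of friends" clustering strategy).

To compute the distance matrix, I use the rdist.earth method from the fields package. The package's default Earth radius is 6378.388 (the equatorial radius), which may not be what you are looking for, so I changed it to 6371. See this article for more information.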

library(fields)
lon = c(31.621785, 31.641773, 31.617269, 31.583895, 31.603284)
lat = c(30.901118, 31.245008, 31.163886, 30.25058, 30.262378)
threshold.in.km <- 40
coors <- data.frame(lon,lat)
#distance matrix
dist.in.km.matrix <- rdist.earth(coors,miles = FALSE,R=6371)
#clustering
fit <- hclust(as.dist(dist.in.km.matrix), method = "single")
clusters <- cutree(fit,h = threshold.in.km)
plot(lon, lat, col = clusters, pch = 20)

This is a good solution when you do not know the number of clusters in advance (which the k-means option requires), and it is somewhat related to the dbscan option with minPts = 1.
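To make the distance step concrete, here is a rough base-R sketch of a great-circle distance on a sphere of radius R. It uses the haversine formula, which is swapped in here for illustration and is not necessarily how fields computes rdist.earth internally; haversine_km is a hypothetical helper name.

```r
# Great-circle distance in km between two lon/lat points given in degrees.
# haversine_km is a hypothetical helper, not part of the fields package.
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))
}

# One degree of longitude along the equator equals R * pi / 180
haversine_km(0, 0, 1, 0)

# Distance between the first two points of the example above
haversine_km(31.621785, 30.901118, 31.641773, 31.245008)
```

Because the result scales linearly with R, switching from 6378.388 to 6371 changes every distance by roughly 0.1 %, which only matters for pairs sitting very close to the threshold.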

---EDIT---

For the original data:

lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)
data=data.frame(lat,lon)
# rdist.earth expects longitude in the first column and latitude in the second
dist <- rdist.earth(data[, c("lon", "lat")],miles = FALSE,R=6371) #dist <- dist(data) if data is UTM
fit <- hclust(as.dist(dist), method = "single")
clusters <- cutree(fit,h = 1000) #h = 2 if data is UTM
plot(lon, lat, col = clusters, pch = 20)
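The "friends of friends" idea can be checked directly on the planar variant mentioned in the comments above (dist(data) with h = 2). The sketch below, in base R only, compares the single-linkage cut with the connected components of a graph that links every pair of points no farther apart than h. The label-propagation loop is an illustrative stand-in written for this example, not something taken from a package.

```r
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
h <- 2  # distance threshold, as in the planar variant above

d <- dist(cbind(lon, lat))
hc_clusters <- cutree(hclust(d, method = "single"), h = h)

# Connected components of the graph that links points within distance h.
# Each point repeatedly takes the smallest label among its linked points
# (itself included) until nothing changes.
linked <- as.matrix(d) <= h
labels <- seq_along(lat)
repeat {
  new_labels <- vapply(seq_along(labels),
                       function(i) min(labels[linked[i, ]]),
                       integer(1))
  if (all(new_labels == labels)) break
  labels <- new_labels
}

# Same partition if every row and column has exactly one non-zero cell
table(hc_clusters, labels)
```

Two points end up together whenever a chain of sub-threshold hops connects them, even if the two points themselves are farther apart than h.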

Akshay Pratap Singh · Answer

Since you have spatial data to cluster, DBSCAN is a very good fit for your data. You can do this clustering with the dbscan() function provided by fpc, an R package.

library(fpc)
lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)
DBSCAN <- dbscan(cbind(lat, lon), eps = 1.5, MinPts = 3)
plot(lon, lat, col = DBSCAN$cluster, pch = 20)

Plot of DBSCAN Clustering
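To see why eps = 1.5 and MinPts = 3 are plausible values for this data, it may help to count how many points fall within eps of each point. This is a base-R sketch only; it assumes the point itself is included in the count, which is a common DBSCAN convention but should be checked against the fpc documentation.

```r
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
eps <- 1.5
min_pts <- 3

d <- as.matrix(dist(cbind(lat, lon)))

# Points within eps of each point, counting the point itself (assumption)
n_within_eps <- rowSums(d <= eps)

data.frame(lat, lon, n_within_eps, core = n_within_eps >= min_pts)
```

Neighboring points are sqrt(2), about 1.41, apart, so eps = 1.5 just reaches the adjacent point on each side. Under the stated assumption the interior points of each group qualify as core points and the end points attach to them as border points, while a noticeably smaller eps would leave every point isolated.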