python - K-mean clustering, make centroids not overlap nodes -
i have done k-mean++ clustering , obtained centroids of clusters, using python 2.7, following method given in http://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/
in problem, there further constraint distances between centroid node should larger constant. best way achieve?
it possible 1 centroid close several nodes.
any suggestions on how displace centroids bit?
many thanks.
for example, nodes clustered mynodes = [[469500, 5802610], [468764, 5803422], [467991, 5804202], [470260, 5799949], [469486, 5800730], [468713, 5801510], [467939, 5802291], [467166, 5803072], [467966, 5800204], [467193, 5800985], [466420, 5801766], [466457, 5799700], [465678, 5800488], [464950, 5799229], [470615, 5796600], [469842, 5797405], [470320, 5794955], [469547, 5795735], [468773, 5796516], [467990, 5797297], [470062, 5793215], [469289, 5793996], [468515, 5794776], [467742, 5795557], [466969, 5796338], [466195, 5797119], [469976, 5791334], [469202, 5792115], [468429, 5792896], [467656, 5793676], [466882, 5794457], [466109, 5795238], [465336, 5796050], [464600, 5796840], [470160, 5789250], [469354, 5789972], [468581, 5790753], [467808, 5791534], [467034, 5792315], [466261, 5793096], [465488, 5793877], [464714, 5794658], [463941, 5795499], [463150, 5796210], [469500, 5787920], [468698, 5788614], [467925, 5789395], [467152, 5790176]]
centroids = [[ 467839.6, 5793224.1], [ 467617.22222222, 5800489.94444444]] centroid[0] close node[29], centroid1 close node[8].
if understand question correctly, , not @ sure whether do, solution apparent drawings:
you want point closest given centroid point; , has minimum distance set of node points.
- draw circle around each node point, minimum distance radius
- intersect each circle each other circle, note intersection points
- discard intersection point closer minimum distance node point.
- from remaining intersection points, take 1 closest centroid point. new displaced centroid.
runtime naive implementation should o(number_of_nodes^2)
, though can optimize using fast nearest-neighbour lookup data structure, such kd-tree, when intersect circles , discard intersection points close.
this should optimal solution; no matter algorithm use, cannot find point closer original centroid fits minimum distance constraint; , algorithm should find optimal point.
though since centroid surrounded node points, new point may quite distance away if node points densely packed.
Comments
Post a Comment