Practice: Data Science - Pandas Pandas Pandas - ajinn's blog

Practice: Data Science - Pandas Pandas Pandas

import numpy as np

n = int(input())

# Initial centroids given by the task
c1 = np.array([0, 0])
c2 = np.array([2, 2])

X1 = []  # points assigned to centroid (0, 0)
X2 = []  # points assigned to centroid (2, 2)

for i in range(n):
    X = [float(x) for x in input().split()]
    X0 = np.array(X)
    # "<=" sends ties to the (0, 0) cluster, as the task requires
    if np.linalg.norm(X0 - c1) <= np.linalg.norm(X0 - c2):
        X1.append(X)
    else:
        X2.append(X)

d1 = np.array(X1)
d2 = np.array(X2)

# Check for emptiness with .size rather than .any():
# a cluster holding only the point (0, 0) is non-empty but has no nonzero entries.
if d1.size == 0:
    print('None')
else:
    print(np.around(np.mean(d1, axis=0), 2))

if d2.size == 0:
    print('None')
else:
    print(np.around(np.mean(d2, axis=0), 2))

===============

Data Science - Pandas Pandas Pandas

Finding the next centroid

Clustering, an unsupervised learning technique, involves repeatedly updating the centroid of each cluster. Here we find the next centroids for the given data points and initial centroids.

Task

Assume that there are two clusters among the given two-dimensional data points, and that two random points, (0, 0) and (2, 2), are the initial cluster centroids. Calculate the Euclidean distance between each data point and each centroid, assign each data point to its nearest centroid, then calculate the new centroid of each cluster as the mean of the points assigned to it. If there is a tie, assign the data point to the cluster with centroid (0, 0). If no data points were assigned to a given centroid, return None for that centroid.
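The assignment and update steps in the task can also be written as one small vectorized function. This is only a minimal sketch, not the challenge's reference solution: the name `next_centroids`, its signature, and the choice to return a Python list are assumptions made for illustration.

```python
import numpy as np

def next_centroids(points, centroids=((0.0, 0.0), (2.0, 2.0))):
    """Hypothetical helper: return the updated centroid of each cluster.

    An entry is None when no point was assigned to that centroid.
    """
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    cents = np.asarray(centroids, dtype=float)

    # dists[i, k] is the Euclidean distance from point i to centroid k
    dists = np.linalg.norm(pts[:, None, :] - cents[None, :, :], axis=2)

    # argmin returns the first minimum, so a tie goes to centroid 0, i.e. (0, 0)
    labels = np.argmin(dists, axis=1)

    result = []
    for k in range(len(cents)):
        members = pts[labels == k]
        if len(members) == 0:
            result.append(None)
        else:
            # New centroid = mean of the assigned points, rounded to 2 decimals
            result.append(np.around(members.mean(axis=0), 2))
    return result

for centroid in next_centroids([[1, 0], [0, 0.5], [4, 0]]):
    print(centroid)
```

Relying on `np.argmin` picking the first minimum is what implements the tie rule here; if the centroid order were changed, the tie-breaking would change with it.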

Input Format

First line: an integer to indicate the number of data points (n)

Next n lines: two numeric values per line, representing a data point in two-dimensional space.
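To make the input format concrete, here is a rough sketch of a parser that reads from any text stream instead of calling `input()` directly, which makes it easy to try on a string. The helper name `read_points` is an assumption for illustration and does not appear in the original solution.

```python
import io

def read_points(stream):
    """Hypothetical helper: read n, then n lines holding two numbers each."""
    n = int(stream.readline())
    points = []
    for _ in range(n):
        # float() accepts forms such as ".5" as well as "0.5"
        x, y = (float(token) for token in stream.readline().split())
        points.append((x, y))
    return points

# Three points preceded by their count, as the input format describes
sample = io.StringIO("3\n1 0\n0 .5\n4 0\n")
print(read_points(sample))
```

Passing `sys.stdin` as the stream would read the same format from standard input.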

Output Format

Two lists for two centroids. Numbers are rounded to the second decimal place.

Sample Input

3

1 0

0 .5

4 0

Sample Output

[0.5 0.25]

[4. 0.]

Explanation

There are 3 data points and we would like to identify two clusters among them. The initial centroids are given as (0, 0) and (2, 2). The distances between the first data point (1, 0) and each of the centroids are 1.0 and 2.24, rounded to the second decimal place. The first data point is closer to (0, 0), so it is assigned to the 0th cluster. Similarly, data point (0, .5) is closer to (0, 0) than to (2, 2), so it is also assigned to the 0th cluster, while (4, 0) is closer to (2, 2) and is assigned to the 1st cluster. To calculate the new centroids, take the mean of all data points in the 0th and 1st clusters, respectively. Hence the results are [0.5 0.25] and [4. 0.].
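The arithmetic in this explanation can be checked with a few lines of NumPy. The snippet below is just a sketch for verification; the variable names are chosen here for illustration.

```python
import numpy as np

centroids = np.array([[0.0, 0.0], [2.0, 2.0]])
points = np.array([[1.0, 0.0], [0.0, 0.5], [4.0, 0.0]])

# Row i holds the distances from point i to (0, 0) and to (2, 2)
dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
print(np.around(dists, 2))
# (1, 0):  1.0 vs 2.24 -> 0th cluster
# (0, .5): 0.5 vs 2.5  -> 0th cluster
# (4, 0):  4.0 vs 2.83 -> 1st cluster

# New centroids are the means of the points in each cluster
new_c0 = points[:2].mean(axis=0)   # mean of (1, 0) and (0, .5)
new_c1 = points[2:].mean(axis=0)   # mean of (4, 0) alone
print(np.around(new_c0, 2), np.around(new_c1, 2))
```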
