Simple Data Processing with NumPy and pandas

from pandas import DataFrame, Series
import numpy

countries = ['Russian Fed.', 'Norway', 'Canada', 'United States',
             'Netherlands', 'Germany', 'Switzerland', 'Belarus',
             'Austria', 'France', 'Poland', 'China', 'Korea',
             'Sweden', 'Czech Republic', 'Slovenia', 'Japan',
             'Finland', 'Great Britain', 'Ukraine', 'Slovakia',
             'Italy', 'Latvia', 'Australia', 'Croatia', 'Kazakhstan']

gold = [13, 11, 10, 9, 8, 8, 6, 5, 4, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
silver = [11, 5, 10, 7, 7, 6, 3, 0, 8, 4, 1, 4, 3, 7, 4, 2, 4, 3, 1, 0, 0, 2, 2, 2, 1, 0]
bronze = [9, 10, 5, 12, 9, 5, 2, 1, 5, 7, 1, 2, 2, 6, 2, 4, 3, 1, 2, 1, 0, 6, 2, 1, 0, 1]

Display the medal data:

olympic_medal_counts_df = DataFrame(
    {'country_name': countries,
     'gold': gold,
     'silver': silver,
     'bronze': bronze})
print(olympic_medal_counts_df)

# Shorter alias used in the remaining examples
df = olympic_medal_counts_df

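Before working with the table, a quick look at its shape and column types can confirm it was built as intended. The following is a minimal sketch that uses only the first three rows of the data above; the name `sample_df` is introduced here for illustration and is not part of the original code.

```python
import pandas as pd

# First three rows of the medal table above, kept short for illustration.
# `sample_df` is a hypothetical name, not used elsewhere in this post.
sample_df = pd.DataFrame({
    'country_name': ['Russian Fed.', 'Norway', 'Canada'],
    'gold': [13, 11, 10],
    'silver': [11, 5, 10],
    'bronze': [9, 10, 5],
})

print(sample_df.shape)    # (number of rows, number of columns)
print(sample_df.dtypes)   # one dtype per column
print(sample_df.head(2))  # only the first two rows
```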

Compute the average number of bronze medals won by countries that earned at least one gold medal:

avg_bronze_at_least_one_gold = numpy.mean(df[df.gold > 0].bronze)
print(avg_bronze_at_least_one_gold)

output>>4.2380952381
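The same filter-then-average step can also be written with pandas alone, which makes the boolean mask explicit. This is a sketch of an alternative, not the approach used above; the names `medals`, `has_gold`, and `avg_bronze` are introduced here for illustration.

```python
import pandas as pd

# Gold and bronze counts copied from the lists defined earlier.
gold = [13, 11, 10, 9, 8, 8, 6, 5, 4, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
bronze = [9, 10, 5, 12, 9, 5, 2, 1, 5, 7, 1, 2, 2, 6, 2, 4, 3, 1, 2, 1, 0, 6, 2, 1, 0, 1]
medals = pd.DataFrame({'gold': gold, 'bronze': bronze})

# Boolean mask: True for every country with at least one gold medal.
has_gold = medals['gold'] > 0

# Select the bronze column for the masked rows, then average it.
avg_bronze = medals.loc[has_gold, 'bronze'].mean()
print(has_gold.sum(), avg_bronze)
```

Using `.loc[mask, column]` selects the rows and the column in a single step, which avoids chained indexing.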

Compute the average number of gold, silver, and bronze medals:

# axis=0 averages each column separately, giving one mean per medal type
avg_medal_count = numpy.mean(df[['gold', 'silver', 'bronze']], axis=0)
print(avg_medal_count)

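It may help to see the difference between a per-column average and a single average over all values, since the two are easy to confuse. The sketch below rebuilds the three medal columns from the lists above; `medals`, `column_means`, and `overall_mean` are names introduced here for illustration.

```python
import pandas as pd

# Medal counts copied from the lists defined earlier.
gold = [13, 11, 10, 9, 8, 8, 6, 5, 4, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
silver = [11, 5, 10, 7, 7, 6, 3, 0, 8, 4, 1, 4, 3, 7, 4, 2, 4, 3, 1, 0, 0, 2, 2, 2, 1, 0]
bronze = [9, 10, 5, 12, 9, 5, 2, 1, 5, 7, 1, 2, 2, 6, 2, 4, 3, 1, 2, 1, 0, 6, 2, 1, 0, 1]
medals = pd.DataFrame({'gold': gold, 'silver': silver, 'bronze': bronze})

# One mean per column: a Series indexed by 'gold', 'silver', 'bronze'.
column_means = medals.mean()

# A single mean over every cell, computed on the underlying NumPy array.
overall_mean = medals.to_numpy().mean()

print(column_means)
print(overall_mean)
```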

If a gold medal is worth 4 points, a silver medal 2 points, and a bronze medal 1 point, compute the total points for each country:

df['points'] = df[['gold','silver','bronze']].dot([4,2,1])
olympic_points_df = df[['country_name','points']]
print(olympic_points_df)

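The dot product here is simply a weighted sum across the three medal columns. As a sanity check, the sketch below compares it with the explicit arithmetic on the first three countries; `sample_df`, `weights`, `points_dot`, and `points_explicit` are names introduced here for illustration.

```python
import pandas as pd

# First three rows of the medal table above.
sample_df = pd.DataFrame({
    'country_name': ['Russian Fed.', 'Norway', 'Canada'],
    'gold': [13, 11, 10],
    'silver': [11, 5, 10],
    'bronze': [9, 10, 5],
})

# Labelled weights: the Series index must match the column names.
weights = pd.Series({'gold': 4, 'silver': 2, 'bronze': 1})

# Matrix-vector product: one weighted sum per row.
points_dot = sample_df[['gold', 'silver', 'bronze']].dot(weights)

# The same computation written out term by term.
points_explicit = (4 * sample_df['gold']
                   + 2 * sample_df['silver']
                   + 1 * sample_df['bronze'])

print(points_dot.tolist())       # [83, 64, 65]
print(points_explicit.tolist())  # [83, 64, 65]
```

Passing the weights as a labelled `Series` rather than a bare list lets pandas match each weight to its column by name, so the result does not depend on column order.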