
Pandas-DataFrame

A handy tool for "playing" with data in Python!


You can think of a DataFrame as a table. (DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.)

Initialization

>>> import pandas as pd
>>> d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
>>> df = pd.DataFrame(d)
>>> df
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0
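
Since a DataFrame can also be seen as a dict of Series objects, here is a minimal sketch of two other ways to build a small two-column frame. The variable names and the "a" to "d" index labels are made up for illustration.

```python
import pandas as pd

# Build from a dict of Series; the Series index becomes the row labels.
labels = ["a", "b", "c", "d"]  # hypothetical row labels
s_one = pd.Series([1.0, 2.0, 3.0, 4.0], index=labels)
s_two = pd.Series([4.0, 3.0, 2.0, 1.0], index=labels)
df_from_series = pd.DataFrame({"one": s_one, "two": s_two})

# Build from a list of dicts, one dict per row.
rows = [{"one": 1.0, "two": 4.0}, {"one": 2.0, "two": 3.0}]
df_from_rows = pd.DataFrame(rows)

print(df_from_series)
print(df_from_rows)
```

The dict-of-lists form is usually the shortest; the dict-of-Series form is handy when the columns already carry their own index.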


Official documentation


I/O operations

df = pd.read_csv('') # read a CSV file
df.to_csv('') # write a CSV file
df.to_csv('', index=False) # write without the index column
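
To see what index=False actually changes, here is a small sketch that round-trips a frame through CSV text in memory instead of a file. Using io.StringIO and calling to_csv() without a path (which returns the CSV as a string) are choices made for this example only.

```python
import io
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0], "two": [4.0, 3.0]})

# With no path argument, to_csv returns the CSV text as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

# The header line shows the difference: a leading empty field for the index.
print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Reading the index-free text back gives the original frame again.
restored = pd.read_csv(io.StringIO(without_index))
print(restored.equals(df))
```

If the index is written out and the file is read back with plain read_csv, the old index shows up as an extra unnamed column, which is why index=False is common for simple exports.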

Column operations

Apply a function to each column (apply)

# df here is a numeric frame with columns A, B, C, D and F
df.apply(lambda x: x.max() - x.min())
Out[67]:
A    2.073961
B    2.671590
C    1.785291
D    0.000000
F    4.000000
dtype: float64
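
The frame used above is not shown, so here is a self-contained sketch with made-up data. It also shows axis=1, which passes each row (rather than each column) to the function.

```python
import pandas as pd

# Hypothetical data, just for illustration.
df = pd.DataFrame({"A": [1.0, 4.0, 2.0], "B": [10.0, 10.0, 10.0]})

# axis=0 (the default): the lambda receives each column as a Series.
col_range = df.apply(lambda x: x.max() - x.min())

# axis=1: the lambda receives each row as a Series.
row_range = df.apply(lambda x: x.max() - x.min(), axis=1)

print(col_range)
print(row_range)
```

For simple reductions like this, the vectorized form df.max() - df.min() gives the same per-column result and is usually faster than apply.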

Value distribution of a column

>>> d = {'one': [1.0, 2.0, 3.0, 4.0], 'two': [4.0, 3.0, 2.0, 1.0]}
>>> df = pd.DataFrame(d)
>>> df['one'].value_counts()
4.0    1
3.0    1
2.0    1
1.0    1
Name: one, dtype: int64
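
value_counts is more interesting when values repeat. A minimal sketch with a made-up "city" column, also using normalize=True to get proportions instead of raw counts:

```python
import pandas as pd

# Hypothetical column with repeated values.
df = pd.DataFrame({"city": ["a", "b", "a", "c", "a", "b"]})

# Counts per distinct value, sorted from most to least frequent.
counts = df["city"].value_counts()

# Same grouping, but as fractions that sum to 1.
shares = df["city"].value_counts(normalize=True)

print(counts)
print(shares)
```

The exact header and name lines printed for the result differ between pandas versions, so the printed form may not match older examples character for character.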

Row operations

Iterating over rows

for index, row in df.iterrows():
    print(row['c1'], row['c2'])
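
Here is a runnable version of the loop with made-up data, next to itertuples, which yields namedtuples instead of Series and tends to be faster on larger frames. The column names c1 and c2 follow the snippet above; the values are invented.

```python
import pandas as pd

df = pd.DataFrame({"c1": [1, 2], "c2": ["x", "y"]})

# iterrows yields (index, row) pairs; each row is a Series.
pairs_iterrows = []
for index, row in df.iterrows():
    pairs_iterrows.append((row["c1"], row["c2"]))

# itertuples yields one namedtuple per row; columns become attributes.
pairs_itertuples = []
for row in df.itertuples(index=False):
    pairs_itertuples.append((row.c1, row.c2))

print(pairs_iterrows)
print(pairs_itertuples)
```

Attribute access with itertuples only works for column names that are valid Python identifiers; for other names, iterrows or positional access is the safer choice.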

Filtering rows

# single condition
df[df["name"] == "zhangsan"]

# multiple conditions (each condition needs its own parentheses)
df[(df["name"] == "zhangsan") & (df["age"] > 19)]
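
A self-contained sketch of the same filters on made-up data. It adds an "or" condition with | and the equivalent query() form; the rows are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["zhangsan", "lisi", "zhangsan"],
    "age": [18, 25, 30],
})

# Single condition: a boolean Series selects the matching rows.
single = df[df["name"] == "zhangsan"]

# Combine conditions with & (and) or | (or); parentheses are required
# because & and | bind more tightly than == and >.
both = df[(df["name"] == "zhangsan") & (df["age"] > 19)]
either = df[(df["name"] == "lisi") | (df["age"] > 28)]

# The same "and" filter written as a query string.
same_as_both = df.query('name == "zhangsan" and age > 19')

print(single)
print(both)
print(either)
```

Python's plain and / or do not work element-wise on Series, which is why the bitwise operators are used here.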

Joining DataFrames

pd.merge(df1, df2, on='col_name')
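
By default pd.merge performs an inner join on the given key. A small sketch with two made-up frames, comparing that default with how="left"; the value columns left_val and right_val are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({"col_name": ["a", "b", "c"], "left_val": [1, 2, 3]})
df2 = pd.DataFrame({"col_name": ["a", "b", "d"], "right_val": [10, 20, 40]})

# Inner join (the default): only keys present in both frames survive.
inner = pd.merge(df1, df2, on="col_name")

# Left join: every row of df1 is kept; missing matches become NaN.
left = pd.merge(df1, df2, on="col_name", how="left")

print(inner)
print(left)
```

Here key "c" has no partner in df2, so it disappears from the inner result and gets NaN in right_val in the left result.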

Official documentation


Aggregation

>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
...                               'Parrot', 'Parrot'],
...                    'Max Speed': [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
>>> df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0
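
A group does not have to be reduced to a single statistic. As a sketch on the same Animal data, selecting one column and passing a list of function names to agg gives one output column per statistic:

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
    "Max Speed": [380.0, 370.0, 24.0, 26.0],
})

# Group by Animal, then compute several statistics of Max Speed per group.
stats = df.groupby("Animal")["Max Speed"].agg(["mean", "max", "count"])

print(stats)
```

Selecting the column before aggregating keeps non-numeric columns out of the computation.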

>>> # df here is a different, numeric frame with columns 'A' and 'B'
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0
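
The frame behind this agg call is not shown. The sketch below assumes a small numeric frame with columns A, B and C and one all-NaN row, which is consistent with the numbers printed above; treat the data as an assumption, not as the original input.

```python
import numpy as np
import pandas as pd

# Assumed input: three numeric rows plus one row of missing values.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],
    columns=["A", "B", "C"],
)

# A dict maps each column to the list of aggregations to run on it.
# Cells for a function that was not requested for a column are NaN.
result = df.agg({"A": ["sum", "min"], "B": ["min", "max"]})

print(result)
```

Missing values are skipped by default, so the all-NaN row does not affect sum, min or max.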

Official documentation: groupby

Official documentation: aggregate
