Pandas-DataFrame | 代码大全

A powerful tool for "playing" with data in Python!

You can think of a DataFrame as a table. (From the pandas documentation: DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.)

Initialization

>>> import pandas as pd
>>> d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
>>> df = pd.DataFrame(d)
>>> df
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0
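
The description above also mentions that a DataFrame can be seen as a dict of Series objects. The following is a minimal sketch of that idea; the Series names and index labels are made up for illustration. When the Series do not share the same index, pandas aligns them by label and fills the gaps with NaN.

```python
import pandas as pd

# Two Series with partially overlapping index labels (hypothetical data).
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([4.0, 3.0, 2.0, 1.0], index=["a", "b", "c", "d"])

# Columns are aligned on the index labels; "one" has no value for "d",
# so that cell becomes NaN.
df_from_series = pd.DataFrame({"one": s1, "two": s2})
print(df_from_series)
```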

Official documentation

I/O operations

df = pd.read_csv('') # read a CSV file
df.to_csv('') # write a CSV file
df.to_csv('', index=False) # write without the index column
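
To see what `index=False` changes, here is a small round-trip sketch that avoids touching the file system: calling `to_csv()` without a path returns the CSV text as a string, and `read_csv` can read from an `io.StringIO` buffer. The frame `df_io` and its values are hypothetical.

```python
import io

import pandas as pd

df_io = pd.DataFrame({"one": [1.0, 2.0], "two": [4.0, 3.0]})

# With no path, to_csv() returns the CSV text instead of writing a file.
with_index = df_io.to_csv()                # first column is the row index
without_index = df_io.to_csv(index=False)  # only the data columns

print(with_index)
print(without_index)

# Read the text back; StringIO makes the string behave like a file.
df_back = pd.read_csv(io.StringIO(without_index))
```

Reading back the version written with the index would produce an extra unnamed column unless `index_col=0` is passed to `read_csv`.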

Column operations

Applying a function to each column (apply)

# df here is a separate example frame with numeric columns A, B, C, D and F
>>> df.apply(lambda x: x.max() - x.min())
A    2.073961
B    2.671590
C    1.785291
D    0.000000
F    4.000000
dtype: float64
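
The output above comes from a frame whose data is not shown, so here is a self-contained sketch of the same range calculation on a small hypothetical frame. By default `apply` passes each column to the function; with `axis=1` it passes each row instead.

```python
import pandas as pd

df_apply = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0],
                         "two": [4.0, 3.0, 2.0, 1.0]})

# Default (axis=0): the lambda receives one column at a time.
col_range = df_apply.apply(lambda x: x.max() - x.min())

# axis=1: the lambda receives one row at a time.
row_range = df_apply.apply(lambda x: x.max() - x.min(), axis=1)

print(col_range)
print(row_range)
```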

Value distribution of a column

>>> d = {'one': [1.0, 2.0, 3.0, 4.0], 'two': [4.0, 3.0, 2.0, 1.0]}
>>> df = pd.DataFrame(d)
>>> df['one'].value_counts()
4.0    1
3.0    1
2.0    1
1.0    1
Name: one, dtype: int64
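
Because every value in the example above occurs exactly once, the counts are not very telling. A sketch with repeated values (hypothetical data) shows the typical use: `value_counts()` sorts by frequency in descending order, and `normalize=True` returns relative frequencies instead of counts.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c", "a", "b"], name="letter")

counts = s.value_counts()                # absolute counts, most frequent first
shares = s.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
```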

Row operations

Iterating over rows

for index, row in df.iterrows():
    print(row['c1'], row['c2'])
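
As a runnable sketch of row iteration, the snippet below builds a small hypothetical frame with columns `c1` and `c2` and walks it with both `iterrows()` and `itertuples()`. `iterrows()` yields each row as a Series, while `itertuples()` yields lightweight namedtuples and is usually faster.

```python
import pandas as pd

df_rows = pd.DataFrame({"c1": [1, 2, 3], "c2": ["x", "y", "z"]})

# iterrows(): each row arrives as (index label, Series).
labels = []
for index, row in df_rows.iterrows():
    labels.append(index)
    print(index, row["c1"], row["c2"])

# itertuples(): each row arrives as a namedtuple; columns are attributes.
pairs = []
for row in df_rows.itertuples(index=False):
    pairs.append((row.c1, row.c2))

print(pairs)
```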

Filtering rows

# single condition
df[df["name"] == "zhangsan"]

# multiple conditions (wrap each condition in parentheses and combine with &)
df[(df["name"] == "zhangsan") & (df["age"] > 19)]
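
The filters above use `==` and `&`. A few related forms are sketched below on a hypothetical `people` frame: `|` for "or", `~` for negation, `isin` for membership tests, and `query` as a string-based alternative. The names and ages are invented for the example.

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["zhangsan", "lisi", "wangwu", "zhangsan"],
    "age": [18, 25, 30, 22],
})

# OR: rows where either condition holds.
either = people[(people["name"] == "zhangsan") | (people["age"] > 28)]

# NOT: rows where the condition does not hold.
not_zs = people[~(people["name"] == "zhangsan")]

# Membership: rows whose name is in a given list.
in_set = people[people["name"].isin(["lisi", "wangwu"])]

# The same "name and age" filter written as a query string.
both = people.query("name == 'zhangsan' and age > 19")

print(either, not_zs, in_set, both, sep="\n\n")
```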

Joining DataFrames

pd.merge(df1, df2, on='col_name')
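
The merge call above assumes `df1` and `df2` already exist. Here is a minimal sketch with two hypothetical frames sharing the key column `col_name`. By default `pd.merge` performs an inner join; the `how` parameter selects `"left"`, `"right"`, or `"outer"` joins instead.

```python
import pandas as pd

df1 = pd.DataFrame({"col_name": [1, 2, 3], "left_val": ["a", "b", "c"]})
df2 = pd.DataFrame({"col_name": [2, 3, 4], "right_val": ["x", "y", "z"]})

# Inner join (default): only keys present in both frames.
inner = pd.merge(df1, df2, on="col_name")

# Left join: every row of df1, NaN where df2 has no match.
left = pd.merge(df1, df2, on="col_name", how="left")

# Outer join: keys from either frame.
outer = pd.merge(df1, df2, on="col_name", how="outer")

print(inner, left, outer, sep="\n\n")
```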

Official documentation

Aggregation

>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
...                               'Parrot', 'Parrot'],
...                    'Max Speed': [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
>>> df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0

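>>> import numpy as np
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan] * 3],
...                   columns=['A', 'B', 'C'])
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0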
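
`groupby` and `agg` can also be combined. The sketch below reuses the animal data from the groupby example and computes several statistics per group; the output column names `avg_speed` and `top_speed` are chosen here for illustration.

```python
import pandas as pd

animals = pd.DataFrame({
    "Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
    "Max Speed": [380.0, 370.0, 24.0, 26.0],
})

# Several aggregations on one column, one result column per function.
summary = animals.groupby("Animal")["Max Speed"].agg(["mean", "min", "max"])

# Named aggregation: new_column=(source_column, function).
named = animals.groupby("Animal").agg(
    avg_speed=("Max Speed", "mean"),
    top_speed=("Max Speed", "max"),
)

print(summary)
print(named)
```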

Official documentation: groupby

Official documentation: aggregate