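Pandas 2.2 Official Tutorials and Guides in Chinese (7) (3)

Summary: Pandas 2.2 Official Tutorials and Guides in Chinese (7)

Pandas 2.2 Official Tutorials and Guides in Chinese (7) (2): https://developer.aliyun.com/article/1509748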

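Selection

Note

While standard Python/NumPy expressions for selecting and setting are intuitive and come in handy for interactive work, for production code we recommend the optimized pandas data access methods: DataFrame.at(), DataFrame.iat(), DataFrame.loc(), and DataFrame.iloc().

See the indexing documentation: Indexing and Selecting Data, and MultiIndex / Advanced Indexing.

Getitem ([])

For a DataFrame, passing a single label selects a column and yields a Series equivalent to df.A: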

In [24]: df["A"]
Out[24]: 
2013-01-01    0.469112
2013-01-02    1.212112
2013-01-03   -0.861849
2013-01-04    0.721555
2013-01-05   -0.424972
2013-01-06   -0.673690
Freq: D, Name: A, dtype: float64 

For a DataFrame, passing a slice (:) selects matching rows:

In [25]: df[0:3]
Out[25]: 
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
In [26]: df["20130102":"20130104"]
Out[26]: 
                   A         B         C         D
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860 
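
The three forms of [] shown above can be tried on a small deterministic frame. This is a minimal sketch: the frame built with np.arange is an assumption made for readability and is not the tutorial's random df.

```python
import numpy as np
import pandas as pd

# Assumption: a small deterministic frame stands in for the tutorial's random df.
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

col = df["A"]                        # single label -> one column, as a Series
first_rows = df[0:3]                 # integer slice -> first three rows
by_date = df["20130102":"20130104"]  # date-string slice -> rows, end included

print(type(col).__name__)  # Series
print(first_rows.shape)    # (3, 4)
print(by_date.shape)       # (3, 4)
```

Note that a single label addresses columns while a slice addresses rows, which is one reason the explicit accessors are easier to read in production code.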

Selection by label

See more in Selection by Label, using DataFrame.loc() or DataFrame.at().

Selecting a row matching a label:

In [27]: df.loc[dates[0]]
Out[27]: 
A    0.469112
B   -0.282863
C   -1.509059
D   -1.135632
Name: 2013-01-01 00:00:00, dtype: float64 

Selecting all rows (:) with a selection of column labels:

In [28]: df.loc[:, ["A", "B"]]
Out[28]: 
                   A         B
2013-01-01  0.469112 -0.282863
2013-01-02  1.212112 -0.173215
2013-01-03 -0.861849 -2.104569
2013-01-04  0.721555 -0.706771
2013-01-05 -0.424972  0.567020
2013-01-06 -0.673690  0.113648 

For label slicing, both endpoints are included:

In [29]: df.loc["20130102":"20130104", ["A", "B"]]
Out[29]: 
                   A         B
2013-01-02  1.212112 -0.173215
2013-01-03 -0.861849 -2.104569
2013-01-04  0.721555 -0.706771 
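
The inclusive endpoints of a label slice are easy to confuse with the half-open slices used for positions. The sketch below contrasts the two on a hand-made frame (an assumption for illustration, not the tutorial's df).

```python
import numpy as np
import pandas as pd

# Assumption: deterministic data, chosen only to make the row counts obvious.
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

by_label = df.loc["20130102":"20130104", ["A", "B"]]  # both endpoints included
by_position = df.iloc[1:3, 0:2]                       # stop position excluded

print(len(by_label))     # 3 (Jan 2, Jan 3, Jan 4)
print(len(by_position))  # 2 (Jan 2, Jan 3)
```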

Selecting a single row and column label returns a scalar:

In [30]: df.loc[dates[0], "A"]
Out[30]: 0.4691122999071863 

For fast access to a scalar (equivalent to the previous method):

In [31]: df.at[dates[0], "A"]
Out[31]: 0.4691122999071863 

Selection by position

See more in Selection by Position, using DataFrame.iloc() or DataFrame.iat().

Select via the position of the passed integers:

In [32]: df.iloc[3]
Out[32]: 
A    0.721555
B   -0.706771
C   -1.039575
D    0.271860
Name: 2013-01-04 00:00:00, dtype: float64 

Integer slices act similarly to NumPy/Python:

In [33]: df.iloc[3:5, 0:2]
Out[33]: 
                   A         B
2013-01-04  0.721555 -0.706771
2013-01-05 -0.424972  0.567020 

Lists of integer position locations:

In [34]: df.iloc[[1, 2, 4], [0, 2]]
Out[34]: 
                   A         C
2013-01-02  1.212112  0.119209
2013-01-03 -0.861849 -0.494929
2013-01-05 -0.424972  0.276232 

For slicing rows explicitly:

In [35]: df.iloc[1:3, :]
Out[35]: 
                   A         B         C         D
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804

For slicing columns explicitly:

In [36]: df.iloc[:, 1:3]
Out[36]: 
                   B         C
2013-01-01 -0.282863 -1.509059
2013-01-02 -0.173215  0.119209
2013-01-03 -2.104569 -0.494929
2013-01-04 -0.706771 -1.039575
2013-01-05  0.567020  0.276232
2013-01-06  0.113648 -1.478427 

For getting a value explicitly:

In [37]: df.iloc[1, 1]
Out[37]: -0.17321464905330858 

For fast access to a scalar (equivalent to the previous method):

In [38]: df.iat[1, 1]
Out[38]: -0.17321464905330858 
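
The four scalar accessors covered so far return the same value when they address the same cell. A minimal sketch, assuming a deterministic frame rather than the tutorial's random data:

```python
import numpy as np
import pandas as pd

# Assumption: deterministic values so the comparison is easy to follow.
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(
    np.arange(24, dtype=float).reshape(6, 4), index=dates, columns=list("ABCD")
)

via_loc = df.loc[dates[1], "B"]  # label-based, general-purpose
via_at = df.at[dates[1], "B"]    # label-based, scalar access only
via_iloc = df.iloc[1, 1]         # position-based, general-purpose
via_iat = df.iat[1, 1]           # position-based, scalar access only

print(via_loc, via_at, via_iloc, via_iat)  # 5.0 5.0 5.0 5.0
```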

Boolean indexing

Select rows where df.A is greater than 0:

In [39]: df[df["A"] > 0]
Out[39]: 
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-04  0.721555 -0.706771 -1.039575  0.271860 

Selecting values from a DataFrame where a boolean condition is met:

In [40]: df[df > 0]
Out[40]: 
                   A         B         C         D
2013-01-01  0.469112       NaN       NaN       NaN
2013-01-02  1.212112       NaN  0.119209       NaN
2013-01-03       NaN       NaN       NaN  1.071804
2013-01-04  0.721555       NaN       NaN  0.271860
2013-01-05       NaN  0.567020  0.276232       NaN
2013-01-06       NaN  0.113648       NaN  0.524988 
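
The two masking styles above behave differently: a boolean Series filters rows, while a boolean DataFrame keeps the shape and blanks out non-matching cells. The sketch below uses a tiny hand-made frame (an assumption for illustration) and shows that the second form matches DataFrame.where().

```python
import pandas as pd

# Assumption: a three-row frame invented for this example.
df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [-1.0, 5.0, -6.0]})

rows = df[df["A"] > 0]  # boolean Series: keeps rows 0 and 2
cells = df[df > 0]      # boolean DataFrame: same shape, NaN where False

print(rows.index.tolist())             # [0, 2]
print(cells.shape)                     # (3, 2)
print(cells.equals(df.where(df > 0)))  # True
```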

Using the isin() method for filtering:

In [41]: df2 = df.copy()
In [42]: df2["E"] = ["one", "one", "two", "three", "four", "three"]
In [43]: df2
Out[43]: 
                   A         B         C         D      E
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632    one
2013-01-02  1.212112 -0.173215  0.119209 -1.044236    one
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804    two
2013-01-04  0.721555 -0.706771 -1.039575  0.271860  three
2013-01-05 -0.424972  0.567020  0.276232 -1.087401   four
2013-01-06 -0.673690  0.113648 -1.478427  0.524988  three
In [44]: df2[df2["E"].isin(["two", "four"])]
Out[44]: 
                   A         B         C         D     E
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804   two
2013-01-05 -0.424972  0.567020  0.276232 -1.087401  four 
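
The mask returned by isin() is an ordinary boolean Series, so it can also be inverted with ~ to keep the complement. A short sketch on a one-column frame that reuses the tutorial's E values (the frame itself is an assumption):

```python
import pandas as pd

# Assumption: only the E column is reproduced; the numeric columns are omitted.
df2 = pd.DataFrame({"E": ["one", "one", "two", "three", "four", "three"]})

mask = df2["E"].isin(["two", "four"])
selected = df2[mask]   # rows whose E is "two" or "four"
others = df2[~mask]    # every other row

print(selected.index.tolist())  # [2, 4]
print(others.index.tolist())    # [0, 1, 3, 5]
```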

Setting

Setting a new column automatically aligns the data by the indexes:

In [45]: s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))
In [46]: s1
Out[46]: 
2013-01-02    1
2013-01-03    2
2013-01-04    3
2013-01-05    4
2013-01-06    5
2013-01-07    6
Freq: D, dtype: int64
In [47]: df["F"] = s1 
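
Because s1 starts one day later than df, the alignment has visible consequences. The sketch below, which assumes a one-column stand-in for df, shows that the unmatched first date receives NaN and that the extra label in s1 is dropped.

```python
import numpy as np
import pandas as pd

# Assumption: a single-column frame replaces the tutorial's df.
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame({"A": range(6)}, index=dates)

s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))
df["F"] = s1  # values are matched on index labels, not on position

print(np.isnan(df.loc[dates[0], "F"]))       # True: 2013-01-01 is not in s1
print(df.loc[dates[1], "F"])                 # 1.0
print(pd.Timestamp("2013-01-07") in df.index)  # False: the extra label is dropped
```

The column becomes float64 because NaN cannot be stored in an int64 column.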

Setting values by label:

In [48]: df.at[dates[0], "A"] = 0 

Setting values by position:

In [49]: df.iat[0, 1] = 0 

Setting by assigning with a NumPy array:

In [50]: df.loc[:, "D"] = np.array([5] * len(df)) 

The result of the prior setting operations:

In [51]: df
Out[51]: 
                   A         B         C    D    F
2013-01-01  0.000000  0.000000 -1.509059  5.0  NaN
2013-01-02  1.212112 -0.173215  0.119209  5.0  1.0
2013-01-03 -0.861849 -2.104569 -0.494929  5.0  2.0
2013-01-04  0.721555 -0.706771 -1.039575  5.0  3.0
2013-01-05 -0.424972  0.567020  0.276232  5.0  4.0
2013-01-06 -0.673690  0.113648 -1.478427  5.0  5.0 

A where operation with setting:

In [52]: df2 = df.copy()
In [53]: df2[df2 > 0] = -df2
In [54]: df2
Out[54]: 
                   A         B         C    D    F
2013-01-01  0.000000  0.000000 -1.509059 -5.0  NaN
2013-01-02 -1.212112 -0.173215 -0.119209 -5.0 -1.0
2013-01-03 -0.861849 -2.104569 -0.494929 -5.0 -2.0
2013-01-04 -0.721555 -0.706771 -1.039575 -5.0 -3.0
2013-01-05 -0.424972 -0.567020 -0.276232 -5.0 -4.0
2013-01-06 -0.673690 -0.113648 -1.478427 -5.0 -5.0 
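
The masked assignment above flips the sign of every positive cell and leaves zeros, negatives, and NaN alone. A minimal sketch on invented data; the comparison with DataFrame.mask() is an illustration added here, not something the tutorial states.

```python
import numpy as np
import pandas as pd

# Assumption: a small frame with a zero, a negative, and a NaN to cover each case.
df = pd.DataFrame({"A": [0.0, 1.5, -2.0], "F": [np.nan, 1.0, 2.0]})

df2 = df.copy()
df2[df2 > 0] = -df2  # only cells where the mask is True are overwritten

print(df2["A"].tolist())                     # [0.0, -1.5, -2.0]
print(df2.equals(df.mask(df > 0, -df)))      # True
```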

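Missing data

For NumPy data types, np.nan represents missing data. By default it is not included in computations. See the Missing Data section.

Reindexing allows you to change/add/delete the index on a specified axis. This returns a copy of the data: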

In [55]: df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
In [56]: df1.loc[dates[0] : dates[1], "E"] = 1
In [57]: df1
Out[57]: 
                   A         B         C    D    F    E
2013-01-01  0.000000  0.000000 -1.509059  5.0  NaN  1.0
2013-01-02  1.212112 -0.173215  0.119209  5.0  1.0  1.0
2013-01-03 -0.861849 -2.104569 -0.494929  5.0  2.0  NaN
2013-01-04  0.721555 -0.706771 -1.039575  5.0  3.0  NaN 

DataFrame.dropna() drops any rows that have missing data:

In [58]: df1.dropna(how="any")
Out[58]: 
                   A         B         C    D    F    E
2013-01-02  1.212112 -0.173215  0.119209  5.0  1.0  1.0 

DataFrame.fillna() fills missing data:

In [59]: df1.fillna(value=5)
Out[59]: 
                   A         B         C    D    F    E
2013-01-01  0.000000  0.000000 -1.509059  5.0  5.0  1.0
2013-01-02  1.212112 -0.173215  0.119209  5.0  1.0  1.0
2013-01-03 -0.861849 -2.104569 -0.494929  5.0  2.0  5.0
2013-01-04  0.721555 -0.706771 -1.039575  5.0  3.0  5.0 

isna() gets the boolean mask where values are nan:

In [60]: pd.isna(df1)
Out[60]: 
                A      B      C      D      F      E
2013-01-01  False  False  False  False   True  False
2013-01-02  False  False  False  False  False  False
2013-01-03  False  False  False  False  False   True
2013-01-04  False  False  False  False  False   True 
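
dropna() and fillna() return new objects by default, so the frame they are called on still holds its missing values afterwards. A minimal sketch, assuming a two-column frame invented for the purpose:

```python
import numpy as np
import pandas as pd

# Assumption: a tiny frame with one NaN per column.
df1 = pd.DataFrame({"F": [np.nan, 1.0, 2.0], "E": [1.0, 1.0, np.nan]})

dropped = df1.dropna(how="any")  # keeps only fully populated rows
filled = df1.fillna(value=5)     # replaces every NaN with 5
mask = pd.isna(df1)              # True where a value is missing

print(dropped.index.tolist())        # [1]
print(int(filled.isna().sum().sum()))  # 0
print(int(mask.sum().sum()))           # 2
print(int(df1.isna().sum().sum()))     # 2 (df1 itself is unchanged)
```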

Operations

See the Basic section on Binary Ops.

Stats

Operations in general exclude missing data.

Calculate the mean value for each column:

In [61]: df.mean()
Out[61]: 
A   -0.004474
B   -0.383981
C   -0.687758
D    5.000000
F    3.000000
dtype: float64 

Calculate the mean value for each row:

In [62]: df.mean(axis=1)
Out[62]: 
2013-01-01    0.872735
2013-01-02    1.431621
2013-01-03    0.707731
2013-01-04    1.395042
2013-01-05    1.883656
2013-01-06    1.592306
Freq: D, dtype: float64 
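
The effect of excluding missing data can be checked by hand on a small frame. In this sketch (the data is an assumption), skipna=False is passed to show what the result looks like when NaN is not skipped.

```python
import numpy as np
import pandas as pd

# Assumption: two columns, one containing a NaN.
df = pd.DataFrame({"D": [5.0, 5.0, 5.0], "F": [np.nan, 1.0, 2.0]})

col_means = df.mean()           # per column; the NaN in F is skipped
row_means = df.mean(axis=1)     # per row; row 0 averages only D
strict = df.mean(skipna=False)  # a NaN in a column makes its mean NaN

print(col_means["F"])         # 1.5
print(row_means.tolist())     # [5.0, 3.0, 3.5]
print(np.isnan(strict["F"]))  # True
```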

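Operating with another Series or DataFrame that has a different index or columns aligns the result with the union of the index or column labels. In addition, pandas automatically broadcasts along the specified dimension and fills unaligned labels with np.nan.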

In [63]: s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
In [64]: s
Out[64]: 
2013-01-01    NaN
2013-01-02    NaN
2013-01-03    1.0
2013-01-04    3.0
2013-01-05    5.0
2013-01-06    NaN
Freq: D, dtype: float64
In [65]: df.sub(s, axis="index")
Out[65]: 
                   A         B         C    D    F
2013-01-01       NaN       NaN       NaN  NaN  NaN
2013-01-02       NaN       NaN       NaN  NaN  NaN
2013-01-03 -1.861849 -3.104569 -1.494929  4.0  1.0
2013-01-04 -2.278445 -3.706771 -4.039575  2.0  0.0
2013-01-05 -5.424972 -4.432980 -4.723768  0.0 -1.0
2013-01-06       NaN       NaN       NaN  NaN  NaN
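
With axis="index", the Series is matched against the row labels and its value for each row is subtracted from every column of that row. The sketch below uses round numbers (an assumption) so the arithmetic can be verified by eye.

```python
import pandas as pd

# Assumption: a four-row frame with values chosen for easy mental arithmetic.
dates = pd.date_range("20130101", periods=4)
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]}, index=dates
)

s = pd.Series([1, 3, 5, 7], index=dates).shift(2)  # NaN, NaN, 1.0, 3.0
result = df.sub(s, axis="index")

print(result.loc[dates[2]].tolist())  # [2.0, 29.0]
print(result.loc[dates[3]].tolist())  # [1.0, 37.0]
print(bool(result.iloc[:2].isna().all().all()))  # True: NaN in s propagates
```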

User defined functions

DataFrame.agg() and DataFrame.transform() apply a user defined function that reduces or broadcasts its result, respectively.

In [66]: df.agg(lambda x: np.mean(x) * 5.6)
Out[66]: 
A    -0.025054
B    -2.150294
C    -3.851445
D    28.000000
F    16.800000
dtype: float64
In [67]: df.transform(lambda x: x * 101.2)
Out[67]: 
                     A           B           C      D      F
2013-01-01    0.000000    0.000000 -152.716721  506.0    NaN
2013-01-02  122.665737  -17.529322   12.063922  506.0  101.2
2013-01-03  -87.219115 -212.982405  -50.086843  506.0  202.4
2013-01-04   73.021382  -71.525239 -105.204988  506.0  303.6
2013-01-05  -43.007200   57.382459   27.954680  506.0  404.8
2013-01-06  -68.177398   11.501219 -149.616767  506.0  506.0 
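
The difference between reducing and broadcasting shows up in the shape of the result: agg() with a reducing function gives one value per column, while transform() gives back a frame of the original shape. A minimal sketch on invented data:

```python
import numpy as np
import pandas as pd

# Assumption: a 3x2 frame with simple values.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

reduced = df.agg(lambda x: np.mean(x) * 2)  # one scalar per column
broadcast = df.transform(lambda x: x * 2)   # same shape as df

print(reduced.tolist())          # [4.0, 10.0]
print(broadcast.shape)           # (3, 2)
print(broadcast["B"].tolist())   # [8.0, 10.0, 12.0]
```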

Value counts

See more at Histogramming and Discretization.

In [68]: s = pd.Series(np.random.randint(0, 7, size=10))
In [69]: s
Out[69]: 
0    4
1    2
2    1
3    2
4    6
5    4
6    4
7    6
8    4
9    4
dtype: int64
In [70]: s.value_counts()
Out[70]: 
4    5
2    2
6    2
1    1
Name: count, dtype: int64 
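
Since the Series above is random, the counts change from run to run. The sketch below fixes the values to the ones printed in Out[69] (typed in by hand, so treat them as an assumption) and also shows the normalize=True option for relative frequencies.

```python
import pandas as pd

# Assumption: values copied from the sample output rather than drawn at random.
s = pd.Series([4, 2, 1, 2, 6, 4, 4, 6, 4, 4])

counts = s.value_counts()                # sorted by count, descending
shares = s.value_counts(normalize=True)  # fractions instead of counts

print(counts.loc[4])    # 5
print(counts.index[0])  # 4 (the most frequent value comes first)
print(shares.loc[4])    # 0.5
```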

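String methods

Series is equipped with a set of string processing methods in the str attribute that make it easy to operate on each element of the array, as in the code snippet below. See more at Vectorized String Methods.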

In [71]: s = pd.Series(["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"])
In [72]: s.str.lower()
Out[72]: 
0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object 
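
Methods under str can be chained, and missing values pass through unchanged unless a method is told how to treat them. A small sketch on a shortened version of the Series above; the na=False argument to startswith() is used here to turn the missing entry into False.

```python
import numpy as np
import pandas as pd

# Assumption: a shortened copy of the tutorial's Series.
s = pd.Series(["A", "Aaba", np.nan, "CABA", "dog"])

lowered = s.str.lower()                            # the missing entry stays missing
starts_a = lowered.str.startswith("a", na=False)   # missing entry counts as False

print(lowered.dropna().tolist())  # ['a', 'aaba', 'caba', 'dog']
print(int(lowered.isna().sum()))  # 1
print(starts_a.tolist())          # [True, True, False, False, False]
```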

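Merge

Concat

pandas provides various facilities for easily combining Series and DataFrame objects, with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations.

See the Merging section.

Concatenating pandas objects together row-wise with concat():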

In [73]: df = pd.DataFrame(np.random.randn(10, 4))
In [74]: df
Out[74]: 
          0         1         2         3
0 -0.548702  1.467327 -1.015962 -0.483075
1  1.637550 -1.217659 -0.291519 -1.745505
2 -0.263952  0.991460 -0.919069  0.266046
3 -0.709661  1.669052  1.037882 -1.705775
4 -0.919854 -0.042379  1.247642 -0.009920
5  0.290213  0.495767  0.362949  1.548106
6 -1.131345 -0.089329  0.337863 -0.945867
7 -0.932132  1.956030  0.017587 -0.016692
8 -0.575247  0.254161 -1.143704  0.215897
9  1.193555 -0.077118 -0.408530 -0.862495
# break it into pieces
In [75]: pieces = [df[:3], df[3:7], df[7:]]
In [76]: pd.concat(pieces)
Out[76]: 
          0         1         2         3
0 -0.548702  1.467327 -1.015962 -0.483075
1  1.637550 -1.217659 -0.291519 -1.745505
2 -0.263952  0.991460 -0.919069  0.266046
3 -0.709661  1.669052  1.037882 -1.705775
4 -0.919854 -0.042379  1.247642 -0.009920
5  0.290213  0.495767  0.362949  1.548106
6 -1.131345 -0.089329  0.337863 -0.945867
7 -0.932132  1.956030  0.017587 -0.016692
8 -0.575247  0.254161 -1.143704  0.215897
9  1.193555 -0.077118 -0.408530 -0.862495 
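
Splitting a frame into row slices and concatenating them in order reproduces the original, index included. The sketch below checks this on deterministic data (an assumption, used in place of np.random.randn).

```python
import numpy as np
import pandas as pd

# Assumption: integers instead of random floats, so the check is repeatable.
df = pd.DataFrame(np.arange(40).reshape(10, 4))

pieces = [df[:3], df[3:7], df[7:]]  # three consecutive row slices
rebuilt = pd.concat(pieces)         # stacked row-wise, original labels kept

print(rebuilt.shape)        # (10, 4)
print(rebuilt.equals(df))   # True
```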

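Note

Adding a column to a DataFrame is relatively fast. However, adding a row requires a copy and may be expensive. We recommend passing a pre-built list of records to the DataFrame constructor instead of building a DataFrame by iteratively appending records to it.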
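
The recommendation in this note can be sketched as follows. The records are hypothetical and exist only for illustration; the loop is included to show the pattern being discouraged, since each pd.concat call copies the rows accumulated so far.

```python
import pandas as pd

# Hypothetical records, invented for this example.
records = [{"id": i, "square": i * i} for i in range(5)]

# Recommended: build the frame once from the complete list.
df = pd.DataFrame(records)

# Discouraged: grow the frame one row at a time (a copy on every iteration).
slow = pd.DataFrame([records[0]])
for record in records[1:]:
    slow = pd.concat([slow, pd.DataFrame([record])], ignore_index=True)

print(df.shape)          # (5, 2)
print(slow.equals(df))   # True
```

Both approaches produce the same frame here; the difference is in how much copying happens along the way.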

Pandas 2.2 Official Tutorials and Guides in Chinese (7) (4): https://developer.aliyun.com/article/1509750
