Pandas 2.2 Official Tutorials and Guides in Chinese (20.1) (3) - Alibaba Cloud Developer Community

Pandas 2.2 Official Tutorials and Guides in Chinese (20.1) (2): https://developer.aliyun.com/article/1508817

Other useful features

Exclusion of non-numeric columns

Again consider the example DataFrame we've been looking at:

In [205]: df
Out[205]: 
     A      B         C         D
0  foo    one -0.575247  1.346061
1  bar    one  0.254161  1.511763
2  foo    two -1.143704  1.627081
3  bar  three  0.215897 -0.990582
4  foo    two  1.193555 -0.441652
5  bar    two -0.077118  1.211526
6  foo    one -0.408530  0.268520
7  foo  three -0.862495  0.024580

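Suppose we wish to compute the standard deviation grouped by the A column. There is a slight problem: we don't care about the data in column B because it is not numeric. You can avoid non-numeric columns by specifying numeric_only=True: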

In [206]: df.groupby("A").std(numeric_only=True)
Out[206]: 
            C         D
A 
bar  0.181231  1.366330
foo  0.912265  0.884785

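Note that df.groupby('A').colname.std() is more efficient than df.groupby('A').std().colname. So if the result of an aggregation function is only needed over one column (here colname), select that column before applying the aggregation function.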
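To make the comparison concrete, here is a minimal sketch on a small made-up frame (the values are illustrative and are not the df shown above). It selects column C before aggregating and checks that the result matches aggregating every numeric column and selecting afterwards; only the amount of work differs.

```python
import pandas as pd

# Illustrative data, not the DataFrame from the text above.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "B": ["one", "one", "two", "three"],
        "C": [1.0, 2.0, 3.0, 6.0],
        "D": [0.5, 1.5, 2.5, 3.5],
    }
)

# Select the column first: only C is aggregated.
select_first = df.groupby("A")["C"].std()

# Aggregate every numeric column, then throw away all but C.
aggregate_all = df.groupby("A").std(numeric_only=True)["C"]

print(select_first)
print(aggregate_all)
```

Both expressions return a Series indexed by A; the first form simply avoids computing the standard deviation of columns that are discarded afterwards.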

In [207]: from decimal import Decimal
In [208]: df_dec = pd.DataFrame(
 .....:    {
 .....:        "id": [1, 2, 1, 2],
 .....:        "int_column": [1, 2, 3, 4],
 .....:        "dec_column": [
 .....:            Decimal("0.50"),
 .....:            Decimal("0.15"),
 .....:            Decimal("0.25"),
 .....:            Decimal("0.40"),
 .....:        ],
 .....:    }
 .....: )
 .....: 
In [209]: df_dec.groupby(["id"])[["dec_column"]].sum()
Out[209]: 
   dec_column
id 
1        0.75
2        0.55

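Handling of (un)observed Categorical values

When using a Categorical grouper (as a single grouper or as part of multiple groupers), the observed keyword controls whether to return a Cartesian product of all possible grouper values (observed=False) or only those that are observed groupers (observed=True).

Show all values: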

In [210]: pd.Series([1, 1, 1]).groupby(
 .....:    pd.Categorical(["a", "a", "a"], categories=["a", "b"]), observed=False
 .....: ).count()
 .....: 
Out[210]: 
a    3
b    0
dtype: int64

Show only the observed values:

In [211]: pd.Series([1, 1, 1]).groupby(
 .....:    pd.Categorical(["a", "a", "a"], categories=["a", "b"]), observed=True
 .....: ).count()
 .....: 
Out[211]: 
a    3
dtype: int64
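The Cartesian-product behaviour is easiest to see with more than one Categorical grouper. The following is a small sketch with made-up column names (grade, color, qty are assumptions for illustration): with observed=False every combination of categories appears in the result, while observed=True keeps only the combinations present in the data.

```python
import pandas as pd

# Hypothetical data: two Categorical columns and one numeric column.
df = pd.DataFrame(
    {
        "grade": pd.Categorical(["S", "S", "M"], categories=["S", "M", "L"]),
        "color": pd.Categorical(["red", "blue", "red"], categories=["red", "blue"]),
        "qty": [1, 2, 3],
    }
)

# Every (grade, color) pair, observed or not: 3 grades x 2 colors.
all_pairs = df.groupby(["grade", "color"], observed=False)["qty"].sum()

# Only the pairs that actually occur in the data.
seen_pairs = df.groupby(["grade", "color"], observed=True)["qty"].sum()

print(all_pairs)
print(seen_pairs)
```

Unobserved combinations show up with a sum of 0, which is why the choice of observed matters for both the size and the content of the result.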

The returned dtype of the grouped result will always include all of the categories that were grouped.

In [212]: s = (
 .....:    pd.Series([1, 1, 1])
 .....:    .groupby(pd.Categorical(["a", "a", "a"], categories=["a", "b"]), observed=True)
 .....:    .count()
 .....: )
 .....: 
In [213]: s.index.dtype
Out[213]: CategoricalDtype(categories=['a', 'b'], ordered=False, categories_dtype=object) 
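
NA group handling

By NA, we mean any NA value, including NA, NaN, NaT, and None. If there are any NA values in the grouping key, they are excluded by default. In other words, any "NA group" is dropped. You can include NA groups by specifying dropna=False.
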
In [214]: df = pd.DataFrame({"key": [1.0, 1.0, np.nan, 2.0, np.nan], "A": [1, 2, 3, 4, 5]})
In [215]: df
Out[215]: 
   key  A
0  1.0  1
1  1.0  2
2  NaN  3
3  2.0  4
4  NaN  5
In [216]: df.groupby("key", dropna=True).sum()
Out[216]: 
     A
key 
1.0  3
2.0  4
In [217]: df.groupby("key", dropna=False).sum()
Out[217]: 
     A
key 
1.0  3
2.0  4
NaN  8

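Grouping with ordered factors

Categorical variables represented as instances of pandas's Categorical class can be used as group keys. If so, the order of the levels will be preserved. When observed=False and sort=False, any unobserved categories will be at the end of the result in order.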

In [218]: days = pd.Categorical(
 .....:    values=["Wed", "Mon", "Thu", "Mon", "Wed", "Sat"],
 .....:    categories=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
 .....: )
 .....: 
In [219]: data = pd.DataFrame(
 .....:   {
 .....:       "day": days,
 .....:       "workers": [3, 4, 1, 4, 2, 2],
 .....:   }
 .....: )
 .....: 
In [220]: data
Out[220]: 
   day  workers
0  Wed        3
1  Mon        4
2  Thu        1
3  Mon        4
4  Wed        2
5  Sat        2
In [221]: data.groupby("day", observed=False, sort=True).sum()
Out[221]: 
     workers
day 
Mon        8
Tue        0
Wed        5
Thu        1
Fri        0
Sat        2
Sun        0
In [222]: data.groupby("day", observed=False, sort=False).sum()
Out[222]: 
     workers
day 
Wed        5
Mon        8
Thu        1
Sat        2
Tue        0
Fri        0
Sun        0

Grouping with a grouper specification

You may need to specify a bit more data to properly group. You can use pd.Grouper to provide this local control.

In [223]: import datetime
In [224]: df = pd.DataFrame(
 .....:    {
 .....:        "Branch": "A A A A A A A B".split(),
 .....:        "Buyer": "Carl Mark Carl Carl Joe Joe Joe Carl".split(),
 .....:        "Quantity": [1, 3, 5, 1, 8, 1, 9, 3],
 .....:        "Date": [
 .....:            datetime.datetime(2013, 1, 1, 13, 0),
 .....:            datetime.datetime(2013, 1, 1, 13, 5),
 .....:            datetime.datetime(2013, 10, 1, 20, 0),
 .....:            datetime.datetime(2013, 10, 2, 10, 0),
 .....:            datetime.datetime(2013, 10, 1, 20, 0),
 .....:            datetime.datetime(2013, 10, 2, 10, 0),
 .....:            datetime.datetime(2013, 12, 2, 12, 0),
 .....:            datetime.datetime(2013, 12, 2, 14, 0),
 .....:        ],
 .....:    }
 .....: )
 .....: 
In [225]: df
Out[225]: 
  Branch Buyer  Quantity                Date
0      A  Carl         1 2013-01-01 13:00:00
1      A  Mark         3 2013-01-01 13:05:00
2      A  Carl         5 2013-10-01 20:00:00
3      A  Carl         1 2013-10-02 10:00:00
4      A   Joe         8 2013-10-01 20:00:00
5      A   Joe         1 2013-10-02 10:00:00
6      A   Joe         9 2013-12-02 12:00:00
7      B  Carl         3 2013-12-02 14:00:00

Groupby a specific column with the desired frequency. This is like resampling.

In [226]: df.groupby([pd.Grouper(freq="1ME", key="Date"), "Buyer"])[["Quantity"]].sum()
Out[226]: 
                  Quantity
Date       Buyer
2013-01-31 Carl          1
           Mark          3
2013-10-31 Carl          6
           Joe           9
2013-12-31 Carl          3
           Joe           9
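The similarity to resampling can be checked directly. The sketch below uses a small made-up frame and a daily frequency ("D") rather than the "1ME" alias from the example above, purely to keep it independent of the pandas version; it compares grouping with pd.Grouper against calling resample on a datetime index.

```python
import datetime

import pandas as pd

# Illustrative data: two purchases on 1 January, one on 3 January.
df = pd.DataFrame(
    {
        "Quantity": [1, 3, 5],
        "Date": [
            datetime.datetime(2013, 1, 1, 13, 0),
            datetime.datetime(2013, 1, 1, 13, 5),
            datetime.datetime(2013, 1, 3, 20, 0),
        ],
    }
)

# Group on the Date column with a daily frequency.
by_grouper = df.groupby(pd.Grouper(key="Date", freq="D"))["Quantity"].sum()

# The resample spelling needs the dates in the index.
by_resample = df.set_index("Date")["Quantity"].resample("D").sum()

print(by_grouper)
print(by_resample)
```

Both spellings produce one bin per day, including the empty 2 January bin. pd.Grouper is the more convenient form when the dates live in a column or when other keys, such as Buyer, are part of the same groupby.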

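When freq is specified, the object returned by pd.Grouper will be an instance of pandas.api.typing.TimeGrouper. When there is a column and an index with the same name, you can use key to group by the column and level to group by the index.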

In [227]: df = df.set_index("Date")
In [228]: df["Date"] = df.index + pd.offsets.MonthEnd(2)
In [229]: df.groupby([pd.Grouper(freq="6ME", key="Date"), "Buyer"])[["Quantity"]].sum()
Out[229]: 
                  Quantity
Date       Buyer
2013-02-28 Carl          1
           Mark          3
2014-02-28 Carl          9
           Joe          18
In [230]: df.groupby([pd.Grouper(freq="6ME", level="Date"), "Buyer"])[["Quantity"]].sum()
Out[230]: 
                  Quantity
Date       Buyer
2013-01-31 Carl          1
           Mark          3
2014-01-31 Carl          9
           Joe          18

Taking the first rows of each group

Just like for a DataFrame or Series, you can call head and tail on a groupby:

In [231]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["A", "B"])
In [232]: df
Out[232]: 
   A  B
0  1  2
1  1  4
2  5  6
In [233]: g = df.groupby("A")
In [234]: g.head(1)
Out[234]: 
   A  B
0  1  2
2  5  6
In [235]: g.tail(1)
Out[235]: 
   A  B
1  1  4
2  5  6

This shows the first or last n rows from each group.
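For n greater than 1 the behaviour is the same, as the following sketch on a slightly larger made-up frame suggests: head and tail keep the original row labels and the original row order, rather than producing one row per group key.

```python
import pandas as pd

# Illustrative data: group 1 has three rows, group 5 has two.
df = pd.DataFrame({"A": [1, 1, 1, 5, 5], "B": [2, 4, 6, 8, 10]})
g = df.groupby("A")

# First two rows of every group, still labelled 0, 1, 3, 4.
first_two = g.head(2)

# Last row of every group, labelled 2 and 4.
last_one = g.tail(1)

print(first_two)
print(last_one)
```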

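Taking the nth row of each group

To select the nth item from each group, use DataFrameGroupBy.nth() or SeriesGroupBy.nth(). The argument can be any integer, list of integers, slice, or list of slices; see below for examples. When the nth element of a group does not exist, no error is raised; instead, no corresponding rows are returned.

In general this operation acts as a filtration. In certain cases it will also return one row per group, making it also a reduction. However, because in general it can return zero or multiple rows per group, pandas treats it as a filtration in all cases.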

In [236]: df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=["A", "B"])
In [237]: g = df.groupby("A")
In [238]: g.nth(0)
Out[238]: 
   A    B
0  1  NaN
2  5  6.0
In [239]: g.nth(-1)
Out[239]: 
   A    B
1  1  4.0
2  5  6.0
In [240]: g.nth(1)
Out[240]: 
   A    B
1  1  4.0

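If the nth element of a group does not exist, then no corresponding row is included in the result. In particular, if the specified n is larger than any group, the result will be an empty DataFrame.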

In [241]: g.nth(5)
Out[241]: 
Empty DataFrame
Columns: [A, B]
Index: []

If you want to select the nth not-null item, use the dropna kwarg. For a DataFrame this should be either 'any' or 'all', just like you would pass to dropna:

# nth(0) is the same as g.first()
In [242]: g.nth(0, dropna="any")
Out[242]: 
   A    B
1  1  4.0
2  5  6.0
In [243]: g.first()
Out[243]: 
     B
A 
1  4.0
5  6.0
# nth(-1) is the same as g.last()
In [244]: g.nth(-1, dropna="any")
Out[244]: 
   A    B
1  1  4.0
2  5  6.0
In [245]: g.last()
Out[245]: 
     B
A 
1  4.0
5  6.0
In [246]: g.B.nth(0, dropna="all")
Out[246]: 
1    4.0
2    6.0
Name: B, dtype: float64

You can also select multiple rows from each group by specifying multiple nth values as a list of ints.

In [247]: business_dates = pd.date_range(start="4/1/2014", end="6/30/2014", freq="B")
In [248]: df = pd.DataFrame(1, index=business_dates, columns=["a", "b"])
# get the first, 4th, and last date index for each month
In [249]: df.groupby([df.index.year, df.index.month]).nth([0, 3, -1])
Out[249]: 
            a  b
2014-04-01  1  1
2014-04-04  1  1
2014-04-30  1  1
2014-05-01  1  1
2014-05-06  1  1
2014-05-30  1  1
2014-06-02  1  1
2014-06-05  1  1
2014-06-30  1  1

You may also use slices or lists of slices.

In [250]: df.groupby([df.index.year, df.index.month]).nth[1:]
Out[250]: 
            a  b
2014-04-02  1  1
2014-04-03  1  1
2014-04-04  1  1
2014-04-07  1  1
2014-04-08  1  1
...        .. ..
2014-06-24  1  1
2014-06-25  1  1
2014-06-26  1  1
2014-06-27  1  1
2014-06-30  1  1
[62 rows x 2 columns]
In [251]: df.groupby([df.index.year, df.index.month]).nth[1:, :-1]
Out[251]: 
            a  b
2014-04-01  1  1
2014-04-02  1  1
2014-04-03  1  1
2014-04-04  1  1
2014-04-07  1  1
...        .. ..
2014-06-24  1  1
2014-06-25  1  1
2014-06-26  1  1
2014-06-27  1  1
2014-06-30  1  1
[65 rows x 2 columns]

Enumerate group items

To see the order in which each row appears within its group, use the cumcount method:

In [252]: dfg = pd.DataFrame(list("aaabba"), columns=["A"])
In [253]: dfg
Out[253]: 
   A
0  a
1  a
2  a
3  b
4  b
5  a
In [254]: dfg.groupby("A").cumcount()
Out[254]: 
0    0
1    1
2    2
3    0
4    1
5    3
dtype: int64
In [255]: dfg.groupby("A").cumcount(ascending=False)
Out[255]: 
0    3
1    2
2    1
3    1
4    0
5    0
dtype: int64
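The two numberings are mirror images of each other within each group. As a quick sanity check (a sketch, reusing the dfg frame from above), the ascending and descending counts of a row always add up to the group size minus one, and the ascending count can be used as a mask to pick, say, the second row of every group.

```python
import pandas as pd

dfg = pd.DataFrame(list("aaabba"), columns=["A"])
g = dfg.groupby("A")

forward = g.cumcount()                  # 0, 1, 2, ... within each group
backward = g.cumcount(ascending=False)  # ..., 2, 1, 0 within each group

# Size of the group each row belongs to, aligned with the original rows.
group_sizes = g["A"].transform("count")

# forward + backward is constant inside a group: size - 1.
print((forward + backward == group_sizes - 1).all())

# Use the position as a boolean mask: second row of every group.
second_rows = dfg[forward == 1]
print(second_rows)
```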

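Enumerate groups

To see the ordering of the groups (as opposed to the order of rows within a group given by cumcount) you can use DataFrameGroupBy.ngroup().

Note that the numbers given to the groups match the order in which the groups would be seen when iterating over the groupby object, not the order they are first observed.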

In [256]: dfg = pd.DataFrame(list("aaabba"), columns=["A"])
In [257]: dfg
Out[257]: 
   A
0  a
1  a
2  a
3  b
4  b
5  a
In [258]: dfg.groupby("A").ngroup()
Out[258]: 
0    0
1    0
2    0
3    1
4    1
5    0
dtype: int64
In [259]: dfg.groupby("A").ngroup(ascending=False)
Out[259]: 
0    1
1    1
2    1
3    0
4    0
5    1
dtype: int64
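The difference between iteration order and first-observed order only shows when the data is not already sorted by key. In the sketch below (made-up data in which "b" appears before "a"), the default sort=True numbers the groups in sorted key order, while sort=False numbers them in order of first appearance.

```python
import pandas as pd

# Illustrative data: "b" is observed before "a".
dfg = pd.DataFrame(list("bbaab"), columns=["A"])

# Default sort=True: groups are numbered in sorted key order (a=0, b=1).
sorted_ids = dfg.groupby("A").ngroup()

# sort=False: groups are numbered in order of first appearance (b=0, a=1).
first_seen_ids = dfg.groupby("A", sort=False).ngroup()

# The numbering matches the order in which iteration yields the groups.
iteration_order = [key for key, _ in dfg.groupby("A")]

print(sorted_ids.tolist())
print(first_seen_ids.tolist())
print(iteration_order)
```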

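Plotting

Groupby also works with some plotting methods. In this case, suppose we suspect that the values in column 1 are about 3 higher on average in group "B".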

In [260]: np.random.seed(1234)
In [261]: df = pd.DataFrame(np.random.randn(50, 2))
In [262]: df["g"] = np.random.choice(["A", "B"], size=50)
In [263]: df.loc[df["g"] == "B", 1] += 3

We can easily visualize this with a boxplot:

In [264]: df.groupby("g").boxplot()
Out[264]: 
A         Axes(0.1,0.15;0.363636x0.75)
B    Axes(0.536364,0.15;0.363636x0.75)
dtype: object

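The result of calling boxplot is a dictionary-like object (displayed above as a Series) whose keys are the values of our grouping column g ("A" and "B"). The values of the result can be controlled by the return_type keyword of boxplot. See the visualization documentation for more.

Warning

For historical reasons, df.groupby("g").boxplot() is not equivalent to df.boxplot(by="g"). See the visualization documentation for an explanation.

Piping function calls

Similar to the functionality provided by DataFrame and Series, functions that take GroupBy objects can be chained together using a pipe method to allow for a cleaner, more readable syntax. To read about .pipe in general terms, see the general pandas documentation on .pipe.

Combining .groupby and .pipe is often useful when you need to reuse GroupBy objects.

As an example, imagine having a DataFrame with columns for stores, products, revenue and quantity sold. We'd like to do a groupwise calculation of prices (i.e. revenue/quantity) per store and per product. We could do this in a multi-step operation, but expressing it in terms of piping can make the code more readable. First we set the data: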

In [265]: n = 1000
In [266]: df = pd.DataFrame(
 .....:    {
 .....:        "Store": np.random.choice(["Store_1", "Store_2"], n),
 .....:        "Product": np.random.choice(["Product_1", "Product_2"], n),
 .....:        "Revenue": (np.random.random(n) * 50 + 10).round(2),
 .....:        "Quantity": np.random.randint(1, 10, size=n),
 .....:    }
 .....: )
 .....: 
In [267]: df.head(2)
Out[267]: 
     Store    Product  Revenue  Quantity
0  Store_2  Product_1    26.12         1
1  Store_2  Product_1    28.86         1

Now, to find prices per store/product, we can simply do:

In [268]: (
 .....:    df.groupby(["Store", "Product"])
 .....:    .pipe(lambda grp: grp.Revenue.sum() / grp.Quantity.sum())
 .....:    .unstack()
 .....:    .round(2)
 .....: )
 .....: 
Out[268]: 
Product  Product_1  Product_2
Store 
Store_1       6.82       7.05
Store_2       6.30       6.64
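For comparison, here is a sketch of the same calculation written both ways on a tiny deterministic frame (the numbers are made up so the result can be checked by hand). The helper name unit_price is an assumption for illustration; .pipe simply passes the GroupBy object to it.

```python
import pandas as pd

# Illustrative data with easy-to-verify totals.
df = pd.DataFrame(
    {
        "Store": ["Store_1", "Store_1", "Store_2", "Store_2"],
        "Product": ["Product_1", "Product_1", "Product_1", "Product_2"],
        "Revenue": [10.0, 30.0, 12.0, 9.0],
        "Quantity": [1, 3, 2, 3],
    }
)


def unit_price(grp):
    """Total revenue divided by total quantity, per group."""
    return grp["Revenue"].sum() / grp["Quantity"].sum()


# Piped: the GroupBy object is created once and handed to the function.
piped = df.groupby(["Store", "Product"]).pipe(unit_price)

# Multi-step: the GroupBy object must be stored in a variable to be reused.
grouped = df.groupby(["Store", "Product"])
stepwise = grouped["Revenue"].sum() / grouped["Quantity"].sum()

print(piped)
```

The two results are identical; the piped form avoids the intermediate variable and keeps the whole computation in one method chain.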

Piping can also be expressive when you want to deliver a grouped object to some arbitrary function, for example:

In [269]: def mean(groupby):
 .....:    return groupby.mean()
 .....: 
In [270]: df.groupby(["Store", "Product"]).pipe(mean)
Out[270]: 
                     Revenue  Quantity
Store   Product
Store_1 Product_1  34.622727  5.075758
        Product_2  35.482815  5.029630
Store_2 Product_1  32.972837  5.237589
        Product_2  34.684360  5.224000

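Here mean takes a GroupBy object and finds the mean of the Revenue and Quantity columns respectively for each Store-Product combination. The mean function can be any function that takes in a GroupBy object; the .pipe will pass the GroupBy object as a parameter into the function you specify.

Examples

Multi-column factorization

By using DataFrameGroupBy.ngroup(), we can extract information about the groups in a way similar to factorize() (as described further in the reshaping API) but which applies naturally to multiple columns of mixed type and different sources. This can be useful as an intermediate categorical-like step in processing, when the relationships between the group rows are more important than their content, or as input to an algorithm which only accepts the integer encoding. (For more information about support in pandas for full categorical data, see the Categorical introduction and the API documentation.)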

In [271]: dfg = pd.DataFrame({"A": [1, 1, 2, 3, 2], "B": list("aaaba")})
In [272]: dfg
Out[272]: 
   A  B
0  1  a
1  1  a
2  2  a
3  3  b
4  2  a
In [273]: dfg.groupby(["A", "B"]).ngroup()
Out[273]: 
0    0
1    0
2    1
3    2
4    1
dtype: int64
In [274]: dfg.groupby(["A", [0, 0, 0, 1, 1]]).ngroup()
Out[274]: 
0    0
1    0
2    1
3    3
4    2
dtype: int64

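Groupby by indexer to 'resample' data

Resampling produces new hypothetical samples (resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples.

In order for resample to work on indices that are non-datetimelike, the following procedure can be utilized.

In the following examples, df.index // 5 returns an integer array which is used to determine what gets selected for the groupby operation.

Note

The example below shows how we can downsample by consolidation of samples into fewer ones. Here by using df.index // 5, we are aggregating the samples in bins. By applying the std() function, we aggregate the information contained in many samples into a small subset of values, which is their standard deviation, thereby reducing the number of samples.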

In [275]: df = pd.DataFrame(np.random.randn(10, 2))
In [276]: df
Out[276]: 
          0         1
0 -0.793893  0.321153
1  0.342250  1.618906
2 -0.975807  1.918201
3 -0.810847 -1.405919
4 -1.977759  0.461659
5  0.730057 -1.316938
6 -0.751328  0.528290
7 -0.257759 -1.081009
8  0.505895 -1.701948
9 -1.006349  0.020208
In [277]: df.index // 5
Out[277]: Index([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype='int64')
In [278]: df.groupby(df.index // 5).std()
Out[278]: 
          0         1
0  0.823647  1.312912
1  0.760109  0.942941
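Because the example above uses random data, the binning is easier to verify on a deterministic frame. The following sketch (column name x and the bin width are illustrative choices) groups the integers 0 to 9 into bins of five rows and takes the mean of each bin.

```python
import pandas as pd

# Ten evenly spaced samples on a plain integer index.
df = pd.DataFrame({"x": range(10)})

# Integer division maps rows 0-4 to bin 0 and rows 5-9 to bin 1.
bins = df.index // 5

# One aggregated row per bin: the mean of 0..4 and the mean of 5..9.
means = df.groupby(bins).mean()

print(means)
```

Changing the divisor changes the bin width; for instance, df.index // 2 would produce five bins of two rows each.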

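Returning a Series to propagate names

Group DataFrame columns, compute a set of metrics and return a named Series. The Series name is used as the name for the column index. This is especially useful in conjunction with reshaping operations such as stacking, in which the column index name will be used as the name of the inserted column: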

In [279]: df = pd.DataFrame(
 .....:    {
 .....:        "a": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
 .....:        "b": [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
 .....:        "c": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
 .....:        "d": [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
 .....:    }
 .....: )
 .....: 
In [280]: def compute_metrics(x):
 .....:    result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
 .....:    return pd.Series(result, name="metrics")
 .....: 
In [281]: result = df.groupby("a").apply(compute_metrics, include_groups=False)
In [282]: result
Out[282]: 
metrics  b_sum  c_mean
a 
0          2.0     0.5
1          2.0     0.5
2          2.0     0.5
In [283]: result.stack(future_stack=True)
Out[283]: 
a  metrics
0  b_sum      2.0
   c_mean     0.5
1  b_sum      2.0
   c_mean     0.5
2  b_sum      2.0
   c_mean     0.5
dtype: float64

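Splitting an object into groups

The abstract definition of grouping is to provide a mapping of labels to group names. To create a GroupBy object (more on what the GroupBy object is later), you may do the following: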

In [1]: speeds = pd.DataFrame(
 ...:    [
 ...:        ("bird", "Falconiformes", 389.0),
 ...:        ("bird", "Psittaciformes", 24.0),
 ...:        ("mammal", "Carnivora", 80.2),
 ...:        ("mammal", "Primates", np.nan),
 ...:        ("mammal", "Carnivora", 58),
 ...:    ],
 ...:    index=["falcon", "parrot", "lion", "monkey", "leopard"],
 ...:    columns=("class", "order", "max_speed"),
 ...: )
 ...: 
In [2]: speeds
Out[2]: 
          class           order  max_speed
falcon     bird   Falconiformes      389.0
parrot     bird  Psittaciformes       24.0
lion     mammal       Carnivora       80.2
monkey   mammal        Primates        NaN
leopard  mammal       Carnivora       58.0
In [3]: grouped = speeds.groupby("class")
In [4]: grouped = speeds.groupby(["class", "order"])

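The mapping can be specified many different ways:

A Python function, to be called on each of the index labels.
A list or NumPy array of the same length as the index.
A dict or Series, providing a label -> group name mapping.
For DataFrame objects, a string indicating either a column name or an index level name to be used to group.
A list of any of the above things.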
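The kinds of mapping listed above can be sketched side by side. The data and names below (s, df, team, pts) are made up for illustration; each call groups the same four values and sums them.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# A function called on each index label.
by_function = s.groupby(lambda label: "early" if label < "c" else "late").sum()

# A list with the same length as the index.
by_list = s.groupby([0, 0, 1, 1]).sum()

# A dict providing a label -> group name mapping.
by_dict = s.groupby({"a": "x", "b": "x", "c": "y", "d": "y"}).sum()

# For a DataFrame, a string naming a column ...
df = pd.DataFrame({"team": ["r", "r", "b", "b"], "pts": [1, 2, 3, 4]})
by_column = df.groupby("team")["pts"].sum()

# ... or a list combining several of the above.
by_two_keys = df.groupby(["team", [0, 1, 0, 1]])["pts"].sum()

print(by_function, by_list, by_dict, by_column, by_two_keys, sep="\n\n")
```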

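Collectively we refer to the grouping objects as the keys. For example, consider the following DataFrame:

Note

A string passed to groupby may refer to either a column or an index level. If a string matches both a column name and an index level name, a ValueError will be raised.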

In [5]: df = pd.DataFrame(
 ...:    {
 ...:        "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
 ...:        "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
 ...:        "C": np.random.randn(8),
 ...:        "D": np.random.randn(8),
 ...:    }
 ...: )
 ...: 
In [6]: df
Out[6]: 
     A      B         C         D
0  foo    one  0.469112 -0.861849
1  bar    one -0.282863 -2.104569
2  foo    two -1.509059 -0.494929
3  bar  three -1.135632  1.071804
4  foo    two  1.212112  0.721555
5  bar    two -0.173215 -0.706771
6  foo    one  0.119209 -1.039575
7  foo  three -1.044236  0.271860

On a DataFrame, we obtain a GroupBy object by calling groupby(). This method returns a pandas.api.typing.DataFrameGroupBy instance. We could naturally group by either the A or B columns, or both:

In [7]: grouped = df.groupby("A")
In [8]: grouped = df.groupby("B")
In [9]: grouped = df.groupby(["A", "B"])

Note

df.groupby('A') is just syntactic sugar for df.groupby(df['A']).

If we also have a MultiIndex on columns A and B, we can group by all the columns except the one we specify:

In [10]: df2 = df.set_index(["A", "B"])
In [11]: grouped = df2.groupby(level=df2.index.names.difference(["B"]))
In [12]: grouped.sum()
Out[12]: 
            C         D
A 
bar -1.591710 -1.739537
foo -0.752861 -1.402938

The above GroupBy will split the DataFrame on its index (rows). To split by columns, first do a transpose:

In [13]: def get_letter_type(letter):
 ....:    if letter.lower() in 'aeiou':
 ....:        return 'vowel'
 ....:    else:
 ....:        return 'consonant'
 ....: 
In [14]: grouped = df.T.groupby(get_letter_type)
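To see what the transposed grouping produces, here is a minimal sketch on a small all-numeric frame (the column names a, b, e and the values are made up). After transposing, the column labels become index labels, so the function is called on each of them; transposing the aggregated result back restores the original orientation.

```python
import pandas as pd

# Illustrative data: columns "a" and "e" are vowels, "b" is a consonant.
df = pd.DataFrame({"a": [1, 2], "b": [10, 20], "e": [100, 200]})


def get_letter_type(letter):
    """Classify a column label as 'vowel' or 'consonant'."""
    return "vowel" if letter.lower() in "aeiou" else "consonant"


# Group the transposed frame by label type, sum, and transpose back.
by_type = df.T.groupby(get_letter_type).sum().T

print(by_type)
```

Each row of by_type now holds the sum of the vowel-named columns and the sum of the consonant-named columns for that row of df.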

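pandas Index objects support duplicate values. If a non-unique index is used as the group key in a groupby operation, all values for the same index value will be considered to be in one group and thus the output of aggregation functions will only contain unique index values: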

In [15]: index = [1, 2, 3, 1, 2, 3]
In [16]: s = pd.Series([1, 2, 3, 10, 20, 30], index=index)
In [17]: s
Out[17]: 
1     1
2     2
3     3
1    10
2    20
3    30
dtype: int64
In [18]: grouped = s.groupby(level=0)
In [19]: grouped.first()
Out[19]: 
1    1
2    2
3    3
dtype: int64
In [20]: grouped.last()
Out[20]: 
1    10
2    20
3    30
dtype: int64
In [21]: grouped.sum()
Out[21]: 
1    11
2    22
3    33
dtype: int64

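Note that no splitting occurs until it's needed. Creating the GroupBy object only verifies that you've passed a valid mapping.

Note

Many kinds of complicated data manipulations can be expressed in terms of GroupBy operations (though it can't be guaranteed to be the most efficient implementation). You can get quite creative with the label mapping functions.

Pandas 2.2 Official Tutorials and Guides in Chinese (20.1) (4): https://developer.aliyun.com/article/1508819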
