eland.DataFrame.groupby

DataFrame.groupby(by: Optional[Union[str, List[str]]] = None, dropna: bool = True) → DataFrameGroupBy

Used to perform groupby operations.

Parameters

by

Column or list of columns to group by. Currently accepts a single column name or a list of column names.

dropna: default True

If True, and a group key contains NA values, those NA values are dropped together with the corresponding row/column.

Returns

eland.groupby.DataFrameGroupBy

See also

pandas.DataFrame.groupby

Examples

>>> ed_flights = ed.DataFrame('http://localhost:9200', 'flights', columns=["AvgTicketPrice", "Cancelled", "dayOfWeek", "timestamp", "DestCountry"])
>>> ed_flights.groupby(["DestCountry", "Cancelled"]).agg(["min", "max"], numeric_only=True) 
                      AvgTicketPrice              dayOfWeek
                                 min          max       min  max
DestCountry Cancelled
AE          False         110.799911  1126.148682       0.0  6.0
            True          132.443756   817.931030       0.0  6.0
AR          False         125.589394  1199.642822       0.0  6.0
            True          251.389603  1172.382568       0.0  6.0
AT          False         100.020531  1181.835815       0.0  6.0
...                              ...          ...       ...  ...
TR          True          307.915649   307.915649       0.0  0.0
US          False         100.145966  1199.729004       0.0  6.0
            True          102.153069  1192.429932       0.0  6.0
ZA          False         102.002663  1196.186157       0.0  6.0
            True          121.280296  1175.709961       0.0  6.0

[63 rows x 4 columns]
>>> ed_flights.groupby(["DestCountry", "Cancelled"]).mean(numeric_only=True) 
                       AvgTicketPrice  dayOfWeek
DestCountry Cancelled
AE          False          643.956793   2.717949
            True           388.828809   2.571429
AR          False          673.551677   2.746154
            True           682.197241   2.733333
AT          False          647.158290   2.819936
...                               ...        ...
TR          True           307.915649   0.000000
US          False          598.063146   2.752014
            True           579.799066   2.767068
ZA          False          636.998605   2.738589
            True           677.794078   2.928571

[63 rows x 2 columns]
>>> ed_flights.groupby(["DestCountry", "Cancelled"]).min(numeric_only=False) 
                       AvgTicketPrice  dayOfWeek           timestamp
DestCountry Cancelled
AE          False          110.799911          0 2018-01-01 19:31:30
            True           132.443756          0 2018-01-06 13:03:25
AR          False          125.589394          0 2018-01-01 01:30:47
            True           251.389603          0 2018-01-01 02:13:17
AT          False          100.020531          0 2018-01-01 05:24:19
...                               ...        ...                 ...
TR          True           307.915649          0 2018-01-08 04:35:10
US          False          100.145966          0 2018-01-01 00:06:27
            True           102.153069          0 2018-01-01 09:02:36
ZA          False          102.002663          0 2018-01-01 06:44:44
            True           121.280296          0 2018-01-04 00:37:01

[63 rows x 3 columns]
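
The effect of the by and dropna parameters can also be pictured without an Elasticsearch cluster. The following is a minimal pure-Python sketch of the semantics described under Parameters, not eland's implementation (eland delegates the grouping to Elasticsearch). The helper name group_rows and the three toy rows are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union

Row = Dict[str, object]


def group_rows(
    rows: List[Row],
    by: Union[str, List[str]],
    dropna: bool = True,
) -> Dict[Tuple[object, ...], List[Row]]:
    """Hypothetical helper: bucket rows by one or more key columns."""
    # Like the `by` parameter, accept either one column name or a list of names.
    keys = [by] if isinstance(by, str) else list(by)
    groups: Dict[Tuple[object, ...], List[Row]] = defaultdict(list)
    for row in rows:
        key = tuple(row.get(k) for k in keys)
        # With dropna=True, a row whose group key contains a missing value
        # (modelled here as None) is skipped entirely.
        if dropna and any(value is None for value in key):
            continue
        groups[key].append(row)
    return dict(groups)


# Toy rows, made up for this sketch (not taken from the flights index).
rows = [
    {"DestCountry": "AE", "Cancelled": False, "AvgTicketPrice": 100.0},
    {"DestCountry": "AE", "Cancelled": True, "AvgTicketPrice": 200.0},
    {"DestCountry": None, "Cancelled": False, "AvgTicketPrice": 300.0},
]

kept = group_rows(rows, ["DestCountry", "Cancelled"])                  # dropna=True
with_na = group_rows(rows, ["DestCountry", "Cancelled"], dropna=False)
single = group_rows(rows, "DestCountry")                               # one column
```

In this sketch, kept holds only the two ("AE", ...) groups, while with_na additionally contains a group keyed by (None, False). Passing a single string for by produces one-element keys such as ("AE",).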