eland.DataFrame.groupby#
- DataFrame.groupby(by: Optional[Union[str, List[str]]] = None, dropna: bool = True) → DataFrameGroupBy #
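Used to perform groupby operations.
Parameters#
- by
Column or list of columns to group by. Currently only a single column name or a list of column names is accepted.
- dropna: default True
If True, and if the group keys contain NA values, the NA values together with their row/column are dropped.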
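The effect of `by` and `dropna` can be sketched locally with pandas, whose `groupby` signature eland mirrors here. This is an illustrative analogy rather than eland itself: the toy data below is hypothetical, it runs in memory instead of against Elasticsearch, and it only assumes pandas 1.1 or later (where `dropna` was introduced).

```python
import pandas as pd

# Hypothetical toy data; column names are borrowed from the flights example.
df = pd.DataFrame(
    {
        "DestCountry": ["AE", "AE", None, "US"],
        "AvgTicketPrice": [100.0, 300.0, 500.0, 700.0],
    }
)

# `by` accepts a single column name or a list of column names.
by_single = df.groupby("DestCountry").mean(numeric_only=True)
by_list = df.groupby(["DestCountry"]).mean(numeric_only=True)

# dropna=True (the default) discards the row whose group key is missing.
dropped = df.groupby("DestCountry", dropna=True).mean(numeric_only=True)

# dropna=False keeps the missing key as a group of its own.
kept = df.groupby("DestCountry", dropna=False).mean(numeric_only=True)

print(dropped)
print(kept)
```

With the default, the row without a `DestCountry` does not contribute to any group; with `dropna=False` it forms an extra group keyed by NaN.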
Returns#
eland.groupby.DataFrameGroupBy
See also#
Examples#
>>> ed_flights = ed.DataFrame('http://localhost:9200', 'flights', columns=["AvgTicketPrice", "Cancelled", "dayOfWeek", "timestamp", "DestCountry"])
>>> ed_flights.groupby(["DestCountry", "Cancelled"]).agg(["min", "max"], numeric_only=True)
                      AvgTicketPrice              dayOfWeek
                                 min          max       min  max
DestCountry Cancelled
AE          False         110.799911  1126.148682       0.0  6.0
            True          132.443756   817.931030       0.0  6.0
AR          False         125.589394  1199.642822       0.0  6.0
            True          251.389603  1172.382568       0.0  6.0
AT          False         100.020531  1181.835815       0.0  6.0
...                              ...          ...       ...  ...
TR          True          307.915649   307.915649       0.0  0.0
US          False         100.145966  1199.729004       0.0  6.0
            True          102.153069  1192.429932       0.0  6.0
ZA          False         102.002663  1196.186157       0.0  6.0
            True          121.280296  1175.709961       0.0  6.0

[63 rows x 4 columns]
>>> ed_flights.groupby(["DestCountry", "Cancelled"]).mean(numeric_only=True)
                       AvgTicketPrice  dayOfWeek
DestCountry Cancelled
AE          False          643.956793   2.717949
            True           388.828809   2.571429
AR          False          673.551677   2.746154
            True           682.197241   2.733333
AT          False          647.158290   2.819936
...                               ...        ...
TR          True           307.915649   0.000000
US          False          598.063146   2.752014
            True           579.799066   2.767068
ZA          False          636.998605   2.738589
            True           677.794078   2.928571

[63 rows x 2 columns]
>>> ed_flights.groupby(["DestCountry", "Cancelled"]).min(numeric_only=False)
                       AvgTicketPrice  dayOfWeek           timestamp
DestCountry Cancelled
AE          False          110.799911          0 2018-01-01 19:31:30
            True           132.443756          0 2018-01-06 13:03:25
AR          False          125.589394          0 2018-01-01 01:30:47
            True           251.389603          0 2018-01-01 02:13:17
AT          False          100.020531          0 2018-01-01 05:24:19
...                               ...        ...                 ...
TR          True           307.915649          0 2018-01-08 04:35:10
US          False          100.145966          0 2018-01-01 00:06:27
            True           102.153069          0 2018-01-01 09:02:36
ZA          False          102.002663          0 2018-01-01 06:44:44
            True           121.280296          0 2018-01-04 00:37:01

[63 rows x 3 columns]
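The examples above differ in whether the `timestamp` column appears in the result, which depends on `numeric_only`. A minimal local sketch of the same distinction, using pandas as a stand-in for eland and hypothetical toy data (not the flights index), might look like this:

```python
import pandas as pd

# Hypothetical toy data; values are made up for illustration.
df = pd.DataFrame(
    {
        "DestCountry": ["AE", "AE", "US"],
        "AvgTicketPrice": [100.0, 300.0, 700.0],
        "timestamp": pd.to_datetime(
            ["2018-01-02 10:00:00", "2018-01-01 19:31:30", "2018-01-01 00:06:27"]
        ),
    }
)

grouped = df.groupby("DestCountry")

# numeric_only=True restricts the result to numeric columns.
numeric_min = grouped.min(numeric_only=True)

# numeric_only=False also aggregates the datetime column.
full_min = grouped.min(numeric_only=False)

print(numeric_min)
print(full_min)
```

In this sketch `numeric_min` contains only `AvgTicketPrice`, while `full_min` additionally reports the earliest `timestamp` per group, matching the shape of the `min(numeric_only=False)` output shown above.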