eland.eland_to_pandas

eland.eland_to_pandas(ed_df: DataFrame, show_progress: bool = False) -> DataFrame

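Convert an eland.DataFrame to a pandas.DataFrame.

Note: this loads the entire Elasticsearch index into an in-core pandas.DataFrame structure. For large indices, this can place significant load on the Elasticsearch cluster and require a large amount of memory.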
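To get a feel for the memory cost before converting, a rough lower bound can be computed from the row and column counts. The sketch below is not part of eland: `estimate_min_bytes` is a hypothetical helper, and it assumes every cell occupies 8 bytes (as float64 and int64 values do); string or object columns will take more, so treat the result as a floor.

```python
def estimate_min_bytes(n_rows: int, n_cols: int, bytes_per_cell: int = 8) -> int:
    """Hypothetical helper: lower bound on the in-memory size of the result.

    Assumes fixed-width cells of `bytes_per_cell` bytes (8 for float64/int64).
    Object and string columns are larger, so the real footprint can be higher.
    """
    if n_rows < 0 or n_cols < 0:
        raise ValueError("row and column counts must be non-negative")
    return n_rows * n_cols * bytes_per_cell


# The flights example index has 13059 rows and 28 columns.
min_bytes = estimate_min_bytes(13059, 28)
print(f"at least {min_bytes / 2**20:.2f} MiB")
```

If the estimate is uncomfortably large, consider narrowing the eland.DataFrame (fewer columns, fewer rows) before converting it.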

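Parameters

ed_df: eland.DataFrame

The source eland.DataFrame referencing the Elasticsearch index.

show_progress: bool

Whether to print the progress of the conversion to stdout. Defaults to False.

Returns

pandas.DataFrame

A pandas.DataFrame containing all rows and columns of the eland.DataFrame.

Examples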

>>> import eland as ed
>>> ed_df = ed.DataFrame('https://127.0.0.1:9200', 'flights').head()
>>> type(ed_df)
<class 'eland.dataframe.DataFrame'>
>>> ed_df
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00

[5 rows x 28 columns]

Convert the eland.DataFrame to a pandas.DataFrame (note: this loads the entire Elasticsearch index into core memory)

>>> pd_df = ed.eland_to_pandas(ed_df)
>>> type(pd_df)
<class 'pandas.core.frame.DataFrame'>
>>> pd_df
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00

[5 rows x 28 columns]
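Because the conversion fetches everything the eland.DataFrame refers to, it can help to select only the needed columns and rows first. The following is a minimal sketch, not an eland function: `to_pandas_subset` and its `convert` parameter are hypothetical names introduced here, and the sketch relies only on column selection (`ed_df[columns]`) and `head()` being available on the eland.DataFrame.

```python
def to_pandas_subset(ed_df, columns, max_rows, convert=None):
    """Hypothetical helper: convert only `columns` and the first `max_rows` rows."""
    if convert is None:
        # Imported lazily so the helper can be exercised without eland installed.
        import eland as ed
        convert = ed.eland_to_pandas
    # Narrow the frame before conversion so that only the reduced
    # selection is handed to the converter.
    subset = ed_df[columns].head(max_rows)
    return convert(subset)
```

With the flights index from the example, a call might look like `to_pandas_subset(ed_df, ['AvgTicketPrice', 'Cancelled'], 1000)`. The `convert` hook exists only so the sketch can be tried with a stand-in converter when no cluster is available.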

Convert an eland.DataFrame to a pandas.DataFrame and show progress every 10000 rows

>>> pd_df = ed.eland_to_pandas(ed.DataFrame('https://127.0.0.1:9200', 'flights'), show_progress=True) 
2020-01-29 12:43:36.572395: read 10000 rows
2020-01-29 12:43:37.309031: read 13059 rows

See also

eland.pandas_to_eland: Create an eland.DataFrame from a pandas.DataFrame