eland.eland_to_pandas
- eland.eland_to_pandas(ed_df: DataFrame, show_progress: bool = False) → DataFrame
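Convert an eland.DataFrame to a pandas.DataFrame.
Note: this loads the entire Elasticsearch index into in-core pandas.DataFrame structures. For large indices, this can put significant load on the Elasticsearch cluster and require a large amount of memory.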
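To get a feel for how that memory requirement scales, a rough estimate is rows × columns × bytes per value. The sketch below assumes 8 bytes per value (the size of float64, int64 and datetime64[ns]). Bool columns use less, object (string) columns usually use much more, and pandas adds index overhead, so treat the result as an order-of-magnitude figure only. `estimate_bytes` is a hypothetical helper, not part of eland or pandas; the 13059 rows and 28 columns come from the flights examples on this page.

```python
def estimate_bytes(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> int:
    """Hypothetical helper: rough in-memory size of a fully loaded DataFrame.

    Assumes every cell takes `bytes_per_value` bytes (8 = float64/int64/
    datetime64[ns]); real usage differs for bool and object columns.
    """
    return n_rows * n_cols * bytes_per_value


# flights index from the examples: 13059 rows x 28 columns
print(estimate_bytes(13059, 28))        # 2925216 (~2.9 MB)
# a hypothetical 100-million-document index with the same 28 columns
print(estimate_bytes(100_000_000, 28))  # 22400000000 (~22.4 GB)
```

Because the estimate grows linearly with the row count, an index that is harmless to convert at a few thousand documents can exhaust client memory at hundreds of millions.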
Parameters
- ed_df: eland.DataFrame
The source eland.DataFrame referencing the Elasticsearch index
- show_progress: bool
If True, print progress to stdout. Defaults to False.
Returns
- pandas.DataFrame
A pandas.DataFrame containing all rows and columns of the eland.DataFrame
Examples
>>> ed_df = ed.DataFrame('http://localhost:9200', 'flights').head()
>>> type(ed_df)
<class 'eland.dataframe.DataFrame'>
>>> ed_df
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00
[5 rows x 28 columns]
Convert the eland.DataFrame to a pandas.DataFrame (note: this loads the entire Elasticsearch index into core memory)
>>> pd_df = ed.eland_to_pandas(ed_df)
>>> type(pd_df)
<class 'pandas.core.frame.DataFrame'>
>>> pd_df
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00
[5 rows x 28 columns]
Convert an eland.DataFrame to a pandas.DataFrame and show progress every 10000 rows
>>> pd_df = ed.eland_to_pandas(ed.DataFrame('http://localhost:9200', 'flights'), show_progress=True)
2020-01-29 12:43:36.572395: read 10000 rows
2020-01-29 12:43:37.309031: read 13059 rows
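The progress lines in this example follow the pattern `<timestamp>: read <n> rows`, printed after every 10000 rows and once more with the final total. The following is a minimal sketch of that reporting pattern over an arbitrary iterable of rows, assuming one report per 10000 rows plus a closing report. `collect_rows` and its `every` and `report` parameters are hypothetical; this is not eland's internal implementation, only an illustration of the output format shown above.

```python
from datetime import datetime
from typing import Any, Callable, Iterable, List


def collect_rows(
    rows: Iterable[Any],
    show_progress: bool = False,
    every: int = 10000,
    report: Callable[[str], Any] = print,
) -> List[Any]:
    """Hypothetical helper: materialise rows, optionally reporting progress."""
    collected: List[Any] = []
    for i, row in enumerate(rows, start=1):
        collected.append(row)
        # Report each time another `every` rows have been read.
        if show_progress and i % every == 0:
            report(f"{datetime.now()}: read {i} rows")
    # Closing report with the exact total, as in the example output.
    if show_progress:
        report(f"{datetime.now()}: read {len(collected)} rows")
    return collected


# 13059 dummy rows, matching the row count of the flights example
rows = collect_rows(({"n": k} for k in range(13059)), show_progress=True)
```

Passing `report` as a parameter keeps the default behaviour of printing to stdout while letting callers redirect messages, for example to a logger or a list in tests.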
See also
eland.pandas_to_eland: Create an eland.DataFrame from a pandas.DataFrame