eland.csv_to_eland

eland.csv_to_eland(filepath_or_buffer, es_client: Union[str, List[str], Tuple[str, ...], Elasticsearch], es_dest_index: str, es_if_exists: str = 'fail', es_refresh: bool = False, es_dropna: bool = False, es_type_overrides: Optional[Mapping[str, str]] = None, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, chunksize=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, warn_bad_lines: bool = True, error_bad_lines: bool = True, on_bad_lines: str = 'error', delim_whitespace=False, low_memory: bool = True, memory_map=False, float_precision=None) → DataFrame

Read a comma-separated values (csv) file into an eland.DataFrame (i.e. an Elasticsearch index).

Modifies an Elasticsearch index.

Note: pandas iteration options are not supported.

Parameters

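es_client: Elasticsearch client argument(s)
  • elasticsearch-py connection parameters (a host string or a list/tuple of host strings, e.g. 'http://localhost:9200'), or

  • an elasticsearch-py Elasticsearch instance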

es_dest_index: str

Name of the Elasticsearch index to write the data to.

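es_if_exists: {‘fail’, ‘replace’, ‘append’}, default ‘fail’

How to behave if the index already exists.

  • fail: raise a ValueError.

  • replace: delete the index before inserting the new values.

  • append: insert the new values into the existing index. The index is created if it does not exist.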
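The three es_if_exists modes can be sketched without a cluster. In this minimal sketch a plain dict stands in for the Elasticsearch cluster (an assumption for illustration only), and apply_if_exists is a hypothetical helper, not part of eland:

```python
from typing import Dict, List


def apply_if_exists(indices: Dict[str, List[dict]], name: str, mode: str = "fail") -> List[dict]:
    """Return the document list to write into, following the es_if_exists rules.

    `indices` is a plain dict standing in for the cluster (hypothetical;
    no Elasticsearch calls are made here).
    """
    if mode not in ("fail", "replace", "append"):
        raise ValueError(f"invalid es_if_exists value: {mode!r}")
    if name in indices:
        if mode == "fail":
            raise ValueError(f"index {name!r} already exists")
        if mode == "replace":
            indices[name] = []  # delete the old contents before inserting new values
        # mode == "append": keep the existing documents as they are
    else:
        indices[name] = []  # a missing index is created in every mode
    return indices[name]


store = {"churn": [{"state": "KS"}]}
print(apply_if_exists(store, "churn", "append"))   # existing documents are kept
print(apply_if_exists(store, "churn", "replace"))  # existing documents are dropped
```

Note that ‘fail’ only raises when the index already exists; a brand-new index is created in all three modes.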

es_dropna: bool, default False
  • True: remove missing values (see pandas.Series.dropna).

  • False: include missing values; this may cause the bulk request to fail.
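The difference between the two es_dropna settings can be sketched per row in plain Python. This assumes, as the pointer to pandas.Series.dropna suggests, that missing values are dropped field by field from each row's document rather than whole rows being discarded. is_missing and row_to_source are hypothetical helpers:

```python
import math
from typing import Any, Dict


def is_missing(value: Any) -> bool:
    # None and float NaN count as missing (a rough stand-in for pandas.isna on scalars)
    return value is None or (isinstance(value, float) and math.isnan(value))


def row_to_source(row: Dict[str, Any], es_dropna: bool) -> Dict[str, Any]:
    """Build the document body for one csv row."""
    if es_dropna:
        # Only the missing fields are removed; the row is still indexed as a document
        return {k: v for k, v in row.items() if not is_missing(v)}
    # Missing values are kept; NaN is not valid JSON, so the bulk request may fail
    return dict(row)


row = {"state": "KS", "total day calls": float("nan"), "voice mail plan": None, "churn": 0}
print(row_to_source(row, es_dropna=True))  # {'state': 'KS', 'churn': 0}
```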

es_type_overrides: dict, default None

Dictionary of column name: es_type pairs used to override the default Elasticsearch datatype mappings.
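The following minimal sketch shows how per-column overrides might be merged into an Elasticsearch mapping body. The default types passed in are assumptions (eland infers its own from the pandas dtypes), and build_mapping is a hypothetical helper. Only the {"properties": {field: {"type": ...}}} shape is standard Elasticsearch mapping syntax:

```python
from typing import Dict, Mapping, Optional


def build_mapping(default_types: Mapping[str, str],
                  es_type_overrides: Optional[Mapping[str, str]] = None) -> Dict[str, dict]:
    """Return a mapping body whose field types honour es_type_overrides."""
    overrides = es_type_overrides or {}
    # In this sketch only columns that exist in the data can be overridden
    types = {col: overrides.get(col, es_type) for col, es_type in default_types.items()}
    return {"properties": {col: {"type": es_type} for col, es_type in types.items()}}


# Assumed inferred types; "area code" is numeric but is mapped as keyword here
defaults = {"state": "keyword", "area code": "long"}
print(build_mapping(defaults, {"area code": "keyword"}))
```

With eland itself, the same override would be passed as es_type_overrides={"area code": "keyword"} to csv_to_eland.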

chunksize

Number of csv rows to read before bulk indexing them into Elasticsearch.
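Reading a csv in fixed-size chunks can be sketched with the standard library alone. This is not eland's code, which passes chunksize through to pandas.read_csv. read_in_chunks is a hypothetical helper that only shows the batching pattern: each yielded list stands for one batch that would be handed to bulk indexing. The sample csv is made up.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def read_in_chunks(f: TextIO, chunksize: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `chunksize` csv rows, each row a dict keyed by header."""
    if chunksize <= 0:
        raise ValueError("chunksize must be a positive integer")
    reader = csv.DictReader(f)
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            return  # input exhausted
        yield chunk  # one batch that would be bulk indexed


# Usage with a small made-up csv whose first, unnamed column is the row index
sample = io.StringIO(",state\n0,KS\n1,OH\n2,NJ\n")
for batch in read_in_chunks(sample, chunksize=2):
    print(len(batch), [row["state"] for row in batch])
```

The last batch may be shorter than chunksize when the row count is not a multiple of it.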

Other Parameters

Parameters derived from pandas.read_csv.

See Also

pandas.read_csv

Notes

The iterator option of pandas.read_csv is not supported.

Examples

Check whether the ‘churn’ index exists in Elasticsearch

>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch("http://localhost:9200")
>>> es.indices.exists(index="churn")
False

Read ‘churn.csv’ and use the first column as the document _id (and as the eland.DataFrame index)

# churn.csv
,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,total eve minutes,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.7,1,0
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.7,1,0
...
>>> import eland as ed
>>> ed.csv_to_eland(
...      "churn.csv",
...      es_client='http://localhost:9200',
...      es_dest_index='churn',
...      es_refresh=True,
...      index_col=0
... )
          account length  area code  churn  customer service calls  ... total night calls  total night charge total night minutes voice mail plan
0                128        415      0                       1  ...                91               11.01               244.7             yes
1                107        415      0                       1  ...               103               11.45               254.4             yes
2                137        415      0                       0  ...               104                7.32               162.6              no
3                 84        408      0                       2  ...                89                8.86               196.9              no
4                 75        415      0                       3  ...               121                8.41               186.9              no
...              ...        ...    ...                     ...  ...               ...                 ...                 ...             ...
3328             192        415      0                       2  ...                83               12.56               279.1             yes
3329              68        415      0                       3  ...               123                8.61               191.3              no
3330              28        510      0                       2  ...                91                8.64               191.9              no
3331             184        510      0                       2  ...               137                6.26               139.2              no
3332              74        415      0                       0  ...                77               10.86               241.4             yes

[3333 rows x 21 columns]
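With index_col=0, the value in the csv's first column becomes each document's _id, stored as a string. The search result that follows shows '_id': '0' for the first row. The sketch below turns rows into action dicts in the form accepted by elasticsearch.helpers.bulk. to_bulk_actions is hypothetical, and whether eland builds exactly these dicts is an assumption.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def to_bulk_actions(rows: Iterable[Tuple[Any, Dict[str, Any]]],
                    index: str) -> Iterator[Dict[str, Any]]:
    """Turn (index value, row) pairs into bulk action dicts."""
    for idx, source in rows:
        # The DataFrame index value becomes the document _id, converted to a string
        yield {"_index": index, "_id": str(idx), "_source": source}


rows = [(0, {"state": "KS", "account length": 128}), (1, {"state": "OH", "account length": 107})]
for action in to_bulk_actions(rows, "churn"):
    print(action["_id"], action["_source"]["state"])
```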

Verify that the data now exists in the ‘churn’ index

>>> es.search(index="churn", size=1) 
{'took': 1, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 3333, 'relation': 'eq'}, 'max_score': 1.0, 'hits': [{'_index': 'churn', '_id': '0', '_score': 1.0, '_source': {'state': 'KS', 'account length': 128, 'area code': 415, 'phone number': '382-4657', 'international plan': 'no', 'voice mail plan': 'yes', 'number vmail messages': 25, 'total day minutes': 265.1, 'total day calls': 110, 'total day charge': 45.07, 'total eve minutes': 197.4, 'total eve calls': 99, 'total eve charge': 16.78, 'total night minutes': 244.7, 'total night calls': 91, 'total night charge': 11.01, 'total intl minutes': 10.0, 'total intl calls': 3, 'total intl charge': 2.7, 'customer service calls': 1, 'churn': 0}}]}}
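To check the result programmatically instead of reading the raw response, you can extract the hit count and first _id from the search response. summarize_hits is a hypothetical helper. It relies only on the hits.total.value and hits.hits[]._id fields visible in the output above, and works on any mapping-like response.

```python
from typing import Any, Dict, Optional, Tuple


def summarize_hits(resp: Dict[str, Any]) -> Tuple[int, Optional[str]]:
    """Return (total hit count, _id of the first hit, or None if there are no hits)."""
    hits = resp["hits"]
    total = hits["total"]["value"]  # exact when hits.total.relation is "eq"
    first_id = hits["hits"][0]["_id"] if hits["hits"] else None
    return total, first_id


# Trimmed copy of the response shown above
resp = {"hits": {"total": {"value": 3333, "relation": "eq"},
                 "hits": [{"_index": "churn", "_id": "0", "_source": {"state": "KS"}}]}}
print(summarize_hits(resp))  # (3333, '0')
```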

TODO: eland.DataFrame currently may not preserve the order of the rows in the csv.