eland.csv_to_eland
- eland.csv_to_eland(filepath_or_buffer, es_client: Union[str, List[str], Tuple[str, ...], Elasticsearch], es_dest_index: str, es_if_exists: str = 'fail', es_refresh: bool = False, es_dropna: bool = False, es_type_overrides: Optional[Mapping[str, str]] = None, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, chunksize=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, warn_bad_lines: bool = True, error_bad_lines: bool = True, on_bad_lines: str = 'error', delim_whitespace=False, low_memory: bool = True, memory_map=False, float_precision=None) -> DataFrame
Read a comma-separated values (csv) file into an eland.DataFrame (i.e. an Elasticsearch index).
Modifies an Elasticsearch index.
Note: pandas iteration options are not supported.
Parameters
- es_client: Elasticsearch client argument(s)
  - elasticsearch-py parameters or
  - elasticsearch-py instance
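- es_dest_index: str
  Name of the Elasticsearch index to append to
- es_if_exists: {'fail', 'replace', 'append'}, default 'fail'
  How to behave if the index already exists.
  - fail: Raise a ValueError.
  - replace: Delete the index before inserting new values.
  - append: Insert new values into the existing index. Create the index if it does not exist.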
- es_dropna: bool, default False
  - True: Remove missing values (see pandas.Series.dropna)
  - False: Include missing values; this may cause the bulk operation to fail
- es_type_overrides: dict, default None
  Dict of column: es_type pairs that override the default Elasticsearch data type mappings
- chunksize
  Number of csv rows to read before bulk indexing into Elasticsearch
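The es_if_exists and es_dropna rules listed above can be modelled with a small in-memory sketch. This is not eland's implementation: the `Store` alias and the `prepare_index` and `row_to_document` helpers are hypothetical names introduced only to illustrate the documented behaviour, with a plain dict standing in for the cluster.

```python
import math
from typing import Dict

# Toy stand-in for a cluster: index name -> {_id -> document}. An assumption for illustration.
Store = Dict[str, Dict[str, dict]]


def prepare_index(store: Store, index: str, es_if_exists: str = "fail") -> None:
    """Apply the documented es_if_exists rule to the toy store."""
    if es_if_exists not in ("fail", "replace", "append"):
        raise ValueError(f"unknown es_if_exists value: {es_if_exists!r}")
    if index in store:
        if es_if_exists == "fail":
            raise ValueError(f"index {index!r} already exists")
        if es_if_exists == "replace":
            del store[index]  # delete the index before inserting new values
    # In every remaining case the index is created if it is missing.
    store.setdefault(index, {})


def row_to_document(row: Dict[str, object], es_dropna: bool = False) -> Dict[str, object]:
    """With es_dropna=True, leave out fields whose value is missing (None or NaN)."""
    def missing(value: object) -> bool:
        return value is None or (isinstance(value, float) and math.isnan(value))

    if not es_dropna:
        return dict(row)
    return {key: value for key, value in row.items() if not missing(value)}
```

In this model, calling `prepare_index` with 'fail' on an existing index raises ValueError, 'replace' empties it, and 'append' leaves the stored documents untouched.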
Other Parameters
Parameters derived from pandas.read_csv.
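As a hedged sketch of how the `es_*` options and the pandas.read_csv-derived options can be combined in one call, the snippet below collects them in a single dict. The helper names `churn_load_kwargs` and `load_churn` are hypothetical, and the file name, URL, and chosen type overrides are assumptions. The column names are taken from the churn.csv example in this page, and every keyword used appears in the signature above.

```python
from typing import Any, Dict


def churn_load_kwargs() -> Dict[str, Any]:
    """Keyword arguments for a hypothetical load of churn.csv (all values are assumptions)."""
    return {
        # Elasticsearch-side options, documented under Parameters.
        "es_client": "http://localhost:9200",
        "es_dest_index": "churn",
        "es_if_exists": "replace",
        "es_refresh": True,
        "es_type_overrides": {"state": "keyword", "phone number": "keyword"},
        # Options derived from pandas.read_csv.
        "sep": ",",
        "index_col": 0,
        "skipinitialspace": True,
    }


def load_churn(path: str = "churn.csv"):
    """Index the csv file; needs eland installed and a reachable cluster."""
    import eland as ed  # imported here so the rest of the sketch runs without eland

    return ed.csv_to_eland(path, **churn_load_kwargs())
```

Keeping the arguments in a dict makes it easy to see which ones target Elasticsearch (the `es_` prefix) and which ones shape how the csv file is parsed.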
See Also
pandas.read_csv
Notes
Iterator options are not supported.
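Although iteration is not exposed to the caller, chunksize still determines how many csv rows are read before each bulk indexing step. The standard-library sketch below models that batching only. It is an illustration under stated assumptions, not eland's code: `iter_bulk_batches` is a hypothetical helper and the sample data is made up.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List


def iter_bulk_batches(csv_text: str, chunksize: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `chunksize` rows, one list per simulated bulk request."""
    reader = csv.DictReader(io.StringIO(csv_text))
    while True:
        batch = list(islice(reader, chunksize))
        if not batch:
            return  # input exhausted
        yield batch


# Made-up sample: five data rows read two at a time.
sample = "id,state\n0,KS\n1,OH\n2,NJ\n3,OH\n4,OK\n"
batch_sizes = [len(batch) for batch in iter_bulk_batches(sample, chunksize=2)]
print(batch_sizes)  # [2, 2, 1]
```

A larger chunksize means fewer, larger batches per file, which is the trade-off the parameter controls.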
Examples
Check whether the 'churn' index exists in Elasticsearch
>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch()
>>> es.indices.exists(index="churn")
False
Read 'churn.csv' and use the first column as _id (and as the eland.DataFrame index)
# churn.csv
,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,total eve minutes,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.7,1,0
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.7,1,0
...
>>> ed.csv_to_eland(
...     "churn.csv",
...     es_client='http://localhost:9200',
...     es_dest_index='churn',
...     es_refresh=True,
...     index_col=0
... )
      account length  area code  churn  customer service calls  ...  total night calls  total night charge  total night minutes  voice mail plan
0                128        415      0                       1  ...                 91               11.01                244.7              yes
1                107        415      0                       1  ...                103               11.45                254.4              yes
2                137        415      0                       0  ...                104                7.32                162.6               no
3                 84        408      0                       2  ...                 89                8.86                196.9               no
4                 75        415      0                       3  ...                121                8.41                186.9               no
...              ...        ...    ...                     ...  ...                ...                 ...                  ...              ...
3328             192        415      0                       2  ...                 83               12.56                279.1              yes
3329              68        415      0                       3  ...                123                8.61                191.3               no
3330              28        510      0                       2  ...                 91                8.64                191.9               no
3331             184        510      0                       2  ...                137                6.26                139.2               no
3332              74        415      0                       0  ...                 77               10.86                241.4              yes

[3333 rows x 21 columns]
Verify that the data is now in the 'churn' index
>>> es.search(index="churn", size=1)
{'took': 1, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 3333, 'relation': 'eq'}, 'max_score': 1.0, 'hits': [{'_index': 'churn', '_id': '0', '_score': 1.0, '_source': {'state': 'KS', 'account length': 128, 'area code': 415, 'phone number': '382-4657', 'international plan': 'no', 'voice mail plan': 'yes', 'number vmail messages': 25, 'total day minutes': 265.1, 'total day calls': 110, 'total day charge': 45.07, 'total eve minutes': 197.4, 'total eve calls': 99, 'total eve charge': 16.78, 'total night minutes': 244.7, 'total night calls': 91, 'total night charge': 11.01, 'total intl minutes': 10.0, 'total intl calls': 3, 'total intl charge': 2.7, 'customer service calls': 1, 'churn': 0}}]}}
TODO - currently the eland.DataFrame may not retain the order of the data in the csv.