Integrating Hadoop and Elasticsearch - Best of Two Worlds
12 Sep 2018
For real-time analytics, organizations often use Hadoop and Elasticsearch together. Loading Hadoop data into Elasticsearch is a common integration pattern for exposing that data through an API, and the ES-Hadoop connector provided by Elasticsearch makes it easy to get data flowing with very little work.
Download ES-Hadoop Connector
1. Add JAR files to your Hive project
ADD JAR hdfs:/user/elasticsearch-jars/elasticsearch-hadoop-hive-6.2.1.jar;
ADD JAR hdfs:/user/elasticsearch-jars/commons-httpclient-3.0.1.jar;
2. Ingest data from Hadoop into Elasticsearch
CREATE EXTERNAL TABLE IF NOT EXISTS test_db.es_items(
    sku INT COMMENT 'SKU for an Item',
    location_id INT COMMENT 'Store Location Id',
    start_date STRING COMMENT 'Start Date for Item Location',
    end_date STRING COMMENT 'End Date for Item Location',
    channel STRING COMMENT 'Channel for Item Location',
    source_system STRING COMMENT 'Data Source for the Item Location'
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = '${ES_INDEX}/${hiveconf:ES_TYPE}',
    'es.nodes' = '${ES_NODE_LIST}',
    'es.port' = '${ES_PORT}');
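The ${...} placeholders in the table definition need values before the statement runs. The post does not say how they are supplied, so the following is only a sketch that assumes they are Hive substitution variables set in the same session (they could equally be filled in by an external templating or scheduling tool). All values shown are hypothetical.

```sql
-- Hypothetical values; replace with your own cluster details.
-- Variables in the hivevar namespace can be referenced as ${NAME}.
SET hivevar:ES_INDEX=items;
SET hivevar:ES_NODE_LIST=es-node1.example.com,es-node2.example.com;
SET hivevar:ES_PORT=9200;

-- A plain SET writes to the hiveconf namespace, matching ${hiveconf:ES_TYPE}.
SET ES_TYPE=item;
```

These statements would have to run before the CREATE EXTERNAL TABLE statement. The same variables can also be passed on the command line with --hivevar and --hiveconf.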
Now you can add data to the es_items table, and the rows you insert will appear in your Elasticsearch cluster.
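As a minimal sketch of that last step, rows can be copied into es_items from an ordinary Hive table. The source table test_db.items is hypothetical and is assumed to have columns matching es_items; the INSERT OVERWRITE form follows the pattern commonly shown for the connector's Hive integration.

```sql
-- Hypothetical source table: test_db.items is assumed to exist in Hive
-- with columns that match the es_items definition.
INSERT OVERWRITE TABLE test_db.es_items
SELECT sku,
       location_id,
       start_date,
       end_date,
       channel,
       source_system
FROM test_db.items;

-- Reading through the same external table queries the Elasticsearch index.
SELECT sku, location_id, channel
FROM test_db.es_items
LIMIT 10;
```

Each row selected from the source table becomes a document in the index named by es.resource, so the second query is a quick way to check that the data arrived.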