Creating a CSV File With Headers from Your Apache Hive Data

If you’re using Apache Hadoop, you know how powerful it is for storing and processing massive amounts of data, but accessing all those petabytes can be difficult. Apache Hive makes it easier to query, extract, and analyze your Hadoop data using SQL-like commands. Exporting a table to a CSV file is one way to pull the information you need into a usable format; however, a plain export from Hive often leaves the file without a header row, because Hive does not write the column names by default. This article shows you how to keep the column headers when you export to CSV, using HiveServer2 and the Beeline command line interface.

Steps

1
Update your software and server. If you haven’t updated in a while, you may be running the deprecated HiveServer1. HiveServer2 has its own CLI (command line interface) called Beeline, which replaces the original Hive CLI and gives you more flexibility when accessing your data. You will also need:
- Java 1.7 or newer
- Hadoop 2.x
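As a quick sanity check before continuing, the sketch below tests whether `java` and `hadoop` are on your `PATH` and prints the first line of each version banner so you can compare it with the requirements above. `require_cmd` and `check_prereqs` are hypothetical helper names, not part of Hive or Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check that the tools required above are installed.

# Succeeds if the named command is on the PATH; otherwise warns on stderr.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing: $1" >&2
  return 1
}

# Checks for java and hadoop, then prints the first line of each version
# banner. Returns non-zero if either command is missing.
check_prereqs() {
  local status=0
  for cmd in java hadoop; do
    require_cmd "$cmd" || status=1
  done
  if [ "$status" -eq 0 ]; then
    java -version 2>&1 | head -n 1   # java prints its version on stderr
    hadoop version | head -n 1
  fi
  return "$status"
}
```

Run `check_prereqs` in your terminal; if it reports a missing command or an older version, update that component first.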
2
Run HiveServer2. In your computer’s terminal, enter $HIVE_HOME/bin/hiveserver2.
- $HIVE_HOME is the directory where Hive is installed.
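Started this way, HiveServer2 keeps running in your terminal, so one option is to launch it in the background and wait until its port accepts connections before opening Beeline. This is a sketch assuming bash (it uses bash's `/dev/tcp` pseudo-device) and the default port 10000; `start_hiveserver2`, `wait_for_port`, and the log file name `hiveserver2.log` are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for starting HiveServer2 and waiting until it listens.

# Returns 0 once HOST:PORT accepts a TCP connection, or 1 after TIMEOUT
# seconds. Relies on bash's /dev/tcp redirection; in other shells the
# connection attempt always fails, so the function simply times out.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if (exec 3<>"/dev/tcp/$host/$port") >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Starts HiveServer2 in the background, logging to hiveserver2.log in the
# current directory, then waits up to two minutes for its port to open.
start_hiveserver2() {
  : "${HIVE_HOME:?set HIVE_HOME to your Hive installation directory}"
  nohup "$HIVE_HOME/bin/hiveserver2" > hiveserver2.log 2>&1 &
  wait_for_port localhost "${PORT:-10000}" 120
}
```

HiveServer2 can take a while to open its port after the process starts, which is why the sketch polls instead of connecting immediately.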
3
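Run Beeline. In your terminal, enter $HIVE_HOME/bin/beeline -u jdbc:hive2://HOST:PORT -n USERNAME -p PASSWORD. Alternatively, start $HIVE_HOME/bin/beeline on its own and enter !connect jdbc:hive2://HOST:PORT USERNAME PASSWORD at the beeline> prompt.
- HOST is the hostname or IP address of the machine where HiveServer2 was started; use localhost if it is running on your own computer.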
- PORT defaults to 10000.
- USERNAME and PASSWORD are the credentials you used when setting up Hive.
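Keeping the connection details in variables makes the Beeline command easier to reuse. In this sketch, `hive_jdbc_url` and `connect_beeline` are hypothetical helper names, and the `HOST`, `PORT`, `USERNAME`, and `PASSWORD` variables mirror the placeholders above; only the `-u`, `-n`, and `-p` options come from Beeline itself.

```shell
#!/usr/bin/env bash
# Builds a HiveServer2 JDBC URL; an empty or missing host defaults to
# localhost and an empty or missing port defaults to 10000.
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s\n' "${1:-localhost}" "${2:-10000}"
}

# Opens an interactive Beeline session with the variables described above.
connect_beeline() {
  "$HIVE_HOME/bin/beeline" \
    -u "$(hive_jdbc_url "$HOST" "$PORT")" \
    -n "$USERNAME" -p "$PASSWORD"
}
```

For example, `HOST=10.0.0.5 USERNAME=hive PASSWORD=secret connect_beeline` would connect to port 10000 on 10.0.0.5.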
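4
List your databases and tables. At the Beeline prompt, enter SHOW DATABASES; to list your databases. Then enter USE DATABASE; followed by SHOW TABLES; to list the tables in the database you want to export.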
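Listing tables can also be done without an interactive session by passing the query to Beeline's `-e` option. This sketch assumes the HOST, PORT, USERNAME, and PASSWORD placeholders from step 3; `list_tables` is a hypothetical helper, and `BEELINE` is a hypothetical variable you can set to point at a different Beeline binary. `--outputformat=tsv2` with `--showHeader=false` prints one table name per line without borders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the tables in database $1, one per line.
# BEELINE (hypothetical) overrides the path to the beeline executable.
list_tables() {
  "${BEELINE:-$HIVE_HOME/bin/beeline}" \
    -u "jdbc:hive2://${HOST:-localhost}:${PORT:-10000}" \
    -n "$USERNAME" -p "$PASSWORD" \
    --outputformat=tsv2 --showHeader=false \
    -e "SHOW TABLES IN $1;"
}
```

For example, `USERNAME=hive PASSWORD=secret list_tables default` lists the tables in the `default` database.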
5
Export your table. Replace DATABASE and TABLE with the database and table you want to export, then enter the following command. The --outputformat=csv2 option makes Beeline print comma-separated rows with the column names on the first line, and the > redirect saves that output as export.csv in your current working directory, complete with headers.
```
$HIVE_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -n USERNAME -p PASSWORD --outputformat=csv2 -e "SELECT * FROM DATABASE.TABLE;" > export.csv
```
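On many Hive versions, the header row labels each column as `table.column` when you use `SELECT *`; this is controlled by the Hive property `hive.resultset.use.unique.column.names`. The sketch below wraps the export command, turns that property off with Beeline's `--hiveconf` option, and prints the header line so you can confirm it was written. It also includes a `sed` fallback that strips any remaining `table.` prefixes from line 1. `export_table`, `strip_header_prefixes`, and the `BEELINE` override variable are hypothetical names for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for exporting a Hive table to CSV with a header row.

# Prints the CSV file $1 with "table." prefixes removed from the column names
# on line 1, e.g. "sales.id,sales.amount" becomes "id,amount".
# Data rows (line 2 onward) are printed unchanged.
strip_header_prefixes() {
  sed '1s/[^,]*\.//g' "$1"
}

# Exports DATABASE.TABLE ($1) to the CSV file $2, then prints its header line.
# BEELINE (hypothetical) overrides the path to the beeline executable.
export_table() {
  "${BEELINE:-$HIVE_HOME/bin/beeline}" \
    -u "jdbc:hive2://${HOST:-localhost}:${PORT:-10000}" \
    -n "$USERNAME" -p "$PASSWORD" \
    --outputformat=csv2 --showHeader=true --silent=true \
    --hiveconf hive.resultset.use.unique.column.names=false \
    -e "SELECT * FROM $1;" > "$2" || return 1
  head -n 1 "$2"   # the header row written by Beeline
}
```

For example, `USERNAME=hive PASSWORD=secret export_table sales.orders export.csv` should print the column names of `sales.orders`; if they still carry a prefix, `strip_header_prefixes export.csv > clean.csv` writes a cleaned copy.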

Tip

The deprecated HiveServer1 worked with the original Hive CLI. HiveServer2 and Beeline replace both, and the original Hive CLI is being phased out, so newer Hive releases may not support its commands.
