
Guide to Apache Hive

July 10, 2014

Description

Apache Hive is data warehouse software that facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and to query it using a SQL-like language called HiveQL. The language also allows traditional map/reduce programmers to plug in custom mappers and reducers when it is inconvenient or inefficient to express that logic in HiveQL.

This technology is primarily used on top of Apache Hadoop.

 

Start Command Line Shell

$ hive

 

Useful Commands

Show Databases/Schemas

hive> show databases;

Use Database/Schema

hive> use {database_name};

Show Tables in Database/Schema

hive> show tables;

Show Table Partitions

hive> show partitions {table_name};

 

Describe Table

hive> desc {table_name};
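
The inspection commands above can also be run without opening the interactive shell. The following is a minimal sketch, assuming the hive client is on your PATH: it writes the commands to a script file that can then be passed to hive -f. The function name write_inspect_script, the database and table names, and the file path are placeholders, not part of Hive. Note that show partitions is expected to fail on a table that is not partitioned.

```shell
# Write the inspection commands from this section into a HiveQL script file.
# Arguments: database name, table name, output path (all placeholders).
write_inspect_script() {
    cat > "$3" <<EOF
show databases;
use $1;
show tables;
show partitions $2;
desc $2;
EOF
}

# Example (requires the hive client on PATH):
#   write_inspect_script my_db my_table /tmp/inspect.hql
#   hive -f /tmp/inspect.hql
```

Keeping the commands in a file makes them easy to rerun or to keep under version control.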

Run Hive Query

hive> select * from {schema}.{table_name} where hive_entry_timestamp>"{starting_timestamp}" and hive_entry_timestamp<="{ending_timestamp}" limit 100;
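
The same time-range query can be run from a regular shell prompt with hive -e, which executes a query string and exits. The sketch below is illustrative only: build_range_query and run_range_query are hypothetical helper names, and the column hive_entry_timestamp is taken from the query above and may not exist in your tables.

```shell
# Build the time-range query shown above.
# Arguments: schema, table, start timestamp (exclusive), end timestamp (inclusive).
build_range_query() {
    printf 'select * from %s.%s where hive_entry_timestamp>"%s" and hive_entry_timestamp<="%s" limit 100;' \
        "$1" "$2" "$3" "$4"
}

# Run the query non-interactively if the hive client is available.
run_range_query() {
    query=$(build_range_query "$@")
    if command -v hive >/dev/null 2>&1; then
        hive -e "$query"
    else
        echo "hive not found; would run: $query" >&2
        return 1
    fi
}

# Example (schema, table, and timestamps are placeholders):
#   run_range_query my_schema my_table "2014-07-01 00:00:00" "2014-07-10 00:00:00"
```

The arguments are pasted into the query as-is, so this is only suitable for trusted input.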

Stop Command Line Shell

hive> quit;

OR

hive> exit;

OR

Ctrl + D (Ctrl + Z only suspends the shell; it does not exit it)