Using Toad with Hive in Amazon Elastic MapReduce

The Toad for Cloud Databases Eclipse client supports Hive queries, which makes it really easy for me to run queries against our test Hadoop clusters.  It also supports Hive running on top of Amazon Elastic MapReduce (EMR), but you do need to be aware that on EMR the default ports differ from the ones we have come to expect.

Firstly, if you have started an EMR cluster with Hive 0.5 support, the Hive server will be running on port 10001, not port 10000.  Secondly, the JobTracker is running on port 9100 rather than 50030.  So when attaching to EMR, you would set up your Hive connection something like this:

[Screenshot: 1-02-2011 5-56-03 PM]
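
If you want to sanity-check those values before typing them into the connection dialog, here is a minimal Python sketch. The port numbers are the ones mentioned above; the host name in the usage note, the `HiveEndpoints` class and the helper names are hypothetical, made up for illustration, and have nothing to do with Toad itself. It assumes the JobTracker web page is served over plain HTTP on the port given.

```python
import socket
from dataclasses import dataclass

# Port numbers taken from the text above.
STANDARD_PORTS = {"hive": 10000, "jobtracker": 50030}
EMR_PORTS = {"hive": 10001, "jobtracker": 9100}


@dataclass(frozen=True)
class HiveEndpoints:
    """Values you would enter in a Hive connection dialog (hypothetical helper)."""
    host: str
    hive_port: int
    jobtracker_url: str


def hive_endpoints(master_host: str, on_emr: bool = True) -> HiveEndpoints:
    """Pick the EMR or the standard port set for the given master node."""
    ports = EMR_PORTS if on_emr else STANDARD_PORTS
    return HiveEndpoints(
        host=master_host,
        hive_port=ports["hive"],
        # Assumption: the JobTracker page is plain HTTP on this port.
        jobtracker_url=f"http://{master_host}:{ports['jobtracker']}/",
    )


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `hive_endpoints("your-emr-master-hostname")` gives you the Hive port and JobTracker URL for an EMR cluster, and `port_is_open(host, 10001)` tells you whether anything is listening there. From outside EC2 you will most likely need the security group (or an SSH tunnel) to allow access to those ports first.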

Once you’ve done that, the Hive connection will show all the Hive tables and you can enter HQL queries in the SQL editor.  You can drag table and column names into the editor as well:

[Screenshot: 1-02-2011 5-57-29 PM]

One of the simple but really useful things about the Hive client is that you can jump to the JobTracker web page while the HQL is running to see how it is going:

[Screenshot: 1-02-2011 5-59-19 PM]

Here’s the resulting JobTracker console.  We can see the job running and, if we scroll to the right or maximize the window, we can see how the map and reduce phases of the Hive job are progressing:

[Screenshot: 1-02-2011 8-11-07 PM]