

A big blog for Big Data.

By | December 29th, 2014 | Hadoop, HIVE, Tez

Optimizing Apache Hive by Tuning Configuration Parameters

With the recent availability of Hive 0.13.1/Tez on Altiscale clusters, Hive users have new options for tuning the performance of their Hive databases. This blog post describes four techniques: using Tez, storing tables in ORC format, leveraging cost-based optimization, and enabling vectorization. We’ve seen Hive queries run faster with all four.
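As a rough sketch of what those four techniques can look like in a Hive session, the fragment below sets the relevant properties and converts a table to ORC. The property names come from the Apache Hive configuration documentation, but whether each one is honored depends on the Hive build running on your cluster, and the table names `page_views` and `page_views_orc` are hypothetical placeholders.

```sql
-- 1. Run queries on Tez instead of MapReduce.
SET hive.execution.engine=tez;

-- 2. Store data in ORC format (hypothetical table names).
CREATE TABLE page_views_orc STORED AS ORC
AS SELECT * FROM page_views;

-- 3. Enable cost-based optimization; it relies on table and column statistics.
--    Assumption: your Hive build recognizes hive.cbo.enable.
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;
ANALYZE TABLE page_views_orc COMPUTE STATISTICS;
ANALYZE TABLE page_views_orc COMPUTE STATISTICS FOR COLUMNS;

-- 4. Enable vectorized query execution (works on ORC-backed tables).
SET hive.vectorized.execution.enabled=true;
```

Settings applied with `SET` last only for the current session, so it is easy to compare query times with and without each option before changing any cluster-wide defaults.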

By | November 13th, 2014 | Big Data, Data Science, Hadoop, HIVE

There Will Never Be One SQL to Rule All

Many discussions of SQL-on-Hadoop implicitly assume that a single engine will emerge to handle all analytical workloads, and that all workloads will shift into Hadoop, eliminating the need for data marts. Sounds great, but it’s not going to happen. To understand why, consider this (oversimplified!) table comparing a few SQL solutions across several quality attributes:

By | September 2nd, 2014 | Big Data, Hadoop, HIVE, Uncategorized

Taming a Hive Query Gone Wild

One of my colleagues is fond of reminding people that Hadoop is perfectly happy to let you do bad things at scale. The point is that, given the massive amounts of data and computational capacity available in a Hadoop environment, bad code or mistakes can be much more costly than in traditional data environments.

By | June 20th, 2014 | Big Data, Hadoop, HIVE

What’s New in Hive 0.13?

The recent release of Hive 0.13 completes the final phase of the Stinger initiative and is available simultaneously with Tez 0.4. While there are many bug fixes in the latest version of Hive, the biggest and certainly the most anticipated changes are the speed and performance improvements.

By | June 3rd, 2014 | Big Data, Hadoop, HIVE

Altiscale Supports Apache Hive 0.13

Today’s blog post is coming to you from this week’s Hadoop Summit, taking place at the San Jose Convention Center. With the Exhibit Hall about to open, we’re pretty excited to be here and interact with our colleagues and customers. One of the items we’ve announced today is the availability of Apache Hive™ 0.13 on the Altiscale Hadoop-as-a-Service (HaaS) platform, just weeks after its general software release to the industry. We’re pretty proud that we were able to launch Hive 0.13 so quickly. You may ask “Why?”, especially since some distros announced the availability of this version of Hive a couple of weeks ago. Well, the difference with us is that distros just make the software available to you. You still have to work through any bugs and optimize this brand-new software for your cluster. That takes a lot of time and work.