Altiscale is now part of SAP.

Learn more
Blog

Blog

A big blog for Big Data.

 
By | January 26th, 2016 | Analytics, Big Data, Hadoop

Best Practices for Dynamic Partitioning in Hive

With its proven ability to speed performance, partitioning is a must-have feature in the tool set of any Big Data query engine. Hive is no exception; it has had partition support since its early versions. Although this blog will touch on static partitioning, it will primarily focus on when and how to best employ dynamic partitioning—a method we believe is often underutilized as an effective means for partitioning data and improving performance.

By | December 17th, 2015 | Analytics, Big Data, Hadoop

Apache Yetus: Faster, More Reliable Software Development

What makes good software? There is no shortage of books, papers, and opinions discussing the topic, and they often agree that two key characteristics are correctness and maintainability. To help teams develop correct and maintainable software, the Apache Software Foundation (ASF) has released the first version of a new, top-level project—Apache Yetus—that’s generating quite a bit of excitement within various Apache communities.

By | December 16th, 2015 | Analytics, Big Data, Hadoop

Altiscale’s HdfsUtils Equals Easier HDFS Management

At Altiscale, we’re constantly working to enhance the customer experience on our data cloud, while also staying committed to Hadoop’s open-source DNA. We smooth out the rough edges of Hadoop and Spark whenever we can to reduce friction and improve time to value. Recently, we’ve done some “smoothing” in the area of HDFS usage management. We’ve developed and are now providing HdfsUtils, a package of tools that helps customers to more quickly and easily manage their HDFS usage.