A big blog for Big Data.

By | January 26th, 2016 | Analytics, Big Data, Hadoop

Best Practices for Dynamic Partitioning in Hive

With its proven ability to speed up queries, partitioning is a must-have feature in the tool set of any Big Data query engine. Hive is no exception; it has had partition support since its early versions. Although this post touches on static partitioning, it focuses primarily on when and how best to employ dynamic partitioning, a method we believe is often underused as an effective means of partitioning data and improving performance.

By | December 17th, 2015 | Analytics, Big Data, Hadoop

Apache Yetus: Faster, More Reliable Software Development

What makes good software? There is no shortage of books, papers, and opinions discussing the topic, and they often agree that two key characteristics are correctness and maintainability. To help teams develop correct and maintainable software, the Apache Software Foundation (ASF) has released the first version of a new, top-level project—Apache Yetus—that’s generating quite a bit of excitement within various Apache communities.

By | December 16th, 2015 | Analytics, Big Data, Hadoop

Altiscale’s HdfsUtils Equals Easier HDFS Management

At Altiscale, we’re constantly working to enhance the customer experience on our data cloud, while also staying committed to Hadoop’s open-source DNA. We smooth out the rough edges of Hadoop and Spark whenever we can to reduce friction and improve time to value. Recently, we’ve done some “smoothing” in the area of HDFS usage management. We’ve developed and are now providing HdfsUtils, a package of tools that helps customers manage their HDFS usage more quickly and easily.

By | June 23rd, 2015 | Docker

What’s Next for YARN and Docker

As an execution fabric for running applications, YARN faces the same software-dependency problem encountered by all operating systems—“DLL Hell” as it was once affectionately known. In one post we explained how lightweight Linux Containers combined with speedy Docker Images are a perfect way to solve this dependency problem. In another post, we highlighted some of the deficiencies of Docker for this purpose, and outlined some of the work we’ve done to integrate Docker into YARN, addressing the deficiencies along the way. It’s been a while since our last report, so we thought it was time to bring you up to date.

By | February 17th, 2015 | Big Data, Hadoop, Hadoop as a Service

The Open Data Platform: Uniting for an Enterprise-Class Hadoop Ecosystem

This morning, a coalition of fourteen leading technology organizations announced the creation of the Open Data Platform (ODP), an industry association dedicated to accelerating the adoption of enterprise-class, big data applications that are based on the Apache Hadoop ecosystem of solutions. We at Altiscale are proud to be part of this initiative.

By | February 11th, 2015 | Big Data, Hadoop

The Total Economic Impact™ Of Altiscale Hadoop-as-a-Service: Cost Savings And Business Benefits Enabled By Hadoop-as-a-Service

Altiscale commissioned Forrester Consulting to conduct a Total Economic Impact™ (TEI) study and examine the potential return on investment (ROI) enterprises may realize by deploying its Hadoop-as-a-Service (HaaS). The purpose of this study is to provide readers with a framework to evaluate the potential financial impact of Altiscale’s Hadoop-as-a-Service on their organizations.

By | January 20th, 2015 | Big Data, Security and Compliance

Security Needs to be Demonstrated

Altiscale was founded to bring Hadoop to the enterprise through the Cloud. Most of us come from large Web companies like Yahoo!, Google and LinkedIn. We bring with us a very deep understanding of the technical aspects of cloud security issues and the practical experience of dealing with security threats, ranging from teenage hackers to foreign governments.