A big blog for Big Data.

By | January 26th, 2016 | Analytics, Big Data, Hadoop

Best Practices for Dynamic Partitioning in Hive

With its proven ability to speed performance, partitioning is a must-have feature in the tool set of any Big Data query engine. Hive is no exception; it has had partition support since its early versions. Although this blog will touch on static partitioning, it will primarily focus on when and how to best employ dynamic partitioning—a method we believe is often underutilized as an effective means for partitioning data and improving performance.

By | January 5th, 2016 | Analytics, Big Data, Hadoop, Security and Compliance

Achieving Regulatory Compliance When Employing Cloud Service Providers

Achieving regulatory compliance can be complicated, especially when using a service provider like Altiscale or Amazon AWS. You may wonder: Is your organization responsible for every aspect of its regulatory compliance when using a service provider? And how does the service provider fit in? Is it responsible for anything and, if so, what? How can your organization make sure your service provider is doing what it’s supposed to?

By | December 17th, 2015 | Analytics, Big Data, Hadoop

Apache Yetus: Faster, More Reliable Software Development

What makes good software? There is no shortage of books, papers, and opinions discussing the topic, and they often agree that two key characteristics are correctness and maintainability. To help teams develop correct and maintainable software, the Apache Software Foundation (ASF) has released the first version of a new, top-level project—Apache Yetus—that’s generating quite a bit of excitement within various Apache communities.

By | December 16th, 2015 | Analytics, Big Data, Hadoop

Altiscale’s HdfsUtils Equals Easier HDFS Management

At Altiscale, we’re constantly working to enhance the customer experience on our data cloud, while also staying committed to Hadoop’s open-source DNA. We smooth out the rough edges of Hadoop and Spark whenever we can to reduce friction and improve time to value. Recently, we’ve done some “smoothing” in the area of HDFS usage management. We’ve developed and are now providing HdfsUtils, a package of tools that helps customers to more quickly and easily manage their HDFS usage.

By | October 20th, 2015 | Analytics, Big Data, Business Intelligence, Hadoop

Hitting a Big Data Wall? How to Improve Hadoop ROI

With the adoption of any new technology, businesses are tasked with measuring success and demonstrating value to the organization, and Big Data is no different. As interest in Big Data grows, the value of the technology platform that delivers it must be accurately calculated and demonstrated to the rest of the organization. Hadoop is increasingly the platform of choice for Big Data, yet there is a common misconception that predicting and deriving ROI from Hadoop deployments is too challenging. According to a recent Gartner research report, 24 percent of organizations are not measuring the ROI of Big Data at all.

By | September 30th, 2015 | Security and Compliance

How Altiscale Supports HIPAA Customers’ Compliance Needs

Today Altiscale is announcing the achievement of two major security milestones: Health Insurance Portability and Accountability Act (HIPAA) compliance and PCI DSS Level 1 certification. With the addition of HIPAA compliance, Altiscale is making it faster and easier for organizations to adhere to HIPAA standards for data security and privacy, while enabling them to leverage the power of Hadoop for Big Data analytic processing. This reduces risk and accelerates the realization of business value by eliminating many of the difficulties of deploying Hadoop in HIPAA-regulated organizations.

By | September 30th, 2015 | Security and Compliance

PCI Compliance: Not Just a Checkbox

Here’s the thing about PCI compliance: you may not technically need it, but you still want it. And now, with Altiscale, you have it. Strictly speaking, the purpose of the Payment Card Industry Data Security Standard (PCI DSS) is to protect a very narrow set of data called cardholder data, which is only required or even useful for processing payment transactions. In fact, PCI prohibits storage of any cardholder data unless it is necessary for the explicit purpose of payment processing. Altiscale is not a payment processor. So, to remain compliant with the spirit of PCI, customers generally do not store or process what the PCI Security Standards Council (SSC) defines as cardholder data or sensitive authentication data at Altiscale.