Blog

The Unwelcome Guest: Why VMs Aren’t the Solution for Next-Gen Applications

This blog was originally published on O’Reilly Radar.

Data center operating systems are emerging as a first-class category of distributed system software. Apache Hadoop, for example, is evolving from a MapReduce framework into YARN, a generic platform for scale-out applications.

To enable a rich ecosystem of diverse applications to coexist on these platforms, providing adequate isolation is crucial. The isolation mechanism […]

August 12th, 2015 | Virtual Machines

Fix Your Big Data Problems with One Simple Solution

It’s time to challenge the premise of on-premises Big Data.

It’s time to stop dealing with job failures, lengthy completion times, and the sheer number of hours data scientists and engineers spend just operating and maintaining Hadoop. It’s a waste of time and does nothing to move your business forward.

It’s time for a better way. And we’re confident that […]

July 22nd, 2015 | Big Data Challenge

Tips and Tricks for Running Spark on Hadoop, Part 2

In this blog we’ll discuss how to troubleshoot problems and exceptions in Spark applications. You’ll note that we don’t explicitly discuss SparkSQL. This is because, for the purposes of this blog, SparkSQL is considered a Spark job.

If, after you’ve read this blog, you’d like to view additional details, examples, and the latest updates regarding Spark troubleshooting, please visit Troubleshooting […]

July 14th, 2015 | Spark

What’s Next for YARN and Docker

As an execution fabric for running applications, YARN faces the same software-dependency problem encountered by all operating systems—“DLL Hell” as it was once affectionately known. In one post we explained how lightweight Linux Containers combined with speedy Docker Images are a perfect way to solve this dependency problem. In another post, we highlighted some of the deficiencies of Docker […]

June 23rd, 2015 | Docker

Spark and Hadoop: Friends, not Foes

On June 15th, IBM announced plans to make massive investments in Spark-related technologies. This caps off almost a year of significant business and press attention for Spark.

Our customers have been using Spark since the launch of the Altiscale Data Cloud. During that time, Altiscale has included a version of Spark on our Hadoop-based big data platform, starting from Spark […]

June 22nd, 2015 | Spark

Top 10 Hadoop/YARN Logs – Part 2

This is (the long-awaited) part two of a blog describing important Hadoop logs and their use at Altiscale. In part one, we discussed five logs that yield the most insights and information about our Hadoop 2/YARN clusters. In this blog, we’ll discuss another five.

We depend heavily on log entries to effectively monitor and optimize Altiscale resources. It’s especially important […]

June 16th, 2015 | Hadoop Logs

Understanding Security and Compliance for Data Lakes and Cloud Providers

When using a service provider like Altiscale, customers are naturally very concerned about the safety and availability of their information assets. Many customers would like to perform a detailed security review of each of their service providers. However, as a practical matter, service providers cannot afford to support custom security reviews from hundreds to thousands of customers—and most customers […]

June 11th, 2015 | Security and Compliance

Gartner Gets it Right, Mostly

Gartner started quite a buzz in the big data world with a study showing that more than half of respondents have no plans to invest in Hadoop at this time.

Did Gartner get it right? Does this mean bad things for Hadoop? For Altiscale?

Yes, not really, and hell no.

Gartner simply pointed out the difficult reality that Hadoop adoption is slower […]

June 2nd, 2015 | Hadoop Challenges

Oozie Launcher: Tips for Tackling its Challenges

In this blog post, we will look at Apache Oozie’s launcher job and ways to avoid some of the common pitfalls associated with it. We hope this provides our readers with a better understanding of Oozie’s execution model, including its subtleties.

Oozie’s Execution Model: A Different Approach
Oozie’s execution model is different from the default approach users take to run Hadoop […]

May 28th, 2015 | Oozie

Tips and Tricks for Running Spark on Hadoop, Part 1

As more of our customers begin running Spark on Hadoop, we’ve identified—and helped them to overcome—some challenges they commonly face. To help other organizations tackle these hurdles, we’re launching this series of blog posts that will share tips and tricks for quickly getting up and running on Spark and reducing overall time to value.

Our focus is Spark running on […]

May 19th, 2015 | Spark