Altiscale is now part of SAP.

Blog

A big blog for Big Data.

 
By | September 12th, 2016 | Spark

Apache Spark 2.0 Now Available

Apache Spark 2.0, released in late July, is now available on the Altiscale Data Cloud. As one of the most active open-source Big Data projects under development, Spark is an integral component of the Big Data ecosystem and is used by a majority of Altiscale customers in areas such as interactive SQL and data transformation. The addition of Spark 2.0 to Altiscale’s Spark-as-a-Service offering provides improved performance, expanded SQL support, and streamlined programming APIs.

By | April 21st, 2016 | Hadoop as a Service, Spark, Spark on Hadoop

Hadoop-as-a-Service in the Classroom

This entry is by Jimmy Lin, Professor and the David R. Cheriton Chair in the David R. Cheriton School of Computer Science at the University of Waterloo. Jimmy just finished teaching a big data course to a group of about 70 undergraduate and graduate students. A big impediment to conducting courses of this type is the lack of large-scale Hadoop infrastructure. Fortunately, Altiscale was able to provide a Hadoop cluster where Jimmy's students could conduct their coursework. Below, Jimmy shares his findings and experiences over the course of the semester.

By | May 19th, 2015 | Spark

Tips and Tricks for Running Spark on Hadoop, Part 1: Execution Modes

As more of our customers begin running Spark on Hadoop, we’ve identified—and helped them to overcome—some challenges they commonly face. To help other organizations tackle these hurdles, we’re launching this series of blog posts that will share tips and tricks for quickly getting up and running on Spark and reducing overall time to value.

By | February 24th, 2015 | Hadoop, Spark

What is the problem that the Open Data Platform seeks to solve?

Last week, the Open Data Platform (ODP) was announced and written about in the technology and business press. While Altiscale has received strong support for its participation in the ODP, we have also received many questions about why the ODP is necessary and the exact role that it would serve. In this blog post, we want to explain in more depth the problem that the ODP sets out to solve. In a subsequent post we’ll discuss why we think that a new organization complementary to the Apache Software Foundation (ASF) is the best way to solve this problem.

By | February 3rd, 2015 | Hadoop, HDFS, Spark, YARN

Spark and Hadoop Together in the Cloud

When it comes to running Spark and Hadoop in the cloud, Altiscale provides the best of both worlds. As an example, our cloud platform uses YARN for resource management. This means you can leverage MapReduce for large-scale batch processing while opting to deploy Spark for in-memory, interactive analysis using GraphX, MLlib, or your own custom Spark applications.

By | September 11th, 2014 | Hadoop, HDFS, Spark, YARN

Apache Spark on Altiscale’s Platform

Apache Spark has quickly become a popular alternative to MapReduce for big data analysis on Hadoop. Consequently, we’ve made Spark available on the Altiscale platform, and our customers are using it to run exploratory jobs. Their workloads have given us insight into use cases that are well suited to Spark, which we want to share with you in this post. We’ve also gained experience running Spark as a service via YARN, which we’ll discuss in a future post.