Altiscale is now part of SAP.



A big blog for Big Data.

By | April 19th, 2016 | Spark on Hadoop

Spark on Hadoop: Thin JARs

Thus far in this blog series we have focused on the Apache Spark framework, with an emphasis on RDDs, resource tuning, and memory settings. This blog, Part 5 in the series about Spark on Hadoop, will cover some best practices for building a Spark JAR file using either the Simple Build Tool (SBT) or Apache Maven.
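As a preview of the "thin JAR" idea, the usual approach in SBT is to mark the Spark dependency as `provided`, so that it is excluded from the assembled JAR (the cluster already supplies Spark at runtime). A minimal `build.sbt` sketch, with illustrative project name and version numbers:

```scala
// build.sbt -- minimal sketch of a "thin JAR" build.
// The name and version numbers below are illustrative, not prescriptive.
name := "spark-thin-jar-example"

scalaVersion := "2.10.6"

// "provided" keeps spark-core out of the assembly JAR, since the
// Hadoop cluster already has Spark on the classpath at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"
```

Built with the sbt-assembly plugin (`sbt assembly`), this produces a JAR containing only your application code and its non-provided dependencies, which keeps uploads small and avoids classpath conflicts with the cluster's own Spark libraries.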

By | March 3rd, 2016 | Analytics, Big Data, Hadoop

Tips and Tricks for Running Spark on Hadoop, Part 4: Memory Settings

In Part 3 of this blog series, we discussed how to configure RDDs to optimize Spark performance. We outlined various storage persistence options for RDDs and how to size memory-only RDDs. In this blog, Part 4 of the series, we’ll discuss the memory requirements of other Spark application components. Although Spark does a number of things very well, it will not, unfortunately, intelligently configure memory settings on your behalf. So, we’ll outline how to determine how much memory is available for your RDDs and data so that you can adjust the command line parameters and configuration when you launch your Spark jobs.
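As a taste of the knobs involved, memory for a Spark application is typically set at launch time. A hypothetical `spark-submit` invocation (the class name, JAR, and all values are placeholders, not recommendations):

```shell
# Sketch of launching a Spark job on YARN with explicit memory settings.
# All values are illustrative placeholders; tune them for your cluster.
#   --driver-memory    heap for the driver JVM
#   --executor-memory  heap for each executor JVM
#   memoryOverhead     extra off-heap headroom YARN grants each executor
spark-submit \
  --master yarn-client \
  --driver-memory 2g \
  --executor-memory 4g \
  --num-executors 10 \
  --conf spark.yarn.executor.memoryOverhead=512 \
  --class com.example.MyApp \
  my-app.jar
```

Note that the executor heap is not all available for RDD caching; Spark reserves fractions of it for execution and storage, which is exactly the arithmetic this post walks through.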

By | May 19th, 2015 | Spark

Tips and Tricks for Running Spark on Hadoop, Part 1: Execution Modes

As more of our customers begin running Spark on Hadoop, we’ve identified—and helped them to overcome—some challenges they commonly face. To help other organizations tackle these hurdles, we’re launching this series of blog posts that will share tips and tricks for quickly getting up and running on Spark and reducing overall time to value.

By | September 11th, 2014 | Hadoop, HDFS, Spark, YARN

Apache Spark on Altiscale’s Platform

Apache Spark has quickly become a popular alternative to MapReduce for big data analysis on Hadoop. Consequently, we've made Spark available on the Altiscale platform, and our customers are already using it to run exploratory jobs. Their workloads have given us insight into the use cases that are well suited to Spark, which we want to share with you in this post. We've also gained experience running Spark as a service via YARN, which we'll discuss in a future post.