A big blog for Big Data.

By | April 27th, 2016 | Hadoop, NodeGroup, Performance

Part 1.2: Investigation, Analysis, and Resolution of NodeGroup performance issues on Bare Metal Hardware clusters.

As we saw in Part 1.1 of our blog series, “How to Identify and Resolve Hadoop NodeGroup Performance Problems on Hardware clusters with no virtualization,” once we started to use the NodeGroup implementation, we observed a performance degradation on the DFSIO benchmark. In this blog, Part 1.2 of the series, we explain the steps we took to identify and resolve this problem.

By | April 21st, 2016 | Hadoop as a Service, Spark, Spark on Hadoop

Hadoop-as-a-Service in the Classroom

This entry is by Jimmy Lin, Professor and the David R. Cheriton Chair in the David R. Cheriton School of Computer Science at the University of Waterloo. Jimmy just finished teaching a big data course to a group of about 70 undergraduate and graduate students. A big impediment to conducting courses of this type is the lack of large-scale Hadoop infrastructure. Fortunately, Altiscale was able to provide a Hadoop cluster where Jimmy's students could conduct their coursework. Below, Jimmy shares his findings and experiences over the course of the semester.

By | April 19th, 2016 | Spark on Hadoop

Spark On Hadoop: Thin JARs

Thus far in this blog series we have focused on the Apache Spark framework, with an emphasis on RDDs, resource tuning, and memory settings. This blog, Part 5 in the series about Spark on Hadoop, will cover some best practices for building a Spark JAR file using either the Simple Build Tool (SBT) or Apache Maven.
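
As a rough illustration of the thin-JAR idea, here is a minimal sbt build definition. It is a sketch under assumptions, not the configuration from the post itself: the project name and the Scala and Spark version numbers are hypothetical placeholders, and should be matched to whatever the target cluster runs.

```scala
// build.sbt: a minimal sketch. The project name and all version numbers
// below are assumptions chosen for illustration only.
name := "spark-thin-jar-example"

version := "0.1.0"

scalaVersion := "2.10.6"

// The "provided" scope makes Spark available at compile time but marks it
// as supplied by the runtime environment, so tools that bundle dependencies
// leave it out of the packaged JAR.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"
```

Running `sbt package` with a definition like this produces a JAR containing only the project's own classes. The Maven equivalent is to declare the Spark dependency with `<scope>provided</scope>`.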

By | April 14th, 2016 | Big Data News

Big Data News: Battle for Hadoop standards continues, RBS and Big Data, your face as data

As the Hadoop ecosystem continues to develop and expand, Spark is expected to be a dominant project over the next 6 years, according to Wikibon. ODPi announces its runtime spec to further Hadoop ecosystem standards. RBS tries to use Big Data to go back in time to a better customer service experience. And how do you feel when your face is data? An art student finds it too easy to find “strangers” online.

By | March 24th, 2016 | Big Data News

Big Data News: IoT, Big Data for Banking, NATO and Cloud Security

Big Data continues to prove itself, as shown by this latest collection of articles about Big Data in logistics, fleet operations management, public health, banking, and the music business. NATO gets in on the cloud security act, and PBS has a great story on using predictive analytics to better protect foster children. Plus, what do data scientists really do all day?

By | March 23rd, 2016 | Hadoop as a Service

Welcome to Big Data in the Cloud, Cloudera

On Tuesday, according to the Twitter news from Cloudera Analyst Day 2016, Cloudera announced that it is moving to the next great opportunity to create value for its customers: Big Data in the cloud. Congratulations! We have been leading the charge for Apache Hadoop and Apache Spark in the cloud for years, and we know that your move is a good one.

By | March 16th, 2016 | Product News

Announcing the Altiscale Insight Cloud

We at Altiscale are excited to announce the Altiscale Insight Cloud, a new self-service analytics solution for Big Data. The Altiscale Insight Cloud makes it easier and faster for business analysts to access Big Data and draw insights from it, which means that organizations can get to business insights more quickly.

By | March 9th, 2016 | Big Data News

Big Data News: Spark, Use Cases, and MBAs

Spark continues its ascent, with a major announcement from Hortonworks this past week. Big Data opens up opportunities for pharma, insurance, and transportation, and helps to drive people to the polls. The promise of IoT continues, while even MBAs are finding promising reasons to unleash their inner data scientist.

By | March 8th, 2016 | Analytics, Big Data, Hadoop

Scheduling Jobs Using Cron or Oozie

Linux system administrators often use cron to schedule recurring Hadoop jobs. Examples of such jobs include processing data that has come in during the day to make it ready for analysis the following day, or running a background workflow at times when the cluster is not busy. However, we recommend using Oozie instead of cron for managing workflows in Hadoop, because Oozie is designed specifically to support Hadoop workloads and offers useful features that cron does not.