Architecture

Data at your service.

The Altiscale Data Cloud is secure, scalable, and comprehensive. And our operations team ensures it’s hassle-free.

The Altiscale Data Cloud was built to solve enterprise-class Big Data challenges with ease. It is a comprehensive Big Data solution based on the Apache Hadoop and Apache Spark ecosystem, and it offers a full data science workbench. It is surrounded by world-class security technology and process protections. And it all runs in a highly secure datacenter where the hardware and networking are specifically selected, configured, and managed for Big Data operations.

The result is elasticity, high reliability, and blazingly fast performance. Your Big Data jobs get done—quickly—without unnecessary stress. That’s why some of the largest organizations in the world trust Altiscale with their data and analytics and even run their own applications on top of Altiscale for their customers.

Altiscale Data Cloud

The Altiscale Data Cloud is specifically constructed and managed to deliver high-performance Big Data results. A comprehensive solution built to satisfy the demands of a broad range of data analytics requirements, it comes preconfigured with core compute engines such as Spark, Tez, and MapReduce, as well as services such as Hive, Oozie, Pig, and Spark SQL.

Altiscale runs Hadoop on hardware that is purpose-built and tuned for Hadoop. Not only does Altiscale select the best hardware for Big Data, we also configure the kernel and network parameters on top of the hardware for Big Data performance.

As a result, the Altiscale Data Cloud handily beats the performance of its competitors, as proven in real-world examples by Altiscale customers.

Storage Services — HDFS

 

Storage services primarily consist of the Hadoop Distributed File System (HDFS) and HCatalog. HDFS is a highly reliable, fault-tolerant, distributed storage system used for storing and retrieving Big Data at high throughput. HCatalog is a storage management layer that provides a common interface to multiple compute services.

Fault Tolerant
The Altiscale operations team works around the clock for you, monitoring the clusters for hardware or software failures. We take professional pride in ensuring that your service is up to date, fault tolerant, and reliable.

Secure
Altiscale supports Kerberos-enabled Hadoop clusters, which ensure the user is authenticated before accessing HDFS.

Elastic
Altiscale considers scale-out architecture to be core to its business. Customers can easily expand capacity as their needs increase without worrying about hardware. Elastic storage grows and shrinks to meet customer needs over time.

Resource Management — YARN

Resource management in the Altiscale Data Cloud is handled by YARN (Yet Another Resource Negotiator), a large-scale distributed operating system for Big Data introduced in Hadoop 2.x, which improves significantly on resource management in Hadoop 1.x.

Big Data jobs often come in bursts, requiring rapid shifts in processing capacity. While some jobs require only a small amount of computing capacity, others far exceed the data volume of an average job. One of the key benefits of the Altiscale Data Cloud is elasticity: customers get rapid access to the capacity they need to get their largest jobs done, without having to worry about hardware or job scheduling. Because Altiscale is in the cloud, the resources for scaling are available whenever the customer needs them.

YARN lets multiple data processing engines, such as Spark, Tez, and MapReduce, run on top of Hadoop. This unlocks an entirely new approach to data analytics by enabling multiple analytics frameworks to run simultaneously and take advantage of the data stored in HDFS.

Ready for Any Job, Anytime
Run simultaneous applications on the same dataset.

Spike Proof
Utilize Altiscale’s elasticity and expand service capacity to accommodate spiking demand.

Future Ready
Altiscale is optimized for Big Data and easily scales as overall data volumes grow.

Compute Services

Compute services consist of a set of compute engines and the basic services that sit on top of them to perform different types of processing. Depending on the use case, any of the following can run on the Altiscale Data Cloud.

Apache Spark: Run Spark in production.
Apache Tez: Run your existing MapReduce jobs faster on Tez.
MapReduce: Use the most stable, batch-based processing engine.
Apache Hive: Use the most reliable SQL-like data warehouse purpose-built on Hadoop. Visualize using Tableau or any tool that connects to Hive using JDBC or ODBC.
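
For tools that reach Hive over JDBC, the connection is usually described by a HiveServer2 URL. The Python sketch below shows how such a URL might be assembled. The host name, database, and Kerberos principal are hypothetical placeholders, and the use of HiveServer2's default port 10000 is an assumption to confirm for your own cluster.

```python
def hive_jdbc_url(host, port=10000, database="default", principal=None):
    """Assemble a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/db."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if principal:
        # On Kerberos-enabled clusters, the HiveServer2 service principal
        # is appended to the URL as a session parameter.
        url += f";principal={principal}"
    return url


if __name__ == "__main__":
    # Hypothetical host and realm, for illustration only.
    print(hive_jdbc_url("hive.example.com"))
    print(hive_jdbc_url("hive.example.com", principal="hive/_HOST@EXAMPLE.COM"))
```

The resulting string is what a JDBC-capable visualization tool would typically ask for in its connection settings.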

Big Data Solutions and Analytics

Altiscale makes it easy for you to search, explore, obtain insights, and do analytics on top of Big Data. We provide a set of open-source solutions for data exploration and analytics as well as partner with specialized analytics providers to ensure that data scientists have a complete workbench of effective tools to get their jobs done right.

Learn more about our partners here.

Data Transfer and Connectivity Options

There are several secure options for moving data into and out of the Altiscale Data Cloud. Which options apply depends on the type of data being moved, which falls roughly into the following categories:

SQL Data
Transfer data between Altiscale and a structured datastore, such as a relational database.

Sqoop
Transfer bulk data between Apache Hadoop and structured datastores, such as relational databases.

Streaming Data
Stream data to Altiscale using a high-throughput pub/sub-based solution.

Kafka
Ingest event-based data to HDFS using Camus, or stream it using Spark Streaming.

Flume
Collect, aggregate, and move large amounts of log data directly to Altiscale.

Bulk Data
Use the standard DistCp tool that comes with Apache Hadoop, or use our open-source tools to transfer data efficiently to Altiscale. Transfer from any of the following with ease:

On-premises HDFS

On-premises storage area network (SAN) or network-attached storage (NAS)

Cloud storage, such as Amazon Simple Storage Service (S3)
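
As a rough illustration of the Sqoop and DistCp options above, the Python sketch below builds the command lines for a table import and a bulk copy without running them. All hosts, table names, bucket names, and paths are hypothetical, and the use of the `s3a://` scheme assumes the cluster has the Hadoop S3A connector configured.

```python
import shlex


def sqoop_import_command(jdbc_url, table, target_dir, mappers=4):
    """Build a `sqoop import` command that copies one relational table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


def distcp_command(source, destination, max_maps=20):
    """Build a `hadoop distcp` command.

    -update copies only files that are missing or differ at the destination;
    -m caps the number of simultaneous map tasks used for the copy.
    """
    return ["hadoop", "distcp", "-update", "-m", str(max_maps), source, destination]


if __name__ == "__main__":
    # Hypothetical endpoints and paths, for illustration only.
    commands = [
        sqoop_import_command(
            "jdbc:mysql://db.example.com/sales", "orders", "/data/sales/orders"
        ),
        distcp_command(
            "hdfs://onprem-nn.example.com:8020/logs", "hdfs:///data/logs"
        ),
        distcp_command("s3a://example-bucket/events", "hdfs:///data/events"),
    ]
    for cmd in commands:
        # Print each command as a shell-safe string for review before running it.
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; in practice they would be run from a workbench session on the cluster.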

Connectivity Options
Whether your data is in your private cloud or in Amazon S3, we provide enterprise-class connectivity options to the Altiscale Data Cloud.

Workbench – SSH
Use secure shell to view, run, or access your cluster.

High-Throughput Transfer Host
Ingest Big Data at high frequency through a high-throughput transfer host.

IPsec
Use IPsec to authenticate and secure your data transfers.

Direct Connection
Deliver data directly through Altiscale’s Direct Connection.

Portal

The Altiscale Portal is a central location where you can add or remove users, control access to your cluster, and view cluster information, job details, and usage details.

Altiscale Advisory Services and Proactive Support

Altiscale advisory services give users access to a team of experts who advise on which engines to use and how best to plan their jobs. By working with Altiscale advisory services in advance, users can achieve their goals more rapidly and easily. Altiscale proactive support helps customers keep their jobs on track, notifying them and providing fixes in advance when jobs look like they might be headed for trouble.

Superior architecture drives superior results.

Read what Forrester Consulting has to say about the benefits Altiscale customers experience.

61% reduction in Hadoop job failures

46% reduction in job completion times

Significant improvement in data scientist productivity

<90 days payback period

Get in touch.

For a more involved conversation, contact our expert team.
