Hadoop as a service (HaaS)
What is Hadoop as a service (HaaS)?
Hadoop as a service (HaaS), also known as Hadoop in the cloud, is a big data analytics framework that stores and analyzes data in the cloud using Hadoop. Users do not have to invest in or install additional infrastructure on premises when using the technology, as HaaS is provided and managed by a third-party vendor.
Hadoop is a software framework used to manage data and storage for big data applications in clustered systems. Hadoop gives users the ability to collect, process and analyze data. HaaS strives to provide the same experience to users in the cloud. HaaS is useful for medium and large-scale organizations that do not have the infrastructure or ability to host Hadoop on premises.
The open source Hadoop big data analytics framework enables large, unstructured data sets to be analyzed. Hadoop's storage mechanism, Hadoop Distributed File System (HDFS), splits data into blocks and distributes them across multiple nodes so the data can be processed in parallel. Hadoop-as-a-service providers integrate proprietary programs with the Hadoop framework to make it easier for organizations to use, and they typically include management and support capabilities. Most HaaS offerings are cloud-based, and pricing is most often on a per-cluster, per-hour basis.
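The idea of spreading data over nodes and processing the pieces in parallel can be illustrated with a small word count. This is a minimal sketch, not Hadoop code: the function names (`split_into_blocks`, `map_block`, `word_count`) and the tiny block size are hypothetical, and threads on one machine stand in for the nodes of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical block size in lines; real HDFS blocks are measured in
# megabytes, not lines. The small value just makes the split visible.
BLOCK_SIZE = 4


def split_into_blocks(lines, block_size=BLOCK_SIZE):
    """Cut the input into fixed-size blocks, loosely as HDFS does with files."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Count the words in one block; each block is handled independently."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def word_count(lines, workers=3, block_size=BLOCK_SIZE):
    """Process all blocks in parallel, then merge the partial counts."""
    blocks = split_into_blocks(lines, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_block, blocks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Because each block is counted on its own, adding workers (or, in a real cluster, nodes) lets more blocks be handled at the same time; only the final merge has to see every partial result.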
Features
HaaS providers offer a variety of features and support, including the following:
- Hadoop framework deployment support.
- Hadoop cluster management.
- Alternative programming languages.
- Data transfer between clusters.
- Customizable and user-friendly dashboards and data manipulation.
- Security features.
Advantages and disadvantages
HaaS has both advantages and disadvantages. Advantages of HaaS include the following:
- Elimination of the need to deploy additional physical hardware infrastructure.
- Support for a wide range of data sources, including clickstream data and emails.
- Supported functions, including fraud detection, data warehousing and automated copying of data in case of data loss.
- Speed, because the tools that process data run on the same servers where the data is located, which increases data processing speeds.
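The speed advantage in the list above comes from running each task on a server that already holds the data it needs. The following is a simplified sketch of that placement idea, not Hadoop's actual scheduler: the function `assign_tasks`, its inputs and the greedy strategy are all assumptions made for illustration.

```python
def assign_tasks(block_replicas, slots_per_node):
    """Greedily place one task per block, preferring nodes that hold the block.

    block_replicas: dict mapping block id -> list of nodes storing a replica.
    slots_per_node: dict mapping node -> number of free task slots.
    Returns a dict mapping block id -> (node, is_local).
    """
    remaining = dict(slots_per_node)  # copy so the caller's dict is untouched
    assignment = {}
    for block_id, replicas in block_replicas.items():
        # First choice: a node that already stores this block.
        local = [node for node in replicas if remaining.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fallback: any node with a free slot; the block must be
            # copied over the network, which is the slower path.
            free = [n for n, slots in remaining.items() if slots > 0]
            if not free:
                raise RuntimeError("no free task slots left in the cluster")
            node, is_local = free[0], False
        remaining[node] -= 1
        assignment[block_id] = (node, is_local)
    return assignment
```

Tasks marked `is_local` read their input from local disk, while the others first have to fetch it from another server.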
Disadvantages, however, include the following:
- The open source Hadoop framework requires a special set of skills many organizations do not have in-house or cannot afford.
- Skilled engineers well rounded in Hadoop are hard to find.
- Hadoop security measures are disabled by default.
- Only medium to large organizations can make efficient use of HaaS.
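The point about security being disabled by default can be checked against a cluster's `core-site.xml`. The sketch below assumes the standard Hadoop properties `hadoop.security.authentication` (default `simple`) and `hadoop.security.authorization` (default `false`); the helper names `read_properties` and `security_findings` are hypothetical, and a managed HaaS offering may expose these settings differently or set them for you.

```python
import xml.etree.ElementTree as ET

# Values Hadoop falls back to when core-site.xml does not set them.
DEFAULTS = {
    "hadoop.security.authentication": "simple",
    "hadoop.security.authorization": "false",
}


def read_properties(xml_text):
    """Parse the <property><name>/<value> pairs of a Hadoop config file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def security_findings(xml_text):
    """Return a list of security settings that are still at insecure values."""
    props = {**DEFAULTS, **read_properties(xml_text)}
    findings = []
    if props["hadoop.security.authentication"].lower() != "kerberos":
        findings.append("authentication is not set to kerberos")
    if props["hadoop.security.authorization"].lower() != "true":
        findings.append("service-level authorization is disabled")
    return findings
```

An empty configuration yields both findings, which mirrors the default state described above; a hardened file that sets the two properties to `kerberos` and `true` yields none.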
The services HaaS providers offer in their platforms are both a positive and a negative. HaaS providers can offer a wide variety of features that could include just the Hadoop software or other features, such as virtual machines. This variety can be useful for organizations that want to choose their provider based on precisely what they need and what the provider offers. However, it may also be initially confusing for an organization just starting to consider HaaS.
HaaS providers and features to consider
Amazon was the first major provider of Hadoop as a service. Other providers include the following:
- Microsoft.
- IBM.
- Oracle.
- OpenStack.
- Google.
Features to look for in a HaaS provider include the following:
- Persistent data storage in HDFS -- this avoids issues associated with translating data stored in other formats into HDFS.
- Elasticity to accommodate a wide variety of workloads.
- Ability to recover from processing failures without restarting the entire process, known as nonstop operations.
- A self-configuring environment that enables automatic configuration based on workload.
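Nonstop operations, as described in the list above, mean that a failed piece of work is retried on its own instead of the whole job being restarted. The following is a minimal sketch of that behavior under stated assumptions: `run_job`, its `max_attempts` parameter and the task format (plain Python callables) are hypothetical and do not represent any provider's API.

```python
def run_job(tasks, max_attempts=3):
    """Run every task, retrying only the ones that fail.

    tasks: dict mapping task id -> zero-argument callable.
    Returns (results, attempts), where attempts records how many tries
    each task needed. Completed tasks are never re-run.
    """
    results = {}
    attempts = {}
    for task_id, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            attempts[task_id] = attempt
            try:
                results[task_id] = task()
                break  # this task is done; move on to the next one
            except Exception:
                if attempt == max_attempts:
                    # Give up on the job only after the task has used
                    # all of its attempts.
                    raise
    return results, attempts
```

A task that fails once and then succeeds costs one extra attempt for that task alone, while the results of tasks that already finished are kept.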
Editor's note: This article was revised in 2024 by TechTarget editors to improve the reader experience.