What is Big Data as a Service (BDaaS)?
Thanks to the increased adoption of cloud infrastructure, processing, storing, and analyzing huge amounts of data has never been easier. The big data revolution may have already happened, but it’s Big Data as a Service, or BDaaS, that’s making it a reality for many businesses and organizations.
Essentially, BDaaS is any service that involves managing or running big data on the cloud.
The advantages of BDaaS
There are many advantages to using a BDaaS solution. It makes many aspects of managing a big data infrastructure much easier than doing it yourself.
One of the biggest advantages is that it makes managing large quantities of data possible for medium-sized businesses. Running a big data infrastructure in-house is not only technically and physically challenging, it can also be expensive. With BDaaS solutions that run in the cloud, companies don’t need to stump up cash up front, and operational expenses on hardware can be kept to a minimum. With cloud computing, your infrastructure requirements are covered by a monthly or annual fee.
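To make the cost trade-off concrete, here is a minimal sketch that compares buying hardware up front with paying a flat monthly cloud fee. Every figure and function name below is hypothetical and chosen only for illustration; real pricing varies widely by provider and workload.

```python
import math


def cumulative_cost_on_prem(months, upfront, monthly_opex):
    """Total spend after `months` when hardware is bought up front."""
    return upfront + monthly_opex * months


def cumulative_cost_cloud(months, monthly_fee):
    """Total spend after `months` on a flat monthly subscription."""
    return monthly_fee * months


def break_even_month(upfront, monthly_opex, monthly_fee):
    """First month in which cumulative cloud spend reaches on-prem spend.

    Returns None when the cloud fee is no higher than the on-prem
    running cost, because the cloud option then never catches up.
    """
    if monthly_fee <= monthly_opex:
        return None
    return math.ceil(upfront / (monthly_fee - monthly_opex))


# Hypothetical figures, not real vendor pricing.
UPFRONT = 120_000
ON_PREM_OPEX = 2_000
CLOUD_FEE = 6_000

month = break_even_month(UPFRONT, ON_PREM_OPEX, CLOUD_FEE)
print(f"Cloud spend reaches on-prem spend in month {month}")
```

Under these made-up numbers the cloud option stays cheaper for a couple of years, which is often long enough for a medium-sized business to prove that a big data project is worth the investment.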
However, it’s not just about storage and cost. BDaaS solutions sometimes offer built-in tools for artificial intelligence and analytics, which means you can accomplish some pretty impressive results without needing a huge team of data analysts, scientists and architects around you.
The different models of BDaaS
There are three different BDaaS models. These closely align with the 3 models of cloud infrastructure: IaaS, PaaS, and SaaS.
- Big Data Infrastructure as a Service (IaaS) – Basic data services from a cloud service provider.
- Big Data Platform as a Service (PaaS) – Offerings of an all-round Big Data stack like those provided by Amazon S3, EMR or Redshift. This excludes ETL and BI.
- Big Data Software as a Service (SaaS) – A complete Big Data stack within a single tool.
How does the Big Data IaaS Model work?
A good example of the IaaS model is Amazon’s AWS IaaS architecture, which combines S3 and EC2. Here, S3 acts as a data lake that can store virtually unlimited amounts of structured as well as unstructured data. EC2 acts as a compute layer that can be used to implement a data service of your choice and that connects to the S3 data.
For the data layer you have the option of choosing from among:
- Hadoop – The Hadoop ecosystem can be run on an EC2 instance giving you complete control
- NoSQL Databases – These include MongoDB or Cassandra
- Relational Databases – These include PostgreSQL or MySQL
For the compute layer, you can choose from among:
- Self-built ETL scripts that run on EC2 instances
- Commercial ETL tools that can run on Amazon’s infrastructure and use S3
- Open source processing tools that run on AWS instances, like Kafka
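As an illustration of the "self-built ETL script" option, here is a minimal sketch of an extract, transform, load job. To keep it runnable anywhere, an in-memory dictionary stands in for the S3 data lake; the `data_lake` dictionary, the object keys and the column names are all assumptions made for this example, and a real script would read and write S3 objects from an EC2 instance instead.

```python
import csv
import io

# Hypothetical stand-in for an S3 bucket: object key -> object body.
data_lake = {
    "raw/orders.csv": "order_id,amount,currency\n1,10.50,usd\n2,,usd\n3,7.25,eur\n",
}

FIELDS = ["order_id", "amount", "currency"]


def extract(store, key):
    """Read a CSV object from the store and return its rows as dicts."""
    return list(csv.DictReader(io.StringIO(store[key])))


def transform(rows):
    """Drop rows with a missing amount and normalize types and casing."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "currency": row["currency"].upper(),
        })
    return cleaned


def load(store, key, rows):
    """Write the cleaned rows back to the store as a new CSV object."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    store[key] = buffer.getvalue()


def run_etl(store):
    """Run the three steps in order and return the cleaned rows."""
    rows = transform(extract(store, "raw/orders.csv"))
    load(store, "clean/orders.csv", rows)
    return rows


cleaned_rows = run_etl(data_lake)
print(data_lake["clean/orders.csv"])
```

The raw object is left untouched and the cleaned copy is written under a separate key, which mirrors the common data lake habit of keeping raw and processed data apart.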
How does the Big Data PaaS Model work?
A standard Hadoop cloud-based Big Data Infrastructure on Amazon contains the following:
- Data Ingestion – Log file data from any data source
- Amazon S3 Data Storage Layer
- Amazon EMR – A scalable set of instances that run Map/Reduce against the S3 data.
- Amazon RDS – A hosted MySQL database that stores the results from Map/Reduce computations.
- Analytics and Visualization – Using an in-house BI tool.
A similar setup can be replicated using Microsoft’s Azure HDInsight. Data ingestion can be made easier with Azure Data Factory’s Copy Data tool. Apart from that, Azure offers several storage options, like Data Lake Storage and Blob Storage, that you can use to store results from the computations.
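The flow of the Hadoop-style pipeline described above (ingest logs, run Map/Reduce over them, store the results in a SQL database) can be sketched in plain Python. This is only a conceptual illustration: the log lines are invented, the job runs in a single process rather than on a cluster, and an in-memory SQLite database stands in for the hosted MySQL instance.

```python
import sqlite3
from collections import defaultdict

# Hypothetical log lines standing in for log files kept in the storage layer.
LOG_LINES = [
    "2024-01-01 GET /home 200",
    "2024-01-01 GET /cart 500",
    "2024-01-02 GET /home 200",
    "2024-01-02 POST /cart 200",
]


def map_phase(line):
    """Map: emit a (status_code, 1) pair for each log line."""
    status = line.split()[-1]
    yield status, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the values collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence and return the counts."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


def store_results(conn, counts):
    """Persist the reduced counts in a SQL table for later reporting."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS status_counts "
        "(status TEXT PRIMARY KEY, hits INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO status_counts VALUES (?, ?)", counts.items()
    )
    conn.commit()


counts = run_job(LOG_LINES)
conn = sqlite3.connect(":memory:")  # stand-in for the hosted SQL database
store_results(conn, counts)
rows = conn.execute(
    "SELECT status, hits FROM status_counts ORDER BY status"
).fetchall()
print(rows)
```

A BI tool would then query the results table rather than the raw logs, which is why the pipeline ends in a relational database.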
How does the Big Data SaaS model work?
A fully hosted, complete Big Data stack that covers everything from data storage to data visualization contains the following:
- Data Layer – Data needs to be pulled into a basic SQL database. An automated data warehouse does this efficiently
- Integration Layer – Pulls the data from the SQL database into a flexible modeling layer
- Processing Layer – Prepares the data based on the custom business requirements and logic provided by the user
- Analytics and BI Layer – Fully featured BI abilities which include visualizations, dashboards, and charts, etc.
Azure SQL Data Warehouse and Amazon Redshift are popular SaaS options that offer a complete data warehouse solution in the cloud. Their stack integrates all four layers and is designed to be highly scalable. Google’s BigQuery is another contender that’s great for generating meaningful insights at a strong price-performance ratio.
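To show how the four layers hand data to one another, here is a minimal sketch with one function per layer. It is a toy model under stated assumptions: an in-memory SQLite database plays the SQL database, and the table, the sample rows, the business rule and the text "chart" are all hypothetical. It does not reflect how any particular vendor implements its stack.

```python
import sqlite3


def data_layer():
    """Data layer: land raw records in a basic SQL database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 120.0), ("south", 80.0), ("north", 30.0), ("west", 50.0)],
    )
    return conn


def integration_layer(conn):
    """Integration layer: pull rows out of SQL into plain records."""
    cursor = conn.execute("SELECT region, amount FROM sales")
    return [{"region": region, "amount": amount} for region, amount in cursor]


def processing_layer(records, business_rule):
    """Processing layer: keep records that pass the rule, total by region."""
    totals = {}
    for record in records:
        if business_rule(record):
            region = record["region"]
            totals[region] = totals.get(region, 0.0) + record["amount"]
    return totals


def analytics_layer(totals, unit=10.0):
    """Analytics and BI layer: render a small text bar chart."""
    lines = []
    for region in sorted(totals):
        bar = "#" * int(totals[region] // unit)
        lines.append(f"{region:<6} {bar} {totals[region]:.0f}")
    return "\n".join(lines)


records = integration_layer(data_layer())
# Hypothetical business rule: ignore sales below 50.
totals = processing_layer(records, business_rule=lambda r: r["amount"] >= 50.0)
report = analytics_layer(totals)
print(report)
```

Because the business rule is passed in as a function, the processing step can change without touching the other layers, which is the kind of separation a hosted stack aims to provide.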
Choosing the right BDaaS provider
It sounds obvious, but choosing the right BDaaS provider is ultimately all about finding the solution that best suits your needs.
There are a number of important factors to consider, such as workload, performance, and cost, each of which will have varying degrees of importance for you.
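One simple way to weigh these factors against each other is a weighted score. The sketch below is only an illustration: the provider names, the ratings on a 1 to 5 scale and the weights are all hypothetical, and you would replace them with your own assessment.

```python
def score_provider(ratings, weights):
    """Weighted average of per-criterion ratings (each on a 1-5 scale)."""
    total_weight = sum(weights.values())
    weighted_sum = sum(ratings[name] * weight for name, weight in weights.items())
    return weighted_sum / total_weight


def rank_providers(candidates, weights):
    """Return provider names sorted from best to worst weighted score."""
    return sorted(
        candidates,
        key=lambda name: score_provider(candidates[name], weights),
        reverse=True,
    )


# Hypothetical weights: how much each factor matters to this business.
WEIGHTS = {"workload": 5, "performance": 3, "cost": 2}

# Hypothetical ratings for three made-up providers.
CANDIDATES = {
    "provider_a": {"workload": 4, "performance": 3, "cost": 5},
    "provider_b": {"workload": 5, "performance": 4, "cost": 2},
    "provider_c": {"workload": 2, "performance": 5, "cost": 4},
}

ranking = rank_providers(CANDIDATES, WEIGHTS)
for name in ranking:
    print(name, round(score_provider(CANDIDATES[name], WEIGHTS), 2))
```

Changing the weights changes the ranking, which is the point: the best provider depends on which factor matters most to you.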
Here are 3 ways you might approach a BDaaS solution:
Core BDaaS
Core BDaaS uses a minimal platform like Hadoop with YARN and HDFS, plus other services like Hive. This service has gained popularity among companies that use it for irregular workloads or as part of their larger infrastructure. It might not be as performance intensive as the other two categories.
A prime example would be Elastic MapReduce, or EMR, provided by AWS. It integrates freely with NoSQL stores, S3 storage, DynamoDB and similar services. Given its generic nature, EMR can be combined with other services to build anything from simple data pipelines to a complete infrastructure.
Performance BDaaS assists businesses that are already employing a cluster-computing framework like Hadoop to further optimize their infrastructure as well as the cluster performance. Performance BDaaS is a good fit for companies that are rapidly expanding and do not wish to be burdened by having to build a data architecture and a SaaS layer.
The benefit of outsourcing the infrastructure and platform is that companies can focus on specific processes that add value instead of concentrating on complicated Big Data infrastructure. For instance, there are many third-party solutions built on top of the Amazon or Azure stack that let you outsource your infrastructure and platform requirements to them.
If your business is in need of additional features that may not be within the scope of Hadoop, Feature BDaaS may be the way forward. Feature BDaaS focuses on productivity as well as abstraction. It is designed to enable users to be up and using Big Data quickly and efficiently.
Feature BDaaS combines both PaaS and SaaS layers. This includes web/API interfaces and database adapters that offer a layer of abstraction from the underlying details. Businesses don’t have to spend resources and manpower setting up the cloud infrastructure. Instead, they can rely on third-party vendors like Qubole and Altiscale that are designed to get it up and running quickly and efficiently on AWS, Azure or the cloud vendor of your choice.
Additional Tips for Choosing a Provider
When evaluating a BDaaS provider for your business, cost reduction and scalability are important factors. Here are a few tips that should help you choose the right provider.
- Low or Zero Startup Costs – A number of BDaaS providers offer a free trial period. Therefore, theoretically, you can start seeing results before you even commit a dollar.
- Scalable – Growth in scale is in the very nature of a Big Data project. The solution should be easy and affordable to scale, especially in terms of storage and processing resources.
- Industry Footprint – It is a good idea to choose a BDaaS provider that already has experience in your industry. This is doubly important if you are also using them for consultancy and project planning requirements.
- Real-Time Analysis and Feedback – The most successful Big Data projects today are those that can provide almost immediate analysis and feedback. This helps businesses to take remedial action instantly instead of working off of historical data.
- Managed or Self-Service – Most BDaaS providers today offer a mix of managed and self-service models based on the company’s needs. It is common to find a host of technical staff working in the background to provide the client with services as needed.
The value of big data is not in the data itself, but in the insights that can be drawn after processing it and running it through robust analytics. This can help to guide and define your decision making for the future.
A quick tip with regard to using Big Data: keep it small at the initial stages. This ensures the data can be checked for accuracy and that the metrics derived from it are correct. Once confirmed, you can go ahead with larger and more complex data projects.
Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Oracle, Zend, CheckPoint and Ixia. Gilad is a 3-time winner of international technical communication awards, including the STC Trans-European Merit Award and the STC Silicon Valley Award of Excellence.
Over the past 7 years Gilad has headed Agile SEO, which performs strategic search marketing for leading technology brands. Together with his team, Gilad has done market research, developer relations and content strategy in 39 technology markets, lending him a broad perspective on trends, approaches and ecosystems across the tech industry.