If you currently use Microsoft's Hadoop offering, you'll be pleased to hear about several new features and enhancements being added. Earlier this month, Microsoft announced that its cloud-based Hadoop service, Azure HDInsight, would receive a security upgrade and a performance boost. Microsoft claims these changes will provide greater security for users while also speeding up Big Data queries by as much as 25x.
A Little Bit About Hadoop
Hadoop is an open-source software framework that's used for distributed storage and processing of Big Data on computer clusters. It's a relatively new technology with roots dating back to 2003, when Google published a paper on the Google File System that described the underlying ideas. It wasn't until 2006, however, that Apache began to turn these ideas into a working, functional project. Hadoop 0.1.0 was released in April 2006.
Hadoop is written primarily in the Java programming language, although the framework also includes some C code and command-line utilities.
According to Wikipedia, the Hadoop framework is composed of the four following modules:
Hadoop Common – contains libraries and utilities for use by other Hadoop modules.
Hadoop Distributed File System (HDFS) – a distributed file system that stores data on the machines in a cluster; it offers high aggregate bandwidth across that cluster.
Hadoop YARN – a resource management system that manages the resources in a cluster and schedules processes and applications.
Hadoop MapReduce – an implementation of the MapReduce programming model that's used for large-scale data processing (e.g. Big Data).
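To make the MapReduce idea a little more concrete, here is a minimal sketch of the programming model in plain Python. It does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling `word_count(["big data", "Big clusters"])` counts each word across both lines. In a real cluster, the map and reduce steps would run in parallel on many machines, with the framework handling the grouping in between.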
But one of the problems faced by Hadoop is security (or lack thereof). According to a recent survey of 158 executives, nearly half expressed concerns regarding the security of Hadoop.
This doesn't necessarily mean that you should stop using Hadoop, however. The publicly available version may lack the security features needed for enterprise use, but there are other, more secure Hadoop offerings out there. Microsoft, for instance, offers Azure HDInsight, which boasts several security upgrades over the default distribution. And thanks to the introduction of even more security features, HDInsight will likely become an even more popular choice among enterprise users.
Microsoft's Hadoop Upgrade: What You Should Know
While presenting at the Strata + Hadoop World conference, Microsoft announced several security and performance enhancements to its Azure HDInsight Hadoop service. As explained on its blog, Microsoft has taken these steps to provide enterprise-level users with the “highest levels of security.” So, what new security and performance features can you expect to see in Azure HDInsight?
One of several security features being added to Azure HDInsight is server-side data encryption. All data processed by Microsoft's Hadoop service can be encrypted. This feature is said to work transparently with Azure HDInsight, meaning users are not required to take any additional steps. Users of Azure Data Lake Store can either rely on service-managed encryption keys or manage their own keys in Azure Key Vault, a service that safeguards users' keys and can back them with hardware security modules.
Microsoft is also adding Apache Ranger, a central policy and management portal where system administrators can manage user access to their Hadoop systems and components. This is another means of enhancing security for Azure HDInsight users. Microsoft has also stated that users can analyze their audit records in the Apache Ranger interface.
A third security feature coming to Azure HDInsight is streamlined authentication and identity management. Authentication is somewhat of a double-edged sword. It's necessary to prevent cyber intrusions, but it can also slow down enterprises and their workers. Microsoft is looking to make the process a little easier by integrating Azure Active Directory and Active Directory Domain Services for authentication and identity management, all of which is accomplished in just a few clicks. This means users can secure their Hadoop clusters more quickly, allowing them to focus their time and attention on other tasks. According to Microsoft, this feature will also improve ease of use for existing Active Directory deployments.
“...we are pleased to announce new capabilities in Azure HDInsight, Microsoft’s managed Hadoop and Spark cloud services, that build on our leadership to make Hadoop enterprise-ready in the cloud and easy for your users with the most security capabilities of any cloud Hadoop solution, big data query speeds that approach data warehousing performance, and new notebook experiences for data scientists all on the latest Hortonworks Data Platform 2.5 and Spark 2.0 platform,” wrote Tiffany Wissner, Microsoft's Senior Director for Product Marketing, in a blog post announcing the new features.
These new Azure HDInsight security features are expected to launch in October.
In addition to the security enhancements mentioned above, Microsoft is also implementing several performance upgrades to its Azure HDInsight service. The company claims it's the first cloud Hadoop solution to use Live Long and Process (LLAP) from the Stinger.next initiative. As such, Microsoft says Azure HDInsight users can expect queries to run up to 25x faster than before, which is pretty impressive to say the least.
How exactly does LLAP work? LLAP essentially keeps data cached in memory in compressed form while still retaining scalability within a Hadoop cluster. It also comes with enhancements to the Hive execution engine, such as Smart Map Joins and Better MapJoin.
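For readers unfamiliar with map joins, the general idea is to load the smaller table into an in-memory hash table and stream the larger table past it, so the join needs no separate shuffle step. The sketch below shows that idea in plain Python. It is not Hive's implementation, and the function name `map_join` and the sample tables are hypothetical.

```python
def map_join(small_table, large_table, key):
    """Join two lists of dict rows on `key`, holding only the small side in memory."""
    # Build a hash table from the small table: join key -> list of matching rows.
    lookup = {}
    for row in small_table:
        lookup.setdefault(row[key], []).append(row)

    # Stream the large table and emit one merged row per match (inner join).
    for row in large_table:
        for match in lookup.get(row[key], []):
            yield {**match, **row}


# Hypothetical sample data: a small dimension table and a larger fact table.
regions = [{"region_id": 1, "region": "EU"}, {"region_id": 2, "region": "US"}]
sales = [
    {"region_id": 1, "amount": 10},
    {"region_id": 3, "amount": 5},
    {"region_id": 2, "amount": 7},
]
joined = list(map_join(regions, sales, "region_id"))
```

Here the sale with `region_id` 3 has no matching region and is dropped, as in any inner join. The approach only pays off when the small table comfortably fits in memory.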
Security is a top concern among enterprise users of Hadoop services, and for good reason: there's been a disturbing trend of increased cyber attacks and cyber intrusion in recent years. Hadoop has its own built-in safeguards, but Microsoft is planning to enhance the security for its users with the features mentioned above.
These are just a few of the many ways that Microsoft is planning to improve its Azure HDInsight Hadoop service. Be sure to check back with our blog here for more news surrounding HDInsight and other Hadoop offerings.
Thanks for reading, and feel free to share your thoughts on Hadoop in the comments below.