We are thrilled to announce that Hortonworks Data Platform (HDP) version 3.0 is now available for early access. HDP 3.0 delivers new capabilities for the enterprise: agile application deployment, new machine learning and deep learning workloads, a real-time database, and stronger security and governance. It is a key component of the modern architecture and can be deployed both on-premises and in the cloud. Many of the enhancements in HDP 3.0 are based on Apache Hadoop 3.1, including containerization, GPU support, erasure coding and NameNode federation. HDP 3.0 is:

  • Agile: Application deployment via containerization lets apps launch quickly, saving users time and resources. Containerization makes it possible to run multiple versions of an application side by side, so developers can build and test new versions of services without disrupting the old ones and deliver new features faster. It also improves resource utilization and increases task throughput for containers. The end result: faster time to market for services and increased developer productivity.
  • Deep learning ready: Support for deep learning applications lets customers run machine learning and deep learning workloads that require substantial, and expensive, GPU resources. GPU pooling shares GPU resources across more workloads for cost effectiveness. With GPU isolation, a GPU can be dedicated to a single application so that no other application has access to it.
  • Cloud optimized: HDP is optimized for the cloud, with automated cloud provisioning that simplifies big data deployments while optimizing the use of cloud resources. The platform includes engineered support for all of the major cloud object stores: Amazon S3, Azure Data Lake Store (ADLS), Azure Storage Blob, and Google Cloud Storage (GCS, in technical preview). HDP is cloud agnostic: customers can use Cloudbreak to easily provision HDP clusters on their cloud provider of choice. In addition, there are cloud service connectors, including Apache HBase with S3 (technical preview) and Apache Spark with S3Guard for higher query performance.
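To make the side-by-side versioning idea concrete, here is a minimal sketch that builds YARN service specifications for two versions of the same containerized app. The field names (`name`, `version`, `components`, `number_of_containers`, `artifact`, `launch_command`, `resource`) follow the Apache Hadoop 3.1 YARN Services API as we understand it; the service name, Docker image tags, launch command and resource sizes are hypothetical placeholders, so check the Hadoop documentation for your release before relying on them.

```python
import json


def service_spec(version: str, image: str, instances: int = 2) -> dict:
    """Build a YARN service spec for one version of a containerized web app.

    The service name, image, launch command and sizes are hypothetical.
    """
    return {
        # A distinct name per version lets the old and new services run side by side.
        "name": "web-app-v" + version.replace(".", "-"),
        "version": version,
        "components": [
            {
                "name": "web",
                "number_of_containers": instances,
                # Launch the component from a Docker image instead of a host binary.
                "artifact": {"id": image, "type": "DOCKER"},
                "launch_command": "/usr/bin/start-web",  # hypothetical entry script
                "resource": {"cpus": 1, "memory": "256"},  # memory in MB
            }
        ],
    }


old = service_spec("1.0", "example/web-app:1.0")
new = service_spec("2.0", "example/web-app:2.0")
print(json.dumps(new, indent=2))
```

Each spec could then be saved to a file and submitted with `yarn app -launch <service-name> <spec-file>`; because the two services have distinct names, version 1.0 keeps serving while version 2.0 is being tested.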
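The difference between GPU pooling and GPU isolation can be illustrated with a toy allocator. This is a hypothetical model written for illustration, not how YARN implements it (YARN in Hadoop 3.1 schedules GPUs as the `yarn.io/gpu` resource type): any application can draw GPUs from the shared pool, but each GPU it receives is held exclusively until it is released.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GpuPool:
    """Hypothetical shared GPU pool with exclusive (isolated) assignment."""

    free: set[int]                                        # GPU ids nobody holds
    owners: dict[int, str] = field(default_factory=dict)  # GPU id -> owning app

    def allocate(self, app: str, count: int) -> list[int]:
        """Pooling: take `count` GPUs from the shared pool for `app`."""
        if count > len(self.free):
            raise RuntimeError(f"{app} wants {count} GPUs, only {len(self.free)} free")
        gpus = sorted(self.free)[:count]
        for gpu in gpus:
            self.free.remove(gpu)
            self.owners[gpu] = app  # isolation: no other app can be given this GPU
        return gpus

    def release(self, app: str) -> None:
        """Return every GPU held by `app` to the shared pool."""
        for gpu, owner in list(self.owners.items()):
            if owner == app:
                del self.owners[gpu]
                self.free.add(gpu)


pool = GpuPool(free={0, 1, 2, 3})
train = pool.allocate("training-job", 2)    # [0, 1]
infer = pool.allocate("inference-job", 2)   # [2, 3]
pool.release("training-job")                # GPUs 0 and 1 go back to the pool
notebook = pool.allocate("notebook", 1)     # [0]
```

Pooling keeps expensive GPUs busy by handing freed devices to the next workload, while the `owners` map guarantees that two applications never share a device at the same time.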

A Few Other Enhancements for HDP 3.0

We continue to invest in innovation and improved usability in HDP to make it more enterprise-ready and secure for our customers. For example, the newest release of Apache Hive, 3.0, brings many improvements to query performance and throughput. We added further enhancements for Apache Spark and data science workloads, including a technical preview of TensorFlow. Furthermore, governance is becoming more granular: tags on metadata let you follow data through the ecosystem across the entire enterprise.

Other additional capabilities include:

  • Scalability and availability with NameNode federation, allowing customers to scale to thousands of nodes and a billion files. Multiple NameNodes with standby capabilities provide higher availability, keeping cluster operations running without disruption if a NameNode goes down.
  • Lower total cost of ownership with erasure coding, a data protection method that until now has mostly been found in object stores. With erasure coding, Hadoop 3 no longer has to store three full copies of each piece of data across its clusters. Instead of that 3x hit on storage, erasure coding in Hadoop 3 incurs an overhead of 1.5x while maintaining the same level of data recoverability from disk failure. The end result is a 50% reduction in raw storage consumption.
  • Real-time database, delivering improved query optimization to process more data at a faster rate by eliminating the performance gap between low-latency and high-throughput workloads. Enabled by Apache Hive 3.0, HDP 3.0 offers the only unified SQL solution that can seamlessly combine real-time and historical data, making both available for deep SQL analytics. New features such as workload management enable fine-grained resource allocation, so there is no need to worry about resource competition. Materialized views pre-compute and cache intermediate tables as views that the query optimizer uses automatically, drastically improving performance. The end result is faster time to insights.
  • Data science performance improvements around Apache Spark and Apache Hive integration. HDP 3.0 provides seamless Spark integration with the cloud. The containerized TensorFlow technical preview, combined with GPU pooling, delivers a deep learning framework that makes deep learning faster and easier.
  • Enhanced security and governance, promoting greater regulatory compliance, including GDPR, through full chain of custody of data as well as fine-grained auditing of events. These new features offer the unique ability to track the lineage of data from its origin to the data lake. They also let auditors view data without making changes, support time-based policies, and audit events around third parties with encryption protection.
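Federation splits the file system namespace across several independent NameNodes. One common way for clients to still see a single namespace is a client-side mount table (HDFS calls this ViewFs) that routes each path to the NameNode that owns it. The function below is a simplified, hypothetical model of that routing using longest-prefix matching; the mount points and NameNode addresses are made up, and real ViewFs mount tables live in Hadoop's configuration files rather than in a Python dict.

```python
from __future__ import annotations


def resolve(mount_table: dict[str, str], path: str) -> str:
    """Return the NameNode that owns `path`, using the longest matching mount point."""
    best: str | None = None
    for mount in mount_table:
        # "/data" matches "/data" and "/data/x" but not "/database".
        if path == mount or path.startswith(mount.rstrip("/") + "/"):
            if best is None or len(mount) > len(best):
                best = mount
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    return mount_table[best]


# Hypothetical mount table: each subtree of the namespace lives on its own NameNode.
mounts = {
    "/user": "hdfs://nn1.example.com:8020",
    "/data": "hdfs://nn2.example.com:8020",
    "/data/logs": "hdfs://nn3.example.com:8020",
}
print(resolve(mounts, "/user/alice/notes.txt"))  # hdfs://nn1.example.com:8020
print(resolve(mounts, "/data/sales/2018.csv"))   # hdfs://nn2.example.com:8020
print(resolve(mounts, "/data/logs/app.log"))     # hdfs://nn3.example.com:8020
```

Because each NameNode only holds the metadata for its own subtrees, adding NameNodes adds metadata capacity, which is what lets a federated cluster grow toward billions of files.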
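The 1.5x erasure coding figure can be checked with simple arithmetic. A Reed-Solomon scheme with k data cells and m parity cells stores (k + m) / k bytes per byte of user data. The sketch below assumes the RS-6-3 layout used by HDFS's built-in `RS-6-3-1024k` policy, which is one way to arrive at 1.5x; the post itself does not name a specific policy.

```python
from fractions import Fraction


def replication_footprint(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data under n-way replication."""
    return Fraction(copies)


def ec_footprint(data_units: int, parity_units: int) -> Fraction:
    """Raw bytes stored per byte of user data under a Reed-Solomon (k, m) scheme."""
    return Fraction(data_units + parity_units, data_units)


rep = replication_footprint(3)  # 3x: survives the loss of 2 of its 3 copies
rs = ec_footprint(6, 3)         # (6 + 3) / 6 = 1.5x: survives the loss of any 3 of 9 cells
savings = 1 - rs / rep          # fraction of raw storage no longer needed
print(f"3x replication: {float(rep)}x, RS-6-3: {float(rs)}x, saved: {float(savings):.0%}")
```

This prints `3x replication: 3.0x, RS-6-3: 1.5x, saved: 50%`. On a cluster, a policy is applied per directory, for example with `hdfs ec -setPolicy -path <dir> -policy RS-6-3-1024k`.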
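The materialized view behavior described above, where the optimizer automatically answers a query from a pre-computed result, can be sketched with a toy planner. Everything here (the `ToyPlanner` class, the sales rows, the per-region sum) is hypothetical and only mirrors the idea; Hive itself does this with `CREATE MATERIALIZED VIEW` and cost-based query rewriting.

```python
from __future__ import annotations

from collections import defaultdict


def aggregate(rows: list[tuple[str, str, float]]) -> dict[str, float]:
    """The expensive step: scan every row and sum the amount per region."""
    totals: dict[str, float] = defaultdict(float)
    for _day, region, amount in rows:
        totals[region] += amount
    return dict(totals)


class ToyPlanner:
    """Hypothetical planner that answers from a materialized view when one exists."""

    def __init__(self, rows: list[tuple[str, str, float]]) -> None:
        self.rows = rows
        self.view: dict[str, float] | None = None
        self.scans = 0  # how many times the base table was fully scanned

    def create_materialized_view(self) -> None:
        # Pre-compute and cache the aggregate once.
        self.view = aggregate(self.rows)
        self.scans += 1

    def total_for(self, region: str) -> float:
        if self.view is not None:
            # Automatic rewrite: the query is answered from the cached result.
            return self.view.get(region, 0.0)
        self.scans += 1  # no view available: fall back to a full scan
        return aggregate(self.rows).get(region, 0.0)


rows = [("2018-06-01", "east", 120.0), ("2018-06-01", "west", 80.0), ("2018-06-02", "east", 30.0)]
planner = ToyPlanner(rows)
before = planner.total_for("east")  # 150.0, computed by scanning the rows
planner.create_materialized_view()
after = planner.total_for("east")   # 150.0, served from the view
west = planner.total_for("west")    # 80.0, also served from the view
```

This toy ignores staleness: when base data changes, a real view must be refreshed (in Hive 3, with `ALTER MATERIALIZED VIEW ... REBUILD`) before the optimizer can safely use it.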


At this time, Hortonworks would like to thank everyone in the Apache community for all of their efforts and contributions. The pace of innovation continues to be truly amazing. We are grateful for the opportunity to work with such a unique group of dedicated professionals, and we look forward to continuing our work with you and the ever-vibrant community! We would love to hear your feedback.


For further information, please check out: What’s new in Hortonworks Data Platform?

