Cloudera Altus is a suite of products that enables you to deploy and consume Cloudera Enterprise clusters in the public cloud environment of your choice. With Cloudera Altus cloud services such as Altus Data Engineering or Altus Data Warehouse, Cloudera creates and manages clusters on your behalf, using your cloud provider’s infrastructure resources, freeing you to focus on your workloads and their business value. Altus cloud services create and manage resources in your account, not a Cloudera account, putting you solidly in control. You retain all access control over your data, while Cloudera takes care of keeping the cluster running in good health.
When you’ve finished this blog post you will understand the technical details of the controls used to set up your cloud environment, which implement security best practices while still allowing Cloudera Altus cloud services access to that environment.
Because security is important for any cloud service, we’re going to go into detail on how you secure your compute and data when using Cloudera Altus. Fortunately for you, Altus cloud services greatly simplify this process, providing wizards and templates that handle most of the technical complexity. If you have wondered exactly what is happening at a more detailed level, read on.
The Infrastructure Construction Problem
The first problem is how a Cloudera Altus customer can control which resources and services Altus has access to, while also managing access within their own organizational structure.
From this perspective, Altus is simply an application in some other account (AWS) or tenant (Azure) that needs access to the resources in an account (AWS) or subscription (Azure) under your control. Following normal security best practices, you will want to grant Altus the minimum permissions necessary to build and manage clusters, and nothing more.
The Data Access Problem
The second problem is how you can control the access that the resources created by Altus have to data within your cloud provider account. After all, Altus has built some machines in your cloud account, and those machines will need access to the data within that account. How can that access be controlled without putting credentials on those machines?
Although implemented differently in each cloud provider, the solution to the infrastructure construction problem comes down to the following steps, which Altus users take. These user-initiated steps are complemented by tasks performed by the cloud provider, such as authenticating Altus when it makes a request or interpreting authorization policies.
- Establish trust between your account or tenant (the trustor) and Altus (the trustee)
- Specify the actions Altus will be authorized to perform (typically listing and creating machines and network technologies; no data access is required)
- Determine the cloud resource scope on which such actions may be performed
- Manage which of your Altus users may access which of your cloud resources
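The four steps above can be pictured as a small data model. The sketch below is purely illustrative: the class and field names (`Environment`, `allowed_actions`, `scope`, `authorized_users`) and the sample values are assumptions made for this example and are not the Altus API. In practice these checks are carried out by the cloud provider and by Altus itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical model of an Altus Environment (not the real Altus API)."""
    trusted_principal: str                     # step 1: the Altus identity you trust
    allowed_actions: frozenset                 # step 2: actions Altus may perform
    scope: str                                 # step 3: where those actions apply
    authorized_users: frozenset = frozenset()  # step 4: your users who may use it

    def altus_may(self, principal: str, action: str, resource: str) -> bool:
        """True only if the caller is trusted, the action is listed, and the resource is in scope."""
        prefix = self.scope.rstrip("/") + "/"
        in_scope = resource == self.scope or resource.startswith(prefix)
        return (
            principal == self.trusted_principal
            and action in self.allowed_actions
            and in_scope
        )

    def user_may_use(self, user: str) -> bool:
        """True if this user has been granted access to the Environment."""
        return user in self.authorized_users


# Sample values, assumed for illustration only.
dev = Environment(
    trusted_principal="altus-app",
    allowed_actions=frozenset({"machines/list", "machines/create"}),
    scope="/subscriptions/dev-subscription",
    authorized_users=frozenset({"alice"}),
)
```

Note that no data-access action appears in `allowed_actions`: in this model, as in the steps above, Altus can build machines but cannot read your data.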
Within Altus, the object that encapsulates the whole security relationship is an Environment. As a Cloudera Altus customer, you construct and control Environments. In particular, the Altus Administrator role has this control, and that role may be allocated to one or more users in your organization. The Altus UI includes visual wizards to make it simple to create all the necessary artifacts.
Most organizations will have several Environments, each one establishing different policies and trust, both across different cloud providers and across different scopes within a single cloud provider. For example, a customer might have one Environment per AWS region or per Azure subscription, or one Environment for each kind of use: development, test, production, project, or division. Multiple Environments give administrators fine-grained control over allocating cloud provider resources to your Altus users.
Azure uses Azure Active Directory to establish trust. Altus requests consent for an application to access a subscription.
Authorization is achieved by assigning the Contributor role in your Azure subscription to the registered Cloudera Altus application. This permits the Cloudera Altus application to list, create, and destroy resources within that subscription.
You can narrow the scope of the Cloudera Altus application’s control further, either by assigning the Contributor role only on specific resources within the subscription, or by creating a custom role and assigning that to the Cloudera Altus application at the subscription scope level. These options are described more fully in the Altus documentation.
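As a rough illustration of the custom-role option, the sketch below builds a role definition in the JSON shape Azure uses for custom roles (`Name`, `IsCustom`, `Actions`, `AssignableScopes`, and so on). The role name, the action list, and the subscription ID are placeholders chosen for this example. The permissions Altus actually requires are listed in the Altus documentation, not here.

```python
import json


def resource_group_scope(subscription_id: str, resource_group: str) -> str:
    """Build the Azure scope string for a single resource group."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"


def custom_role(name: str, actions: list, scopes: list) -> dict:
    """Return a role definition dict in Azure's custom-role JSON format."""
    return {
        "Name": name,
        "IsCustom": True,
        "Description": "Illustrative role; not the official Altus permission set.",
        "Actions": actions,
        "NotActions": [],
        "DataActions": [],      # no data-plane access is granted
        "NotDataActions": [],
        "AssignableScopes": scopes,
    }


# Placeholder values, assumed for illustration only.
subscription_id = "00000000-0000-0000-0000-000000000000"
role = custom_role(
    name="Altus Cluster Manager (example)",
    actions=[
        "Microsoft.Compute/virtualMachines/*",
        "Microsoft.Network/virtualNetworks/read",
    ],
    scopes=[f"/subscriptions/{subscription_id}"],
)

print(json.dumps(role, indent=2))
print(resource_group_scope(subscription_id, "altus-quickstart"))
```

The printed JSON could be saved to a file and supplied to `az role definition create --role-definition`. The resource group scope string shows the other option: keeping the built-in Contributor role but assigning it at a narrower scope than the whole subscription.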
The technique used to solve the data access problem in AWS is very similar to the one used in Azure. When a machine is created in an account, the cloud provider can attach policies to it that allow any user on that machine some kind of access to the resources in the account, as determined by the policy. The exact mechanics differ, but that’s the general solution.
Azure uses a Managed Service Identity (MSI) to solve the data access problem. In particular, it uses a user-assigned MSI as the identity of the code running on the machine. This MSI is added to the tenant trusted by the subscription in which the machine is running. Role-based access control is then applied to the data within ADLS to control what access that identity has to specific data.
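To make the "no credentials on the machine" point concrete: code running on an Azure VM with a managed identity can ask the local Azure Instance Metadata Service for an access token. The sketch below only constructs that request and does not send it, since the endpoint is reachable only from inside an Azure VM. The client ID is a placeholder, and whether Altus clusters obtain tokens in exactly this way is an assumption, not something this post states.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Link-local metadata endpoint, reachable only from inside an Azure VM.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def msi_token_request(client_id: str,
                      resource: str = "https://datalake.azure.net/") -> Request:
    """Build (but do not send) a token request for a user-assigned identity."""
    query = urlencode({
        "api-version": "2018-02-01",
        "resource": resource,     # the service the token is for (here, ADLS)
        "client_id": client_id,   # selects the user-assigned identity
    })
    # The "Metadata: true" header is required by the metadata service.
    return Request(f"{IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"})


# Placeholder client ID, assumed for illustration only.
req = msi_token_request("11111111-1111-1111-1111-111111111111")
print(req.full_url)
```

Nothing secret appears in the request: the platform vouches for the machine, and the token it returns carries only the access granted to that identity.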
At the time of publication, an MSI cannot be created using the Azure Portal. It must be created using one of the following: the CLI, PowerShell, a Resource Manager template, or REST. Here’s an example using the CLI:
First, log in to the appropriate subscription:
az login
Note, we have launched a browser for you to login. For old experience with device code, use "az login --use-device-code"
You have logged in. Now let us find all subscriptions you have access to...
"name": "[email protected]",
"name": "[email protected]",
Then create the MSI within the subscription:
az identity create -g altus-quickstart -n altus-quickstart-msi
You can then use the Data Explorer to assign the MSI the appropriate privileges on the relevant ADLS folder hierarchy. Permissions must be set from the ADLS root down to the relevant folders or files of interest.
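The "from the root down" requirement follows from the POSIX-style permission model of ADLS: reading a folder's contents needs read and execute on that folder, plus execute on every folder above it. Below is a minimal sketch of which entries would be needed for a given path. The folder path is hypothetical, and the helper is for illustration only; it is not an Azure API.

```python
from pathlib import PurePosixPath


def required_acl_entries(target_folder: str) -> list:
    """List (folder, permission) pairs needed to read the contents of target_folder.

    Every ancestor needs execute (--x) so the identity can traverse it;
    the target itself needs read + execute (r-x) to list and open its contents.
    """
    target = PurePosixPath(target_folder)
    # parents yields nearest-first (/a/b, /a, /), so reverse to start at the root.
    ancestors = list(reversed(target.parents))
    entries = [(str(folder), "--x") for folder in ancestors]
    entries.append((str(target), "r-x"))
    return entries


# Hypothetical folder path, assumed for illustration only.
for folder, perm in required_acl_entries("/altus/input/logs"):
    print(f"{perm}  {folder}")
```

Files inside the target folder additionally need read permission of their own. Granting only traverse rights on the ancestors keeps the identity from listing or reading anything outside the hierarchy you intend to share.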
Cloudera Altus cloud services provide a multi-cloud PaaS solution designed to automate massive-scale data engineering and data warehouse workloads in your public cloud, without the headache of managing the infrastructure yourself. They give you complete control over which cloud resources Altus clusters can use, without giving Altus access to the data in your cloud account.
Toby Ferguson works on the Cloud Customer Success team at Cloudera.