Access controls to Kubernetes deployments

[This is the second blog in a series on #Kuberneteslearnings by Freshworks engineers. No more Kubernetes horror stories!]

Once we decided to move to Kubernetes (or Amazon EKS) and replace all our machines (or EC2 Instances) with pods, we had to ensure access to this cluster for developers. At the same time, we needed to introduce some level of control as well, so we could ensure specific users would have a specific amount of access to the system.

As mentioned in our first blog in this series, we were heavy users of AWS OpsWorks. Integrating Identity and Access Management (IAM) users with the OpsWorks stack was easy to manage, because OpsWorks provides an interface to map IAM users to its stack. All we had to do was provide a bastion where users could log in and access the OpsWorks Instances based on the privileges assigned to them. This also gave us an audit trail of IAM users accessing OpsWorks Instances.

Unfortunately, EKS did not provide the same level of integration or ability to have IAM users access the Kubernetes API. This led us to solve for the following challenges:

  • Keep the bastion in place, as we were still in hybrid mode (OpsWorks and EKS);
  • Provide easy mechanisms to manage user access to the clusters, similar to OpsWorks;
  • Provide auditable workflows, so that every user access can be traced.

Solution at 20,000 ft

In AWS, access is defined using roles, which makes roles the first-class mechanism for accessing any resource in AWS. So the solution was to map each user to a role, and carry that role over to access the Kubernetes (EKS) cluster.

From the bastion, the user could log in and assume the role created for them so as to access the EKS cluster. The user’s role would be bound to a ClusterRole (or Role), using ClusterRoleBinding (or RoleBinding).

This article will further dive into the implementation details of the cluster access mechanism described in the above paragraphs.

Defining cluster user categories

To control access to the cluster we saw it fit to have three categories at the EKS cluster level that would translate to ClusterRole:

  1. Admin: All privileges including ClusterRole and Role manipulation privileges;
  2. Power user: All privileges except for ClusterRole and Role manipulation privileges;
  3. User: Read access and exec privileges.

We decided to also have three categories at the Namespace level that would translate to Role:

  1. Admin: All privileges on Namespace, including Role manipulation privileges;
  2. Power user: All privileges on Namespace, except for Role manipulation privileges;
  3. User: Read access to resources, and also exec privileges.

So users’ access privileges to the cluster are defined by their binding to one of the above (done using ClusterRoleBinding at the cluster level or RoleBinding at the namespace level).
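As a rough illustration, the three cluster-level categories could be expressed as ClusterRole manifests along the following lines. This is a minimal sketch, not our actual rule set: the role names follow the `k8s-<category>@<cluster>` pattern that appears later in this post (`k8s-user` is an assumed name), and `POWER_USER_API_GROUPS` is a hypothetical list. Kubernetes RBAC is additive only, so "all privileges except Role manipulation" has to be written as an explicit list of API groups that leaves out `rbac.authorization.k8s.io`.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"

# Hypothetical list of API groups a power user may touch. RBAC has no deny
# rules, so "everything except RBAC" must be enumerated group by group.
POWER_USER_API_GROUPS = ["", "apps", "batch", "autoscaling", "networking.k8s.io"]


def cluster_role(name, rules):
    """Return a ClusterRole manifest as a plain dict."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": rules,
    }


def build_cluster_roles(cluster):
    """Build the admin, power user and user ClusterRoles for one cluster."""
    admin = cluster_role(
        f"k8s-admin@{cluster}",
        [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}],
    )
    power_user = cluster_role(
        f"k8s-poweruser@{cluster}",
        [{"apiGroups": POWER_USER_API_GROUPS, "resources": ["*"], "verbs": ["*"]}],
    )
    user = cluster_role(
        f"k8s-user@{cluster}",  # assumed name for the "User" category
        [
            # Read access to resources.
            {"apiGroups": ["*"], "resources": ["*"], "verbs": ["get", "list", "watch"]},
            # Exec into pods is a "create" on the pods/exec subresource.
            {"apiGroups": [""], "resources": ["pods/exec"], "verbs": ["create"]},
        ],
    )
    return [admin, power_user, user]


if __name__ == "__main__":
    print(json.dumps(build_cluster_roles("hyades"), indent=2))
```

The namespace-level categories would look the same with `kind: Role` and a `namespace` in the metadata.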

Lambda and User Roles

As mentioned above, we need to create multiple ClusterRoles on each EKS cluster to bucket users according to their access privileges. And since we have to spin up many clusters it would become hard to manually create and manage these over time. So we created an AWS Lambda function that creates these ClusterRoles on every EKS cluster.

To bind an IAM user to a ClusterRole, or rather to add them to a ClusterRoleBinding, we extended the Lambda to do the job. The Lambda now not only creates ClusterRoles but also creates the corresponding IAM Groups (with the same name as the ClusterRole). Why IAM Groups? Because they act like buckets: when a user is added to an IAM Group, the Lambda adds that user to the appropriate ClusterRoleBinding.

The IAM user alone is not enough, however, because a user’s identity cannot be carried across to access AWS resources, whereas a role can be assumed. So the Lambda also takes care of creating a User Role for each user. It is this IAM User Role that is present in the ClusterRoleBinding.

This Lambda does much of the magic, which includes doing the same work at the Namespace level too, wherein it would work with Role and RoleBindings in a similar fashion as with ClusterRoles, specified in the above section.
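To make the binding step concrete, here is a minimal sketch of the kind of manifest the Lambda could generate. The function names are hypothetical. The sketch also assumes that the IAM Role reaches Kubernetes as a user of the same name through an entry in the EKS `aws-auth` ConfigMap (`mapRoles`); that mapping is an assumption for illustration and is not described in this post.

```python
RBAC_GROUP = "rbac.authorization.k8s.io"


def binding(role_name, iam_roles, namespace=None):
    """Bind Kubernetes users (named after IAM Roles) to a ClusterRole or Role.

    Without a namespace the result is a ClusterRoleBinding; with a namespace
    it is a RoleBinding scoped to that Namespace.
    """
    cluster_wide = namespace is None
    metadata = {"name": role_name}
    if not cluster_wide:
        metadata["namespace"] = namespace
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "ClusterRoleBinding" if cluster_wide else "RoleBinding",
        "metadata": metadata,
        "roleRef": {
            "apiGroup": RBAC_GROUP,
            "kind": "ClusterRole" if cluster_wide else "Role",
            "name": role_name,
        },
        # One subject per user role placed in the matching IAM Group.
        "subjects": [
            {"apiGroup": RBAC_GROUP, "kind": "User", "name": name}
            for name in sorted(iam_roles)
        ],
    }


def aws_auth_entry(account_id, iam_role):
    """Assumed aws-auth mapRoles entry: IAM Role ARN -> Kubernetes username."""
    return {
        "rolearn": f"arn:aws:iam::{account_id}:role/{iam_role}",
        "username": iam_role,
    }
```

Keeping the binding name identical to the role name means there is exactly one binding per bucket, and membership changes only touch its `subjects` list.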

Dockersh and Metadataproxy

Now that we had mapped users’ roles to role-based access controls in the Kubernetes cluster, the challenge was to get a Secure Shell (SSH) user to assume that role. Thanks to open source and our use of familiar technologies, the tools were not far away, and we landed on Dockersh and Metadataproxy.

Dockersh is like any other shell, except that when you configure it as a user’s default shell, the user is dropped into a Docker container, completely isolated from other users, when they log in to the instance. That way we are able to control what a user does in their environment.

So we created a Chef recipe that made Dockersh the default shell for all users on the bastion and configured the kubeconfig files for cluster access. We also installed all the relevant tools for accessing EKS clusters in the Dockersh image. Here are some of the tools that we shipped with the Docker image:

  • Kubectl: Official Command-line interface tool for accessing the Kubernetes cluster
  • K9s: Curses-based tool to access your cluster, also providing features to view logs and exec into pods
  • Kubectx: CLI tool to quickly switch between Cluster Contexts
  • Kubens: CLI tool to quickly switch between Namespaces within a Context

Metadataproxy is a proxy for AWS’s metadata service that gives out scoped IAM credentials from the Security Token Service (which lets one request temporary, limited-privilege credentials). So we set up iptables rules to intercept all calls made to the metadata service from the user’s Dockersh container and route them through the Metadataproxy Docker container.

Now, to the fun part. When a user connects to the bastion, we drop them into a Dockersh, setting an environment variable called IAM_ROLE to the IAM username of the user. Since all metadata calls are intercepted by the Metadataproxy container, it looks for the Dockersh container’s IAM_ROLE, and makes that container assume that role. And since Lambda has already created the right access maps on the EKS cluster for the Role, the user from their Dockersh can enjoy their privileges from there.
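The role lookup at the heart of this trick can be sketched in a few lines. This is an illustration of the idea only, not Metadataproxy's actual code: the function names are hypothetical, the environment is assumed to arrive as a list of `KEY=VALUE` strings (the shape Docker reports for a container), and the account ID is a placeholder. In the real proxy the container is identified from the request and the resulting role is assumed through STS; both steps are left out here.

```python
def iam_role_from_env(env):
    """Return the IAM_ROLE value from a container env list, or None.

    `env` is a list of "KEY=VALUE" strings, e.g. ["IAM_ROLE=johnsmith"].
    """
    for entry in env:
        key, _, value = entry.partition("=")
        if key == "IAM_ROLE" and value:
            return value
    return None


def role_arn_for_container(env, account_id):
    """Return the ARN of the role to assume for this container, or None."""
    role = iam_role_from_env(env)
    if role is None:
        # No IAM_ROLE set: the container gets no credentials at all.
        return None
    return f"arn:aws:iam::{account_id}:role/{role}"
```

Because the role is derived from the container rather than from the bastion host, every user on the same bastion ends up with credentials for their own role only.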

Putting it all together

When a new EKS cluster named hyades is created, the AWS Lambda will create the following ClusterRoles as well as the corresponding IAM Groups.

If user John Smith wants power-user access to hyades, he will have to raise a request, and the AWS Cloud Admin will add johnsmith (IAM Username) to the k8s-poweruser@hyades IAM Group. The Lambda in its subsequent iteration does two things:

  1. Create an IAM Role called johnsmith;
  2. Bind johnsmith to the ClusterRole k8s-poweruser@hyades.

At this point, John Smith should be able to log into the bastion and access hyades using kubectl with the privileges bestowed on him.

The bastion itself is managed and configured using Chef, which sets up Dockersh as the default shell for all users. It also sets up the Kubernetes configuration for all the clusters. Since John Smith is a cluster-level power user, his powers encompass all the Namespaces on the cluster. Also, John Smith connects to the cluster as the johnsmith Role and not as the johnsmith user, because of the trick that Dockersh and Metadataproxy together play in the background.

Now suppose we create a Namespace called taurus. Again, AWS Lambda would pick up this change in its next cycle and create the following IAM Groups and Kubernetes Roles.

Now user Pocahontas comes along and wants admin access to the taurus namespace. She would raise a request to the Cloud Admin. All that the Cloud Admin has to do is add pocahontas (IAM Username) to the k8s-admin@hyades@taurus IAM Group.

Again, Lambda will pick up this change, create a pocahontas IAM Role, and bind pocahontas to the k8s-admin@hyades@taurus Kubernetes Role. Pocahontas can now log into the bastion and access the cluster hyades as the admin of the taurus Namespace. But she will not have cluster-wide access.
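The two walkthroughs above suggest how one reconciliation cycle could turn IAM Group membership into bindings. The sketch below is a hypothetical reconstruction from the group names in this post (`k8s-<category>@<cluster>` for cluster-wide access, with an extra `@<namespace>` for namespace access), not the Lambda's actual code. It only computes a plan; the calls to IAM and to the Kubernetes API are left out.

```python
from typing import NamedTuple, Optional


class Scope(NamedTuple):
    role_name: str            # Kubernetes (Cluster)Role name, same as the IAM Group
    cluster: str
    namespace: Optional[str]  # None means cluster-wide


def parse_group(group_name):
    """Parse 'k8s-<category>@<cluster>[@<namespace>]' into a Scope, else None."""
    parts = group_name.split("@")
    if not parts[0].startswith("k8s-") or len(parts) not in (2, 3):
        return None
    if not all(parts):
        return None  # reject empty segments such as 'k8s-admin@@taurus'
    namespace = parts[2] if len(parts) == 3 else None
    return Scope(group_name, parts[1], namespace)


def plan(group_members, cluster):
    """Map IAM Group membership to (kind, role_name, namespace, users) tuples.

    `group_members` maps an IAM Group name to the IAM usernames in it. Groups
    that do not follow the naming pattern, or that belong to another cluster,
    are ignored.
    """
    actions = []
    for group, users in sorted(group_members.items()):
        scope = parse_group(group)
        if scope is None or scope.cluster != cluster or not users:
            continue
        kind = "RoleBinding" if scope.namespace else "ClusterRoleBinding"
        actions.append((kind, scope.role_name, scope.namespace, sorted(users)))
    return actions


if __name__ == "__main__":
    groups = {
        "k8s-poweruser@hyades": ["johnsmith"],
        "k8s-admin@hyades@taurus": ["pocahontas"],
    }
    for action in plan(groups, "hyades"):
        print(action)
```

Each username in the plan would then get an IAM Role of the same name and be added as a subject of the listed binding.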

So changing the privileges of johnsmith or pocahontas should be as simple as moving them among the various IAM Groups. Also, all of their API calls to the Kubernetes cluster are tracked: since all EKS control plane audit logs are pushed to CloudWatch, these calls can be traced using CloudWatch Logs Insights.


Credits: All diagrams were created using LibreOffice Draw. The font used in the images is Architects Daughter Regular.