Cloud… We need to talk about limits

Let’s talk a little about one sound way to operate the cloud with large teams: keeping the team’s freedom while avoiding cost and security risks, using the Cloud Custodian tool.

Nowadays, bringing up servers has become much easier and faster because of the cloud: a server or service can be up in a matter of minutes. This has brought great benefits to every company that uses it, shortening time to market. But it has also brought some dangers, in terms of both security and cost. You can run up your bill just as quickly as a server is created, or expose your files with a single click in a console. So, more than ever, we need to talk about limits.

I know you may be thinking something like: “Wow, but what about the agility the cloud brought to my team? Will I have to take it away and set limits?” My answer is yes and no at the same time. Creating a catalog of what your team can and cannot use can be a huge and never-ending job, since the public clouds launch a new service, or change an existing one, almost every day (not at the same rate that new JavaScript frameworks are created, but still very fast!). The team that manages your cloud ends up having to “make available” cloud products to the other teams by building catalogs and approved products, almost like a “cloud” of the “cloud”. Okay, but what can we do? How can we keep the team agile on their favorite cloud without bringing risk to the business? One solution is to apply Guardrails! That’s right: set limits without “productizing” every service.

To explain it better, imagine you take your 8-year-old daughter bowling. It’s easy for you: you know how the game works and what the goal is, you can handle the weight of the ball, and even if your throw is clumsy, the ball reaches the pins. Your daughter, on the other hand, has to cope with the same game and the same ball. She doesn’t have as much strength as you do, that big, heavy ball is awkward in her little hand, and her aim depends much more on a push and improvisation, so most of the time the ball ends up in the gutter without reaching the pins (its objective). So how do we solve this? Guardrails! We raise the rails along the side gutters and the experience changes completely. Now she can reach her goal, even if the ball has to bump off the side rail to get back on course and finally hit the pins (and maybe even score a strike!).

This is the goal: not to keep anyone from playing the game as it should be played. Cloud Guardrails work the same way. We let our development team use their favorite resources from their favorite public cloud, and we just “raise the gutters” when there is an undesirable deviation. A practical example: you can safely let your team use S3 buckets but, with security in mind, prevent any bucket from being left public to the world, avoiding a data leak that a user may have enabled inadvertently. Another example: you can let your team launch EC2 machines in your development account, but allow a maximum of 5 instances, and only from the “T3” family, which avoids a large number of servers or large, expensive machines and saves the company money.

Tools

Public cloud providers have their own native “Guardrails” services, each implemented in its own way, and it is good practice to use them. AWS, for example, lets you do this by combining services such as CloudTrail and CloudWatch Events. However, there are other tools on the market with a more “multi-cloud” approach, and today we are going to use the open source tool Cloud Custodian.

Cloud Custodian

In line with Compliance as Code, we have a very good open source tool: developed by Capital One, Cloud Custodian uses YAML as its policy language and applies policies to AWS, Azure and GCP. A Custodian policy is basically built from a few key components:

▪ Name: The name of your policy

▪ Resource: The type of resource the policy targets, for example aws.rds, where the provider is AWS and the resource is RDS.

▪ Filter: This is where we select the resources we want to limit, by describing the attributes that may be out of compliance.

▪ Mode: How your policy will run, that is, whether it is executed on demand, scheduled like a cron job, or triggered by events in your cloud provider.

▪ Actions: The actions we take when we find a resource that matches our filters, ranging from adding a tag to destroying the resource.
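
Put together, these components map onto a policy file roughly as follows. This is only a minimal sketch: the policy name, the tag key and the tag value are made up for illustration, and the filter simply matches EC2 instances that have no Owner tag.

```yaml
policies:
  - name: ec2-missing-owner-tag      # Name (hypothetical)
    resource: aws.ec2                # Resource: provider "aws", resource "ec2"
    # Mode: not set, so the policy only runs on demand from the CLI
    filters:                         # Filter: which resources are out of compliance
      - "tag:Owner": absent          # instances without an "Owner" tag
    actions:                         # Actions: what to do with each match
      - type: tag
        key: compliance              # hypothetical tag key
        value: missing-owner         # hypothetical tag value
```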

In addition, it is possible to write generic filters that match almost any value returned by the cloud API, using JMESPath, and to create custom actions using webhook: for example, when a matching resource is found, we can call an external API that will take some action… and that’s where creativity enters the room! Beyond creating your Guardrails, we can also use Cloud Custodian to generate an inventory of our infrastructure without applying any action, since Custodian can go through the regions and subscriptions of your cloud provider.
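
As a rough sketch of a generic filter combined with a webhook action, the policy below picks up EC2 instances outside the “T3” family and calls an external endpoint for each one. The policy name, the URL and the query parameter names are placeholders chosen for illustration, not something taken from a real setup.

```yaml
policies:
  - name: ec2-non-t3-webhook          # hypothetical policy name
    resource: aws.ec2
    filters:
      # Generic "value" filter: "key" is a JMESPath expression evaluated
      # against each instance as described by the cloud API.
      - not:
          - type: value
            key: InstanceType
            op: glob
            value: "t3.*"             # matches t3.micro, t3.large, ...
    actions:
      # Custom action: call an external API for each matching instance.
      - type: webhook
        url: https://example.com/custodian-hook   # placeholder endpoint
        method: POST
        query-params:
          # values are JMESPath expressions over the resource and the policy
          instance_id: resource.InstanceId
          policy_name: policy.name
```

Leaving out the actions block turns the same policy into a pure inventory query: Custodian only reports the resources that matched.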

Ok, but show me the code!

Okay, okay, let’s go through two examples, both on AWS. In the first, we check whether any IAM user has an access key older than 90 days and, if so, send a notification to an SQS queue. In the second, if a user creates an S3 bucket with public access enabled (this one is a classic!), the bucket is automatically tagged.

Before we start…

Installing and configuring Cloud Custodian is very simple. It can be done via pip and is well detailed in the official documentation, including how to “connect” it to your AWS, Azure or Google account, so I will not cover it in this article. Once it is installed, though, a command that will help a lot is:

custodian schema

It lists the resources available to you, and you can drill down into them until you reach a “skeleton” of what each resource offers. Another tip that will save you some time is to build your “tests” on an IaC tool (for example Terraform), so you can easily provision and remove resources and check whether your policies are being triggered and executed successfully… and who knows, maybe with the time you save you can go bowling with the ones you love? 😉

Example 1

To check IAM users, we will use the aws.iam-user resource. But first, let’s see what Custodian tells us when we run its schema command for that resource:

custodian schema aws.iam-user

Awesome! The output gives us hints about the available actions and filters. In our case we will use the access-key filter and the notify action, so our policy will look like this:

File: example-1.yml
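
A minimal sketch of what example-1.yml could contain, assuming the access-key filter and the notify action with an SQS transport. The policy name, the recipient address and the queue URL are placeholders, and your own policy may need extra notify options.

```yaml
policies:
  - name: iam-user-old-access-key     # hypothetical policy name
    resource: aws.iam-user
    filters:
      # First filter: the user has an active access key
      - type: access-key
        key: Status
        value: Active
      # Second filter: the user has a key created more than 90 days ago
      - type: access-key
        key: CreateDate
        value_type: age
        value: 90
        op: greater-than
    actions:
      - type: notify
        to:
          - security@example.com      # placeholder recipient
        transport:
          type: sqs
          # placeholder queue URL, replace with your own
          queue: https://sqs.us-east-1.amazonaws.com/123456789012/custodian-notify
```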

Did you see that we are using two filters?

In other words, our action will only happen if both are true.

Example 2

In this second example, we can again consult the schema, this time with a command such as

custodian schema aws.s3

With this, our policy will be:

File: example-2.yml
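
A minimal sketch of what example-2.yml could contain, assuming the global-grants filter is used to detect buckets whose ACL is open to everyone. The policy name, the role ARN and the tag key and value are placeholders; the role is the one the generated Lambda function would run with.

```yaml
policies:
  - name: s3-tag-public-bucket        # hypothetical policy name
    resource: aws.s3
    mode:
      type: cloudtrail
      # placeholder role assumed by the Lambda function Custodian deploys
      role: arn:aws:iam::123456789012:role/custodian-lambda-role
      events:
        - CreateBucket                # CloudTrail event that triggers the policy
    filters:
      # matches buckets whose ACL grants access to all users
      - type: global-grants
    actions:
      - type: tag
        key: public-access            # hypothetical tag key
        value: "true"                 # hypothetical tag value
```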

Did you notice that this time we are using the cloudtrail mode? With it, Custodian creates a CloudWatch Events rule in our account that watches CloudTrail. When the event we chose happens, in this case “CreateBucket”, the rule triggers a Lambda function that runs our action, which here is tagging the bucket. All of this is set up by Cloud Custodian when you run the policy.

Execution

With our policies created, it is time to execute them. In the terminal we run (the directory name output is just an example):

custodian run -s output example-1.yml

In this first example we execute the file example-1.yml, and the “-s” parameter sets the directory where the output of the execution is stored. Since no mode is defined, this policy only runs when we call it with run from the CLI.

Tip: Inside the generated directory there is a .json file (resources.json) listing all the resources found, in the same form the AWS CLI would return them, and this file is very useful for debugging!

For the second policy, we run as follows:

custodian run -d -s output example-2.yml

Here I added the “-d” parameter to run in dry-run mode, which changes nothing in our account. It is a safer way to check that the resources the action would apply to are the right ones, and that our policy does not pick up something by mistake. If everything is OK, we can run it again without “-d”, which deploys the policy to our account, where it is triggered automatically thanks to the cloudtrail mode.