Secure Terraform Delivery Pipeline – Best Practices. Part 2/2.


Part 1 of this short blog series covered best practices for organizing Terraform projects and delivering infrastructure changes with secure pipelines. This part scratches the surface of infrastructure testing, compliance as code, and security as code.


Continuous compliance

The ease and speed of changing cloud resource configuration come with a significant risk of introducing defects and of breaking compliance rules or company standards.

The goal of infrastructure testing and continuous compliance is to ensure automated verification of infrastructure rules. For example: 

  • Limit allowed resource types and locations,
  • Verify machine types and sizes,
  • Verify resource configuration (e.g. parameters, naming, tags, tiers, encryption configuration),
  • Check software versions and extensions installed on VMs,
  • Check audit configuration applied to resources,
  • Verify IAM roles and assignments (e.g. min/max count of administrators),
  • Verify the configuration of Kubernetes deployments like allowed images, ports, limits, naming conventions.

Terraform can introduce changes very quickly and have a huge impact on the infrastructure. This is why automated testing and policy verification are just as important here as in any other kind of software development.

This verification can take several forms:

  • Requirements verification
    Automated verification of some non-functional requirements and assumptions for the project that need to be verified continuously before the sign-off. This is similar to unit/integration testing in software development. Usually, the team that creates Terraform scripts provides the tests as well.
  • Compliance as Code
    Automated verification of company, organization or regulatory compliance policies using a set of rules. Rules can be related to resource type (e.g. forbidden services), resource location, machine types, OS type and version, replication options, pricing tier/SLA, tagging or naming conventions etc. required across a whole organization. A separate team might provide company-wide policies and compliance standards.
  • Security as Code 
    Automated verification of security policies of introduced infrastructure. Rules can be related to RBAC control, network, firewalls, cloud access control, key vaults, keys, secrets and certificates, encryption etc. required across the entire organization. Security rules may be built with the IT Security team as well as with third-party tooling.

In general, there are two layers of Infrastructure as Code verification:

  • Pre-deployment – Verification of the Terraform code or Terraform plan, more akin to static code analysis,
  • Post-deployment – Verification of the resources created in the cloud environment after Terraform configuration is applied.

Pre-deployment verification (build-time)

The built-in terraform validate command checks the correctness of syntax, variables etc. It is also a good idea to run at least terraform plan as a validation step to check that it does not fail with an error. Keep in mind that running the plan against “empty” infrastructure may give different results than a plan against the previous version of the infrastructure.

There are certain tools that allow verifying Terraform code or plan before it gets applied with more rules than just syntax and correctness. 

  • HashiCorp Sentinel can be used with Terraform Cloud/Terraform Enterprise. It allows creating a set of company-wide policies and applying them to all Terraform projects across multiple teams, to ensure that each project adheres to the rules. Sentinel verifies the actual plan before it is deployed to live infrastructure and looks for disallowed resource types, configuration options etc. Think of it as static code analysis for Terraform, like SonarQube.
  • The open-source terraform-compliance tool, which is built on a Python BDD framework.
  • A simple tool, tflint, also checks whether configuration parameters are valid in a given cloud (for example a nonexistent VM instance type). Currently available for AWS only.
  • Forseti Terraform Validator can run Forseti rules verification against a Terraform plan file. This one is for Google Cloud only.
  • Terraform files can also be checked with plain Python: HCL can simply be parsed and verified with a general-purpose programming language.
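
As a rough illustration of the last option, the sketch below checks the JSON form of a plan (as produced by terraform show -json) against an allow-list of resource types and locations. The field names resource_changes, change.actions and change.after follow Terraform's JSON plan representation; the allow-lists, the function name check_plan and the sample resources are assumptions made up for this example.

```python
import json
from typing import List

# Hypothetical company rules: which resource types and locations are allowed.
ALLOWED_TYPES = {"azurerm_resource_group", "azurerm_storage_account"}
ALLOWED_LOCATIONS = {"westeurope", "northeurope"}


def check_plan(plan: dict) -> List[str]:
    """Return human-readable violations found in a Terraform JSON plan."""
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        # Only resources that are created or updated need to be checked.
        if not {"create", "update"} & set(change.get("actions", [])):
            continue
        if rc["type"] not in ALLOWED_TYPES:
            violations.append(f"{rc['address']}: type {rc['type']} is not allowed")
        # "after" is null for deletions, hence the fallback to an empty dict.
        location = (change.get("after") or {}).get("location")
        if location is not None and location not in ALLOWED_LOCATIONS:
            violations.append(f"{rc['address']}: location {location} is not allowed")
    return violations


# Made-up plan fragment; a real one would be loaded from the plan JSON file.
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {"address": "azurerm_resource_group.main", "type": "azurerm_resource_group",
     "change": {"actions": ["create"], "after": {"location": "westeurope"}}},
    {"address": "azurerm_storage_account.logs", "type": "azurerm_storage_account",
     "change": {"actions": ["create"], "after": {"location": "eastus"}}},
    {"address": "azurerm_virtual_machine.legacy", "type": "azurerm_virtual_machine",
     "change": {"actions": ["create"], "after": {"location": "westeurope"}}},
    {"address": "azurerm_sql_server.old", "type": "azurerm_sql_server",
     "change": {"actions": ["delete"], "after": null}}
  ]
}
""")

if __name__ == "__main__":
    for violation in check_plan(SAMPLE_PLAN):
        print(violation)
```

In a pipeline, a non-empty result would fail the build before terraform apply is reached.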

Post-deployment verification (runtime)

Verifying Terraform code before it is applied brings only partial value. It does not cover items applied with custom scripts or “null_resource” (used when Terraform does not support some option), and it cannot detect changes introduced manually or by mistake. This is why it is also valuable to verify the running infrastructure.

To address continuous compliance, run verification on production-like environments and in production, always after a deployment but also on a regular schedule (e.g. daily). Always use a system account with read-only permissions.

Each cloud provider exposes the whole infrastructure as plain REST/JSON API (or gRPC) as well as SDKs for common programming languages.

It is easy to pick a language of choice and a favourite testing framework and write tests against the language SDK, or even against the pure REST API with tools like REST Assured or JSON Assert.

Use a testing framework that produces a readable test report which can double as documentation (BDD-style rather than plain JUnit).

The tests can be executed:

  • in a live environment (including production), to apply all requirement checks as well as security or compliance policies,
  • in a deploy → test → undeploy flow, to verify the correctness of the whole Terraform configuration.

Here is an example using Kotest and Azure SDK:

With regular programming skills and some effort, it is possible to build a shared, parameterized set of tests.
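
A shared, parameterized test set could be sketched in Python roughly as follows. This is not tied to any particular cloud: the RESOURCES inventory is a hypothetical stand-in for data that a real test would fetch through the provider SDK or REST API with a read-only account, and the naming pattern, required tags and allowed locations are assumed example rules.

```python
import re
import unittest

# Hypothetical inventory; a real test would fetch it from the cloud SDK or REST API.
RESOURCES = [
    {"name": "st-logs-prod", "location": "westeurope",
     "tags": {"owner": "platform", "cost-center": "1234"}},
    {"name": "vm-web-01", "location": "northeurope",
     "tags": {"owner": "web", "cost-center": "5678"}},
]

# Assumed example rules shared by every project that reuses this test class.
NAME_PATTERN = re.compile(r"^[a-z]+(-[a-z0-9]+)+$")
REQUIRED_TAGS = {"owner", "cost-center"}
ALLOWED_LOCATIONS = {"westeurope", "northeurope"}


def missing_tags(resource: dict) -> set:
    """Return the required tags that a resource does not carry."""
    return REQUIRED_TAGS - set(resource.get("tags", {}))


class InfrastructureRules(unittest.TestCase):
    """Each rule runs once per resource; subTest reports every failure separately."""

    resources = RESOURCES

    def test_names_follow_convention(self):
        for resource in self.resources:
            with self.subTest(resource=resource["name"]):
                self.assertRegex(resource["name"], NAME_PATTERN)

    def test_required_tags_present(self):
        for resource in self.resources:
            with self.subTest(resource=resource["name"]):
                self.assertEqual(missing_tags(resource), set())

    def test_locations_allowed(self):
        for resource in self.resources:
            with self.subTest(resource=resource["name"]):
                self.assertIn(resource["location"], ALLOWED_LOCATIONS)


def run_rules() -> unittest.TestResult:
    """Run all rules and return the result so a pipeline step can inspect it."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(InfrastructureRules)
    return unittest.TextTestRunner(verbosity=2).run(suite)
```

A project would reuse the rules by subclassing InfrastructureRules and overriding resources; a pipeline step can call run_rules() and fail when wasSuccessful() returns False.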

Bash scripting with the CLI might not be the best choice because the tests tend to become complex. A scripting language (Python, PowerShell) is usually a good fit. Strongly-typed languages (like Kotlin or Go) help a lot when using a cloud provider SDK thanks to IDE support.

In this approach, a native API or SDK is used, which is usually the “source of truth”. This matters because new cloud features reach 3rd party tools (like Terraform or InSpec) with a delay and with potential bugs, so relying on 3rd party tools for testing may cause problems. Sometimes even the cloud CLI (bash or PowerShell) lags behind. The API always comes first.

Built-in cloud policy tools

Each cloud provider has a native tool to address company-wide governance policies. The sections below cover AWS Config, Azure Policy and, for Google Cloud, Forseti Security.

Cloud compliance services are sometimes provided with a set of rules mapped to industry standards such as HIPAA, ISO 27001 or CIS Benchmarks. Creating custom rules is not always easy. These tools evaluate policies across a set of company projects, accounts or subscriptions, but not always “on demand” during a Terraform pipeline run.

One of the approaches observed in large organisations is that there are separate teams maintaining company-wide compliance rules and infrastructure as code. This means that the infrastructure team needs to adhere to standards and policies but is not always the author of new rules. The continuous policy tools should be used in addition to infrastructure testing.

AWS Config and Control Tower

AWS Config is a service that continuously monitors and records AWS resource configurations and allows verifying overall compliance against the configurations specified in the internal company guidelines. It comes with a set of around 150 pre-built managed rules as well as an SDK for creating and testing custom AWS Config rules.

Since AWS Config rules are in fact AWS Lambda functions written in Node.js or Python, there is a large library of rules available on GitHub.

A sample fragment of code in Python:
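
The following is a minimal sketch of a change-triggered custom rule, not a rule taken from the AWS library. It assumes the usual custom-rule event layout (an invokingEvent JSON string carrying a configurationItem, plus a resultToken) and reports back through the AWS Config PutEvaluations call. The allow-list of instance types and the optional config_client parameter (added so the handler can be exercised without AWS access) are assumptions made for this example.

```python
import json

# Hypothetical company standard: only these EC2 instance types are allowed.
ALLOWED_INSTANCE_TYPES = {"t3.micro", "t3.small"}


def evaluate_compliance(configuration_item: dict) -> str:
    """Map a recorded configuration item to an AWS Config compliance type."""
    if configuration_item["resourceType"] != "AWS::EC2::Instance":
        return "NOT_APPLICABLE"
    instance_type = configuration_item["configuration"]["instanceType"]
    if instance_type in ALLOWED_INSTANCE_TYPES:
        return "COMPLIANT"
    return "NON_COMPLIANT"


def lambda_handler(event, context, config_client=None):
    """Entry point invoked by AWS Config when a recorded resource changes."""
    if config_client is None:
        # Imported lazily so the evaluation logic can be tested without boto3.
        import boto3
        config_client = boto3.client("config")

    # The configuration item arrives as a JSON string inside the event.
    item = json.loads(event["invokingEvent"])["configurationItem"]
    compliance = evaluate_compliance(item)

    # Report the verdict back to AWS Config.
    config_client.put_evaluations(
        Evaluations=[{
            "ComplianceResourceType": item["resourceType"],
            "ComplianceResourceId": item["resourceId"],
            "ComplianceType": compliance,
            "OrderingTimestamp": item["configurationItemCaptureTime"],
        }],
        ResultToken=event["resultToken"],
    )
    return compliance
```

A production rule would also handle deleted resources and rule parameters; they are left out here to keep the sketch short.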

In general, verification can be triggered by schedule or when resources are created or modified. The results can be asynchronously verified using the AWS API or Console. 

For large-scale projects, a complete framework called Compliance Engine is available to manage a RuleSet in a separate compliance account.

AWS Config verification can be woven into the Terraform deployment pipeline as a post-release check. The results are, however, not immediate, and some coding is required to gather an end-to-end compliance report. It is therefore better suited to confirming that an implemented change still adheres to company standards than to serving as a testing step.
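
The kind of coding involved can be sketched as follows: a helper that collects non-compliant resources for a list of rules through the GetComplianceDetailsByConfigRule API, following NextToken pagination. The function name and the rule names passed to it are hypothetical, and the sketch deliberately does not wait for evaluations to finish, which a real pipeline step would need to do first.

```python
from typing import List, Tuple


def non_compliant_resources(config_client, rule_names: List[str]) -> List[Tuple[str, str, str]]:
    """Collect (rule, resource type, resource id) for every non-compliant result.

    config_client is expected to behave like boto3.client("config").
    """
    findings = []
    for rule in rule_names:
        kwargs = {"ConfigRuleName": rule, "ComplianceTypes": ["NON_COMPLIANT"]}
        while True:
            page = config_client.get_compliance_details_by_config_rule(**kwargs)
            for result in page.get("EvaluationResults", []):
                qualifier = result["EvaluationResultIdentifier"]["EvaluationResultQualifier"]
                findings.append((rule, qualifier["ResourceType"], qualifier["ResourceId"]))
            # Results are paginated; keep going until no token is returned.
            token = page.get("NextToken")
            if not token:
                break
            kwargs["NextToken"] = token
    return findings
```

A pipeline step could call it with boto3.client("config") and fail the release when the returned list is not empty.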

AWS Config allows grouping rules together with remediation actions into Conformance Packs (also “as code”, using YAML templates) that can be easily deployed to multiple accounts and regions. Sample conformance packs:

  • Operational Best Practices For Amazon S3
  • Operational Best Practices For Amazon DynamoDB
  • Operational Best Practices For PCI-DSS
  • Operational Best Practices For AWS Identity And Access Management

With the use of AWS Control Tower, it is possible to:

  • integrate AWS Config rules into an end-to-end compliance and conformance overview dashboard over a multi-account organization,
  • have an Account Factory for creating new AWS Accounts with predefined rules and settings.

In general, AWS Config is a versatile solution for handling company-wide standards with compliance as code and security as code. It might not be the fastest way to verify the requirements of an individual solution, where simple tests may be easier to maintain. Using it in a Terraform pipeline is possible as an addition to the built-in AWS continuous compliance solution, but it is not necessary.

Pros:

  • Flexibility and extensibility, since the rules are actual code in Python or JavaScript,
  • SDK for rules development and a wide set of open-source rules in addition to built-in ones,
  • Open-source tools for the whole multi-account Compliance Engine available as well as integration with Control Tower.

Cons:

  • The rules code can become complex and constitute a whole programming project,
  • Rules cannot prevent the creation of a non-compliant resource (detective mode only),
  • Including asynchronous rules verification into Terraform CD pipeline requires a complex solution.

Azure Policy and Security Center

Azure Policy is a system built with declaratively defined rules applied to all resources in the scope of the assigned policy. Azure Policies can be assigned on Azure Subscription level as well as on Management Group level (a group of Subscriptions, e.g. whole organization, all production subscriptions etc.).

Azure provides over 1000 predefined, parameterized policies. Custom policies are defined in JSON code and each policy consists of 3 parts:

  • Parameters – they are defined during the assignment
  • Policy Rule – the “if” part of the rule 
  • Effect – policy can either raise an alert (audit) or prevent creating a resource (deny)

A sample fragment of code of a policy:
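
As a minimal sketch rather than a definition copied from the Azure catalogue, the three parts can be laid out as below. The definition is assembled as a Python dictionary and printed as the JSON document that Azure Policy expects; the display name, the parameter name allowedLocations and the description text are illustrative assumptions.

```python
import json

# Illustrative custom policy: deny resources outside the allowed locations.
policy_definition = {
    "properties": {
        "displayName": "Allowed locations (sketch)",
        "mode": "Indexed",
        # Parameters: values are supplied when the policy is assigned.
        "parameters": {
            "allowedLocations": {
                "type": "Array",
                "metadata": {"description": "Locations where resources may be deployed"},
            }
        },
        "policyRule": {
            # Policy rule: the "if" condition matched against each resource.
            "if": {
                "not": {
                    "field": "location",
                    "in": "[parameters('allowedLocations')]",
                }
            },
            # Effect: "deny" blocks the request, "audit" would only report it.
            "then": {"effect": "deny"},
        },
    }
}

if __name__ == "__main__":
    print(json.dumps(policy_definition, indent=2))
```

Switching the effect to "audit" turns the same rule into a reporting-only check, which is a common first step before enforcing it.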

Policies in “deny” mode work like additional validation rules, which means that a resource that is not passing the verification will not be created.

In addition to that, policies can remediate some threats – e.g. automatically install required VM extensions or modify the configuration.

An Azure Policy check can be included as a post-deployment correctness check in the Azure DevOps release pipeline.

Besides individual policies, there are a number of predefined Policy Initiatives in Azure, for example:

  • Audit ISO 27001:2013 controls and deploy specific VM Extensions to support audit requirements (56 policy checks)
  • Audit PCI v3.2.1:2018 controls and deploy specific VM Extensions to support audit requirements (37 policy checks)
  • Audit CIS Microsoft Azure Foundations Benchmark 1.1.0 recommendations and deploy specific supporting VM Extensions (83 policy checks)

Policy Initiatives are parameterized groups of policies that are assigned on Subscription or Management Group level and can be custom-created for company standards. There is a default Security Center initiative, which contains over 90 configurable policies and is assigned by default to every Azure subscription.

Azure Security Center is a single place to govern results of all policy checks across the organization as well as group results of different threat detection systems (Network, Active Directory, VMs etc.). Since policies are verified periodically, the Security Center can address continuous compliance in Azure providing the alerting mechanism and verification history. To simplify the process of managing corporate-wide compliance, companies can also maintain Azure Blueprints. A blueprint is a combination of policies and initiatives together with default resource groups and IAM access configuration.

Azure Policies are very powerful, but the tool does not provide a developer-friendly interface for creating custom rules, especially since JSON is used as the language. Even without custom policies, the set of predefined policies is impressive and can address a wide range of compliance requirements. A complete Compliance as Code solution may combine infrastructure tests and Azure policies.

Pros:

  • A large set of predefined policies and initiatives for industry-standard compliance requirements,
  • Built-in integration with Azure DevOps and Azure Security Center,
  • The policy can work in “deny” mode,
  • Policy management with initiatives and blueprints.

Cons:

  • Developing custom policies in JSON is hard,
  • Executing policies “on demand” is not possible.

Forseti and Google Cloud Security Command Center

Google has open-sourced the Forseti Security project to address security rules validation and policy enforcement in Google Cloud. This is a policy-as-code system that consists of multiple modules and works together with:

The Forseti Rule Library is open source and uses Rego files (Rego is the policy language of the Open Policy Agent framework) to define policy rule templates.

Here is a sample policy that forbids public IPs for Cloud SQL databases:

Using policy templates only requires defining constraints in YAML code. If a policy template for the actual use case is missing, Forseti provides tools and guidelines for authoring and testing custom policies. 
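
To show how little logic such a rule needs, here is a Python sketch of the public IP check for Cloud SQL. It is not the Forseti implementation and is not written in Rego. It assumes assets shaped like a Cloud Asset Inventory export (asset_type, name and resource.data) and the Cloud SQL settings.ipConfiguration.ipv4Enabled flag; treating a missing flag as public is a conservative assumption of this sketch.

```python
from typing import List

CLOUD_SQL_ASSET_TYPE = "sqladmin.googleapis.com/Instance"


def cloud_sql_public_ip_violations(assets: List[dict]) -> List[str]:
    """Return the names of Cloud SQL instances that have a public IPv4 address."""
    violations = []
    for asset in assets:
        if asset.get("asset_type") != CLOUD_SQL_ASSET_TYPE:
            continue  # the rule only applies to Cloud SQL instances
        settings = asset.get("resource", {}).get("data", {}).get("settings", {})
        ip_config = settings.get("ipConfiguration", {})
        # Assumption: when the flag is absent, treat the instance as public.
        if ip_config.get("ipv4Enabled", True):
            violations.append(asset["name"])
    return violations


# Made-up inventory used only to demonstrate the check.
SAMPLE_ASSETS = [
    {"name": "sql-private", "asset_type": CLOUD_SQL_ASSET_TYPE,
     "resource": {"data": {"settings": {"ipConfiguration": {"ipv4Enabled": False}}}}},
    {"name": "sql-public", "asset_type": CLOUD_SQL_ASSET_TYPE,
     "resource": {"data": {"settings": {"ipConfiguration": {"ipv4Enabled": True}}}}},
    {"name": "some-bucket", "asset_type": "storage.googleapis.com/Bucket",
     "resource": {"data": {}}},
]

if __name__ == "__main__":
    print(cloud_sql_public_ip_violations(SAMPLE_ASSETS))
```

The declarative template plus YAML constraint approach keeps this logic in one shared place, so individual teams only supply parameters.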

Setting up Forseti Security requires running dedicated infrastructure in GCP that consists of Cloud SQL, compute and Cloud Storage. Google recommends creating a separate project to serve as a policy monitoring environment. Forseti provides a Terraform configuration for installation. The server picks up the policy configuration deployed to Google Cloud Storage and then:

  • constantly monitors policies,
  • can enforce security rules,
  • stores Cloud Configuration snapshots in Cloud SQL.

To increase the overall visibility of policy and security status, Google offers the Security Command Center dashboard. Besides Forseti, GCSCC can use other threat detection systems as a source of vulnerability alerts, such as:

  • Cloud Data Loss Prevention Discovery,
  • Anomaly Detection,
  • Event Threat Detection,
  • 3rd party cloud security tools from Acalvio, Capsule8, Cavirin, Chef, Check Point CloudGuard Dome9, Cloudflare, CloudQuest, McAfee, Qualys, Redblaze, Redlock, StackRox, Tenable.io, and Twistlock.

Forseti Security offers complete compliance-as-code tooling that can be used as part of a Terraform CD pipeline in the pre-deployment step (with Forseti Terraform Validator) as well as post-deployment (with Forseti Config Validator). Continuous compliance support with Security Command Center and other plug-in systems adds up to a complete solution.

Because it is a security-oriented solution, implementing additional custom non-functional requirement checks may require significant effort, so it might be a good idea to combine it with the infrastructure testing approach.

Pros:

  • Support for GCP config validation as well as for Terraform validation,
  • Declarative rules language with tooling support,
  • Integration with Security Command Center.

Cons:

  • Requires installation and maintenance of infrastructure and setup of open-source components,
  • A small library of predefined rules (around 70 templates) and lack of industry-standard policy sets (e.g. CIS Benchmarks, HIPAA etc.),
  • No preventive mode, only reactive/detective mode.

For a complete guide for setting up Forseti with Config Validator and Terraform Validator, check out the blog article Protecting your GCP infrastructure at scale with Forseti Config Validator (plus part 2, part 3 and part 4).

3rd party Policy as Code tools

Continuous Compliance and security management for public cloud solutions is an emerging market for enterprise-grade solutions. Several solutions became available, usually provided by known players in the IT security area. 

Here are some examples of emerging open-source tools addressing the policy-as-code topic.

InSpec

Chef InSpec is an open Compliance as Code tool that can run a set of rules on top of running infrastructure. Policies are defined in a DSL that is quite descriptive and readable.

Many resource definitions are available for AWS, Azure and Google Cloud.

The drawback of this solution is that resource definitions are added to InSpec with some delay compared to the cloud provider API (as is also the case for Terraform and even for the cloud provider’s CLI and SDK), and some resources may be very hard to test.

Pulumi CrossGuard

Another solution that allows policy as code is Pulumi CrossGuard. It allows for a more programmatic approach (JavaScript/TypeScript) over an SDK supporting AWS, Azure, GCP as well as Kubernetes. Here is an example:

CrossGuard is currently in beta and has the same dependency on a 3rd party provider for resource definitions.

Azure Secure DevOps Kit

Azure Secure DevOps Kit is an open-source set of policies and rules implemented in a PowerShell-based framework and ready to be executed automatically in a pipeline or using e.g. Azure Automation. It supports Azure only and was developed at Microsoft, but it is not an official Microsoft product. Some of the policies overlap with the built-in Azure Security Center.

Summary

The last few years have been the era of “-as-code” approaches to infrastructure, compliance, security, configuration management etc. Using software to address hardware or process problems is a very effective approach, and it has become possible with hardware virtualization and the cloud. When infrastructure configuration and scale changes are introduced in minutes rather than hours or days, and data or workloads can be placed on the wrong continent just “by accident”, a totally new approach to tooling and practices is required. This will bring many more changes in the near future and, hopefully, the industry will start to standardize around well-known tools.