Production-ready Azure Terraform modules

Mar 4, 2020

Why

Claranet has recently released a bunch of Terraform modules for Azure and intends to publish more: https://registry.terraform.io/search?q=claranet%20azure.

Wrapping one or more Terraform resources in a module lets us enforce our best practices, harmonize our customers’ implementations, improve our base features and track changes.

We won’t go over the benefits of Terraform here, since numerous comprehensive articles already cover them. We also use it because it offers enough abstraction to let us build packages (here, modules) that can be shared internally and with the rest of the world.

We chose to open source these modules to ease collaboration with our partners, customers and developers, to have our approach challenged, to benefit from contributions and maybe to reach our future colleagues.

Built for production

The modules we release are not intended to cover the full scope of the underlying resources, but to set sensible defaults and ease configuration for the production environment. Those last two words are important.

There are many things that we’d like to enforce in a production environment: consistent naming, logging, monitoring, high availability and security.

Although these modules are vanilla Terraform and can be used anywhere, we use them with the Claranet Terraform wrapper, which enforces the project file tree and adds some sugar to everyday Terraform use.

Naming

In our modules, every created resource has its name generated from the project name, customer name, region, environment and resource type, as recommended by Azure.

Each resource can also have a prefix in front of the generated name.

A custom name can also always be set. This is useful when taking over a project whose infrastructure was built without Terraform and does not follow the naming convention, or when a customer wants to keep its own convention.

Our Redis template, for example, generates the expected name using locals that are then injected into the resource itself.
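The locals-based naming pattern can be sketched as follows. This is a minimal illustration, not the actual module code: every variable name and the exact order of the name parts are assumptions.

```hcl
# Hypothetical inputs; the real module's variable names may differ.
variable "project" { type = string }
variable "customer" { type = string }
variable "region_short" { type = string }
variable "environment" { type = string }

variable "name_prefix" {
  type    = string
  default = ""
}

variable "custom_name" {
  type    = string
  default = ""
}

locals {
  # Optional prefix, followed by a dash only when it is set.
  name_prefix = var.name_prefix != "" ? "${var.name_prefix}-" : ""

  # Generated name built from project, customer, region, environment and resource type.
  default_name = lower("${local.name_prefix}${var.project}-${var.customer}-${var.region_short}-${var.environment}-redis")

  # coalesce() skips empty strings, so a custom name wins whenever it is provided.
  redis_name = coalesce(var.custom_name, local.default_name)
}

output "redis_name" {
  value = local.redis_name
}
```

The resource would then use `local.redis_name` for its `name` argument, so the convention lives in one place and the custom name remains an explicit override.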

As a result, reading a resource name, in a monitoring alert for example, tells us exactly which resource it is and what its purpose is.

Logs & metrics

For production-ready resources, we always want logging enabled and aggregated, along with numerous detailed metrics.

Our goal is for each module to let you specify a storage account and/or a Log Analytics workspace as the destination for log aggregation. Azure Diagnostic Settings is the most widely used implementation and is enabled with all log categories whenever possible.

Log management in Azure is not uniform (VMs, for example, do not manage their logs through Diagnostic Settings), so we also aim to provide a unified way to manage logs for those services.
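As a rough sketch of what "all log categories" can look like in a module, the fragment below discovers the categories supported by a resource and enables each of them. It assumes a recent AzureRM provider (3.x or later), where categories are declared with `enabled_log` blocks; older releases used a different block name. The variable names are hypothetical.

```hcl
# Hypothetical inputs: the resource to monitor and the two optional destinations.
# At least one destination must be set for the diagnostic setting to be valid.
variable "resource_id" {
  type = string
}

variable "logs_storage_account_id" {
  type    = string
  default = null
}

variable "log_analytics_workspace_id" {
  type    = string
  default = null
}

# Discover every log category the target resource supports.
data "azurerm_monitor_diagnostic_categories" "this" {
  resource_id = var.resource_id
}

resource "azurerm_monitor_diagnostic_setting" "this" {
  name                       = "default-diagnostics"
  target_resource_id         = var.resource_id
  storage_account_id         = var.logs_storage_account_id
  log_analytics_workspace_id = var.log_analytics_workspace_id

  # Enable all available log categories.
  dynamic "enabled_log" {
    for_each = data.azurerm_monitor_diagnostic_categories.this.log_category_types
    content {
      category = enabled_log.value
    }
  }
}
```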

For App Services & Azure Functions, an Application Insights resource is always created along with the service instance and linked to it.

For Windows and Linux Virtual Machines, Diagnostics extensions are always configured with the highest level of metrics and logs activated.

High availability

Most Azure services offer high availability through zone redundancy, multiple instances or clustering. We chose to make each module’s default configuration deploy a highly available service.

We made this choice because of two considerations:

An automated deployment tool that enforces high availability is more reliable than a DevOps engineer following documentation when building an environment. This follows the “Working software over comprehensive documentation” value of the Agile Manifesto. And we have the documentation too.
A non-redundant infrastructure should be an explicit choice made in your code.

For example, we made the following choices according to the Azure documentation:

For the App Service module, we set the default instance count to 2.

For Virtual Machines, we force them to be in an Availability Set, since we should always have more than one instance.

For Redis Cache, a cluster of 3 shards is the default configuration, while still taking the Redis specifications into account.
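On the module side, such choices boil down to variable defaults. The following is a minimal sketch with hypothetical variable names; only the default values (2 instances, 3 shards) come from the choices described above.

```hcl
# Hypothetical module inputs; names and descriptions are illustrative only.
variable "app_service_instance_count" {
  description = "Number of App Service instances. Set to 1 to opt out of redundancy."
  type        = number
  default     = 2
}

variable "redis_shard_count" {
  description = "Number of shards in the Redis cluster."
  type        = number
  default     = 3
}
```

With defaults like these, a caller who passes nothing gets the redundant setup, and a single-instance deployment only happens when someone writes it down in code.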

Non-intrusive

We do not assume that the user has extensive rights on the Azure environment. The modules therefore do not perform any Azure Active Directory read or write operation internally, as would be needed for RBAC management on some resources.

Also, we do not enforce any resource group management; we only assume that the user has enough rights to create the resources scoped by the module in the target resource group.

Modularity

Our modules are built for everyone. That’s why, even though we made some recommendations by setting default values, we do not hard-code anything, letting users choose how they want to use the module.

There will always be use cases we could not predict, so you can override any default we set in the module with your own value.

We also try as much as possible to keep track of all new capabilities of the AzureRM Terraform provider and update our modules accordingly.

Run tools

Production platforms need tools to operate them, such as logging, backups and monitoring. For monitoring, we use Datadog (SignalFx now, blog post yet to come…) as an external solution and don’t rely directly on Azure Monitor.

For run purposes, we have bundled the following services into modules: Key Vault, Log Analytics Workspace, Storage Account (for logs) and Recovery Vault.

Most of our modules can (and must!) be connected to these tools, which are designed to be easy to plug in: for example, a SAS token is created for the logs Storage Account.

Application lifecycle

Modules are published to GitHub and the Terraform Registry with semantic versioning (semver).

We plan to maintain each module with bug fixes and improvements. Fixes delivered by Azure or the Terraform provider will be integrated into the published modules, so that production stacks using them can pick up those fixes with a simple version bump.

Since the modules are versioned and provided with a comprehensive Changelog, updates are easy to trace. Also, the Terraform code that uses those modules should be versioned, so it’s easy to track down who made a change and when.
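In a consuming stack, version pinning and explicit overrides could look like the fragment below. The registry address, the version constraint and the `shard_count` input are placeholders chosen for illustration, not a real release or a documented module interface.

```hcl
module "redis" {
  # Placeholder registry address and version constraint, not a real release.
  source  = "example-org/redis/azurerm"
  version = "~> 1.2" # accepts 1.2 and later 1.x releases, never 2.0

  # Hypothetical input: explicitly opting out of the highly available default.
  shard_count = 1
}
```

Bumping the `version` constraint is then the only change needed to pick up a fix, and the diff in version control shows who changed it and when.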

Beyond that, some Azure features are not yet covered by the Terraform provider. We implement some of them with CLI or PowerShell commands, behind an interface designed as if the feature were already available in Terraform. The benefit is that we can later switch to a native implementation without any breaking change for users, who keep backward compatibility even if the underlying implementation is very different.
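One common way to wrap a CLI command behind a Terraform-style interface is a `null_resource` with a `local-exec` provisioner. The sketch below assumes the Azure CLI is installed and authenticated where Terraform runs; the variable names and the `properties.someFeature` path are placeholders, not a real Azure setting.

```hcl
# Hypothetical inputs exposed to the module user, as if the feature were native.
variable "resource_id" {
  type = string
}

variable "feature_enabled" {
  type    = bool
  default = true
}

resource "null_resource" "feature" {
  # Re-run the command whenever the target or the desired value changes.
  triggers = {
    resource_id = var.resource_id
    enabled     = tostring(var.feature_enabled)
  }

  provisioner "local-exec" {
    # `properties.someFeature` is a placeholder property path.
    command = "az resource update --ids ${var.resource_id} --set properties.someFeature=${var.feature_enabled}"
  }
}
```

The user only ever sees `feature_enabled`; if the provider later supports the feature, the `null_resource` can be replaced by a native argument without changing the module's inputs.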

Made for humans

Finally, we design modules as APIs that are easy to understand and that focus on the user’s needs rather than on the underlying Terraform mechanism or Azure API.

We also intend to hide implementation heterogeneity by providing a unified interface for similar features, such as log management or storage access.

Some examples:

The `ssl_enforcement` variable for MySQL connections (which can be “Enabled” or “Disabled”) is replaced by a simple boolean, `force_ssl`, which is easy to understand and cannot be broken by a typo.
The SKU variables for MySQL or SQL resources, which require redundant information, are reduced to as little information as possible, such as size and family; the rest is either inferred or enforced.
Storage linking for logs, backups or any other internal behaviour sometimes requires the resource ID and sometimes the storage name. Sometimes access is granted through a SAS token, sometimes through an access key. We hide this diversity by always using the same input variables, whatever the implementation is.
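The first example can be sketched in a few lines. This is an illustration of the idea rather than the module's actual code, and the default value is an assumption.

```hcl
# Boolean exposed to the module user.
variable "force_ssl" {
  description = "Whether SSL is required for client connections."
  type        = bool
  default     = true # assumed default, for illustration only
}

locals {
  # Translate the boolean into the string value the underlying resource expects.
  ssl_enforcement = var.force_ssl ? "Enabled" : "Disabled"
}

output "ssl_enforcement" {
  value = local.ssl_enforcement
}
```

Because the input is typed as `bool`, a value such as "Enabeld" is rejected by Terraform before anything reaches the Azure API.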

Conclusion

Releasing Terraform modules is a huge amount of work and takes a lot of time, but we see the benefits every day and believe it is worth the effort. In a short period of time, we have already observed:

Uniform architectures for all our customers, which make it easy for people who need to work on them to get started. Non-exhaustive list: resource naming, backup retention, activated logging, monitoring thresholds and scope…
A uniform code base for all our projects
High availability and security best practices enforced
Environments created with up-to-date Azure features and tools, in an ecosystem that moves incredibly fast

Feel free to use these Terraform modules, and ping us to discuss them or to contribute through an issue or a pull request. You’re more than welcome.

You can find everything described above here: https://claranet.tf/#azure