Applications in the cloud range from a simple single-page application to complex applications running complex logic across multiple boundaries and geographic regions. Irrespective of their kind, they must all deal with secrets. Managing secrets is usually hard, especially when the application needs to be compliant, has a big user base, and cannot afford any downtime. There have been numerous incidents where services and applications went down because of expired secrets, which hurts both the customer experience and the reputation of the service. Mismanaging secrets is also a huge security risk to the customers, their data, and the service. Leaking customer data, or failing to secure it, can lead to major lawsuits against the service and the company that owns it. So, the secrets an application uses are as important as the application and its features.
An application manages different kinds of secrets depending on its functionality, e.g. certificates, passwords, connection strings etc. Each type of secret has different properties, a different life cycle, and a different management process. But if we try to define the life cycle of a secret, it goes through roughly three stages:
- Acquisition – the step where the secret is created
- Whitelisting – the step where the secret is activated
- Revocation – the step where the secret is deactivated
Some of the steps can be automated and some cannot, depending on the type of the secret. We will focus on the principles and some of the guidelines of secret management, with Azure KeyVault as the secret store. But these principles and guidelines can generally be applied to any form of secret store and management workflow. We will also talk about some of the ways a secret can be “safely” consumed by an application hosted on the Azure platform.
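To make the three stages concrete, here is a minimal sketch of the life cycle as a tiny state machine. The names `SecretState` and `advance` are made up for illustration, and allowing a secret to be revoked before it was ever whitelisted is my own assumption, not something a particular secret store enforces.

```python
from enum import Enum


class SecretState(Enum):
    ACQUIRED = "acquired"        # created, not yet accepted by any consumer
    WHITELISTED = "whitelisted"  # activated, consumers accept it
    REVOKED = "revoked"          # deactivated, must no longer be accepted


# Allowed moves between stages. Revocation is final in this sketch, and an
# acquired secret may be revoked without ever being activated (assumption).
ALLOWED_TRANSITIONS = {
    SecretState.ACQUIRED: {SecretState.WHITELISTED, SecretState.REVOKED},
    SecretState.WHITELISTED: {SecretState.REVOKED},
    SecretState.REVOKED: set(),
}


def advance(current: SecretState, target: SecretState) -> SecretState:
    """Return the target state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

A rotation then amounts to acquiring and whitelisting a new secret before revoking the old one, so that both are valid for a short overlap.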
Why is secret rotation important?
There are four main reasons why rotating secrets makes sense:
- Expiry: A lot of your secrets have a hard expiry date associated with them, and X.509 certificates are a common example. If you want to continue using an X.509 certificate, you must rotate it before it expires. If you are using it for SSL/TLS, browsers will warn users away from your site or refuse to load it unless you present a valid certificate.
- Dependency Breakage: A lot of the time, your secret might still be valid and usable, but you still need to rotate it because a “related” secret has changed. A common example is when the intermediate CA or root CA of your X.509 certificate has changed. Another example is a SAS token issued for an Azure storage account, which stops working once the account key that signed it is regenerated.
- Security: Not all secrets have a hard expiry date or a dependent secret, but you still need to rotate them if they were leaked. As a security guideline, it is necessary to rotate them periodically even when they are not known to be leaked, because a lot of the time the owner of a secret is not aware that it was leaked, and by the time they find out, a lot of damage has been done. Regular rotation limits the damage in these cases. Also, in case of a leak, it might be necessary to do a root cause analysis to find out what caused it and take mitigation steps to prevent it in the future.
- Compliance: If your service or application needs to comply with standard compliance certifications, you need to rotate your secrets on a periodic basis. It is very important to control who can see and manage a production secret, and there needs to be a clear separation of duties between the development team and the operations team, which gets a bit complicated if you are a DevOps team. I have a future article coming with more on this.
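The four reasons above can be turned into a simple check that tells you why a given secret is due for rotation. This is only a sketch: the `SecretInfo` fields, the 30-day expiry margin, and the 90-day maximum age are all assumptions picked for illustration, and your own compliance requirements decide the real numbers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class SecretInfo:
    name: str
    created_on: datetime
    expires_on: Optional[datetime] = None  # None means no hard expiry date
    dependency_changed: bool = False       # e.g. the issuing CA was replaced
    suspected_leak: bool = False


def rotation_reasons(
    secret: SecretInfo,
    now: datetime,
    expiry_margin: timedelta = timedelta(days=30),  # assumed lead time
    max_age: timedelta = timedelta(days=90),        # assumed rotation period
) -> List[str]:
    """Return every reason that currently applies; empty means no action."""
    reasons = []
    if secret.expires_on is not None and secret.expires_on - now <= expiry_margin:
        reasons.append("expiry")
    if secret.dependency_changed:
        reasons.append("dependency breakage")
    if secret.suspected_leak:
        reasons.append("security")
    if now - secret.created_on >= max_age:
        reasons.append("periodic rotation")
    return reasons
```

The "periodic rotation" reason covers both the security guideline and the compliance requirement, since in practice both boil down to a maximum age for the secret.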
Questions and Considerations Regarding Secrets Management
The most common scenario in which a secret needs to be updated is when it has expired or is about to expire. But there are a few considerations that always need to be thought through before a secret is updated or rolled out:
- Service Downtime: For any running service, the secret that is being rolled out is in use. How would you roll it out without affecting the customers who rely on it? For example, one of your partner services may rely on a service-to-service certificate to authenticate with your endpoint. If the certificate is rolled over, how can it still connect? This doesn’t only apply to your partners: if the database connection string has been updated, how would your application connect to the backend database? Or, if you update your OAuth 2.0 assertion certificate, how would your application get an access token (please see the previous article on OAuth 2.0 to know more about it) without a full redeployment of your application? Keep in mind that your application could be running in multiple geographic locations, with thousands of users accessing it, and rolling an application may not be as easy as it sounds. You essentially don’t want any downtime due to the secret rollout.
- Access to production secrets: Another important aspect is how the secret will be rolled out. Who can see and update the secret? Most of the time, these secrets are a gold mine, and having access to them can mean having access to your customers’ sensitive data, sometimes as sensitive as health records or financial records. This becomes all the more important because not all secret rotations can be fully automated, and it’s not always easy to apply DevOps principles to these secrets. There have been numerous times when an authorized person has innocently left a secret on their desktop while rotating it, and someone later stole it from there. In that case, it won’t matter how much protection you have implemented in your secret store. There are also some severe compliance implications.
- Forced out-of-band Rotations: There are times when you need to rotate your secret out of band. Suppose you have a privileged user in your company who is authorized to manage secrets. What happens when they leave the company? Or what happens if the intermediate/root CA of all your certificates has changed due to a breach or is no longer supported, and the child certificates need to be issued by a new CA (please see my previous post on X.509 certificates for more about this)? Or, worst case, what if all your secrets are leaked and you need to change them immediately to contain the damage? In all these situations, you need to roll out the secret as soon as possible, and you want to do that safely.
- Secret Expiry Notification: Wouldn’t it be awesome if your secrets never expired in production, and you never spent a sleepless night on a service outage? Well, that could be true if some process notified you of expiring secrets well ahead of time, so that you have ample time to rotate and deploy them. How can you set up expiry notifications for your secrets, and create a process for your team to receive them and act on them, so that you never see a secret expire?
- Auditing: You also want to keep a record of which secret was changed, when, and by whom. This is necessary for investigative purposes and for auditing access controls to secrets on a periodic basis. It also helps in troubleshooting any issues caused by a secret rotation. This is a requirement for compliance too.
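The notification and auditing questions above can be sketched in a few lines. The function names, the 30-day notice period, and the shape of the audit record are my own choices for illustration; a real setup would read the expiry dates from your secret store and ship the audit lines to a tamper-resistant log.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple


def expiring_soon(
    expiries: Dict[str, datetime],
    now: datetime,
    notice: timedelta = timedelta(days=30),  # assumed notice period
) -> List[Tuple[str, int]]:
    """Return (name, days_left) for secrets inside the notice window.

    Secrets that already expired show up with negative days_left; the
    result is sorted so the most urgent secret comes first.
    """
    due = [
        (name, (expires_on - now).days)
        for name, expires_on in expiries.items()
        if expires_on - now <= notice
    ]
    return sorted(due, key=lambda item: item[1])


def audit_entry(secret_name: str, action: str, actor: str, when: datetime) -> str:
    """Serialize who did what to which secret, and when, as one JSON line."""
    record = {"secret": secret_name, "action": action, "actor": actor, "at": when.isoformat()}
    return json.dumps(record, sort_keys=True)
```

Running `expiring_soon` on a schedule and mailing the result to the team is already enough to answer the "how do we never see a secret expire" question for small setups.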
Managing Secrets in Azure KeyVault
Azure has an amazing product called KeyVault (KV), which specializes in storing and managing secrets. It started as secure secret storage and is evolving into a storage and management solution. It essentially stores secrets in three forms: as an X.509 certificate, as a cryptographic key, and as free-form secret text. The cryptographic keys can be further secured with HSM-backed storage at a small additional cost. KeyVault also provides other aspects of secret storage, like fine-grained access control using Azure Active Directory, auditing, expiry notification etc., which are absolutely necessary to protect and maintain the secrets.
- Free-form Secret and Key Rotation: You can set up automatic rotation of free-form secrets and keys stored in KeyVault using the Azure Automation service. Here is a nice wiki describing this. While you can write your own automation to rotate your secrets, KeyVault has also started supporting automatic rotation of resource-specific secrets, e.g. Azure Storage keys.
- X.509 Certificate Rotation:
- Public CA signed X.509 Certificate Rotation: This is one of the most interesting features, where you can set up your certificates to rotate automatically with a supported public certificate authority. This is explained in a little more detail in this blog. At the time of writing this article, KV supports only a specific set of partner CAs, so check the KeyVault documentation for the current list.
- Intermediate CA-signed certificate: If you want to create a certificate signed by your enterprise intermediate CA, you can do so by starting the process in KeyVault, which generates the key pair and a CSR, and then sending the CSR to your intermediate CA to sign. Once you get the signed certificate back, you merge it into the original pending request in KeyVault to complete the certificate. Both of the above approaches to getting a CA-signed certificate are extremely useful, as the private key is born inside KeyVault and never leaves it during its management lifecycle.
- Self-signed certificate: You can also create self-signed certificates in your KeyVault automatically by setting the issuer as “Self”.
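To give a feel for what such a certificate setup looks like, here is a sketch that builds a certificate policy for a self-signed certificate with automatic renewal. The field names follow the KeyVault REST certificate policy as I remember it, so treat them as assumptions and check them against the KeyVault reference before use; the helper name, subject, and defaults are purely illustrative.

```python
import json


def self_signed_policy(
    subject: str,
    validity_months: int = 12,
    renew_days_before_expiry: int = 30,
) -> dict:
    """Build a request body asking KeyVault to issue and auto-renew a
    self-signed certificate (field names are assumptions, see above)."""
    return {
        "policy": {
            "key_props": {
                "kty": "RSA",
                "key_size": 2048,
                "exportable": True,
                "reuse_key": False,  # generate a fresh key on every renewal
            },
            "secret_props": {"contentType": "application/x-pkcs12"},
            "x509_props": {"subject": subject, "validity_months": validity_months},
            "lifetime_actions": [
                {
                    "trigger": {"days_before_expiry": renew_days_before_expiry},
                    "action": {"action_type": "AutoRenew"},
                }
            ],
            # "Self" asks KeyVault to sign the certificate itself.
            "issuer": {"name": "Self"},
        }
    }


if __name__ == "__main__":
    print(json.dumps(self_signed_policy("CN=example.com"), indent=2))
```

For a public CA, the issuer name would instead point to an issuer you have configured in the vault, and the rest of the policy stays largely the same.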
Deployments of Secrets using the Push Model
Safe consumption of secrets is as important as safe management of secrets. I am going to talk about two of the common Azure application types and some guidelines about consuming secrets in them, i.e. PaaSv1 Cloud Service and PaaSv2 Service Fabric. There are other types of applications, e.g. App Service, Web Sites, Functions, Service Fabric container service etc., which are not covered here, and I will probably write a future article on those. This also assumes that you have stored your secrets in Azure KeyVault.
Cloud Service (PaaSv1)
A cloud service is deployed through the Azure classic REST API. The primary deployment collateral for a cloud service deployment is a CSCFG file and a CSPKG file. The binaries are packed into the CSPKG file, which is generated from a CSDEF file that holds the topology definition of the cloud service. The CSDEF has the endpoints, the binary locations etc. The CSCFG file contains the list of X.509 certificates, with their thumbprints and their association with the individual endpoints. The Azure fabric essentially takes the CSCFG and the CSPKG, provisions the VMs as requested, and then deploys the binaries that are packed in the CSPKG. When the VM starts up and the endpoints are configured in IIS, it expects all the X.509 certificates to be installed on the box. So, before calling the CreateDeployment REST API to start a deployment, you need a pre-deployment step in which you call the KeyVault APIs to download the certificates and then call the classic API with ServiceCertificateCreateParameters to upload them. The non-certificate secrets that your runtime code needs, such as SQL connection strings, Azure storage keys, SAS tokens etc., can be downloaded from the KV location and packaged as settings in the CSCFG file, which you can access using the RoleEnvironment class from your application. Please keep in mind that it is extremely important to ensure the settings holding your secrets don’t go out as plain text, as these settings can be easily seen in the portal. They are usually also kept in C:\config on the VMs. You can install an encryption certificate by calling the API with ServiceCertificateCreateParameters and encrypt the settings using its public key. When the secret setting values are read at runtime, they can be decrypted with the private key, which is already installed. Please read my previous post about asymmetric key encryption in .Net.
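The "never ship settings as plain text" step can be sketched as follows. The encryption itself is injected as a function, because in a real pipeline it would be an asymmetric encryption with the public key of the encryption certificate, done with a crypto library of your choice. The helper names and the setting names are hypothetical.

```python
import base64
from typing import Callable, Dict


def build_encrypted_settings(
    secrets: Dict[str, str],
    encrypt: Callable[[bytes], bytes],
) -> Dict[str, str]:
    """Deployment side: map each setting name to base64(ciphertext), which is
    safe to place in a text-based configuration file."""
    return {
        name: base64.b64encode(encrypt(value.encode("utf-8"))).decode("ascii")
        for name, value in secrets.items()
    }


def read_setting(
    settings: Dict[str, str],
    name: str,
    decrypt: Callable[[bytes], bytes],
) -> str:
    """Runtime side: decode and decrypt one setting with the private key."""
    return decrypt(base64.b64decode(settings[name])).decode("utf-8")
```

The deployment script only ever needs the public key, so the private key can stay on the VMs where the certificate was installed.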
Service Fabric (PaaSv2)
Service Fabric is the next-gen PaaS offering from Azure. The offering lets you host multiple applications (called apps) on the same VMs. This leads to better resource utilization and provides many advantages over the traditional cloud service. Please read the Service Fabric documentation to know more about the tech. There are essentially two ways a Service Fabric application can be deployed, and one way a Service Fabric cluster can be provisioned. The Service Fabric cluster can be provisioned using an ARM template. You can also use the Azure Service Fabric PowerShell cmdlets to do the same. The application deployment can be done using the FabricClient and PowerShell cmdlets. At the time of writing this article, Microsoft Azure has released a preview of ARM template-based deployment of Service Fabric applications. Just like in the cloud service, the X.509 certificates needed by the applications in the cluster can be installed on the VMs by referring to them by their KV path in the ARM template of the cluster deployment. The VMSS RP (part of the ARM template), which is responsible for provisioning the VMs, pulls the certificates from the KV location and installs them on the VMs. Then, when the application is deployed, the certificates are already on the VMs and the application can start using them. There is a small catch here: the VMSS RP can only access a KV in the same location and subscription, so you might need to replicate your secrets from your primary KeyVault to a region-specific KeyVault if you have any shared secrets. Also, don’t forget to check the “Enable for Template Deployment” checkbox in your KeyVault so that VMSS can access it. For non-certificate secrets, the deployment script can pull them out of the KeyVault and put them in the settings. As mentioned before, it’s extremely important to encrypt these settings. Luckily, the Service Fabric team has created an awesome pipeline to manage these. They have it all documented on their wiki.
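Referring to a certificate by its KV path in the cluster template looks roughly like the fragment built below, which is the `secrets` section of the scale set's OS profile as I understand it. The helper name is made up and the vault ID and certificate URL in the usage are placeholders, so verify the property names against the VMSS template reference.

```python
import json
from typing import List


def vmss_certificate_secrets(
    vault_resource_id: str,
    certificate_urls: List[str],
    store: str = "My",
) -> list:
    """Build the osProfile "secrets" fragment that tells the scale set which
    KeyVault certificates to install into which Windows certificate store."""
    return [
        {
            "sourceVault": {"id": vault_resource_id},
            "vaultCertificates": [
                {"certificateUrl": url, "certificateStore": store}
                for url in certificate_urls
            ],
        }
    ]


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    fragment = vmss_certificate_secrets(
        "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/"
        "example-rg/providers/Microsoft.KeyVault/vaults/example-kv",
        ["https://example-kv.vault.azure.net/secrets/example-cert/0123456789abcdef"],
    )
    print(json.dumps(fragment, indent=2))
```

Because the URL includes the secret version, rolling a certificate means updating this fragment and redeploying the cluster template, which is exactly the push model described here.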
Using the pull model to deploy secrets (using MSI)
In the above model, for both cloud service and Service Fabric, secret deployment is a push model, where the deployment script pushes the secrets to the running VMs or nodes. But there is another model, where your runtime code pulls the secrets directly from the KeyVault. The most important thing to understand about this model is that the runtime code needs an identity, so that it can authenticate and get an access token for the KeyVault storing the secrets. So, you still need a way to push an identity credential before the code can pull the rest of your secrets. Today, for a cloud service, the deployment pre-step can call the API with ServiceCertificateCreateParameters, and for Service Fabric you can perform a cluster deployment, to push the initial bootstrap credential, which the code then uses to authenticate with KeyVault. But this has a “timing” problem: the certificate deployment is decoupled from the application/service deployment, and there are situations in which this causes trouble. For Service Fabric, imagine that you perform a cluster deployment with a new bootstrap credential certificate, but before you perform the application deployment, a VM node crashes and Azure provisions a new VM for you. In that case, Azure creates the new VM from the new VMSS template and installs the new certificate, whereas the rest of the VMs still have the old one. This leaves your service in an inconsistent state. To mitigate such situations, Azure has recently added a new offering to its stack, called Managed Service Identity (MSI). Azure has a nice article about this in its wiki. Essentially, when you enable the MSI extension for your service, it creates an identity for the application in Azure Active Directory and bakes the identity credentials into the application VMs/nodes.
The runtime code can use this credential to authenticate with Azure KeyVault and download the rest of the secrets. This model is quite interesting and solves a lot of the secret handling problems, but it has some clear advantages and disadvantages:
- You don’t need to redeploy your application or cluster when your secrets have been updated. You can configure your runtime code to pull the secrets as they are updated. So, the pipeline becomes simpler, with fewer steps involved.
- The secrets have far fewer touchpoints, i.e. the secret can be born in the Azure KeyVault and used in the VM running the code. It doesn’t need to be in any other place, which leads to less chance of leakage.
- If your service is in maintenance mode, and there is no active deployment, with this model you won’t have to redeploy your application just to update your secrets.
- You are essentially adding an external dependency to your service, which means that if KeyVault has an outage, you might not be able to get to your secrets. This matters even more if one of the VMs crashes and Azure provisions a new one for you. To mitigate such a risk, you might have to create a caching mechanism for your secrets.
- A lot of people are not comfortable with the fact that an update to a secret could be rolled out automatically to the live services. This removes the safety net for the case where the newly rotated secret is bad and causes an outage to your service, all the more so if you have an automatic secret rotation and management system. The safety net in question is the equivalent of a safe code rollout with a VIP swap or an upgrade domain walk in a cloud service.
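Here is a rough sketch of the pull model together with the caching mitigation mentioned above. The token request targets the instance metadata endpoint that Azure documents for managed identities on VMs, written from memory, so verify the URL and API version, and note that it may not apply to every hosting model. `CachedSecrets` and its `fetch` callback are hypothetical names; the callback stands in for whatever code reads a secret from KeyVault with the token.

```python
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Tuple

# Assumed endpoint and API version; check the managed identity documentation.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_token_request(resource: str = "https://vault.azure.net") -> urllib.request.Request:
    """Build (but do not send) the request for a KeyVault access token."""
    query = urllib.parse.urlencode({"api-version": "2018-02-01", "resource": resource})
    return urllib.request.Request(f"{IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"})


class CachedSecrets:
    """Pull secrets on demand and fall back to the last known value when
    the secret store cannot be reached."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}

    def get(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # still fresh, no call to the store
        try:
            value = self._fetch(name)
        except Exception:
            if cached is not None:
                return cached[0]  # store unreachable: serve the stale value
            raise  # nothing cached yet, so the caller must handle the outage
        self._cache[name] = (value, now)
        return value
```

Note that this cache lives in memory only, so it does not help a freshly provisioned VM during a KeyVault outage; covering that case needs a persisted cache, which then has to be protected like any other secret.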