CIT - Build CDK Infrastructure Testing - Part 1 - Terratest and the Integrated Integration
TL;DR You don't need a DSL to do easy integration testing. With CDK available in GO, infrastructure tests can be programmed easily with GO packages.
Motivation for CIT-CDK Infrastructure Testing
Basic Test Pyramid
At the top of the test pyramid are the End-to-End tests. For a website, that means: will the (end) user of the site see a proper HTML page or not?
With DevOps and IaC - Infrastructure as Code - the application is not separated from the infrastructure anymore. The application itself relies on the infrastructure. So to have a running application and a successful End-to-End test, both application and infrastructure must be tested and must work.
So we have an application side and an infrastructure side of the pyramid.
Divided Test Pyramid
To test the infrastructure End-to-End, we have to decouple the application from the infrastructure. One way to do this is to deploy a minimal web application - a web server with a testable response - to fully test the infrastructure.
Integration on the app side
Application Development Testing
During development, the unit tests should be run many times, at least before each commit to the code repository. So they should be as atomic as possible and run fast.
If multiple services or frontend and backend work together, the cooperation is to be tested in the integration tests. Usually they take more time and will be performed in a CI/CD pipeline.
The End-to-End test just checks if everything works together. It does not check whether the application is maintainable, flexible or understandable. For development, extensibility and refactoring, good unit test coverage is helpful.
Integration on the CDK side
Infrastructure Development Testing
You start a CDK development with (*):
cdk init app --language=yourlanguage
The unit tests generated by this scaffolding just check whether the right CloudFormation (Cfn) templates are generated.
This is useful if you are not sure about the generated Cfn template, for instance if you create new constructs. But it is not so useful with standard constructs.
For instance, this is a part of the generated CDK go code:
awssns.NewTopic(stack, jsii.String("MyTopic"), &awssns.TopicProps{
DisplayName: jsii.String("MyCoolTopic"),
})
An SNS topic is created.
With the generation of the SNS Topic, the generated test code checks the proper creation of an SNS CloudFormation Topic:
template := gjson.ParseBytes(bytes)
displayName := template.Get("Resources.MyTopic86869434.Properties.DisplayName").String()
assert.Equal(t, "MyCoolTopic", displayName)
This is only useful if you are unsure whether the awssns construct works. But this is already tested in the awssns.NewTopic construct itself.
Have a look at Github CDK SNS:
Code Snippet from aws-cdk/packages/@aws-cdk/aws-sns/test/test.sns.ts
expect(stack).toMatch({
'Resources': {
'MyTopic86869434': {
'Type': 'AWS::SNS::Topic',
So we are testing twice.
Again, if you are developing a new construct or if you create resources dynamically, it could also be helpful to add tests. But not if you just test “an sns construct will generate an sns cloudformation resource”.
Do not test AWS basic service functionality, but the integrations
You also should not test that an AWS service's base functionality works, but you should test whether your configuration works.
For example, you do not have to test that a Security Group which is open on port 80 really opens port 80. But with a more complex scenario with several groups and several servers, it could make sense to test that server A really can connect to server B and that only on this very port.
So I think that testing the integration of the created infrastructure should be part of the automated testing. I will call this “CIT” - CDK Infrastructure Testing.
Challenges for CIT
To achieve this, we have to face some challenges:
- Mapping of logical and physical IDs
- Finding or creating Test Libraries
- Testing with context
Logical - Physical Mapping
When you create infrastructure with CDK or CloudFormation you define names for your Constructs. CDK generates Resource names from the construct names. These Resource names are called logical IDs or Logical Names. In the CDK code, an instance can be titled “Web-Server”.
Construct name MyTopic
awssns.NewTopic(stack, jsii.String("MyTopic"), &awssns.TopicProps{
DisplayName: jsii.String("MyCoolTopic"),
})
Resource name MyTopic86869434 in Cfn
Resources:
MyTopic86869434:
Type: AWS::SNS::Topic
Properties:
DisplayName: MyCoolTopic
Metadata:
aws:cdk:path: GocdkStack/MyTopic/Resource
When CloudFormation creates the resource, it gives it an ID like GocdkStack-MyTopic86869434-1EQYGNEXFF12T. This is called the Physical ID. Each call to an AWS API concerning this Topic (like aws sns list-subscriptions-by-topic) needs the Physical ID.
Call to get information about the Topic resource
aws sns list-subscriptions-by-topic --topic-arn "arn:aws:sns:eu-central-1:669453403305:GocdkStack-MyTopic86869434-1EQYGNEXFF12T"
{
"Subscriptions": []
}
Physical ID GocdkStack-MyTopic86869434-1EQYGNEXFF12T in deployed Stack
{
"LogicalResourceId": "MyTopic86869434",
"PhysicalResourceId": "arn:aws:sns:eu-central-1:669453403305:GocdkStack-MyTopic86869434-1EQYGNEXFF12T",
"ResourceType": "AWS::SNS::Topic"
}
So tests running against the physical side need the physical ID.
My goal for the CIT is to provide helper functions for an easy mapping of Construct names and physical IDs. For the quick start we will create a SystemsManager Parameter Store parameter for the physical id.
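To illustrate what such a mapping looks like, here is a minimal sketch that builds a lookup table from the JSON printed by aws cloudformation describe-stack-resources. It is not the helper module announced for the next part. The names stackResources and physicalIDs are assumptions for illustration, and the sample data is the stack resource shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stackResources mirrors the relevant part of the JSON printed by
// `aws cloudformation describe-stack-resources`.
type stackResources struct {
	StackResources []struct {
		LogicalResourceId  string
		PhysicalResourceId string
		ResourceType       string
	}
}

// physicalIDs maps each logical ID in the output to its physical ID.
func physicalIDs(describeOutput []byte) (map[string]string, error) {
	var parsed stackResources
	if err := json.Unmarshal(describeOutput, &parsed); err != nil {
		return nil, err
	}
	ids := make(map[string]string, len(parsed.StackResources))
	for _, r := range parsed.StackResources {
		ids[r.LogicalResourceId] = r.PhysicalResourceId
	}
	return ids, nil
}

func main() {
	// Sample built from the SNS topic resource of the example stack.
	sample := []byte(`{"StackResources": [{
		"LogicalResourceId": "MyTopic86869434",
		"PhysicalResourceId": "arn:aws:sns:eu-central-1:669453403305:GocdkStack-MyTopic86869434-1EQYGNEXFF12T",
		"ResourceType": "AWS::SNS::Topic"}]}`)

	ids, err := physicalIDs(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids["MyTopic86869434"])
}
```

The lookup key is still the logical ID with its generated suffix, so this only covers the second half of the way from construct name to physical ID.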
Service Test Libraries
A general distinction between testing libraries is whether you want to use a specialized DSL (domain-specific language) or low-level calls with the AWS SDK.
An example of a testing DSL is Chef InSpec.
For example, an Application Load Balancer can be tested with:
describe aws_alb('arn:aws:elasticloadbalancing') do
it { should exist }
end
describe aws_alb(load_balancer_arn: 'arn:aws:elasticloadbalancing') do
it { should exist }
end
Or the generated SNS Topic could be tested in InSpec with:
describe aws_sns_topic('arn:aws:sns:*::my-topic-name') do
it { should exist }
end
The advantage of such a generic approach is an easy start. The downside is that the DSL is limited and extending it is more complicated. And as new AWS service types keep coming, it is very hard work to keep creating new DSL items for each new service type.
Between a DSL and relying entirely on coding with the pure SDK sits terratest. This is a GO library aimed at helping to test terraform-generated infrastructure. As the infrastructure does not care how it was generated, it is quite easy to use it for Cfn/CDK-generated resources as well. You will find some AWS helper functions for testing in the module terratest/aws.
I have now used terratest in a handful of projects and found it quite useful. As it uses the AWS GO SDK, you can easily add AWS GO SDK API calls and have access to the whole AWS API.
terratest has code for checking SNS Topic existence.
Creating a test for SNS with the GO SDK would be as easy as adding a few lines to this GO V2 SNS example: GO SNS.
Context aware Testing
Another challenge is that some resources under the test microscope only work when they are linked to another resource, which I will call an agent. This is the case with Security Groups and IAM policies.
VPC/Network Context
Without attaching a Security Group to an ENI (Elastic Network Interface), you cannot check whether it really works.
In 2018 I used Chef Kitchen/InSpec in this (German) blog post to create a test and a testee instance to test routing with transit gateway: https://aws-blog.de/2018/12/mit-allen-verbunden-teil-1.html
Another example of InSpec with Systems Manager is described here in Thomas' post: https://aws-blog.de/2020/10/air-gapped-compliance-scans-with-inspec.html
IAM Context
If you want to test the results of IAM policies, you need an entity which has these policies attached. Given the complexity and feature richness of IAM policy JSON/YAML documents, testing can often help to get clarity here.
We will look at the agent problem later.
Now we start with a working first example.
CIT Use Case: End-to-End Testing a CDK-generated Load Balancer Web Server
I want to code an End-to-End test for a CDK-generated webserver with an Application Load Balancer.
The Application
To start right away without additional libraries, I just write the physical ID, or in this example the DNS name, to the Systems Manager (SSM) Parameter Store. With that, the test code has a mapping between the logical and the physical ID.
The Load Balancer GO CDK code:
lb := elasticloadbalancingv2.NewApplicationLoadBalancer(stack, aws.String("LB"),
&elasticloadbalancingv2.ApplicationLoadBalancerProps{
Vpc: myVpc,
InternetFacing: aws.Bool(true),
LoadBalancerName: aws.String("ALBGODEMO"),
},
)
Output the Url to Parameter Store:
ssm.NewStringParameter(stack, aws.String("govpc"),
&ssm.StringParameterProps{
Description: aws.String("alb"),
ParameterName: aws.String("/cdk-templates/go/alb_ec2"),
StringValue: lb.LoadBalancerDnsName(),
},
)
Test Code
terratest provides an easy method to test HTTP calls with the http_helper:
func TestALBRequest(t *testing.T) {
	// Read the ALB DNS name that the CDK stack stored in Parameter Store.
	// region must be defined elsewhere in the package (not shown here).
	storedUrl := aws.GetParameter(t, region, "/cdk-templates/go/alb_ec2")
	url := fmt.Sprintf("http://%s", storedUrl)
	sleepBetweenRetries := 10 * time.Second
	// Retry up to 20 times until the page answers with status 200 and the expected body.
	http_helper.HttpGetWithRetry(t, url, nil, 200, "<h1>hello world</h1>", 20, sleepBetweenRetries)
}
Run, Test and Destroy script
In the GitHub repo, you can see all scripts defined in the Taskfile.yml:
Taskfile settings
- At the moment CDK versions are changing quickly, so I pin a fixed CDK version number:
vars:
version: v2.0.0-rc.3
constructs: v10.0.5
npxoptions: -y
...
npx {{.npxoptions}} cdk@{{.version}}
- Deploy
- npx {{.npxoptions}} cdk@{{.version}} deploy --require-approval never --profile $AWSUME_PROFILE
- Test
- go test
or use go test -v for more verbose output.
- Destroy
- npx {{.npxoptions}} cdk@{{.version}} destroy --force --profile $AWSUME_PROFILE
Calling the deploy-test-destroy cycle on different settings:
Fail 1
If the stack is not created yet, the test fails:
go test
--- FAIL: TestALBRequest (0.41s)
ssm.go:18:
Error Trace: ssm.go:18
alb_ec2_test.go:16
Error: Received unexpected error:
ParameterNotFound:
status code: 400, request id: 6576de83-ff43-4817-b7c6-d337637d0542
Test: TestALBRequest
FAIL
exit status 1
FAIL alb_ec2 0.875s
This is because the SSM parameter does not exist: ParameterNotFound.
Fail 2
If I forget to start httpd in the userdata script, the test will also fail:
yum update
yum install -y httpd
systemctl enable httpd
echo "<h1>hello world</h1>" > /var/www/html/index.html
test:
Stack ARN:
arn:aws:cloudformation:eu-central-1:669453403305:stack/AlbInstStack/ae48b400-b94e-11eb-9fc8-0635bbfae99c
task: go test
TestALBRequest 2021-05-20T11:39:44+02:00 retry.go:91: HTTP GET to URL http://ALBGODEMO-1441991621.eu-central-1.elb.amazonaws.com
TestALBRequest 2021-05-20T11:39:44+02:00 http_helper.go:32: Making an HTTP GET call to URL http://ALBGODEMO-1441991621.eu-central-1.elb.amazonaws.com
TestALBRequest 2021-05-20T11:39:44+02:00 retry.go:103: HTTP GET to URL http://ALBGODEMO-1441991621.eu-central-1.elb.amazonaws.com returned an error: Validation failed for URL http://ALBGODEMO-1441991621.eu-central-1.elb.amazonaws.com. Response status: 502. Response body:
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
...
</html>. Sleeping for 10s and will try again.
The SSM parameter exists, points to the right ALB, but the webserver is not running.
With the retry cycle I make sure that CloudFormation has enough time to create the resources.
OK
If I fix the userdata, create the right ALB etc., the test will pass:
task test
task: go test
TestALBRequest 2021-05-16T16:56:25+02:00 retry.go:91: HTTP GET to URL http://ALBGODEMO-465149672.eu-central-1.elb.amazonaws.com
TestALBRequest 2021-05-16T16:56:25+02:00 http_helper.go:32: Making an HTTP GET call to URL http://ALBGODEMO-465149672.eu-central-1.elb.amazonaws.com
PASS
ok alb_ec2 0.771s
This is a simple example. I did the same thing with IIS on Windows and PowerShell scripting, which takes more than four lines in the UserData, and this test really helped me.
You can run the test cycle with task cit from the GitHub code.
Whole Deploy-Test-Destroy cycle
task cit
...
AlbInstStack: deploying...
[0%] start: Publishing 4ad61576fc9e67b1526ec28726d554e5177af3c40a96dce030832ddf21f2eda2:669453403305-eu-central-1
[100%] success: Published 4ad61576fc9e67b1526ec28726d554e5177af3c40a96dce030832ddf21f2eda2:669453403305-eu-central-1
AlbInstStack: creating CloudFormation changeset...
[██████████████████████████████████████████████████████████] (48/48)
✅ AlbInstStack
Stack ARN:
arn:aws:cloudformation:eu-central-1:669453403305:stack/AlbInstStack/a2bc6830-bba0-11eb-b5db-0ad1bb830c16
task: go test
TestALBRequest 2021-05-23T10:31:01+02:00 retry.go:91: HTTP GET to URL http://ALBGODEMO-1159953432.eu-central-1.elb.amazonaws.com
TestALBRequest 2021-05-23T10:31:01+02:00 http_helper.go:32: Making an HTTP GET call to URL http://ALBGODEMO-1159953432.eu-central-1.elb.amazonaws.com
PASS
ok alb_ec2 1.622s
Profile ggtrcadmin
task: npx -y cdk@v2.0.0-rc.3 destroy --force --profile $AWSUME_PROFILE
AlbInstStack: destroying...
10:31:19 | DELETE_IN_PROGRESS | AWS::CloudFormation::Stack | AlbInstStack
...
✅ AlbInstStack: destroyed
Using stages for Infrastructure testing vs Integrated End-to-End Test
With the userdata in the example, the webserver just responded with a testable response.
In order to distinguish between infrastructure test and app+infrastructure end-to-end test, context variables can be used.
The CDK documentation shows you how to do it:
With a flag “stage”, which holds values like [dev|prod], you switch between test userdata and production userdata:
stage := scope.Node().TryGetContext(aws.String("stage"))
And check the flag in your code like:
userdataFileName := "userdata/webserver-infratest.sh"
if stage == "prod" {
	userdataFileName = "userdata/webserver-production.sh"
}
For the prod stage, cdk synth -c stage=prod now gives you:
UserData:
Fn::Base64: |-
#!/bin/bash
yum update
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "<h1>Styled Production Page</h1> <p> lorem ipso</p>" > /var/www/html/index.html
Whereas with cdk synth -c stage=dev you get:
UserData:
Fn::Base64: |-
#!/bin/bash
yum update
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "<h1>hello world</h1>" > /var/www/html/index.html
This deploys a testable instance and webserver.
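The stage switch can be isolated into a small function, which also makes it unit-testable without synthesizing a stack. This is a sketch under assumptions: the function name userdataFile is hypothetical, and the parameter is typed interface{} on the assumption that the context lookup returns an untyped value that may be nil when no stage is passed.

```go
package main

import "fmt"

// userdataFile picks the userdata script for a stage value taken from the
// CDK context. Only the string "prod" selects the production script.
func userdataFile(stage interface{}) string {
	if s, ok := stage.(string); ok && s == "prod" {
		return "userdata/webserver-production.sh"
	}
	// Anything else (dev, unset, unexpected type) gets the testable page.
	return "userdata/webserver-infratest.sh"
}

func main() {
	for _, stage := range []interface{}{"prod", "dev", nil} {
		fmt.Printf("%v -> %s\n", stage, userdataFile(stage))
	}
}
```

Defaulting to the infrastructure test script means that a forgotten -c stage=... never deploys the production page by accident. Whether that is the right default depends on your pipeline.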
Other CDK languages
This approach involves programming only the “cit” infrastructure test in GO. Here is the same example with the TypeScript CDK LoadBalancer and cit in GO:
CDK example template LoadBalancer EC2.
By using the SystemsManager parameter store as a language-independent parameter transfer, we are polyglot.
Conclusion
It was really easy (30 lines of go code) to test a CDK generated webserver. This approach can be used for all CDK languages. In the next part I will show a GO module for directly accessing physical IDs via the CDK name aka logical ID.
Check it out
This code is available at the tecRacer Github “cdk-templates” repository:
https://github.com/tecracer/cdk-templates/tree/master/go/alb_ec2
For remarks, discussion, chatting please contact me on twitter @megaproaktiv.
I hope these thoughts can be useful for your next project!
Notes
(*) Create a GO CDK V2 Application
To build a GO CDK application, you currently have to create the template in cdk v1 and then manually upgrade the modules to CDK V2.
- Create go app
alias cdk1='npx cdk@v1.105.0'
cdk1 init app --language=go
- Migrate the modules. I have described how to do that here:
Migrating to AWS CDK v2 - the missing GO manual
- Switch to CDK V2
alias cdk='npx cdk@v2.0.0-rc.4'
cdk synth