From EC2 to C6 - Clear Compliance Chaos with clear Customer Collaboration with MSSP



Battling security compliance is a challenge that is best solved as a team sport. We show you some ways to win this tug of war, such as knowing not only the technical but also the psychological aspects of the game.

This is the way our MSSP works together with you.

Big test before going live vs. shift left

Phases

Imagine you have a small AWS workload up and running. After deployment (2) you decide to perform a Well-Architected Review. You will find out that it's a little late for this.

Two examples from my review experience:

Fun question: how do you know how old an AWS account is?

Answer: Just look at the age of the oldest IAM user with static access keys. I have seen keys that were several thousand days old. If you are very lucky, nothing has happened. Here comes the problem: you really do not know whether an incident occurred, or when. If you realize that a key was leaked, you can only guess when in the last 1000 days this happened.

See SEC02-BP05 Audit and rotate credentials periodically for details.
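The key-age check can be sketched in a few lines of Python. This is a minimal sketch, not a complete audit: the 90-day threshold and the function names are assumptions for illustration, and `collect_key_metadata` assumes boto3 is installed and credentials with IAM read access are configured.

```python
from datetime import datetime, timezone


def key_age_days(create_date, now=None):
    """Age of an access key in whole days."""
    now = now or datetime.now(timezone.utc)
    return (now - create_date).days


def stale_keys(key_metadata, max_age_days=90, now=None):
    """Return (user, key id, age) for active keys older than max_age_days.

    key_metadata: dicts shaped like the AccessKeyMetadata entries
    returned by the IAM list_access_keys call.
    The 90-day default is an assumption, not an AWS rule.
    """
    findings = []
    for key in key_metadata:
        age = key_age_days(key["CreateDate"], now)
        if key["Status"] == "Active" and age > max_age_days:
            findings.append((key["UserName"], key["AccessKeyId"], age))
    # Oldest keys first
    return sorted(findings, key=lambda item: item[2], reverse=True)


def collect_key_metadata():
    """Fetch key metadata for all IAM users (needs boto3 and credentials)."""
    import boto3  # imported here so the pure functions work without boto3

    iam = boto3.client("iam")
    keys = []
    for page in iam.get_paginator("list_users").paginate():
        for user in page["Users"]:
            response = iam.list_access_keys(UserName=user["UserName"])
            keys.extend(response["AccessKeyMetadata"])
    return keys
```

Calling `stale_keys(collect_key_metadata())` in an account lists the oldest active keys first, which is usually a good starting point for the rotation discussion.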

Another common example: You decided, for cost reasons, to use only one AWS account for development, testing and production. What could go wrong? See OPS05-BP08 Use multiple environments/Common anti-patterns for some answers.

After some discussion with our AWS architects you are convinced to use several accounts. However, separating accounts after the fact is much more difficult than setting up several accounts from the beginning. So you pay double for your cost savings.

The takeaways are:

  1. You cannot test quality into an existing architecture afterwards; you can only detect findings.
  2. Check quality and compliance as early as possible. This is called “shift left”, as you shift the check to the left on the timeline.

How many is too many?

There are several hundred checks on AWS services. They are bundled in packages like the Well-Architected Framework or the CIS Benchmarks. The package you should start with is the AWS Foundational Security Best Practices.

To illustrate some of the security tools, I have deployed this small architecture in a freshly nuked account. Then I scanned the account with three different tools.

Small Architecture

Before reading on, please take a look at the architecture and try to find the security issues. What would you do differently?

A look at tools

I want to show you that the “score” is just a rough number. As with all KPIs and metrics, you should not mix systems. The value lies in seeing a trend in the numbers, not in the number itself.
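To make the "trend, not number" idea concrete, here is a minimal sketch that tracks the score of each tool separately and reports only the change over time. The function name and the earlier score values are invented for illustration; only the latest values (91 and 68) come from the scans in this post.

```python
def score_trends(history):
    """Map each tool to the change between its first and last score.

    history: {tool name: [scores in chronological order]}
    Scores of different tools are never compared with each other.
    """
    trends = {}
    for tool, scores in history.items():
        if len(scores) < 2:
            continue  # a single scan gives no trend
        trends[tool] = scores[-1] - scores[0]
    return trends


# Hypothetical scan history; only the last value per tool is from this post.
history = {
    "Security Hub": [85, 88, 91],
    "Trend Vision One": [60, 64, 68],
}
print(score_trends(history))
```

Both tools show an improvement, although their absolute scores differ widely. That is the information worth discussing.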

The tools, which all report different scores, are:

Score Turbot Pipe 19 critical

Score Turbot

With Turbot Pipe you can scan a single account in seconds. The score shows just the AWS foundational check. There are mods available for other cloud providers.

This is an ideal tool for a quick pre-scan.

Score Security HUB 91%

Score Security HUB

This is one of the AWS batteries-included tools. You cannot include other cloud providers or endpoint security (like virus scans on workstations).

It needs some time to get started: Security Hub 2 hours

If you really know all controls by heart, you can use this tool to get a good overview.

Score Trend Vision One 68%

Score Trend

This is the most comprehensive tool. It includes endpoint security and cloud security. It is not limited to AWS. We use it at tecRacer MSSP because the different views on the architecture are bundled in one tool.

Tools and numbers

What all tools have in common is that the settings for findings, like criticality or scope, are predefined. Let me show you an example where the predefined settings simply do not match the use case.

In the architecture there is a single EC2 instance of type t4g.nano. This costs about 4 USD per month.

The findings are:

tvo findings

Finding | Why it does not apply here
Instance detailed monitoring is not enabled | This is a rarely used application, so we do not need detailed monitoring.
Instance termination protection | This is open to discussion. As the instance is created with Infrastructure as Code, we can recreate it in minutes.
Instance in Auto Scaling group | This is a use case for a defined workload with a maximum of 10 people accessing the application. The server can handle up to 1000 users, so we do not need an Auto Scaling group.

Therefore it is important to be able to adjust the settings to your needs, for example the criticality of a finding. Here is an example for Trend Vision One:

tvo critiality
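Independent of the tool, the idea of adjusting findings can be sketched as a list of scoped overrides, each with a documented reason. This is a tool-neutral sketch: `Override`, `apply_overrides` and the finding dictionaries are hypothetical names and do not reflect the API of Trend Vision One or any other product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Override:
    finding: str   # finding title as reported by the tool
    resource: str  # resource the override is scoped to
    severity: str  # adjusted severity, or "suppressed"
    reason: str    # documented justification for the decision


def apply_overrides(findings, overrides):
    """Return findings with adjusted severity; drop suppressed ones."""
    index = {(o.finding, o.resource): o for o in overrides}
    result = []
    for finding in findings:
        override = index.get((finding["title"], finding["resource"]))
        if override is None:
            result.append(finding)  # no override: keep the finding as-is
        elif override.severity != "suppressed":
            result.append({**finding, "severity": override.severity})
    return result
```

Scoping each override to one resource and requiring a reason keeps the decision reviewable: a suppressed finding on the demo instance does not silently hide the same finding on a production instance.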

Another funny example:

To enable Trend Vision One, a Lambda function is created in the monitored account. This Lambda function has findings from the beginning, which also do not apply:

tvo lambda

Finding | Why it does not apply here
VPC access (see Lambda.3) | This is not only wrong, it could be harmful.
Tracing | For distributed services, tracing can be helpful. But it also affects the timing of the Lambda function.

This is the Trend Micro information page for the VPC rule, here called Lambda-007: Lambda-007

So you should enable VPC access if and only if the Lambda function has to access resources with a private IP. We had a project where enabling VPC access led to an error that was hard to find:

If you have just one Lambda function on the network card in the VPC and this function is called only once a week, then the network card is automatically deleted after a certain amount of idle time. That means that the first start of the Lambda function after the network card was deleted can take minutes. That caused the Lambda function to time out and produced an error.

Not just noise - know thy scope

The two examples showed noise, which should be suppressed by rules. Many other findings point to real problems. And that's just what findings are: they point you to a possible problem. You have to decide whether it is a real problem or not.

Scopes

  1. The first question is: What is the scope of the findings? Does it fit my use case?
  2. Then enlarge the time frame: Will that become a problem in the future?

Some general questions you might find useful to enlarge the scope of your thinking:

A) What will happen if the finding is remediated?

B) What will not happen if the finding is remediated?

C) What will happen if the finding is not remediated?

D) What will not happen if the finding is not remediated?

Remediation means fixing the finding; in the example, that means putting the Lambda function in a VPC.

With the Lambda example:

A) A network card is created and will be automatically deleted when idle. All calls from the Lambda function will come from a static IP. The very first call will take minutes.

B) Resources with a private IP will not be accessible

C) First cold start will take only milliseconds/seconds

D) Timeout of the Lambda
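If you want to document such a review, the four answers can be kept in a small record per finding. This is only a sketch of one possible structure; the class and field names are hypothetical, and the answers are the ones from the Lambda example above.

```python
from dataclasses import dataclass


@dataclass
class FourQuestions:
    finding: str
    happens_if_remediated: str          # question A
    not_happens_if_remediated: str      # question B
    happens_if_not_remediated: str      # question C
    not_happens_if_not_remediated: str  # question D

    def unanswered(self):
        """Names of the questions that still have an empty answer."""
        return [
            name
            for name, value in vars(self).items()
            if name != "finding" and not value.strip()
        ]


review = FourQuestions(
    finding="Lambda function without VPC access",
    happens_if_remediated="Network card is created; first call may take minutes",
    not_happens_if_remediated="Resources with a private IP being inaccessible",
    happens_if_not_remediated="First cold start takes milliseconds to seconds",
    not_happens_if_not_remediated="",  # still to be discussed
)
print(review.unanswered())
```

A review with open questions is a signal that the decision about the finding is not ready yet.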

To make a good decision you have to work with the scope and with some psychological principles.

Psychological bias

Commitment and Consistency

In his book “Influence”, Robert B. Cialdini describes the principle of commitment and consistency. If you have invested mental or physical resources in a decision, you are more likely to stick to it, even if it is wrong. That applies to the findings of the security tools as well. You have decided on the architecture, for example to use only one AWS account, and you will defend this decision, unless you work trustfully together with an independent observer, e.g. from an MSSP.

Social System Culture

I have shown that you need extensive knowledge of AWS operations to make a good decision about the findings of the security tools. When you develop software running on AWS, you need extensive knowledge and experience in software development.

Space

Working trustfully together in the Collaboration Space gives you the ability to overcome the psychological bias.

Summary: The shift left collaboration pattern

To cut through the noise of the many findings, know thy scope and collaborate with an independent observer. Here is a very simplified recipe:

  1. Shift left: Start with the security tools as early as possible. Establish a baseline with ideally zero open findings.
  2. Collaboration Space: Work trustfully together with an independent observer from the MSSP. This person should have a good understanding of the security tools and the architecture.
  3. Discuss the delta, i.e. the new findings, after each sprint. This way the amount of findings stays manageable.
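The delta from step 3 can be sketched as a set comparison between the scan of the previous sprint and the current one. This is a minimal sketch: the function name, the resource names and the placeholder control ID `CTRL-1` are hypothetical, and it assumes each finding can be identified by a (control ID, resource) pair.

```python
def finding_delta(previous, current):
    """Compare two scans given as sets of (control id, resource) pairs."""
    return {
        "new": sorted(current - previous),       # discuss these after the sprint
        "resolved": sorted(previous - current),  # fixed or suppressed since last scan
        "open": sorted(current & previous),      # known and already discussed
    }


# Hypothetical scan results of two consecutive sprints
last_sprint = {("Lambda.3", "fn-scanner"), ("CTRL-1", "i-demo")}
this_sprint = {("Lambda.3", "fn-scanner"), ("CTRL-1", "bucket-logs")}

delta = finding_delta(last_sprint, this_sprint)
print(delta["new"])
```

With a baseline close to zero open findings, the "new" list stays short enough to discuss every entry.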

In the discussion between dev and ops:

  • Know the scope of the findings
  • Enlarge the time frame
  • Ask the 4 questions

What’s next?

If you need MSP or MSSP Hosting for your AWS workloads, don’t hesitate to contact us at tecRacer.

Thanks for reading. Enjoy building!
