Cross Account Resource Access - Invalid Principal in Policy
Separating projects into different accounts in a big organization is considered a best practice when working with AWS, and AWS supports this with the AWS Organizations service. However, it leads to cross-account scenarios that are more complex. My colleagues and I already explained one of those scenarios in this blog post, which deals with S3 ownership (AWS has provided a solution for that problem in the meantime). Today, I will talk about another cross-account scenario that came up in our project, explain why it caused problems, and show how we solved them.
The Scenario
It is a rather simple architecture. A Lambda function from account A called Invoker Function needs to trigger a function in account B called Invoked Function. Obviously, we need to grant permissions to Invoker Function to do that. We have some options to implement this.
The Simple Solution (that caused the Problem)
The simplest way to achieve the functionality is to grant the Invoker Function in account A permission to invoke the Invoked Function in account B by attaching the following policy to the role of Invoker Function:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"lambda:InvokeFunction"
],
"Resource": "arn:aws:lambda:eu-central-1:<account-id-b>:function:invoked-function"
}
]
}
While this would be a complete solution in a non-cross-account scenario, we need an additional step here, namely granting the invoke permission in the resource policy of Invoked Function in account B as well. We could not do this via the console, so you will need to use the CLI or, even better, build everything via Infrastructure as Code (IaC). Using the CLI in account B, the necessary command looks like this:
aws lambda add-permission --function-name invoked-function \
--statement-id any-id \
--action lambda:invokeFunction \
--principal arn:aws:iam::<account-id-a>:role/service-role/invoker-function-role-3z82i06i
The Invoker role ARN has a random suffix because AWS created the role automatically. In the AWS console of account B, the Lambda resource-based policy will look like this:
{
"Version": "2012-10-17",
"Id": "default",
"Statement": [
{
"Sid": "any-id",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::<account-id-a>:role/service-role/invoker-function-role-3z82i06i"
},
"Action": "lambda:invokeFunction",
"Resource": "arn:aws:lambda:eu-central-1:<account-id-b>:function:invoked-function"
}
]
}
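If you prefer scripting this step over typing the CLI command, the same grant could be expressed with boto3. This is only a minimal sketch and not the setup from our project: the helper names `build_add_permission_args` and `grant_invoke` are made up for illustration, and boto3 is imported lazily so that the argument builder can be checked without AWS access.

```python
def build_add_permission_args(function_name, statement_id, principal_arn):
    """Build the keyword arguments for Lambda's add_permission API call."""
    return {
        "FunctionName": function_name,
        "StatementId": statement_id,
        "Action": "lambda:InvokeFunction",
        "Principal": principal_arn,
    }


def grant_invoke(function_name, statement_id, principal_arn, client=None):
    """Add the statement to the function's resource policy (run in account B)."""
    if client is None:
        # Assumption: boto3 is installed and credentials for account B are configured.
        import boto3
        client = boto3.client("lambda")
    return client.add_permission(
        **build_add_permission_args(function_name, statement_id, principal_arn)
    )
```

Called with the function name, a statement id, and the role ARN from account A, `grant_invoke` should produce the same statement as the CLI command above.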
Now this works fine and you can go for it. Have fun :)
Unless you are in a real-world scenario, maybe even in production, and you need a reliable architecture. Then read on. In the real world, things happen. In this case, the role in account A gets recreated. As the role was created automatically and has a random suffix, the ARN is now different. Consequently, the Invoker Function no longer has permission to trigger Invoked Function. You might think you can solve this by creating the role yourself and giving it a name without a random suffix, but you will be surprised: you still get permission denied in Invoker Function after recreating the role.
What happened is that on the side of Invoked Function in account B, the resource policy changed to something like this as soon as the role got deleted:
{
"Version": "2012-10-17",
"Id": "default",
"Statement": [
{
"Sid": "any-id",
"Effect": "Allow",
"Principal": {
"AWS": "AROA4KVSNIJZBLR5NCUAW"
},
"Action": "lambda:invokeFunction",
"Resource": "arn:aws:lambda:eu-central-1:<account-id-b>:function:invoked-function"
}
]
}
The principal changed from the ARN of the role in account A to a cryptic value. The reason is that every IAM role (and user) has a unique id that AWS works with in the backend; we normally only see the more readable ARN. When the policy was saved, AWS mapped the ARN in the Principal element to that unique id. Once the role in A got deleted, AWS could no longer resolve the old unique id back to an ARN, so we see the unique id of the deleted role here instead. Although the recreated role may have the same ARN, it gets a new unique id that does not match the one stored in the policy. That is why we now see a permission denied error in the Invoker Function. AWS does this for security purposes: a new role that merely reuses the name of a deleted one does not silently inherit its permissions.
A consequence of this error is that each time the principal changes in account A, account B needs a redeployment.
But a redeployment alone is not even enough.
A simple redeployment will give you an error stating Invalid Principal in Policy.
Here is some documentation about the same topic for S3 bucket policies.
To solve this, you will need to manually delete the existing statement in the resource policy and only then you can redeploy your infrastructure.
You don’t want that in a prod environment.
Instead we want to decouple the accounts so that changes in one account don’t affect the other.
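The manual cleanup described above could be scripted roughly as follows. This is a hedged sketch under a few assumptions: the function names are invented for illustration, the check relies on the documented IAM convention that unique ids of roles start with `AROA` and those of users with `AIDA`, and a function without any resource policy (for which `get_policy` raises an error) is not handled.

```python
import json

# IAM unique ids carry a type prefix, e.g. AROA for roles and AIDA for users.
UNIQUE_ID_PREFIXES = ("AROA", "AIDA")


def stale_statement_ids(policy_json):
    """Return the Sids of statements whose AWS principal is a bare unique id."""
    policy = json.loads(policy_json)
    stale = []
    for statement in policy.get("Statement", []):
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. the wildcard principal "*"
        aws = principal.get("AWS", [])
        if isinstance(aws, str):
            aws = [aws]
        sid = statement.get("Sid")
        if sid and any(p.startswith(UNIQUE_ID_PREFIXES) for p in aws):
            stale.append(sid)
    return stale


def remove_stale_statements(function_name, client=None):
    """Delete statements that point at deleted principals (run in account B)."""
    if client is None:
        # Assumption: boto3 is installed and credentials for account B are configured.
        import boto3
        client = boto3.client("lambda")
    policy_json = client.get_policy(FunctionName=function_name)["Policy"]
    removed = stale_statement_ids(policy_json)
    for sid in removed:
        client.remove_permission(FunctionName=function_name, StatementId=sid)
    return removed
```

Running `remove_stale_statements("invoked-function")` before a redeployment would clear the orphaned statement, but it does not remove the coupling between the accounts.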
The Account Id Solution
The easiest solution is to set the principal to a more static value. That is, for example, the account id of account A.
{
"Version": "2012-10-17",
"Id": "default",
"Statement": [
{
"Sid": "any-id",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::<account-id-a>:root"
},
"Action": "lambda:invokeFunction",
"Resource": "arn:aws:lambda:eu-central-1:<account-id-b>:function:invoked-function"
}
]
}
However, this does not follow the least privilege principle. In this case, every IAM entity in account A can trigger the Invoked Function in account B. You could argue that account A is a trusted account from your Organization and that they do not get sensitive information or cause harm when triggering Invoked Function. Be aware that account A could get compromised. Then this policy enables the attacker to cause harm in a second account.
The Policy-Condition Solution
A nice solution would be to use a combination of both approaches by setting the account id as principal and using a condition that limits the access to a specific source ARN. This could look like the following:
{
"Version": "2012-10-17",
"Id": "default",
"Statement": [
{
"Sid": "any-id",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::<account-id-a>:root"
},
"Action": "lambda:invokeFunction",
"Resource": "arn:aws:lambda:eu-central-1:<account-id-b>:function:invoked-function",
"Condition": {
"ArnLike": {
"AWS:SourceArn": "arn:aws:iam::<account-id-a>:role/service-role/invoker-role"
}
}
}
]
}
Sadly, this does not work.
The Invoker Function gets a permission denied error as the condition evaluates to false.
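It seems aws:SourceArn is not included in the invoke request. As far as I can tell, this key is only populated when an AWS service makes the call on behalf of a resource, not when an IAM role calls the API directly.
We could not create such a resource policy in the console, and the CLI and IaC frameworks are limited to the --source-arn parameter for setting a condition.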
I tried a lot of combinations and never got it working.
The Assume-Role Solution
The last approach is to create an IAM role in account B that the Invoker Function assumes before invoking Invoked Function. The IAM role needs permission to invoke Invoked Function. In that case, we don’t need any resource policy on Invoked Function. However, we have a similar issue in the trust policy of the IAM role, even though we have far more control over the condition statement here.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::<account-id-a>:root"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringLike": {
"aws:PrincipalArn": "arn:aws:iam::<account-id-a>:role/service-role/invoker-role"
}
}
}
]
}
Using this policy statement and adding some code to the Invoker Function, so that it assumes this role in account B before invoking the Invoked Function, works.
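The extra code in the Invoker Function could look roughly like the sketch below. It is not the code from our project: the function name, the session name, and the injectable `sts_client` and `lambda_factory` parameters are assumptions made so the flow can be exercised without AWS access, and the default region is simply taken from the function ARN used in this post. Note that the Invoker Function's own role in account A also needs permission to call sts:AssumeRole on the role in account B.

```python
import json


def invoke_cross_account(role_arn, function_arn, payload,
                         region_name="eu-central-1",
                         sts_client=None, lambda_factory=None):
    """Assume the role in account B, then invoke the function with its credentials."""
    if sts_client is None or lambda_factory is None:
        # Assumption: boto3 is available, as it is in the Lambda Python runtime.
        import boto3
        sts_client = sts_client or boto3.client("sts")
        lambda_factory = lambda_factory or (
            lambda **kwargs: boto3.client("lambda", **kwargs)
        )

    # Exchange the Invoker Function's own identity for temporary credentials
    # of the role in account B.
    credentials = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName="invoker-function",  # hypothetical session name
    )["Credentials"]

    # Build a Lambda client that signs requests with the temporary credentials.
    lambda_client = lambda_factory(
        region_name=region_name,
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )
    return lambda_client.invoke(
        FunctionName=function_arn,
        InvocationType="RequestResponse",
        Payload=json.dumps(payload).encode("utf-8"),
    )
```

The temporary credentials expire, so a long-lived execution environment would have to assume the role again from time to time; the sketch simply does so on every call.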
This is some overhead in code and resources compared to the simple solution via resource policy, but it solves our problem and provides some advantages.
First, the value of aws:PrincipalArn is just a simple string.
AWS does not resolve it to an internal unique id.
Hence, it does not get replaced in case the role in account A gets deleted and recreated.
Second, you can use wildcards (* or ?) for characters that might change, such as a random suffix, or if you want to grant the AssumeRole permission to a set of resources.
We decoupled the accounts as we wanted. As long as account A keeps the role name in a pattern that matches the value of PrincipalArn, account B is now independent of redeployments in account A.
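To get a feeling for which role names a wildcard pattern would still accept, the matching can be approximated locally. The sketch below is my own approximation of StringLike semantics (* for any run of characters, ? for exactly one, case-sensitive), not an AWS API, and the account id is a placeholder.

```python
import re


def iam_string_like(pattern, value):
    """Approximate IAM StringLike: * matches any run of characters, ? exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


# Placeholder account id and role names, for illustration only.
PATTERN = "arn:aws:iam::111111111111:role/service-role/invoker-function-role-*"

print(iam_string_like(
    PATTERN,
    "arn:aws:iam::111111111111:role/service-role/invoker-function-role-3z82i06i",
))  # True: the random suffix is covered by the wildcard
print(iam_string_like(
    PATTERN,
    "arn:aws:iam::111111111111:role/service-role/other-role",
))  # False: the fixed part of the name differs
```

Keep the pattern as narrow as possible, since every role in account A whose ARN matches it is allowed to assume the role in account B.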
Conclusion
The simple solution is obviously the easiest to build and has the least overhead. If the resources in account A never get recreated, it is totally fine. Using the account's root as a principal without a condition is a simple and working solution, but it does not follow the least privilege principle, so I would not recommend it. In this scenario, using a condition in the Lambda's resource policy did not work due to limited configuration possibilities in the CLI. Lastly, creating a role and using a condition in the trust policy is the solution that solves the described problems.
In this blog post, I explained a cross-account complexity using the example of Lambda functions.
However, I guess the Invalid Principal error appears wherever resource policies are used.
I have experienced it with bucket policies, and it just makes sense that it is similar with SNS topics or trust policies in IAM roles.
The difference for Lambda is that in most other cases you have more options to set conditions in the resource policy and thus you don’t need to use an extra role.