Working with lists in DynamoDB
DynamoDB has support for storing complex data types like lists, sets or maps (aka dictionaries/hash tables). This capability allows for flexible usage patterns. In this article we’ll take a closer look at lists. We’ll explore what is possible with them, what isn’t and how we can manipulate them through Python.
This article doesn’t follow a clear storyline; it’s more like a list of recipes you can use in your own projects.
The Basics
DynamoDB supports lists as attributes for items.
However: they’re not supported as part of a key.
That means the attributes that make up your partition and sort key can’t be lists (or maps or sets for that matter).
If you try to create an item in which an attribute that’s part of a global secondary index’s key schema has an incompatible data type, DynamoDB rejects the write with a validation error.
For example, if GSI1PK and GSI1SK are the partition and sort keys of the global secondary index GSI1, neither of them can hold a list.
One of the things that makes GSIs useful is that you can create them at any point in time, even after you’ve added data to the table. Now the question is: what happens if I create a GSI based on an attribute that may be a complex data type in some of the already existing items? The answer is a little anticlimactic. Index creation works, but those items won’t show up in the GSI.
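To make the key restriction concrete, here is a small sketch that checks an item before it is written. The helper name find_invalid_index_keys and its index_key_attributes argument are made up for illustration. It assumes items are plain Python dicts, as you would pass them to the boto3 table resource, and that index keys have to map to a string, number or binary value.

```python
from decimal import Decimal

# Python types that map to the scalar DynamoDB types allowed in keys
# (S, N and B). bool is rejected separately because it is a subclass of int.
KEY_COMPATIBLE_TYPES = (str, int, Decimal, bytes, bytearray)


def find_invalid_index_keys(item: dict, index_key_attributes: list) -> list:
    """Return the index key attributes whose values can't be used in a key."""
    invalid = []
    for name in index_key_attributes:
        if name not in item:
            # A missing GSI key is fine, the item just isn't indexed.
            continue
        value = item[name]
        if isinstance(value, bool) or not isinstance(value, KEY_COMPATIBLE_TYPES):
            invalid.append(name)
    return invalid


item = {"PK": "pk", "GSI1PK": ["a", "b"], "GSI1SK": "sk"}
print(find_invalid_index_keys(item, ["GSI1PK", "GSI1SK"]))
# prints ['GSI1PK']
```

A check like this is only a convenience for clearer error messages; DynamoDB enforces the rule on its own.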
Lists can store items of different types, which means you’re free to mix numbers, strings, sets, lists and other types in a single list. An item like this that mixes different data types is perfectly valid:
{
  "listAttribute": {
    "L": [
      {
        "N": "1"
      },
      {
        "M": {
          "a": { "S": "b" }
        }
      },
      {
        "L": [
          { "N": "1" }
        ]
      },
      {
        "NS": ["1", "2"]
      },
      { "S": "text" }
    ]
  },
  "PK": {
    "S": "pk"
  }
}
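If you’re wondering how a Python value ends up in this typed format, the following sketch translates a few common Python types by hand. It is a simplified stand-in written for this article, not boto3’s implementation (boto3 ships its own TypeSerializer in boto3.dynamodb.types), and the function name to_dynamodb_json is an assumption.

```python
from decimal import Decimal


def _is_number(value) -> bool:
    """True for ints and Decimals, but not for bools."""
    return isinstance(value, (int, Decimal)) and not isinstance(value, bool)


def to_dynamodb_json(value):
    """Translate a Python value into DynamoDB's typed JSON (simplified)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check before numbers, bool subclasses int
        return {"BOOL": value}
    if _is_number(value):
        return {"N": str(value)}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        # Every element carries its own type, so mixing types is no problem.
        return {"L": [to_dynamodb_json(element) for element in value]}
    if isinstance(value, dict):
        return {"M": {key: to_dynamodb_json(val) for key, val in value.items()}}
    if isinstance(value, (set, frozenset)) and value:
        # Sets are typed as a whole, so all elements must share one type.
        if all(isinstance(element, str) for element in value):
            return {"SS": sorted(value)}
        if all(_is_number(element) for element in value):
            return {"NS": [str(element) for element in sorted(value)]}
    raise TypeError(f"Unsupported type for this sketch: {type(value)!r}")


mixed_list = [1, {"a": "b"}, [1], {1, 2}, "text"]
print(to_dynamodb_json(mixed_list))
```

Printing the result for mixed_list gives the same structure as the listAttribute value above.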
Creating a demo table
Let’s move on to manipulating lists. We’ll use Python and the AWS SDK for this. First we’ll create a table for us to work with - it’s a simple table with On-Demand capacity and a partition key that is also the primary key.
"""Quick primer for working with lists in DynamoDB attributes"""
import typing

import boto3
from botocore.exceptions import ClientError

TABLE_NAME = "list-demo"


def create_table_if_not_exists():
    try:
        boto3.client("dynamodb").create_table(
            AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}],
            TableName=TABLE_NAME,
            KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}],
            BillingMode="PAY_PER_REQUEST"
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == 'ResourceInUseException':
            # Table already exists
            pass
        else:
            raise err
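One thing to keep in mind: create_table returns before the table is ready to use. A minimal sketch for waiting, assuming you pass in a boto3 DynamoDB client, could use the client’s table_exists waiter. The function name wait_until_table_is_ready is made up for this example.

```python
def wait_until_table_is_ready(client, table_name: str) -> None:
    """Block until the table exists and has become active."""
    # The waiter polls DescribeTable until the table reports as active.
    waiter = client.get_waiter("table_exists")
    waiter.wait(TableName=table_name)


# Example usage (needs AWS credentials, so it's commented out here):
# import boto3
# wait_until_table_is_ready(boto3.client("dynamodb"), "list-demo")
```

Calling this right after create_table_if_not_exists avoids errors from writing to a table that is still being created.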
Creating items with lists
Now that we have a table, we can think about the kind of data we want to store in it.
I decided on a simple pattern where there is a sensor and each sensor can have a list of measurements.
To create a sensor with the list of measurements, I’m using the table resource from boto3, which automatically translates the Python data types to the underlying DynamoDB format.
Creating an item is now a simple put_item operation on the table resource.
Note that I’ve also included a condition that raises an exception if the item already exists.
This way we’ll only create new items and not overwrite existing ones.
import typing

import boto3
import boto3.dynamodb.conditions as conditions
from botocore.exceptions import ClientError

TABLE_NAME = "list-demo"


def create_sensor_if_not_exists(
    sensor_id: str, measurements: typing.Optional[typing.List[int]] = None
):
    """Create a new sensor with optional measurements if it doesn't exist."""
    # Fall back to an empty list if no initial measurements were supplied
    measurements = measurements or []
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    try:
        table.put_item(
            Item={
                "PK": f"S#{sensor_id}",
                "sensorId": sensor_id,
                "type": "SENSOR",
                "measurements": measurements
            },
            # Only succeeds if no item with this key exists yet
            ConditionExpression=conditions.Attr("PK").not_exists()
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == 'ConditionalCheckFailedException':
            raise ValueError("Sensor already exists") from err
        else:
            raise err
This function optionally accepts a list of initial measurements. If they’re not supplied, it will just store an empty list on the item. This wasn’t possible in the old days, but DynamoDB now supports empty lists.
Appending to lists
A second use case would be to append a new measurement to the list.
To achieve this we could read the item, append the new measurement to the list locally and subsequently overwrite the old item, but that would be inefficient.
DynamoDB has a list_append function that is supported in the UpdateItem API call.
This also has the benefit that DynamoDB takes care of any race conditions that may arise when we update an item.
Here’s an example for that:
import typing

import boto3
import boto3.dynamodb.conditions as conditions
from botocore.exceptions import ClientError

TABLE_NAME = "list-demo"


def append_measurement_to_sensor(sensor_id: str, measurement: int):
    """Add a measurement to a sensor if said sensor exists"""
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    try:
        table.update_item(
            Key={
                "PK": f"S#{sensor_id}",
            },
            UpdateExpression="SET #m = list_append(#m, :measurement)",
            ExpressionAttributeNames={
                "#m": "measurements",
            },
            ExpressionAttributeValues={
                ":measurement": [measurement]
            },
            ConditionExpression=conditions.Attr("PK").exists()
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == 'ConditionalCheckFailedException':
            raise ValueError("Sensor doesn't exist") from err
        else:
            raise err
I want to point out how the UpdateExpression works.
The expression SET #m = list_append(#m, :measurement) essentially says: for the item that matches the Key, set the attribute that’s referenced as #m to the value of list_append(#m, :measurement).
The latter only works if #m is of type list, and in that case it adds the value of the :measurement placeholder at the end.
Note that :measurement is itself a list with a single element, because list_append concatenates two lists.
The ExpressionAttributeNames argument is responsible for replacing any #-variables in the update expression.
ExpressionAttributeValues, on the other hand, replaces all :-variables in the update expression.
Using these placeholders is good practice, and it lets you avoid problems if your attribute names collide with DynamoDB’s reserved words.
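As a variation on the expression above, here is a sketch that appends several measurements at once and also copes with items that don’t have a measurements attribute yet. It combines list_append with the if_not_exists update function. The helper name build_append_update is an assumption, and it only builds the keyword arguments so you can inspect them before passing them to update_item.

```python
def build_append_update(sensor_id: str, new_measurements: list) -> dict:
    """Build update_item keyword arguments that append several measurements."""
    return {
        "Key": {"PK": f"S#{sensor_id}"},
        # if_not_exists falls back to an empty list when #m is missing,
        # so list_append always receives two lists.
        "UpdateExpression": "SET #m = list_append(if_not_exists(#m, :empty), :new)",
        "ExpressionAttributeNames": {"#m": "measurements"},
        "ExpressionAttributeValues": {
            ":empty": [],
            ":new": list(new_measurements),
        },
    }


print(build_append_update("sensor-1", [21, 22])["UpdateExpression"])
# prints SET #m = list_append(if_not_exists(#m, :empty), :new)
```

You would use it as table.update_item(**build_append_update("sensor-1", [21, 22])). Be aware that without a ConditionExpression, UpdateItem creates the item if it doesn’t exist. Swapping the two arguments of list_append adds the new values at the beginning of the list instead of the end.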
Deleting from lists
Now that we’ve added a few measurements, we notice that some of them are incorrect.
Let’s remove those.
Removing list items can be done through an UpdateItem call with a specific update expression.
import typing

import boto3
import boto3.dynamodb.conditions as conditions
from botocore.exceptions import ClientError

TABLE_NAME = "list-demo"


def delete_measurement_from_sensor(sensor_id: str, measurement_idx: int):
    """Remove the measurement at a specific index from a sensor if the sensor exists"""
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    try:
        table.update_item(
            Key={
                "PK": f"S#{sensor_id}",
            },
            UpdateExpression=f"REMOVE #m[{measurement_idx}]",
            ExpressionAttributeNames={
                "#m": "measurements",
            },
            ConditionExpression=conditions.Attr("PK").exists()
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == 'ConditionalCheckFailedException':
            raise ValueError("Sensor doesn't exist") from err
        else:
            raise err
Note that this removes items from the measurement list based on their index in the list (0-based). I’ve also added a condition that verifies the item exists before we remove a value. This is actually optional as it wouldn’t fail without it. In my case I want it to fail if it can’t find the item, because something clearly has gone wrong and I want to be notified of that fact.
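REMOVE also accepts several comma-separated paths, so more than one measurement can be dropped in a single call. Below is a sketch of a helper that builds such an expression. The name build_remove_expression is made up, and it assumes the same #m placeholder as in the function above. As far as I can tell from the example in the DynamoDB documentation, all indexes refer to positions in the list as it was before the update.

```python
def build_remove_expression(indexes) -> str:
    """Build a REMOVE expression for several positions of the #m list."""
    indexes = list(indexes)
    if not indexes:
        raise ValueError("At least one index is required")
    for index in indexes:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(index, bool) or not isinstance(index, int) or index < 0:
            raise ValueError(f"Not a valid list index: {index!r}")
    # Drop duplicates so that no path appears twice in the expression
    unique = sorted(set(indexes))
    return "REMOVE " + ", ".join(f"#m[{index}]" for index in unique)


print(build_remove_expression([2, 0, 2]))
# prints REMOVE #m[0], #m[2]
```

The validation also guards the expression against anything other than non-negative integers ending up in the string.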
Appending to a list and updating a specific value at the same time
The last use case is an edge case.
Suppose we want to change the value of an existing measurement at any point of the list and append a new measurement at the end.
Easy, you might think: just combine list_append and the regular set-a-value syntax.
Unfortunately that doesn’t work (see this Stack Overflow question for an example), and you’ll get an error like this:
Two document paths overlap with each other; must remove or rewrite one of these paths
Fortunately there is a neat workaround for this. When you set a high index on your update call that is outside of the range of the list, the value will be appended to the end.
import typing

import boto3
import boto3.dynamodb.conditions as conditions
from botocore.exceptions import ClientError

TABLE_NAME = "list-demo"


def change_first_and_append(sensor_id: str, new_first: int, to_append: int):
    """Replace the first measurement and append a new one in a single update"""
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    try:
        table.update_item(
            Key={
                "PK": f"S#{sensor_id}",
            },
            # Index 1000000 is beyond the end of the list, so the value is appended
            UpdateExpression="SET #m[0] = :new_first, #m[1000000] = :new_last",
            ExpressionAttributeNames={
                "#m": "measurements",
            },
            ExpressionAttributeValues={
                ":new_first": new_first,
                ":new_last": to_append
            },
            ConditionExpression=conditions.Attr("PK").exists()
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == 'ConditionalCheckFailedException':
            raise ValueError("Sensor doesn't exist") from err
        else:
            raise err
I’m using this property to avoid the aforementioned error.
In this case I know that my list will have fewer than 1,000,000 entries, so I’m using 1,000,000 as the index in the update expression (SET #m[0] = :new_first, #m[1000000] = :new_last) to essentially append the value to the list.
I was surprised when I learned about this behavior in the Stack Overflow question I linked to, but it’s well documented:
When you use SET to update a list element, the contents of that element are replaced with the new data that you specify. If the element doesn’t already exist, SET appends the new element at the end of the list.
If you add multiple elements in a single SET operation, the elements are sorted in order by element number.
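To get a feeling for these rules without touching a table, here is a small local model of them. This is plain Python that mimics the documented behavior as I read it, not code that talks to DynamoDB. The function name simulate_set_on_list is made up, and it assumes non-negative indexes.

```python
def simulate_set_on_list(current: list, assignments: dict) -> list:
    """Model how SET with list indexes changes a list, per the quoted docs."""
    result = list(current)
    appended = []
    # Go through the assignments ordered by element number
    for index in sorted(assignments):
        if index < len(current):
            result[index] = assignments[index]  # element exists: replace it
        else:
            appended.append(assignments[index])  # element is missing: append
    return result + appended


print(simulate_set_on_list([10, 20, 30], {0: 11, 1000000: 40}))
# prints [11, 20, 30, 40]
```

The first assignment replaces an existing element, while the second one points beyond the end of the list and is therefore appended.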
Limitations
Working with and updating lists has a few limitations at the moment:
- You can’t remove items based on their position from the end of a list (something like list[-1] to address the last item isn’t possible as it would be in pure Python)
- You can’t have a condition that checks if an item exists in a list
- There is no way to enforce a data type for a list; you’d have to use a set for that, which has the drawback of not being ordered
- It’s unfortunately impossible to have list-based sort keys and filter based on that (although this would be really cool)
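For the first limitation, one possible workaround is to read the item, compute the last index locally and make the removal conditional on the list length, so that a concurrent change leads to a failed condition instead of removing the wrong element. The following is only a sketch under those assumptions: remove_last_measurement is a made-up name, and table is expected to be a boto3 table resource like the one used throughout this article.

```python
def remove_last_measurement(table, sensor_id: str) -> None:
    """Remove the last measurement, failing if the list changed in between."""
    key = {"PK": f"S#{sensor_id}"}
    response = table.get_item(Key=key, ConsistentRead=True)
    measurements = response.get("Item", {}).get("measurements", [])
    if not measurements:
        raise ValueError("Sensor doesn't exist or has no measurements")
    length = len(measurements)
    table.update_item(
        Key=key,
        UpdateExpression=f"REMOVE #m[{length - 1}]",
        # Only apply the update if the list still has the length we read
        ConditionExpression="size(#m) = :length",
        ExpressionAttributeNames={"#m": "measurements"},
        ExpressionAttributeValues={":length": length},
    )


# Example usage (needs AWS credentials, so it's commented out here):
# import boto3
# remove_last_measurement(boto3.resource("dynamodb").Table("list-demo"), "sensor-1")
```

If another writer changes the list between the read and the update, the call fails with a ConditionalCheckFailedException and can be retried. This costs an extra read, so it’s a trade-off rather than a real replacement for negative indexes.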
Conclusions
Working with lists is fairly easy in DynamoDB, although there are some quirks to it. If you have more of these to share, feel free to reach out to me on the social media channels in my bio. I’m happy to add them here.
— Maurice