Regaining Amazon QuickSight SPICE capacity

This week, we had a bit of an issue with our Amazon QuickSight account: we ran out of space! Amazon QuickSight is a business intelligence (BI) solution offered by AWS. It is versionless, serverless, and auto-scaling, integrates deeply with other AWS services, and can be used through the AWS SDK, the AWS CLI, or the QuickSight console.

While Amazon QuickSight supports auto-scaling for storage (or, more precisely, the automatic purchase of storage capacity when needed), we hadn't enabled it for our Amazon QuickSight account. A few days ago, while experimenting with numerous datasets, we ended up using all of our purchased storage capacity.

Amazon QuickSight capacity full

When using Amazon QuickSight, you can either store your data in SPICE (Super-fast, Parallel, In-memory Calculation Engine) or directly query it from your data source whenever needed. Datasets that are directly queried from your data source do not consume SPICE capacity.

Saving datasets in SPICE has several advantages:

  • Analytical queries process faster.
  • Datasets saved in SPICE can be reused at will without incurring additional costs.
  • Analyses and dashboards are not limited to the performance of the data source.
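
Which of the two storage options is used is decided per dataset through its import mode. As a minimal illustration only (not part of our solution, and with hypothetical identifiers), a CreateDataSet call with boto3 could look like this:

import boto3

# Hypothetical placeholders - replace with your own identifiers.
ACCOUNT_ID = "123456789012"
DATA_SOURCE_ARN = "arn:aws:quicksight:eu-west-1:123456789012:datasource/example-source"

quicksight = boto3.client("quicksight", region_name="eu-west-1")

# ImportMode decides whether the data is copied into SPICE ("SPICE")
# or queried live from the data source ("DIRECT_QUERY").
quicksight.create_data_set(
    AwsAccountId=ACCOUNT_ID,
    DataSetId="example-dataset-id",
    Name="example dataset",
    ImportMode="SPICE",  # or "DIRECT_QUERY"
    PhysicalTableMap={
        "table1": {
            "RelationalTable": {
                "DataSourceArn": DATA_SOURCE_ARN,
                "Name": "orders",
                "InputColumns": [{"Name": "order_id", "Type": "STRING"}],
            }
        }
    },
)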

So, we ran out of SPICE capacity. When importing a dataset into SPICE uses up all of the remaining SPICE capacity, the following happens:

  • SPICE cannot estimate the size of a dataset before importing it, so the dataset is imported first.
  • Even if Amazon QuickSight determines during the import that no more SPICE capacity is available, it will still import the whole dataset into SPICE. Any capacity used on top of the free and the purchased capacity is referred to as over capacity.
  • Amazon QuickSight will charge for the actual capacity usage. SPICE capacity currently costs $0.38 per GB per month (a quick example follows this list).
  • No further ingestion will be possible.
  • To refresh datasets or create new SPICE datasets again, you have to either purchase enough SPICE capacity or delete datasets until capacity is available.
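
To put the pricing into perspective: at $0.38 per GB per month, a dataset that pushes the account roughly 100 GB over its purchased capacity adds about $38 per month until it is deleted or additional capacity is purchased.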

In our case, there was no need to purchase more capacity, so we decided to delete datasets. Our Amazon QuickSight account contains important datasets and analyses, so we couldn't just delete datasets at will. We needed to find the exact datasets that were using much more capacity than expected.

When you open a dataset in Amazon QuickSight, you can see how much SPICE capacity it is using.

QuickSight SPICE dataset capacity

With more than 100 datasets in our account, it would be troublesome to first get access to each of them and then open every single one to see how much SPICE capacity it uses.

Here comes the Amazon QuickSight API!
It is possible to interact with Amazon QuickSight using the AWS SDK. This allows us to implement a script that looks at each dataset and tells us which ones are the top capacity consumers in our account. The following Amazon QuickSight APIs are helpful for achieving this goal:

  • ListDataSets: List all datasets available in an Amazon QuickSight account within a given region
  • DescribeDataSet: Describe a dataset and provide information such as the consumed SPICE capacity
  • DeleteDataSet: Delete a dataset. While it would be possible to automatically delete the datasets that use a lot of the available capacity, that is an operation that should only be done after careful consideration, so automated deletion of datasets was not implemented in the solution.

The solution is implemented as a Python script that uses the AWS Python SDK (boto3) and asks for an account ID to list the top 10 datasets with the highest SPICE capacity usage. The script only considers datasets with a size of 128 MB or larger. The following snippet shows the main functions in the script:


def get_top_spice_capacity_datasets(
    datasets_size_infos: list[DatasetSizeInfo], top_n: int = 10, minimum_capacity_in_bytes: int = 128 * 1024 * 1024
) -> list[DatasetSizeInfo]:
    """
    Returns the top N datasets with the highest consumed Spice capacity, filtered by a minimum capacity.

    Args:
        datasets_size_infos (list): A list of dataset size information objects.
        top_n (int): The number of top datasets to return.
        minimum_capacity_in_bytes (int): The minimum capacity in bytes to filter datasets.

    Returns:
        list: A list of dataset size information objects for the top N datasets.
    """
    filtered_datasets_size_infos = [
        d for d in datasets_size_infos if d["ConsumedSpiceCapacityInBytes"] >= minimum_capacity_in_bytes
    ]
    sorted_datasets = sorted(
        filtered_datasets_size_infos, key=lambda x: x["ConsumedSpiceCapacityInBytes"], reverse=True
    )
    return sorted_datasets[:top_n]


def list_top_spice_capacity_datasets(
    account_id: str, top_n: int = 10, minimum_capacity_in_bytes: int = 128 * 1024 * 1024
) -> list[DatasetSizeInfo]:
    """
    Lists the top N QuickSight datasets with the highest consumed Spice capacity.
    Only datasets with at least the configured minimum capacity are returned.

    Args:
        account_id (str): The AWS account ID.
        top_n (int): The number of top datasets to return.
        minimum_capacity_in_bytes (int): The minimum capacity in bytes to filter datasets.

    Returns:
        list: A list of dataset size information objects for the top N datasets.
    """
    datasets_summaries = list_quicksight_datasets_summaries(account_id=account_id)
    dataset_size_infos = get_datasets_size_infos(datasets_summaries, account_id=account_id)
    return get_top_spice_capacity_datasets(
        dataset_size_infos, top_n=top_n, minimum_capacity_in_bytes=minimum_capacity_in_bytes
    )
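
The snippet above uses two helper functions that are not shown. The following is a minimal sketch, assuming boto3's list_data_sets and describe_data_set calls, of what list_quicksight_datasets_summaries and get_datasets_size_infos might look like; the full script may implement them differently:

import boto3
from botocore.exceptions import ClientError
from typing import Any

# The size information of a dataset is kept as a plain dictionary.
DatasetSizeInfo = dict[str, Any]

quicksight = boto3.client("quicksight")


def list_quicksight_datasets_summaries(account_id: str) -> list[dict]:
    """Returns the summaries of all datasets in the account (current region)."""
    summaries: list[dict] = []
    next_token = None
    while True:
        kwargs = {"AwsAccountId": account_id}
        if next_token:
            kwargs["NextToken"] = next_token
        response = quicksight.list_data_sets(**kwargs)
        summaries.extend(response.get("DataSetSummaries", []))
        next_token = response.get("NextToken")
        if not next_token:
            break
    return summaries


def get_datasets_size_infos(datasets_summaries: list[dict], account_id: str) -> list[DatasetSizeInfo]:
    """Calls DescribeDataSet for every dataset and collects its SPICE size information."""
    size_infos: list[DatasetSizeInfo] = []
    for summary in datasets_summaries:
        try:
            dataset = quicksight.describe_data_set(
                AwsAccountId=account_id, DataSetId=summary["DataSetId"]
            )["DataSet"]
        except ClientError as error:
            # Datasets created from uploaded files are not yet supported by DescribeDataSet.
            print(f"Error getting dataset size for {summary['DataSetId']}/{summary['Name']}: {error}")
            continue
        size_infos.append(
            {
                "Arn": dataset["Arn"],
                "DataSetId": dataset["DataSetId"],
                "Name": dataset["Name"],
                "ImportMode": dataset["ImportMode"],
                "ConsumedSpiceCapacityInBytes": dataset.get("ConsumedSpiceCapacityInBytes", 0),
            }
        )
    return size_infos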

The full script is available on GitHub.

Executing the script with our QuickSight account ID generated the following (redacted) output.

> python list_top_spice_capacity_datasets.py
Enter your AWS account ID: 123456789012
Error getting dataset size for 21ed69b7-8ffe-41d4-b6e3-d9e0b8411d13/KDS-S3-Z4891-bc8d-4e6dcf374fb3_reduced.csv: An error occurred (InvalidParameterValueException) when calling the DescribeDataSet operation: The data set type is not supported through API yet
Error getting dataset size for 4b62807e-cda3-4217-b18d-5cb1e6d3e88a/titanic-demo-file: An error occurred (InvalidParameterValueException) when calling the DescribeDataSet operation: The data set type is not supported through API yet
Error getting dataset size for 64cb01af-7def-4d23-bc63-b9e6f0dec636/cust.csv: An error occurred (InvalidParameterValueException) when calling the DescribeDataSet operation: The data set type is not supported through API yet
[{'Arn': 'arn:aws:quicksight:eu-west-1:123456789012:dataset/1ade907f-c5cc-4adc-a592-8469e5a3da9f',
  'ConsumedSpiceCapacityInBytes': 135001466281,
  'DataSetId': '1ade907f-c5cc-4adc-a592-8469e5a3da9f',
  'ImportMode': 'SPICE',
  'Name': 'global-electronics-retail'},
 {'Arn': 'arn:aws:quicksight:eu-west-1:123456789012:dataset/3acccb06-a1f3-6a5d-a7bd-e1cf963303ee',
  'ConsumedSpiceCapacityInBytes': 285941024,
  'DataSetId': '3acccb06-a1f3-6a5d-a7bd-e1cf963303ee',
  'ImportMode': 'SPICE',
  'Name': 'training02 dataset'}]

Datasets created from uploaded files currently do not support the DescribeDataSet API. These are the ones generating the listed errors.

At the end of the console output, we see which datasets are currently using most of the SPICE capacity in our account. This makes it easier to delete unneeded datasets to regain SPICE capacity. As can be seen, a single dataset was responsible for around 99% of the SPICE capacity usage in our account. Deleting it gave us back our SPICE capacity.
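
Once you are certain a dataset is no longer needed, it can be deleted from the console or with a single DeleteDataSet call. A minimal sketch, using the dataset ID from the output above:

import boto3

quicksight = boto3.client("quicksight", region_name="eu-west-1")

# Irreversibly deletes the dataset and frees the SPICE capacity it consumed.
quicksight.delete_data_set(
    AwsAccountId="123456789012",
    DataSetId="1ade907f-c5cc-4adc-a592-8469e5a3da9f",
)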

Amazon QuickSight SPICE dataset capacity regained

— Franck


Links:

Cover from Wikipedia
