Articles tagged with "python"

Push-Down-Predicates in Parquet and how to use them to reduce IOPS while reading from S3

Working with datasets in pandas will almost inevitably bring you to the point where your dataset doesn’t fit into memory. Parquet is especially notorious for this, since it compresses so well and tends to balloon in size when read into a dataframe. Today we’ll explore ways to limit and filter the data you read using push-down-predicates. Additionally, we’ll see how you can do that efficiently with data stored in S3 and why using pure pyarrow can be several orders of magnitude more I/O-efficient than the plain pandas version.

The beating heart of SQS - of Heartbeats and Watchdogs

Using SQS as a queue to buffer tasks is probably the most common use case for the service. Things can get tricky if these tasks have a wide range of processing durations. Today, I will show you how to implement an SQS consumer that utilizes heartbeats to dynamically extend the visibility timeout to accommodate different processing durations.
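To give a rough idea of the heartbeat pattern, here is a minimal sketch and not necessarily the implementation from the article. It assumes the client is a boto3 SQS client (or anything exposing the same `change_message_visibility` and `delete_message` calls); the function name `process_with_heartbeat`, its parameters, and the default intervals are hypothetical and chosen only for illustration.

```python
import threading
from typing import Any, Callable


def process_with_heartbeat(
    sqs_client: Any,
    queue_url: str,
    receipt_handle: str,
    handler: Callable[[], Any],
    visibility_timeout: int = 30,
    heartbeat_interval: float = 10.0,
) -> Any:
    """Run handler() while a background thread keeps extending the
    message's visibility timeout (hypothetical helper, illustration only)."""
    done = threading.Event()

    def heartbeat() -> None:
        # Event.wait() returns False when it times out, which means the
        # handler is still running and the message needs more time.
        while not done.wait(heartbeat_interval):
            sqs_client.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=visibility_timeout,
            )

    thread = threading.Thread(target=heartbeat, daemon=True)
    thread.start()
    try:
        result = handler()
    finally:
        # Stop the heartbeat whether the handler succeeded or raised.
        done.set()
        thread.join()

    # Only reached on success. If the handler raised, the message is not
    # deleted and becomes visible again once the current timeout expires.
    sqs_client.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)
    return result
```

The heartbeat interval should stay comfortably below the visibility timeout, so that a single delayed heartbeat doesn't let the message reappear in the queue while it is still being processed.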

Enable Autocomplete for boto3 in VSCode

One of the less pleasant aspects of working with AWS using Python is that most IDEs can’t natively support Autocomplete or IntelliSense for the AWS SDK for Python (boto3) because of the way boto3 is implemented. Today I’m going to show you how easy it has become to enable Autocomplete for boto3 in VSCode. Before we get to the solution, let’s talk about why native Autocomplete doesn’t work with boto3.

Getting a near-real-time view of a DynamoDB stream with Python

DynamoDB streams help you respond to changes in your tables, which is commonly used to create aggregations or trigger other workflows once data is updated. Getting a near-real-time view into these streams can also be helpful while developing or debugging a serverless application in AWS. Today, I will share a Python script that I built to hook into DynamoDB streams. Before we begin, I suggest you read my blog post that contains a deep dive into DynamoDB streams and how they’re implemented because we’ll be using these concepts today.

What is Amazon Ion, and how can I read and write it in Python?

Amazon Ion is a data serialization format that was open-sourced by Amazon in 2016 and is used internally at the company. Over time it has also been introduced into some AWS services and is the data format that services like the Quantum Ledger Database (QLDB) use. It has also started to appear in more commonly used services, so I think it’s worth taking a closer look at. This article will explain what Ion is, its benefits, and how you can use it in Python.