Articles tagged with "airflow"

Making the TPC-H dataset available in Athena using Airflow

The TPC-H dataset is commonly used to benchmark data warehouses or, more generally, decision support systems. It describes a typical e-commerce workload and includes benchmark queries to enable performance comparison between different data warehouses. I think the dataset is also useful to teach building different kinds of ETL or analytics workflows, so I decided to explore ways of making it available in Amazon Athena.

On-Prem Airflow to MWAA

Transforming large amounts of data into formats that help solve business problems is what data engineers excel at. A combination of Serverless tools such as Athena, StepFunctions, Lambda, or Glue can get the job done in many projects. However, some customers prefer to rely on Open-Source projects and tools to access more talent in the job market and be less reliant on a single cloud provider. Today, we’ll share a story of a modernization project that we did for a customer in the online marketing industry.

Understanding Apache Airflow on AWS

Apache Airflow doesn’t only have a cool name; it’s also a powerful workflow orchestration tool that you can use as Managed Workflows for Apache Airflow (MWAA) on AWS. This post will explain which problems the service solves, how you can get started, and the most important concepts you need to understand.