dbt

Learn how dbt simplifies data transformations with modular SQL, testing, and automation.
dbt
AWS
Data Engineering
Author

Ravi Kalia

Published

March 22, 2025

Introduction to dbt

Modern data teams rely on dbt (Data Build Tool) to manage SQL-based transformations efficiently. Unlike traditional ETL tools, dbt follows an ELT approach, allowing analysts and engineers to:

  • Write modular SQL models
  • Automate testing & documentation
  • Optimize query performance with incremental models
  • Integrate with CI/CD & Git

In this blogpost, we’ll set up dbt for Amazon Redshift, transform data, and automate workflows. 🎯


️ Setting Up dbt with Redshift

Install dbt for Redshift

pip install dbt-redshift

Initialize a dbt Project

dbt init my_dbt_project
cd my_dbt_project

This creates a structured project:

my_dbt_project/
│── models/           # SQL transformations
│── dbt_project.yml   # Project settings
│── profiles.yml      # Redshift connection info

Configure Redshift Connection (profiles.yml)

my_dbt_project:
  outputs:
    dev:
      type: redshift
      host: my-redshift-cluster.xxxx.region.redshift.amazonaws.com
      user: my_user
      password: my_password
      port: 5439
      dbname: my_database
      schema: analytics
      threads: 4
  target: dev

Run a connection test:

dbt debug

Transforming Data with dbt Models

Create a Simple dbt Model

touch models/orders_summary.sql

Example: models/orders_summary.sql

WITH orders AS (
    SELECT 
        user_id,
        COUNT(*) AS total_orders,
        SUM(order_value) AS total_spent
    FROM raw_data.orders
    WHERE status = 'completed'
    GROUP BY user_id
)
SELECT * FROM orders;
  • Aggregates total orders and spending per user.
  • Uses { ref('orders') } for modular design.

Run the Transformation

dbt run
  • Creates analytics.orders_summary in Redshift.

Automate Data Testing

version: 2
models:
  - name: orders_summary
    tests:
      - unique:
          column_name: user_id
      - not_null:
          column_name: total_orders

Run tests:

dbt test
  • Ensures data integrity before deployment.

Generating Documentation & Lineage Graph

Run:

dbt docs generate
dbt docs serve

Open http://localhost:8080 to explore: * Data lineage graphs
* Auto-generated table definitions


Automating dbt with Apache Airflow

If you’re using AWS MWAA (Managed Airflow), schedule dbt runs with:

from airflow.providers.dbt.cloud.operators.dbt import DbtRunOperator

dbt_run = DbtRunOperator(
    task_id="run_dbt",
    dbt_cloud_conn_id="dbt_cloud_default",
    job_id=12345
)
  • Integrates dbt into your data pipeline.

Why Use dbt Instead of Pure SQL?

Feature Pure SQL dbt
Modularity ❌ Manual reuse βœ… References & macros
Testing ❌ Manual checks βœ… Automated (dbt test)
Documentation ❌ Hard to track βœ… Auto-generated
Lineage Graph ❌ No built-in support βœ… Interactive visualization
CI/CD & Git ❌ Manual scripts βœ… Git + CI/CD
Incremental Models ❌ Full recompute βœ… Efficient updates

Final Verdict: dbt makes SQL transformations scalable, testable, and production-ready. Ready to explore? Try dbt Cloud or set it up in your AWS stack!


πŸ’‘ Got questions or need help with dbt? Contact me!