Understanding dbt. Snapshots

Mykola-Bohdan Vynnytskyi
11 min readFeb 1, 2025
Photo by Aoumeur Abderrahmen on Unsplash

Welcome back to the Understanding dbt series! In this article, we’re diving into one of dbt’s most powerful features — snapshots.

Imagine this: you have a table that gets updated regularly — maybe a customer profile table, where details like email, phone number, or address change over time.

If you only keep the latest version of each record, you lose historical data — and with it, valuable insights.

This is where snapshots come in! Snapshots allow you to track and store historical changes in your data, helping you analyze how records evolve over time.

Whether you’re dealing with slowly changing dimensions (SCDs), compliance tracking, or simply need audibility, dbt snapshots provide a simple and scalable way to retain historical data without requiring complex ETL logic.

If you’re new to this series, check out the previous articles to build a solid dbt foundation:

In this article, we’ll explore:

What snapshots are and why they’re useful

How dbt tracks changes over time

Step-by-step setup of a snapshot

Real-world use cases and querying snapshots

By the end, you’ll be ready to implement snapshots in your own dbt project and start tracking historical data like a pro!

Let’s get started. 🚀

What are Snapshots in dbt?

In dbt, snapshots are a way to track and store changes to your data over time.

They allow you to maintain a historical record of your tables, capturing how specific rows evolve with each update.

Think of them as your data’s time machine — giving you the ability to look back and see what a record looked like last week, last month, or even last year.

Why Do We Need Snapshots?

Let’s say you have a table with customer information. Every time a customer updates their email, phone number, or address, the old information is overwritten.

Without snapshots, there’s no easy way to see what that data looked like before the update. This can be a problem if you need to:

  • Track changes over time (e.g., how a customer’s address history evolved).
  • Analyze trends (e.g., how customer segments change).
  • Ensure compliance and auditability (e.g., maintaining a record for legal or business reporting).
  • Recover from accidental changes (e.g., if incorrect data is pushed to the table).

Snapshots help solve these problems by creating a versioned history of your data.

Slowly Changing Dimensions (SCDs)

In data warehousing, snapshots are often used to handle Slowly Changing Dimensions (SCDs).

SCDs refer to dimensions that change over time, like a customer’s address or a product’s price. There are different strategies for managing these changes:

  • Type 1 SCD: Overwrite the old data with the new data (no history is kept).
  • Type 2 SCD: Keep both the old and new data, allowing you to track historical changes. This is where dbt snapshots shine!

With snapshots, dbt makes it easy to implement Type 2 SCDs, automatically capturing changes and maintaining historical records.

How Snapshots Work in dbt

So, to summarize, dbt snapshots are a simple yet powerful way to track changes in your data over time.

They allow you to capture historical versions of records without complex ETL processes, making it easy to analyze how your data evolves. Whether you’re dealing with slowly changing dimensions (SCDs), compliance tracking, or just need a way to audit data changes, snapshots have you covered.

Now that we know what snapshots are and why they’re important, let’s dive into how they actually work in dbt.

Snapshots rely on smart logic to detect changes in your data and store historical versions of those changes.

The beauty of dbt is that it handles most of the heavy lifting for you — you just need to configure how you want to track those changes.

The Basics of Snapshots

Here’s how dbt snapshots work in three simple steps:

1. Initial Load: The first time you run a snapshot, dbt captures the current state of your data and saves it in a special snapshot table.

2. Change Detection: On subsequent runs, dbt checks for changes in your data based on specific criteria (like a timestamp or certain fields). If a change is detected, dbt records a new version of the row in the snapshot table.

3. Version History: Over time, your snapshot table grows, storing multiple versions of each record, complete with timestamps showing when each version was valid.

Snapshot Strategies: How dbt Detects Changes

dbt uses two main strategies to detect changes in your data: the timestamp strategy and the check strategy. Let’s break down how each one works.

1. Timestamp Strategy

The timestamp strategy is perfect when your source table has an updated_at column or something similar that tracks when a record was last modified. dbt looks at this timestamp to determine if the data has changed.

When to use it:

  • Your source table has a reliable updated_at, modified_at, or similar column.
  • You want to capture changes only when this timestamp is updated.

Example Configuration:

{% snapshot customer_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at'
)
}}

SELECT * FROM {{ source('raw_data', 'customers') }}
{% endsnapshot %}

In this example:

  • unique_key: Identifies each row uniquely (customer_id in this case).
  • updated_at: Tells dbt to track changes based on this timestamp column.

How it works: Every time updated_at changes for a record, dbt creates a new version of that record in the snapshot table.

2. Check Strategy

The check strategy compares the actual values in specific columns to detect changes. If any of the monitored columns change, dbt records a new version of the row.

When to use it:

  • Your table doesn’t have an updated_at column.
  • You want to track changes to specific columns, regardless of when they were updated.

Example Configuration:

{% snapshot customer_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='check',
check_cols=['name', 'email', 'phone_number']
)
}}

SELECT * FROM {{ source('raw_data', 'customers') }}
{% endsnapshot %}

In this example:

  • check_cols: dbt will compare the values in the name, email, and phone_number columns. If any of these change, a new version of the row is recorded.

How it works: Even if there’s no timestamp, dbt will detect when any of the specified columns change and capture that in the snapshot.

How Snapshots Are Stored

When dbt runs a snapshot, it creates a table in your warehouse that looks something like this:

| customer_id | name         | email                | dbt_valid_from       | dbt_valid_to         | dbt_scd_id                          |
|-------------|--------------|----------------------|----------------------|----------------------|-------------------------------------|
| 1 | Alice Smith | alice@example.com | 2024-01-01 10:00:00 | 2024-01-05 09:00:00 | 9d3b2d6e-5f2d-4c8e-bf3f-e7f1a8b8b9c7 |
| 1 | Alice Smith | alice.new@example.com| 2024-01-05 09:00:00 | null | 2f1d9b7e-3e1c-4e8c-a9f1-c9d7d8e9f1a2 |
  • dbt_valid_from: The timestamp when the version of the record became valid.
  • dbt_valid_to: The timestamp when this version was replaced. The current version will have null here.
  • dbt_scd_id: A unique ID that dbt generates for each version of the record.

How dbt Handles Snapshots Behind the Scenes

Depending on your data warehouse, dbt handles snapshots slightly differently:

  • If your warehouse supports merge statements (like Snowflake, BigQuery, and Redshift), dbt uses merge to update the snapshot table efficiently.
  • If merge isn’t supported, dbt first deletes old records and then inserts new ones to simulate the same behavior.
  • Transaction Management: dbt treats each snapshot as a single transaction. If something fails mid-way, dbt rolls back the changes to prevent partial updates and ensure data integrity.

Setting Up a Snapshot in dbt

Now that we understand how snapshots work, let’s get our hands dirty and set one up in dbt. The process is straightforward, and with just a few steps, you’ll be tracking historical changes like a pro.

Step 1: Create a Snapshot File

Snapshots in dbt are defined in snapshot files, which are SQL files stored in a special snapshots/ directory within your dbt project.

Here’s how to get started:

1. Create the snapshots/ folder in your project (if it doesn’t already exist).

2. Inside that folder, create a new file — for example, customer_snapshot.sql.

Your project structure should look like this:

dbt_project/
├── models/
├── snapshots/
│ └── customer_snapshot.sql
├── data/
└── schema.yml

Step 2: Define the Snapshot Configuration

In your new snapshot file (customer_snapshot.sql), you’ll define the configuration that tells dbt how to track changes.

You need to specify a few key things:

  • unique_key: The primary key that uniquely identifies each record.
  • strategy: How dbt should detect changes (timestamp or check).
  • Change detection details: Either the updated_at column (for timestamp strategy) or the columns to compare (for check strategy).

Here’s an example of a timestamp-based snapshot:

{% snapshot customer_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at'
)
}}

SELECT
customer_id,
name,
email,
updated_at
FROM {{ source('raw_data', 'customers') }}
{% endsnapshot %}

And here’s an example of a check-based snapshot:

{% snapshot customer_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='check',
check_cols=['name', 'email']
)
}}

SELECT
customer_id,
name,
email
FROM {{ source('raw_data', 'customers') }}
{% endsnapshot %}

Step 3: Run the Snapshot

Once you’ve set up your snapshot configuration, it’s time to run it!

Use the following command to execute the snapshot:

dbt snapshot

The first time you run this, dbt will capture the current state of your data. On subsequent runs, dbt will detect changes and create new versions of any records that have been updated.

Step 4: Check the Snapshot Results

dbt stores snapshots in your data warehouse in a table named after your snapshot file (in this case, customer_snapshot).
You can inspect the table to see how dbt has tracked changes over time.

Here’s what the snapshot table might look like after a few runs:

| customer_id | name         | email                  | dbt_valid_from       | dbt_valid_to         | dbt_scd_id                           |
|-------------|--------------|------------------------|----------------------|----------------------|--------------------------------------|
| 1 | Alice Smith | alice@example.com | 2024-01-01 10:00:00 | 2024-01-05 09:00:00 | 9d3b2d6e-5f2d-4c8e-bf3f-e7f1a8b8b9c7 |
| 1 | Alice Smith | alice.new@example.com | 2024-01-05 09:00:00 | null | 2f1d9b7e-3e1c-4e8c-a9f1-c9d7d8e9f1a2 |
  • dbt_valid_from: When the record became valid.
  • dbt_valid_to: When the record was replaced (or null if it’s the current version).
  • dbt_scd_id: A unique identifier for each version of the record.

Optional: Configure Snapshot Retention

By default, dbt keeps all historical versions of your records. But if you want to limit how long data is retained, you can configure retention settings.

Example of retaining snapshots for 90 days:

{% snapshot customer_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at',
invalidate_hard_deletes=True,
snapshot_freshness='90 days'
)
}}

SELECT * FROM {{ source('raw_data', 'customers') }}
{% endsnapshot %}

Example Use Case: Tracking Customer Changes

Let’s bring snapshots to life with a real-world example. One of the most common use cases for dbt snapshots is tracking changes in customer data.

Customer information — like names, email addresses, or subscription statuses — often changes over time. Snapshots let you track these changes and maintain a historical record of every update.

Tracking Customer Email Changes

Imagine you have a customers table in your data warehouse. This table stores details like customer_id, name, email, and updated_at.

Whenever a customer updates their email, the old email is overwritten, and only the latest information remains.

But what if you need to see how many times a customer changed their email or track when specific changes happened?

That’s where snapshots come in.

Setting Up the Snapshot

Here’s how you’d set up a snapshot to track changes in the customers table.

Step 1: Define the Snapshot

In your snapshots/customer_snapshot.sql file, configure the snapshot to track changes based on the updated_at timestamp:

{% snapshot customer_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at'
)
}}

SELECT
customer_id,
name,
email,
updated_at
FROM {{ source('raw_data', 'customers') }}
{% endsnapshot %}
  • unique_key=’customer_id’ ensures each customer is uniquely identified.
  • strategy=’timestamp’ tells dbt to track changes using the updated_at column.

Step 2: Simulate Data Changes

Let’s assume the initial customers table looks like this:

| customer_id | name         | email               | updated_at          |
|-------------|--------------|---------------------|---------------------|
| 1 | Alice Smith | alice@example.com | 2024-01-01 10:00:00 |
| 2 | Bob Johnson | bob@example.com | 2024-01-02 14:30:00 |

A few days later, Alice updated her email:

| customer_id | name         | email                    | updated_at          |
|-------------|--------------|--------------------------|---------------------|
| 1 | Alice Smith | alice.new@example.com | 2024-01-05 09:00:00 |
| 2 | Bob Johnson | bob@example.com | 2024-01-02 14:30:00 |

Step 3: Run the Snapshot

After Alice updates her email, you run the snapshot:

dbt snapshot

dbt will detect the change and create a new version of Alice’s record in the snapshot table.

What the Snapshot Looks Like

Your snapshot table now stores both versions of Alice’s data:

| customer_id | name         | email                   | dbt_valid_from       | dbt_valid_to         | dbt_scd_id                          |
|-------------|--------------|-------------------------|----------------------|----------------------|-------------------------------------|
| 1 | Alice Smith | alice@example.com | 2024-01-01 10:00:00 | 2024-01-05 09:00:00 | 9d3b2d6e-5f2d-4c8e-bf3f-e7f1a8b8b9c7|
| 1 | Alice Smith | alice.new@example.com | 2024-01-05 09:00:00 | null | 2f1d9b7e-3e1c-4e8c-a9f1-c9d7d8e9f1a2|
| 2 | Bob Johnson | bob@example.com | 2024-01-02 14:30:00 | null | 4c8e5f2d-9d3b-2d6e-bf3f-e7f1a8b8b9c7|
  • dbt_valid_from marks when each version became valid.
  • dbt_valid_to shows when a version was replaced (null for the current version).
  • dbt_scd_id is a unique identifier for each version of the record.

Querying the Snapshot: Analyzing Changes

Now that we’ve captured historical changes, let’s query the snapshot to analyze them.

  1. Find All Customers Who Changed Their Email
SELECT 
customer_id,
name,
COUNT(*) AS number_of_changes
FROM {{ ref('customer_snapshot') }}
GROUP BY customer_id, name
HAVING COUNT(*) > 1

This query returns all customers who’ve had more than one version of their record, indicating they’ve updated their information.

2. Track the History of a Specific Customer

Want to see how many times Alice changed her email?

Run this:

SELECT 
customer_id,
name,
email,
dbt_valid_from,
dbt_valid_to
FROM {{ ref('customer_snapshot') }}
WHERE customer_id = 1
ORDER BY dbt_valid_from

This will show the full history of Alice’s email changes, along with when each version was valid.

Real-World Applications

Snapshots aren’t just for tracking customer emails. Here are a few more scenarios where snapshots come in handy:

  • Subscription Changes: Track when users upgrade, downgrade, or cancel subscriptions.
  • Employee Records: Keep a history of job titles, departments, or salary changes.
  • Pricing Changes: Maintain historical product prices to analyze trends over time.
  • Audit Trails: Ensure compliance by retaining historical versions of sensitive data.

Conclusion

And that’s a wrap on dbt snapshots! 🎉

In this article, we explored how snapshots help you track historical changes in your data, making it easy to manage slowly changing dimensions, audit data updates, and analyze trends over time. Whether you’re monitoring customer updates, subscription changes, or any evolving dataset, snapshots give you a powerful, automated way to keep a full history of your records.

We covered:

What snapshots are and why they’re important.

How dbt detects changes using timestamp and check strategies.

Setting up snapshots step by step in your project.

• A real-world use case to bring it all together.

If you enjoyed this article, don’t forget to check out the rest of the Understanding dbt series:

Thanks for reading! 🙌

If you found this helpful, give it a clap, share it with your network, and feel free to drop any questions or thoughts in the comments.

Stay tuned for more deep dives into dbt and other data engineering topics.
See you in the next article! 🚀

Sign up to discover human stories that deepen your understanding of the world.

Mykola-Bohdan Vynnytskyi
Mykola-Bohdan Vynnytskyi

Written by Mykola-Bohdan Vynnytskyi

Data Engineer by day. YouTuber, author, and creator of courses by night. Passionate about Big Data and self-development.

No responses yet

Write a response