by JF OY - Mar 6, 2023
DBT is the T in ELT. It doesn’t extract or load data, but it’s extremely good at transforming data that’s already loaded into your warehouse. This “transform after load” architecture is becoming known as ELT (extract, load, transform).
DBT is a compiler and a runner
At the most basic level, DBT has two components: a compiler and a runner. Users write DBT code in their text editor of choice and then invoke DBT from the command line. DBT compiles all code into raw SQL and executes that code against the configured data warehouse.
and Best practices
✨ Encourage to add data descriptions to YAML!!!
Sources: Schemas and tables in a source-conformed structure (i.e. tables and columns in a structure based on what an API returns), loaded by a third-party tool. https://docs.getdbt.com/docs/building-a-dbt-project/using-sources
src_{source}__{object}
Table
Staging models: The atomic unit of data modeling. Each model bears a one-to-one relationship with the source data table it represents. It has the same granularity, but the columns have been renamed, recast, or usefully reconsidered into a consistent format. stg_{source}__{object}
View
DateTime relate columns should be transformed to epoch sec or timestamp with no timezone and specific show on the column name.
{what ever}_{date/timestamp}_{epoch/utc/tz}
Good way
where stored_to_backend_timestamp_tz >= '2020-03-14'::timestamp at time zone 'Asia/Jakarta'
Bad way
where stored_to_backend_timestamp_tz >= '2020-03-14'
There is a primary key that is both unique and not null (and tested).
Marts models: Models that represent business processes and entities, abstracted from the data sources that they are based on.
{dim/fact}_{object}_{t/v}
Table or View
fact_{object}_{t/v}
: A tall, narrow table representing real-world processes that have occurred or are occurring. The heart of these models is usually an immutable event stream: sessions, transactions, orders, stories, and votes.dim_{object}_{t/v}
: A wide, short table where each row is a person, place, or thing; the ultimate source of truth when identifying and describing entities of the organization. They are mutable, though slowly changing: customers, products, candidates, buildings, and employees.summary_{aggregated time window}_{object}_{t/v}
: Aggregation based on time window or other dimensions.├── dbt_project.yml
└── models
├── marts
| ├── core
| | ├── docs.md
| | ├── core.yml
| | ├── dim_account_profile_v.sql
| | └── dim/fact_*_t/v.sql
| └── {business group}
| ├── *docs.md
| ├── dim/fact/summary_*_v.yml
| └── dim/fact/summary_*_v.sql
└── staging
├── account
| ├── docs.md
| ├── src_account.yml
| ├── stg_account.yml
| └── stg_account__account_profile.sql
└── {service}
├── docs.md
├── src_{service}.yml
├── stg_{service}.yml
└── stg_{service}__*.sql
engineer maintenance
engineer maintenance
engineer maintenance
dw
└── schemas
├── src
| └── Tables
| ├── src_account__account
| ├── src_account__account_profile
| ├── src_gsheet_data__*
| ├── src_gsheet_ops__*
| └── src_billing__*
├── dbt_stg
| └── Views
| ├── stg_account__account_profile
| └── stg_billing__*
├── dbt_mart_core
| └── Views
| └── dim_account_profile_v
├── dbt_mart_ico
| └── Views
| └── dim/fact_*_v
└── dbt_mart_{business group}
└── Views
└── dim/fact_*_v
https://discourse.getdbt.com/t/faq-cleaning-up-removed-models-from-your-production-schema/113
Please only use models from dbt_mart_core
or dbt_mart_{your business group}
. Any model from other schemas may change fast and engineer won’t keep the old version.
Currently, we don’t have strict limits on tags. But some tags are used for airflow triggers. Please be careful and avoid using them in the wrong way.
daily-run
daily-test
ds-{*}
{*}_full-run
refresh all data{*}_batch-run
only fetch recent dataanomaly
unit-test
qa
retire-{date}
retire-20230201
dev-backfill
dbt_mart_core
would be fat in the future.target/compiled
directory contains select
statements that you can run in any query editor.target/run
directory contains the SQL dbt executes to build your models.logs/dbt.log
file contains all the queries that dbt runs and additional logging. Recent errors will be at the bottom of the file.