Every day, data engineers rely on a set of powerful tools to streamline data transformation tasks and maintain robust data pipelines. Among these tools, dbt (data build tool) stands out for its ability to simplify complex SQL workflows. One of these simplification mechanisms is called macros.
Macros are pieces of code that can be reused multiple times. They are analogous to functions in other programming languages, and are extremely useful if you find yourself repeating code across multiple models.
They are typically created under the macros folder of your dbt project. Packages also let you leverage the expertise of the community by providing pre-built, shareable collections of macros, the best-known package being dbt_utils.
In this article, we'll explore the three dbt macros scenarios that have become indispensable in my daily routines at Human37.
1. Union_relations macro
The union all statement combines the results of two or more select statements. This task can rapidly become a nightmare when tables have mismatched column names or column order, or when the number of fields differs. Most of the time, running union all statements first requires ordering the columns of all the combined models and populating unmatched columns with null. The following example shows how a union all query between two models with unmatched columns, for instance sessions and sales, works.
SELECT
date,
landing_page_url,
sessions,
active_on_site,
NULL AS `order`,
NULL AS customers,
NULL AS revenue
FROM sessions
UNION ALL
SELECT
date,
landing_page_url,
NULL AS sessions,
NULL AS active_on_site,
`order`,
customers,
revenue
FROM sales
As you can imagine, this rapidly gets painful when a new column needs to be added to the union: the column has to be added to every select statement, cast as null wherever it does not exist.
Fortunately, the dbt_utils package provides the union_relations macro, which frees us from this chore: it automatically populates each column that does not exist in the other select statements of the union with null and rearranges the fields. The single line of code below produces the same result as the example above.
To get access to that macro, add the dbt_utils package to your packages.yml file, run the dbt deps command, and invoke the macro in a SQL model as follows.
{{ dbt_utils.union_relations(relations=[ref('sessions'), ref('sales')]) }}
Documentation: https://github.com/dbt-labs/dbt-utils#union_relations-source
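For reference, installing the package boils down to a packages.yml entry like the one below; the pinned version range is an assumption, so check the package's releases for the latest compatible one.

```yaml
# packages.yml at the root of your dbt project
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
```

Running dbt deps afterwards downloads the package and makes its macros available in your project.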
2. Get_profile macro
Data profiling is the process of examining, analyzing, and creating useful summaries of data. The idea is to have access to a table that summarizes the main descriptive statistics of each column of a given model. It helps quickly identify null proportions, the uniqueness of id columns, and the distribution shape of quantitative attributes.
Thankfully, the dbt_profiler package offers an easy-to-use macro which achieves this goal in one line.
{{ dbt_profiler.get_profile(relation=ref('sessions')) }}
Find below the compiled query of that dbt macro.
WITH source_data AS (
SELECT
*
FROM `table`
),
column_profiles AS (
SELECT
LOWER('session_id') AS column_name,
NULLIF(LOWER('string'), '') AS data_type,
CAST(COUNT(*) AS NUMERIC) AS row_count,
SUM(CASE WHEN session_id IS NULL THEN 0 ELSE 1 END) / CAST(COUNT(*) AS NUMERIC) AS not_null_proportion,
COUNT(DISTINCT session_id) / CAST(COUNT(*) AS NUMERIC) AS distinct_proportion,
COUNT(DISTINCT session_id) AS distinct_count,
CAST(NULL AS STRING) AS min,
CAST(NULL AS STRING) AS avg,
CAST(CURRENT_TIMESTAMP() AS STRING) AS profiled_at,
1 AS _column_position
FROM source_data
UNION ALL
...
)
SELECT
*
FROM column_profiles
ORDER BY _column_position ASC
This command creates a table containing useful information and statistics for each column of your model.
A small additional tip: the get_profile macro can be combined with the preceding union_relations macro to output a single table containing all the profiles you need to monitor your models and data transformations.
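A minimal sketch of that combination, assuming the same sessions and sales models as before; the intermediate model name all_events is hypothetical, and each snippet lives in its own model file.

```sql
-- models/all_events.sql: union the two models into one relation
{{ dbt_utils.union_relations(relations=[ref('sessions'), ref('sales')]) }}

-- models/all_events_profile.sql: profile the unioned relation
{{ dbt_profiler.get_profile(relation=ref('all_events')) }}
```

Materializing the second model gives you one table with the descriptive statistics of every column across both sources.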
3. Custom macros
So far, the macros presented in this article have been created by third parties and shared freely through packages. However, your organization can also create its own macros, which promotes modular design, reusability, and consistency across models. Furthermore, when creating macros you can leverage the strength of Jinja as a templating language: Jinja allows you to parameterize your macros, making them highly adaptable to your scenarios and business needs.
At Human37, we have created macros to monitor our clients' data-warehouse-related costs, to convert currency fields for clients operating in different countries, to parse JSON data, or even to clean data into standardized data warehouse formats. Custom macros and Jinja together form a dynamic toolkit that empowers organizations' data teams in their data engineering workflows. The cost-monitoring macro below, for example, aggregates BigQuery query costs across several projects.
{% macro generate_cost_monitoring(projects, region='region-eu', tb_cost=7.5) %}
WITH all_project_data AS (
{% for project in projects %}
SELECT
'{{ project }}' AS project_name,
SUM(total_bytes_processed)/1024/1024/1024/1024 AS total_bytes_processed_tb,
SUM(total_bytes_billed)/1024/1024/1024/1024 * {{ tb_cost }} AS est_cost,
user_email,
EXTRACT(DAY FROM creation_time) AS day,
EXTRACT(MONTH FROM creation_time) AS month,
EXTRACT(YEAR FROM creation_time) AS year,
destination_table.dataset_id AS dataset_id,
destination_table.table_id AS table_id,
query
FROM `{{ project }}`.`{{ region }}`.INFORMATION_SCHEMA.JOBS
WHERE job_type = 'QUERY'
GROUP BY user_email, day, month, year, project_name, dataset_id, table_id, query
{% if not loop.last %}
UNION ALL
{% endif %}
{% endfor %}
)
SELECT * FROM all_project_data
{% endmacro %}
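Invoking the macro from a model is then a one-liner; the project IDs below are placeholders, and the region and cost-per-TB arguments fall back to the macro's defaults when omitted.

```sql
-- models/cost_monitoring.sql
{{ generate_cost_monitoring(projects=['client-project-a', 'client-project-b']) }}
```

Because projects is a plain Jinja list, onboarding a new client project only requires appending its ID to the call rather than copy-pasting another select statement.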
Conclusion
In conclusion, dbt macros have proven to be a game-changer in my daily routine as a data analyst at Human37. They have not only saved me valuable time but also enriched our data transformation workflows.
If you would like to explore the details of our work with other clients, please feel free to reach out. We are always eager to engage in insightful discussions.