Mapping Dataflows in Azure Data Factory

Low-code, Spark-based ETL development with Mapping Dataflows in Azure Data Factory.

Posted by Aravind Nuthalapati on July 24, 2021

This article explains how Mapping Dataflows work in Azure Data Factory.

Low-Code Mapping Dataflows in Azure Data Factory: Benefits, Optimization, and Comparison with Databricks

1. Introduction

Azure Data Factory (ADF) **Mapping Dataflows** provide a low-code ETL (Extract, Transform, Load) experience that lets data engineers visually design and automate data transformations without writing extensive code. While **Mapping Dataflows** simplify development, **Azure Databricks Notebooks** provide greater flexibility and customization for complex transformations.

2. How Low-Code Mapping Dataflows Help Developers

2.1 Visual Interface for Data Transformations

Mapping Dataflows provide an **intuitive drag-and-drop** UI to build transformations, eliminating the need for complex Spark or SQL code.

2.2 Built-in Data Transformation Capabilities

Developers can use **pre-built transformations** such as the following (a rough Spark-level sketch of a few of them follows the list):

  • Aggregations
  • Joins (Inner, Outer, Cross, Left, Right)
  • Filters
  • Derived Columns
  • Pivots and Unpivots
  • Surrogate Keys
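Mapping Dataflows compile to Spark jobs under the hood, so for intuition it can help to see roughly what a few of these steps look like in plain PySpark. The sketch below is illustrative only; the DataFrame, table, and column names are invented for the example and are not part of any ADF API.

```python
# Illustrative PySpark equivalents of a few Mapping Dataflow transformations.
# All table and column names are made up for this sketch.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dataflow-sketch").getOrCreate()

orders = spark.createDataFrame(
    [(1, 101, 250.0), (2, 102, 75.5), (3, 101, 310.0)],
    ["order_id", "customer_id", "amount"],
)
customers = spark.createDataFrame(
    [(101, "Alice"), (102, "Bob")],
    ["customer_id", "name"],
)

result = (
    orders
    .join(customers, "customer_id", "inner")         # Join transformation
    .filter(F.col("amount") > 100)                   # Filter transformation
    .withColumn("amount_usd", F.col("amount") * 1.0) # Derived Column transformation
    .groupBy("name")                                 # Aggregate transformation
    .agg(F.sum("amount_usd").alias("total_spend"))
)
result.show()
```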

2.3 Seamless Integration with Azure Services

Mapping Dataflows integrate natively with:

  • Azure Data Lake Storage (ADLS Gen2)
  • Azure Synapse Analytics
  • Azure SQL Database
  • Azure Blob Storage
  • Cosmos DB

2.4 Optimized for ETL and ELT Workflows

- Works well for **data ingestion, transformation, and loading (ETL)**.
- Supports **push-down optimization** for **ELT-style transformations** on SQL-based stores.

2.5 Auto-Scalability and Performance Tuning

Mapping Dataflows run on ADF-managed Spark clusters provisioned through the **Azure Integration Runtime**, automatically scaling compute to the workload size.

3. Optimization Techniques for Mapping Dataflows

3.1 Enable Staged Execution for a Performance Boost

**Staging** writes intermediate results to a data lake before the final load, which can substantially speed up sinks such as Azure Synapse Analytics (via PolyBase staging). There is no SQL flag for this; it is enabled in the **Execute Data Flow** activity settings by selecting a staging linked service and folder.
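Conceptually, staging materializes an intermediate result and lets downstream steps read from the staged copy. The PySpark sketch below is for intuition only; in ADF the staging location would be an ADLS Gen2 folder, but a local path is used here so the sketch runs anywhere.

```python
# Conceptual sketch of staged execution: materialize an intermediate result,
# then continue processing from the staged copy.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("staging-sketch").getOrCreate()

raw = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "code"])
shaped = raw.withColumn("code_upper", F.upper("code"))

staging_path = "/tmp/dataflow-staging/shaped"          # placeholder for an ADLS folder
shaped.write.mode("overwrite").parquet(staging_path)   # stage the intermediate result
staged = spark.read.parquet(staging_path)              # downstream steps read the staged copy
staged.show()
```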

3.2 Optimize Partitioning for Large Datasets

Choose the right **partitioning strategy** (a conceptual Spark sketch follows the list):

  • **Round Robin** - Balanced distribution for general workloads.
  • **Hash Partitioning** - Used for **joins** and **aggregations**.
  • **Key-based Partitioning** - Distributes data by a specific column.
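In Spark terms, these strategies correspond to how the data is repartitioned. A minimal PySpark sketch, with an invented `region` column for illustration:

```python
# Conceptual PySpark equivalents of the partitioning strategies above.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partition-sketch").getOrCreate()

df = spark.range(1_000_000).withColumn("region", F.col("id") % 10)

round_robin = df.repartition(8)            # Round Robin: even distribution of rows
hashed      = df.repartition(8, "region")  # Hash/Key: co-locates rows sharing a key,
                                           # which helps joins and aggregations
print(round_robin.rdd.getNumPartitions(), hashed.rdd.getNumPartitions())
```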

3.3 Use Derived Columns Instead of Data Copies

Instead of creating redundant datasets, use **Derived Column Transformations** to manipulate values dynamically.
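In Spark terms, a Derived Column adds or overwrites a column on the same dataset rather than materializing a second copy of the data. A tiny sketch with invented column names:

```python
# A derived column modifies the dataset in place (logically) instead of
# creating a redundant copy of the data.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("derive-sketch").getOrCreate()

people = spark.createDataFrame([("Ada", "Lovelace")], ["first_name", "last_name"])
people = people.withColumn("full_name", F.concat_ws(" ", "first_name", "last_name"))
people.show()
```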

3.4 Push-Down SQL Queries for ELT

Mapping Dataflows expose no SQL flag for push-down. Instead, push work down to SQL-based stores by supplying a **source query** on the source transformation, so filtering, projection, and aggregation run inside the database before rows reach Spark.
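For intuition, this is what push-down looks like at the Spark level: a query handed to the JDBC source executes inside the database engine, so only the reduced result crosses the wire. Connection values and table names below are placeholders.

```python
# Push-down sketch: the SQL in "query" runs inside the database;
# Spark receives only the filtered, aggregated result.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pushdown-sketch").getOrCreate()

pushed = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net;database=<db>")
    .option("query", "SELECT customer_id, SUM(amount) AS total "
                     "FROM dbo.orders GROUP BY customer_id")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)
pushed.show()
```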

3.5 Reduce Memory Consumption with Data Flow Debugging

Enable **Debug Mode** to iterate against a row-limited sample of each source on a small, interactive cluster instead of pushing full datasets through every run.
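Debug mode's row limits behave roughly like limiting each source before downstream processing. A sketch of the idea only:

```python
# Debug-style row limiting: operate on a small slice of each source
# so iterations stay fast and cheap.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("debug-sketch").getOrCreate()

full = spark.range(10_000_000)
sample = full.limit(1000)   # analogous to a debug row limit on a source
print(sample.count())
```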

4. Comparison: ADF Mapping Dataflows vs. Databricks Notebooks

| Feature | ADF Mapping Dataflows | Azure Databricks Notebooks |
| --- | --- | --- |
| Code complexity | Low-code, drag-and-drop | Requires Python, Scala, or SQL |
| Best for | ETL / data transformation pipelines | Advanced data engineering and machine learning |
| Performance optimization | Auto-optimized, with push-down via source queries | Manual Spark tuning required |
| Integration | Native with ADF, Synapse, ADLS | Integrates with ADF, ADLS, Synapse |
| Execution engine | ADF-managed Spark clusters (auto-scaling) | Customer-managed Apache Spark clusters |
| Cost | Lower cost for simple transformations | Higher cost for complex workloads |
| Streaming support | Batch-oriented; no native streaming | Full support for streaming workloads |
| Security | Built-in Azure RBAC | Microsoft Entra ID integration plus workspace access controls |

5. When to Choose ADF Mapping Dataflows vs. Databricks Notebooks

5.1 Choose ADF Mapping Dataflows If:

  • You need a **low-code ETL** solution.
  • You want **cost-effective** data transformation.
  • You prefer **automatic scaling** without manual tuning.
  • You are integrating with **Azure Data Lake, Synapse, or Blob Storage**.

5.2 Choose Azure Databricks If:

  • You need **complex transformations** beyond built-in ADF capabilities.
  • You require **machine learning and AI workloads**.
  • You need **streaming data processing**.
  • You are working with **real-time analytics on large datasets**.

6. Summary

- **ADF Mapping Dataflows** simplify ETL processes with a low-code UI, making data transformation easy and scalable.
- **Azure Databricks Notebooks** offer a powerful, flexible platform for **advanced data engineering and AI workloads**.
- Choose **ADF for cost-effective ETL** and **Databricks for complex big data processing**.