
Release highlights: 1.17

New: DuckLake destination

You can now use the DuckLake destination, with support for all bucket and catalog combinations. It’s a great fit for lightweight data lakes and local development setups.
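
For instance, a minimal pipeline targeting DuckLake could look like the sketch below. This is only an illustrative sketch: it assumes the destination is selected by the name "ducklake" and that storage and catalog details come from your configuration or the defaults described in the docs; the pipeline, dataset, and table names are placeholders.

import dlt

# Minimal sketch (assumed defaults): pick the DuckLake destination by name
# and load a couple of in-memory rows into a table.
pipeline = dlt.pipeline(
    pipeline_name="ducklake_demo",   # hypothetical pipeline name
    destination="ducklake",
    dataset_name="lake_data",        # hypothetical dataset name
)

load_info = pipeline.run(
    [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}],
    table_name="items",
)
print(load_info)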

Read the docs →


Custom metrics in pipelines

You can now collect custom metrics directly inside your resources and transform steps. This makes it easy to track things like page counts, skipped rows, or API calls — right where the data is extracted.

Use dlt.current.resource_metrics() to store custom values while your resource runs. These metrics are automatically merged into the pipeline trace and visible in the run summary.

Example:

import dlt
from dlt.sources.helpers.rest_client import RESTClient
from dlt.sources.helpers.rest_client.paginators import JSONLinkPaginator

client = RESTClient(
    base_url="https://pokeapi.co/api/v2",
    paginator=JSONLinkPaginator(next_url_path="next"),
    data_selector="results",
)

@dlt.resource
def get_pokemons():
    metrics = dlt.current.resource_metrics()
    metrics["page_count"] = 0
    for page in client.paginate("/pokemon", params={"limit": 100}):
        metrics["page_count"] += 1
        yield page

pipeline = dlt.pipeline("get_pokemons", destination="duckdb")
load_info = pipeline.run(get_pokemons)
print("Custom metrics:", pipeline.last_trace.last_extract_info.metrics)

Custom metrics are grouped together with performance and transform stats under resource_metrics, so you can view them easily in traces or dashboards.

Read more →


Limit your data loads for testing

When working with large datasets, you can now limit how much data a resource loads using the new add_limit method. This is perfect for sampling a few records to preview your data or test transformations faster.

Example:

import itertools
import dlt

# Load only the first 10 items from an infinite stream
r = dlt.resource(itertools.count(), name="infinity").add_limit(10)

You can also:

  • Count rows instead of yields:

    my_resource().add_limit(10, count_rows=True)
  • Or stop extraction after a set time:

    my_resource().add_limit(max_time=10)

It’s a simple but powerful way to test pipelines quickly without pulling millions of rows.


Incremental loading for filesystem

Incremental loading with the filesystem source is now even easier, making it ideal for tracking updated or newly added files in S3 or in local folders. dlt detects file changes (using fields like modification_date) and loads only what’s new.

Example:

import dlt
from dlt.sources.filesystem import filesystem, read_parquet

filesystem_resource = filesystem(
    bucket_url="s3://my-bucket/files",
    file_glob="**/*.parquet",
    incremental=dlt.sources.incremental("modification_date"),
)
pipeline = dlt.pipeline("my_pipeline", destination="duckdb")
pipeline.run((filesystem_resource | read_parquet()).with_name("table_name"))

You can also split large incremental loads into smaller chunks:

  • Partition loading – divide your files into ranges and load each independently (even in parallel).
  • Split loading – process files sequentially in small batches using row_order, files_per_page, or add_limit().

This makes it easy to backfill large file collections efficiently and resume incremental updates without reloading everything.
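
For instance, a partitioned backfill over modification_date might look like the sketch below. The bucket URL and date bounds are placeholders; initial_value and end_value simply bound the incremental cursor for this run, and other ranges can be loaded by separate (even parallel) runs.

import dlt
import pendulum
from dlt.sources.filesystem import filesystem, read_parquet

# One hypothetical partition: only files modified in January 2024.
january_files = filesystem(
    bucket_url="s3://my-bucket/files",   # placeholder bucket
    file_glob="**/*.parquet",
    incremental=dlt.sources.incremental(
        "modification_date",
        initial_value=pendulum.datetime(2024, 1, 1),
        end_value=pendulum.datetime(2024, 2, 1),
    ),
)

pipeline = dlt.pipeline("backfill_2024_01", destination="duckdb")
pipeline.run((january_files | read_parquet()).with_name("table_name"))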

Learn more →


Split and partition large SQL loads

When working with huge tables, you can now split incremental loads into smaller chunks or partition backfills into defined ranges. This makes data appear in the destination sooner and lets you retry only the failed chunks instead of reloading everything.

Split loading

If your source returns data in a deterministic order (for example, ordered by created_at), you can combine an incremental cursor with add_limit() to process batches sequentially:

import dlt
from dlt.sources.sql_database import sql_table

pipeline = dlt.pipeline("split_load", destination="duckdb")

messages = sql_table(
    table="chat_message",
    incremental=dlt.sources.incremental(
        "created_at",
        row_order="asc",      # required for split loading
        range_start="open",   # disables deduplication
    ),
)

# Load one-minute chunks until done
while not pipeline.run(messages.add_limit(max_time=60)).is_empty:
    pass

Partitioned backfills

You can also load large datasets in parallel partitions using initial_value and end_value. Each range runs independently, helping you rebuild large tables safely and efficiently.
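
As a rough sketch (the table, cursor column, and date bounds below are illustrative), one such partition could be defined like this, with the remaining partitions using different initial_value/end_value bounds:

import dlt
import pendulum
from dlt.sources.sql_database import sql_table

# One backfill partition: rows with created_at in [2024-01-01, 2024-02-01).
chat_messages = sql_table(
    table="chat_message",
    incremental=dlt.sources.incremental(
        "created_at",
        initial_value=pendulum.datetime(2024, 1, 1),
        end_value=pendulum.datetime(2024, 2, 1),
    ),
)

pipeline = dlt.pipeline("backfill_chat_2024_01", destination="duckdb")
pipeline.run(chat_messages)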

Together, these methods make incremental loading more flexible and robust for both testing and production-scale pipelines.

Read more →


Shout-out to new contributors

Big thanks to our newest contributors:


Full release notes

View the complete list of changes →
