
FastFlowTransform Documentation Hub

FastFlowTransform (FFT) is a SQL + Python data modeling engine with a deterministic DAG, parallel executor, optional caching, incremental builds, auto-generated docs, snapshots, and built-in data-quality tests. The fft CLI orchestrates compilation, execution, docs, validation, and history tracking across DuckDB, Postgres, BigQuery (pandas + BigFrames), Databricks/Spark, and Snowflake Snowpark.

Use this page as the front door into the docs: start with the orientation section, then jump to the guide that matches your task.




Quick Orientation

  • New to FFT? Read the Quickstart for installation (venv + editable install), seeding, and the first fft run.
  • Want the bigger picture? The Technical Overview explains the project layout, DAG, scheduler, registry, executors, and the roadmap snapshot.
  • Learning the CLI surface area? Browse the CLI Guide for command groups such as fft run, fft snapshot run, fft dag, fft docgen, fft test, and fft utest.

Build & Run Projects

  • Project layout & CLI workflow: Pair the “Project Layout” chapter of the Technical Overview with the CLI Guide to understand how fft run, fft test, and fft dag fit together.
  • Profiles & environments: Profiles & Environments covers executor profiles, environment overrides, credential handling, and engine-specific flags.
  • Runtimes & observability flags: Logging & Verbosity explains log levels, JSON logs, progress indicators, and metrics toggles during fft run.
  • Local runtimes & engines: Local Engine Setup walks through DuckDB, Postgres, Spark/Delta, BigQuery, and Snowflake Snowpark bootstrapping for the demos.
  • CI-friendly workflows: CI Checks & Change-Aware Runs introduces fft ci-check and fft run --changed-since for structural validation and diff-aware pipelines.

Modeling & Configuration

  • SQL + Python authoring model: API & Models documents the Python node decorators, HTTP helper (fastflowtransform.api.http), and how ref() / source() bindings work in both SQL and Python models.
  • Templates, macros, and config keys: Configuration & Macros lists the config(...) options, reusable macros, helper functions, and naming rules for .ff.sql / .ff.py.
  • Project-level metadata: Project Configuration describes project.yml, default materializations, tags, exposures, docs strings, and the models/ hierarchy.
  • Sources & seeds: Sources shows how to register upstream tables/files and snapshots of raw data, and explains how state tracking interacts with sources.

Execution & State Management

  • Parallelism, caching & rebuilds: Cache & Parallelism dives into the level-wise scheduler, fingerprint cache, and --rebuild / --no-cache behaviors.
  • Incremental models: Incremental Processing explains merge vs append strategies, cleanup rules, and engine-specific hooks.
  • Snapshots / history tables: Snapshots documents the materialized='snapshot' config, timestamp vs check strategies, and the dedicated fft snapshot run . --env <profile> entrypoint.
  • Selective runs: State Selection covers --selector, --select, --exclude, --changed, and --results filters across DAGs.
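
To make the "level-wise scheduler" and "fingerprint cache" ideas above more concrete, here is a minimal, generic Python sketch. It is not FFT's implementation: the names build_levels and fingerprint, the deps mapping, and the choice of SHA-256 are all assumptions made for illustration. The sketch groups a DAG into levels whose nodes could run in parallel, and derives a per-node fingerprint that changes whenever the node's own source or any upstream fingerprint changes.

```python
import hashlib


def build_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into levels so that each node's parents sit in earlier levels.

    Assumes every node, including roots, appears as a key in `deps`.
    """
    remaining = {node: set(parents) for node, parents in deps.items()}
    levels: list[list[str]] = []
    while remaining:
        # Nodes with no unresolved parents can run together in this level.
        ready = sorted(n for n, parents in remaining.items() if not parents)
        if not ready:
            raise ValueError("cycle or unknown dependency detected")
        levels.append(ready)
        for node in ready:
            del remaining[node]
        for parents in remaining.values():
            parents.difference_update(ready)
    return levels


def fingerprint(node: str, sql: dict[str, str], deps: dict[str, set[str]],
                memo: dict[str, str]) -> str:
    """Hash a node's source text together with its parents' fingerprints."""
    if node not in memo:
        digest = hashlib.sha256(sql[node].encode("utf-8"))
        for parent in sorted(deps[node]):
            digest.update(fingerprint(parent, sql, deps, memo).encode("ascii"))
        memo[node] = digest.hexdigest()
    return memo[node]


# Hypothetical five-node project used only for illustration.
deps = {
    "raw_users": set(),
    "raw_orders": set(),
    "stg_users": {"raw_users"},
    "stg_orders": {"raw_orders"},
    "mart": {"stg_users", "stg_orders"},
}
print(build_levels(deps))
# [['raw_orders', 'raw_users'], ['stg_orders', 'stg_users'], ['mart']]
```

Under this scheme, a node whose fingerprint matches the one stored from a previous run could be skipped, while editing one model invalidates only that model and its descendants. Which inputs FFT actually folds into its fingerprints is covered in Cache & Parallelism.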

Testing & Data Quality

  • Schema-bound YAML tests: YAML Tests details how to define and run column-level constraints declared in .yml.
  • Reusable data-quality suites: Data Quality Tests catalogs reconciliation, freshness, and anomaly rules that can attach to models or sources.
  • Source freshness guard-rails: Source Freshness covers fft source freshness, metadata in sources.yml, and interpreting warn/error thresholds in the docs UI.
  • Fast model unit tests: Unit Tests shows how to author .sql / .py assertions, seed fixtures, and run them via fft utest.
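
As a rough illustration of the warn/error thresholds mentioned for source freshness, the sketch below classifies a source by the age of its latest load. The function name freshness_status, its parameters, and the "pass" / "warn" / "error" labels are hypothetical; the exact semantics FFT applies (for example, inclusive versus exclusive bounds) are described in Source Freshness and may differ.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(last_loaded: datetime, now: datetime,
                     warn_after: timedelta, error_after: timedelta) -> str:
    """Classify a source by how long ago it was last loaded.

    Assumption: a threshold is breached only when the age strictly exceeds it.
    """
    age = now - last_loaded
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


# Illustrative values only: warn after 6 hours, error after 24 hours.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
status = freshness_status(
    last_loaded=now - timedelta(hours=8),
    now=now,
    warn_after=timedelta(hours=6),
    error_after=timedelta(hours=24),
)
print(status)  # warn
```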

Docs, Debugging & Operations

  • Auto-generated docs & lineage: Auto Docs explains fft dag --html, fft docgen, JSON exports, and optional sync-db-comments for Postgres/Snowflake.
  • Visibility & logging: Logging & Verbosity lists CLI flags for structured logs, progress bars, and verbose executor info.
  • Troubleshooting: Troubleshooting & Error Codes enumerates the most common failures, retry strategies, and diagnostic commands.

Examples & Tutorials

All demos live in the top-level examples/ directory and ship with Makefiles plus runnable seeds.


Reference & Contribution

  • API reference: Browse the generated API Reference (MkDocStrings) for public functions, classes, and executors under src/fastflowtransform.
  • Architecture internals: The Technical Overview dives into registries, DAG building, validation, and engine abstractions.
  • Contributing: Follow Contributing.md for dev environment setup (uv, pyproject.toml), coding standards, tests, and PR expectations.
  • License: Apache 2.0 — see License.md.

Need Help?