Portfolio
  • About
  • Contributions
  • Skills
  • Projects
  • Experience
  • Education
  • Certifications
  • Contact
Available for work

Jordan Calloway

—

Data Engineer

Seattle, WA · 4 yrs exp

Data Engineer with 4 years building reliable data pipelines, warehouses, and analytics infrastructure at scale. I bridge the gap between raw data and business decisions — working closely with analytics, ML, and product teams.

  • Designed and maintained a multi-source ELT pipeline ingesting 15GB+ of daily event data into Snowflake, cutting analyst query times from 40s to under 3s
  • Migrated a legacy cron-based ETL system to Airflow DAGs with full observability, reducing pipeline failures by 80% and eliminating on-call incidents
  • Built PipelineKit solo — an open-source CLI for scaffolding production-ready Airflow DAGs with built-in retry logic, alerting, and data quality checks
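The retry behavior these pipelines rely on can be sketched in a few lines. This is an illustrative helper, not code from the projects above; the `flaky_extract` function and its failure pattern are invented for the example.

```python
import time

def with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `task`, retrying on failure with exponential backoff.

    `sleep` is injectable so tests and dry runs can skip real waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...

# Example: a flaky extract step that succeeds on the third try.
calls = {"n": 0}

def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient source error")
    return "batch loaded"

result = with_retries(flaky_extract, max_attempts=5, sleep=lambda _: None)
```

Orchestrators like Airflow provide this via per-task `retries` and `retry_delay` settings; the sketch just shows the underlying idea.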

About

“I write pipelines like I write code — modular, tested, and documented. If an analyst can't trust the data, the pipeline doesn't matter. Observability and data quality checks are first-class citizens, not afterthoughts.”

Data Engineer with 4 years building and maintaining data infrastructure at scale using Python, Airflow, dbt, and Snowflake.

I started in analytics at a mid-size e-commerce company and kept getting pulled into the infrastructure side — fixing broken pipelines, rebuilding unreliable ingestion jobs, and wondering why the data was always wrong. Eventually I made it official. I've worked across retail, fintech, and SaaS, and I've learned that good data engineering is mostly about trust — making sure downstream teams can rely on what you ship.

Currently building PipelineKit and exploring real-time streaming with Flink and Kafka.

Contributions

Longest streak: 11 days

418 contributions in 2026

Top languages in 2026: Python, SQL, Shell

Skills

Languages: Python, SQL, Scala

Data Engineering: Apache Airflow, Apache Spark, Apache Kafka, dbt (data build tool)

Data Warehousing: Snowflake, BigQuery, Redshift, Delta Lake

Databases: PostgreSQL, MySQL, Redis, DynamoDB

Tools & DevOps: AWS (S3, Glue, Lambda, EMR), GCP (Dataflow, BigQuery, Pub/Sub), Docker, Terraform, GitHub Actions

Data Quality: Great Expectations, Monte Carlo

Data & ML: Pandas, NumPy, Jupyter

AI Dev Tools: GitHub Copilot, ChatGPT

Projects

PipelineKit — Airflow DAG Scaffolding CLI

Python

CLI tool for scaffolding production-ready Apache Airflow DAGs from YAML specs, with built-in retry logic, SLA alerting, and data quality checks via Great Expectations.

  • Built a Python CLI using Click that generates fully configured Airflow DAGs from a YAML spec — including task dependencies, retry policies, SLA alerting, and Slack notifications.
  • Integrated Great Expectations checkpoint generation so every scaffolded pipeline ships with schema validation and row-count anomaly detection out of the box.
  • Added a local dev mode using Docker Compose that spins up Airflow, PostgreSQL, and a mock data source in under 60 seconds for rapid iteration.
Python, Apache Airflow, Great Expectations, Click, Jinja2, Docker
Live Demo
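The spec-to-DAG idea can be sketched as template rendering. This is a hypothetical, simplified shape (a dict stands in for the parsed YAML, and string formatting stands in for Jinja2); the field names `dag_id`, `schedule`, `retries`, `retry_minutes`, and `tasks` are invented for the example.

```python
# Render an Airflow DAG module from a spec dict (in the real tool the
# spec would be parsed from YAML and rendered with Jinja2 templates).
DAG_TEMPLATE = """\
from datetime import timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id={dag_id!r},
    schedule={schedule!r},
    default_args={{"retries": {retries}, "retry_delay": timedelta(minutes={retry_minutes})}},
) as dag:
{tasks}
"""

TASK_TEMPLATE = "    {name} = BashOperator(task_id={name!r}, bash_command={command!r})"

def render_dag(spec: dict) -> str:
    tasks = "\n".join(TASK_TEMPLATE.format(**t) for t in spec["tasks"])
    return DAG_TEMPLATE.format(
        dag_id=spec["dag_id"],
        schedule=spec["schedule"],
        retries=spec.get("retries", 2),
        retry_minutes=spec.get("retry_minutes", 5),
        tasks=tasks,
    )

spec = {
    "dag_id": "daily_events",
    "schedule": "@daily",
    "retries": 3,
    "tasks": [{"name": "extract", "command": "python extract.py"}],
}
dag_source = render_dag(spec)
```

The generated module is plain Python, so it can be dropped into an Airflow `dags/` folder and version-controlled like hand-written code.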

StreamLedger — Real-Time Financial Event Tracker

Python

Reference architecture for real-time financial event processing using Kafka and Flink, with a live React dashboard. Built to explore sub-second streaming pipelines.

  • Built a Kafka producer simulating realistic financial transaction streams at 10k events/second, with configurable fraud patterns for testing downstream anomaly detection.
  • Implemented a Flink streaming job for real-time aggregations — rolling totals, per-merchant spend windows, and anomaly flagging — with sub-500ms end-to-end latency.
  • Wired a React dashboard with WebSocket updates to visualize live transaction volume, flagged events, and per-category breakdowns.
Python, Apache Kafka, Apache Flink, PostgreSQL, React, TypeScript
Live Demo
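The per-merchant spend windows have the shape of a tumbling-window aggregation. The real project runs this in Flink; here is a pure-Python sketch of the same logic, with invented event tuples of `(timestamp_ms, merchant, amount)`.

```python
from collections import defaultdict

def tumbling_window_totals(events, window_ms=1000):
    """Group (timestamp_ms, merchant, amount) events into fixed-size
    windows and sum spend per merchant per window -- the same shape of
    aggregate a Flink tumbling-window job computes."""
    totals = defaultdict(float)
    for ts, merchant, amount in events:
        window_start = (ts // window_ms) * window_ms  # floor to window
        totals[(window_start, merchant)] += amount
    return dict(totals)

events = [
    (10, "acme", 5.0),
    (900, "acme", 2.5),
    (1100, "acme", 1.0),   # falls into the next 1-second window
    (950, "globex", 9.0),
]
agg = tumbling_window_totals(events)
```

Flink adds what this sketch omits: event-time watermarks for late data, incremental state, and exactly-once delivery.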

dbt-audit-macros

SQL

A collection of dbt macros for automated audit logging, freshness checks, and row-count reconciliation across Snowflake and BigQuery models.

  • Wrote a suite of dbt macros for automated audit logging — tracking row counts, null rates, and schema changes on every model run without custom Python.
  • Added freshness check macros compatible with both Snowflake and BigQuery that surface stale source tables before they silently corrupt downstream models.
  • Packaged as a dbt Hub-compatible package with full documentation and example project so teams can install and configure in under 10 minutes.
dbt, SQL, Snowflake, BigQuery, Jinja2
Live Demo

2025

SnowSync — Incremental Data Loader

Python

Python library for CDC-aware incremental data loading into Snowflake — supports PostgreSQL, MySQL, S3, and REST API sources with automatic schema evolution.

  • Built a Python library that detects changed rows using configurable watermark columns or CDC log parsing, and loads only deltas into Snowflake — reducing warehouse credit usage significantly.
  • Supports multiple source connectors out of the box: PostgreSQL, MySQL, S3 CSV/Parquet, and REST APIs with pagination handling.
  • Added schema evolution handling — automatically ALTERs target Snowflake tables when source columns are added or types change, with configurable safety thresholds.
Python, Snowflake, AWS S3, pandas, SQLAlchemy, Docker
Live Demo
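The core of watermark-based incremental loading fits in a few lines. This is a simplified sketch under invented assumptions (rows as dicts, ISO-8601 timestamp strings as the watermark column), not the library's actual API.

```python
def select_deltas(rows, watermark, column="updated_at"):
    """Return only rows newer than the stored watermark, plus the new
    watermark value to persist for the next run."""
    deltas = [r for r in rows if r[column] > watermark]
    new_watermark = max((r[column] for r in deltas), default=watermark)
    return deltas, new_watermark

# Hypothetical source rows; ISO-8601 strings compare correctly as text.
source_rows = [
    {"id": 1, "updated_at": "2025-01-01T00:00:00"},
    {"id": 2, "updated_at": "2025-01-02T09:30:00"},
    {"id": 3, "updated_at": "2025-01-03T12:00:00"},
]
deltas, wm = select_deltas(source_rows, watermark="2025-01-01T23:59:59")
```

Persisting `wm` after each successful load is what makes the next run pick up exactly where this one left off; CDC log parsing replaces the watermark comparison when the source exposes a change log.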

2025

MetricLayer — Headless Metrics Store

Python

Headless metrics store built on dbt and Snowflake — define business metrics once in YAML and query them via REST API or a lightweight React explorer. Eliminates duplicated metric logic across BI tools.

  • Built a YAML-based metric definition layer on top of dbt — teams define metrics once with dimensions, filters, and time grains, and MetricLayer generates the underlying SQL automatically.
  • Implemented a FastAPI query layer that translates metric + dimension + date range requests into optimized Snowflake SQL, with Redis caching for frequently queried combinations.
  • Built a lightweight React explorer UI for browsing defined metrics, running ad-hoc queries, and exporting results to CSV — usable by non-technical stakeholders without SQL knowledge.
Python, dbt, Snowflake, FastAPI, Redis, React
Live Demo
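The define-once, compile-to-SQL idea can be sketched as a tiny metric compiler. This is a hypothetical shape (the metric fields `name`, `agg`, `column`, `table`, and `time_column` are invented for the example); real definitions would live in YAML and handle filters and time grains too.

```python
def compile_metric_sql(metric: dict, dimensions=(), date_range=None) -> str:
    """Translate a metric definition plus requested dimensions and date
    range into a Snowflake-style SELECT statement."""
    dims = list(dimensions)
    select = dims + [f"{metric['agg'].upper()}({metric['column']}) AS {metric['name']}"]
    sql = f"SELECT {', '.join(select)} FROM {metric['table']}"
    if date_range:
        start, end = date_range
        sql += f" WHERE {metric['time_column']} BETWEEN '{start}' AND '{end}'"
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql

revenue = {"name": "revenue", "agg": "sum", "column": "amount",
           "table": "fct_orders", "time_column": "order_date"}
sql = compile_metric_sql(revenue, dimensions=["region"],
                         date_range=("2025-01-01", "2025-01-31"))
```

Because every BI tool queries the same compiled SQL through the API, the aggregation logic lives in exactly one place instead of being re-implemented per dashboard.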

2026

Experience

Senior Data Engineer

Meridian Analytics — Seattle, WA

Jan 2023 — Present · 3y 3m · Full-Time

Senior data engineer on a 6-person data platform team at a Series B fintech company. Own core ingestion pipelines, the Snowflake data warehouse, and data quality infrastructure used by 30+ analysts and 3 ML engineers.

  • Redesigned the company's core ELT architecture from ad-hoc Python scripts to a fully orchestrated Airflow + dbt stack, reducing data freshness SLA breaches by 90%.
  • Built a data contract framework with Great Expectations that runs on every pipeline run — catching schema drift and volume anomalies before they reach downstream dashboards.
  • Collaborated with the ML team to build a feature store on top of Snowflake and Redis, cutting feature computation time for batch model training from 6 hours to 45 minutes.
  • Led a cross-functional initiative with analytics and product to define and document 80+ core business metrics in dbt, creating a single source of truth for company-wide reporting.
Python, Apache Airflow, dbt, Snowflake, Kafka, Great Expectations, AWS S3, Terraform
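A data contract check for schema drift reduces to comparing an agreed schema against what a pipeline actually produced. This is a minimal pure-Python sketch of that idea, not the Great Expectations framework itself; the column names and types are invented.

```python
def check_contract(expected_schema: dict, observed_schema: dict):
    """Flag schema drift between a data contract and an observed table:
    missing columns, type changes, and unexpected extra columns."""
    issues = []
    for col, typ in expected_schema.items():
        if col not in observed_schema:
            issues.append(f"missing column: {col}")
        elif observed_schema[col] != typ:
            issues.append(f"type change: {col} {typ} -> {observed_schema[col]}")
    for col in observed_schema:
        if col not in expected_schema:
            issues.append(f"unexpected column: {col}")
    return issues

# Hypothetical contract vs. what today's load actually delivered.
contract = {"user_id": "NUMBER", "amount": "FLOAT", "ts": "TIMESTAMP"}
observed = {"user_id": "VARCHAR", "amount": "FLOAT"}
issues = check_contract(contract, observed)
```

Running a check like this at the end of every pipeline run is what lets drift fail the pipeline loudly instead of surfacing later as a broken dashboard.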

Data Engineer

Novu Commerce — Portland, OR

Aug 2020 — Dec 2022 · 2y 5m · Full-Time

Data engineer on a two-person data team at a mid-size e-commerce company. Built and maintained pipelines for marketing, finance, and operations analytics.

  • Migrated 14 legacy cron-based ETL jobs to Apache Airflow DAGs with proper retry logic, SLA monitoring, and Slack alerting — reducing weekly pipeline failures from 8-10 incidents to near-zero.
  • Implemented a Snowflake data warehouse from scratch, consolidating data from Shopify, Stripe, Google Ads, and an internal PostgreSQL database into a unified analytics layer.
  • Built a near-real-time inventory sync pipeline using AWS Lambda and DynamoDB Streams that reduced inventory discrepancy reports by 70%.
Python, Apache Airflow, Snowflake, dbt, AWS Lambda, DynamoDB, PostgreSQL
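A DynamoDB Streams consumer like the inventory sync above receives records carrying old and new item images, with numbers encoded as strings under an `"N"` type key. This sketch computes the stock change from one such record; the `qty` attribute name is illustrative, not from the actual system.

```python
def inventory_delta(record: dict) -> int:
    """Compute the stock change carried by one DynamoDB Streams record.
    DynamoDB encodes numeric attributes as {"N": "<string>"}."""
    images = record["dynamodb"]
    old = int(images.get("OldImage", {}).get("qty", {}).get("N", "0"))
    new = int(images.get("NewImage", {}).get("qty", {}).get("N", "0"))
    return new - old

# A MODIFY event: stock dropped from 8 units to 5.
record = {
    "eventName": "MODIFY",
    "dynamodb": {
        "OldImage": {"qty": {"N": "8"}},
        "NewImage": {"qty": {"N": "5"}},
    },
}
delta = inventory_delta(record)
```

In the Lambda handler this delta would be applied to the analytics copy of the inventory table, keeping it within seconds of the operational store.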

Education

  • University of Washington — Seattle · B.S. Information Systems · Sep 2016 – Jun 2020
  • Udacity · Data Engineering Nanodegree · Oct 2020 – Mar 2021
  • Coursera — Google Cloud · Online Specialization: Data Engineering with Google Cloud · Apr 2022 – Aug 2022

Certifications

  • AWS Certified Data Engineer – Associate · Amazon Web Services · Nov 2022 — Nov 2025 (Expired)
  • dbt Certified Developer · dbt Labs · Jun 2023
  • Snowflake SnowPro Core Certification · Snowflake · Feb 2023 — Feb 2025 (Expired)
  • Google Professional Data Engineer · Google Cloud · Mar 2024 — Mar 2026 (Expired)
  • Databricks Certified Associate Developer for Apache Spark · Databricks · Sep 2024

Contact


Making Data Teams Actually Trust Their Data

I'm currently open to senior data engineering roles at data-driven companies. If you're looking for someone who can build reliable pipelines, improve data quality, and work closely with analytics and ML teams — feel free to reach out.

Email

jordan.calloway@example.com

Location

Seattle, WA

Elsewhere

LinkedIn · GitHub · Twitter · Website
Made with SerisLab