Portfolio
  • About
  • Contributions
  • Skills
  • Projects
  • Experience
  • Education
  • Certifications
  • Contact
Available for work

Jordan Calloway

—

Data Engineer

Seattle, WA · 4 yrs exp

Data Engineer with 4 years building reliable data pipelines, warehouses, and analytics infrastructure at scale. I bridge the gap between raw data and business decisions — working closely with analytics, ML, and product teams.

  • Designed and maintained a multi-source ELT pipeline ingesting 15GB+ of daily event data into Snowflake, cutting analyst query times from 40s to under 3s
  • Migrated a legacy cron-based ETL system to Airflow DAGs with full observability, reducing pipeline failures by 80% and eliminating on-call incidents
  • Built PipelineKit solo — an open-source CLI for scaffolding production-ready Airflow DAGs with built-in retry logic, alerting, and data quality checks
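The retry behavior these pipelines rely on can be sketched in a few lines. This is an illustrative helper, not code from the projects above; the `flaky_extract` function and its failure pattern are invented for the example.

```python
import time

def with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `task`, retrying on failure with exponential backoff.

    `sleep` is injectable so tests and dry runs can skip real waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...

# Example: a flaky extract step that succeeds on the third try.
calls = {"n": 0}

def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient source error")
    return "batch loaded"

result = with_retries(flaky_extract, max_attempts=5, sleep=lambda _: None)
```

Orchestrators like Airflow provide this via per-task `retries` and `retry_delay` settings; the sketch just shows the underlying idea.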

About

“I write pipelines like I write code — modular, tested, and documented. If an analyst can't trust the data, the pipeline doesn't matter. Observability and data quality checks are first-class citizens, not afterthoughts.”

Data Engineer with 4 years building and maintaining data infrastructure at scale using Python, Airflow, dbt, and Snowflake.

I started in analytics at a mid-size e-commerce company and kept getting pulled into the infrastructure side — fixing broken pipelines, rebuilding unreliable ingestion jobs, and wondering why the data was always wrong. Eventually I made it official. I've worked across retail, fintech, and SaaS, and I've learned that good data engineering is mostly about trust — making sure downstream teams can rely on what you ship.

Currently building PipelineKit and exploring real-time streaming with Flink and Kafka.

Contributions

Longest streak: 11 days

418 contributions in 2026

Top languages in 2026: Python, SQL, Shell

Skills

Languages: Python, SQL, Scala

Data Engineering: Apache Airflow, Apache Spark, Apache Kafka, dbt (data build tool)

Data Warehousing: Snowflake, BigQuery, Redshift, Delta Lake

Databases: PostgreSQL, MySQL, Redis, DynamoDB

Tools & DevOps: AWS (S3, Glue, Lambda, EMR), GCP (Dataflow, BigQuery, Pub/Sub), Docker, Terraform, GitHub Actions

Data Quality: Great Expectations, Monte Carlo

Data & ML: Pandas, NumPy, Jupyter

AI Dev Tools: GitHub Copilot, ChatGPT

Projects

PipelineKit — Airflow DAG Scaffolding CLI

Python

CLI tool for scaffolding production-ready Apache Airflow DAGs from YAML specs, with built-in retry logic, SLA alerting, and data quality checks via Great Expectations.

  • Built a Python CLI using Click that generates fully configured Airflow DAGs from a YAML spec — including task dependencies, retry policies, SLA alerting, and Slack notifications.
  • Integrated Great Expectations checkpoint generation so every scaffolded pipeline ships with schema validation and row-count anomaly detection out of the box.
  • Added a local dev mode using Docker Compose that spins up Airflow, PostgreSQL, and a mock data source in under 60 seconds for rapid iteration.
Python, Apache Airflow, Great Expectations, Click, Jinja2, Docker
Live Demo
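The spec-to-DAG idea can be sketched as template rendering. This is a hypothetical, simplified shape (a dict stands in for the parsed YAML, and string formatting stands in for Jinja2); the field names `dag_id`, `schedule`, `retries`, `retry_minutes`, and `tasks` are invented for the example.

```python
# Render an Airflow DAG module from a spec dict (in the real tool the
# spec would be parsed from YAML and rendered with Jinja2 templates).
DAG_TEMPLATE = """\
from datetime import timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id={dag_id!r},
    schedule={schedule!r},
    default_args={{"retries": {retries}, "retry_delay": timedelta(minutes={retry_minutes})}},
) as dag:
{tasks}
"""

TASK_TEMPLATE = "    {name} = BashOperator(task_id={name!r}, bash_command={command!r})"

def render_dag(spec: dict) -> str:
    tasks = "\n".join(TASK_TEMPLATE.format(**t) for t in spec["tasks"])
    return DAG_TEMPLATE.format(
        dag_id=spec["dag_id"],
        schedule=spec["schedule"],
        retries=spec.get("retries", 2),
        retry_minutes=spec.get("retry_minutes", 5),
        tasks=tasks,
    )

spec = {
    "dag_id": "daily_events",
    "schedule": "@daily",
    "retries": 3,
    "tasks": [{"name": "extract", "command": "python extract.py"}],
}
dag_source = render_dag(spec)
```

The generated module is plain Python, so it can be dropped into an Airflow `dags/` folder and version-controlled like hand-written code.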

StreamLedger — Real-Time Financial Event Tracker

Python

Reference architecture for real-time financial event processing using Kafka and Flink, with a live React dashboard. Built to explore sub-second streaming pipelines.

  • Built a Kafka producer simulating realistic financial transaction streams at 10k events/second, with configurable fraud patterns for testing downstream anomaly detection.
  • Implemented a Flink streaming job for real-time aggregations — rolling totals, per-merchant spend windows, and anomaly flagging — with sub-500ms end-to-end latency.
  • Wired a React dashboard with WebSocket updates to visualize live transaction volume, flagged events, and per-category breakdowns.
Python, Apache Kafka, Apache Flink, PostgreSQL, React, TypeScript
Live Demo
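The per-merchant spend windows have the shape of a tumbling-window aggregation. The real project runs this in Flink; here is a pure-Python sketch of the same logic, with invented event tuples of `(timestamp_ms, merchant, amount)`.

```python
from collections import defaultdict

def tumbling_window_totals(events, window_ms=1000):
    """Group (timestamp_ms, merchant, amount) events into fixed-size
    windows and sum spend per merchant per window -- the same shape of
    aggregate a Flink tumbling-window job computes."""
    totals = defaultdict(float)
    for ts, merchant, amount in events:
        window_start = (ts // window_ms) * window_ms  # floor to window
        totals[(window_start, merchant)] += amount
    return dict(totals)

events = [
    (10, "acme", 5.0),
    (900, "acme", 2.5),
    (1100, "acme", 1.0),   # falls into the next 1-second window
    (950, "globex", 9.0),
]
agg = tumbling_window_totals(events)
```

Flink adds what this sketch omits: event-time watermarks for late data, incremental state, and exactly-once delivery.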

dbt-audit-macros

SQL

A collection of dbt macros for automated audit logging, freshness checks, and row-count reconciliation across Snowflake and BigQuery models.

  • Wrote a suite of dbt macros for automated audit logging — tracking row counts, null rates, and schema changes on every model run without custom Python.
  • Added freshness check macros compatible with both Snowflake and BigQuery that surface stale source tables before they silently corrupt downstream models.
  • Packaged as a dbt Hub-compatible package with full documentation and example project so teams can install and configure in under 10 minutes.
dbt, SQL, Snowflake, BigQuery, Jinja2
Live Demo

2025

SnowSync — Incremental Data Loader

Python

Python library for CDC-aware incremental data loading into Snowflake — supports PostgreSQL, MySQL, S3, and REST API sources with automatic schema evolution.

  • Built a Python library that detects changed rows using configurable watermark columns or CDC log parsing, and loads only deltas into Snowflake — reducing warehouse credit usage significantly.
  • Supports multiple source connectors out of the box: PostgreSQL, MySQL, S3 CSV/Parquet, and REST APIs with pagination handling.
  • Added schema evolution handling — automatically ALTERs target Snowflake tables when source columns are added or types change, with configurable safety thresholds.
Python, Snowflake, AWS S3, pandas, SQLAlchemy, Docker
Live Demo
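The core of watermark-based incremental loading fits in a few lines. This is a simplified sketch under invented assumptions (rows as dicts, ISO-8601 timestamp strings as the watermark column), not the library's actual API.

```python
def select_deltas(rows, watermark, column="updated_at"):
    """Return only rows newer than the stored watermark, plus the new
    watermark value to persist for the next run."""
    deltas = [r for r in rows if r[column] > watermark]
    new_watermark = max((r[column] for r in deltas), default=watermark)
    return deltas, new_watermark

# Hypothetical source rows; ISO-8601 strings compare correctly as text.
source_rows = [
    {"id": 1, "updated_at": "2025-01-01T00:00:00"},
    {"id": 2, "updated_at": "2025-01-02T09:30:00"},
    {"id": 3, "updated_at": "2025-01-03T12:00:00"},
]
deltas, wm = select_deltas(source_rows, watermark="2025-01-01T23:59:59")
```

Persisting `wm` after each successful load is what makes the next run pick up exactly where this one left off; CDC log parsing replaces the watermark comparison when the source exposes a change log.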

2025

MetricLayer — Headless Metrics Store

Python

Headless metrics store built on dbt and Snowflake — define business metrics once in YAML and query them via REST API or a lightweight React explorer. Eliminates duplicated metric logic across BI tools.

  • Built a YAML-based metric definition layer on top of dbt — teams define metrics once with dimensions, filters, and time grains, and MetricLayer generates the underlying SQL automatically.
  • Implemented a FastAPI query layer that translates metric + dimension + date range requests into optimized Snowflake SQL, with Redis caching for frequently queried combinations.
  • Built a lightweight React explorer UI for browsing defined metrics, running ad-hoc queries, and exporting results to CSV — usable by non-technical stakeholders without SQL knowledge.
Python, dbt, Snowflake, FastAPI, Redis, React
Live Demo
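The define-once, compile-to-SQL idea can be sketched as a tiny metric compiler. This is a hypothetical shape (the metric fields `name`, `agg`, `column`, `table`, and `time_column` are invented for the example); real definitions would live in YAML and handle filters and time grains too.

```python
def compile_metric_sql(metric: dict, dimensions=(), date_range=None) -> str:
    """Translate a metric definition plus requested dimensions and date
    range into a Snowflake-style SELECT statement."""
    dims = list(dimensions)
    select = dims + [f"{metric['agg'].upper()}({metric['column']}) AS {metric['name']}"]
    sql = f"SELECT {', '.join(select)} FROM {metric['table']}"
    if date_range:
        start, end = date_range
        sql += f" WHERE {metric['time_column']} BETWEEN '{start}' AND '{end}'"
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql

revenue = {"name": "revenue", "agg": "sum", "column": "amount",
           "table": "fct_orders", "time_column": "order_date"}
sql = compile_metric_sql(revenue, dimensions=["region"],
                         date_range=("2025-01-01", "2025-01-31"))
```

Because every BI tool queries the same compiled SQL through the API, the aggregation logic lives in exactly one place instead of being re-implemented per dashboard.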

2026

Experience

Senior Data Engineer

Meridian Analytics — Seattle, WA

Jan 2023 — Present · 3y 3m · Full-Time

Senior data engineer on a 6-person data platform team at a Series B fintech company. Own core ingestion pipelines, the Snowflake data warehouse, and data quality infrastructure used by 30+ analysts and 3 ML engineers.

  • Redesigned the company's core ELT architecture from ad-hoc Python scripts to a fully orchestrated Airflow + dbt stack, reducing data freshness SLA breaches by 90%.
  • Built a data contract framework with Great Expectations that runs on every pipeline run — catching schema drift and volume anomalies before they reach downstream dashboards.
  • Collaborated with the ML team to build a feature store on top of Snowflake and Redis, cutting feature computation time for batch model training from 6 hours to 45 minutes.
  • Led a cross-functional initiative with analytics and product to define and document 80+ core business metrics in dbt, creating a single source of truth for company-wide reporting.
Python, Apache Airflow, dbt, Snowflake, Kafka, Great Expectations, AWS S3, Terraform
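A data contract check for schema drift reduces to comparing an agreed schema against what a pipeline actually produced. This is a minimal pure-Python sketch of that idea, not the Great Expectations framework itself; the column names and types are invented.

```python
def check_contract(expected_schema: dict, observed_schema: dict):
    """Flag schema drift between a data contract and an observed table:
    missing columns, type changes, and unexpected extra columns."""
    issues = []
    for col, typ in expected_schema.items():
        if col not in observed_schema:
            issues.append(f"missing column: {col}")
        elif observed_schema[col] != typ:
            issues.append(f"type change: {col} {typ} -> {observed_schema[col]}")
    for col in observed_schema:
        if col not in expected_schema:
            issues.append(f"unexpected column: {col}")
    return issues

# Hypothetical contract vs. what today's load actually delivered.
contract = {"user_id": "NUMBER", "amount": "FLOAT", "ts": "TIMESTAMP"}
observed = {"user_id": "VARCHAR", "amount": "FLOAT"}
issues = check_contract(contract, observed)
```

Running a check like this at the end of every pipeline run is what lets drift fail the pipeline loudly instead of surfacing later as a broken dashboard.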

Data Engineer

Novu Commerce — Portland, OR

Aug 2020 — Dec 2022 · 2y 5m · Full-Time

Data engineer on a two-person data team at a mid-size e-commerce company. Built and maintained pipelines for marketing, finance, and operations analytics.

  • Migrated 14 legacy cron-based ETL jobs to Apache Airflow DAGs with proper retry logic, SLA monitoring, and Slack alerting — reducing weekly pipeline failures from 8-10 incidents to near-zero.
  • Implemented a Snowflake data warehouse from scratch, consolidating data from Shopify, Stripe, Google Ads, and an internal PostgreSQL database into a unified analytics layer.
  • Built a near-real-time inventory sync pipeline using AWS Lambda and DynamoDB Streams that reduced inventory discrepancy reports by 70%.
Python, Apache Airflow, Snowflake, dbt, AWS Lambda, DynamoDB, PostgreSQL
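A DynamoDB Streams consumer like the inventory sync above receives records carrying old and new item images, with numbers encoded as strings under an `"N"` type key. This sketch computes the stock change from one such record; the `qty` attribute name is illustrative, not from the actual system.

```python
def inventory_delta(record: dict) -> int:
    """Compute the stock change carried by one DynamoDB Streams record.
    DynamoDB encodes numeric attributes as {"N": "<string>"}."""
    images = record["dynamodb"]
    old = int(images.get("OldImage", {}).get("qty", {}).get("N", "0"))
    new = int(images.get("NewImage", {}).get("qty", {}).get("N", "0"))
    return new - old

# A MODIFY event: stock dropped from 8 units to 5.
record = {
    "eventName": "MODIFY",
    "dynamodb": {
        "OldImage": {"qty": {"N": "8"}},
        "NewImage": {"qty": {"N": "5"}},
    },
}
delta = inventory_delta(record)
```

In the Lambda handler this delta would be applied to the analytics copy of the inventory table, keeping it within seconds of the operational store.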

Education

  • University of Washington — Seattle · B.S. Information Systems · Sep 2016 – Jun 2020
  • Udacity · Data Engineering Nanodegree · Oct 2020 – Mar 2021
  • Coursera — Google Cloud · Online Specialization: Data Engineering with Google Cloud · Apr 2022 – Aug 2022

Certifications

  • AWS Certified Data Engineer – Associate · Amazon Web Services · Nov 2022 — Nov 2025 (Expired)
  • dbt Certified Developer · dbt Labs · Jun 2023
  • Snowflake SnowPro Core Certification · Snowflake · Feb 2023 — Feb 2025 (Expired)
  • Google Professional Data Engineer · Google Cloud · Mar 2024 — Mar 2026 (Expired)
  • Databricks Certified Associate Developer for Apache Spark · Databricks · Sep 2024

Contact


Making Data Teams Actually Trust Their Data

I'm currently open to senior data engineering roles at data-driven companies. If you're looking for someone who can build reliable pipelines, improve data quality, and work closely with analytics and ML teams — feel free to reach out.

Email

jordan.calloway@example.com

Location

Seattle, WA

Elsewhere

LinkedIn · GitHub · Twitter · Website
Made with SerisLab