Portfolio
  • 01.About
  • 02.Skills
  • 03.Projects
  • 04.Experience
  • 05.Education
  • 06.Contact
Available for work

Lyrics is Life

Seattle, WA · 4 yrs exp

Data Engineer with 4 years building reliable data pipelines, warehouses, and analytics infrastructure at scale. I bridge the gap between raw data and business decisions — working closely with analytics, ML, and product teams.

  • Designed and maintained a multi-source ELT pipeline ingesting 15GB+ of daily event data into Snowflake, cutting analyst query times from 40s to under 3s
  • Migrated a legacy cron-based ETL system to Airflow DAGs with full observability, reducing pipeline failures by 80% and eliminating on-call incidents
  • Built PipelineKit solo — an open-source CLI for scaffolding production-ready Airflow DAGs with built-in retry logic, alerting, and data quality checks

01. About

01 Bio

Data Engineer with 4 years building and maintaining data infrastructure at scale using Python, Airflow, dbt, and Snowflake.

02 Story

I started in analytics at a mid-size e-commerce company and kept getting pulled into the infrastructure side — fixing broken pipelines, rebuilding unreliable ingestion jobs, and wondering why the data was always wrong. Eventually I made it official. I've worked across retail, fintech, and SaaS, and I've learned that good data engineering is mostly about trust — making sure downstream teams can rely on what you ship.

03 Currently

Currently building PipelineKit and exploring real-time streaming with Flink and Kafka.

04 Approach

I write pipelines like I write code — modular, tested, and documented. If an analyst can't trust the data, the pipeline doesn't matter. Observability and data quality checks are first-class citizens, not afterthoughts.

02. Skills

Languages

  • Python
  • SQL
  • Scala

Data Engineering

  • Apache Airflow
  • Apache Spark
  • Apache Kafka
  • dbt (data build tool)

Data Warehousing

  • Snowflake
  • BigQuery
  • Delta Lake

Databases

  • PostgreSQL
  • Redis

Tools & DevOps

  • AWS (S3, Glue, Lambda, EMR)
  • Docker
  • Terraform
  • GitHub Actions

Data Quality

  • Great Expectations

Data & ML

  • Pandas
  • NumPy

03. Selected Work

01 Featured Project

PipelineKit — Airflow DAG Scaffolding CLI

CLI tool for scaffolding production-ready Apache Airflow DAGs from YAML specs, with built-in retry logic, SLA alerting, and data quality checks via Great Expectations.

  • Built a Python CLI using Click that generates fully configured Airflow DAGs from a YAML spec — including task dependencies, retry policies, SLA alerting, and Slack notifications (a sketch of such a generated DAG follows this card).
  • Integrated Great Expectations checkpoint generation so every scaffolded pipeline ships with schema validation and row-count anomaly detection out of the box.
  • Added a local dev mode using Docker Compose that spins up Airflow, PostgreSQL, and a mock data source in under 60 seconds for rapid iteration.
Python · Apache Airflow · Great Expectations · Click · Docker
Live Demo
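
For a sense of the scaffolded output, here is a minimal sketch of the kind of DAG PipelineKit aims to generate, written against the standard Airflow 2 API. The DAG name, schedule, owner, and Slack callback body are illustrative assumptions, not copied from the tool's actual templates.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_slack(context):
    # Placeholder failure callback; the real Slack alerting is wired up by the CLI.
    pass


default_args = {
    "owner": "data-platform",              # hypothetical owner
    "retries": 3,                          # built-in retry logic
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=1),             # SLA alerting threshold
    "on_failure_callback": notify_slack,   # Slack notification hook
}


def extract():
    pass  # stand-in for a generated extract task


def load():
    pass  # stand-in for a generated load task


with DAG(
    dag_id="orders_elt",                   # illustrative DAG name
    default_args=default_args,
    schedule_interval="@hourly",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task

In practice the YAML spec would declare the tasks and their dependencies, and the CLI would emit a file along these lines plus a Great Expectations checkpoint for the pipeline's data quality checks.
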
02 Featured Project

StreamLedger — Real-Time Financial Event Tracker

Reference architecture for real-time financial event processing using Kafka and Flink, with a live React dashboard.

  • Built a Kafka producer simulating realistic financial transaction streams at 10k events/second, with configurable fraud patterns for testing downstream anomaly detection (a producer sketch follows this card).
  • Implemented a Flink streaming job for real-time aggregations — rolling totals, per-merchant spend windows, and anomaly flagging — with sub-500ms end-to-end latency.
  • Wired a React dashboard with WebSocket updates to visualize live transaction volume, flagged events, and per-category breakdowns.
Python · Apache Kafka · Apache Flink · PostgreSQL · React
Live Demo
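
To make the ingest side concrete, below is a hedged sketch of a transaction-stream producer in the spirit of the one described above, using kafka-python. The topic name, event fields, and fraud-flag probability are assumptions for illustration; the Flink job and dashboard are not shown.

import json
import random
import time
import uuid

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                  # assumed local broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

MERCHANTS = ["acme", "globex", "initech"]                # illustrative merchants


def make_event():
    # One simulated financial transaction; field names are assumptions, not the repo's schema.
    return {
        "txn_id": str(uuid.uuid4()),
        "merchant": random.choice(MERCHANTS),
        "amount": round(random.uniform(1.0, 500.0), 2),
        "ts": time.time(),
        "suspected_fraud": random.random() < 0.01,       # crude stand-in for configurable fraud patterns
    }


if __name__ == "__main__":
    while True:
        producer.send("transactions", make_event())      # hypothetical topic name
        time.sleep(0.0001)                               # roughly 10k events/second
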
03 Featured Project

dbt-audit-macros

A collection of dbt macros for automated audit logging, freshness checks, and row-count reconciliation across Snowflake and BigQuery models.

  • Wrote a suite of dbt macros for automated audit logging — tracking row counts, null rates, and schema changes on every model run without custom Python.
  • Added freshness check macros compatible with both Snowflake and BigQuery that surface stale source tables before they silently corrupt downstream models.
  • Packaged as a dbt Hub-compatible package with full documentation and example project so teams can install and configure in under 10 minutes.
dbt · SQL · Snowflake · BigQuery · Jinja2
Live Demo

04. Work Experience

01

Meridian Analytics — Seattle, WA

2023 – Now

3yr 3mo · Current

Senior Data Engineer

Jan 2023 — Present

Full-Time · Seattle, WA

Senior data engineer on a 6-person data platform team at a Series B fintech company. Own core ingestion pipelines, the Snowflake data warehouse, and data quality infrastructure used by 30+ analysts and 3 ML engineers.

  • Redesigned the company's core ELT architecture from ad-hoc Python scripts to a fully orchestrated Airflow + dbt stack, reducing data freshness SLA breaches by 90%.
  • Built a data contract framework with Great Expectations that runs on every pipeline run — catching schema drift and volume anomalies before they reach downstream dashboards (a rough illustration follows below).
  • Led a cross-functional initiative with analytics and product to define and document 80+ core business metrics in dbt, creating a single source of truth for company-wide reporting.
Python · Apache Airflow · dbt · Snowflake · Kafka · Great Expectations · Terraform
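
The data contract framework itself is internal, but a minimal sketch of the kind of check it runs might look like the following, using the classic Great Expectations Pandas API (newer releases use a Validator/Checkpoint interface instead). The column names and thresholds are illustrative assumptions.

import great_expectations as ge
import pandas as pd

# Toy batch standing in for one pipeline run's output.
df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 25.5, None]})
batch = ge.from_pandas(df)

# Schema / key-integrity check: the primary key must never be null.
batch.expect_column_values_to_not_be_null("order_id")

# Volume-anomaly guard: row count must land in an expected band.
batch.expect_table_row_count_to_be_between(min_value=1, max_value=1_000_000)

results = batch.validate()
if not results.success:
    raise ValueError("Data contract violated; failing the run before dashboards refresh")
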
02

Novu Commerce — Portland, OR

2020 – 2022

2yr 5mo

Data Engineer

Aug 2020 — Dec 2022

Full-Time · Portland, OR

Data engineer on a two-person data team at a mid-size e-commerce company. Built and maintained pipelines for marketing, finance, and operations analytics.

  • Migrated 14 legacy cron-based ETL jobs to Apache Airflow DAGs with proper retry logic, SLA monitoring, and Slack alerting — reducing weekly pipeline failures from 8-10 incidents to near-zero.
  • Implemented a Snowflake data warehouse from scratch, consolidating data from Shopify, Stripe, Google Ads, and an internal PostgreSQL database into a unified analytics layer.
  • Built a near-real-time inventory sync pipeline using AWS Lambda and DynamoDB Streams that reduced inventory discrepancy reports by 70% (a sketch of the handler follows below).
Python · Apache Airflow · Snowflake · dbt · AWS Lambda · DynamoDB · PostgreSQL
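
As a rough illustration of the inventory sync, here is a hedged sketch of a DynamoDB Streams-triggered Lambda handler. The attribute names and the downstream sync call are assumptions, since the actual pipeline is not public; only the stream event shape is standard.

def sync_inventory(sku, quantity):
    # Stand-in for the write that kept the analytics copy of inventory fresh.
    print(f"syncing {sku} -> {quantity}")


def handler(event, context):
    # Lambda entry point invoked by the DynamoDB Stream.
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        new_image = record["dynamodb"]["NewImage"]   # stream images are typed attribute maps
        sku = new_image["sku"]["S"]                  # assumed attribute names
        quantity = int(new_image["quantity"]["N"])
        sync_inventory(sku, quantity)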

05. Education

B.S. Information Systems

University of Washington — Seattle

2016 — 2020
3yr 10mo

Data Engineering Nanodegree

Udacity

2020 — 2021
5mo

06. Contact

Get in touch
Available for work

Making Data Teams Actually Trust Their Data

I'm currently open to senior data engineering roles at data-driven companies. If you're looking for someone who can build reliable pipelines, improve data quality, and work closely with analytics and ML teams — feel free to reach out.

rbajra19@gmail.com
Seattle, WA
Made with SerisLab