Portfolio
  • 01.About
  • 02.Skills
  • 03.Projects
  • 04.Experience
  • 05.Education
  • 06.Contact
Available for work

Lyrics is Life

Seattle, WA · 4 yrs exp

Data Engineer with 4 years building reliable data pipelines, warehouses, and analytics infrastructure at scale. I bridge the gap between raw data and business decisions — working closely with analytics, ML, and product teams.

  • Designed and maintained a multi-source ELT pipeline ingesting 15GB+ of daily event data into Snowflake, cutting analyst query times from 40s to under 3s
  • Migrated a legacy cron-based ETL system to Airflow DAGs with full observability, reducing pipeline failures by 80% and eliminating on-call incidents
  • Built PipelineKit solo — an open-source CLI for scaffolding production-ready Airflow DAGs with built-in retry logic, alerting, and data quality checks

01. About

01 Bio

Data Engineer with 4 years building and maintaining data infrastructure at scale using Python, Airflow, dbt, and Snowflake.

02 Story

I started in analytics at a mid-size e-commerce company and kept getting pulled into the infrastructure side — fixing broken pipelines, rebuilding unreliable ingestion jobs, and wondering why the data was always wrong. Eventually I made it official. I've worked across retail, fintech, and SaaS, and I've learned that good data engineering is mostly about trust — making sure downstream teams can rely on what you ship.

03 Currently

Currently building PipelineKit and exploring real-time streaming with Flink and Kafka.

04 Approach

I write pipelines like I write code — modular, tested, and documented. If an analyst can't trust the data, the pipeline doesn't matter. Observability and data quality checks are first-class citizens, not afterthoughts.

02. Skills

Languages

  • Python
  • SQL
  • Scala

Data Engineering

  • Apache Airflow
  • Apache Spark
  • Apache Kafka
  • dbt (data build tool)

Data Warehousing

  • Snowflake
  • BigQuery
  • Delta Lake

Databases

  • PostgreSQL
  • Redis

Tools & DevOps

  • AWS (S3, Glue, Lambda, EMR)
  • Docker
  • Terraform
  • GitHub Actions

Data Quality

  • Great Expectations

Data & ML

  • Pandas
  • NumPy

03. Selected Work

01 Featured Project

PipelineKit — Airflow DAG Scaffolding CLI

CLI tool for scaffolding production-ready Apache Airflow DAGs from YAML specs, with built-in retry logic, SLA alerting, and data quality checks via Great Expectations.

  • Built a Python CLI using Click that generates fully configured Airflow DAGs from a YAML spec — including task dependencies, retry policies, SLA alerting, and Slack notifications (a sketch of such a generated DAG follows this card).
  • Integrated Great Expectations checkpoint generation so every scaffolded pipeline ships with schema validation and row-count anomaly detection out of the box.
  • Added a local dev mode using Docker Compose that spins up Airflow, PostgreSQL, and a mock data source in under 60 seconds for rapid iteration.
Python · Apache Airflow · Great Expectations · Click · Docker
Live Demo
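
For a sense of the scaffolded output, here is a minimal sketch of the kind of DAG PipelineKit aims to generate, written against the standard Airflow 2 API. The DAG name, schedule, owner, and Slack callback body are illustrative assumptions, not copied from the tool's actual templates.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_slack(context):
    # Placeholder failure callback; the real Slack alerting is wired up by the CLI.
    pass


default_args = {
    "owner": "data-platform",              # hypothetical owner
    "retries": 3,                          # built-in retry logic
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=1),             # SLA alerting threshold
    "on_failure_callback": notify_slack,   # Slack notification hook
}


def extract():
    pass  # stand-in for a generated extract task


def load():
    pass  # stand-in for a generated load task


with DAG(
    dag_id="orders_elt",                   # illustrative DAG name
    default_args=default_args,
    schedule_interval="@hourly",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task

In practice the YAML spec would declare the tasks and their dependencies, and the CLI would emit a file along these lines plus a Great Expectations checkpoint for the pipeline's data quality checks.
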
02 Featured Project

StreamLedger — Real-Time Financial Event Tracker

Reference architecture for real-time financial event processing using Kafka and Flink, with a live React dashboard.

  • Built a Kafka producer simulating realistic financial transaction streams at 10k events/second, with configurable fraud patterns for testing downstream anomaly detection (a producer sketch follows this card).
  • Implemented a Flink streaming job for real-time aggregations — rolling totals, per-merchant spend windows, and anomaly flagging — with sub-500ms end-to-end latency.
  • Wired a React dashboard with WebSocket updates to visualize live transaction volume, flagged events, and per-category breakdowns.
Python · Apache Kafka · Apache Flink · PostgreSQL · React
Live Demo
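
To make the ingest side concrete, below is a hedged sketch of a transaction-stream producer in the spirit of the one described above, using kafka-python. The topic name, event fields, and fraud-flag probability are assumptions for illustration; the Flink job and dashboard are not shown.

import json
import random
import time
import uuid

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                  # assumed local broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

MERCHANTS = ["acme", "globex", "initech"]                # illustrative merchants


def make_event():
    # One simulated financial transaction; field names are assumptions, not the repo's schema.
    return {
        "txn_id": str(uuid.uuid4()),
        "merchant": random.choice(MERCHANTS),
        "amount": round(random.uniform(1.0, 500.0), 2),
        "ts": time.time(),
        "suspected_fraud": random.random() < 0.01,       # crude stand-in for configurable fraud patterns
    }


if __name__ == "__main__":
    while True:
        producer.send("transactions", make_event())      # hypothetical topic name
        time.sleep(0.0001)                               # roughly 10k events/second
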
03 Featured Project

dbt-audit-macros

A collection of dbt macros for automated audit logging, freshness checks, and row-count reconciliation across Snowflake and BigQuery models.

  • Wrote a suite of dbt macros for automated audit logging — tracking row counts, null rates, and schema changes on every model run without custom Python.
  • Added freshness check macros compatible with both Snowflake and BigQuery that surface stale source tables before they silently corrupt downstream models.
  • Packaged as a dbt Hub-compatible package with full documentation and example project so teams can install and configure in under 10 minutes.
dbt · SQL · Snowflake · BigQuery · Jinja2
Live Demo

04. Work Experience

01

Meridian Analytics — Seattle, WA

2023 – Now

3yr 3mo · Current

Senior Data Engineer

Jan 2023 — Present

Full-Time · Seattle, WA

Senior data engineer on a 6-person data platform team at a Series B fintech company. Own core ingestion pipelines, the Snowflake data warehouse, and data quality infrastructure used by 30+ analysts and 3 ML engineers.

  • Redesigned the company's core ELT architecture from ad-hoc Python scripts to a fully orchestrated Airflow + dbt stack, reducing data freshness SLA breaches by 90%.
  • Built a data contract framework with Great Expectations that runs on every pipeline run — catching schema drift and volume anomalies before they reach downstream dashboards (a rough illustration follows below).
  • Led a cross-functional initiative with analytics and product to define and document 80+ core business metrics in dbt, creating a single source of truth for company-wide reporting.
Python · Apache Airflow · dbt · Snowflake · Kafka · Great Expectations · Terraform
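
The data contract framework itself is internal, but a minimal sketch of the kind of check it runs might look like the following, using the classic Great Expectations Pandas API (newer releases use a Validator/Checkpoint interface instead). The column names and thresholds are illustrative assumptions.

import great_expectations as ge
import pandas as pd

# Toy batch standing in for one pipeline run's output.
df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 25.5, None]})
batch = ge.from_pandas(df)

# Schema / key-integrity check: the primary key must never be null.
batch.expect_column_values_to_not_be_null("order_id")

# Volume-anomaly guard: row count must land in an expected band.
batch.expect_table_row_count_to_be_between(min_value=1, max_value=1_000_000)

results = batch.validate()
if not results.success:
    raise ValueError("Data contract violated; failing the run before dashboards refresh")
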
02

Novu Commerce — Portland, OR

2020 – 2022

2yr 5mo

Data Engineer

Aug 2020 — Dec 2022

Full-Time · Portland, OR

Data engineer on a two-person data team at a mid-size e-commerce company. Built and maintained pipelines for marketing, finance, and operations analytics.

  • Migrated 14 legacy cron-based ETL jobs to Apache Airflow DAGs with proper retry logic, SLA monitoring, and Slack alerting — reducing weekly pipeline failures from 8-10 incidents to near-zero.
  • Implemented a Snowflake data warehouse from scratch, consolidating data from Shopify, Stripe, Google Ads, and an internal PostgreSQL database into a unified analytics layer.
  • Built a near-real-time inventory sync pipeline using AWS Lambda and DynamoDB Streams that reduced inventory discrepancy reports by 70% (a sketch of the handler follows below).
Python · Apache Airflow · Snowflake · dbt · AWS Lambda · DynamoDB · PostgreSQL
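
As a rough illustration of the inventory sync, here is a hedged sketch of a DynamoDB Streams-triggered Lambda handler. The attribute names and the downstream sync call are assumptions, since the actual pipeline is not public; only the stream event shape is standard.

def sync_inventory(sku, quantity):
    # Stand-in for the write that kept the analytics copy of inventory fresh.
    print(f"syncing {sku} -> {quantity}")


def handler(event, context):
    # Lambda entry point invoked by the DynamoDB Stream.
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        new_image = record["dynamodb"]["NewImage"]   # stream images are typed attribute maps
        sku = new_image["sku"]["S"]                  # assumed attribute names
        quantity = int(new_image["quantity"]["N"])
        sync_inventory(sku, quantity)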

05. Education

B.S. Information Systems

University of Washington — Seattle

2016 — 2020
3yr 10mo

Data Engineering Nanodegree

Udacity

2020 — 2021
5mo

06. Contact

Get in touch
Available for work

Making Data Teams Actually Trust Their Data

I'm currently open to senior data engineering roles at data-driven companies. If you're looking for someone who can build reliable pipelines, improve data quality, and work closely with analytics and ML teams — feel free to reach out.

rbajra19@gmail.com
Seattle, WA
Made with SerisLab