Suranjit Banik — Senior Data Engineer · Azure Databricks Specialist

01 About

I build the data infrastructure behind modern analytics and AI — designing scalable lakehouse pipelines, governing data platforms, and shipping AI-powered products that turn raw data into decisions.

From insurance to digital media, I architect medallion lakehouses on Azure Databricks with Lakeflow Declarative Pipelines, Delta Lake, Unity Catalog, and config-driven PySpark frameworks built to scale, stay governed, and last. I'm an early adopter of agentic development on Databricks, pairing Claude Code with the Databricks MCP server to ship faster. Based in Toronto.

5+ years in data engineering
30+ broker sources onboarded
40% faster pipeline runtimes
60% faster source onboarding

02 Selected Work

(01) Live · GitHub Pages

This Portfolio

The single-page site you're viewing — designed and built with Three.js and GSAP, deployed on GitHub Pages.


03 GitHub Activity


Commit activity

Last 12 months · pulled live from GitHub


Latest commits

Most recent public pushes


04 Capabilities

Databricks

Delta Lake
Lakeflow Pipelines (DLT)
Unity Catalog
Databricks Apps
Workflows · Asset Bundles
Auto Loader
Genie · Lakebase

AI & Agentic Tooling

Claude Code
Databricks MCP Server
Databricks CLI
Omnigent
Claude Cowork

Languages & Cloud

Python · PySpark
Spark SQL · SQL
Azure ADLS · ADF
Azure DevOps · Logic Apps
AWS S3 · GCP BigQuery

Methods & Tools

Medallion Architecture
Metadata-Driven Ingestion
Data Governance
CI/CD · Performance Tuning
Streamlit · Jira · Confluence

05 How I Can Support You

(01)

Lakehouse architecture & pipelines

Design and build medallion lakehouses on Databricks — Lakeflow Declarative Pipelines with data-quality expectations, CDC, Auto Loader, and Unity Catalog governance, engineered for schema-resilient ingestion at scale.

(02)

AI-powered data products

Ship Databricks Apps and Genie/Lakebase products that put live data in users' hands — from natural-language querying for underwriters to LLM-powered column mapping that eliminates manual ingestion fixes.

(03)

CI/CD & platform modernization

Modernize release workflows with Databricks Asset Bundles and Azure DevOps across dev, QA, UAT, and prod, improving release velocity and environment portability and making new-source onboarding ~60% faster.

(04)

Performance & cost tuning

Cut Spark runtimes by ~40% with Predictive Optimization, liquid clustering on serverless compute, and join-strategy tuning — lower cloud spend, faster insights.

(05)

Agentic engineering enablement

Bring agentic workflows to your data team — pairing Claude Code with the Databricks MCP server and CLI to scaffold pipelines, generate tests, and debug jobs, shipping projects ahead of schedule.

06 Experience

Apr 2024 — Present

Senior Data Engineer

Zurich Canada · Toronto, ON

Built an AI-powered policy & claims app with Databricks Genie and Lakebase — natural-language querying for underwriters with claims notes persisted to a Lakebase OLTP store, cutting insight turnaround from days to seconds.
Developed a Databricks App for bordereaux (BDX) file intake with Claude LLM-powered column mapping — auto-normalizing headers and mapping columns to the target schema, eliminating manual fixes in lakehouse ingestion.
Built declarative medallion pipelines (bronze → silver → gold) on Lakeflow with data-quality expectations, CDC, Unity Catalog governance, and Auto Loader for schema-resilient ingestion.
Designed config-driven PySpark ETL frameworks reused across data from 30+ brokers, and modernized CI/CD with Asset Bundles and Azure DevOps, delivering ~40% faster Spark jobs and ~60% faster source onboarding.
Adopted agentic engineering workflows pairing Claude Code with the Databricks MCP server and CLI to scaffold pipelines, generate tests, and debug jobs — shipping multiple projects ahead of schedule.


Aug 2021 — Apr 2024

Data Engineer

GroupM · Toronto, ON

Built Databricks and PySpark pipelines on GCP integrating BigQuery, Bigtable, and Cloud Functions — improving reporting performance by 60%.
Delivered multi-cloud solutions across GCP, Azure, and AWS; led data governance for enterprise data warehousing.


Jan 2021 — Jul 2021

Data Engineer

Clue · Toronto, ON

Designed graph data models with Neo4j and Cypher; built ETL pipelines on AWS (S3, Athena, SageMaker, DataBrew) for digital marketing analytics.


Feb 2020 — Dec 2020

Data Analyst

Clue · Toronto, ON

Built Azure analytics services ingesting Salesforce data for media campaign reporting with near-zero downtime.


May 2019 — Feb 2020

Digital Analyst

Mindshare · Toronto, ON

Architected Datorama solutions for Nestlé and other CPG clients, improving campaign ROI measurement.


07 Credentials

Education

M.Eng., Computer Engineering

University of Ottawa

Education

B.E., Computer Science

Visvesvaraya Technological University

Certification

Databricks Certified Data Engineer Associate

Databricks

Certification

AWS Certified Cloud Practitioner

Amazon Web Services · Valid through Dec 2026

Professional Development

Databricks Data + AI Summit 2026

San Francisco · Hands-on training on Lakeflow, Lakebase, and Genie