David Abdelmalek — Senior Data & AI Consultant

01 About

Data platforms that regulated teams can trust.

Senior Data & AI Consultant building cloud warehouses for pharma, banking, aviation and energy across DACH. I pull scattered sources — mostly SAP — into one governed Snowflake platform, model it with dbt, and ship to production with automated testing.

On top of that, I build in-warehouse ML and LLM features on Snowflake Cortex and agentic systems with LangGraph: lead scoring, forecasting, and a guard-railed Text-to-SQL agent. Azure certified (Solutions Architect Expert). I work in English and German.

Snowflake · dbt · Cortex · LangGraph · Azure · Data Vault 2.0 · SAP · Terraform

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class David:
    role:  str = "Senior Data & AI Consultant"
    based: str = "Essen, DE · DACH + remote"
    stack: tuple[str, ...] = ("Snowflake", "dbt", "Cortex", "LangGraph")

    def expertise(self) -> dict[str, list[str]]:
        return {
            "platforms": ["SAP → Snowflake", "Data Vault 2.0", "Kimball"],
            "ai":        ["Cortex ML & LLM", "agentic Text-to-SQL", "RAG"],
            "delivery":  ["Terraform", "CI/CD", "automated testing"],
        }

david = David()  # 6+ yrs · pharma · banking · aviation · energy

03 Selected work

SAP · Pharma

DACH pharmaceutical group

End-to-end SAP Data Vault warehouse

A DACH pharmaceutical group needed one governed platform for group-wide reporting instead of data scattered across systems. I built its global analytics warehouse on Snowflake with dbt: a layered Data Vault 2.0 architecture from raw ingestion and staging, through business-vault enrichment, to subject-area marts and a governance layer. It consolidates SAP ECC and HANA, flat files, and manual reference data, and covers sales order book, invoicing, P&L, supply chain (including OTIF), and data quality. Automated incremental pipelines are promoted across dev, QA, and production on Azure DevOps.
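One building block of a Data Vault 2.0 load like the one described above is the deterministic hash key that ties hubs, links and satellites together. The sketch below is a generic Python illustration, not the project's code (which runs in dbt with automate_dv). The function name `hub_hash_key`, the `||` delimiter, the choice of MD5 and the sample key values are all assumptions.

```python
import hashlib


def hub_hash_key(*business_keys: str, delimiter: str = "||") -> str:
    """Return a deterministic hash key for one or more business keys.

    Keys are trimmed and upper-cased first, so the same business key
    arriving in different formats maps to the same hub record.
    """
    normalised = [key.strip().upper() for key in business_keys]
    joined = delimiter.join(normalised)
    return hashlib.md5(joined.encode("utf-8")).hexdigest().upper()


# Hypothetical customer number arriving from two differently formatted sources.
same_key = hub_hash_key("c-1001 ") == hub_hash_key("C-1001")
print(same_key)
```

The delimiter keeps composite keys unambiguous: `("a", "b")` and `("ab",)` hash to different values.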

Snowflake · dbt Core · Data Vault 2.0 · SAP ECC / HANA · automate_dv · Azure DevOps

ELT · Aviation

DACH airport operator

Aviation analytics data warehouse

An analytics platform that turns SAMS flight data into reliable insights for operations, airlines, capacity, and planning. Data flows from Azure into a Snowflake landing zone, through automated ELT into a star-schema data mart, and on to Power BI. It covers three domains: Coordinated (scheduled flights and slot allocations), Operated (actual flight performance), and Utilization (airport capacity). Snowflake Streams and Tasks run the incremental, dependency-based loads, so dimensions are always ready before facts.
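The "dimensions before facts" guarantee comes down to running loads in dependency order. In Snowflake that ordering is expressed by chaining tasks; the Python sketch below only illustrates the ordering idea with the standard library. The table names and the dependency graph are hypothetical, not the project's actual model.

```python
from graphlib import TopologicalSorter

# Hypothetical load graph: each task maps to the tasks it depends on.
LOAD_DEPENDENCIES: dict[str, set[str]] = {
    "dim_airline": set(),
    "dim_airport": set(),
    "dim_date": set(),
    "fact_operated_flights": {"dim_airline", "dim_airport", "dim_date"},
    "fact_slot_allocations": {"dim_airline", "dim_date"},
}


def load_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return a run order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = load_order(LOAD_DEPENDENCIES)
print(order)
```

Any order the sorter returns places every dimension ahead of the facts that reference it; a cycle in the graph raises `graphlib.CycleError` instead of producing a broken schedule.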

Snowflake · ELT · Streams & Tasks · Star schema · Azure · Power BI

Platform · Banking

Regulated DACH bank

Config-driven ingestion framework

Every new data source used to mean hand-written pipelines and ad-hoc schema changes. I built a configuration-driven staging framework where a single JSON file provisions the entire ingestion stack (Snowpipes, Streams, Tasks, formats and roles), deployed via Terraform across all environments. Every staged row keeps a full audit trail back to its source file.
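To make the "one JSON file per source" idea concrete, here is a minimal sketch of how a config could be expanded into the list of objects to provision. The config schema, the naming convention and the function `plan_resources` are assumptions made for illustration; the real framework's schema is not shown on this page, and actual provisioning happens through Terraform rather than this code.

```python
import json

# Hypothetical config for one source; field names are illustrative only.
CONFIG_JSON = """
{
  "source": "payments",
  "file_format": "CSV",
  "entities": ["accounts", "transactions"]
}
"""


def plan_resources(config: dict) -> list[str]:
    """List the Snowflake objects that one config file would provision."""
    source = config["source"].upper()
    resources = [f"FILE FORMAT {source}_{config['file_format']}_FMT"]
    for entity in config["entities"]:
        name = f"{source}_{entity.upper()}"
        resources += [
            f"TABLE STG_{name}",
            f"PIPE {name}_PIPE",
            f"STREAM {name}_STREAM",
            f"TASK {name}_TASK",
        ]
    return resources


plan = plan_resources(json.loads(CONFIG_JSON))
for resource in plan:
    print(resource)
```

Adding an entity then means adding one string to the `entities` list, which is the property that removes per-source custom code.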

~200 entities

environments

custom code per source

Snowflake · Terraform / OpenTofu · Snowpipe · Python CLI · Azure Blob

Risk · Energy trading

DACH energy trader

Energy trading risk-reporting platform

A risk and performance reporting platform for a DACH energy trader, consolidating trading, risk, and master data into one governed Snowflake analytics layer. I led a team of 4 building this 2+ TB platform on Snowflake and Azure: layered dbt from raw ingestion and staging, through a Risk Data Vault and business vault, to hourly and daily reporting marts. Sources include Python risk metrics (VaR, PAR, Expected Shortfall), PSI trade quantities, HPFC price curves, contract and portfolio master data, risk limits, and PnL adjustments, with Oracle integrated via SAP OData. dbt handles Data Vault automation, data-quality tests, and column masking; GitLab CI/CD runs scheduled, manual, and merge-request pipelines.
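For readers unfamiliar with the risk metrics named above, the sketch below shows one simple empirical way to compute Value at Risk and Expected Shortfall from a PnL series. It is an assumption-laden illustration: the platform consumes these metrics from upstream Python jobs, and the method those jobs use is not described here. The function name, the tail convention and the sample numbers are all made up.

```python
def var_and_expected_shortfall(
    pnl: list[float], alpha: float = 0.8
) -> tuple[float, float]:
    """Return (VaR, Expected Shortfall) as positive loss figures.

    Simple empirical convention: the tail is the worst
    round(n * (1 - alpha)) outcomes, VaR is the mildest loss in that
    tail, and Expected Shortfall is the average loss across the tail.
    """
    if not pnl:
        raise ValueError("pnl must not be empty")
    losses = sorted((-value for value in pnl), reverse=True)  # worst first
    tail_size = max(1, round(len(losses) * (1 - alpha)))
    tail = losses[:tail_size]
    return tail[-1], sum(tail) / len(tail)


# Ten made-up daily PnL values, purely illustrative.
sample_pnl = [-10.0, -8.0, -5.0, -3.0, -1.0, 0.0, 2.0, 4.0, 6.0, 9.0]
var_80, es_80 = var_and_expected_shortfall(sample_pnl, alpha=0.8)
print(var_80, es_80)  # 8.0 9.0
```

Expected Shortfall is never smaller than VaR at the same confidence level, because it averages the losses at or beyond the VaR threshold.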

Snowflake · dbt · Data Vault 2.0 · Azure · SAP OData · GitLab CI/CD

AI · Open source

Reference implementation

Agentic Text-to-SQL on Snowflake Cortex

A reference implementation for safe, observable GenAI over a real warehouse, not a single-prompt demo. Business users ask in plain English; a LangGraph agent uses Snowflake Cortex to generate SQL, enforces read-only safety with sqlglot and a dedicated read-only role, runs the query, and summarizes the answer, with bounded self-repair when a query fails. It runs on the UCI Online Retail II dataset, ingested with Python into Snowflake and modeled with dbt into a Kimball star schema and a generated semantic layer. Served through a FastAPI /ask endpoint and a Typer CLI, traced with Langfuse, and scored by execution-accuracy evaluation against a gold question set. Code is public on GitHub.
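The "bounded self-repair" step can be sketched as a retry loop that feeds the database error back into the generator a fixed number of times. This is a simplified stand-in, not the repository's code: `generate_sql` is a stub where the real agent calls Snowflake Cortex, SQLite replaces Snowflake as the executor, and `MAX_REPAIRS`, the table and the question are hypothetical.

```python
import sqlite3
from typing import Optional

MAX_REPAIRS = 2  # hypothetical bound on repair attempts


def generate_sql(question: str, last_error: Optional[str]) -> str:
    """Stand-in for the LLM call: the first draft is wrong, the repair fixes it."""
    if last_error is None:
        return "SELECT SUM(amount) FROM invoice"  # wrong table name
    return "SELECT SUM(amount) FROM invoices"


def ask(conn: sqlite3.Connection, question: str) -> tuple[list[tuple], int]:
    """Run generated SQL, feeding errors back at most MAX_REPAIRS times."""
    last_error: Optional[str] = None
    for attempt in range(MAX_REPAIRS + 1):
        sql = generate_sql(question, last_error)
        try:
            return conn.execute(sql).fetchall(), attempt
        except sqlite3.Error as exc:
            last_error = str(exc)  # becomes context for the next draft
    raise RuntimeError(f"gave up after {MAX_REPAIRS} repairs: {last_error}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (amount REAL)")
conn.executemany("INSERT INTO invoices VALUES (?)", [(10.0,), (32.0,)])
rows, repairs_used = ask(conn, "What is total invoiced revenue?")
print(rows, repairs_used)  # [(42.0,)] 1
```

The bound matters for safety as much as cost: a query that cannot be repaired ends in an explicit error rather than an unbounded loop of model calls.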

LangGraph · Snowflake Cortex · sqlglot · dbt · FastAPI · Langfuse

View repository →

AI · Live on this site

Retrieval-augmented assistant

Ask DoDo — RAG assistant on Snowflake Cortex

The assistant answering questions on this page right now. I chunk my CV and projects, embed them with Cortex AI_EMBED into a Snowflake VECTOR column, retrieve the most relevant chunks for each question, and pass them as grounding context to AI_COMPLETE. A Cloudflare Worker in front rate-limits, caches and logs requests. The assistant is bilingual (English / German) and answers only from real facts; it won't invent experience.
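The retrieval step can be illustrated in plain Python: rank stored chunks by cosine similarity to the question's embedding and keep the best few. This is a toy sketch under stated assumptions. The three-dimensional vectors are hand-written stand-ins for real embeddings, the chunk texts are paraphrased from this page, and in the live assistant both embedding and similarity search run inside Snowflake rather than in application code.

```python
import math

# Hypothetical pre-computed chunk embeddings (3 dimensions for readability).
CHUNKS: dict[str, list[float]] = {
    "Built a Data Vault warehouse on Snowflake with dbt.": [0.9, 0.1, 0.0],
    "Agentic Text-to-SQL with LangGraph and Cortex.": [0.1, 0.9, 0.1],
    "MSc Data Science at RWTH Aachen University.": [0.0, 0.1, 0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve(query_vector: list[float], top_k: int = 2) -> list[str]:
    """Return the top_k chunk texts most similar to the query vector."""
    ranked = sorted(
        CHUNKS,
        key=lambda text: cosine_similarity(query_vector, CHUNKS[text]),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up embedding for a question about the Text-to-SQL project.
context = retrieve([0.2, 0.8, 0.1], top_k=2)
print(context)
```

Only the retrieved chunks are passed to the model as context, which is what keeps answers tied to facts that actually appear in the source material.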

Snowflake Cortex · AI_EMBED · VECTOR · AI_COMPLETE · RAG · Cloudflare Worker

Open source github.com/DavidAbdelmalek ↗

agentic-text-to-sql

Safe, observable, evaluated LangGraph text-to-SQL agent over a Snowflake warehouse — read-only guardrails, a generated semantic layer, and execution-accuracy evals.

Python

clickhouse-vs-snowflake

A hands-on benchmark of ClickHouse and Snowflake on the same analytical workload — comparing load, query latency, and cost trade-offs.

Python

data_engineering_etl_projects

A set of self-built ETL projects exploring different data-engineering tools and patterns end to end, from ingestion through transformation.

Python

04 Experience & education

Experience

Jun 2023 — Present

Senior Data & AI Consultant

INFORM DataLab · Aachen

Cloud data platforms, SAP integration and AI features for enterprise clients across pharma, banking, aviation, energy trading and B2B services. Client engagements stay anonymised under NDA.

Feb 2022 — Feb 2023

Cloud Data & MLOps Engineer

Bosch · Stuttgart · Working student

Kubernetes-based ingestion pipelines and scalable sensor-data workflows on Azure Data Lake and Databricks.

May 2022 — Dec 2022

AI Researcher — MSc Thesis

Fraunhofer IPT · Aachen

Reinforcement learning with Graph Neural Networks for fair hospital resource scheduling (CAR-T therapy).

Feb 2021 — Jul 2021

Data Scientist / AI Engineer

BMW · Munich · Intern

Migrated statistical workflows from R to Python on AWS/Azure; built PySpark BI pipelines on after-sales data.

Jan 2020 — Jun 2020

Data Scientist

Henkel · Düsseldorf · Intern

AI-driven raw-material price forecasting with TensorFlow and PySpark to support procurement planning.

Education

Oct 2019 — Dec 2022

MSc Data Science

RWTH Aachen University · Aachen, Germany

Sep 2014 — Jun 2019

BSc Computer Science & Engineering

German University in Cairo · Cairo, Egypt

Architecture Assessment

Cloud Data Platforms

AI & LLM Features

Have a data problem
worth solving?
