David Abdelmalek

Senior Data & AI Consultant
01 About

I'm a data and AI consultant. I take data from scattered source systems and bring it into one trusted place, cleaned up and ready for reporting and AI.

The sources vary: mostly SAP, but also HubSpot CRM, web-scraped data, and other ERP and CRM systems. I build the warehouse on Snowflake and dbt, run it on Azure or AWS, and ship every change to production with automated testing. Most of my clients are in regulated fields: banking, aviation, pharma, and energy across the DACH region.

On HubSpot CRM data, I score leads with Cortex ML classification so the sales team knows where to spend its time, forecast how leads and revenue will grow, and flag unusual patterns with anomaly detection. I also use Cortex LLM functions (AI_COMPLETE) to tag engagement notes by sentiment and intent, and build agentic systems with LangGraph. I work in English and German.

Outside client work I build my own tools, like an agentic text-to-SQL app and the Ask DoDo assistant running on this page, both on Snowflake Cortex.

Certifications
Microsoft Azure: Solutions Architect Expert · Administrator · Developer · Fundamentals
Based in
Essen, Germany · Available remote across DACH
Languages
Arabic · English · German
02 What I do
  • A full review of your current setup: what you have, what it's costing you, and where it breaks
  • Stakeholder interviews so the plan fits how your team actually works
  • A proposed target architecture built on best practices, with a clear recommendation on which tools to use and why
  • Every workshop output handed over, including Miro boards, notes and recordings, so nothing lives only in someone's head
  • The person who scopes the work also builds it, so the knowledge stays in one place
03 Selected work
SAP · Pharma
DACH pharmaceutical group

End-to-end SAP Data Vault warehouse

A DACH pharmaceutical group needed one governed platform for group-wide reporting instead of data scattered across systems. I built its global analytics warehouse on Snowflake with dbt: a layered Data Vault 2.0 architecture from raw ingestion and staging, through business-vault enrichment, to subject-area marts and a governance layer. It consolidates SAP ECC and HANA, flat files, and manual reference data, and covers sales order book, invoicing, P&L, supply chain (including OTIF), and data quality. Automated incremental pipelines are promoted across dev, QA, and production on Azure DevOps.

Snowflake · dbt Core · Data Vault 2.0 · SAP ECC / HANA · automate_dv · Azure DevOps
ELT · Aviation
DACH airport operator

Aviation analytics data warehouse

An analytics platform that turns SAMS flight data into reliable insights for operations, airlines, capacity, and planning. Data flows from Azure into a Snowflake landing zone, through automated ELT into a star-schema data mart, and on to Power BI. It covers three domains: Coordinated (scheduled flights and slot allocations), Operated (actual flight performance), and Utilization (airport capacity). Snowflake Streams and Tasks run the incremental, dependency-based loads, so dimensions are always ready before facts.
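The "dimensions before facts" ordering can be illustrated outside Snowflake with a small dependency graph. This is a sketch of the idea only, not the platform's code: in Snowflake the ordering comes from chaining tasks with `CREATE TASK ... AFTER`, and every table name below is a hypothetical placeholder.

```python
from graphlib import TopologicalSorter

# Hypothetical load graph: each table maps to the tables it must wait for.
# The names are illustrative and do not come from the real project.
LOAD_DEPENDENCIES = {
    "dim_airline": set(),
    "dim_airport": set(),
    "dim_date": set(),
    "fact_operated_flight": {"dim_airline", "dim_airport", "dim_date"},
    "fact_slot_allocation": {"dim_airline", "dim_date"},
}


def load_order(dependencies):
    """Return a load order in which every table follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


order = load_order(LOAD_DEPENDENCIES)
for table in order:
    print(table)
```

Any order the sorter returns places the three dimensions ahead of the facts that reference them. A cycle in the graph raises `graphlib.CycleError`, which is the kind of mistake worth catching before deployment.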

Snowflake · ELT · Streams & Tasks · Star schema · Azure · Power BI
Platform · Banking
Regulated DACH bank

Config-driven ingestion framework

Every new data source used to mean hand-written pipelines and ad-hoc schema changes. I built a configuration-driven staging framework where a single JSON file provisions the entire ingestion stack (Snowpipes, Streams, Tasks, formats and roles), deployed via Terraform across all environments. Every staged row keeps a full audit trail back to its source file.
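To make the "one JSON file per source" idea concrete, here is a minimal sketch that turns a config into DDL strings. It is an assumption-laden illustration: the real framework deploys through Terraform, whereas this version only renders SQL text, covers just the pipe and the stream, and uses config keys and object names invented for the example.

```python
import json

# Hypothetical config for one source entity; every key name is illustrative.
CONFIG_JSON = """
{
  "entity": "customer",
  "database": "RAW",
  "schema": "STAGING",
  "stage": "AZURE_BLOB_STAGE",
  "file_format": "CSV_FORMAT"
}
"""


def render_ingestion_ddl(config):
    """Render the pipe and stream DDL for one entity from its config."""
    prefix = config["database"] + "." + config["schema"]
    folder = config["entity"]
    table = prefix + "." + folder.upper()
    stage = prefix + "." + config["stage"]
    file_format = prefix + "." + config["file_format"]
    pipe = (
        f"CREATE PIPE IF NOT EXISTS {table}_PIPE AUTO_INGEST = TRUE AS "
        f"COPY INTO {table} FROM @{stage}/{folder}/ "
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}')"
    )
    stream = f"CREATE STREAM IF NOT EXISTS {table}_STREAM ON TABLE {table}"
    return [pipe, stream]


statements = render_ingestion_ddl(json.loads(CONFIG_JSON))
for statement in statements:
    print(statement + ";")
```

Adding a source then means adding a config entry rather than writing a pipeline, which is what keeps custom code per source at zero.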

~200 entities · 4 environments · 0 custom code per source
Snowflake · Terraform / OpenTofu · Snowpipe · Python CLI · Azure Blob
Risk · Energy trading
DACH energy trader

Energy trading risk-reporting platform

A risk and performance reporting platform for a DACH energy trader, consolidating trading, risk, and master data into one governed Snowflake analytics layer. I led a team of 4 building this 2+ TB platform on Snowflake and Azure: layered dbt from raw ingestion and staging, through a Risk Data Vault and business vault, to hourly and daily reporting marts. Sources include Python risk metrics (VaR, PAR, Expected Shortfall), PSI trade quantities, HPFC price curves, contract and portfolio master data, risk limits, and PnL adjustments, with Oracle integrated via SAP OData. dbt handles Data Vault automation, data-quality tests, and column masking; GitLab CI/CD runs scheduled, manual, and merge-request pipelines.

Snowflake · dbt · Data Vault 2.0 · Azure · SAP OData · GitLab CI/CD
AI · Open source
Reference implementation

Agentic Text-to-SQL on Snowflake Cortex

A reference implementation for safe, observable GenAI over a real warehouse, not a single-prompt demo. Business users ask in plain English; a LangGraph agent uses Snowflake Cortex to generate SQL, enforces read-only safety with sqlglot and a dedicated read-only role, runs the query, and summarizes the answer, with bounded self-repair when a query fails. It runs on the UCI Online Retail II dataset, ingested with Python into Snowflake and modeled with dbt into a Kimball star schema and a generated semantic layer. Served through a FastAPI /ask endpoint and a Typer CLI, traced with Langfuse, and scored by execution-accuracy evaluation against a gold question set. Code is public on GitHub.
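Bounded self-repair is easiest to see in isolation. The sketch below is not the project's code: it swaps Snowflake for an in-memory SQLite database, replaces the Cortex call with a stub generator, and leaves out the sqlglot read-only check. The function names and the retry limit are assumptions made for the example.

```python
import sqlite3


def answer_with_repair(question, generate_sql, connection, max_attempts=3):
    """Generate SQL, run it, and retry with the error message on failure."""
    error = None
    for attempt in range(1, max_attempts + 1):
        sql = generate_sql(question, error)
        try:
            rows = connection.execute(sql).fetchall()
            return {"sql": sql, "rows": rows, "attempts": attempt}
        except sqlite3.Error as exc:
            # The error text is passed to the generator on the next attempt.
            error = str(exc)
    raise RuntimeError(f"gave up after {max_attempts} attempts: {error}")


def fake_generator(question, previous_error):
    """Stand-in for the LLM call: misspells the table until it sees an error."""
    if previous_error is None:
        return "SELECT COUNT(*) FROM ordrs"
    return "SELECT COUNT(*) FROM orders"


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (id INTEGER)")
connection.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,)])

result = answer_with_repair("How many orders are there?", fake_generator, connection)
print(result["sql"], result["rows"], result["attempts"])
```

The cap on attempts is what makes the repair bounded: a query that keeps failing ends in a clear error instead of an endless loop of model calls.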

LangGraph · Snowflake Cortex · sqlglot · dbt · FastAPI · Langfuse

You're looking at one of these projects right now. The Ask DoDo button is a retrieval-augmented assistant I built on Snowflake Cortex: my CV is chunked, embedded with AI_EMBED into a Snowflake VECTOR column, and retrieved per question, then grounded into an AI_COMPLETE answer behind a Cloudflare Worker. It answers in English or German and only from real facts. Ask it anything.
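The retrieve-then-ground step can be sketched in plain Python. This is an illustration under stated assumptions, not the assistant's code: the three-dimensional vectors are toy values standing in for AI_EMBED output, the chunk texts are placeholders, and in Snowflake the ranking would typically be done in SQL with `VECTOR_COSINE_SIMILARITY` rather than in application code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(question_vector, chunks, k=2):
    """Return the k chunk texts most similar to the question vector."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(question_vector, chunk["vector"]),
        reverse=True,
    )
    return [chunk["text"] for chunk in ranked[:k]]


def build_prompt(question, context_chunks):
    """Ground the answer by restricting the model to the retrieved context."""
    context = "\n".join("- " + text for text in context_chunks)
    return (
        "Answer only from the context below.\n"
        "Context:\n" + context + "\n"
        "Question: " + question
    )


# Toy data: placeholder texts and hand-made vectors, not real embeddings.
CHUNKS = [
    {"text": "chunk about certifications", "vector": [0.9, 0.1, 0.0]},
    {"text": "chunk about education", "vector": [0.0, 1.0, 0.0]},
    {"text": "chunk about projects", "vector": [0.7, 0.7, 0.0]},
]

retrieved = top_k_chunks([1.0, 0.0, 0.0], CHUNKS, k=2)
prompt = build_prompt("Which certifications are listed?", retrieved)
print(prompt)
```

Restricting the prompt to retrieved chunks is what keeps answers tied to the CV instead of the model's general knowledge.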

04 Experience & education
Experience
Jun 2023 — Present
Senior Data & AI Consultant
INFORM DataLab · Aachen
Cloud data platforms, SAP integration and AI features for enterprise clients across pharma, banking, aviation, energy trading and B2B services. Client engagements stay anonymised under NDA.
Feb 2022 — Feb 2023
Cloud Data & MLOps Engineer
Bosch · Stuttgart · Working student
Kubernetes-based ingestion pipelines and scalable sensor-data workflows on Azure Data Lake and Databricks.
May 2022 — Dec 2022
AI Researcher — MSc Thesis
Fraunhofer IPT · Aachen
Reinforcement learning with Graph Neural Networks for fair hospital resource scheduling (CAR-T therapy).
Feb 2021 — Jul 2021
Data Scientist / AI Engineer
BMW · Munich · Intern
Migrated statistical workflows from R to Python on AWS/Azure; built PySpark BI pipelines on after-sales data.
Jan 2020 — Jun 2020
Data Scientist
Henkel · Düsseldorf · Intern
AI-driven raw-material price forecasting with TensorFlow and PySpark to support procurement planning.
Education
Oct 2019 — Dec 2022
MSc Data Science
RWTH Aachen University · Aachen, Germany
Sep 2014 — Jun 2019
BEng Computer Science & Engineering
German University in Cairo · Cairo, Egypt
05 Tech stack
Snowflake · dbt · Azure ADF · Databricks · Synapse · SAP ECC · S/4HANA · SAP HANA · Terraform · GitLab CI/CD · Python · PySpark · Data Vault 2.0 · Kimball · Snowflake Cortex · LangGraph · Power BI · Qlik
Let's talk

Have a data problem worth solving?

// DACH enterprise & remote · typically replies within a couple of days