I'm a data and AI consultant. I take data from scattered source systems and bring it into one trusted place, cleaned up and ready for reporting and AI.
The sources vary: mostly SAP, but also HubSpot CRM, web-scraped data, and other ERP and CRM systems. I build the warehouse on Snowflake and dbt, run it on Azure or AWS, and ship every change to production with automated testing. Most of my clients are in regulated fields: banking, aviation, pharma, and energy across the DACH region.
On HubSpot CRM data, I score leads with Cortex ML classification so the sales team knows where to spend its time, forecast lead and revenue growth, and flag unusual patterns with anomaly detection. I also use Cortex LLM functions (AI_COMPLETE) to tag engagement notes by sentiment and intent, and I build agentic systems with LangGraph. I work in English and German.
Outside client work I build my own tools, like an agentic text-to-SQL app and the Ask DoDo assistant running on this page, both on Snowflake Cortex.
A DACH pharmaceutical group needed one governed platform for group-wide reporting instead of data scattered across systems. I built its global analytics warehouse on Snowflake with dbt: a layered Data Vault 2.0 architecture from raw ingestion and staging, through business-vault enrichment, to subject-area marts and a governance layer. It consolidates SAP ECC and HANA, flat files, and manual reference data, and covers sales order book, invoicing, P&L, supply chain (including OTIF), and data quality. Automated incremental pipelines are promoted across dev, QA, and production on Azure DevOps.
An analytics platform that turns SAMS flight data into reliable insights for operations, airlines, capacity, and planning. Data flows from Azure into a Snowflake landing zone, through automated ELT into a star-schema data mart, and on to Power BI. It covers three domains: Coordinated (scheduled flights and slot allocations), Operated (actual flight performance), and Utilization (airport capacity). Snowflake Streams and Tasks run the incremental, dependency-based loads, so dimensions are always loaded before the facts that reference them.
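A minimal sketch of the dependency-based ordering described above, not the production setup. In Snowflake the ordering would typically be declared on the tasks themselves (for example with the AFTER clause of CREATE TASK); here plain Python shows the same idea. The table names and the dependency map are assumptions made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical load graph: each key is a load step, each value is the set of
# steps that must finish first. Table names are illustrative, not the real model.
DEPENDENCIES = {
    "dim_airport": set(),
    "dim_airline": set(),
    "dim_date": set(),
    "fact_coordinated": {"dim_airport", "dim_airline", "dim_date"},
    "fact_operated": {"dim_airport", "dim_airline", "dim_date"},
    "fact_utilization": {"dim_airport", "dim_date"},
}


def load_order(dependencies):
    """Return a run order in which every step comes after its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


order = load_order(DEPENDENCIES)
for step in order:
    print(step)
```

Any order the sorter returns places all three dimensions ahead of the fact tables that depend on them; a cycle in the map would raise an error instead of producing a partial order.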
Every new data source used to mean hand-written pipelines and ad-hoc schema changes. I built a configuration-driven staging framework in which a single JSON file provisions the entire ingestion stack (Snowpipes, Streams, Tasks, file formats, and roles), deployed via Terraform across all environments. Every staged row keeps a full audit trail back to its source file.
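To illustrate the configuration-driven idea, here is a minimal sketch of how one JSON document could be expanded into the list of objects to provision per table. The config keys, the naming scheme, and the `plan_resources` helper are all hypothetical; the real framework hands this job to Terraform, which is not shown here.

```python
import json

# Hypothetical source config; keys and values are assumptions for illustration.
CONFIG_JSON = """
{
  "source": "erp",
  "environment": "dev",
  "tables": [
    {"name": "orders", "file_format": "csv"},
    {"name": "customers", "file_format": "json"}
  ]
}
"""


def plan_resources(config):
    """Expand one source config into the named objects to create per table."""
    prefix = f"{config['environment']}_{config['source']}".upper()
    plan = []
    for table in config["tables"]:
        base = f"{prefix}_{table['name'].upper()}"
        plan.append(
            {
                "table": f"STG_{base}",
                "pipe": f"PIPE_{base}",
                "stream": f"STRM_{base}",
                "task": f"TASK_{base}",
                "file_format": f"FF_{table['file_format'].upper()}",
            }
        )
    return plan


plan = plan_resources(json.loads(CONFIG_JSON))
for entry in plan:
    print(entry)
```

The point of the design is that adding a source means adding a config entry, not writing another pipeline by hand.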
A risk and performance reporting platform for a DACH energy trader, consolidating trading, risk, and master data into one governed Snowflake analytics layer. I led a team of four building this 2+ TB platform on Snowflake and Azure: layered dbt from raw ingestion and staging, through a Risk Data Vault and business vault, to hourly and daily reporting marts. Sources include Python risk metrics (VaR, PAR, Expected Shortfall), PSI trade quantities, HPFC price curves, contract and portfolio master data, risk limits, and PnL adjustments, with Oracle integrated via SAP OData. dbt handles Data Vault automation, data-quality tests, and column masking; GitLab CI/CD runs scheduled, manual, and merge-request pipelines.
A reference implementation for safe, observable GenAI over a real warehouse, not a single-prompt demo. Business users ask in plain English; a LangGraph agent uses Snowflake Cortex to generate SQL, enforces read-only safety with sqlglot and a dedicated read-only role, runs the query, and summarizes the answer, with bounded self-repair when a query fails. It runs on the UCI Online Retail II dataset, ingested with Python into Snowflake and modeled with dbt into a Kimball star schema and a generated semantic layer. Served through a FastAPI /ask endpoint and a Typer CLI, traced with Langfuse, and scored by execution-accuracy evaluation against a gold question set. Code is public on GitHub.
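The bounded self-repair step can be sketched as a retry loop with a fixed attempt limit. This is an illustration under stated assumptions, not the code from the repository: Python's built-in sqlite3 stands in for Snowflake, `fake_generate_sql` is a hypothetical stub in place of the Cortex call, and the limit of three attempts is made up. The read-only check with sqlglot is left out of this sketch.

```python
import sqlite3

MAX_ATTEMPTS = 3  # assumed bound; the real limit is not stated in the text


def fake_generate_sql(question, error=None):
    """Hypothetical stand-in for the LLM call.

    The first draft uses a wrong column name; when given the error message
    from the failed run, it returns a corrected query.
    """
    if error is None:
        return "SELECT SUM(amount) FROM sales"
    return "SELECT SUM(revenue) FROM sales"


def answer(question, conn, generate=fake_generate_sql, max_attempts=MAX_ATTEMPTS):
    """Generate SQL, run it, and retry with the error message up to a fixed bound."""
    error = None
    for attempt in range(1, max_attempts + 1):
        sql = generate(question, error)
        try:
            rows = conn.execute(sql).fetchall()
            return {"sql": sql, "rows": rows, "attempts": attempt}
        except sqlite3.Error as exc:
            error = str(exc)  # fed back into the next generation attempt
    raise RuntimeError(f"gave up after {max_attempts} attempts: {error}")


# In-memory database with toy data, used only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(10.0,), (5.5,)])

result = answer("What is total revenue?", conn)
print(result)
```

The bound matters because an agent that keeps regenerating failing SQL would otherwise loop indefinitely; after the limit, the failure is surfaced instead of retried.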
You're looking at one of these projects right now. The Ask DoDo button opens a retrieval-augmented assistant I built on Snowflake Cortex: my CV is chunked and embedded with AI_EMBED into a Snowflake VECTOR column, the most relevant chunks are retrieved for each question, and AI_COMPLETE generates an answer grounded in them, all behind a Cloudflare Worker. It answers in English or German, and only from facts in my CV. Ask it anything.
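The retrieve-then-ground pattern can be sketched in a few lines of plain Python. This is an illustration only: the three-dimensional vectors and chunk texts are invented, whereas real embeddings come from the embedding model and are much longer, and in Snowflake the ranking would normally run in SQL (for example with VECTOR_COSINE_SIMILARITY) instead of in Python. The `top_k` and `build_prompt` helpers are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, retrieved):
    """Assemble a prompt that restricts the answer to the retrieved chunks."""
    context = "\n".join(f"- {c['text']}" for c in retrieved)
    return (
        "Answer only from the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Toy chunks with made-up 3-dimensional vectors, for illustration only.
CHUNKS = [
    {"text": "Built a Data Vault warehouse on Snowflake with dbt.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Works in English and German.", "vector": [0.0, 0.2, 0.9]},
    {"text": "Led a team of four on an energy risk platform.", "vector": [0.6, 0.7, 0.1]},
]

query_vec = [1.0, 0.2, 0.0]  # pretend embedding of the question
hits = top_k(query_vec, CHUNKS, k=2)
prompt = build_prompt("What warehouse work is on the CV?", hits)
print(prompt)
```

Only the retrieved chunks reach the prompt, which is what keeps the answer tied to the CV and away from whatever the model might otherwise invent.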






