Understanding Model Calibration: A Gentle Introduction & Visual Exploration

How Reliable Are Your Predictions? To be considered reliable, a model must be calibrated so that its confidence in each decision closely reflects the true outcome. In this blog post we’ll look at the most commonly used definition of calibration and then dive into a frequently used evaluation measure for model calibration. […]
The post Understanding Model Calibration: A Gentle Introduction & Visual Exploration appeared first on Towards Data Science.

Data vs. Business Strategy

There seems to be a consensus that leveraging data, analytics, and AI to create a data-driven organization requires a clear strategic approach. However, there is less clarity and agreement on exactly what this strategic approach should look like in practice. This article provides a short overview of what strategy work I believe is required to […]
The post Data vs. Business Strategy appeared first on Towards Data Science.

Polars vs. Pandas — An Independent Speed Comparison

Introduction: Purpose and Reasons. Speed is important when dealing with large amounts of data. If you are handling data in a cloud data warehouse or similar, then the speed of execution for your data ingestion and processing affects the following: […] As you’ve probably understood from the title, I am going to provide a […]
The post Polars vs. Pandas — An Independent Speed Comparison appeared first on Towards Data Science.

Six Ways to Control Style and Content in Diffusion Models

Stable Diffusion 1.5/2.0/2.1/XL 1.0, DALL-E, Imagen… In the past years, diffusion models have showcased stunning quality in image generation. However, while they produce great quality on generic concepts, they struggle to produce high-quality results for more specialised queries, for example generating images in a specific style that was not frequently seen in the training dataset. We […]
The post Six Ways to Control Style and Content in Diffusion Models appeared first on Towards Data Science.

The Gamma Hurdle Distribution

Which Outcome Matters? Here is a common scenario: An A/B test was conducted, where a random sample of units (e.g. customers) was selected for a campaign and received Treatment A. Another sample was selected to receive Treatment B. “A” could be a communication or offer and “B” could be no communication or no […]
The post The Gamma Hurdle Distribution appeared first on Towards Data Science.

Triangle Forecasting: Why Traditional Impact Estimates Are Inflated (And How to Fix Them)

Accurate impact estimates can make or break your business case. Yet, despite their importance, most teams use oversimplified calculations that can lead to inflated projections. These shot-in-the-dark numbers not only destroy credibility with stakeholders but can also result in misallocation of resources and failed initiatives. But there’s a better way to forecast effects of gradual […]
The post Triangle Forecasting: Why Traditional Impact Estimates Are Inflated (And How to Fix Them) appeared first on Towards Data Science.

I Tried Making my Own (Bad) LLM Benchmark to Cheat in Escape Rooms

Recently, DeepSeek announced their latest model, R1, and article after article came out praising its performance relative to cost, and how the release of such open-source models could genuinely change the course of LLMs forever. That is really exciting! And also, too big of a scope to write about… but when a model like DeepSeek […]
The post I Tried Making my Own (Bad) LLM Benchmark to Cheat in Escape Rooms appeared first on Towards Data Science.

Synthetic Data Generation with LLMs

Popularity of RAG. Over the past two years, while working with financial firms, I’ve observed firsthand how they identify and prioritize Generative AI use cases, balancing complexity with potential value. Retrieval-Augmented Generation (RAG) often stands out as a foundational capability across many LLM-driven solutions, striking a balance between ease of implementation and real-world impact. By combining […]
The post Synthetic Data Generation with LLMs appeared first on Towards Data Science.