Posts by Tags

Analytical vs empirical

How Good is My Model? Part 3: Truce with Small Datasets.

15 minute read

Published:

In the last two posts, I was trying to answer a question, only to find it entangled with the limitations of reality. To know how good a model is, I need to pin down the true distribution of its performance—to see what is common and what is rare. However, this requires a large number of i.i.d. samples, which unfortunately are not easy to come by in our field.

Bootstrapping

How Good is My Model? Part 3: Truce with Small Datasets.

15 minute read

Published:

In the last two posts, I was trying to answer a question, only to find it entangled with the limitations of reality. To know how good a model is, I need to pin down the true distribution of its performance—to see what is common and what is rare. However, this requires a large number of i.i.d. samples, which unfortunately are not easy to come by in our field.
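The teaser above points at the shortage of i.i.d. samples; bootstrapping, the tag here, works around it by resampling the few observations you do have. A minimal sketch of the idea, not the post's own code, in plain Python with hypothetical metric values:

```python
import random

def bootstrap_means(scores, n_resamples=1000, seed=0):
    """Approximate the sampling distribution of the mean score
    by resampling the observed scores with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    return [
        sum(rng.choice(scores) for _ in range(n)) / n
        for _ in range(n_resamples)
    ]

# Hypothetical per-run metric values from a small test set:
scores = [0.71, 0.64, 0.69, 0.75, 0.62]
means = bootstrap_means(scores)
lo, hi = sorted(means)[25], sorted(means)[975]  # rough 95% interval
```

Each resampled mean stays within the range of the observed scores, so the interval reflects only the variability the small sample can reveal.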

Central Limit Theorem (CLT)

How Good is My Model? Part 2: LLN, CLT, i.i.d., and a Messy World

17 minute read

Published:

In the last post, I discussed how a model’s performance is more faithfully reported as a distribution than as a single value, why it’s important to report it mindfully so as to distinguish what is typical from what is expected, and how the standard deviation (σ) can quantify a model’s stability. However, I also noted that one distribution from a small test set is not guaranteed to reflect the model’s true distribution. In this post, I’ll discuss how to move past this bottleneck.
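As a minimal illustration of σ quantifying stability, here is a sketch in plain Python; both score lists are hypothetical and chosen only so the means coincide:

```python
import statistics

def report(scores):
    """Summarize repeated evaluation runs as (mean, standard deviation)
    instead of a single number."""
    mu = statistics.mean(scores)
    sigma = statistics.stdev(scores)  # sample standard deviation
    return mu, sigma

# Two hypothetical models with identical average performance:
stable = [0.80, 0.79, 0.81, 0.80, 0.80]
unstable = [0.95, 0.60, 0.90, 0.65, 0.90]

mu_s, sd_s = report(stable)
mu_u, sd_u = report(unstable)
# Same mean, very different stability: sd_s is far smaller than sd_u.
```

Reporting only the mean would make the two models look interchangeable; the σ is what separates them.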

Confidence

How Good is My Model? Part 4: To Compare, or Not to Compare

25 minute read

Published:

In this post, I return to the “How Good is My Model?” lane and continue the journey. However, since the next stop is “Cross-Validation land,” which in my field is mainly about model comparison, one needs to pass a sanity check before moving on to comparison. This check should indicate whether I am ready to start comparing models—or not yet.

Cross-Validation

How Good is My Model? Part 5: When cross-validation went rogue!

32 minute read

Published:

In the last technical post, I talked about how to tell when one is ready to start comparing models. I found that I needed to satisfy some conditions before concluding that my model is suitable for my data and representation. Now, assuming that I have such a suitable model and want to compare it to other suitable models—or that I found no such model and just want to see which of my suboptimal models is the least suboptimal—is cross-validation the next logical step?

Short answer: Not as we use it today!
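Since the posts under this tag question how cross-validation is used, the bare mechanism is worth having in front of you: a minimal k-fold index splitter, a plain-Python sketch rather than the post's own code:

```python
def kfold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.
    Earlier folds get the extra samples when n is not divisible by k."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

# Every sample lands in exactly one test fold:
folds = list(kfold_indices(10, 3))
```

The k test scores this produces are what get averaged in practice; whether that average supports the comparisons we draw from it is exactly what the post interrogates.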

Data Limitations

How Good is My Model? Part 5: When cross-validation went rogue!

32 minute read

Published:

In the last technical post, I talked about how to tell when one is ready to start comparing models. I found that I needed to satisfy some conditions before concluding that my model is suitable for my data and representation. Now, assuming that I have such a suitable model and want to compare it to other suitable models—or that I found no such model and just want to see which of my suboptimal models is the least suboptimal—is cross-validation the next logical step?

Short answer: Not as we use it today!

Diagnostics

How Good is My Model? Part 4: To Compare, or Not to Compare

25 minute read

Published:

In this post, I return to the “How Good is My Model?” lane and continue the journey. However, since the next stop is “Cross-Validation land,” which in my field is mainly about model comparison, one needs to pass a sanity check before moving on to comparison. This check should indicate whether I am ready to start comparing models—or not yet.

Distributions

How Good is My Model? Part 5: When cross-validation went rogue!

32 minute read

Published:

In the last technical post, I talked about how to tell when one is ready to start comparing models. I found that I needed to satisfy some conditions before concluding that my model is suitable for my data and representation. Now, assuming that I have such a suitable model and want to compare it to other suitable models—or that I found no such model and just want to see which of my suboptimal models is the least suboptimal—is cross-validation the next logical step?

Short answer: Not as we use it today!

How Good is My Model? Part 4: To Compare, or Not to Compare

25 minute read

Published:

In this post, I return to the “How Good is My Model?” lane and continue the journey. However, since the next stop is “Cross-Validation land,” which in my field is mainly about model comparison, one needs to pass a sanity check before moving on to comparison. This check should indicate whether I am ready to start comparing models—or not yet.

How Good is My Model? Part 3: Truce with Small Datasets.

15 minute read

Published:

In the last two posts, I was trying to answer a question, only to find it entangled with the limitations of reality. To know how good a model is, I need to pin down the true distribution of its performance—to see what is common and what is rare. However, this requires a large number of i.i.d. samples, which unfortunately are not easy to come by in our field.

How Good is My Model? Part 2: LLN, CLT, i.i.d., and a Messy World

17 minute read

Published:

In the last post, I discussed how a model’s performance is more faithfully reported as a distribution than as a single value, why it’s important to report it mindfully so as to distinguish what is typical from what is expected, and how the standard deviation (σ) can quantify a model’s stability. However, I also noted that one distribution from a small test set is not guaranteed to reflect the model’s true distribution. In this post, I’ll discuss how to move past this bottleneck.

How Good is My Model? Part 1: µ ± σ and a Bit More

13 minute read

Published:

In my first blog post, I made two bold claims about transformer models for molecular property prediction in our review [1] — namely, that they fall short in terms of both novelty and benchmarking. In this and the following posts, I hope to convince you of these conclusions.

  1. Publicly accessible pre-print version here 

Evaluation

How Good is My Model? Part 5: When cross-validation went rogue!

32 minute read

Published:

In the last technical post, I talked about how to tell when one is ready to start comparing models. I found that I needed to satisfy some conditions before concluding that my model is suitable for my data and representation. Now, assuming that I have such a suitable model and want to compare it to other suitable models—or that I found no such model and just want to see which of my suboptimal models is the least suboptimal—is cross-validation the next logical step?

Short answer: Not as we use it today!

How Good is My Model? Part 4: To Compare, or Not to Compare

25 minute read

Published:

In this post, I return to the “How Good is My Model?” lane and continue the journey. However, since the next stop is “Cross-Validation land,” which in my field is mainly about model comparison, one needs to pass a sanity check before moving on to comparison. This check should indicate whether I am ready to start comparing models—or not yet.

How Good is My Model? Part 3: Truce with Small Datasets.

15 minute read

Published:

In the last two posts, I was trying to answer a question, only to find it entangled with the limitations of reality. To know how good a model is, I need to pin down the true distribution of its performance—to see what is common and what is rare. However, this requires a large number of i.i.d. samples, which unfortunately are not easy to come by in our field.

How Good is My Model? Part 2: LLN, CLT, i.i.d., and a Messy World

17 minute read

Published:

In the last post, I discussed how a model’s performance is more faithfully reported as a distribution than as a single value, why it’s important to report it mindfully so as to distinguish what is typical from what is expected, and how the standard deviation (σ) can quantify a model’s stability. However, I also noted that one distribution from a small test set is not guaranteed to reflect the model’s true distribution. In this post, I’ll discuss how to move past this bottleneck.

How Good is My Model? Part 1: µ ± σ and a Bit More

13 minute read

Published:

In my first blog post, I made two bold claims about transformer models for molecular property prediction in our review [1] — namely, that they fall short in terms of both novelty and benchmarking. In this and the following posts, I hope to convince you of these conclusions.

  1. Publicly accessible pre-print version here 

The Beginning: Something is Broken!

8 minute read

Published:

So, my PhD topic is to implement transformer models for toxicity prediction and molecular generation. Or, in plain English: given a molecule (e.g., a drug), I want to train an “AI” model to predict its possible toxic effects on the body and the environment. Even better, to create new molecules that will be safe biologically and environmentally. However, in this post, I will show you how I had to make a major detour before I could even attempt to do so!

Expected vs Typical

How Good is My Model? Part 1: µ ± σ and a Bit More

13 minute read

Published:

In my first blog post, I made two bold claims about transformer models for molecular property prediction in our review [1] — namely, that they fall short in terms of both novelty and benchmarking. In this and the following posts, I hope to convince you of these conclusions.

  1. Publicly accessible pre-print version here 

Homoscedasticity

How Good is My Model? Part 4: To Compare, or Not to Compare

25 minute read

Published:

In this post, I return to the “How Good is My Model?” lane and continue the journey. However, since the next stop is “Cross-Validation land,” which in my field is mainly about model comparison, one needs to pass a sanity check before moving on to comparison. This check should indicate whether I am ready to start comparing models—or not yet.

Independent and Identically Distributed

How Good is My Model? Part 2: LLN, CLT, i.i.d., and a Messy World

17 minute read

Published:

In the last post, I discussed how a model’s performance is more faithfully reported as a distribution than as a single value, why it’s important to report it mindfully so as to distinguish what is typical from what is expected, and how the standard deviation (σ) can quantify a model’s stability. However, I also noted that one distribution from a small test set is not guaranteed to reflect the model’s true distribution. In this post, I’ll discuss how to move past this bottleneck.

Language models

The Beginning: Something is Broken!

8 minute read

Published:

So, my PhD topic is to implement transformer models for toxicity prediction and molecular generation. Or, in plain English: given a molecule (e.g., a drug), I want to train an “AI” model to predict its possible toxic effects on the body and the environment. Even better, to create new molecules that will be safe biologically and environmentally. However, in this post, I will show you how I had to make a major detour before I could even attempt to do so!

Law of Large Number (LLN)

How Good is My Model? Part 2: LLN, CLT, i.i.d., and a Messy World

17 minute read

Published:

In the last post, I discussed how a model’s performance is more faithfully reported as a distribution than as a single value, why it’s important to report it mindfully so as to distinguish what is typical from what is expected, and how the standard deviation (σ) can quantify a model’s stability. However, I also noted that one distribution from a small test set is not guaranteed to reflect the model’s true distribution. In this post, I’ll discuss how to move past this bottleneck.

Machine learning (ML)

How Good is My Model? Part 5: When cross-validation went rogue!

32 minute read

Published:

In the last technical post, I talked about how to tell when one is ready to start comparing models. I found that I needed to satisfy some conditions before concluding that my model is suitable for my data and representation. Now, assuming that I have such a suitable model and want to compare it to other suitable models—or that I found no such model and just want to see which of my suboptimal models is the least suboptimal—is cross-validation the next logical step?

Short answer: Not as we use it today!

How Good is My Model? Part 4: To Compare, or Not to Compare

25 minute read

Published:

In this post, I return to the “How Good is My Model?” lane and continue the journey. However, since the next stop is “Cross-Validation land,” which in my field is mainly about model comparison, one needs to pass a sanity check before moving on to comparison. This check should indicate whether I am ready to start comparing models—or not yet.

How Good is My Model? Part 3: Truce with Small Datasets.

15 minute read

Published:

In the last two posts, I was trying to answer a question, only to find it entangled with the limitations of reality. To know how good a model is, I need to pin down the true distribution of its performance—to see what is common and what is rare. However, this requires a large number of i.i.d. samples, which unfortunately are not easy to come by in our field.

How Good is My Model? Part 2: LLN, CLT, i.i.d., and a Messy World

17 minute read

Published:

In the last post, I discussed how a model’s performance is more faithfully reported as a distribution than as a single value, why it’s important to report it mindfully so as to distinguish what is typical from what is expected, and how the standard deviation (σ) can quantify a model’s stability. However, I also noted that one distribution from a small test set is not guaranteed to reflect the model’s true distribution. In this post, I’ll discuss how to move past this bottleneck.

How Good is My Model? Part 1: µ ± σ and a Bit More

13 minute read

Published:

In my first blog post, I made two bold claims about transformer models for molecular property prediction in our review [1] — namely, that they fall short in terms of both novelty and benchmarking. In this and the following posts, I hope to convince you of these conclusions.

  1. Publicly accessible pre-print version here 

The Beginning: Something is Broken!

8 minute read

Published:

So, my PhD topic is to implement transformer models for toxicity prediction and molecular generation. Or, in plain English: given a molecule (e.g., a drug), I want to train an “AI” model to predict its possible toxic effects on the body and the environment. Even better, to create new molecules that will be safe biologically and environmentally. However, in this post, I will show you how I had to make a major detour before I could even attempt to do so!

Model Limitations

How Good is My Model? Part 5: When cross-validation went rogue!

32 minute read

Published:

In the last technical post, I talked about how to tell when one is ready to start comparing models. I found that I needed to satisfy some conditions before concluding that my model is suitable for my data and representation. Now, assuming that I have such a suitable model and want to compare it to other suitable models—or that I found no such model and just want to see which of my suboptimal models is the least suboptimal—is cross-validation the next logical step?

Short answer: Not as we use it today!

Molecular Property Prediction (MPP)

The Beginning: Something is Broken!

8 minute read

Published:

So, my PhD topic is to implement transformer models for toxicity prediction and molecular generation. Or, in plain English: given a molecule (e.g., a drug), I want to train an “AI” model to predict its possible toxic effects on the body and the environment. Even better, to create new molecules that will be safe biologically and environmentally. However, in this post, I will show you how I had to make a major detour before I could even attempt to do so!

Reporting

How Good is My Model? Part 5: When cross-validation went rogue!

32 minute read

Published:

In the last technical post, I talked about how to tell when one is ready to start comparing models. I found that I needed to satisfy some conditions before concluding that my model is suitable for my data and representation. Now, assuming that I have such a suitable model and want to compare it to other suitable models—or that I found no such model and just want to see which of my suboptimal models is the least suboptimal—is cross-validation the next logical step?

Short answer: Not as we use it today!

How Good is My Model? Part 4: To Compare, or Not to Compare

25 minute read

Published:

In this post, I return to the “How Good is My Model?” lane and continue the journey. However, since the next stop is “Cross-Validation land,” which in my field is mainly about model comparison, one needs to pass a sanity check before moving on to comparison. This check should indicate whether I am ready to start comparing models—or not yet.

How Good is My Model? Part 3: Truce with Small Datasets.

15 minute read

Published:

In the last two posts, I was trying to answer a question, only to find it entangled with the limitations of reality. To know how good a model is, I need to pin down the true distribution of its performance—to see what is common and what is rare. However, this requires a large number of i.i.d. samples, which unfortunately are not easy to come by in our field.

How Good is My Model? Part 2: LLN, CLT, i.i.d., and a Messy World

17 minute read

Published:

In the last post, I discussed how a model’s performance is more faithfully reported as a distribution than as a single value, why it’s important to report it mindfully so as to distinguish what is typical from what is expected, and how the standard deviation (σ) can quantify a model’s stability. However, I also noted that one distribution from a small test set is not guaranteed to reflect the model’s true distribution. In this post, I’ll discuss how to move past this bottleneck.

How Good is My Model? Part 1: µ ± σ and a Bit More

13 minute read

Published:

In my first blog post, I made two bold claims about transformer models for molecular property prediction in our review [1] — namely, that they fall short in terms of both novelty and benchmarking. In this and the following posts, I hope to convince you of these conclusions.

  1. Publicly accessible pre-print version here 

Sample quality

How Good is My Model? Part 2: LLN, CLT, i.i.d., and a Messy World

17 minute read

Published:

In the last post, I discussed how a model’s performance is more faithfully reported as a distribution than as a single value, why it’s important to report it mindfully so as to distinguish what is typical from what is expected, and how the standard deviation (σ) can quantify a model’s stability. However, I also noted that one distribution from a small test set is not guaranteed to reflect the model’s true distribution. In this post, I’ll discuss how to move past this bottleneck.

Sample size

How Good is My Model? Part 2: LLN, CLT, i.i.d., and a Messy World

17 minute read

Published:

In the last post, I discussed how a model’s performance is more faithfully reported as a distribution than as a single value, why it’s important to report it mindfully so as to distinguish what is typical from what is expected, and how the standard deviation (σ) can quantify a model’s stability. However, I also noted that one distribution from a small test set is not guaranteed to reflect the model’s true distribution. In this post, I’ll discuss how to move past this bottleneck.

How Good is My Model? Part 1: µ ± σ and a Bit More

13 minute read

Published:

In my first blog post, I made two bold claims about transformer models for molecular property prediction in our review [1] — namely, that they fall short in terms of both novelty and benchmarking. In this and the following posts, I hope to convince you of these conclusions.

  1. Publicly accessible pre-print version here 

Small Datasets

How Good is My Model? Part 3: Truce with Small Datasets.

15 minute read

Published:

In the last two posts, I was trying to answer a question, only to find it entangled with the limitations of reality. To know how good a model is, I need to pin down the true distribution of its performance—to see what is common and what is rare. However, this requires a large number of i.i.d. samples, which unfortunately are not easy to come by in our field.

Standards

How Good is My Model? Part 5: When cross-validation went rogue!

32 minute read

Published:

In the last technical post, I talked about how to tell when one is ready to start comparing models. I found that I needed to satisfy some conditions before concluding that my model is suitable for my data and representation. Now, assuming that I have such a suitable model and want to compare it to other suitable models—or that I found no such model and just want to see which of my suboptimal models is the least suboptimal—is cross-validation the next logical step?

Short answer: Not as we use it today!

How Good is My Model? Part 4: To Compare, or Not to Compare

25 minute read

Published:

In this post, I return to the “How Good is My Model?” lane and continue the journey. However, since the next stop is “Cross-Validation land,” which in my field is mainly about model comparison, one needs to pass a sanity check before moving on to comparison. This check should indicate whether I am ready to start comparing models—or not yet.

How Good is My Model? Part 3: Truce with Small Datasets.

15 minute read

Published:

In the last two posts, I was trying to answer a question, only to find it entangled with the limitations of reality. To know how good a model is, I need to pin down the true distribution of its performance—to see what is common and what is rare. However, this requires a large number of i.i.d. samples, which unfortunately are not easy to come by in our field.

How Good is My Model? Part 2: LLN, CLT, i.i.d., and a Messy World

17 minute read

Published:

In the last post, I discussed how a model’s performance is more faithfully reported as a distribution than as a single value, why it’s important to report it mindfully so as to distinguish what is typical from what is expected, and how the standard deviation (σ) can quantify a model’s stability. However, I also noted that one distribution from a small test set is not guaranteed to reflect the model’s true distribution. In this post, I’ll discuss how to move past this bottleneck.

How Good is My Model? Part 1: µ ± σ and a Bit More

13 minute read

Published:

In my first blog post, I made two bold claims about transformer models for molecular property prediction in our review [1] — namely, that they fall short in terms of both novelty and benchmarking. In this and the following posts, I hope to convince you of these conclusions.

  1. Publicly accessible pre-print version here 

The Beginning: Something is Broken!

8 minute read

Published:

So, my PhD topic is to implement transformer models for toxicity prediction and molecular generation. Or, in plain English: given a molecule (e.g., a drug), I want to train an “AI” model to predict its possible toxic effects on the body and the environment. Even better, to create new molecules that will be safe biologically and environmentally. However, in this post, I will show you how I had to make a major detour before I could even attempt to do so!

Statistics

How Good is My Model? Part 5: When cross-validation went rogue!

32 minute read

Published:

In the last technical post, I talked about how to tell when one is ready to start comparing models. I found that I needed to satisfy some conditions before concluding that my model is suitable for my data and representation. Now, assuming that I have such a suitable model and want to compare it to other suitable models—or that I found no such model and just want to see which of my suboptimal models is the least suboptimal—is cross-validation the next logical step?

Short answer: Not as we use it today!

How Good is My Model? Part 4: To Compare, or Not to Compare

25 minute read

Published:

In this post, I return to the “How Good is My Model?” lane and continue the journey. However, since the next stop is “Cross-Validation land,” which in my field is mainly about model comparison, one needs to pass a sanity check before moving on to comparison. This check should indicate whether I am ready to start comparing models—or not yet.

How Good is My Model? Part 3: Truce with Small Datasets.

15 minute read

Published:

In the last two posts, I was trying to answer a question, only to find it entangled with the limitations of reality. To know how good a model is, I need to pin down the true distribution of its performance—to see what is common and what is rare. However, this requires a large number of i.i.d. samples, which unfortunately are not easy to come by in our field.

How Good is My Model? Part 2: LLN, CLT, i.i.d., and a Messy World

17 minute read

Published:

In the last post, I discussed how a model’s performance is more faithfully reported as a distribution than as a single value, why it’s important to report it mindfully so as to distinguish what is typical from what is expected, and how the standard deviation (σ) can quantify a model’s stability. However, I also noted that one distribution from a small test set is not guaranteed to reflect the model’s true distribution. In this post, I’ll discuss how to move past this bottleneck.

How Good is My Model? Part 1: µ ± σ and a Bit More

13 minute read

Published:

In my first blog post, I made two bold claims about transformer models for molecular property prediction in our review [1] — namely, that they fall short in terms of both novelty and benchmarking. In this and the following posts, I hope to convince you of these conclusions.

  1. Publicly accessible pre-print version here 

Transformers

The Beginning: Something is Broken!

8 minute read

Published:

So, my PhD topic is to implement transformer models for toxicity prediction and molecular generation. Or, in plain English: given a molecule (e.g., a drug), I want to train an “AI” model to predict its possible toxic effects on the body and the environment. Even better, to create new molecules that will be safe biologically and environmentally. However, in this post, I will show you how I had to make a major detour before I could even attempt to do so!

Trustworthy ML

How Good is My Model? Part 5: When cross-validation went rogue!

32 minute read

Published:

In the last technical post, I talked about how to tell when one is ready to start comparing models. I found that I needed to satisfy some conditions before concluding that my model is suitable for my data and representation. Now, assuming that I have such a suitable model and want to compare it to other suitable models—or that I found no such model and just want to see which of my suboptimal models is the least suboptimal—is cross-validation the next logical step?

Short answer: Not as we use it today!

How Good is My Model? Part 4: To Compare, or Not to Compare

25 minute read

Published:

In this post, I return to the “How Good is My Model?” lane and continue the journey. However, since the next stop is “Cross-Validation land,” which in my field is mainly about model comparison, one needs to pass a sanity check before moving on to comparison. This check should indicate whether I am ready to start comparing models—or not yet.

How Good is My Model? Part 3: Truce with Small Datasets.

15 minute read

Published:

In the last two posts, I was trying to answer a question, only to find it entangled with the limitations of reality. To know how good a model is, I need to pin down the true distribution of its performance—to see what is common and what is rare. However, this requires a large number of i.i.d. samples, which unfortunately are not easy to come by in our field.

How Good is My Model? Part 2: LLN, CLT, i.i.d., and a Messy World

17 minute read

Published:

In the last post, I discussed how a model’s performance is more faithfully reported as a distribution than as a single value, why it’s important to report it mindfully so as to distinguish what is typical from what is expected, and how the standard deviation (σ) can quantify a model’s stability. However, I also noted that one distribution from a small test set is not guaranteed to reflect the model’s true distribution. In this post, I’ll discuss how to move past this bottleneck.

How Good is My Model? Part 1: µ ± σ and a Bit More

13 minute read

Published:

In my first blog post, I made two bold claims about transformer models for molecular property prediction in our review [1] — namely, that they fall short in terms of both novelty and benchmarking. In this and the following posts, I hope to convince you of these conclusions.

  1. Publicly accessible pre-print version here 

Uncertainty

How Good is My Model? Part 3: Truce with Small Datasets.

15 minute read

Published:

In the last two posts, I was trying to answer a question, only to find it entangled with the limitations of reality. To know how good a model is, I need to pin down the true distribution of its performance—to see what is common and what is rare. However, this requires a large number of i.i.d. samples, which unfortunately are not easy to come by in our field.