Those working with classification metrics often come across terms like true positive rate, false positive rate, recall, and precision, and it is useful to understand what they mean in depth.
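As a quick reference (my own sketch, not code from the article), these quantities can be written in terms of raw confusion-matrix counts:

```python
def rates(tp, fp, tn, fn):
    # True positive rate (a.k.a. recall or sensitivity), false
    # positive rate, and precision from confusion-matrix counts.
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    return tpr, fpr, precision

# e.g. a classifier that finds all 8 positives but also flags 2 of 92 negatives:
tpr, fpr, precision = rates(tp=8, fp=2, tn=90, fn=0)  # → 1.0, ~0.0217, 0.8
```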

There are many good sources out there explaining these terms, but true understanding only comes from sitting down and thinking things through. Furthermore, each of us has our own way of thinking about things, so there is no one-size-fits-all explanation. In this note I crystallize my understanding of these terms, for my own reference and for anyone who thinks like me.

Instead of introducing all these terms outright…

Looking for an explanation of what exactly goes on with class weights, I not only found nothing; on reverse-engineering the implementations in TensorFlow and PyTorch, I found that the two do not even agree. So in this note I will work through the math, taking a binary classification problem as the example. After that, I will also briefly discuss the multi-class classification problem.
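To fix notation before the math, here is a hand-rolled weighted binary cross-entropy; this is a sketch of the quantity under discussion, not either framework's exact implementation (which, as noted, disagree):

```python
import numpy as np

def weighted_bce(y_true, p_pred, w_pos=1.0, w_neg=1.0, eps=1e-12):
    # Per-sample binary cross-entropy with class weights: positive
    # examples are scaled by w_pos, negatives by w_neg, then averaged.
    p = np.clip(p_pred, eps, 1 - eps)
    losses = -(w_pos * y_true * np.log(p)
               + w_neg * (1 - y_true) * np.log(1 - p))
    return losses.mean()

# With w_pos = w_neg = 1 this reduces to ordinary binary cross-entropy.
loss = weighted_bce(np.array([1, 0, 1]), np.array([0.9, 0.1, 0.8]),
                    w_pos=2.0, w_neg=1.0)
```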

An imbalanced dataset is one in which the numbers of datapoints in the different classes are vastly different. …

Yesterday I wrote an article on why most Medium articles about the central limit theorem are misleading: they claim that, irrespective of the underlying distribution, the sample mean tends to a Gaussian for large sample sizes. This is incorrect, and the Cauchy distribution is a counterexample. I imagine this oversight originates from many folks not being fluent with fat-tailed distributions, and even fewer being familiar with distributions like the Cauchy, which do not have well-defined means and/or variances.

If I were to hazard a guess, I would say that most people are not aware of this…

Recently I have come across many articles on Medium claiming that the central limit theorem is very important for data scientists to know, and claiming to teach or exemplify the theorem, but doing so incorrectly.

The statements usually go like

the distribution of sample means of random variables - drawn not necessarily from a Gaussian - tends to a Gaussian when the sample size is sufficiently large

This is **simply incorrect**, and before explaining why, let me show you some code demonstrating how the above statement is violated:

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot …
```
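In the same spirit, here is a minimal sketch of the counterexample: the mean of n i.i.d. standard Cauchy draws is itself standard Cauchy, so its spread never shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# The interquartile range of the sample means stays near the Cauchy
# IQR of 2 no matter how large n gets; for a finite-variance
# distribution it would shrink like 1/sqrt(n).
for n in (10, 100, 10000):
    means = rng.standard_cauchy((2000, n)).mean(axis=1)
    q75, q25 = np.percentile(means, [75, 25])
    print(n, round(q75 - q25, 2))  # stays ~2 for every n
```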

Docker and Docker-Compose are great utilities that support the microservice paradigm by allowing efficient containerization. Within the Python ecosystem, the package manager Conda also allows a kind of containerization, limited to Python packages. Conda environments are especially handy for data scientists working in Jupyter notebooks that have different (and mutually exclusive) package dependencies.

However, due to the peculiar way in which Conda environments are set up, getting them to work out of the box in Docker, as it were, is not so straightforward. Furthermore, adding kernelspecs for these environments to Jupyter is another very useful but complicated step. …

Keras has simplified DNN-based machine learning a lot, and it *keeps getting better*. Here we show how to implement metrics based on the confusion matrix (recall, precision, and F1) and how simple they are to use in TensorFlow 2.2. You can run the notebook directly in Google Colab.
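The idea behind such metrics, sketched here in plain numpy rather than the Keras API the article uses:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    # Rows index the true class, columns the predicted class.
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_metrics(cm):
    # The diagonal holds each class's true positives; column sums are
    # predicted counts, row sums are actual counts.
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1

cm = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
precision, recall, f1 = per_class_metrics(cm)
```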

When considering a multi-class problem, it is often said that accuracy is not a good metric if the classes are imbalanced. While that is certainly true, **accuracy is also a bad metric when all classes do not train equally well, even if the datasets are balanced.**

In this article, I…

If you are reading this, you probably know that learning is done in batches (which for some reason are christened mini-batches), both for computational efficiency and to reduce excessive stochasticity in the gradient-descent path. If you are processing sequences, then in most cases the sequences will not all be of the same length, so to form batches you “0-pad” the sequences. This can be done at the beginning or the end of the sequences.
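For concreteness, end-of-sequence 0-padding might look like this (a hypothetical helper, for illustration only):

```python
import numpy as np

def pad_batch(sequences, value=0):
    # Pad variable-length sequences at the end so they form a
    # rectangular batch (end-padding; begin-padding would be the mirror).
    max_len = max(len(s) for s in sequences)
    batch = np.full((len(sequences), max_len), value)
    for i, s in enumerate(sequences):
        batch[i, : len(s)] = s
    return batch

batch = pad_batch([[1, 2, 3], [4, 5], [6]])
# → [[1, 2, 3], [4, 5, 0], [6, 0, 0]]
```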

Now if you learn on such mini-batches, your system will eventually learn to ignore the 0-padded entries, but it will waste learning…

A few weeks ago I was solving the Cartpole problem in reinforcement learning and found something very interesting. Actually, that's an understatement. It really amazed me that the AI algorithm found the solution I am going to describe before I thought of it. (I am a senior machine learning engineer at Lore AI as well as a string theorist with expertise in quantum information and black holes, so I don't know whether the above is praise for the AI or a sign that I am just getting rusty. The reader can judge for herself.)

OpenAI Gym provides several environments for people…

On January 23rd, China locked down Hubei province, when the cumulative Covid-19 caseload was 444 and cumulative fatalities were 17. In the following 28 days, the cumulative caseload and fatalities rose to 62,031 and 2,029 respectively.

The effects of the lockdown were not immediately visible because of the (now infamous) incubation period of the virus. People infected weeks before the lockdown kept developing symptoms and dying for weeks after it. When the dust settled, the cumulative numbers of cases and fatalities had increased more than 120-fold from the day of the lockdown. **…**

This post shows an implementation of convolutional neural networks (CNNs) using numpy. I made a similar post earlier, but that one focused more on explaining what convolution in general, and CNNs in particular, are; in this post the focus is more on *implementing them efficiently in numpy using vectorization*.
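To give a flavor of the kind of vectorization meant here (my own sketch, not the article's code), a 2-D "valid" convolution can avoid Python loops entirely:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(image, kernel):
    # Vectorized 2-D cross-correlation ("valid" mode): extract every
    # kernel-sized window as a view, then contract all windows against
    # the kernel in one einsum call instead of nested loops.
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((2, 2))
out = conv2d_valid(image, kernel)  # shape (3, 3); out[0, 0] = 0+1+4+5 = 10
```

(`sliding_window_view` requires numpy 1.20 or later; strictly speaking this is cross-correlation, the convention most deep-learning libraries also use under the name "convolution".)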

It is worth pointing out that, compared to packages like TensorFlow, PyTorch, etc., numpy offers no speed benefit in general. In fact, if one includes development time, it is most likely slower. The idea of the post is not…

Formless and shapeless pure consciousness masquerading as a machine learning researcher, a theoretical physicist and a quant.