Posts by Tags

Extreme event detection

L1/L2 regularization

Normalize if you regularize

1 minute read

Published:

There is a world of preprocessing methods from which a scientist has to choose. Each preprocessing task (cleaning, missing-data handling, standardizing, encoding, binning, binarizing, etc.) involves further choices. Some of these choices are data dependent, some not. Some are model dependent, some not. I want to talk about one such choice: z-score normalizing your training set before training.
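As a minimal sketch of what z-score normalizing a training set can look like (not the post's own code), the snippet below standardizes a single feature column using only the Python standard library. The helper names `fit_zscore` and `apply_zscore` and the toy values are hypothetical. The statistics come from the training data only and are then reused for held-out data. The link to regularization is that L1/L2 penalties act on coefficient magnitudes, and those magnitudes depend on the scale of each feature.

```python
from statistics import mean, pstdev

def fit_zscore(column):
    """Estimate mean and standard deviation from the training column only."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        sigma = 1.0  # constant feature: avoid division by zero
    return mu, sigma

def apply_zscore(column, mu, sigma):
    """Standardize a column with previously fitted statistics."""
    return [(x - mu) / sigma for x in column]

# Hypothetical toy feature on a large scale
train = [100.0, 300.0, 500.0]
test = [200.0, 700.0]

mu, sigma = fit_zscore(train)
train_z = apply_zscore(train, mu, sigma)
test_z = apply_zscore(test, mu, sigma)  # reuse training statistics, do not refit
```

After this step the training column has mean 0 and unit standard deviation, so a penalty on its coefficient is comparable to the penalty on any other standardized feature.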

R Shiny

Statistical tests

On normality tests

1 minute read

Published:

I recently came across a very interesting problem. I wanted to verify whether some of my data could be assumed to be normally distributed, since I wanted to use statistical methods that assume normality. One would assume that you could just run a statistical test for normality, such as the Shapiro-Wilk or Anderson-Darling test, on the data and, if the null hypothesis is not rejected, go ahead with the normality assumption.

Test for normality

On normality tests

1 minute read

Published:

I recently came across a very interesting problem. I wanted to verify whether some of my data could be assumed to be normally distributed, since I wanted to use statistical methods that assume normality. One would assume that you could just run a statistical test for normality, such as the Shapiro-Wilk or Anderson-Darling test, on the data and, if the null hypothesis is not rejected, go ahead with the normality assumption.

imbalanced classes

On balancing classes for an imbalanced class problem

1 minute read

Published:

The common suggestion that balancing classes in an imbalanced class problem boosts accuracy, either through oversampling the minority class or undersampling the majority class, is an over-generalization. In many cases, this is simply not true unless the minority class oversampling process includes data augmentation. Intuitively, this is because the amount of information in your minority class is fixed even if you oversample it (you’re just creating duplicates, which do not change the decision boundary). I realize this discussion is restricted to discriminative modeling.
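To illustrate the point about duplicates, here is a small sketch under stated assumptions: the function `oversample_minority` and the toy data are hypothetical, and the code only shows that naive random oversampling balances the class counts without adding any new distinct minority points. It does not fit a model or measure accuracy.

```python
import random

def oversample_minority(samples, labels, minority_label, seed=0):
    """Naively balance classes by re-drawing existing minority samples."""
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    majority_count = sum(1 for y in labels if y != minority_label)
    # Draw duplicates until the minority count matches the majority count
    extra = [rng.choice(minority) for _ in range(majority_count - len(minority))]
    return samples + extra, labels + [minority_label] * len(extra)

# Hypothetical toy data: four majority points (label 0), two minority points (label 1)
samples = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (0.5, 0.5), (2.0, 2.1), (2.2, 1.9)]
labels = [0, 0, 0, 0, 1, 1]

new_samples, new_labels = oversample_minority(samples, labels, minority_label=1)

unique_before = {s for s, y in zip(samples, labels) if y == 1}
unique_after = {s for s, y in zip(new_samples, new_labels) if y == 1}
```

The classes are balanced after the call, yet `unique_after` equals `unique_before`: the set of distinct minority observations is unchanged.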

industry vs. academia

Industry vs. academia after a PhD

1 minute read

Published:

I’ve been encouraged by colleagues to share my take on choosing a position in industry or academia after graduating with a PhD. So here goes.

reading list

Non-technical reading

1 minute read

Published:

As I grow in maturity and have started taking on more responsibility in my roles, I realize that I’m starting to digest a lot of non-technical content that informs my views and beliefs. To complement my technical reading post, this is a list of non-technical reading that has shaped me. It isn’t necessarily in a particular order, and while I’ll add to it in reverse chronological order, they are all timeless reads as far as I’m concerned. The topics revolve around philosophy, management, organization, leadership, productivity, customer development, marketing, etc.

Technical reading

5 minute read

Published:

This is a list of select research works that I’ve read and like from most to least recently read, much like a communication of my stream of consciousness. I would like to think that I could have collaborated to write some of these and it is my dream that one day I might produce works like these.

z-score normalization

Normalize if you regularize

1 minute read

Published:

There is a world of preprocessing methods from which a scientist has to choose. Each preprocessing task (cleaning, missing-data handling, standardizing, encoding, binning, binarizing, etc.) involves further choices. Some of these choices are data dependent, some not. Some are model dependent, some not. I want to talk about one such choice: z-score normalizing your training set before training.