Posts by Tags

Conditional probability maps of extreme temperatures

1 minute read

Published: January 01, 2016

So I created my first R Shiny app, and I obviously think it's pretty cool. Props to the Shiny team for making it so enjoyable to learn how to build these apps.

Normalize if you regularize

1 minute read

Published: August 01, 2017

There is a world of preprocessing methods from which a scientist has to choose. Each preprocessing task (cleaning, missing-data handling, standardizing, encoding, binning, binarizing and so on) involves further choices. Some of these choices depend on the data, some do not; some depend on the model, some do not. I want to talk about one such choice: z-score normalizing your training set before training a regularized model.
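One common reason the two go together: an L1 or L2 penalty shrinks every coefficient with the same strength, so a feature recorded in large units (which needs only a small coefficient) is penalized less than the same information recorded in small units. Standardizing each feature removes that dependence on units. Below is a minimal standard-library sketch; the helper names `fit_zscore` and `apply_zscore` and the toy data are hypothetical, and it assumes the mean and standard deviation are estimated on the training set only and then reused unchanged for test data.

```python
from statistics import mean, pstdev


def fit_zscore(rows):
    """Estimate per-feature mean and population standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means, stds


def apply_zscore(rows, means, stds):
    """Standardize rows with statistics taken from the training set only.

    Constant features (std == 0) are mapped to 0.0 to avoid division by zero.
    """
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical data: feature 0 in metres, feature 1 in millimetres.
train = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]]
test = [[2.5, 2500.0]]

means, stds = fit_zscore(train)
train_z = apply_zscore(train, means, stds)
test_z = apply_zscore(test, means, stds)  # reuse training statistics, never refit on test
```

After standardizing, both features of the test row land on the same scale even though their raw units differ by a factor of 1000, so a penalty on their coefficients treats them evenly.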

On normality tests

1 minute read

Published: January 01, 2017

I recently came across a very interesting problem. I wanted to verify whether some of my data could be assumed to be normally distributed, because the statistical methods I wanted to use assume normality. The obvious approach is to run a normality test such as Shapiro-Wilk or Anderson-Darling on the data and, if the null hypothesis is not rejected, go ahead with the normality assumption.
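The decision rule described above can be sketched in a few lines. To stay within the standard library, this sketch swaps in the Jarque-Bera test (based on sample skewness and kurtosis) instead of Shapiro-Wilk or Anderson-Darling; those two are available as `scipy.stats.shapiro` and `scipy.stats.anderson`. The function names and samples are hypothetical, the p-value uses the large-sample chi-squared approximation (unreliable for small samples), and the sample is assumed not to be constant.

```python
import math
import random


def jarque_bera(xs):
    """Return the Jarque-Bera statistic and its asymptotic p-value.

    Under the null hypothesis of normality the statistic is approximately
    chi-squared with 2 degrees of freedom, whose survival function is exp(-x / 2).
    Assumes xs is not constant (otherwise the variance is zero).
    """
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n  # biased central moments
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


def check_normality(xs, alpha=0.05):
    """Apply the naive decision rule: reject if p < alpha, otherwise proceed."""
    _, p = jarque_bera(xs)
    # Failing to reject only means the test found no evidence against normality;
    # it does not establish that the data are normal.
    return "reject normality" if p < alpha else "fail to reject"


rng = random.Random(0)
skewed_sample = [rng.expovariate(1.0) for _ in range(2000)]  # clearly non-normal
print(check_normality(skewed_sample))
```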

On balancing classes for an imbalanced class problem

1 minute read

Published: September 01, 2017

The common suggestion that balancing classes in an imbalanced class problem, either by oversampling the minority class or by undersampling the majority class, boosts accuracy is an over-generalization. In many cases it is simply not true unless the minority-class oversampling includes data augmentation. Intuitively, this is because the amount of information in your minority class is fixed even if you oversample it: you are only creating duplicates, which add no new information about where the decision boundary should lie. Note that this discussion is restricted to discriminative modeling.
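Plain random oversampling, sketched below with the standard library, makes the point concrete: the class counts end up balanced, but the minority class still contains only its original distinct points. The function name `random_oversample` and the toy data are hypothetical, and this is ordinary duplication with replacement, not augmentation.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate rows of smaller classes (sampling with replacement) until every
    class has as many rows as the largest class. No new points are created."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for i in rng.choices(idx, k=target - count):
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 6 majority rows (label 0), 2 minority rows (label 1).
X = [(0.1,), (0.2,), (0.3,), (0.4,), (0.5,), (0.6,), (2.0,), (2.5,)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
minority_unique = {x for x, lab in zip(X_bal, y_bal) if lab == 1}
print(Counter(y_bal), len(minority_unique))
```

The balanced set has six rows per class, yet the minority class is still described by the same two distinct points it started with.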

Industry vs. academia after a PhD

1 minute read

Published: November 01, 2020

I’ve been encouraged by colleagues to share my take on choosing between a position in industry and one in academia after graduating with a PhD. So here goes.

Non-technical reading

1 minute read

Published: January 29, 2023

As I have matured and started taking on more responsibility in my roles, I find myself digesting a lot of non-technical content that informs my views and beliefs. To complement my technical reading post, this is a list of non-technical reading that has shaped me. It is not in any particular order; I’ll add new entries in reverse chronological order, but as far as I’m concerned they are all timeless reads. The topics revolve around philosophy, management, organization, leadership, productivity, customer development, marketing, etc.

Technical reading

7 minute read

Published: November 01, 2020

This is a list of selected research works that I’ve read and liked, ordered from most to least recently read, much like a record of my stream of consciousness. I would like to think I could have co-authored some of these, and it is my dream that one day I might produce works like them.

Posts by Tags

Extreme event detection

Conditional probability maps of extreme temperatures

L1/L2 regularization

Normalize if you regularize

R Shiny

Conditional probability maps of extreme temperatures

Statistical tests

On normality tests

Test for normality

On normality tests

imbalanced classes

On balancing classes for an imbalanced class problem

industry vs. academia

Industry vs. academia after a PhD

reading list

Non-technical reading

Technical reading

z-score normalization

Normalize if you regularize