
Naive bayes #68

Merged: 10 commits merged into ddbourgin:master on Jun 23, 2021

Conversation

@sfsf9797 (Contributor) commented May 15, 2021:

  • What I did

Implemented a Gaussian naive Bayes class and basic documentation.

  • How I did it

Referred to the formula for Gaussian naive Bayes from the notes.

  • How to verify it

Compare the performance with GaussianNB from scikit-learn (see the sketch below).
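For reference, a minimal version of that check might look like the sketch below. The GaussianNBClassifier name and its usage are assumptions for illustration, not necessarily this repo's actual API.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# toy dataset for the comparison
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# reference accuracy from scikit-learn
ref = GaussianNB().fit(X, y)
print("sklearn accuracy:", np.mean(ref.predict(X) == y))

# hypothetical usage of the class added in this PR; the real import
# path and method names may differ
# mine = GaussianNBClassifier().fit(X, y)
# print("PR accuracy:", np.mean(mine.predict(X) == y))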

@ddbourgin (Owner) commented:

Amazing! At first glance this looks great, @sfsf9797! I'm pretty slammed with work right now, but am going to have a look at this this weekend.

@ddbourgin self-assigned this May 25, 2021
Comment on lines 112 to 118
prob = -self.n_features / 2 * np.log(2 * np.pi) - 0.5 * np.sum(
    np.log(sigma)
)
prob -= 0.5 * np.sum(np.power(X - mean, 2) / sigma, 1)

joint_log_likelihood = prior + prob
return joint_log_likelihood
@ddbourgin (Owner):

I think there are a few errors here.

  • The log Gaussian likelihood calc in prob isn't quite right (see later commit for details)
  • In the joint_log_likelihood = prior + prob line, you're adding a log-transformed vector (prob) to raw probabilities (prior), which doesn't make sense. I think what you want is np.log(prior) + prob (see the sketch below and my later commit for details).
  • This isn't a joint likelihood, right? You're computing the joint class likelihood in prob, but once you add in the prior, the result will be proportional to the log class posterior.
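As a tiny illustration of the second bullet, with toy stand-ins for the PR's prior and prob (a hedged sketch, not the repo's code):

import numpy as np

# toy stand-ins: `prior` is a raw class probability P(y = c); `prob`
# holds per-sample Gaussian log likelihoods log P(x_i | y = c)
prior = 0.3
prob = np.array([-4.2, -1.7])

# combine in log space; the result is proportional to the log class
# posterior, not a joint log likelihood
log_posterior = np.log(prior) + prob
print(log_posterior)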


def prob(self, X, mean, sigma, prior):
    """
    compute the joint log likelihood of data based on gaussian distribution
@ddbourgin (Owner) commented May 30, 2021:

Nomenclature: I'm not sure it's right to call this the joint log likelihood? That is, this function computes the unnormalized quantity P(y = c | X, mean_c, sigma_c), which I'd think is the class posterior.
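For reference, by Bayes' rule P(y = c | X) ∝ P(X | y = c) * P(y = c), so in log space the quantity computed here is log P(X | y = c) + log P(y = c), which equals the log class posterior up to an additive normalizing constant.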

prior = P["prior"][class_idx]
sigsq = P["sigma"][class_idx]

# log likelihood = log X | N(mu, sigsq)
@ddbourgin (Owner):

@sfsf9797 this is what I believe the proper log likelihood + log class posterior calc should be. Let me know if you agree!
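Based on the snippet above, the full per-class calculation is presumably something like the following sketch. It assumes P stores per-class "prior", "mean", and "sigma" (per-feature variance) arrays; the function name is made up for illustration.

import numpy as np

def log_class_posterior(X, P, class_idx):
    """Unnormalized log P(y = c | X) for a single class c."""
    mu = P["mean"][class_idx]
    prior = P["prior"][class_idx]
    sigsq = P["sigma"][class_idx]

    # log likelihood = log P(X | mu, sigsq), summed over the
    # conditionally independent feature dimensions
    log_likelihood = -0.5 * np.sum(np.log(2 * np.pi * sigsq))
    log_likelihood -= 0.5 * np.sum(((X - mu) ** 2) / sigsq, axis=1)

    # adding the log prior gives the unnormalized log class posterior
    return log_likelihood + np.log(prior)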

@ddbourgin (Owner) commented:

Hey @sfsf9797 - Thanks for the PR! I just had a more thorough look and committed a few changes. Brief summary:

  1. I moved this under the linear_models module rather than keeping it as a single hanging model.
  2. I think there might have been a few bugs in the original log likelihood calc. I've committed what I believe is the correct version, but please go through it and make sure you agree.
  3. I expanded the unit test you included -- comparing model accuracies is a good start (thanks!), but I think it actually masked some problems with the implementation. In particular, testing on multiple random cases revealed mismatches in accuracy between sklearn and the current implementation, and comparing the actual class probabilities (rather than just the predictions) revealed a bug in the log posterior calculation (see the sketch after this list).
  4. I expanded the documentation to provide a better overview of the model.

Please feel free to make adjustments or ask questions. Once we both agree on the implementation and are happy with the model performance, I'm happy to merge.
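For point 3, the stricter style of test could look like the sketch below; GaussianNBClassifier and its predict_proba are assumed names, not necessarily the repo's actual API.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# repeat over several random datasets so a single lucky seed can't
# mask a mismatch
for seed in range(5):
    X, y = make_classification(n_samples=200, n_features=8, random_state=seed)
    ref_proba = GaussianNB().fit(X, y).predict_proba(X)

    # hypothetical comparison against this PR's implementation:
    # mine_proba = GaussianNBClassifier().fit(X, y).predict_proba(X)
    # assert np.allclose(ref_proba, mine_proba, atol=1e-6)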

@sfsf9797 (Contributor, Author) commented:

Hi, thank you so much for all the feedback. I will go through it all and get back to you this weekend.

@sfsf9797 (Contributor, Author) commented:

Sorry, I am kind of busy these few weeks, but I will get back to you within two weeks at the latest.

@sfsf9797 (Contributor, Author) left a comment:

Yeah, true, naive Bayes definitely belongs under linear models when the likelihood factors p(x|c) come from exponential families.

@sfsf9797 (Contributor, Author) commented:

Hi @ddbourgin, thanks for correcting my implementation. I have learned a lot from you. I am pretty satisfied with the model after a few rounds of checking. Lastly, thank you for all the comments!

@ddbourgin merged commit 1e10697 into ddbourgin:master on Jun 23, 2021
@ddbourgin (Owner) commented:

Awesome, merged! Thanks for the PR @sfsf9797 :)

@ddbourgin mentioned this pull request on Sep 23, 2021