Is smoothing really needed for prob calc in bayes_classifier? #12

@jiabinf

Thanks for creating NaturalNode!

I am using your Bayes classifier in my project. While looking into the implementation, I found that it adds smoothing when calculating the probabilities.

Applying this smoothing to unknown words in the test set skews the probability towards whichever class has the fewest features. For instance:

Say smoothing === 1, class A has 2 features, and class B has 3. For a word that neither class has seen, (0 + 1) / 2 is bigger than (0 + 1) / 3, so A wins.
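To make the arithmetic concrete, here is a minimal sketch using the numbers from the example. The variable names and the `unseenContribution` helper are made up for illustration and are not part of the library.

```javascript
// Hypothetical numbers from the example above, not taken from the library.
var smoothing = 1;
var classTotals = { A: 2, B: 3 };

// Log-probability contribution of one token that the given class has never seen.
function unseenContribution(label) {
    return Math.log(smoothing / classTotals[label]);
}

var contribA = unseenContribution('A'); // log(1 / 2)
var contribB = unseenContribution('B'); // log(1 / 3)

// The class with the smaller total gets the larger (less negative) contribution.
console.log(contribA > contribB);
```

Every additional unknown token widens the gap by the same amount, so the effect grows with the number of unseen words in the observation.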

I understand that smoothing may be useful for the training set, but is it really necessary for the test set? Why not just discard the tokens that are not in classFeatures[label]?

    while(i--) {
        if(observation[i]) {
            var count = this.classFeatures[label][i] || this.smoothing;
            // numbers are tiny, add logs rather than take product
            prob += Math.log(count / this.classTotals[label]);
        }
    }
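A rough sketch of the alternative I have in mind is below. It is a standalone function rather than a patch: `logProbSkippingUnseen` is a hypothetical name, the parameters mirror the fields used in the snippet above, and the data is made up, so treat it as an illustration under those assumptions.

```javascript
// Standalone sketch; this function is hypothetical and not part of the library.
function logProbSkippingUnseen(observation, classFeatures, classTotals, label) {
    var prob = 0;
    var i = observation.length;
    while (i--) {
        var count = classFeatures[label][i];
        // Only tokens that are present in the observation AND were seen for
        // this label during training contribute; unseen tokens are discarded.
        if (observation[i] && count) {
            // numbers are tiny, add logs rather than take product
            prob += Math.log(count / classTotals[label]);
        }
    }
    return prob;
}

// Made-up data: feature 2 appears in the observation but was never seen by either class.
var classFeatures = { A: [1, 1, 0], B: [1, 2, 0] };
var classTotals = { A: 2, B: 3 };
var observation = [1, 0, 1];

var probA = logProbSkippingUnseen(observation, classFeatures, classTotals, 'A');
var probB = logProbSkippingUnseen(observation, classFeatures, classTotals, 'B');
```

Here the unseen feature adds nothing to either score, so only feature 0 is counted. One caveat with this variant: a class that has seen fewer of the observed tokens accumulates fewer negative log terms, so it may need a different rule, such as discarding only tokens unseen by every class.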
