Thanks for creating NaturalNode!
I am using your Bayes classifier in my project. While looking into the implementation, I found that it applies smoothing when calculating the probabilities.
Smoothing words that appear only in the test set skews the probability towards whichever class has the fewest features. For instance:
say smoothing === 1, class A has 2 features, and class B has 3. For an unknown word, (0 + 1) / 2 is bigger than (0 + 1) / 3, so A wins.
I understand that smoothing may be useful for the training set, but is it really necessary for the test set? Why not just discard the tokens that are not in classFeatures[label]?
while (i--) {
  if (observation[i]) {
    var count = this.classFeatures[label][i] || this.smoothing;
    // numbers are tiny, add logs rather than take product
    prob += Math.log(count / this.classTotals[label]);
  }
}
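To make the skew concrete, here is a minimal sketch (hypothetical data and function names, not the natural API) of what the quoted loop does with a single unknown token, alongside the proposed alternative of skipping tokens missing from classFeatures[label]:

```javascript
var smoothing = 1;
var classTotals = { A: 2, B: 3 };       // class A has 2 features, class B has 3
var classFeatures = { A: {}, B: {} };   // the test token is unknown to both classes

// Mirrors the quoted loop: an unknown token falls back to `smoothing`.
function smoothedLogProb(label) {
  var count = classFeatures[label].unknown || smoothing;
  return Math.log(count / classTotals[label]);
}

// Proposed alternative: unknown tokens contribute nothing to either class.
function skippingLogProb(label) {
  var count = classFeatures[label].unknown;
  return count ? Math.log(count / classTotals[label]) : 0;
}

// log(1/2) > log(1/3), so A scores higher purely because it has fewer features.
console.log(smoothedLogProb('A') > smoothedLogProb('B')); // true
// With skipping, the unknown token is neutral for both classes.
console.log(skippingLogProb('A') === skippingLogProb('B')); // true
```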