Page Blocks Classification Data Set
The 5473 examples come from 54 distinct documents. Each observation concerns one block. All attributes are numeric. Data are in a format readable by C4.5.
| Property | Value |
|---|---|
| Data Set Characteristics | Multivariate |
| Attribute Characteristics | Integer, Real |
| Number of Attributes | 10 |
| Number of Instances | 5473 |
| Associated Tasks | Classification |
- Original Owner: Donato Malerba, Dipartimento di Informatica, University of Bari
- Donor: Donato Malerba
Accuracy is measured on the test subset (30% of instances).
| Model | Accuracy | Training Time (mm:ss.ms) |
|---|---|---|
| Decision Tree (scikit-learn) | 0.9622 | 00:00.035 |
| Decision Tree (from scratch) | 0.9608 | 05:58.906 |
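
The comparison above could be produced with a small hold-out harness along the following lines. This is a minimal sketch using only the standard library, not the code that generated the table. The names `holdout_split`, `accuracy`, `format_duration`, and `evaluate` are hypothetical, the fixed seed is an assumption, and the time format is assumed to be minutes, seconds, and milliseconds.

```python
import random
import time


def holdout_split(n_rows, test_fraction=0.3, seed=0):
    """Shuffle row indices and return (train_indices, test_indices)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def format_duration(seconds):
    """Format a duration as mm:ss.mmm (assumed layout of the table)."""
    total_ms = round(seconds * 1000)
    minutes, rem = divmod(total_ms, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def evaluate(model, X, y, test_fraction=0.3, seed=0):
    """Fit `model` on the training rows, timing only the fit call.

    `model` is any object exposing fit(X, y) and predict(X).
    Returns (test accuracy, formatted training time).
    """
    train_idx, test_idx = holdout_split(len(X), test_fraction, seed)
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]

    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start

    return accuracy(y_test, model.predict(X_test)), format_duration(elapsed)
```

Any classifier with `fit` and `predict` methods can be passed as `model`, so both trees can share the same split. Timing only the `fit` call keeps prediction cost out of the training-time column.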
- Add support for missing (or unseen) attributes
- Prune the tree to prevent overfitting
- Add support for regression
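
For the regression item, one common approach is to keep the tree structure and swap the split criterion for one based on squared error. The sketch below is an assumption about how that could look, not part of the existing implementation. It searches a single numeric attribute for the threshold that minimizes the summed squared error of the two resulting groups, and the names `sse` and `best_regression_split` are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(xs, ys):
    """Find the threshold on one numeric attribute that minimizes SSE.

    Rows with x <= threshold go left, the rest go right.
    Returns (threshold, total_sse); threshold is None when no split
    improves on leaving the node unsplit.
    """
    pairs = sorted(zip(xs, ys))
    best_threshold = None
    best_cost = sse([y for _, y in pairs])

    for i in range(1, len(pairs)):
        left_x, right_x = pairs[i - 1][0], pairs[i][0]
        if left_x == right_x:
            continue  # no threshold can separate identical attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            # Midpoint between neighbouring values, as is usual for numeric splits.
            best_threshold = (left_x + right_x) / 2
            best_cost = cost

    return best_threshold, best_cost
```

A leaf in such a tree would predict the mean target of its training rows instead of the majority class. Recomputing `sse` for every candidate keeps the sketch short but is quadratic in the number of rows; running sums would avoid that.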