
Persistence tests fail for LogisticRegression et al. with multiclass classification #233

@BenjaminBossan

Right now, when testing persistence of classifiers, we create a binary classification task, and all classifiers pass. However, when switching to a multiclass classification task, LogisticRegression and related estimators fail (e.g. CalibratedClassifierCV, which uses LogisticRegression under the hood by default).

To reproduce, replace the following lines:

X, y = make_classification(
    n_samples=N_SAMPLES, n_features=N_FEATURES, random_state=0
)

by these lines:

X, y = make_classification(
    n_samples=N_SAMPLES,
    n_features=N_FEATURES,
    random_state=0,
    n_classes=3,
    n_redundant=1,
    n_informative=N_FEATURES - 1,
)

(Note that the specific values of n_redundant and n_informative are irrelevant here; they only need to be changed so that make_classification accepts three classes.)
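As a standalone illustration of why those two arguments have to change, here is a minimal sketch. The values of N_SAMPLES and N_FEATURES are placeholders assumed for this example, not necessarily the ones used in the test suite. With the defaults (n_informative=2, n_clusters_per_class=2), make_classification requires n_classes * n_clusters_per_class <= 2**n_informative, which three classes violate.

```python
from sklearn.datasets import make_classification

# Placeholder values, assumed for illustration only.
N_SAMPLES = 50
N_FEATURES = 20

# With default n_informative=2 and n_clusters_per_class=2, three classes
# need 3 * 2 = 6 clusters, but only 2**2 = 4 are available.
try:
    make_classification(
        n_samples=N_SAMPLES, n_features=N_FEATURES, random_state=0, n_classes=3
    )
    defaults_work = True
except ValueError:
    defaults_work = False

# Raising n_informative lifts that limit. n_redundant is lowered so that
# n_informative + n_redundant still fits into N_FEATURES.
X, y = make_classification(
    n_samples=N_SAMPLES,
    n_features=N_FEATURES,
    random_state=0,
    n_classes=3,
    n_redundant=1,
    n_informative=N_FEATURES - 1,
)
```

Any other combination that satisfies both constraints should work equally well for reproducing the failure.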

The error is that the contiguity flags of the coef_ attribute differ between the original and the loaded estimator. Strangely enough, it is the original estimator that seems to be "wrong":

>>> estimator.coef_.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

>>> loaded.coef_.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
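A NumPy-only sketch of how such a flag mismatch can be checked programmatically. The arrays here are hypothetical stand-ins, not taken from the estimator: a column-sliced view is one way to get an array that is neither C- nor F-contiguous and does not own its data, and whether this is what happens inside LogisticRegression is not established by this example. The copy stands in for an array that was rebuilt on loading.

```python
import numpy as np

# Hypothetical stand-in for a coefficient matrix: 3 rows, 5 columns,
# with the last column sliced away afterwards.
full = np.arange(15, dtype=np.float64).reshape(3, 5)
view = full[:, :-1]    # a view that shares memory with `full`
copy = np.array(view)  # stand-in for an array rebuilt after loading


def flag_summary(arr):
    """Return the flags that differ in the output shown above."""
    names = ("C_CONTIGUOUS", "F_CONTIGUOUS", "OWNDATA")
    return {name: bool(arr.flags[name]) for name in names}


view_flags = flag_summary(view)
copy_flags = flag_summary(copy)

# The values are identical even though the memory layout differs.
same_values = np.array_equal(view, copy)

# With a single row, the same kind of slice stays contiguous.
single_row = np.arange(5, dtype=np.float64).reshape(1, 5)[:, :-1]
single_row_flags = flag_summary(single_row)
```

Comparing values with np.array_equal passes in this sketch, while comparing the flags does not, which mirrors the kind of failure described here.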

I haven't investigated further, but I suspect that LogisticRegression uses a different algorithm under the hood when dealing with binary classification, which would explain why the failure only occurs in the multiclass setting. EDIT: See below, that's not the reason.

Labels: bug (Something isn't working), persistence (Secure persistence feature)