Mini-batching time cost #4

Firstly, this is a great library for clustering unit-normed feature spaces fast and coherently!

I have a question about mini-batching. I expected each call to partial_fit to take roughly the same amount of time per mini-batch (or even less on subsequent batches). Instead, each mini-batch takes longer than the last, with the time growing roughly linearly in the number of batches already processed. This is on real, structured data with hierarchies and clusters to be found, not on randomly generated matrices.

Pseudo-behaviour:
First 10,000 data points: 10 seconds to run
Second 10,000 data points: 20 seconds to run
Third 10,000 data points: 30 seconds to run
Total time taken for 30,000 data points: 60 seconds
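
For anyone wanting to reproduce this, here is a minimal sketch of how the per-batch timings could be measured, plus the arithmetic behind the 60-second total. It assumes only that the model exposes a partial_fit method taking one batch; the exact signature in this library may differ. DummyIncremental, time_partial_fit and projected_total are hypothetical names made up for illustration and are not part of the library.

```python
import time


def time_partial_fit(model, batches):
    """Return the wall-clock seconds spent in each partial_fit call."""
    durations = []
    for batch in batches:
        start = time.perf_counter()
        model.partial_fit(batch)  # assumed signature: one batch per call
        durations.append(time.perf_counter() - start)
    return durations


def projected_total(first_batch_seconds, n_batches):
    """Total time if batch k costs k times the first batch (linear growth)."""
    return first_batch_seconds * n_batches * (n_batches + 1) / 2


class DummyIncremental:
    """Hypothetical stand-in exposing partial_fit; NOT the library's class."""

    def __init__(self):
        self.n_seen = 0

    def partial_fit(self, batch):
        # Only counts points; replace with the real model to measure it.
        self.n_seen += len(batch)
        return self


if __name__ == "__main__":
    batches = [[[0.0, 1.0]] * 10 for _ in range(3)]
    print(time_partial_fit(DummyIncremental(), batches))
    print(projected_total(10, 3))  # 10 + 20 + 30 = 60.0
```

If the per-batch cost really does grow linearly, the total cost grows quadratically with the number of batches, which is what makes this matter for larger datasets.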

Is this expected behaviour? As a side note, I found and could follow the implementation details in the accompanying SCC paper, but I cannot find any details about how mini-batching is handled.
