- [ ] getml-io
  - [ ] Serialize a getML pipeline and its data to disk as a standardized data structure
  - [ ] Project
    - [x] DataFrames & metadata into Parquet + custom metadata
      - [x] Raw (input) data: everything that goes into fit & predict → `Container`
      - [x] Transforms (features) [check iter-batches] → `pipeline.transform(container.split)`
      - [x] Predictions → `pipeline.predict(container.split)`
    - [ ] **Not** statistics → `getml_agents.context`
  - [ ] Pipeline
    - [x] Data model
      - [x] Roles
      - [x] Joins
        - [x] on
        - [x] timestamps
        - [x] memory
        - [x] horizon
        - [x] relationship
    - [x] Feature learner
      - [ ] Hyperparams
    - [ ] Predictions
    - [ ] Root hyperparams
    - [ ] Scores
    - [ ] Features
    - [ ] Columns
    - [ ] Tables
  - [ ] Project Plan
    - [ ] Write issues / tasks
      - [x] Takes project name, pipeline ID / container ID
      - [x] Configurable output directory
      - [x] Connect to getML, set project and container
    - [ ] Dependency injection
    - [ ] Unit tests
    - [ ] Integration tests

---

- [ ] Problems
  - [ ] "Raw (input) data: everything that goes into fit & predict"
    - Scenario A) Inputs come from a Container
      - ✅ DataFrames inside the Container are serialized
    - Scenario B) Inputs are plain DataFrames
      - ❓ Save all DataFrames → `getml.data.list_data_frames()`
      - ❓ Unknown which of those DataFrames are used for what, or whether they are used at all
    - Scenario C) Inputs are mixed: some "raw" DataFrames, some from a Container
      - ✅ The ones from the Container are already saved
      - ⭕ The "raw" ones are missing
      - ❓ Use `getml.data.list_data_frames()` to select and serialize the DataFrames not in the Container
      - ❓ Unknown which of those DataFrames are used for what, or whether they are used at all
    - ❗ Currently, we always expect a Container and assume that all used DataFrames live inside it.
      - ✅ Scenario A
      - ❌ Scenario B
      - ❌ Scenario C
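One way to approach the ❓ items under Scenario C is a plain set difference between the DataFrames the project knows about and those already covered by the Container. The sketch below is a minimal, hypothetical helper: `select_raw_candidates` and `InputSelection` are illustrative names, not part of getML. It assumes that `getml.data.list_data_frames()` returns a dict with `"in_memory"` and `"on_disk"` name lists, and that the names of the Container's DataFrames are obtained separately (how is left open here). The listing is passed in rather than fetched, in line with the dependency-injection item, so the logic can be unit-tested without a running engine.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Mapping, Sequence


@dataclass(frozen=True)
class InputSelection:
    """Hypothetical summary of which DataFrames a serializer would write."""
    from_container: frozenset[str]  # Scenario A: serialized together with the Container
    raw_candidates: frozenset[str]  # Scenario B/C: not in the Container, usage unknown

    @property
    def container_only(self) -> bool:
        # True when every DataFrame known to the project is covered by the Container.
        return not self.raw_candidates


def select_raw_candidates(
    container_df_names: Iterable[str],
    listed: Mapping[str, Sequence[str]],
) -> InputSelection:
    """Split project DataFrames into Container-covered ones and the rest.

    `listed` is ASSUMED to have the shape of getml.data.list_data_frames(),
    i.e. {"in_memory": [...], "on_disk": [...]}; it is injected rather than
    fetched here so the function stays testable without a getML engine.
    """
    in_container = frozenset(container_df_names)
    # A DataFrame may be listed both in memory and on disk; the union de-duplicates it.
    in_project = frozenset(listed.get("in_memory", ())) | frozenset(listed.get("on_disk", ()))
    return InputSelection(
        from_container=in_container,
        raw_candidates=in_project - in_container,
    )


if __name__ == "__main__":
    listed = {"in_memory": ["population", "peripheral", "scratch"], "on_disk": ["peripheral"]}
    selection = select_raw_candidates(["population", "peripheral"], listed)
    print(sorted(selection.raw_candidates))  # ['scratch']
```

The difference only yields *candidates*: it cannot tell whether the pipeline actually used `scratch`, so the open ❓ ("unknown which of those DataFrames are used for what") remains and would still need another source of truth.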