getML-IO - Project #4

@Urfoex

  • getml-io

    • Serialize a getML pipeline and its data to disk in a standardized data structure
  • Project

    • DataFrames & metadata → Parquet + custom metadata
    • raw (input) data: everything that goes into fit & predict → Container
    • transforms (features) [check iter-batches] → pipeline.transform(container.split)
    • predictions → pipeline.predict(container.split)
    • NOT statistics → getml_agents.context
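
One way the "Parquet + custom metadata" item could look: Parquet file-level key-value metadata is stored as byte strings, so structured metadata is often JSON-encoded under a single key. The sketch below is only an assumption about the layout; the key name `getml_io` and the example fields are hypothetical and not taken from this issue. With pyarrow, the resulting mapping could be attached to a table via `Table.replace_schema_metadata`.

```python
import json

# Hypothetical key under which getml-io metadata would be stored.
METADATA_KEY = b"getml_io"


def encode_metadata(metadata: dict) -> dict[bytes, bytes]:
    """Pack a metadata dict into a bytes-to-bytes mapping, as Parquet expects."""
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return {METADATA_KEY: payload}


def decode_metadata(raw: dict[bytes, bytes]) -> dict:
    """Recover the metadata dict; return {} if the key is absent."""
    payload = raw.get(METADATA_KEY)
    if payload is None:
        return {}
    return json.loads(payload.decode("utf-8"))


# Illustrative values only; the field names are assumptions.
example = {"name": "population", "roles": {"target": ["y"], "join_key": ["id"]}}
encoded = encode_metadata(example)
roundtrip = decode_metadata(encoded)
```

Keeping everything under one key means a reader that does not know about getml-io can still open the Parquet file and simply ignore the extra entry.
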
  • Pipeline

    • data model
      • roles
      • joins
        • on
        • timestamps
        • memory
        • horizon
        • relationship
    • feature learner
    • hyperparams
    • predictions
    • root hyperparams
    • scores
    • features
    • columns
    • tables
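
The data model part of the list above could be captured in plain dataclasses before it is written to disk. This is a minimal sketch, not the agreed format: the class names `Join` and `Table`, the field types, and the default relationship value are assumptions made for illustration; only the field names (roles, joins, on, timestamps, memory, horizon, relationship) come from the list.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Join:
    """One join in the data model; field names mirror the list above."""

    right: str                              # name of the joined table (assumed field)
    on: list[str]
    timestamps: Optional[list[str]] = None
    memory: Optional[float] = None
    horizon: Optional[float] = None
    relationship: str = "many-to-many"      # default chosen for illustration


@dataclass
class Table:
    """A table with its column roles and outgoing joins."""

    name: str
    roles: dict[str, list[str]] = field(default_factory=dict)
    joins: list[Join] = field(default_factory=list)


def to_json(table: Table) -> str:
    """Serialize the table, including nested joins, to a JSON string."""
    return json.dumps(asdict(table), indent=2, sort_keys=True)


population = Table(
    name="population",
    roles={"target": ["y"]},
    joins=[Join(right="orders", on=["customer_id"], timestamps=["ts"], horizon=0.0)],
)
serialized = to_json(population)
```
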
  • Project Plan

    • Write issues / tasks
    • Take project name and pipeline id / container id as input
    • Make the output directory configurable
    • Connect to getML, set project and container
    • Dependency injection
    • Unit tests
    • Integration tests
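
The dependency-injection and unit-test items in the plan could fit together roughly as follows. This is a sketch under assumptions: the `Engine` protocol, `ExportRequest`, `plan_export`, and the directory layout are all hypothetical names invented here, and the real getML connection would sit behind the same interface as the fake used for testing.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Protocol


class Engine(Protocol):
    """Hypothetical seam around the getML connection."""

    def set_project(self, name: str) -> None: ...

    def load_container(self, container_id: str) -> dict[str, list[str]]: ...


@dataclass
class ExportRequest:
    project: str
    container_id: str
    output_dir: Path


def plan_export(engine: Engine, request: ExportRequest) -> list[Path]:
    """Return the Parquet paths that would be written, one per DataFrame."""
    engine.set_project(request.project)
    subsets = engine.load_container(request.container_id)
    base = request.output_dir / request.project / request.container_id
    paths = []
    for subset in sorted(subsets):
        for name in sorted(subsets[subset]):
            paths.append(base / subset / f"{name}.parquet")
    return paths


class FakeEngine:
    """In-memory stand-in, so unit tests need no running getML instance."""

    def __init__(self, containers: dict[str, dict[str, list[str]]]) -> None:
        self.containers = containers
        self.project: Optional[str] = None

    def set_project(self, name: str) -> None:
        self.project = name

    def load_container(self, container_id: str) -> dict[str, list[str]]:
        return self.containers[container_id]
```

Because `plan_export` receives the engine as an argument, unit tests can pass a `FakeEngine`, while integration tests would pass an adapter around the real connection.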


  • Problems
    • → "raw (input) data: everything that goes into fit & predict"
      • Scenario A) Input comes from Container
        • ✅ DataFrames inside Container are serialized
      • Scenario B) Inputs are just DataFrames
        • ❓ Save all DataFrames → getml.data.list_data_frames()
        • ❓ Unknown which of those DataFrames are used for what, or whether they are used at all
      • Scenario C) Inputs are mixed, some "raw" DataFrames, some from Container
        • ✅ the ones from Container are already saved
        • ⭕ the "raw" ones are missing
        • ❓ use getml.data.list_data_frames() to select and serialize DataFrames not in Container
        • ❓ Unknown which of those DataFrames are used for what, or whether they are used at all
      • ❗ → Currently, we always expect a Container and assume that all DataFrames in use live inside it.
        • ✅ → Scenario A
        • ❌ → Scenario B
        • ❌ → Scenario C
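
For Scenario C, the selection step could be a simple set difference between the DataFrames the project knows about and those already covered by the Container. The sketch below assumes that `getml.data.list_data_frames()` returns a dict with `"in_memory"` and `"on_disk"` lists of names; the function name and the sample data are hypothetical. It only identifies candidates and does not answer the open question of which of them a pipeline actually uses.

```python
def frames_outside_container(
    listed: dict[str, list[str]],
    container_frames: set[str],
) -> list[str]:
    """Names of DataFrames known to the project but not held by the Container."""
    # Assumed shape of the listing: {"in_memory": [...], "on_disk": [...]}.
    known = set(listed.get("in_memory", [])) | set(listed.get("on_disk", []))
    return sorted(known - container_frames)


# Made-up listing for illustration.
listed = {
    "in_memory": ["population", "orders", "scratch"],
    "on_disk": ["orders", "archive"],
}
candidates = frames_outside_container(listed, {"population", "orders"})
```
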

Labels: enhancement (New feature or request)
