This is a kedro plugin that writes information from kedro hooks into a database. The code was created in the Waldo research project.
Activate the virtual environment in your project (on Windows: `venv\Scripts\activate`) and run:
pip install -e $PATH_TO_PLUGIN_PROJECT
For example, `pip install -e ~/waldo-kedro-plugin/`
The `-e` (`--editable`) flag lets us edit the package's code without having to re-install the package after every change. Technically it does not install the package; instead it creates an `.egg-link` file in the deployment directory pointing back to the project source code directory. In other words, instead of copying the code into `site-packages`, it adds a symbolic link via `.egg-link`.
Alternatively, install it from PyPI:
pip install waldo-kedro-plugin
All the hook specifications provided in `kedro.framework.hooks` are available. The names are self-explanatory:
- `after_catalog_created`
- `before_node_run`
- `after_node_run`
- `on_node_error`
- `before_pipeline_run`
- `after_pipeline_run`
- `on_pipeline_error`
- `before_dataset_loaded`
- `after_dataset_loaded`
- `before_dataset_saved`
- `after_dataset_saved`
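To illustrate what an event-recording hook implementation could look like, here is a minimal, self-contained sketch in plain Python. It is not the plugin's actual code: the real implementation registers its methods with kedro via the `@hook_impl` decorator and writes to the database, whereas this sketch simply collects event records (mirroring the columns of the events table) in memory.

```python
from datetime import datetime, timezone
import uuid


class EventRecordingHooks:
    """Sketch of a hook class that records one event per hook call.

    The real plugin decorates methods like these with kedro's
    @hook_impl and persists the rows to PostgreSQL; here we just
    keep them in an in-memory list for illustration.
    """

    def __init__(self):
        self.events = []
        self.run_id = str(uuid.uuid4())  # char(36), one id per pipeline run

    def _record(self, event_type, target_id=None, target_name=None):
        # Mirrors the columns of the `events` table.
        self.events.append({
            "run_id": self.run_id,
            "event_type": event_type,
            "target_id": target_id,
            "target_name": target_name,
            "timestamp": datetime.now(timezone.utc),
        })

    def before_node_run(self, node_name):
        self._record("before_node_run", target_name=node_name)

    def after_node_run(self, node_name):
        self._record("after_node_run", target_name=node_name)


hooks = EventRecordingHooks()
hooks.before_node_run("train_model")
hooks.after_node_run("train_model")
```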
After you install the plugin, kedro detects it automatically: hook implementations are registered to the project context as soon as the plugin is installed, so no further configuration is needed. However, the data schema must match, otherwise writing to the database will fail.
This plugin makes use of the following tables:
catalogs:
| column | hash 🔑 | content |
|---|---|---|
| Type | varchar(8) | json |
events:
| column | id 🔑 | run_id | event_type | target_id | target_name | timestamp |
|---|---|---|---|---|---|---|
| Type | bigint | char(36) | text | varchar(8) | text | timestamp |
pipelines:
| column | hash 🔑 | name | content |
|---|---|---|---|
| Type | varchar(8) | text | json |
samples:
| column | id 🔑 | --- |
|---|---|---|
| Type | bigint | --- |
contexts:
| column | id 🔑 | run_id | algorithm | parameters |
|---|---|---|---|---|
| Type | int | char(36) | text | text |
outlier_score:
| column | context_id 🔑 | sample_id 🔑 | score | prediction |
|---|---|---|---|---|
| Type | int | bigint | float | boolean |
_Note: the samples table has only one hard constraint, i.e., it must contain a column named `id`, which can serve as a foreign key for the generic `outlier_score` table._
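To make that relationship concrete, here is a sketch of the samples/contexts/outlier_score linkage using Python's stdlib `sqlite3`. The plugin itself targets PostgreSQL, and the DDL below (including the extra `feature` column on samples) is an illustration, not the plugin's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# `samples` only needs an `id` column; any other columns are up to you.
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, feature REAL)")
conn.execute("""CREATE TABLE contexts (
    id INTEGER PRIMARY KEY,
    run_id TEXT,
    algorithm TEXT,
    parameters TEXT)""")
# `outlier_score` links a context (one algorithm run) to a sample.
conn.execute("""CREATE TABLE outlier_score (
    context_id INTEGER REFERENCES contexts(id),
    sample_id INTEGER REFERENCES samples(id),
    score REAL,
    prediction BOOLEAN,
    PRIMARY KEY (context_id, sample_id))""")

conn.execute("INSERT INTO samples VALUES (1, 0.5)")
conn.execute("INSERT INTO contexts VALUES (1, 'run-1', 'IsolationForest', '{}')")
conn.execute("INSERT INTO outlier_score VALUES (1, 1, -0.12, 1)")
row = conn.execute("""SELECT s.feature, o.score FROM outlier_score o
                      JOIN samples s ON s.id = o.sample_id""").fetchone()
```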
All hooks write into the `events` table, whereas only `after_catalog_created` writes into the `catalogs` table.
On the other hand, only `before_pipeline_run` writes into the `pipelines` table.
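The catalogs and pipelines tables are keyed by an 8-character hash of their content. How the plugin computes it is not specified here, so the following is only a plausible sketch (an assumption on our part): a truncated hex digest of the JSON-serialized content, which fits a `varchar(8)` column.

```python
import hashlib
import json


def content_hash(content: dict) -> str:
    """Hypothetical 8-character content hash fitting a varchar(8) column.

    NOTE: this is an illustrative assumption, not the plugin's
    documented hashing scheme.
    """
    payload = json.dumps(content, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:8]


h = content_hash({"datasets": ["raw", "features"]})
```

Keying on a content hash means identical catalogs or pipelines are stored only once across runs.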
All the required packages will be installed when you install the plugin; however, there are certain things you need to consider:
- We are using PostgreSQL, therefore you need to have a Postgres server running. Please create an empty database and provide the credentials to access it in a `credentials.yml` file in the kedro project, in this format:

  ```yaml
  postgres:
    con: postgresql://$USER_NAME:$PASSWORD@$SERVER_NAME:$PORT/$DB_NAME
  ```

- The database can have any name, as long as it is correctly provided in the `credentials.yml` file.
- All the required tables will be created inside the database automatically, once you run the project for the first time.
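To sanity-check that your connection string is well-formed before running the project, you can parse it with Python's stdlib. This is purely illustrative (the plugin reads the string from `credentials.yml` itself), and the user, host, and database names below are placeholders:

```python
from urllib.parse import urlparse

# Example connection string in the format expected in credentials.yml.
# The concrete values here are placeholders, not real credentials.
con = "postgresql://waldo_user:secret@localhost:5432/waldo_db"

parts = urlparse(con)
db_name = parts.path.lstrip("/")  # path component minus the leading slash
```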
The parameters for the anomaly detection algorithms are to be provided in the `parameters.yml` file of the kedro project. The parameters should be nested under the model parameter, in the following format:
```yaml
model_param1:
  .....
  IsolationForest:
    IsolationForest.n_estimators: 100
    ...
```
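The `IsolationForest.n_estimators: 100` style suggests that parameter keys are prefixed with the algorithm name. Assuming that convention (an assumption on our part, since the exact consumption of these parameters isn't shown here), plugin-side code could turn such a mapping into constructor kwargs roughly like this:

```python
def to_kwargs(algorithm: str, params: dict) -> dict:
    """Strip the 'Algorithm.' prefix from parameter keys.

    Hypothetical helper: the real plugin's parameter handling may differ.
    """
    prefix = algorithm + "."
    return {
        key[len(prefix):]: value
        for key, value in params.items()
        if key.startswith(prefix)
    }


kwargs = to_kwargs("IsolationForest", {"IsolationForest.n_estimators": 100})
```

The resulting dict could then be unpacked into the algorithm's constructor, e.g. `IsolationForest(**kwargs)`.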