Detailed outlier flags #611

Merged
vergauwenthomas merged 154 commits into dev from detailed_outlier_flags
Mar 23, 2026

Conversation

@vergauwenthomas
Owner

No description provided.

vergauwenthomas and others added 30 commits September 29, 2025 14:39
* fix bug and add tests baselines (#571)

* fix bug and add tests baselines

* fix location of baseline

* black edits

* handle comma as decimal symbol when importing data (#576)

* Add convert_to_numeric_series function and integrate into dataset and sensordata modules

* black edits

* fix review

* Error make plot model name (#579)

* added modeldata_name variable to make_plot() of stations class.

* add modeldata_kwargs to make_plot in order to select a specific modeldata series

* add plot test

* black edits

* new parameter modeltype makes it possible to plot a different type of modeldata than obstype; if nothing is specified, modeltype=obstype

* trigger

* resolve empty dicts

* tests added

* tests adapted (no fake argument anymore)

* remove check for dataset if modeltype exists

* arg renaming + argsorder change

* black edits

---------

Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>
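The decimal-comma handling from #576 can be illustrated with a minimal sketch. The function name `convert_to_numeric_series` comes from the commit message, but the signature and body below are assumptions for illustration, not the toolkit's actual implementation.

```python
import pandas as pd


def convert_to_numeric_series(series: pd.Series) -> pd.Series:
    """Hypothetical sketch: parse values that may use ',' as the decimal symbol."""
    if pd.api.types.is_numeric_dtype(series):
        # Already numeric, nothing to convert.
        return series
    # Normalise to strings, trim whitespace, and swap decimal commas for dots.
    cleaned = series.astype(str).str.strip().str.replace(",", ".", regex=False)
    # Anything that still cannot be parsed becomes NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")


raw = pd.Series(["12,5", "13.0", " 7,25 ", "n/a"])
numeric = convert_to_numeric_series(raw)
```

Coercing unparsable entries to NaN is a design choice of this sketch; an importer could equally raise on them.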

* Fix buddy check bug (#583)

* add reference of the iteration to the detailed msg

* fix duplicates by joining messages

* add a test to trigger this edgecase

* black edits

* Update tests/test_qc.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* grammar errors in comments

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* develop alpha version bump

* first attempt max_gap_duration_to_fill implementation

* changed > to >=

* rechanged >= to >

* print add

* print

* gap is filled or not filled , not partly

* small error fix

* errorfix

* implementation of an intuitive gap overview on all levels, called 'singular_gaps()' (dataset, station, sensor) + handle the edge case of partially filled gaps by adding a new fill status label: _partially_successful_label = "partially successful gapfill".
Currently this causes the gap to be refilled entirely (also where the gap filling was successful) when a new gap fill is called. We can extend this later to only fill the 'gap in the gap'.

* version set to prerelease

* Refactor gap handling: remove singular_gaps property and add gap_status_overview_df method for improved gap status reporting

* make the gapfill status flagger more human readable

* Refactor gap handling: replace singular_gaps property with gap_status_overview_df method for improved gap status reporting

* Refactor gap handling: remove singular_gaps property and implement gap_status_overview_df method for enhanced gap reporting; update max_gap_duration_to_fill default to 12 hours

* Update max_gap_duration_to_fill default to 12 hours and improve gap size validation checks

* minor code style refactoring

* black edits

* fix indentation bug

* code refactoring of gaps overview in sensordata

* gap_overview_dfs in separate module

* add the new method to the API docs

* update and add tests

* Update the example

* rename function to gap_overview_df

* fix bug

* black edits

* serialize timedelta and timestamp attr in xr conv

* fix unicode error when exporting to netcdf

* fix tests

* by default overwrite the partially GF

* update tests

* black edits

* black edits

---------

Co-authored-by: Thomas Vergauwen <82087298+vergauwenthomas@users.noreply.github.com>
Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
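The all-or-nothing rule described in the commits above ("gap is filled or not filled, not partly", with a 12-hour default for `max_gap_duration_to_fill`) might look like the sketch below. The helper name `gap_is_fillable`, the definition of gap duration as end minus start, and the strict `>` comparison (suggested by the "rechanged >= to >" commit) are assumptions.

```python
import pandas as pd


def gap_is_fillable(gap_start, gap_end, max_gap_duration_to_fill=pd.Timedelta("12h")):
    """Hypothetical helper: decide whether a whole gap is eligible for filling."""
    gap_duration = gap_end - gap_start
    # A gap is skipped only when it is strictly longer than the limit,
    # so a gap of exactly the maximum duration is still filled.
    return not (gap_duration > max_gap_duration_to_fill)


start = pd.Timestamp("2025-01-01 00:00")
short_gap = gap_is_fillable(start, pd.Timestamp("2025-01-01 06:00"))
edge_gap = gap_is_fillable(start, pd.Timestamp("2025-01-01 12:00"))
long_gap = gap_is_fillable(start, pd.Timestamp("2025-01-01 13:00"))
```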
* develop version bump

* fix the filtering to modeldata dataframe

* use helper func for extracting modeltimeseries

* test different plotting methods of modeldata on different levels

* add log_entry

* black edits
* functionality of pd_plot

* move the sensordata pd plot to separate module

* fix warnings

* implementation for modeldata pd_plot

* add the pd_plot methods to the api doc

* use id for sensordata label in pd_plot

* add tests

* prerelease version bump

* black edits

* code review fixes

* fix docstring indentation

* black edits
…ent unphysical data (#598)

* Initial plan

* Add min_value and max_value parameters to core gap-fill functions

Co-authored-by: ADRIE-A3 <109740197+ADRIE-A3@users.noreply.github.com>

* Add tests for min_value and max_value gap-fill parameters

Co-authored-by: ADRIE-A3 <109740197+ADRIE-A3@users.noreply.github.com>

* Add documentation and examples for gap-fill value constraints

Co-authored-by: ADRIE-A3 <109740197+ADRIE-A3@users.noreply.github.com>

* Add implementation summary for gap-fill value constraints

Co-authored-by: ADRIE-A3 <109740197+ADRIE-A3@users.noreply.github.com>

* remove unwanted files

* Remove min_value and max_value parameters from Gap and SensorData classes

* pass minvalue maxvalue args for GF methods

* Add min_value and max_value parameters to Dataset gap-filling methods

* remove unwanted files

* minor version bump

* add test

* black edits

* Gee credits local pipeline (#599)

* black edits

* bugfix that data._credentials does not exist in ee v1.6.12. Now with a test catch condition on ee.initialize

* add gee credential testing and checking for the credential file in the (local) testing pipeline

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ADRIE-A3 <109740197+ADRIE-A3@users.noreply.github.com>
Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>
Co-authored-by: Thomas Vergauwen <82087298+vergauwenthomas@users.noreply.github.com>
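The `min_value` and `max_value` parameters from #598 are meant to keep gap-fill output physically plausible. A minimal sketch of one way to do this is shown below; clipping candidate values is an assumption about the mechanism, and `constrain_fill_values` is a hypothetical name, not a toolkit function.

```python
import numpy as np
import pandas as pd


def constrain_fill_values(observed, candidates, min_value=None, max_value=None):
    """Hypothetical sketch: bound candidate fill values, then fill only the gaps."""
    # clip() leaves a side unbounded when the corresponding limit is None.
    bounded = candidates.clip(lower=min_value, upper=max_value)
    # fillna() aligns on the index, so existing observations are never altered.
    return observed.fillna(bounded)


# Relative humidity (%) with a two-record gap and an overshooting candidate.
observed = pd.Series([95.0, np.nan, np.nan, 97.0])
candidates = pd.Series([94.0, 103.2, 99.0, 96.0])
filled = constrain_fill_values(observed, candidates, min_value=0.0, max_value=100.0)
```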
* add comment

* open logs in default text browser

* close figures after comparison to fix the failing tests

* black edit

* Update pyproject.toml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Add catch_white_records function and update repetitions_check to exclude white records from outlier detection

* implement white records for repetitions check on all levels.

* whitelist for persistence

* Update white_records parameter type to Union[pd.Index, None] in Dataset and Station classes

* white records for gross value check

* add white_records to step and window variation check

* implement whitelist argument for buddy checks

* pass white records for repetitions check

* update to use the sensorwhiteset argument

* update to using the sensorwhiteset argument

* implement the sensorwhiteset argument

* refactor: replace white_records with whiteset in QC methods and related functions

* cleanup the module

* add tests

* black formatting

* add WhiteSet to api docs

* add important box

* fix for timezone bug

* add topic and refer in the qc example to the topic on WhiteSets

* comment protector

* black edits

* remove default value for sensorwhiteset in persistence_check, repetitions_check, and step_check functions

* refactor SensorWhiteSet initialization to use None as default for white_timestamps

* remove default value for sensorwhiteset in window_variation_check function

* Update tests/test_qc.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* refactor save_whitelist_records function to improve documentation and remove unused code

* add logging decorator to get_info method in WhiteSet class

* remove unused import of SensorWhiteSet from whitelist module

* add string formatting for the xr attributes

* black edits

* refer to the topic section

* sync docstrings

* fix FutureWarning

* dev version bump

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
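The idea behind white records, as described in the commits above, is that whitelisted timestamps are excluded from outlier detection. The sketch below shows that idea on a plain `DatetimeIndex`; `drop_white_records` is a hypothetical helper and does not reproduce the toolkit's `WhiteSet` or `SensorWhiteSet` classes.

```python
import pandas as pd


def drop_white_records(outlier_index, white_timestamps=None):
    """Hypothetical sketch: remove whitelisted timestamps from flagged records."""
    if white_timestamps is None:
        # No whitelist given, so every flagged record stays an outlier.
        return outlier_index
    # Both indexes should share the same timezone for the comparison to match.
    return outlier_index.difference(pd.DatetimeIndex(white_timestamps))


flagged = pd.date_range("2025-01-01", periods=4, freq="h", tz="UTC")
white = [flagged[1]]
remaining = drop_white_records(flagged, white)
```

Using timezone-aware timestamps on both sides avoids silent mismatches between naive and aware datetimes.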
* adding a 'details' column to the outliersdf

* implement the buddy with safetynets (generalisation)

* black edits

* check the safety net argument

* exclude 'details' from comparison

* fix tests

* update the examples

* update solutions

* fix: correct formatting of log messages and documentation comments

* minor version bump

* Update src/metobs_toolkit/qc_collection/buddy_check.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: deprecate buddy_check_with_LCZ_safety_net method in Dataset class

* rename safety_nets to safetynets

* black edits

* drop explicit attributes in whitelist

* fix test issue

* fix link to whitelist topic page

* drop duplicated attributes in docstring

* fix bool and list attributes of QC

* fix tests

* black edits

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
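A 'details' column on the outliers table, combined with the earlier fix of joining messages for duplicated flags, can be sketched as follows. Only the column name `details` comes from the commit messages; the other column names, the labels, and the message texts are made up for illustration.

```python
import pandas as pd

# Hypothetical flags: the same record is flagged in two buddy-check iterations.
flags = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2025-01-01 10:00", "2025-01-01 10:00", "2025-01-01 11:00"]
        ),
        "value": [31.2, 31.2, -4.0],
        "label": ["buddy check", "buddy check", "gross value check"],
        "details": [
            "iteration 1: deviates from buddies",
            "iteration 2: deviates from buddies",
            "value below lower threshold",
        ],
    }
)


def join_duplicate_details(df, sep="; "):
    """Collapse duplicated (datetime, label) rows and join their detail messages."""
    return df.groupby(["datetime", "label"], as_index=False).agg(
        value=("value", "first"),
        details=("details", sep.join),
    )


outliers = join_duplicate_details(flags)
```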
* rename all occurrences of target_obstype to obstype

* revert function name

* rename trgobstype to obstype

* rename target_obstype to obstype

* rename obstype name to name

* fix tests

* rename arguments to an easier-to-read version of gee_*_dataset

* implement a formatter for output files

* filepath argument consistency

* add initialize_gee argument where missing and consistent use of update arguments

* pass modelname and modelvariable arguments to dataset GF methods

* minor version bump

* test examples

* simpler naming for gee_manager

* filepath handling for logging to file

* black edits

* store and check version compatibility

* fix circular import with lazy import

* fix tests

* fix test

* black edits

* running tests
@vergauwenthomas vergauwenthomas requested a review from Copilot March 19, 2026 13:12
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 66 out of 143 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (3)

tests/pkled_solutions/test_qc_solutions/testwhiterecords/test_white_multi_idx_records_buddy_check_with_safety_nets/datatype.json:1

  • This solution directory no longer contains a datatype.json, but SolutionFixer2.get_solution() unconditionally reads datatype.json to determine how to load the stored parquet files. Removing this file will cause the corresponding test to fail when loading the expected solution. Restore datatype.json (or update the solution loader to handle missing metadata).

tests/pkled_solutions/test_qc_solutions/testwhiterecords/test_white_dt_only_records_buddy_check_with_safety_nets/datatype.json:1

  • This solution directory no longer contains a datatype.json, but SolutionFixer2.get_solution() unconditionally reads datatype.json to determine how to load the stored parquet files. Removing this file will cause the corresponding test to fail when loading the expected solution. Restore datatype.json (or update the solution loader to handle missing metadata).

tests/pkled_solutions/test_qc_solutions/testwhiterecords/test_white_multi_idx_records_buddy_check/datatype.json:1

  • This solution directory no longer contains a datatype.json, but SolutionFixer2.get_solution() unconditionally reads datatype.json to determine how to load the stored parquet files. Removing this file will cause the corresponding test to fail when loading the expected solution. Restore datatype.json (or update the solution loader to handle missing metadata).
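The second option mentioned in the comments above, a loader that tolerates missing metadata, could be sketched as below. `read_solution_datatype` is a hypothetical helper and the JSON layout is an assumption; the actual `SolutionFixer2.get_solution()` code is not shown in this conversation.

```python
import json
import tempfile
from pathlib import Path


def read_solution_datatype(solution_dir, default=None):
    """Hypothetical helper: return the parsed datatype.json, or a default if absent."""
    meta_file = Path(solution_dir) / "datatype.json"
    if not meta_file.is_file():
        # Missing metadata: fall back instead of raising FileNotFoundError.
        return default
    with meta_file.open(encoding="utf-8") as fh:
        return json.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    # First call: the directory has no datatype.json yet.
    missing = read_solution_datatype(tmp, default={"datatype": "unknown"})
    # Second call: the metadata file exists (layout assumed for this example).
    (Path(tmp) / "datatype.json").write_text('{"datatype": "dataframe"}', encoding="utf-8")
    present = read_solution_datatype(tmp)
```

Restoring the deleted files remains the simpler fix; a fallback like this only helps if a sensible default loading strategy exists.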

@vergauwenthomas vergauwenthomas merged commit 5406252 into dev Mar 23, 2026
13 checks passed

Development

Successfully merging this pull request may close these issues.

QC details per check

4 participants