Add an option to check unicity before copy #2

@thislg

We often have to write custom code to validate loaded data before copying it. A common use case is to ignore duplicate lines but still continue the import.

It could be configured like this:

resources:
    my_resource_name:
        load:
            extra_fields:
                valid:
                    type: boolean
                    options:
                        default: true
        post_load:
            validate:
                -
                    columns:
                        - code
                    constraint_type: unique
                    label: 'Unique code'
                    on_invalid: ignore # abort|ignore
        copy:
            strategy_options:
                copy_condition: valid IS TRUE

A subscriber on ImportEvents::POST_LOAD would then execute an UPDATE on the temporary table to set the "valid" field to false on the failing rows. When validation fails and on_invalid is set to "ignore", it would log a message such as "Unique code validation constraint failed. Skipping duplicate my_resource_name (code: 12345) at lines 4, 5, 6", and the import would continue without copying the invalid lines. If on_invalid is set to "abort", it would stop the import without copying the data.
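
To make the flagging step concrete, here is a rough sketch of what the subscriber's logic could look like. It is written in Python against an in-memory SQLite database purely for illustration and is not the library's code. The `line_no` column, the `validate_unique` function and the `ImportAborted` exception are all hypothetical names. The sketch also assumes that every row sharing a duplicated value is flagged (none is kept), which matches the "lines 4, 5, 6" example but is still a guess about the intended behaviour.

```python
import sqlite3


class ImportAborted(Exception):
    """Hypothetical error raised when on_invalid is "abort"."""


def validate_unique(conn, table, columns, label, resource, on_invalid="ignore"):
    """Flag rows whose `columns` values are duplicated and return log messages.

    Assumes the temporary table has a `line_no` column (hypothetical) and a
    `valid` column that is true for every row before validation runs.
    """
    # Mark every row that shares its key columns with at least one other row.
    match = " AND ".join(f"d.{c} = {table}.{c}" for c in columns)
    conn.execute(
        f"UPDATE {table} SET valid = 0 WHERE EXISTS ("
        f"SELECT 1 FROM {table} AS d "
        f"WHERE {match} AND d.line_no <> {table}.line_no)"
    )

    # Collect the flagged line numbers per duplicated key.
    cols = ", ".join(columns)
    flagged = conn.execute(
        f"SELECT {cols}, line_no FROM {table} WHERE valid = 0 ORDER BY line_no"
    ).fetchall()
    lines_by_key = {}
    for *key, line_no in flagged:
        lines_by_key.setdefault(tuple(key), []).append(line_no)

    if lines_by_key and on_invalid == "abort":
        raise ImportAborted(f"{label} validation constraint failed")

    messages = []
    for key, lines in lines_by_key.items():
        detail = ", ".join(f"{c}: {v}" for c, v in zip(columns, key))
        where = ", ".join(str(n) for n in lines)
        messages.append(
            f"{label} validation constraint failed. "
            f"Skipping duplicate {resource} ({detail}) at lines {where}"
        )
    return messages


# Small demo with a stand-in temporary table.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE tmp (line_no INTEGER, code TEXT, valid INTEGER DEFAULT 1)"
)
demo.executemany(
    "INSERT INTO tmp (line_no, code) VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C"), (4, "12345"), (5, "12345"), (6, "12345")],
)
for message in validate_unique(demo, "tmp", ["code"], "Unique code", "my_resource_name"):
    print(message)
```

Whether the first occurrence of a duplicate should be kept instead of skipped is an open design question; it would only change the `EXISTS` condition (for example comparing `d.line_no < line_no`).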

Other validation constraints could be added later, such as format validation (regex).
A simpler alternative would be to skip the validation config entirely and instead add an option to run an arbitrary SQL query on post_load that sets the "valid" flag.
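
The simpler variant could be sketched roughly as follows, again in Python with SQLite only for illustration. The `copy_valid_rows` function and its parameters are hypothetical and do not correspond to any existing option. The user-supplied query is trusted and executed as-is, and the copy step is reduced to an `INSERT ... SELECT` filtered by the `copy_condition` from the config above. The format check uses SQLite's `GLOB` operator in place of a regex, since SQLite ships no regex function by default.

```python
import sqlite3


def copy_valid_rows(conn, tmp_table, target_table, columns,
                    post_load_queries, copy_condition):
    """Run user-supplied post_load queries, then copy rows matching copy_condition.

    Hypothetical helper: the queries are expected to set the `valid` flag
    on the temporary table. Returns the number of rows copied.
    """
    # Each query is arbitrary SQL from the configuration, executed unchanged.
    for query in post_load_queries:
        conn.execute(query)

    # Copy only the rows that passed validation.
    cols = ", ".join(columns)
    cursor = conn.execute(
        f"INSERT INTO {target_table} ({cols}) "
        f"SELECT {cols} FROM {tmp_table} WHERE {copy_condition}"
    )
    return cursor.rowcount


# Small demo: flag codes containing any non-digit character, then copy the rest.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE tmp (code TEXT, valid INTEGER DEFAULT 1)")
demo.execute("CREATE TABLE my_resource_name (code TEXT)")
demo.executemany("INSERT INTO tmp (code) VALUES (?)", [("100",), ("20x",), ("300",)])
copied = copy_valid_rows(
    demo,
    "tmp",
    "my_resource_name",
    ["code"],
    ["UPDATE tmp SET valid = 0 WHERE code GLOB '*[^0-9]*'"],
    "valid IS TRUE",
)
print(copied)
```

This keeps the library small, at the cost of pushing log messages and the abort/ignore decision back onto the user's query.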


Labels: enhancement (New feature or request)
