Observation-graph loading should be transactional #120

@Robsteranium

Description

#115 introduces a graph-index (for #17) which is dropped and re-inserted before the observation-pipeline runs. If the ETL process is interrupted while loading observations, the graph and observation indexes are left in an inconsistent state. Re-running the ETL process then does nothing, because it finds the graph-index to be up to date, even though the observation index is out of date and could include partially-loaded graphs.

To recover from this sort of interruption we need to manually delete the observation and graph indices before restarting, e.g.

curl -X DELETE http://localhost:9200/observation
curl -X DELETE http://localhost:9200/graph
sudo systemctl start etl

It'd be nice if loading were a bit more transactional, or at least didn't leave the indices in an inconsistent state after an interruption. For example, the graph index could be updated one doc at a time, with each doc written only after all the observations for that graph have been loaded. That way, even if the observation-pipeline were interrupted mid-graph, it would redo that graph on the next run (and retain any completed ones).
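A minimal sketch of that idea, in Python for illustration only (it is not the project's actual ETL code). Plain dicts stand in for the two Elasticsearch indices, and `load_graphs`, `load_observations` and the `modified` stamp are hypothetical names. It also assumes that redoing a graph means first deleting whatever observations were partially loaded for it.

```python
from typing import Callable, Dict, Iterable, List


def load_graphs(
    source_graphs: Dict[str, str],
    graph_index: Dict[str, str],
    observation_index: Dict[str, List[str]],
    load_observations: Callable[[str], Iterable[str]],
) -> List[str]:
    """Load observations graph by graph, marking each graph as done last.

    source_graphs:     graph URI -> modified stamp (hypothetical source of truth)
    graph_index:       graph URI -> modified stamp of fully loaded graphs
    observation_index: graph URI -> observations loaded so far
    Returns the URIs of the graphs that were (re)loaded in this run.
    """
    loaded = []
    for uri, modified in source_graphs.items():
        if graph_index.get(uri) == modified:
            continue  # graph doc is present and current: fully loaded earlier

        # Assumption: drop any stale or partial data for this graph first.
        graph_index.pop(uri, None)
        observation_index[uri] = []

        for observation in load_observations(uri):
            observation_index[uri].append(observation)

        # Only now write the graph doc; it acts as the commit marker.
        graph_index[uri] = modified
        loaded.append(uri)
    return loaded
```

Because the graph doc is written last, an interruption mid-graph leaves that graph absent from the graph index, so the next run reloads it while skipping the graphs that completed.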

Labels

etl: Related to the etl/pipelines
