Add backfill job for missing ML classification and translations #155
63357dd to ecf9f57
class Command(BaseCommand):
    help = "Backfill missing ML classification and translations from BigQuery"
I might need to add locking to this command as well once #140 is merged
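For reference, a minimal sketch of one way to keep two runs of a command from overlapping, using an exclusively created lock file. This is an assumption for illustration only, not what #140 does; `single_instance`, `AlreadyRunning`, and the lock path are hypothetical names.

```python
import os
from contextlib import contextmanager


class AlreadyRunning(RuntimeError):
    """Raised when another process already holds the lock file."""


@contextmanager
def single_instance(lock_path):
    """Hold an exclusive lock file for the duration of the with-block."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError as exc:
        raise AlreadyRunning(f"lock file {lock_path} already exists") from exc
    try:
        # Record the owner's PID to help with manual debugging.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)
```

The command body would then run inside `with single_instance(path):`. A lock file like this is left behind if the process is killed, so a database-level lock may be a better fit depending on what #140 introduces.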
jgraham left a comment
This basically LGTM but with a few questions / minor style issues.
).exclude(comments="")

total_reports = reports_to_update.count()
LOG.info("Found %d reports needing ML backfill", total_reports)
I'd put this log message after the return, since otherwise the two log messages duplicate information.
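A minimal sketch of that ordering, assuming the command returns early when nothing needs backfilling and logs its own message in that case. The function name `report_backfill_scope` and the wording of the early-return message are hypothetical; only the "Found %d reports" message comes from the diff.

```python
import logging

LOG = logging.getLogger(__name__)


def report_backfill_scope(total_reports: int) -> bool:
    """Return True when there is work to do, logging exactly one message."""
    if total_reports == 0:
        # Hypothetical early-return message; the real wording is not shown in the diff.
        LOG.info("No reports need ML backfill")
        return False

    # Logged only after the early return, so an empty run does not emit both lines.
    LOG.info("Found %d reports needing ML backfill", total_reports)
    return True
```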
client: bigquery.Client = bigquery.Client(**params)

for batch_num, report_batch in enumerate(batches, 1):
Is there a reason to do this in batches of 500? I'd kind of expect BigQuery to be happy with, well, big queries, but maybe this is better for some reason?
It's batched mostly because of ReportEntry.objects.bulk_update below. I think that accepts batch_size as a parameter, though, so maybe I'll use that and increase the outer batch size.
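To make the two batch sizes concrete, here is a rough sketch of splitting work into a large outer batch (for the BigQuery lookup) and smaller write chunks. The helper `chunked` and both size constants are hypothetical and chosen only for illustration; they are not values from this PR.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical sizes: a large outer batch for the BigQuery lookup,
# with the database write split into smaller pieces.
OUTER_BATCH_SIZE = 5000
WRITE_BATCH_SIZE = 500

outer_batches = list(chunked(range(12000), OUTER_BATCH_SIZE))
write_chunks = list(chunked(outer_batches[0], WRITE_BATCH_SIZE))
```

With Django the inner chunking does not have to be written by hand: `bulk_update(objs, fields, batch_size=None)` splits the write itself when `batch_size` is given, so the loop would only need the outer batches.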
if uuid in bq_data:
    data = bq_data[uuid]
    ml_updated = False
Just a single updated flag would be enough here I think?
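Something like the following is what a single flag could look like. It is only a sketch: `Report` stands in for `ReportEntry`, and the field names `ml_label` and `translation` are hypothetical, since the real fields are not visible in this part of the diff.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Report:
    # Stand-in for ReportEntry; both field names below are hypothetical.
    uuid: str
    ml_label: str = ""
    translation: str = ""


def apply_bq_data(report: Report, data: Dict[str, Any]) -> bool:
    """Copy any missing values from `data`; return True if the report changed."""
    updated = False
    for field in ("ml_label", "translation"):
        value = data.get(field)
        # Only fill fields that are still empty and have a value in BigQuery.
        if value and not getattr(report, field):
            setattr(report, field, value)
            updated = True
    return updated


def collect_updates(
    reports: List[Report], bq_data: Dict[str, Dict[str, Any]]
) -> List[Report]:
    """Return the reports that need to be written back."""
    return [
        r for r in reports
        if r.uuid in bq_data and apply_bq_data(r, bq_data[r.uuid])
    ]
```

If the separate flags feed per-field counters in the summary output, those counters would need another source, so this only applies when the flags are used solely to decide whether to save the row.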
ecf9f57 to 3fed8be