Strict merge of CrUX and crawl#85
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR enforces a strict merge between CrUX and crawl data by aligning the calculation logic for origins across reports. The key changes include:
- Replacing custom origin computations with direct references to the tech_report_adoption table.
- Updating the SQL query structure for both versions and technologies reports to use unified filtering and aggregation.
- Incorporating CrUX data in the categories report via a merged pages CTE, ensuring only relevant pages present in both datasets are considered.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| definitions/output/reports/tech_report_versions.js | Replaces manual origin calculation with tech_report_adoption. |
| definitions/output/reports/tech_report_technologies.js | Updates the origins query and UNION ALL logic for technologies. |
| definitions/output/reports/tech_report_categories.js | Introduces CrUX data merge and adjusts join conditions accordingly. |
Comments suppressed due to low confidence (3)
definitions/output/reports/tech_report_versions.js:11
- Ensure that the 'adoption' value from tech_report_adoption fully replicates both version-specific and total origins as done by the original union query.
adoption AS origins
definitions/output/reports/tech_report_technologies.js:51
- Verify that filtering with 'WHERE technology = 'ALL'' correctly aggregates total origins as intended, matching previous logic.
technology,
definitions/output/reports/tech_report_categories.js:60
- Confirm that joining on merged_pages.technologies yields the expected behavior, especially if 'technologies' is stored as an array or complex structure.
INNER JOIN merged_pages.technologies AS tech
|
@tunetheweb so here is the version without any extensions of CrUX or crawl data, using only direct device matches. |
| INNER JOIN merged_pages.technologies AS tech | ||
| INNER JOIN tech.categories AS category |
There was a problem hiding this comment.
Should these be outer joins for pages with no technologies?
There was a problem hiding this comment.
We aggregate by known categories here, so pages without any technologies will be excluded here.
If not here, then in the next steps:
INNER JOIN technology_stats ON category_stats.category IN UNNEST(technology_stats.categories)- or
INNER JOIN category_descriptions USING (category)
For the pages without any technologies (and thus no categories) we have a part after UNION ALL (based on merged_pages).
Co-authored-by: Barry Pollard <barrypollard@google.com>
Following #42
Matched calculation of origins in
category,technologiesandversionsdictionaries to the logic inadoptionmetric.Including only phone/mobile and desktop origins that are present in both crawl and CrUX datasets.