Technology categories are missing in 2024-12-01 crawl #31

@max-ostapenko

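In the 2024-12-01 crawl, most technology detections have no categories. I checked this for WordPress on the top 10,000 sites with the following query: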

SELECT
  date,
  client,
  category,
  COUNT(DISTINCT root_page)
FROM crawl.pages
LEFT JOIN pages.technologies AS tech
LEFT JOIN tech.categories AS category
WHERE
  date >= '2024-11-01'
  AND rank <= 10000
  AND tech.technology = 'WordPress'
GROUP BY 1,2,3
ORDER BY 1,2,3;
date        client   category  f0_
2024-11-01  desktop  Blogs     545
2024-11-01  desktop  CMS       545
2024-11-01  mobile   Blogs     832
2024-11-01  mobile   CMS       832
2024-12-01  desktop            537
2024-12-01  desktop  Blogs     47
2024-12-01  desktop  CMS       47
2024-12-01  mobile             815
2024-12-01  mobile   Blogs     50
2024-12-01  mobile   CMS       50
2025-01-01  desktop  Blogs     534
2025-01-01  desktop  CMS       534
2025-01-01  mobile   Blogs     809
2025-01-01  mobile   CMS       809
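
To size the gap, here is a rough sketch that computes the share of WordPress root pages without a category in the 2024-12-01 crawl. The numbers are copied from the result above. It assumes that the rows with an empty category and the rows with a category count disjoint sets of root pages, which the query itself does not guarantee. The names `december` and `missing_share` are made up for this illustration.

```python
# Counts copied from the 2024-12-01 rows of the query result.
# client -> (root pages with an empty category, root pages categorized as CMS)
# The 'Blogs' rows carry the same counts as 'CMS', so only one is used.
december = {
    "desktop": (537, 47),
    "mobile": (815, 50),
}


def missing_share(missing: int, categorized: int) -> float:
    """Fraction of root pages whose WordPress detection has no category.

    Assumes the two groups do not overlap (an assumption, not verified).
    """
    return missing / (missing + categorized)


shares = {client: missing_share(*counts) for client, counts in december.items()}

for client, share in shares.items():
    print(f"{client}: {share:.1%} of WordPress root pages lack categories")
```

Under that assumption, roughly 92% of desktop and 94% of mobile WordPress root pages in the top 10,000 lost their categories in this crawl, while 2024-11-01 and 2025-01-01 show no empty-category rows at all.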

@pmeenan do you have an idea what caused this?
Is there any way to restore the missing categories?
