
Refactor Category::refreshCounts logic to a job and simplify
Open, Medium, Public

Description

As of https://gerrit.wikimedia.org/r/506032 we now have four ways of updating category counts:

1. If a non-locking master read says the stale count is zero, we do a full recount.

This is used after an edit to a page, for the categories that were in the previous revision, but not in the new one. (From LinksUpdate, via WikiPage::updateCategoryCounts).

2. If a non-locking master read says the stale count is <= 200, we do a full recount.

This is used for a category after its category description page is deleted.

3. If no row exists yet, or it appears corrupt, we do a full recount.

This can happen through any of the following scenarios:

  • Reading a category page.
  • Viewing "Page information" (action=info) for a category page.
  • Parsing wikitext containing {{pagesincategory}}.
  • Viewing search results on Special:Search for a match that is a category page.
  • UploadWizard/ApiQueryAllCampaigns for querying the file count from a campaign's category.

This is triggered whenever one of these methods is called on a Category object: getPageCount(), getSubcatCount(), getFileCount(), or getTitle(). The call then goes through Category->initialize( Category::LAZY_INIT_ROW ).

4. Relative increments/decrements (including creation/deletion of the row)

Applied from WikiPage::updateCategoryCounts after an edit, for the categories associated with that page.
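
As a rough summary of the four paths above, the decision of when a full recount happens can be sketched as below. This is an illustrative Python model, not the MediaWiki PHP code: `Trigger`, `needs_full_recount`, `RECOUNT_THRESHOLD`, and the convention that a missing row is passed as `None` are all hypothetical names and assumptions made for this sketch.

```python
from enum import Enum
from typing import Optional

RECOUNT_THRESHOLD = 200  # the "<= 200" limit from case 2


class Trigger(Enum):
    """Hypothetical labels for the four update paths in the description."""
    REMOVED_BY_EDIT = 1      # case 1: category dropped from a page by an edit
    DESCRIPTION_DELETED = 2  # case 2: category description page deleted
    LAZY_READ = 3            # case 3: a getter ran into a missing/corrupt row
    EDIT_DELTA = 4           # case 4: relative increment/decrement


def needs_full_recount(trigger: Trigger,
                       stale_count: Optional[int],
                       row_looks_corrupt: bool = False) -> bool:
    """Return True if this path would do a full recount.

    stale_count is the value from the non-locking read; None stands for
    "no row exists" (an assumption of this sketch).
    """
    if trigger is Trigger.REMOVED_BY_EDIT:
        return stale_count == 0
    if trigger is Trigger.DESCRIPTION_DELETED:
        return stale_count is not None and stale_count <= RECOUNT_THRESHOLD
    if trigger is Trigger.LAZY_READ:
        return stale_count is None or row_looks_corrupt
    # Case 4 only applies a relative delta and never recounts.
    return False
```

Laid out this way, cases 1 to 3 differ only in the condition that guards the same recount, which is what makes a single shared job plausible.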


I'd like to re-explore whether we still need case 3. It seems to me that, at least in theory, it shouldn't be needed. If we can validate that relatively easily, I would propose we remove it in favour of logging a warning with a stack trace, so that we can find out why it happens and whether it is preventable.
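
To illustrate the "warning with stack trace" idea, here is a minimal sketch using Python's standard `logging` module rather than MediaWiki's own logging. The logger name `category_counts` and the function `warn_unexpected_refresh` are hypothetical.

```python
import logging

logger = logging.getLogger("category_counts")  # hypothetical channel name


def warn_unexpected_refresh(category: str, reason: str) -> None:
    """Log a warning instead of recounting, and attach the caller's stack.

    stack_info=True makes the logging module append the current call
    stack, which shows which code path hit the missing or corrupt row.
    """
    logger.warning(
        "Category row for %r would need a refresh (%s)",
        category,
        reason,
        stack_info=True,
    )
```

Aggregating these warnings by stack would show whether case 3 is reached from a handful of call sites or from many.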

Alternatively, if it cannot be prevented within reason (e.g. too costly or impossible to get right given scale requirements), then I suggest we move it to a job and reduce cases 1, 2, and 3 to queuing a job that takes care of things.
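
A minimal sketch of that alternative, assuming a de-duplicating queue keyed by category name. `RefreshQueue` is a hypothetical in-memory stand-in for illustration only; it does not model the MediaWiki job queue API.

```python
from collections import OrderedDict
from typing import Callable, Dict


class RefreshQueue:
    """In-memory stand-in for a job queue, de-duplicated per category."""

    def __init__(self) -> None:
        self._pending: "OrderedDict[str, str]" = OrderedDict()

    def push(self, category: str, reason: str) -> bool:
        """Queue a refresh; return False if one is already pending."""
        if category in self._pending:
            return False
        self._pending[category] = reason
        return True

    def run_all(self, recount: Callable[[str], int]) -> Dict[str, int]:
        """Run pending refreshes in FIFO order and return the new counts."""
        results: Dict[str, int] = {}
        while self._pending:
            category, _reason = self._pending.popitem(last=False)
            results[category] = recount(category)
        return results
```

With this shape, cases 1, 2, and 3 would each reduce to a single `push()` call, and repeated triggers for the same category collapse into one recount.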

  • Document and/or reference from the code how case 2 is possible.
    • If rare/unlikely:
      • Consider removing in favour of a manual recount admins can trigger via purge of the category page.
    • If common and not easily preventable:
      • Move to job queue as a "validate recount", and emit a log warning if the result differs from the stored count.
  • Determine whether case 3 is still probable.
    • If so:
      • Move refresh logic (recount, auto-create, auto-delete) to a job and queue that for cases 1, 2, and 3.
    • If not:
      • Replace the recount in cases 1 and 3 with a log warning.
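
The "validate recount" step from the plan above could look roughly like the sketch below. It assumes the stored counts are a plain dict, that a missing row counts as zero, and that a row is dropped when the recount is zero. The real rule for when a row is deleted may differ, and `validate_recount` is a hypothetical name.

```python
import logging
from typing import Dict

logger = logging.getLogger("category_counts")  # hypothetical channel name


def validate_recount(stored: Dict[str, int], category: str, actual: int) -> bool:
    """Sync stored[category] with a fresh recount; return True on mismatch."""
    previous = stored.get(category, 0)  # assumption: missing row == zero
    if actual > 0:
        stored[category] = actual       # auto-create or correct the row
    else:
        stored.pop(category, None)      # auto-delete (simplified rule)
    mismatch = previous != actual
    if mismatch:
        logger.warning(
            "Category %r: stored count %d, recount %d",
            category, previous, actual,
        )
    return mismatch
```

How often the warning fires would indicate whether the recount is correcting real drift or is mostly redundant.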

Event Timeline

The problem persists. Any plans to have this resolved? Any chance to trigger the recount manually?

I moved T224321 in the hierarchy of tasks so that the current task would be its parent. @Superyetkin that is essentially what you are asking for. @Krinkle I think it would be nice to start over in a clean state; this can help with identifying the root causes of mismatches, whether before or after #3 is refactored.

Problems with counting start from T224209.

aaron triaged this task as Low priority. Jun 6 2019, 10:45 AM
Anomie raised the priority of this task from Low to Medium. Mar 24 2020, 9:00 PM

Same bug? On hu:Kategória:Tudományos egyértelműsítő lapok, there are 189 pages and one subcategory listed, however, its parent category hu:Kategória:Egyértelműsítő lapok lists it as having 190 pages and zero subcategories. The subcategory hu:Kategória:Biológiai egyértelműsítő lapok was first accidentally created in the main namespace as hu:Biológiai egyértelműsítő lapok, then moved to the category namespace, then deleted and recreated in the category space.