As of https://gerrit.wikimedia.org/r/506032, there are now four ways of updating category counts:
1. If a non-locking master read says the stale count is zero, we do a full recount.
This is used after an edit to a page, for categories that were in the previous revision but not in the new one (from LinksUpdate, via WikiPage::updateCategoryCounts).
2. If a non-locking master read says the stale count is <= 200, we do a full recount.
This is used for a category after its category description page is deleted.
3. If no row exists yet, or it appears corrupt, we do a full recount.
This can happen through any of the following scenarios:
- Reading a category page.
- Viewing "Page information" (action=info) for a category page.
- Parsing wikitext containing {{pagesincategory}}.
- Viewing search results on Special:Search for a match that is a category page.
- UploadWizard/ApiQueryAllCampaigns, when querying the file count of a campaign's category.
It is triggered whenever one of these methods is called on a Category object: getPageCount(), getSubcatCount(), getFileCount(), or getTitle(). These methods reach the recount through Category->initialize( Category::LAZY_INIT_ROW ).
4. Relative increments/decrements (including creation/deletion of the row).
This is done from WikiPage::updateCategoryCounts after an edit, for the categories associated with that page.
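To make the four paths easier to compare, here is a minimal Python model of them. It is a sketch under stated assumptions, not MediaWiki's PHP implementation: `members`, `rows`, `CategoryRow` and all function names are hypothetical, the "stale count" is modelled as the stored page count, the corruption check is an assumed heuristic, and deleting the row as soon as its counts reach zero is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical in-memory model, not MediaWiki's API.
# `members` stands in for the categorylinks data; `rows` holds the stored
# (possibly stale) per-category counts.


@dataclass
class CategoryRow:
    pages: int = 0  # in this sketch, every member counts as a page
    subcats: int = 0
    files: int = 0


members: Dict[str, List[str]] = {}  # category -> member kinds ("page", "subcat", "file")
rows: Dict[str, CategoryRow] = {}


def full_recount(name: str) -> CategoryRow:
    """Expensive path: rebuild the row from the membership data."""
    kinds = members.get(name, [])
    row = CategoryRow(len(kinds), kinds.count("subcat"), kinds.count("file"))
    rows[name] = row
    return row


def recount_if_small(name: str, limit: int) -> None:
    """Cases 1 and 2: recount when the stale page count is <= limit.

    Case 1 corresponds to limit=0, case 2 to limit=200.
    """
    row = rows.get(name)
    stale = row.pages if row is not None else 0
    if stale <= limit:
        full_recount(name)


def looks_corrupt(row: CategoryRow) -> bool:
    """Assumed heuristic: negative counts, or more subcats/files than members."""
    return min(row.pages, row.subcats, row.files) < 0 or row.subcats + row.files > row.pages


def lazy_init(name: str) -> CategoryRow:
    """Case 3: recount on read if the row is missing or looks corrupt."""
    row = rows.get(name)
    if row is None or looks_corrupt(row):
        return full_recount(name)
    return row


def apply_delta(name: str, kind: str, delta: int) -> None:
    """Case 4: relative update, creating the row if needed and deleting it once empty."""
    row = rows.setdefault(name, CategoryRow())
    row.pages += delta
    if kind == "subcat":
        row.subcats += delta
    elif kind == "file":
        row.files += delta
    if row.pages <= 0 and row.subcats <= 0 and row.files <= 0:
        del rows[name]  # simplification: drop the row as soon as it is empty
```

The model makes the cost asymmetry visible: only case 4 is a constant-time update, while cases 1 to 3 all fall back to `full_recount`, and case 3 does so on a read path.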
I'd like to revisit whether we still need use case 3. It seems to me that, at least in theory, it shouldn't be needed. If we can validate that relatively easily, I propose we remove it in favour of logging a warning with a stack trace, so that we can find out why it happens and whether it is preventable.
Alternatively, if it cannot reasonably be prevented (e.g. too costly, or impossible to get right given scale requirements), I suggest we move it to a job, and reduce use cases 1, 2, and 3 to queuing a job that takes care of things.
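The two options in the paragraphs above can be sketched side by side: a lazy-init path that no longer recounts inline, and instead either logs a warning with a stack trace or queues a refresh job. This is a hypothetical Python sketch, not MediaWiki code: the log channel name, `refresh_queue` (a plain deque standing in for a real job queue), the de-duplication, and the corruption heuristic are all assumptions.

```python
import logging
import traceback
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional

logger = logging.getLogger("category-counts")  # hypothetical log channel


@dataclass
class CategoryRow:
    pages: int = 0
    subcats: int = 0
    files: int = 0


rows: Dict[str, CategoryRow] = {}
refresh_queue: Deque[str] = deque()  # stand-in for a real job queue


def looks_corrupt(row: CategoryRow) -> bool:
    """Assumed heuristic for a row that 'appears corrupt'."""
    return min(row.pages, row.subcats, row.files) < 0 or row.subcats + row.files > row.pages


def queue_refresh(name: str) -> None:
    """Cases 1, 2 and 3 reduced to queuing a refresh job (de-duplicated here)."""
    if name not in refresh_queue:
        refresh_queue.append(name)


def lazy_init(name: str, use_job_queue: bool) -> Optional[CategoryRow]:
    """Case 3 without an inline recount."""
    row = rows.get(name)
    if row is not None and not looks_corrupt(row):
        return row
    if use_job_queue:
        # Alternative: defer the recount to a job.
        queue_refresh(name)
    else:
        # Preferred if case 3 should never happen: log with a stack trace so
        # the code path that hit the missing/corrupt row can be found.
        logger.warning(
            "Category row for %r is missing or corrupt\n%s",
            name,
            "".join(traceback.format_stack()),
        )
    return row  # callers see the stale (or absent) row until it is refreshed
```

Either way, the read path becomes cheap; the trade-off is that callers may briefly see a missing or stale count.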
- Document and/or reference from the code how case 2 is possible.
  - If rare/unlikely:
    - Consider removing it in favour of a manual recount that admins can trigger by purging the category page.
  - If common and not easily preventable:
    - Move it to the job queue as a "validate recount", and emit a log warning if the result turns out different.
- Determine whether case 3 is still probable.
  - If so:
    - Move the refresh logic (recount, auto-create, auto-delete) to a job, and queue that job for cases 1, 2, and 3.
  - If not:
    - Replace the recount in cases 1 and 3 with a log warning.
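A refresh job along the lines of the checklist above could combine the recount, auto-create and auto-delete steps with the "validate recount" warning. The Python sketch below is hypothetical: `members`, `rows`, `run_refresh_job` and the log channel are illustrative names, and deciding auto-deletion from membership alone is a simplification (the real rules may also depend on whether the category description page exists).

```python
import logging
from dataclasses import dataclass
from typing import Dict, List, Optional

logger = logging.getLogger("category-counts")  # hypothetical log channel


@dataclass
class CategoryRow:
    pages: int = 0  # in this sketch, every member counts as a page
    subcats: int = 0
    files: int = 0


members: Dict[str, List[str]] = {}  # category -> member kinds ("page", "subcat", "file")
rows: Dict[str, CategoryRow] = {}   # stored (possibly stale) counts


def run_refresh_job(name: str) -> Optional[CategoryRow]:
    """Recount one category; auto-create, auto-delete, and warn on drift."""
    kinds = members.get(name, [])
    fresh = CategoryRow(len(kinds), kinds.count("subcat"), kinds.count("file"))
    stored = rows.get(name)
    before = stored if stored is not None else CategoryRow()
    if before != fresh:
        # "Validate recount": the stored counts were out of date.
        logger.warning(
            "Category %r counts drifted: stored=%s, recounted=%s", name, before, fresh
        )
    if not kinds:
        rows.pop(name, None)  # auto-delete (simplified: membership only)
        return None
    rows[name] = fresh  # auto-create or overwrite
    return fresh
```

Logging the drift rather than silently correcting it is what would tell us whether the recount is still earning its cost.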