Reduce DiscussionTools' usage of the parser cache
Open, Needs Triage, Public

Description

This task represents two streams of work:

  1. The longer-term work associated with reducing DiscussionTools' usage of the parser cache and
  2. The near-term work of modifying the parser cache expiry to reduce its usage so the Editing Team can proceed with scaling DiscussionTools features.

Plans

This section contains the in-progress plan for reducing DiscussionTools' usage of the parser cache.

Near-term plan for reducing parser cache usage

| Step | Description | Ticket | Status |
| 1a | Pre-deploy: Draft plan for interim mitigation with Performance-Team and DBA. | (this task) | |
| 1b | Pre-deploy: Write down how Performance-Team and DBA will monitor the outcome. | T280602 | |
| 2 | Execute the mitigation plan. | T280605 | |
| 3 | Post-deploy: Evaluate impact on site performance (for at least 21 days). | T280606 | |
| 4b | Post-deploy: Ramp parser cache retention back up while keeping an eye on parser cache utilization. | T280604 | |

Longer-term plan for decreasing #discussion-tools's usage of the parser cache

| Step | Description | Ticket | Notes |
| 1 | Avoid splitting the parser cache on user language | T280295 | |
| 2 | Avoid splitting the parser cache on opt-out wikis | T279864 | |
| 3 | Deploy to more wikis as opt-out | T275256 | |

Related Objects

Event Timeline

ppelberg renamed this task from "Decide on a plan to reduce DiscussionTools' dependence on the parser cache" to "Decide on a plan to reduce DiscussionTools' usage of the parser cache". (Apr 20 2021, 6:07 PM)
ppelberg updated the task description.
ppelberg renamed this task from "Decide on a plan to reduce DiscussionTools' usage of the parser cache" to "Reduce DiscussionTools' usage of the parser cache". (Apr 21 2021, 4:47 PM)

@LSobanski As we discussed the other day, we will be merging T280295 and T279864 this week, to be deployed next week. Let us know if you see any issues with that.

@Esanders thanks for the heads up. Since these are expected to decrease the disk usage, I am nothing but supportive :)

@Krinkle + @LSobanski: below is an update about the Editing Team's plans for scaling DiscussionTools features to more projects as opt-out settings.

The below is for y'all's awareness: I don't see anything in it changing or impacting the steps we've defined in this "epic". Although, if you see this differently, please say as much in a comment...

Editing Team's plan for scaling DiscussionTools features

  1. This week, we're beginning conversations with volunteers at ~25 Wikipedias inviting their feedback about our plans to offer the Reply Tool as an opt-out setting (T262331) at their project. We will not be making any commitments about specific deployment dates considering these dates depend on us resolving the parser cache utilization issue. T281533 captures the work involved with having said "conversations."
  2. Once we start receiving consent from wikis to turn the Reply Tool on by default, we'll comment here asking y'all about the parser cache's utilization status so we can, in turn, provide updates to projects about when they can potentially expect to see the Reply Tool available to everyone at their projects.
  3. Once we are comfortable with the parser cache's utilization, we'll proceed with offering the Reply Tool as an opt-out setting at the projects referenced in "2."

Update: 1 July 2021

Documenting the next steps that emerged in the two meetings related to this issue today...

Next steps

  • @Krinkle to verify whether the optimizations made in T282761 have been effective: T280606
  • @Krinkle to estimate the growth in demand for Parser Cache storage: T285993 / T280604
  • Editing Team to estimate the growth in DiscussionTools' demand for Parser Cache storage: T285995 [i]

i. The need for this estimate emerged in a second conversation between @DannyH, @marcella, @DAbad, and myself.

  • @Krinkle to verify whether the optimizations made in T282761 have been effective

That's T280606: Post-deployment: evaluate impact on site performance.

  • @Krinkle to estimate the growth in demand for Parser Cache storage: T285993

This'll be part of T280604: Post-deployment: (partly) ramp parser cache retention back up; moved as a subtask there.

  • @Krinkle to verify whether the optimizations made in T282761 have been effective

That's T280606: Post-deployment: evaluate impact on site performance.

Noted.

  • @Krinkle to estimate the growth in demand for Parser Cache storage: T285993

This'll be part of T280604: Post-deployment: (partly) ramp parser cache retention back up; moved as a subtask there.

Noted. Excellent. Thank you.

I fell down the ParserCache rabbit hole while investigating T285987: Do not generate full html parser output at the end of Wikibase edit requests (unrelated to DiscussionTools but related to ParserCache). I have some results I would like to share; where should I post my numbers?

I didn't know where else to put this, so I'm posting my findings here. I did a 1:256 sampling and checked the keys; in total we have 550M PC entries.

I'm struggling to see how DiscussionTools can cause issues for the parser cache. Its current fragmentation is next to nothing (0.28% extra rows, currently around 1.4M rows). Maybe the reduction of expiry has helped, but I would like to see some numbers on that.

The actual problem is the parser cache entries of commons. They currently make up 29% of all parser cache entries, over 160M rows. For comparison, this is more than all PC entries of enwiki, wikidata, zhwiki, frwiki, dewiki, and enwiktionary combined. I think it's related to a bot that purges all pages on commons, or it may be due to refreshLinks jobs or the CirrusSearch sanitizer job misbehaving (or a combination of all three). This needs a much deeper and closer look.

Looking at commons a bit closer, out of 160M entries (see the classification sketch after this list):

  • 136M rows are non-canonical and only 24M rows are canonical
  • 100M rows have wb=3 on them. I don't know what wikibase is supposed to do on commons for parsercache but this doesn't sound right at all. We don't have new termbox there.
  • 108M are not render requests and 60M are render requests.
  • 52M are fragmentation due to user language not being English.
  • 39M rows are because of 'responsiveimages=0'.
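For transparency, here is a minimal sketch of the kind of bucketing behind these counts. The '!'-separated key layout and the bucket names are assumptions based on the example keys in this thread, not my exact script:

```php
<?php
/**
 * Bucket sampled ParserCache keys by their '!'-separated option fragments,
 * e.g. commonswiki:pcache:idhash:12345-0!userlang=de!responsiveimages=0
 * (key layout assumed from the examples in this thread).
 */
function bucketParserCacheKeys( array $sampledKeys ): array {
	$counts = [ 'canonical' => 0, 'wb=3' => 0, 'userlang' => 0, 'responsiveimages=0' => 0 ];
	foreach ( $sampledKeys as $key ) {
		$fragments = explode( '!', $key );
		array_shift( $fragments ); // drop the page-identifying prefix
		if ( $fragments === [] || $fragments === [ 'canonical' ] ) {
			$counts['canonical']++;
			continue;
		}
		foreach ( $fragments as $frag ) {
			if ( $frag === 'wb=3' ) {
				$counts['wb=3']++;
			} elseif ( strpos( $frag, 'userlang=' ) === 0 && $frag !== 'userlang=en' ) {
				$counts['userlang']++; // split on a non-English user language
			} elseif ( $frag === 'responsiveimages=0' ) {
				$counts['responsiveimages=0']++;
			}
		}
	}
	return $counts;
}
```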

I'll keep looking at this in more depth and keep you all posted.

Some random stuff I found:

  • People can fragment parsercache by choosing random languages. For example I found an entry with userlang=-4787_or_5036=(select_(case_when_(5036=4595)_then_5036_else_(select_4595_union_select_4274)_end))--_emdu
  • TMH seems to be using ParserCache as a general purpose cache in ApiTimedText. I found entries like commonswiki:apitimedtext:Thai_National_Anthem_-_US_Navy_Band.ogg.ru.srt:srt:srt there. This is not much but has potential to explode.
  • There is a general problem of bots editing pages and triggering a parsed entry while no one is actually looking at them. E.g. ruwikinews, a very small wiki in terms of traffic, apparently now has 15M ParserCache rows (ten times bigger than all of DiscussionTools' overhead), mostly because they recently imported a lot of news from an old site. We could rethink this and maybe avoid parsing the page and storing a PC entry if the bot flag is set.

I'll dig more and let you know.
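A minimal sketch of that last bullet's idea, under the assumption that the check would live wherever MediaWiki eagerly warms the ParserCache after a save. EDIT_FORCE_BOT is a real MediaWiki EDIT_* bit flag; the function and its call site are hypothetical:

```php
<?php
// Hypothetical sketch: skip the eager ParserCache warm-up for bot-flagged
// edits, so the page is only parsed (and cached) on the first real view.
// Where exactly this check would live is an open design question.
function shouldWarmParserCacheOnSave( int $flags ): bool {
	if ( $flags & EDIT_FORCE_BOT ) {
		return false; // bot edit: let the first page view fill the cache
	}
	return true;
}
```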

100M rows have wb=3 on them. I don't know what wikibase is supposed to do on commons for parsercache but this doesn't sound right at all. We don't have new termbox there.

This is added by WikibaseRepo and will probably appear in ALL commons and wikidata (and the associated test site) pcache keys:
https://github.com/wikimedia/Wikibase/blob/c1791fbca79be6f14b42a4117367ddaa1e618023/repo/includes/RepoHooks.php#L1069-L1073
Though this has consistently been 3 for years now, so no extra splitting should be happening here.
We could probably drop this.

Not sure why only some percentage of commons entries seem to have this? The hook looks like it always adds it?
Could it be something to do with MCR? Not sure.
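For reference, a paraphrased sketch of what the linked hook does (the exact code is at the URL above; the ParserOptionsRegister hook also takes a third `$lazyLoad` parameter, omitted here). Since the value has been pinned at 3 for years, in principle it labels keys rather than splitting them:

```php
<?php
// Paraphrase of the linked RepoHooks.php: Wikibase registers a 'wb' parser
// option that participates in the ParserCache key and has stayed at 3.
function onParserOptionsRegister( array &$defaults, array &$inCacheKey ) {
	$defaults['wb'] = 3;       // bumped only for incompatible output changes
	$inCacheKey['wb'] = true;  // include the 'wb' option in the cache key
}
```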

There is a general problem of bots editing pages and triggering a parsed entry while no one is actually looking at them. E.g. ruwikinews, a very small wiki in terms of traffic, apparently now has 15M ParserCache rows (ten times bigger than all of DiscussionTools' overhead), mostly because they recently imported a lot of news from an old site. We could rethink this and maybe avoid parsing the page and storing a PC entry if the bot flag is set.

This is also a problem for Wikidata, and we are going to stop it from happening in T285987: Do not generate full html parser output at the end of Wikibase edit requests.

Change 708520 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/extensions/TimedMediaHandler@master] Avoid using ParserCache as a general purpose cache

https://gerrit.wikimedia.org/r/708520
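If I read the patch's intent right, the gist is to move these entries out of ParserCache and into the regular WAN object cache. A rough sketch of that approach (not the actual patch; the key parts, TTL, and convertTimedText() helper are illustrative stand-ins):

```php
<?php
use MediaWiki\MediaWikiServices;

// Sketch: cache a converted subtitle track in the main WAN object cache
// instead of ParserCache, recomputing it on demand when the entry expires.
function getConvertedTimedText( string $fileName, string $lang, string $format ): string {
	$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();
	return $cache->getWithSetCallback(
		$cache->makeKey( 'timedtext-conversion', $fileName, $lang, $format ),
		$cache::TTL_DAY,
		static function () use ( $fileName, $lang, $format ) {
			// The expensive step: fetch and convert the subtitle track.
			return convertTimedText( $fileName, $lang, $format ); // hypothetical helper
		}
	);
}
```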

I'm struggling to see how DiscussionTools can cause issues for the parser cache. Its current fragmentation is next to nothing

The fragmentation issue in DT was solved at the source many months ago, and later further reduced by the shortened retention, so it is expected to be very low now.

commons. They currently make up 29% of all parser cache entries, over 160M rows. For comparison, this is more than all PC entries of enwiki, wikidata, zhwiki, frwiki, dewiki, and enwiktionary combined.

Thanks, this is very nice. We hadn't yet tried to break it down this way. Right now, though, I'd say we're not actively looking to decrease it. Previous experience does tell us that even low hit rates are useful in PC, given the high cost of generating entries. I'm actually thinking about a possible future where PC is more like ExternalStore, in that it would not have a TTL at all but would be basically append-only (apart from replacing entries with current revisions, and applying deletions). Especially as we get closer to Parsoid being used for page views, which has a relatively strong need to have an expansion ready to go at all times. As well as improving performance for page views more broadly by getting the miss rate so low that we could potentially even serve an error if a PC entry is missing (and queue a job or something). This will require a lot more work, but it shows a rough long-term direction that I'm considering. (Nothing is decided yet.)
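Purely to illustrate that direction (nothing decided, as said): a page view in that world would fail fast on a PC miss and queue a re-parse, roughly like this. ParserCache::get(), JobQueueGroup, and HttpError are real MediaWiki pieces; the ParsePageJob class is hypothetical:

```php
<?php
// Illustrative only: serve an error on a ParserCache miss and enqueue a
// parse job, instead of parsing inline during the page view.
$output = $parserCache->get( $wikiPage, $parserOptions );
if ( !$output ) {
	JobQueueGroup::singleton()->lazyPush( new ParsePageJob( $wikiPage->getTitle() ) );
	throw new HttpError( 503, 'Rendering queued, please retry shortly' );
}
```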

Some random stuff I found:

  • People can fragment parsercache by choosing random languages. For example I found an entry with userlang=-4787_or_5036…

This is required for the int-lang hack. These should be given a shortened TTL, the same as for old revisions and non-canonical preferences; but for as long as we support this feature, they are still worth caching, I imagine.

I'm hoping, in the next 1-2 years, to deprecate and remove this feature, as it seems the various purposes for it have viable alternatives nowadays. It'll take a long time to migrate, but during the migration we could potentially disable caching at some point, or severely limit which wikis/namespaces it is cached for, and eventually disable it entirely (e.g. by normalising the parameter to a valid language code).
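For concreteness, the shortened-TTL idea could look roughly like this. The detection condition is illustrative; ParserOutput::updateCacheExpiry() is real and can only lower the effective expiry, never raise it:

```php
<?php
use MediaWiki\MediaWikiServices;

// Sketch: cap the retention of parser output that was split on a
// non-canonical user language (e.g. via ?uselang=), similar to what is
// done for old revisions. The condition is illustrative.
$contentLangCode = MediaWikiServices::getInstance()->getContentLanguage()->getCode();
if ( $parserOptions->getUserLang() !== $contentLangCode ) {
	$parserOutput->updateCacheExpiry( 86400 ); // keep for at most one day
}
```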

  • TMH seems to be using ParserCache as a general purpose cache in ApiTimedText. I found entries like commonswiki:apitimedtext:Thai_National_Anthem_-_US_Navy_Band.ogg.ru.srt:srt:srt there. This is not much but has potential to explode.

Ack. I think we may have one or two other things like this. These are basically using PC as if it were the MainStash, where we are currently short on space. Being worked on at T212129.

There is a general problem of bots editing pages and triggering a parsed entry while no one is actually looking at them. E.g. ruwikinews, a very small wiki in terms of traffic, apparently now has 15M ParserCache rows (ten times bigger than all of DiscussionTools' overhead), mostly because they recently imported a lot of news from an old site. We could rethink this and maybe avoid parsing the page and storing a PC entry if the bot flag is set.

As mentioned above, PC benefits a lot from the long tail, so intentional measures not to pre-cache entries during edits would affect the performance of API queries, jobs, and eventually page views. It may be good to have this as one of several emergency levers we can pull to reduce load, but I'm not sure about it as a general policy.

In general, though, I think right now we're stable, and I'd prefer not to make major changes to demand if we can avoid it until this task and its subtasks are completed.