TJones (Trey Jones)
Sr. Software Engineer, Search Platform Team

User Details

User Since: Jul 8 2015, 3:02 PM (194 w, 5 d)
Availability: Available
IRC Nick: Trey314159
LDAP User: Tjones
MediaWiki User: TJones (WMF)

I would have written a shorter comment, but I did not have the time.

I'm part of the Search Platform team and I spend my time working on search & relevance, trying to better support search in various languages, analyzing queries, and doing random mathy things. I tend to write long, detailed notes about my investigations (so as to improve the bus number of my work).

When I have to work on _GitHub,_ /‍‍/Phab,/‍‍/ and ''MediaWiki'' all on the same day, I sometimes suffer Severe Markup Incongruence Fatigue.

I ♥ Unicode.

Recent Activity

Fri, Mar 29

TJones added a comment to T216055: Move backend for current search dashboard to pull data from Hadoop.

@mpopov are there any of these metrics we want to remove in light of the classification that @TJones did on the spreadsheet?

Fri, Mar 29, 4:12 PM · Discovery-Search (Current work), Patch-For-Review, Product-Analytics, Epic

Thu, Mar 28

TJones moved T219550: Harmonize language analysis across languages from needs triage to Language Stuff on the Discovery-Search board.
Thu, Mar 28, 7:47 PM · Discovery-Search
TJones added a project to T219550: Harmonize language analysis across languages: Discovery-Search.
Thu, Mar 28, 7:47 PM · Discovery-Search
TJones added a parent task for T180387: Enable hiragana/katakana mapping for other languages: T219550: Harmonize language analysis across languages.
Thu, Mar 28, 7:47 PM · Discovery-Search, Discovery, CirrusSearch
TJones added a parent task for T219108: Investigate applying aggressive_splitting everywhere, not just on English-language wikis: T219550: Harmonize language analysis across languages.
Thu, Mar 28, 7:47 PM · Discovery, CirrusSearch, Discovery-Search
TJones added subtasks for T219550: Harmonize language analysis across languages: T170625: Investigate disabling or modifying word_break_helper in language analyzers., T219108: Investigate applying aggressive_splitting everywhere, not just on English-language wikis, T180387: Enable hiragana/katakana mapping for other languages.
Thu, Mar 28, 7:47 PM · Discovery-Search
TJones added a parent task for T170625: Investigate disabling or modifying word_break_helper in language analyzers.: T219550: Harmonize language analysis across languages.
Thu, Mar 28, 7:47 PM · Discovery-Search
TJones created T219550: Harmonize language analysis across languages.
Thu, Mar 28, 7:46 PM · Discovery-Search
TJones renamed T219108: Investigate applying aggressive_splitting everywhere, not just on English-language wikis from Cross-wiki search tokenizer is better than local search one to Investigate applying aggressive_splitting everywhere, not just on English-language wikis.
Thu, Mar 28, 7:04 PM · Discovery, CirrusSearch, Discovery-Search
TJones added a comment to T219108: Investigate applying aggressive_splitting everywhere, not just on English-language wikis.

As I thought, this is a customization that was added to the English Language Analysis years ago before my time. It was originally limited to search on MediaWiki.org in 2013, and then expanded to all English-language wikis in 2014, but it was never expanded beyond that.

Thu, Mar 28, 6:59 PM · Discovery, CirrusSearch, Discovery-Search
TJones added a comment to T219108: Investigate applying aggressive_splitting everywhere, not just on English-language wikis.

I'll take a look today. I'm pretty sure I know what's happening, but will double check.

Thu, Mar 28, 5:31 PM · Discovery, CirrusSearch, Discovery-Search

Wed, Mar 20

TJones renamed T212891: [EPIC-ish][Milestone 2] Implement NLP Search Suggestion Method 2 for CJK languages from [EPIC-ish][Milestone 3] Implement NLP Search Suggestion Method 2 for CJK languages to [EPIC-ish][Milestone 2] Implement NLP Search Suggestion Method 2 for CJK languages.
Wed, Mar 20, 3:45 PM · Chinese-Sites, Discovery-Search, Epic
TJones renamed T212889: [EPIC-ish][Milestone 1] Implement NLP Search Suggestion Method 1 for 10 languages from [EPIC-ish][Milestone 2] Implement NLP Search Suggestion Method 1 for 10 languages to [EPIC-ish][Milestone 1] Implement NLP Search Suggestion Method 1 for 10 languages.
Wed, Mar 20, 3:44 PM · Discovery-Search, Epic
TJones renamed T212888: [EPIC-ish][Milestone 0] Implement NLP Search Suggestion Method 0 for English from [EPIC-ish][Milestone 1] Implement NLP Search Suggestion Method 0 for English to [EPIC-ish][Milestone 0] Implement NLP Search Suggestion Method 0 for English.
Wed, Mar 20, 3:44 PM · Patch-For-Review, Discovery-Search, Epic

Tue, Mar 19

TJones updated the task description for T174116: Another look at multi-hyphen tokens on enwiki and zhwiki.
Tue, Mar 19, 5:25 PM · Discovery-Search (Current work), Chinese-Sites, Discovery

Tue, Mar 12

TJones moved T217602: Properly handle language-specific lowercasing in language analyzers from Needs review to Done on the Discovery-Search (Current work) board.
Tue, Mar 12, 1:45 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work)
TJones moved T203117: Greek language analysis generates unexpected empty tokens from Needs review to Done on the Discovery-Search (Current work) board.
Tue, Mar 12, 1:44 PM · Patch-For-Review, Discovery-Search (Current work)

Fri, Mar 8

TJones moved T216083: Update required version of TextCat in CirrusSearch from Needs review to Done on the Discovery-Search (Current work) board.
Fri, Mar 8, 3:13 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery
TJones added a comment to T216083: Update required version of TextCat in CirrusSearch.

Thanks, @Smalyshev & @EBernhardson, for the vendor patch!

Fri, Mar 8, 3:12 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery

Thu, Mar 7

TJones claimed T174116: Another look at multi-hyphen tokens on enwiki and zhwiki.
Thu, Mar 7, 6:10 PM · Discovery-Search (Current work), Chinese-Sites, Discovery
TJones moved T174116: Another look at multi-hyphen tokens on enwiki and zhwiki from Language Stuff to Current work on the Discovery-Search board.
Thu, Mar 7, 6:10 PM · Discovery-Search (Current work), Chinese-Sites, Discovery
TJones moved T216083: Update required version of TextCat in CirrusSearch from in progress to Needs review on the Discovery-Search (Current work) board.
Thu, Mar 7, 6:08 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery
TJones claimed T216083: Update required version of TextCat in CirrusSearch.
Thu, Mar 7, 6:06 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery
TJones moved T216083: Update required version of TextCat in CirrusSearch from Language Stuff to Current work on the Discovery-Search board.
Thu, Mar 7, 6:06 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery

Wed, Mar 6

TJones added a comment to T217602: Properly handle language-specific lowercasing in language analyzers.

After refactoring the lowercase-to-ICU-normalization upgrade code for Greek (T203117) so that the lowercase filter is kept if it is language-specific, I needed to test it for the other language-specific cases: Turkish and Irish. The impact is positive but small because it is limited to the plain field and other fields besides the text field (where the lang-specific lowercasing is already in effect because the analyzers have not been unpacked). Full details on MediaWiki.

Wed, Mar 6, 11:17 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work)
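
To illustrate why a language-specific lowercase filter matters for Turkish, here is a minimal Python sketch. It is not CirrusSearch or Elasticsearch code: `turkish_lower` is a hypothetical helper that only mimics the Turkish dotted/dotless I rule, and the sample word is an arbitrary choice.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish dotted/dotless I rule (illustrative only)."""
    # In Turkish, İ (U+0130) lowercases to i, and I lowercases to ı (U+0131).
    # Handle those two letters first, then let str.lower() do the rest.
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


word = "D\u0130YARBAKIR"       # "DİYARBAKIR"
generic = word.lower()         # language-agnostic Unicode lowercasing
turkish = turkish_lower(word)  # Turkish-aware lowercasing

# Generic lowercasing maps I to i and İ to i plus a combining dot,
# so the two results are different strings.
print(generic == turkish)  # False
```

A query lowercased one way will not match an index lowercased the other way, which is the kind of mismatch the language-specific filter avoids.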
TJones added a comment to T203117: Greek language analysis generates unexpected empty tokens.

Unpacking the Greek analyzer exposes the lowercase filter, which is upgraded to icu_normalizer, losing the Greek-specific processing therein! So, we need to keep the Greek lowercasing even if we do ICU normalization. After that, everything is copacetic. Full write-up on MediaWiki.

Wed, Mar 6, 11:14 PM · Patch-For-Review, Discovery-Search (Current work)
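
The rule described in the comment above can be sketched roughly as follows. This is a hedged Python illustration, not the actual CirrusSearch code (which is PHP): `upgrade_lowercase` and the `greek_lowercase` filter name are hypothetical, and running `icu_normalizer` after the retained language-specific filter is an assumption about ordering. The `language` option on Elasticsearch's `lowercase` filter is real.

```python
def upgrade_lowercase(filters, definitions):
    """Return a new filter chain with generic lowercase upgraded to icu_normalizer."""
    upgraded = []
    for name in filters:
        # Built-in filters have no custom definition; treat the name as the type.
        spec = definitions.get(name, {"type": name})
        if spec.get("type") != "lowercase":
            upgraded.append(name)
        elif spec.get("language"):
            # Language-specific lowercasing (e.g. greek, turkish, irish):
            # keep it, then normalize with ICU afterwards (assumed ordering).
            upgraded.extend([name, "icu_normalizer"])
        else:
            # Generic lowercasing is replaced outright by ICU normalization.
            upgraded.append("icu_normalizer")
    return upgraded


# Hypothetical custom filter definition for a Greek analyzer.
definitions = {"greek_lowercase": {"type": "lowercase", "language": "greek"}}

print(upgrade_lowercase(["lowercase", "stop"], definitions))
print(upgrade_lowercase(["greek_lowercase", "stop"], definitions))
```

The first call swaps the generic filter for `icu_normalizer`; the second keeps `greek_lowercase` in the chain so the Greek-specific processing is not lost.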
TJones updated the task description for T147505: [Recurring task] CirrusSearch: what is updated during re-indexing.
Wed, Mar 6, 11:06 PM · Discovery-Search (Current work), Discovery
TJones renamed T217806: Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek) from Reindex Greek-language wikis to enable empty-token filtering to Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek).
Wed, Mar 6, 11:05 PM · Turkish-Sites, Discovery-Search
TJones added a subtask for T217602: Properly handle language-specific lowercasing in language analyzers: T217806: Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek).
Wed, Mar 6, 11:04 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work)
TJones added a parent task for T217806: Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek): T217602: Properly handle language-specific lowercasing in language analyzers.
Wed, Mar 6, 11:04 PM · Turkish-Sites, Discovery-Search
TJones moved T217806: Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek) from needs triage to Language Stuff on the Discovery-Search board.
Wed, Mar 6, 11:02 PM · Turkish-Sites, Discovery-Search
TJones edited projects for T217806: Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek), added: Discovery-Search; removed Discovery-Search (Current work).
Wed, Mar 6, 11:02 PM · Turkish-Sites, Discovery-Search
TJones created T217806: Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek).
Wed, Mar 6, 11:01 PM · Turkish-Sites, Discovery-Search
TJones moved T203117: Greek language analysis generates unexpected empty tokens from in progress to Needs review on the Discovery-Search (Current work) board.
Wed, Mar 6, 11:00 PM · Patch-For-Review, Discovery-Search (Current work)
TJones moved T217602: Properly handle language-specific lowercasing in language analyzers from in progress to Needs review on the Discovery-Search (Current work) board.
Wed, Mar 6, 11:00 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work)

Mon, Mar 4

TJones created T217602: Properly handle language-specific lowercasing in language analyzers.
Mon, Mar 4, 8:49 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work)

Feb 26 2019

TJones claimed T203117: Greek language analysis generates unexpected empty tokens.
Feb 26 2019, 4:49 PM · Patch-For-Review, Discovery-Search (Current work)
TJones moved T203117: Greek language analysis generates unexpected empty tokens from Language Stuff to Current work on the Discovery-Search board.
Feb 26 2019, 4:48 PM · Patch-For-Review, Discovery-Search (Current work)

Feb 21 2019

TJones moved T216740: Advanced search syntax for newbies from Backlog to Trainings / Skill sharing on the Wikimedia-Hackathon-2019 board.
Feb 21 2019, 5:00 PM · Wikimedia-Hackathon-2019
TJones created T216740: Advanced search syntax for newbies.
Feb 21 2019, 5:00 PM · Wikimedia-Hackathon-2019
TJones renamed T216738: Reindex Korean-language wikis to enable Nori analyzer from Reindex Korean-language wikis to Reindex Korean-language wikis to enable Nori analyzer.
Feb 21 2019, 4:54 PM · Discovery-Search
TJones updated the task description for T147505: [Recurring task] CirrusSearch: what is updated during re-indexing.
Feb 21 2019, 4:54 PM · Discovery-Search (Current work), Discovery
TJones moved T216738: Reindex Korean-language wikis to enable Nori analyzer from needs triage to Language Stuff on the Discovery-Search board.
Feb 21 2019, 4:52 PM · Discovery-Search
TJones created T216738: Reindex Korean-language wikis to enable Nori analyzer.
Feb 21 2019, 4:52 PM · Discovery-Search
TJones moved T206874: Add Nori (Korean) configuration to AnalysisConfigBuilder from in progress to Done on the Discovery-Search (Current work) board.

We need to reindex, but not until after the ES6 upgrade is complete, and LTR has been disabled.

Feb 21 2019, 4:47 PM · Patch-For-Review, Discovery-Search (Current work), Discovery

Feb 20 2019

TJones added a comment to T215969: Measure mutation latency across the newly split elasticsearch clusters.

@EBernhardson, thanks for the explanation!

Feb 20 2019, 10:36 PM · Patch-For-Review, Discovery-Search (Current work)
TJones added a comment to T215969: Measure mutation latency across the newly split elasticsearch clusters.

The spikes on create_index are pretty extreme, with 194s for chi-eqiad-with-archive and 291s for omega-eqiad-with-archive. Is that just bad luck, or is something going on with the archives that makes this sometimes take much longer?

Feb 20 2019, 9:52 PM · Patch-For-Review, Discovery-Search (Current work)
TJones awarded T215969: Measure mutation latency across the newly split elasticsearch clusters a Pterodactyl token.
Feb 20 2019, 9:50 PM · Patch-For-Review, Discovery-Search (Current work)

Feb 14 2019

TJones added a comment to T63080: CirrusSearch: intitle:¢ returns no results despite there being a redirect at [[¢]].

Bleh. It looks like that symbol is turned into a text boundary by the standard analyzer, which isn't nice.

Feb 14 2019, 9:56 PM · Discovery-Search, good first bug, Discovery, CirrusSearch

Feb 13 2019

TJones moved T216083: Update required version of TextCat in CirrusSearch from needs triage to Language Stuff on the Discovery-Search board.
Feb 13 2019, 10:38 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery
TJones renamed T216083: Update required version of TextCat in CirrusSearch from Update required version of TextCat in Mediawiki to Update required version of TextCat in CirrusSearch.
Feb 13 2019, 10:38 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery
TJones triaged T216083: Update required version of TextCat in CirrusSearch as Normal priority.
Feb 13 2019, 10:36 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery
TJones moved T213936: Deploy new version of TextCat from in progress to Done on the Discovery-Search (Current work) board.
Feb 13 2019, 10:34 PM · Discovery-Search (Current work), Discovery
TJones assigned T213936: Deploy new version of TextCat to Smalyshev.

Cool! Thanks, @Smalyshev!

Feb 13 2019, 10:34 PM · Discovery-Search (Current work), Discovery
TJones added a comment to T215966: Requesting access to Production Shell for julia.glen.

Woo hoo!

Feb 13 2019, 9:18 PM · Patch-For-Review, Operations, SRE-Access-Requests
TJones added a comment to T215966: Requesting access to Production Shell for julia.glen.

Change 490412 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] admin: reset Julia SSH key

https://gerrit.wikimedia.org/r/490412

Feb 13 2019, 9:10 PM · Patch-For-Review, Operations, SRE-Access-Requests
TJones moved T206874: Add Nori (Korean) configuration to AnalysisConfigBuilder from Language Stuff to Current work on the Discovery-Search board.
Feb 13 2019, 7:00 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones moved T138958: Detect "wrong keyboard" queries for Russian/American keyboards on EN/RU Wikipedias from Tech Debt/Misc to Language Stuff on the Discovery-Search board.

Removing this from current work and moving it to the "Language Stuff" backlog. I'm the only one who could work on this this quarter, and I'm a bit out of my depth with the integration. We'll reprioritize this for future work when we can assign a slightly larger team (≥2 people) to work on it.

Feb 13 2019, 6:59 PM · Discovery-Search, Russian-Sites, Discovery
TJones edited projects for T138958: Detect "wrong keyboard" queries for Russian/American keyboards on EN/RU Wikipedias, added: Discovery-Search; removed Discovery-Search (Current work).
Feb 13 2019, 6:58 PM · Discovery-Search, Russian-Sites, Discovery

Feb 12 2019

TJones added a comment to T215966: Requesting access to Production Shell for julia.glen.

@Julia.glen, I think this patch should give you an account, but as user juliaglen. You may need to add User juliaglen to your ssh config.

Feb 12 2019, 10:05 PM · Patch-For-Review, Operations, SRE-Access-Requests
TJones added a comment to T215916: ElasticSearch 6 migration plan checklist (search cluster).

Hmm—what about Nori (the Korean analyzer) and LTR? I believe we have to disable LTR for Korean, enable Nori, gather more data, then rebuild the LTR model. Sounds like maybe all of that should wait until after the ES upgrade, even though it means re-indexing Korean wikis at a later date.

Feb 12 2019, 4:49 PM · Discovery-Search
TJones added a comment to T215916: ElasticSearch 6 migration plan checklist (search cluster).

Looks good, and all the detail is much appreciated.

Feb 12 2019, 3:51 PM · Discovery-Search

Feb 11 2019

TJones added a comment to T212889: [EPIC-ish][Milestone 1] Implement NLP Search Suggestion Method 1 for 10 languages.

Sounds good to me! If it turns out that the smallest volume languages have trouble, we can fall back to larger languages on the list.

Feb 11 2019, 8:43 PM · Discovery-Search, Epic
TJones added a comment to T212885: NLP contractor set up and access.

Should be done now—so try again, please!

Feb 11 2019, 8:11 PM · Discovery-Search (Current work)
TJones added a comment to T212885: NLP contractor set up and access.

@Julia.glen, my hue username has the same weird capitalization as Gerrit (Tjones), which I don't use elsewhere.

Feb 11 2019, 8:06 PM · Discovery-Search (Current work)
TJones added a comment to T212885: NLP contractor set up and access.

I am unable to access hue.wikimedia.org with my LDAP account. Could you take a look? Thanks.

Feb 11 2019, 7:32 PM · Discovery-Search (Current work)
TJones moved T212885: NLP contractor set up and access from in progress to Done on the Discovery-Search (Current work) board.
Feb 11 2019, 6:39 PM · Discovery-Search (Current work)
TJones updated the task description for T212885: NLP contractor set up and access.
Feb 11 2019, 6:38 PM · Discovery-Search (Current work)
TJones added a comment to T212889: [EPIC-ish][Milestone 1] Implement NLP Search Suggestion Method 1 for 10 languages.

What languages should we initially investigate?

Feb 11 2019, 2:47 PM · Discovery-Search, Epic

Feb 8 2019

TJones moved T212885: NLP contractor set up and access from Waiting/Blocked to in progress on the Discovery-Search (Current work) board.
Feb 8 2019, 8:44 PM · Discovery-Search (Current work)
TJones added a comment to T215346: Enable access to OOUI elements for DWIM gadgets (and maybe others).

Thanks, @Mooeypoo! That looks like it could work. I really appreciate your explanations and your patches!

Feb 8 2019, 5:51 PM · Patch-For-Review, OOUI
TJones moved T194849: Investigate language analyzers in ElasticSearch 6 from Needs review to Done on the Discovery-Search (Current work) board.
Feb 8 2019, 3:55 PM · Discovery-Search (Current work), Chinese-Sites
TJones added a comment to T194849: Investigate language analyzers in ElasticSearch 6.

Everything looks good now. Serbian (et al.) and Esperanto are working as expected. Thanks, @dcausse!

Feb 8 2019, 3:54 PM · Discovery-Search (Current work), Chinese-Sites
TJones added a comment to T215555: access to turnilo for members of search team.

I can log in now. Thanks!

Feb 8 2019, 2:45 PM · LDAP-Access-Requests
TJones moved T212885: NLP contractor set up and access from in progress to Waiting/Blocked on the Discovery-Search (Current work) board.
Feb 8 2019, 2:50 AM · Discovery-Search (Current work)
TJones updated the task description for T212885: NLP contractor set up and access.
Feb 8 2019, 2:49 AM · Discovery-Search (Current work)
TJones moved T194849: Investigate language analyzers in ElasticSearch 6 from in progress to Needs review on the Discovery-Search (Current work) board.
Feb 8 2019, 1:25 AM · Discovery-Search (Current work), Chinese-Sites
TJones added a comment to T194849: Investigate language analyzers in ElasticSearch 6.

First draft done. Full details on MediaWiki.

Feb 8 2019, 1:24 AM · Discovery-Search (Current work), Chinese-Sites

Feb 7 2019

TJones updated the task description for T194849: Investigate language analyzers in ElasticSearch 6.
Feb 7 2019, 10:13 PM · Discovery-Search (Current work), Chinese-Sites
TJones claimed T194849: Investigate language analyzers in ElasticSearch 6.
Feb 7 2019, 10:13 PM · Discovery-Search (Current work), Chinese-Sites
TJones moved T194849: Investigate language analyzers in ElasticSearch 6 from Language Stuff to Current work on the Discovery-Search board.
Feb 7 2019, 10:12 PM · Discovery-Search (Current work), Chinese-Sites
TJones added a comment to T215555: access to turnilo for members of search team.

By "LDAP" I assume you mean the login for Wikitech, etc. OIT uses "LDAP" to refer to your Google Apps login, too.

Feb 7 2019, 9:51 PM · LDAP-Access-Requests

Feb 6 2019

TJones added a comment to T215346: Enable access to OOUI elements for DWIM gadgets (and maybe others).

Thanks for looking into this, @Mooeypoo! It's too bad that there isn't a way to make it work now, but I'm glad this provides another use case for potential enhancements to OOUI. If the extra functionality ever gets implemented, ping me if you remember!

Feb 6 2019, 4:47 PM · Patch-For-Review, OOUI

Feb 5 2019

TJones added a comment to T215346: Enable access to OOUI elements for DWIM gadgets (and maybe others).

Well, users can do var searchInputWidget = OO.ui.infuse($('#searchText')); to get a handle on the OOUI widget. Is that not sufficient?

Feb 5 2019, 9:32 PM · Patch-For-Review, OOUI
TJones added a comment to T214623: Analytics query access for search platform NLP contractor @Julia.glen.

Thanks, @Dzahn!

Feb 5 2019, 9:18 PM · Patch-For-Review, Operations, SRE-Access-Requests, Discovery-Search (Current work)
TJones created T215346: Enable access to OOUI elements for DWIM gadgets (and maybe others).
Feb 5 2019, 8:45 PM · Patch-For-Review, OOUI
TJones updated the task description for T212885: NLP contractor set up and access.
Feb 5 2019, 6:46 PM · Discovery-Search (Current work)

Jan 31 2019

TJones added a comment to T170099: Search returns random results when search query begins with a hyphen.

I regret not expressing my gratitude or commenting here at the time.

Jan 31 2019, 4:49 PM · Discovery-Search, Discovery, CirrusSearch

Jan 30 2019

TJones closed T124291: Searching for an IRC channel name (beginning with '#') redirects to the main page as Resolved.

Seems to be fixed now.

Jan 30 2019, 11:05 PM · Discovery-Search, CirrusSearch, Discovery
TJones closed T48334: Searching for # reloads the page as Resolved.

Seems to be fixed now.

Jan 30 2019, 11:05 PM · Discovery-Search, MediaWiki-Search
TJones moved T139647: Search box at top right of pages should italicize redirects from later on... to UI tickets on the Discovery-Search board.
Jan 30 2019, 11:00 PM · CirrusSearch, Need-volunteer, good first bug, Discovery-Search, Discovery
TJones moved T72899: Search box needs some normalization for Arabic Family languages from later on... to Language Stuff on the Discovery-Search board.
Jan 30 2019, 10:58 PM · Discovery-Search, CirrusSearch, Discovery, I18n, MediaWiki-Search
TJones closed T155670: Investigate Ratio of First to Second Result Scores as a Confidence Measure, a subtask of T140289: Investigate Improvements and Confidence Measures for TextCat Language Detection, as Declined.
Jan 30 2019, 10:54 PM · Discovery-Search, Epic, CirrusSearch, Discovery
TJones closed T155670: Investigate Ratio of First to Second Result Scores as a Confidence Measure as Declined.
Jan 30 2019, 10:54 PM · Discovery-Search, CirrusSearch, Discovery
TJones closed T149323: Qualitative confidence score for TextCat, a subtask of T140289: Investigate Improvements and Confidence Measures for TextCat Language Detection, as Resolved.
Jan 30 2019, 10:53 PM · Discovery-Search, Epic, CirrusSearch, Discovery
TJones closed T149323: Qualitative confidence score for TextCat as Resolved.
Jan 30 2019, 10:53 PM · CirrusSearch, Discovery, Discovery-Search
TJones moved T157771: [UI Enhancement] Show media license in search results from later on... to UI tickets on the Discovery-Search board.
Jan 30 2019, 10:50 PM · CirrusSearch, Discovery-Search, Discovery
TJones closed T140289: Investigate Improvements and Confidence Measures for TextCat Language Detection as Resolved.

Closing this because after looking into it a while back I decided that internal confidence isn't really a thing for TextCat to do, and easy things to improve the quality of TextCat results were done.

Jan 30 2019, 10:50 PM · Discovery-Search, Epic, CirrusSearch, Discovery
TJones closed T140289: Investigate Improvements and Confidence Measures for TextCat Language Detection, a subtask of T118278: EPIC: Improve Language Identification for use in Cirrus Search, as Resolved.
Jan 30 2019, 10:50 PM · Epic, Discovery
TJones closed T155822: Inconsistent search behavior when asciifolding is not activated on text/plain as Resolved.

I think everything here is fixed. ö, ä, and å are all treated as independent letters, so using a instead of ä is as much of a mismatch as using u instead of ä; other diacritics, like á, are ignored. Depending on whether you use the completion suggester, the go feature, or full-text search, you get additional suggestions depending on the place of the typos or the frequency of the incorrect word, all as expected.

Jan 30 2019, 10:37 PM · Discovery-Search, CirrusSearch, Discovery
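
The folding behavior described in the T155822 comment (some letters kept as independent, other diacritics ignored) can be sketched in Python. This is only an illustration, not the Elasticsearch `asciifolding` implementation: `fold_except` is a hypothetical helper, and the preserved set is an assumption based on the letters named in the comment.

```python
import unicodedata

# Assumption: the letters named in the comment are the ones to preserve.
PRESERVED = set("åäöÅÄÖ")


def fold_except(text: str, preserved=PRESERVED) -> str:
    """Strip combining marks from every character except the preserved letters."""
    out = []
    # Compose first so preserved letters are single code points.
    for ch in unicodedata.normalize("NFC", text):
        if ch in preserved:
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


print(fold_except("á"))  # a
print(fold_except("ä"))  # ä
```

With this rule a query containing á matches text containing a, while ä and a stay distinct, which matches the behavior the comment reports.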