-
GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge (Research Paper Walkthrough)
#bert #wsd #wordnet
This research applies BERT to Word Sense Disambiguation (WSD) in NLP by framing the problem as a sentence classification task that uses gloss knowledge. The authors report state-of-the-art results on benchmark datasets.
⏩ Abstract: Word Sense Disambiguation (WSD) aims to find the exact sense of an ambiguous word in a particular context. Traditional supervised methods rarely take into consideration the lexical resources like WordNet, which are widely utilized in knowledge-based methods. Recent studies have shown the effectiveness of incorporating gloss (sense definition) into neural networks for WSD. However, compared with traditional word expert supervised methods, they have not achieved much improvement. In this paper, we focus on how to better leverage gloss knowledge in a...
published: 07 Apr 2021
-
West Virginia (disambiguation)
West Virginia is a state in the United States of America.
West Virginia or Western Virginia may also refer to:
Western Virginia, a region in the state of Virginia
Kentucky County, Virginia, also referred to as "Western Virginia"
West Virginia, Minnesota
West Virginia University, the state's largest public university
West Virginia Mountaineers, the WVU athletic program
United States District Court for the Western District of Virginia
Western Virginia Campaign
Western Virginia Land Trust
Source: https://en.wikipedia.org/wiki/West_Virginia_(disambiguation)
Created with WikipediaReaderReborn (c) WikipediaReader
published: 05 Aug 2021
-
Champlain College (disambiguation)
Champlain College is a private, coeducational college located in Burlington, Vermont, United States.
Champlain College may also refer to:
Champlain Regional College, an English-language, publicly funded pre‑university college in Quebec, Canada
Champlain College Saint-Lambert, Champlain Regional College campus serving the Greater Montreal Area
Champlain College Lennoxville, Champlain Regional College campus serving the Eastern Townships
Champlain College St. Lawrence, Champlain Regional College campus serving Quebec City
Champlain College at Trent University in Peterborough, Ontario, Canada
Source: https://en.wikipedia.org/wiki/Champlain_College_(disambiguation)
Created with WikipediaReaderReborn (c) WikipediaReader
published: 26 May 2021
-
University of Agricultural Sciences (disambiguation) | Wikipedia audio article
This is an audio version of the Wikipedia Article:
https://en.wikipedia.org/wiki/University_of_Agricultural_Sciences
Listening is a more natural way of learning than reading. Written language began only around 3200 BC, but spoken language has existed for far longer.
Learning by listening is a great way to:
- increase imagination and understanding
- improve your listening skills
- improve your own spoken accent
- learn while on the move
- reduce eye strain
Now learn the vast amount of general knowledge available on Wikipedia through audio (audio article). You could even learn subconsciously by playing the audio while you are sleeping! If you are planning to listen a lot, you could try using a bone conduction headphone, or a standard speaker instead of an ...
published: 23 Jan 2019
-
Improving Polyphone Disambiguation for Mandarin Chinese by Combining Mix-pooling Strategy and Wi...
Title: Improving Polyphone Disambiguation for Mandarin Chinese by Combining Mix-pooling Strategy and Window-based Attention - (3 minutes introduction)
Authors: Junjie Li (Ping An Technology, China), Zhiyu Zhang (National Tsing Hua University, Taiwan), Minchuan Chen (Ping An Technology, China), Jun Ma (Ping An Technology, China), Shaojun Wang (Ping An Technology, China), Jing Xiao (Ping An Technology, China)
Category: Speech Synthesis: Linguistic processing, paradigms and other topics
Abstract: In this paper, we propose a novel system based on word-level features and window-based attention for polyphone disambiguation, which is a fundamental task for Grapheme-to-phoneme (G2P) conversion of Mandarin Chinese. The framework aims to combine a pre-trained language model with explicit word-lev...
published: 03 Feb 2022
-
Stavropol (disambiguation)
Stavropol is a city in southwestern Russia.
Stavropol may also refer to:
Stavropol Krai, the federal subject of Russia
Stavropol Soviet Republic, short-lived division of the RSFSR in 1918, merged into the North Caucasian Soviet Republic
Stavropol Urban Okrug, the municipal formation into which the city of krai significance of Stavropol in Stavropol Krai, Russia, is incorporated
Source: https://en.wikipedia.org/wiki/Stavropol_(disambiguation)
Created with WikipediaReaderReborn (c) WikipediaReader
published: 02 Apr 2021
-
[PLDI24] Static Analysis for Checking the Disambiguation Robustness of Regular Expressions
Static Analysis for Checking the Disambiguation Robustness of Regular Expressions (Video, PLDI 2024)
Konstantinos Mamouras, Alexis Le Glaunec, Wu Angela Li, and Agnishom Chattopadhyay
(Rice University, USA; Rice University, USA; Rice University, USA; Rice University, USA)
Abstract: Regular expressions are commonly used for finding and extracting matches from sequence data. Due to the inherent ambiguity of regular expressions, a disambiguation policy must be considered for the match extraction problem, in order to uniquely determine the desired match out of the possibly many matches. The most common disambiguation policies are the POSIX policy and the greedy (PCRE) policy. The POSIX policy chooses the longest match out of the leftmost ones. The greedy policy chooses a leftmost match and fu...
published: 23 Jul 2024
-
Bootleg: Guidable Self-Supervision for Named Entity Disambiguation -- Chris Re (Stanford University)
September 18, 2020
Abstract
Mapping textual mentions to entities in a knowledge graph is a key step in using knowledge graphs, called Named Entity Disambiguation (NED). A key challenge in NED is generalizing to rarely seen (tail) entities. Traditionally NED uses hand-tuned patterns from a knowledge base to capture rare, but reliable, signals. Hand-built features make it challenging to deploy and maintain NED–especially in multiple locales. While at Apple in 2018, we built a self-supervised system for NED that was deployed in a handful of locales and that improved performance of downstream models significantly. However, due to the fog of production, it was unclear what aspects of these models were most valuable. Motivated by this experience, we built Bootleg, a clean-slate, open-source, s...
published: 12 Jan 2023
-
KDD 2023 - Web-Scale Academic Name Disambiguation: the WhoIsWho Benchmark, Leaderboard, and Toolkit
Bo Chen, Tsinghua University
Name disambiguation, a fundamental problem in online academic systems, is now facing greater challenges with the increasing growth of research papers. To support the research community, we present WhoIsWho, which comprises a large-scale benchmark with over 1,000,000 papers built using an interactive annotation process, a regular leaderboard with comprehensive tasks, and an easy-to-use toolkit encapsulating the entire pipeline. To sum up, WhoIsWho is an ongoing, community-driven, open-source project. We intend to update the leaderboard as well as offer new datasets and methods over time. We also encourage contributions at oagwhoiswho@gmail.com.
published: 12 Jul 2023
-
Improving Polyphone Disambiguation for Mandarin Chinese by Combining Mix-pooling Strategy and Wi...
Title: Improving Polyphone Disambiguation for Mandarin Chinese by Combining Mix-pooling Strategy and Window-based Attention - (longer introduction)
Authors: Junjie Li (Ping An Technology, China), Zhiyu Zhang (National Tsing Hua University, Taiwan), Minchuan Chen (Ping An Technology, China), Jun Ma (Ping An Technology, China), Shaojun Wang (Ping An Technology, China), Jing Xiao (Ping An Technology, China)
Category: Speech Synthesis: Linguistic processing, paradigms and other topics
Abstract: In this paper, we propose a novel system based on word-level features and window-based attention for polyphone disambiguation, which is a fundamental task for Grapheme-to-phoneme (G2P) conversion of Mandarin Chinese. The framework aims to combine a pre-trained language model with explicit word-level ...
published: 03 Feb 2022
11:18
GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge (Research Paper Walkthrough)
#bert #wsd #wordnet
This research applies BERT to Word Sense Disambiguation (WSD) in NLP by framing the problem as a sentence classification task that uses gloss knowledge. The authors report state-of-the-art results on benchmark datasets.
⏩ Abstract: Word Sense Disambiguation (WSD) aims to find the exact sense of an ambiguous word in a particular context. Traditional supervised methods rarely take into consideration the lexical resources like WordNet, which are widely utilized in knowledge-based methods. Recent studies have shown the effectiveness of incorporating gloss (sense definition) into neural networks for WSD. However, compared with traditional word expert supervised methods, they have not achieved much improvement. In this paper, we focus on how to better leverage gloss knowledge in a supervised neural WSD system. We construct context-gloss pairs and propose three BERT-based models for WSD. We fine-tune the pre-trained BERT model on SemCor3.0 training corpus and the experimental results on several English all-words WSD benchmark datasets show that our approach outperforms the state-of-the-art systems.
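The context-gloss pairing mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `SENSES` inventory, its sense keys, and the `build_pairs` helper are hypothetical, and the weak-supervision format (quoting the target word and prefixing the gloss with it) is an assumption about how the target might be highlighted.

```python
from typing import NamedTuple


class ContextGlossPair(NamedTuple):
    context: str  # sentence containing the ambiguous word
    gloss: str    # candidate sense definition
    label: int    # 1 if the gloss is the correct sense, else 0


# Hypothetical mini sense inventory; a real system would read glosses from WordNet.
SENSES = {
    "bank": {
        "bank%financial": "a financial institution that accepts deposits",
        "bank%river": "sloping land beside a body of water",
    }
}


def build_pairs(tokens, target_index, gold_sense, inventory=SENSES,
                weak_supervision=False):
    """Turn one annotated target word into one binary example per candidate gloss."""
    target = tokens[target_index]
    ctx_tokens = list(tokens)
    if weak_supervision:
        # Assumed highlighting scheme: quote the target word in the context.
        ctx_tokens[target_index] = f'"{target}"'
    context = " ".join(ctx_tokens)

    pairs = []
    for sense_key, gloss in inventory[target].items():
        # Assumed highlighting scheme: prefix the gloss with the target word.
        gloss_text = f"{target} : {gloss}" if weak_supervision else gloss
        pairs.append(ContextGlossPair(context, gloss_text, int(sense_key == gold_sense)))
    return pairs


tokens = "he sat on the bank of the river".split()
pairs = build_pairs(tokens, 4, "bank%river")
ws_pairs = build_pairs(tokens, 4, "bank%river", weak_supervision=True)
```

Each pair would then be fed to a BERT sentence-pair classifier, and the gloss scored most likely to be correct would be taken as the predicted sense.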
Please feel free to share out the content and subscribe to my channel :)
⏩ Subscribe - https://youtube.com/channel/UCoz8NrwgL7U9535VNc0mRPA?sub_confirmation=1
⏩ OUTLINE:
0:00 - Abstract
01:46 - Task Definition
02:11 - Data Collection approach
02:30 - WordNet Overview
03:35 - Sentence construction method table overview
05:27 - BERT(Token-CLS)
06:41 - GlossBERT
07:52 - Context-Gloss Pair with Weak Supervision
08:55 - GlossBERT(Token-CLS)
09:20 - GlossBERT(Sent-CLS)
09:44 - GlossBERT(Sent-CLS-WS)
10:09 - Results
⏩ Paper Title: GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge
⏩ Paper: https://arxiv.org/abs/1908.07245v4
⏩ Code: https://github.com/HSLCY/GlossBERT
⏩ Author: Luyao Huang, Chi Sun, Xipeng Qiu, Xuanjing Huang
⏩ Organisation: Fudan University
⏩ IMPORTANT LINKS
Full Playlist on BERT usecases in NLP: https://www.youtube.com/watch?v=kC5kP1dPAzc&list=PLsAqq9lZFOtV8jYq3JlkqPQUN5QxcWq0f
Full Playlist on Text Data Augmentation Techniques: https://www.youtube.com/watch?v=9O9scQb4sNo&list=PLsAqq9lZFOtUg63g_95OuV-R2GhV1UiIZ
Full Playlist on Text Summarization: https://www.youtube.com/watch?v=kC5kP1dPAzc&list=PLsAqq9lZFOtV8jYq3JlkqPQUN5QxcWq0f
Full Playlist on Machine Learning with Graphs: https://www.youtube.com/watch?v=-uJL_ANy1jc&list=PLsAqq9lZFOtU7tT6mDXX_fhv1R1-jGiYf
Full Playlist on Evaluating NLG Systems: https://www.youtube.com/watch?v=-CIlz-5um7U&list=PLsAqq9lZFOtXlzg5RNyV00ueE89PwnCbu
*********************************************
If you want to support me financially, which is totally optional and voluntary :) ❤️
You can consider buying me chai (because I don't drink coffee :) ) at https://www.buymeacoffee.com/TechvizCoffee
*********************************************
⏩ Youtube - https://www.youtube.com/c/TechVizTheDataScienceGuy
⏩ Blog - https://prakhartechviz.blogspot.com
⏩ LinkedIn - https://linkedin.com/in/prakhar21
⏩ Medium - https://medium.com/@prakhar.mishra
⏩ GitHub - https://github.com/prakhar21
⏩ Twitter - https://twitter.com/rattller
*********************************************
Tools I use for making videos :)
⏩ iPad - https://tinyurl.com/y39p6pwc
⏩ Apple Pencil - https://tinyurl.com/y5rk8txn
⏩ GoodNotes - https://tinyurl.com/y627cfsa
#techviz #datascienceguy #ai #researchpaper #naturallanguageprocessing #bart
https://wn.com/Glossbert_Bert_For_Word_Sense_Disambiguation_With_Gloss_Knowledge_(Research_Paper_Walkthrough)
- published: 07 Apr 2021
- views: 2118
1:35
West Virginia (disambiguation)
West Virginia is a state in the United States of America.
West Virginia or Western Virginia may also refer to:
Western Virginia, a region in the state of Virginia
Kentucky County, Virginia, also referred to as "Western Virginia"
West Virginia, Minnesota
West Virginia University, the state's largest public university
West Virginia Mountaineers, the WVU athletic program
United States District Court for the Western District of Virginia
Western Virginia Campaign
Western Virginia Land Trust
Source: https://en.wikipedia.org/wiki/West_Virginia_(disambiguation)
Created with WikipediaReaderReborn (c) WikipediaReader
https://wn.com/West_Virginia_(Disambiguation)
- published: 05 Aug 2021
- views: 1
0:49
Champlain College (disambiguation)
Champlain College is a private, coeducational college located in Burlington, Vermont, United States.
Champlain College may also refer to:
Champlain Regional College, an English-language, publicly funded pre‑university college in Quebec, Canada
Champlain College Saint-Lambert, Champlain Regional College campus serving the Greater Montreal Area
Champlain College Lennoxville, Champlain Regional College campus serving the Eastern Townships
Champlain College St. Lawrence, Champlain Regional College campus serving Quebec City
Champlain College at Trent University in Peterborough, Ontario, Canada
Source: https://en.wikipedia.org/wiki/Champlain_College_(disambiguation)
Created with WikipediaReaderReborn (c) WikipediaReader
https://wn.com/Champlain_College_(Disambiguation)
- published: 26 May 2021
- views: 7
0:27
University of Agricultural Sciences (disambiguation) | Wikipedia audio article
This is an audio version of the Wikipedia Article:
https://en.wikipedia.org/wiki/University_of_Agricultural_Sciences
Listening is a more natural way of learning than reading. Written language began only around 3200 BC, but spoken language has existed for far longer.
Learning by listening is a great way to:
- increase imagination and understanding
- improve your listening skills
- improve your own spoken accent
- learn while on the move
- reduce eye strain
Now learn the vast amount of general knowledge available on Wikipedia through audio (audio article). You could even learn subconsciously by playing the audio while you are sleeping! If you are planning to listen a lot, you could try using a bone conduction headphone, or a standard speaker instead of an earphone.
Listen on Google Assistant through Extra Audio:
https://assistant.google.com/services/invoke/uid/0000001a130b3f91
Other Wikipedia audio articles at:
https://www.youtube.com/results?search_query=wikipedia+tts
Upload your own Wikipedia articles through:
https://github.com/nodef/wikipedia-tts
Speaking Rate: 0.7325791154993099
Voice name: en-GB-Wavenet-C
"I cannot teach anybody anything, I can only make them think."
- Socrates
SUMMARY
=======
University of Agricultural Sciences could refer to one of two state agriculture universities in India started on the land grant university pattern.
University of Agricultural Sciences, Dharwad
University of Agricultural Sciences, Bangalore
https://wn.com/University_Of_Agricultural_Sciences_(Disambiguation)_|_Wikipedia_Audio_Article
- published: 23 Jan 2019
- views: 2
3:03
Improving Polyphone Disambiguation for Mandarin Chinese by Combining Mix-pooling Strategy and Wi...
Title: Improving Polyphone Disambiguation for Mandarin Chinese by Combining Mix-pooling Strategy and Window-based Attention - (3 minutes introduction)
Authors: Junjie Li (Ping An Technology, China), Zhiyu Zhang (National Tsing Hua University, Taiwan), Minchuan Chen (Ping An Technology, China), Jun Ma (Ping An Technology, China), Shaojun Wang (Ping An Technology, China), Jing Xiao (Ping An Technology, China)
Category: Speech Synthesis: Linguistic processing, paradigms and other topics
Abstract: In this paper, we propose a novel system based on word-level features and window-based attention for polyphone disambiguation, which is a fundamental task for Grapheme-to-phoneme (G2P) conversion of Mandarin Chinese. The framework aims to combine a pre-trained language model with explicit word-level information in order to get meaningful context extraction. Particularly, we employ a pre-trained bidirectional encoder from Transformers (BERT) model to extract character-level features, and an external Chinese word segmentation (CWS) tool is used to obtain the word units. We adopt a mixed pooling mechanism to convert character-level features into word-level features based on the segmentation results. A window-based attention module is utilized to incorporate contextual word-level features for the polyphonic characters. Experimental results show that our method achieves an accuracy of 99.06% on an open benchmark dataset for Mandarin Chinese polyphone disambiguation, which outperforms the baseline systems.
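The step that converts character-level features into word-level features could look roughly like the sketch below. It assumes that "mixed pooling" means a weighted blend of max pooling and mean pooling over the characters of each segmented word; the abstract does not define it, so the `mixed_pool` helper and its `alpha` weight are hypothetical.

```python
def mixed_pool(char_features, word_spans, alpha=0.5):
    """Pool per-character vectors into one vector per word.

    char_features: list of equal-length float vectors, one per character.
    word_spans: (start, end) character offsets from a word segmenter, end exclusive.
    alpha: assumed blend weight; 1.0 is pure max pooling, 0.0 is pure mean pooling.
    """
    word_features = []
    for start, end in word_spans:
        chars = char_features[start:end]
        dims = range(len(chars[0]))
        max_vec = [max(c[d] for c in chars) for d in dims]
        mean_vec = [sum(c[d] for c in chars) / len(chars) for d in dims]
        word_features.append(
            [alpha * mx + (1 - alpha) * mean for mx, mean in zip(max_vec, mean_vec)]
        )
    return word_features


# Three characters with 2-dimensional features, segmented into a 2-character
# word followed by a 1-character word.
char_features = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
word_spans = [(0, 2), (2, 3)]
word_features = mixed_pool(char_features, word_spans)
```

In the described system the character vectors would come from BERT and the spans from the external CWS tool; here both are toy values.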
For more details and PDF version of the paper visit: https://www.isca-speech.org/archive/interspeech_2021/li21n_interspeech.html
d04s07t04trim
https://wn.com/Improving_Polyphone_Disambiguation_For_Mandarin_Chinese_By_Combining_Mix_Pooling_Strategy_And_Wi...
- published: 03 Feb 2022
- views: 8
0:36
Stavropol (disambiguation)
Stavropol is a city in southwestern Russia.
Stavropol may also refer to:
Stavropol Krai, the federal subject of Russia
Stavropol Soviet Republic, short-lived division of the RSFSR in 1918, merged into the North Caucasian Soviet Republic
Stavropol Urban Okrug, the municipal formation into which the city of krai significance of Stavropol in Stavropol Krai, Russia, is incorporated
Source: https://en.wikipedia.org/wiki/Stavropol_(disambiguation)
Created with WikipediaReaderReborn (c) WikipediaReader
https://wn.com/Stavropol_(Disambiguation)
- published: 02 Apr 2021
- views: 12
22:12
[PLDI24] Static Analysis for Checking the Disambiguation Robustness of Regular Expressions
Static Analysis for Checking the Disambiguation Robustness of Regular Expressions (Video, PLDI 2024)
Konstantinos Mamouras, Alexis Le Glaunec, Wu Angela Li, and Agnishom Chattopadhyay
(Rice University, USA; Rice University, USA; Rice University, USA; Rice University, USA)
Abstract: Regular expressions are commonly used for finding and extracting matches from sequence data. Due to the inherent ambiguity of regular expressions, a disambiguation policy must be considered for the match extraction problem, in order to uniquely determine the desired match out of the possibly many matches. The most common disambiguation policies are the POSIX policy and the greedy (PCRE) policy. The POSIX policy chooses the longest match out of the leftmost ones. The greedy policy chooses a leftmost match and further disambiguates using a greedy interpretation of Kleene iteration to match as many times as possible. The choice of disambiguation policy can affect the output of match extraction, which can be an issue for reusing regular expressions across regex engines. In this paper, we introduce and study the notion of disambiguation robustness for regular expressions. A regular expression is robust if its extraction semantics is indifferent to whether the POSIX or greedy disambiguation policy is chosen. This gives rise to a decision problem for regular expressions, which we prove to be PSPACE-complete. We propose a static analysis algorithm for checking the (non-)robustness of regular expressions and two performance optimizations. We have implemented the proposed algorithms and we have shown experimentally that they are practical for analyzing large datasets of regular expressions derived from various application domains.
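The difference between the two policies can be seen on a single input with a small sketch. It is not the paper's static analysis, which decides robustness over all inputs; it only compares the two policies on one string. It assumes Python's backtracking `re` module behaves like the greedy (PCRE) policy, and it emulates POSIX leftmost-longest by brute force for simple patterns without anchors or lookarounds. The helper names are hypothetical.

```python
import re


def greedy_match(pattern, text):
    """Span chosen by Python's backtracking engine (leftmost, first alternative wins)."""
    m = re.search(pattern, text)
    return m.span() if m else None


def posix_match(pattern, text):
    """Brute-force leftmost-longest span: keep the leftmost start, maximize the end."""
    m = re.search(pattern, text)
    if m is None:
        return None
    start = m.start()
    # Try the longest candidate first; fullmatch tests language membership.
    for end in range(len(text), start - 1, -1):
        if re.fullmatch(pattern, text[start:end]):
            return (start, end)
    return None


def agrees_on(pattern, text):
    """True if both policies extract the same span from this particular input."""
    return greedy_match(pattern, text) == posix_match(pattern, text)


print(greedy_match("a|ab", "xab"))  # greedy stops at the first alternative
print(posix_match("a|ab", "xab"))   # POSIX prefers the longer match
print(agrees_on("ab|a", "xab"))     # reordering the alternatives removes the disagreement here
```

A single disagreement such as `a|ab` on `xab` is enough to show a pattern is not robust, while agreement on one input proves nothing about other inputs.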
Article: https://doi.org/10.1145/3656461
ORCID: https://orcid.org/0000-0003-1209-7738, https://orcid.org/0000-0002-5444-5924, https://orcid.org/0000-0002-4523-3401, https://orcid.org/0009-0007-0462-8080
Video Tags: regex, automata, parsing, disambiguation strategy, static analysis, pldi24main-p806-p, doi:10.1145/3656461, orcid:0000-0003-1209-7738, orcid:0000-0002-5444-5924, orcid:0000-0002-4523-3401, orcid:0009-0007-0462-8080
Presentation at the PLDI 2024 conference, June 24–28, 2024, https://pldi24.sigplan.org/
Sponsored by ACM SIGPLAN,
https://wn.com/Pldi24_Static_Analysis_For_Checking_The_Disambiguation_Robustness_Of_Regular_Expressions
Static Analysis for Checking the Disambiguation Robustness of Regular Expressions (Video, PLDI 2024)
Konstantinos Mamouras, Alexis Le Glaunec, Wu Angela Li, and Agnishom Chattopadhyay
(Rice University, USA; Rice University, USA; Rice University, USA; Rice University, USA)
Abstract: Regular expressions are commonly used for finding and extracting matches from sequence data. Due to the inherent ambiguity of regular expressions, a disambiguation policy must be considered for the match extraction problem, in order to uniquely determine the desired match out of the possibly many matches. The most common disambiguation policies are the POSIX policy and the greedy (PCRE) policy. The POSIX policy chooses the longest match out of the leftmost ones. The greedy policy chooses a leftmost match and further disambiguates using a greedy interpretation of Kleene iteration to match as many times as possible. The choice of disambiguation policy can affect the output of match extraction, which can be an issue for reusing regular expressions across regex engines. In this paper, we introduce and study the notion of disambiguation robustness for regular expressions. A regular expression is robust if its extraction semantics is indifferent to whether the POSIX or greedy disambiguation policy is chosen. This gives rise to a decision problem for regular expressions, which we prove to be PSPACE-complete. We propose a static analysis algorithm for checking the (non-)robustness of regular expressions and two performance optimizations. We have implemented the proposed algorithms and we have shown experimentally that they are practical for analyzing large datasets of regular expressions derived from various application domains.
Article: https://doi.org/10.1145/3656461
ORCID: https://orcid.org/0000-0003-1209-7738, https://orcid.org/0000-0002-5444-5924, https://orcid.org/0000-0002-4523-3401, https://orcid.org/0009-0007-0462-8080
Video Tags: regex, automata, parsing, disambiguation strategy, static analysis, pldi24main-p806-p, doi:10.1145/3656461, orcid:0000-0003-1209-7738, orcid:0000-0002-5444-5924, orcid:0000-0002-4523-3401, orcid:0009-0007-0462-8080
Presentation at the PLDI 2024 conference, June 24–28, 2024, https://pldi24.sigplan.org/
Sponsored by ACM SIGPLAN.
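The difference between the two policies named in the abstract can be sketched in a few lines. Python's `re` module is a backtracking engine whose behavior corresponds to the greedy (PCRE-style) policy; the `posix_match` helper below is a hypothetical brute-force emulation of the POSIX leftmost-longest rule, not the static analysis algorithm from the paper. It also compares only the overall match, not submatches, so `is_robust_on` is an illustrative check on one input string rather than a decision procedure for robustness.

```python
import re

def greedy_match(pattern, text):
    """Leftmost match under Python's backtracking (greedy/PCRE-style) policy."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

def posix_match(pattern, text):
    """Brute-force emulation of POSIX leftmost-longest (illustrative only).

    For the leftmost start position that admits any match, return the
    longest substring that the whole pattern can match.
    """
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # Try the longest candidate end position first.
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return text[start:end]
    return None

def is_robust_on(pattern, text):
    """True if both policies extract the same overall match on this input."""
    return greedy_match(pattern, text) == posix_match(pattern, text)

print(greedy_match(r"a|ab", "ab"))   # a
print(posix_match(r"a|ab", "ab"))    # ab
print(is_robust_on(r"ab|a", "ab"))   # True
```

Reordering the alternatives from `a|ab` to `ab|a` makes the two policies agree on this input, which is the kind of sensitivity the notion of robustness is meant to capture.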
- published: 23 Jul 2024
- views: 165
56:30
Bootleg: Guidable Self-Supervision for Named Entity Disambiguation -- Chris Re (Stanford University)
September 18, 2020
Abstract
Mapping textual mentions to entities in a knowledge graph is a key step in using knowledge graphs, called Named Entity Disambiguation (NED). A key challenge in NED is generalizing to rarely seen (tail) entities. Traditionally, NED uses hand-tuned patterns from a knowledge base to capture rare, but reliable, signals. Hand-built features make it challenging to deploy and maintain NED, especially in multiple locales. While at Apple in 2018, we built a self-supervised system for NED that was deployed in a handful of locales and that improved performance of downstream models significantly. However, due to the fog of production, it was unclear what aspects of these models were most valuable. Motivated by this experience, we built Bootleg, a clean-slate, open-source, self-supervised system to improve tail performance using a simple transformer-based architecture. Bootleg improves tail generalization through a new inverse regularization scheme to favor more generalizable signals automatically. Bootleg-like models are used by several downstream applications. As a result, quality issues fixed in one application may need to be fixed independently in many applications. Thus, we initiate the study of techniques to fix systematic errors in self-supervised models using weak supervision, augmentation, and training set refinement. Bootleg achieves new state-of-the-art performance on the three major NED benchmarks by up to 3.3 F1 points, and it improves performance over BERT baselines on tail slices by 50.1 F1 points.
Bootleg is open source at http://hazyresearch.stanford.edu/bootleg/.
Biography
Christopher (Chris) Ré is an associate professor in the Department of Computer Science at Stanford University. He is in the Stanford AI Lab and is affiliated with the Statistical Machine Learning Group. His recent work is to understand how software and hardware systems will change as a result of machine learning along with a continuing, petulant drive to work on math problems. Research from his group has been incorporated into scientific and humanitarian efforts, such as the fight against human trafficking, along with products from technology and enterprise companies. He has cofounded four companies based on his research into machine learning systems: SambaNova and Snorkel, along with two companies that are now part of Apple, Lattice (DeepDive) in 2017 and Inductiv (HoloClean) in 2020.
He received a SIGMOD Dissertation Award in 2010, an NSF CAREER Award in 2011, an Alfred P. Sloan Fellowship in 2013, a Moore Data Driven Investigator Award in 2014, the VLDB Early Career Award in 2015, the MacArthur Foundation Fellowship in 2015, and an Okawa Research Grant in 2016. His research contributions have spanned database theory, database systems, and machine learning, and his work has won best paper at a premier venue in each area, respectively, at PODS 2012, SIGMOD 2014, and ICML 2016.
https://wn.com/Bootleg_Guidable_Self_Supervision_For_Named_Entity_Disambiguation_Chris_Re_(Stanford_University)
- published: 12 Jan 2023
- views: 139
1:58
KDD 2023 - Web-Scale Academic Name Disambiguation: the WhoIsWho Benchmark, Leaderboard, and Toolkit
Bo Chen, Tsinghua University
Name disambiguation, a fundamental problem in online academic systems, is now facing greater challenges with the increasing growth of research papers. To support the research community, we present WhoIsWho, which comprises a large-scale benchmark with over 1,000,000 papers built using an interactive annotation process, a regular leaderboard with comprehensive tasks, and an easy-to-use toolkit encapsulating the entire pipeline. To sum up, WhoIsWho is an ongoing, community-driven, open-source project. We intend to update the leaderboard as well as offer new datasets and methods over time. We also encourage contributions at oagwhoiswho@gmail.com.
https://wn.com/Kdd_2023_Web_Scale_Academic_Name_Disambiguation_The_Wholswho_Benchmark,_Leaderboard,_And_Toolkit
- published: 12 Jul 2023
- views: 186
14:55
Improving Polyphone Disambiguation for Mandarin Chinese by Combining Mix-pooling Strategy and Wi...
Title: Improving Polyphone Disambiguation for Mandarin Chinese by Combining Mix-pooling Strategy and Window-based Attention - (longer introduction)
Authors: Junjie Li (Ping An Technology, China), Zhiyu Zhang (National Tsing Hua University, Taiwan), Minchuan Chen (Ping An Technology, China), Jun Ma (Ping An Technology, China), Shaojun Wang (Ping An Technology, China), Jing Xiao (Ping An Technology, China)
Category: Speech Synthesis: Linguistic processing, paradigms and other topics
Abstract: In this paper, we propose a novel system based on word-level features and window-based attention for polyphone disambiguation, which is a fundamental task for grapheme-to-phoneme (G2P) conversion of Mandarin Chinese. The framework aims to combine a pre-trained language model with explicit word-level information in order to get meaningful context extraction. In particular, we employ a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model to extract character-level features, and an external Chinese word segmentation (CWS) tool is used to obtain the word units. We adopt a mixed pooling mechanism to convert character-level features into word-level features based on the segmentation results. A window-based attention module is utilized to incorporate contextual word-level features for the polyphonic characters. Experimental results show that our method achieves an accuracy of 99.06% on an open benchmark dataset for Mandarin Chinese polyphone disambiguation, which outperforms the baseline systems.
For more details and PDF version of the paper visit: https://www.isca-speech.org/archive/interspeech_2021/li21n_interspeech.html
d04s07t04lng
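The step in the abstract that converts character-level features into word-level features might look roughly like the sketch below. The abstract does not define the mixed pooling formula, so this assumes a common reading: a weighted blend of mean pooling and max pooling over the characters of each segmented word. The function names, the `alpha` weight, and the toy feature vectors are all hypothetical; real inputs would be BERT character embeddings and the output of a word segmentation tool.

```python
def mean_pool(vectors):
    """Element-wise mean over a list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def max_pool(vectors):
    """Element-wise max over a list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [max(v[i] for v in vectors) for i in range(dim)]

def mixed_pool(vectors, alpha=0.5):
    """Assumed mixing rule: alpha * mean + (1 - alpha) * max."""
    mean_vec = mean_pool(vectors)
    max_vec = max_pool(vectors)
    return [alpha * a + (1 - alpha) * m for a, m in zip(mean_vec, max_vec)]

def chars_to_words(char_features, word_lengths, alpha=0.5):
    """Pool per-character features into one vector per segmented word.

    word_lengths gives the number of characters in each word, in order,
    as a segmentation tool might report them.
    """
    if sum(word_lengths) != len(char_features):
        raise ValueError("segmentation does not cover all characters")
    words, start = [], 0
    for n in word_lengths:
        words.append(mixed_pool(char_features[start:start + n], alpha))
        start += n
    return words

# Toy example: three characters segmented into a 2-character word and a
# 1-character word, with 2-dimensional features.
features = [[1.0, 0.0], [3.0, 4.0], [2.0, 2.0]]
print(chars_to_words(features, [2, 1]))  # [[2.5, 3.0], [2.0, 2.0]]
```

A single-character word is unchanged by this pooling, since its mean and max coincide.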
https://wn.com/Improving_Polyphone_Disambiguation_For_Mandarin_Chinese_By_Combining_Mix_Pooling_Strategy_And_Wi...
- published: 03 Feb 2022
- views: 11