Episode 148: Ilias Sarantopoulos

Tuesday, 24 October 2023 16:49 UTC

🕑 1 hour 4 minutes

Ilias Sarantopoulos is a Senior Machine Learning Operations Engineer at the Wikimedia Foundation. He has been at the WMF since 2022.

Links for some of the topics discussed:

Open language identification API for 200+ languages

Tuesday, 24 October 2023 10:00 UTC

Language identification, commonly referred to as LID, plays a pivotal role in many natural language processing (NLP) systems. Consider the user interfaces we interact with daily. They usually have an option that lets users specify the language of the content they’re dealing with. Now imagine that this manual selection step were bypassed and the system could predict the language on its own. That would make these interfaces noticeably easier to use.

For instance, consider viewing translated messages on platforms such as Wikipedia Talk pages. If users are given translations without having to specify the source language, their interaction with the platform becomes simpler. Another example is a machine translation system in which the user provides the source text and selects a target language, and the system detects the source language automatically using language identification.

While there are numerous LID tools, none can detect all of the 300+ languages that Wikipedia is available in. For perspective, the Compact Language Detector 2 library can identify 83 languages, whereas fastText’s LID model can identify up to 176 languages. A notable challenge is that many of these models don’t make their training data public.

This is where the project “An Open Dataset and Model for Language Identification”, led by researchers from the University of Edinburgh, steps in. Their work has produced a dataset and a model that can detect 201 languages, potentially making it the most capable high-performing LID system available.

In light of this development, the Language team, in collaboration with the Machine Learning team, is introducing a new API that predicts the language of a given text. It is hosted on LiftWing, Wikimedia’s scalable machine learning model serving infrastructure.

An animated image showing text in various languages identified

Using the API

Please refer to the API documentation on the Wikimedia API Portal.

An example using curl:


$ curl https://api.wikimedia.org/service/lw/inference/v1/models/langid:predict -X POST -d '{"text": "Some sample text in any language that we want to identify"}' -H "Content-type: application/json"
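The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl call: the same endpoint, the same JSON body with a `text` field, and the same content-type header. The helper names are hypothetical, and the sketch assumes nothing about the response fields; it simply returns the decoded JSON. Authentication and rate limits are not covered, so check the API documentation before relying on it.

```python
import json
import urllib.request

# Endpoint and JSON body copied from the curl example above.
LANGID_URL = "https://api.wikimedia.org/service/lw/inference/v1/models/langid:predict"


def build_langid_request(text: str) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        LANGID_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_language(text: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response unchanged.

    No particular response fields are assumed here; see the API
    documentation for the response schema.
    """
    with urllib.request.urlopen(build_langid_request(text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `predict_language("Some sample text in any language that we want to identify")` performs the same call as the curl command. Building the request separately from sending it makes the payload easy to inspect without touching the network.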

For potential uses, ethical considerations, caveats, and recommendations, please see the model card.

Thanks

We thank Laurie Burchell, Alexandra Birch, Nikolay Bogoychev, and Kenneth Heafield of the University of Edinburgh for their research and the model that made this API possible.

From 15 to 21 October 2023, Toumon Wikipedian Club Japan held an online editathon week for Wikimedia Japan-Malaysia Friendship. Wikimedians from Japan and Malaysia contributed to Wikimedia projects about their respective communities.

Ahmad Ali Karim, CC BY-SA 4.0

Details

Five Wikimedians participated in the online event and contributed to Wikimedia projects as follows. You can also check out our results on Meta-Wiki at [[Wikimedia Japan-Malaysia Friendship/Editathon 15-21 October 2023]].

I am very happy to see that various Wikimedia projects were improved through the event. I think editathons in Japan have tended to focus only on Wikipedia, and I have wanted to change that, especially after Wikimania 2023, where I learned a lot about Wikimedia projects from Wikimedians around the world. This editathon week was a good opportunity to put that idea into practice.

Wikidata Workshop in Wikimania 2023. (Eugene Ormandy, CC BY-SA 4.0)

Acknowledgement

I would like to thank all the participants of the event. I hope we will work together regularly.

Tech/News/2023/43

Monday, 23 October 2023 23:16 UTC

Other languages: Deutsch, English, Tiếng Việt, español, français, italiano, norsk bokmål, polski, português, português do Brasil, svenska, čeština, русский, українська, עברית, العربية, ಕನ್ನಡ, 中文, 日本語

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Recent changes

  • There is a new Language and internationalization newsletter, written quarterly. It contains updates on new feature development, improvements in various language-related technical projects, and related support work.
  • Source map support has been enabled on all wikis. When you open the debugger in your browser’s developer tools, you should be able to see the unminified JavaScript source code. [1]

Changes later this week

  • The new version of MediaWiki will be on test wikis and MediaWiki.org from 24 October. It will be on non-Wikipedia wikis and some Wikipedias from 25 October. It will be on all wikis from 26 October (calendar).

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

Since October 2012, Wikidata has evolved considerably to become one of the most important open knowledge graphs, providing semantic knowledge about various topics in multiple languages. This effort includes the development of quality information for biomedicine that can be reused for clinical decision support, among other very important tasks.

In 2019, we conducted a research study to assess the coverage of health-related information in Wikidata. We found that Wikidata lacks support for various important types of information and that a significant set of biomedical relations has limited precision and is not linked to references. Despite crowdsourcing and human editing, the situation has not improved as it should. We needed a new approach to change the game.

MeSH Keywords as a valuable resource

MeSH (Medical Subject Headings) keywords play a pivotal role in biomedical knowledge representation, making them a valuable resource in many aspects of healthcare research and practice. Each keyword is composed of a heading, which gives the main topic of a research paper, and a qualifier, which identifies the facet of that topic discussed by the paper.

Created and maintained by the National Library of Medicine (NLM), MeSH keywords provide a standardized vocabulary for indexing, cataloging, and searching for biomedical and health-related information. Here are some key reasons why MeSH keywords are considered a valuable resource:

Standardized Terminology

MeSH keywords offer a standardized and structured way to describe biomedical concepts. Each keyword is associated with a unique identifier, facilitating interoperability and data integration across various biomedical databases and systems. This standardization ensures that researchers, healthcare professionals, and data scientists are speaking the same language when referring to specific medical topics, which is crucial in a field where precise terminology is paramount.

Improved Search and Retrieval

MeSH keywords significantly enhance information retrieval in biomedical databases such as PubMed. Researchers and healthcare practitioners can use MeSH terms to refine their searches, ensuring that they find highly relevant articles and resources. This precision in search and retrieval expedites literature reviews, clinical decision-making, and evidence-based practice.

Hierarchy and Relationships

MeSH keywords are organized into a hierarchical structure. This hierarchy enables researchers to navigate from broader concepts to more specific ones, making it easier to explore related topics and delve deeper into a subject area. This feature is especially beneficial for researchers seeking to understand complex medical phenomena or clinicians aiming to grasp the broader context of a particular condition.

Facilitation of Data Annotation

In the context of open knowledge graphs like Wikidata, MeSH keywords are instrumental in annotating and linking biomedical concepts. By associating MeSH terms with specific entities or topics, it becomes possible to create structured and interconnected knowledge representations. These annotations not only serve as a basis for data integration but also enable advanced semantic querying, classification, and reasoning.

Enabling Multidisciplinary Collaboration

MeSH keywords bring together professionals from diverse backgrounds, including medicine, biology, pharmacology, and computer science. This shared terminology ensures that collaborative projects can effectively bridge the gap between clinical knowledge and technical expertise. Multidisciplinary teams can collaborate seamlessly to enrich, validate, and apply biomedical knowledge in innovative ways.

Research and Clinical Decision Support

The value of MeSH keywords extends beyond research; they are instrumental in clinical decision support systems. Healthcare providers can use structured MeSH terminology to access the latest medical literature and clinical guidelines, aiding them in diagnosing patients, determining treatment plans, and staying current with advancements in healthcare. MeSH keywords empower healthcare professionals to make informed decisions based on a robust foundation of medical knowledge.

Using MeSH Keywords for adjusting Wikidata

The integration of MeSH (Medical Subject Headings) keywords into Wikidata represents a significant leap forward in the effort to enhance and adjust the open knowledge graph for clinical and biomedical applications. MeSH keywords, meticulously curated and structured for biomedical content, provide a powerful framework for improving the quality, coverage, and relevance of Wikidata in the context of healthcare and clinical practice. The project entitled “Adapting Wikidata to support clinical practice using Data Science, Semantic Web and Machine Learning” has been funded by the Wikimedia Foundation Research Fund to explore this direction. It was launched in August 2022 by the Data Engineering and Semantics Research Unit at the University of Sfax, Tunisia, together with the School of Data Science at the University of Virginia, United States of America, and the Institute for Technological Innovation at the University of Pretoria, South Africa.

Identifying concerns about biomedical relations in Wikidata

To identify potentially inconsistent relations between Wikidata items aligned with the MeSH taxonomy, we compute the Pointwise Mutual Information (PMI) metric for each relation. PMI is a corpus-derived measure of semantic relatedness; a relation is considered significant when its PMI exceeds a predefined threshold, typically set at 2.

To calculate PMI using MeSH keyword associations, several values are required:

  1. N(x): The frequency of occurrences of the subject in PubMed.
  2. N(y): The frequency of occurrences of the object in PubMed.
  3. N(x,y): The count of associations between the subject and the object in PubMed.
  4. P: The total number of PubMed records available.

These values are used in the PMI calculation to assess the strength of relations between MeSH-aligned Wikidata items. This method can be extended to any kind of statement, including non-relational ones, and adapted to be driven by search engines.
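To make the calculation concrete, here is a minimal Python sketch of PMI computed from the four counts listed above, using the standard definition PMI(x, y) = log(N(x,y) · P / (N(x) · N(y))). The post does not state the logarithm base; base 2 is an assumption of this sketch, and the threshold of 2 is applied as a strict “greater than” test.

```python
import math

PMI_THRESHOLD = 2.0  # significance threshold mentioned in the post


def pmi(n_x, n_y, n_xy, total):
    """PMI of a subject x and an object y from PubMed record counts.

    PMI(x, y) = log2( (N(x,y) / P) / ((N(x) / P) * (N(y) / P)) )
              = log2( N(x,y) * P / (N(x) * N(y)) )
    The base-2 logarithm is an assumption of this sketch.
    """
    if n_x <= 0 or n_y <= 0 or total <= 0:
        raise ValueError("N(x), N(y) and P must be positive")
    if n_xy == 0:
        return float("-inf")  # the pair never co-occurs
    return math.log2(n_xy * total / (n_x * n_y))


def is_significant(n_x, n_y, n_xy, total, threshold=PMI_THRESHOLD):
    """True when the relation's PMI exceeds the threshold."""
    return pmi(n_x, n_y, n_xy, total) > threshold
```

For example, if the subject and object each appear in 10 of 100 records and co-occur in all 10, the PMI is log2(10), about 3.32, which is above the threshold; if co-occurrence is exactly what chance predicts, the PMI is 0.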

The PMI calculation for the 109,302 Wikidata relations between MeSH-aligned items reveals the following:

  1. 12,898 relations (11.8%) cannot be verified due to inaccurate MeSH ID data in Wikidata. You can find the list of these erroneous MeSH ID values at MeSH Verification Spreadsheet.
  2. 40,725 relations (37.2%) are likely to be incorrect and require verification by medical specialists as they fall below the predefined PMI threshold. The list of these semantic relations needing attention can be accessed at Relations Requiring Verification.

These resources provide detailed information for further analysis and validation of the identified issues in the MeSH-aligned Wikidata relations. Please note that some accurate non-biomedical relations can have PMI values below 2, because PubMed does not comprehensively cover scholarly information unrelated to medical practice.

Identifying new biomedical relations from PubMed

We identified the 5,000 most common MeSH keywords in PubMed and studied the associations between them using PMI. This stage involves computing 25 million PMI values and consequently requires parallel computing and substantial computational capacity. To handle this workload, we developed a data center at the University of Sfax thanks to the efforts of two contractors and funding from the Wikimedia Foundation, the WikiCred Grant Initiative, and the Tunisian Ministry of Higher Education and Scientific Research, among other institutions.
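The pairwise stage can be sketched as a loop over keyword pairs that keeps only pairs above the threshold. This is an illustration under stated assumptions, not the project’s code: the input dictionaries and the function name are hypothetical, base-2 PMI is assumed, and the parallel execution described in the post is left out. Because PMI is symmetric, the sketch scores each unordered pair once, which for 5,000 keywords is about 12.5 million pairs; the 25 million figure corresponds to 5,000 × 5,000.

```python
import math
from itertools import combinations


def significant_pairs(keyword_counts, pair_counts, total, threshold=2.0):
    """Yield (x, y, pmi) for keyword pairs whose PMI exceeds the threshold.

    keyword_counts: {keyword: N(x)} for the top-k keywords (hypothetical input)
    pair_counts:    {frozenset({x, y}): N(x, y)} co-occurrence counts
    total:          P, the number of PubMed records
    """
    for x, y in combinations(sorted(keyword_counts), 2):
        n_xy = pair_counts.get(frozenset((x, y)), 0)
        if n_xy == 0:
            continue  # no co-occurrence, nothing to score
        value = math.log2(n_xy * total / (keyword_counts[x] * keyword_counts[y]))
        if value > threshold:
            yield x, y, value
```

Because each pair is scored independently, the outer loop can be split into chunks and distributed across workers, which is what makes the computation amenable to parallel hardware.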


The Project’s Final Office Hour in French (September 30, 2023) featuring a demonstration of the project’s work and the data center. Please enable auto-translated closed captions in English for more context.

Thanks to these efforts, we identified 835,111 new relations between the 5,000 most common MeSH-aligned Wikidata items. These new relations are supported by PubMed references identified using the PubMed Best Match sorting method. This number is very large compared with the current number of relations between MeSH-aligned Wikidata items. That is why we need a significant community of Wikidata contributors with robust medical knowledge to review these relations and verify whether they are relevant. We also need to identify the Wikidata properties corresponding to these associations so that they can be added to Wikidata using QuickStatements or another batch upload tool. You can find the list of the significant associations between the 5,000 most important MeSH keywords here.
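Once a property has been chosen for a relation, the statement has to be written in a batch tool’s input format. The sketch below formats one statement as a QuickStatements V1 command: tab-separated item, property, and value, with source properties written with an “S” prefix and string values in double quotes. The function name is hypothetical, and attaching the PubMed ID (P698) as the reference is an illustrative assumption; community practice may prefer other reference properties, such as “stated in” (P248) pointing to the article’s item.

```python
import re

# Basic shape checks for Wikidata identifiers.
ITEM_ID = re.compile(r"Q[1-9]\d*")
PROPERTY_ID = re.compile(r"P[1-9]\d*")


def quickstatements_line(subject, prop, obj, pmid=None):
    """Return one QuickStatements (V1) command: subject TAB property TAB object.

    If `pmid` is given, a reference using PubMed ID (P698) is appended;
    QuickStatements writes source properties with an "S" prefix and
    quotes string values. The reference choice is an assumption.
    """
    if not (ITEM_ID.fullmatch(subject) and ITEM_ID.fullmatch(obj)):
        raise ValueError("subject and object must be item IDs like Q42")
    if not PROPERTY_ID.fullmatch(prop):
        raise ValueError("property must be a property ID like P31")
    fields = [subject, prop, obj]
    if pmid is not None:
        if not str(pmid).isdigit():
            raise ValueError("PubMed IDs are numeric")
        fields += ["S698", f'"{pmid}"']
    return "\t".join(fields)
```

Joining several such lines with newlines produces a batch that can be reviewed by a person before it is submitted.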

Classifying new biomedical relations

In the classification process, we used the qualifiers associated with both the subject and the object of each relation across up to 20 publications. This data served to create an association matrix that links subject qualifiers to object qualifiers for each statement. A dense neural network then performs the classification: it assigns the relevant Wikidata property to the relation and also determines the appropriate type of Wikidata property for it.
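The association matrix can be pictured as a qualifier-by-qualifier count table. In this sketch, which is an assumption about the data layout and not the published MeSH2Wikidata code, each publication contributes the qualifiers attached to the subject heading and to the object heading, every subject-qualifier/object-qualifier pair increments one cell, and at most 20 publications are used per relation. The qualifier list is a small illustrative subset of the MeSH qualifiers, and the function names are hypothetical. The flattened matrix is the kind of fixed-length vector a dense network could take as input.

```python
from itertools import product

# Illustrative subset of MeSH qualifiers; the real list is much longer.
QUALIFIERS = ["diagnosis", "drug therapy", "etiology", "genetics", "therapy"]
INDEX = {q: i for i, q in enumerate(QUALIFIERS)}


def association_matrix(publications, max_publications=20):
    """Count subject-qualifier / object-qualifier co-occurrences.

    publications: list of (subject_qualifiers, object_qualifiers) pairs,
    one per paper in which both the subject and object headings appear.
    Only the first `max_publications` papers are used.
    Returns a len(QUALIFIERS) x len(QUALIFIERS) matrix as nested lists.
    """
    size = len(QUALIFIERS)
    matrix = [[0] * size for _ in range(size)]
    for subj_quals, obj_quals in publications[:max_publications]:
        for sq, oq in product(subj_quals, obj_quals):
            if sq in INDEX and oq in INDEX:  # ignore qualifiers outside the list
                matrix[INDEX[sq]][INDEX[oq]] += 1
    return matrix


def flatten(matrix):
    """Flatten the matrix row by row into a feature vector."""
    return [value for row in matrix for value in row]
```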

When the returned Wikidata property conflicts with the assigned property type, the classification is deemed incorrect and its result is not assigned to the relation. The classification process has been shown to reach an accuracy of 89.40% for superclass-based classification and 75.32% for relation type-based classification. Jointly verifying the superclass-based and relation type-based classifications removes 93.1% of the mistaken classification outputs. The source code for the supervised classification algorithm is available at MeSH2Wikidata. While this method is efficient, there is no guarantee that it can classify a significant portion of new relations, primarily because several MeSH associations lack qualifiers. That is why the contribution of the Wikidata community is required to classify these relations.
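The conflict rule can be expressed as a simple filter over the two classifier outputs. The property-to-type mapping below is hypothetical: the PIDs are real Wikidata properties (P2176 “drug or therapy used for treatment”, P780 “symptoms and signs”, P828 “has cause”), but the type labels, the pairing, and the function names are illustrative, not the project’s label set.

```python
# Hypothetical mapping from a Wikidata property to the coarse property type
# that a second classifier output would predict.
PROPERTY_TYPE = {
    "P2176": "treatment",  # drug or therapy used for treatment
    "P780": "symptom",     # symptoms and signs
    "P828": "cause",       # has cause
}


def joint_verify(predicted_property, predicted_type):
    """Return the property if it agrees with the predicted type, else None."""
    if PROPERTY_TYPE.get(predicted_property) == predicted_type:
        return predicted_property
    return None  # conflict: the classification is discarded


def filter_classifications(predictions):
    """Keep (relation, property) only for predictions whose outputs agree.

    predictions: iterable of (relation, predicted_property, predicted_type).
    """
    kept = []
    for relation, prop, ptype in predictions:
        verified = joint_verify(prop, ptype)
        if verified is not None:
            kept.append((relation, verified))
    return kept
```

The trade-off is coverage for precision: any relation whose two outputs disagree is left for human contributors rather than assigned a property automatically.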

Conclusion

We are confident that our efforts mark one of the first steps toward transforming the application of Artificial Intelligence in Wikimedia projects. We look forward to collaborating with Wikimedia Deutschland and the Wikimedia Foundation Research and Development teams to advance this undertaking. Their collaboration with our consortium is crucial to expanding the scope of our accomplishments and establishing a sustainable solution that can make Wikidata a dependable and versatile multidisciplinary knowledge graph.

7 reasons you should donate to Wikipedia

Monday, 23 October 2023 15:48 UTC

This post is also available in Czech.
This post is also available in Italian.

People give to Wikipedia for many different reasons. The Wikimedia Foundation, the nonprofit that operates Wikipedia, ensures that every donation we receive is invested back into serving Wikipedia, Wikimedia projects, and our free knowledge mission. 

While many visit Wikipedia on a daily basis, it’s not always obvious what it takes to make that visit possible. Here are 7 reasons to donate to the Foundation that also clarify who we are, what we do, and why your donations matter: 

  1. We’re a nonprofit, and readers and donors around the world keep us independent.

Many people are surprised to learn that Wikipedia is hosted by a nonprofit organization. It is the only website among the ten most-visited websites globally that is run by a nonprofit. That’s important because we are not funded by advertising, we don’t charge a subscription fee, and we don’t sell your data. The majority of our funding comes from donations ($11 is the average) from people who read Wikipedia. Many see fundraising banners on Wikipedia and give through those. This model preserves our independence by reducing the ability of any one organization or person to influence the content on Wikipedia.

We’ve long followed industry best practices for nonprofits and have consistently received the highest ratings from nonprofit evaluators like Charity Navigator for financial efficiency and transparency. We also publish annual reports about our finances and fundraising that are open for anyone to review.

  2. Wikipedia serves millions of readers and runs at a fraction of the cost of other top websites.

Wikipedia is viewed more than 15 billion times every month. We have the same (if not higher) levels of global traffic as many for-profit internet companies, at a fraction of their budget and staffing.

More than 700 people work at the Wikimedia Foundation. The majority work in product and technology, ensuring quick load times, secure connections, and better reading and editing experiences on our sites. They maintain the software and infrastructure on which we operate some of the world’s most multilingual sites, with knowledge available in over 300 languages. While our mission and work are unique, for comparison, Google’s translation tool currently supports 133 languages; Meta has more than 70,000 employees; and Reddit has about 1,400 employees.

  3. Reader donations support the technology that makes Wikipedia possible and improvements to how people read, edit, and share knowledge on Wikipedia.

About half of our budget goes directly towards maintaining Wikipedia and other Wikimedia projects. This supports the technical infrastructure that allows billions of visits to Wikipedia monthly, as well as the staff who play a vital role in maintaining our systems, in areas including site reliability engineering, software engineering, security, and more.

Because Wikipedia is available in over 300 languages, it needs top-notch multilingual technology to ensure readers and editors can view and contribute knowledge in their preferred language. Funding also supports improvements to the user experience on Wikipedia and the growth of global volunteer editor communities that increase knowledge on the site, so that it remains relevant, accurate, and useful.

  4. We’re evolving to meet new needs in a changing technology landscape and respond to new global threats.

If you regularly visited Wikipedia in our first decade, there was a good chance you’d get an error message at some point. Because of our steady investments in technology, that’s no longer the case. New investments allow Wikipedia to handle record-breaking spikes in traffic with ease, preventing any disruption to the reading or editing experience. 

We’re also adapting to meet new challenges, including sophisticated disinformation tactics and threats of government censorship, as well as cybersecurity attacks and changes to laws that regulate the web. New security protocols limit the potential for attackers to take advantage of our sites, while our legal staff help to protect our free knowledge mission.

More than half of our traffic now comes from mobile devices. Voice-activated devices and websites increasingly leverage Wikipedia to serve their users’ knowledge needs. We’re continuing to evolve to meet these preferences.

Editor’s note: Since this blog post was originally published in November 2022, the Foundation, alongside the volunteer Wikimedia community, has been exploring how artificial intelligence could transform how people search for knowledge. We’ve recently developed a new, experimental Wikipedia plugin for ChatGPT, which allows users to search for the most up-to-date information on Wikipedia and provides proper attribution and citations.

  5. We manage our finances responsibly and balance Wikipedia’s immediate needs with long-term sustainability.

You probably don’t use your checking account in the same way you use a savings account. One is probably for more day-to-day expenses and the other is likely for emergencies, like if your car suddenly breaks down, or for long-term financial goals, like retirement.  

It’s similar for nonprofits. We have two accounts that act like savings accounts for us. Our reserve is like a rainy day fund for emergencies, such as an economic crisis. 

Our endowment is a long-term permanent fund. The investment income from the endowment supports the future of Wikipedia and Wikimedia projects. These funds are set aside for particular long-term purposes. However, we use the vast majority of the donations we receive from Wikipedia readers to support the current work we are doing that year. 

Sustaining healthy financial reserves and having a working capital policy is considered a best practice for organizations of all types. The Wikimedia Foundation Board of Directors designed our working capital policy to sustain our work and to support Wikimedia affiliates (a global network of groups that support Wikipedia, Wikimedia projects, and the mission globally) and volunteers in the event of unplanned expenses, emergencies, or revenue shortfalls. It also ensures that we have sufficient cash flow to cover our expenses throughout the year.

  6. Supporting Wikipedia means you’re helping it become more representative of all the world’s knowledge.

The Wikimedia Foundation supports individuals and organizations around the world with funding to increase the diversity, reach, quality, and quantity of free knowledge. Over the last four years, we have given over $47 million to members of the volunteer Wikimedia community in 94 countries. Recently, we changed the way we allocate our revenue to be more inclusive of newer and smaller Wikimedia affiliates.

While we recognize there are still big gaps to fill, the knowledge on Wikipedia has become more representative of the world, as have the editors who contribute to the site. For example, from 2020 to 2023, the community of volunteer editors in Sub-Saharan Africa grew by 36 percent. This is because of steady programmatic efforts led by Wikimedia volunteers, affiliates, and others, many of whom have received funding, training, and other support from the Foundation.

Why does global representation of Wikipedia volunteer editors matter? It matters because Wikipedia is a reflection of the people who contribute to it. Diverse perspectives create higher quality, more representative, and relevant knowledge for all of us.  

  7. Contributions from readers keep us going.

The humans who give back to Wikipedia — whether through donations, words of support, edits, or through the many other ways people contribute  — inspire us every day. All of us here at the Wikimedia Foundation want to take this opportunity to thank them. We’d like to share some of our favorite messages from donors over the years. We hope they move you as much as they have moved us: 

“I am astonished at the capabilities of Wikipedia! As I read scientific and medical articles on one monitor, I always have Wikipedia open on the other to check the meaning and background of the increasingly obscure terminology in these areas. Wikipedia is not only the largest collaborative project in human history, it’s the best!”

Donor from the US

“Please accept my heartfelt thanks for keeping Wikipedia going, for not letting it be anyone’s personal property, for maintaining its integrity, quality, and its sanctity, for making it accessible to anyone and everyone across the geography. I understand how difficult it can be to not compromise and keep going especially in today’s profit-seeking digital world.”

Donor from India

We hope we have helped deepen your understanding of how important reader donations are to Wikipedia. If you have any questions, please check out our FAQ.

If you are in a position to give, you can make a donation to Wikipedia at donate.wikimedia.org.

Editor’s note: This post was originally published on 3 November 2022. Several data points, figures, and links were updated in October 2023 to include more recent information.


The post 7 reasons you should donate to Wikipedia appeared first on Wikimedia Foundation.

Diffquote volume 4 (30 September 2023)

Monday, 23 October 2023 15:28 UTC

In this post, I quote some Diff posts published between 24 September 2023 and 30 September 2023.

Wikimania 2023

AdjoEsse shares their experience of Wikimania 2023!

Wikimania as a whole has been an invaluable source of inspiration. I attended captivating presentations on a wide range of subjects, from artificial intelligence to the preservation of cultural heritage. There was also a presentation by Iola Pensa, chair of Wikimedia Italy, on the collaboration between Wikimedia projects and museums, and one by Mark Graham, director of Wayback Machine, on the latest advances at Internet Archive aiming to make Wikipedia more useful and reliable. I had the opportunity to understand how Wikimedia actively contributes to influencing our understanding of the world through its open collaboration.

AdjoEsse and Linguistcorner (25 September 2023) “Wikimania 2023 : My Rewarding Adventure in the Universe of Collaborative Knowledge” Diff.
AdjoEsse at Wikimania 2023 in Singapore

Project Korikath

Have you heard of Project Korikath? You can read about its great results in a Diff post by Mrb Rafi!

Now the project has 17.5k+ images and videos, 350+ of them are quality images on Commons, three of them are valued, one is featured (also commons POTD), and one video was used as the “media of the day” on 5 March, 2023. The media files have been used in over 3k Wikimedia project pages and received a total view count of an overwhelming number of over 6 million so far. The project has over 50 members from seven countries who collectively can communicate in 11 languages. And the best part is, most of them are young and high schoolers. The members have together conducted more than 50 photo walks, 3 photo tours and one Wikipedia editing campaign – everything without any grant support from the Wikimedia Foundation.

Mrb Rafi (26 September 2023) “Project Korikath celebrates its first birthday!” Diff.
The amazing logo of Project Korikath was designed by User:Meghmollar2017 and is now available under CC BY-SA 4.0.

Deforestation in Nigeria

Deforestation is an important issue. A project called Deforestation in Nigeria is tackling it through Wikimedia.

This project is a collaboration between Wikimedians in Southeast Nigeria, the Department of Geography, and the Institute of Climate Change Studies, Energy, and Environment, University of Nigeria, Nsukka (UNN). Wikimedians in Southeast Nigeria are made up of Wikimedians in UNN and Nnamdi Azikiwe University, Awka, (UNIZIK). The project is a six-month project supported by the Wikimedia Foundation with a grant. The project involved the training of the students of the Department and Institute to document information on Deforestation in Nigeria in Wikipedia based on existing literature and to support them on field trips to deforestation sites in Nigeria to capture pictures to support their documentation.

Ngozi osadebe (28 September 2023) “History of the Deforestation in Nigeria Wikipedia Project of the University of Nigeria” Diff.
Olugold, CC BY-SA 4.0

Wikimedia Community Kilimanjaro

We can learn about the great mission of Wikimedia Community Kilimanjaro through a Diff post titled “Our Success Story: Wikimedia Community Kilimanjaro”.

From the outset, our mission was clear: to create a platform that would not only bring together like-minded individuals but also serve as a catalyst for positive change through the power of knowledge-sharing and collaboration. As we reflect on our journey, we are thrilled to share the milestones that define our success.

Justine Msechu (28 September 2023) “Our Success Story: Wikimedia Community Kilimanjaro” Diff.
Justine Msechu, CC BY-SA 4.0

Acknowledgement

First of all, I would like to thank my friend Caner (User:Kurmanbek) for creating a fantastic logo for Diffquote. I’m really happy to work with such a talented designer and Wikimedian in the Wikimedia Japan-Türkiye Friendship Project. Of course, I would also like to thank all the contributors, organisers and readers of Diff. It’s a great pleasure for me to get to know the Wikimedia movement around the world through Diff.

Kurmanbek, CC BY-SA 4.0

Sentence segmentation is a fundamental process in natural language processing. It involves breaking down a given text into individual sentences, a task that finds applications in various contexts. Whether you need to split a paragraph into sentences for further analysis or present sentence boundaries in a user-friendly frontend application, sentence segmentation is crucial.

At first glance, identifying sentence boundaries might seem straightforward – just look for a period or full stop. However, it quickly becomes complex when you consider cases where a period is used for abbreviations such as “Dr.” or in numerical values like “3.14.” This simple punctuation mark doesn’t always signal the end of a sentence.

In many languages, the period isn’t the standard sentence delimiter. For instance, Hindi uses a unique character called the ‘danda’ sign (।) to indicate the end of a sentence. Additionally, sentence segmentation must account for periods inside quotations, which don’t denote sentence boundaries, or periods within email addresses, which certainly aren’t sentence endings.
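A minimal rule-based splitter shows how these cases interact. It illustrates the problems described above and is not the sentencex algorithm: a period, “!”, “?”, or danda (।) counts as a boundary only when followed by whitespace, which leaves “3.14” and email addresses intact, and a period after a word from a small, hypothetical abbreviation list is not treated as a boundary. Quotations are not handled.

```python
import re

# Small, hypothetical abbreviation list for illustration only.
ABBREVIATIONS = {"dr", "mr", "mrs", "prof", "e.g", "i.e", "etc"}

# Candidate boundary: a terminator ('.', '!', '?', or the danda '।')
# followed by whitespace.
BOUNDARY = re.compile(r"([.!?।])\s+")


def split_sentences(text):
    """Split text into sentences using the simple rules described above."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.end(1)  # position just after the terminator
        candidate = text[start:end]
        if match.group(1) == ".":
            last_word = candidate.split()[-1].rstrip(".").lower()
            if last_word in ABBREVIATIONS:
                continue  # "Dr." and similar: not a sentence boundary
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

For example, `split_sentences("Dr. Smith measured 3.14 cm. Then he left.")` keeps “Dr.” and “3.14” inside the first sentence and returns two sentences. Every rule here is a heuristic with counterexamples (an abbreviation can end a sentence), which is why a production library needs per-language data and a policy for ambiguous cases.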

The challenge lies in finding sentence segmentation libraries that can handle these complexities across a wide range of languages while addressing language-specific intricacies. Most existing libraries fall short in this regard. They tend to focus on English or support only a limited number of languages. Furthermore, they may not be adequately performant or actively maintained.

Here on the Language team, we needed a robust sentence segmentation library for our machine translation system and our section translation project. The former is a Python project, while the latter is a Node.js project. After a thorough examination of existing libraries, we decided to create our own: not just one, but two libraries, one in Python and another in JavaScript, both serving the same purpose.

sentencex

We are introducing the sentencex library, available in both Python and JavaScript. It offers extensive language support while emphasizing speed and practicality. When balancing linguistic precision against versatility for various applications, sentencex prioritizes usability.

In situations where ambiguity arises and linguistic insight is needed, our library errs on the side of caution, avoiding a split rather than risking an incorrect segmentation. Our benchmarks show that the library is fast and also performs well on evaluation datasets.

The name sentencex stands for “sentence extraction”, or sentence✂.

Our overarching goal is to support all languages that have a presence on Wikipedia. Instead of defaulting to English for languages not explicitly defined in the library, we’ve implemented a fallback chain mechanism: the library uses the closest related language that it does include. We’ve defined fallbacks for approximately 244 languages, and we also have abbreviation data for around 30 languages.
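The fallback idea can be sketched as a lookup that walks a chain of related languages until it reaches one with dedicated rules. This is a minimal sketch, assuming a made-up table: the mappings below are hypothetical and are not sentencex's actual fallback data. The `last_resort` default is also an assumption about what happens when no chain applies.

```python
from collections import deque

# Hypothetical fallback table for illustration only; these are NOT the
# fallback chains that ship with sentencex. Each code maps to an ordered
# list of related languages to try next.
FALLBACKS: dict[str, list[str]] = {
    "mai": ["hi"],       # Maithili -> Hindi
    "gsw": ["de"],       # Swiss German -> German
    "gl": ["pt", "es"],  # Galician -> Portuguese, then Spanish
}

# Languages this hypothetical segmenter has dedicated rules for.
SUPPORTED = {"en", "hi", "de", "es"}


def resolve_language(code: str, last_resort: str = "en") -> str:
    """Walk the fallback chain breadth-first and return the first
    supported language. `last_resort` is an assumption: it is only
    used when no chain leads to a supported language."""
    queue = deque([code])
    seen: set[str] = set()
    while queue:
        lang = queue.popleft()
        if lang in seen:
            continue  # guards against cycles in the table
        seen.add(lang)
        if lang in SUPPORTED:
            return lang
        queue.extend(FALLBACKS.get(lang, []))
    return last_resort


print(resolve_language("gl"))   # es (pt is not supported, so es is tried next)
print(resolve_language("mai"))  # hi
print(resolve_language("xyz"))  # en (no chain defined)
```

Keeping the chain as ordered data, rather than code, means adding a language is a one-line table change.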

sentencex python library

Source code: GitHub repository. Please refer to the documentation for usage examples.
Python package: sentencex
Demo: https://wikimedia.github.io/sentencex/

sentencex js library

Source code: GitHub repository. Please refer to the documentation for usage examples.
NPM package: sentencex
Demo: https://wikimedia.github.io/sentencex-js

This library is already in use in the MinT project. It is also replacing the minimal sentence segmentation library we had in the CX cxserver project. As we start using it in more projects, we hope to support more languages, and to support existing languages better.

Tech News issue #43, 2023 (October 23, 2023)

Monday, 23 October 2023 00:00 UTC

Tech News: 2023-43

weeklyOSM 691

Sunday, 22 October 2023 09:34 UTC

10/10/2023-16/10/2023


Choropleth layer in uMap [1] © ybon | map data © OpenStreetMap contributors

Mapping

  • Spencer Alves attended the dedication of a new park and, using a LiDAR-equipped phone, micromapped the park in OSM. His toot on Mastodon showed that other map providers look pretty outdated in comparison.
  • Klenje blogged on the basics of mapping playgrounds. In the comments, Sven Geggus offered to provide the code from OpenCampingMap to anyone interested in creating an OpenPlayGround map.
  • The following proposals are waiting for your comment:

Mapping campaigns

  • Tourist Information Eichstätt is looking for help in correcting the hiking trails in OSM. You can find up-to-date GPX tracks on their website.

Community

  • Kingsley AMANKWE described his involvement in the SotM Nigeria 2023 as ‘unforgettable, inspiring, and hopeful’.
  • Matt_ recounted his experience correcting the location of lakes and dams on OpenStreetMap.
  • Mikel Maron is looking for a workflow to make use of the Overture places data in OSM.
  • Romeo Ronald wrote a diary post about what mapping Juba, the capital city of South Sudan, means to him and explained the benefits he sees for the community: empowerment, inclusivity, engagement, and innovation.

OpenStreetMap Foundation

  • The next monthly public meeting of the OSMF board will take place on Thursday 26 October at 13:00 UTC.
  • The OSM Foundation continues to ask for support in many forms and, with its donation drive ongoing, the Foundation is also happy to receive your contribution.

Local chapter news

Events

  • The Geomob podcast interviewed Ben Abelshausen, one of the organisers of the State of the Map Europe 2023 event, which will take place next month.
  • OpenStreetMap India has joined the free and open-source software conference ‘India FOSS 3.0’ as a ‘community partner’. The conference will take place on 28 and 29 October in Bengaluru.
  • Mikel Maron has summarised some of the key messages from the keynote talk he gave at the State of the Map Nigeria 2023.

Education

  • The UN Mappers will be offering a new course in Spanish on OpenStreetMap and humanitarian mapping. It will start on Wednesday 1st November and is open to all. Registrations are open until Sunday 29th October.

Humanitarian OSM

  • Pete Masters gave an update on the disaster response activities for the Morocco (earthquake) and Libya (floods) events. He posted OSM statistics, local data options, and reactions from different organisations.

Maps

  • Julien Minet has set up a tile server for OpenArdenneMap. It is designed specifically for printing topographic maps in a ‘cartoCSS/Mapnik’ style and has recently been made available as a QGIS style. If you’re looking for ready-to-print maps, you’ll find them at hiking.osm.be.

OSM in action

  • The meteorologist Jörg Kachelmann, who is very well known in the German-speaking world, is using OSM data for his weather maps. He also uses the correct attribution prescribed by OSM. His data source is easier to see here, for example.
  • Akihiko Kusanagi has published ‘Mini Tokyo 3D’, a 3D map that displays the position of trains, planes, and weather around Tokyo in real time. The map uses OpenStreetMap data combined with various related open data.

Licences

  • Regin Lippold announced that from October 2023, basemap.de products and services will be available under an open data licence. In response, Mcliquid tooted, noting that the open data licence in question is CC BY 4.0, which is not compatible with OpenStreetMap’s ODbL licence.

Software

  • Rihards Olups tooted that JOSM can now process the .FIT file format used by Garmin GPS devices.
  • Vincent de Château-Thierry (vdct contributor) announced that the Pifomètre tool (for integrating address points in OSM in France from open data sources) now has a map display called Pifomap.
  • HeiGIT announced that SketchMapTool, an application to support paper-based digital mapping activities, has received funding support from the German Red Cross and the German Foreign Office.

Programming

  • Nikhil VJ showed how to copy shapes out of vector tiles.
  • Riley Walz and Mehran Jalali have made Fuzzy, an AI that uses OSM to answer complicated geospatial queries. For example, “get walking directions to the sweetgreen closest to City Hall and avoid passing by any McDonald’s”.
  • OpenCage have released jopencage v2.0.0, a Java SDK to access their geocoding API.
  • Ilya Zverik has written cli-oauth2, a Python library for performing OpenStreetMap authentication based on OAuth2.

Releases

  • [1] Yohan Boniface, aka ybon, has released uMap version 1.9.2. This version now includes a feature to create choropleth maps.
  • MapTiler has released MapTiler Server version 4.4.

Did you know …

  • … the open-source-based interactive web map system ‘Protomaps’?
  • … the website that tries to verify that the OSM map is complete? Full verification is of course impossible, but this website gives metrics on how many stores of some large brands have been mapped compared to the number expected. Statistics are also available on the number of museums, sculptures, airports, etc.
  • … that you can find and add automated external defibrillator location data to OpenStreetMap using OpenAEDMap?

Upcoming Events

Where What Online When Country
Lagos #MapNigeria Monthly Meetup 2023-10-21 ng
Chambéry Mapathon débutant saison 23/24 CartONG 2023-10-23
San Jose South Bay Map Night 2023-10-25
Wien 69. Wiener Stammtisch 2023-10-25
Düsseldorf Düsseldorfer OpenStreetMap-Treffen 2023-10-25
[Online] OpenStreetMap Foundation board of Directors – public videomeeting 2023-10-26
Lübeck 136. OSM-Stammtisch für Lübeck und Umgebung 2023-10-26
Bengaluru IndiaFOSS 3.0 – OSM Workshop 2023-10-27 – 2023-10-28
Signa OpenStreetMap al Linux Day di Signa (Firenze) 2023-10-28
Saint-Étienne Rencontre Saint-Étienne et sud Loire 2023-10-31
OSMF Engineering Working Group meeting 2023-11-01
Stuttgart Stuttgarter OpenStreetMap-Treffen 2023-11-01
IJmuiden OSM Nederland bijeenkomst (online) 2023-11-01
Thrissur OSM Kerala Annual Community Meetup 2023 2023-11-03 – 2023-11-04
Dublin OpenStreetMap Ireland AGM 2023-11-04

Note:
If you would like to see your event here, please add it to the OSM calendar. Only events listed there will appear in weeklyOSM.

This weeklyOSM was produced by MatthiasMatthias, Michael Montani, Strubbl, TheSwavu, barefootstache, derFred, rtnf.
We welcome link suggestions for the next issue via this form and look forward to your contributions.

The Afroyanga 1.0 Bootcamp, developed by African & Proud (AP), is an in-person capacity-building conference held in Nigeria, with participants from various states across the country. This bootcamp is dedicated to empowering individuals with the knowledge and skills necessary to contribute effectively to Wikimedia Foundation projects while also focusing on personal development and networking opportunities.

The Wikimedia community is built on the foundation of knowledge sharing, cultural diversity, and a passion for making information accessible to all. It’s a space where contributors from around the world converge to enrich the digital sphere with valuable content. In the spirit of this shared mission, we are thrilled to take you on a journey through Afroyanga Bootcamp 1.0, a remarkable event that recently unfolded in Nigeria.

Afroyanga Bootcamp 1.0, organized by African & Proud (AP), commenced with a surge of enthusiasm and purpose. We received an overwhelming response of over 200 applications from eager individuals across Nigeria. Selecting the final 30 participants was a task that reflected our commitment to inclusivity. We aimed to ensure a truly diverse representation by including voices from various cultural groups. This diversity injected a vibrant and inclusive atmosphere right from the start.

The heart of Afroyanga 1.0 was dedicated to empowering Wikimedia contributors. It all began with a comprehensive introduction by Ayokanmi Oyeyemi, the Programs Director of Wikimedia Nigeria. Participants delved deep into the essence of the Wikimedia projects, learning about their core values and the rules governing the creation of articles, among other foundational knowledge. This strong foundation prepared participants to become well-rounded editors.

The journey continued with participants immersing themselves in various Wikimedia Foundation projects, including English Wikipedia, Yoruba Wikipedia, Igbo Wikipedia, Hausa Wikipedia, Wikiquote, Wikidata, and Wikibooks. They equipped themselves with the skills and knowledge required to make a meaningful impact. The bootcamp’s approach blended knowledge transfer from experienced contributors to newcomers with further developing the expertise of those already familiar with Wikimedia projects.

Beyond Wikimedia, Afroyanga Bootcamp 1.0 emphasized holistic development. Workshops on soft skills, facilitated by experts, covered essential areas such as facilitation techniques, emotional intelligence, proactivity, and digital skills. Participants left with a well-rounded skill set, ready to excel in their Wikimedia contributions. An additional session on optimizing LinkedIn accounts, led by Oluwaseun ‘Gabby’ Akinola, proved invaluable in enhancing participants’ online presence and networking abilities.

In today’s interconnected world, online collaboration is paramount. Afroyanga Bootcamp explored the complexities of this landscape from several angles. Participants learned about the types of online conflicts that can arise and honed their communication skills. The sessions also covered effective conflict resolution tools available within the Wikimedia community. It was a practical and insightful journey, led by Kemi Makinde.

Afroyanga 1.0 was not just about learning; it was about connecting. Guest speakers, young professionals, and entrepreneurs shared their insights, leaving us with a broader perspective and a stronger network. As part of our networking extravaganza, we held a closing dinner night where participants dressed in their cultural attire, further promoting African culture and diversity.

As the bootcamp concluded, participants left with a hunger for more. The thirst for knowledge, collaboration, and connection has ignited a fire within us. We are eager to harness this energy for the benefit of the Wikimedia Foundation at large.

Afroyanga Bootcamp 1.0 has been an incredible journey of growth, learning, and connection. We extend our heartfelt gratitude to the Wikimedia Foundation; to the core organizing team, including Co-Founders Kolawole Oyewole and Richard Edozie; and to supporting team members Demilade Ajala and Lawretta. We also thank all our participants, speakers, and supporters who made this event a resounding success.

Stay tuned for more exciting adventures on our Wikimedia journey! Together, we can make a difference.

Link: African & Proud Website: African & Proud – Non-profit Initiative (africanandproud.org)

Link: Meta page: Afroyanga Bootcamp – Meta (wikimedia.org)

Reading Wikipedia in the Classroom is the Wikimedia Foundation’s flagship teacher training programme that supports educators and students in acquiring critical media and information literacy skills for the twenty-first century such as:

  • Understanding how information is produced.
  • Learning how to access and evaluate content online.
  • Appreciating the biases and knowledge gaps in the information they consume.
  • Improving teachers’ media and information literacy skills.

Like many developing nations, Nigeria has faced, and still faces, challenges in its educational system that need to be resolved. For many years, the four-level education system (creche, basic, secondary, and tertiary education) has mostly relied on books, chalkboards, lectures and in-person activities carried out in formal settings (classrooms, libraries, halls, etc.). Technology has transformed and improved service delivery in many sectors, including education, and teaching and learning now require substantial change through the introduction of digital learning and personalized education. The adoption of technology for teaching accelerated during the COVID-19 pandemic, when schools were forced to close and some that had the technological capacity started teaching students online.

Reading Wikipedia in the Classroom Programme (RWIC) enables teachers to provide pupils with a high-quality online education. It is one potential way to improve the educational experience of both teachers and students for the following reasons:

  1. It is a pedagogical tool for critical engagement with teachers and students.
  2. It encourages collaboration among teachers in developing innovative classroom curricula and lesson plans to enhance teaching and learning in the classroom.
  3. It promotes media and information literacy skills for teachers to access, evaluate, and create information on Wikipedia.
  4. It develops active contributors to Wikipedia.
  5. It introduces the concept of open-source ideology in the school system.

Measures taken to introduce RWIC in Nigeria

The RWIC programme has a Teacher’s Guide, a manual developed by the Wikimedia Foundation for training teachers in the programme. The Teacher’s Guide has 3 modules: Module 1 covers accessing information, Module 2 covers evaluating information, and Module 3 covers creating information. We modified the Teacher’s Guide by including some local content and translated it into Yoruba in 2022. In 2023, we further updated its modules with additional information and a more colourful design, and translated it into Igbo.

In 2022, the first implementation of RWIC was carried out in Kwara State, Nigeria, led by Bukola James. In that programme, a total of 75 teachers from 35 secondary schools were trained. At the end of the training in July 2022, 60 teachers were certified.

Later, 3 teachers who were certified in the programme conducted another RWIC programme for 45 students in senior secondary levels. This programme also took place in Kwara State in 2022.

In 2023, a series of implementations took place in different parts of the country, starting in Lagos State with Kemi Makinde as Lead Trainer. 50 teachers from 36 secondary schools in Lagos State were trained, and 36 were certified at the end of this first implementation in February 2023.

The second implementation, in Abuja, was a collaborative effort by three certified trainers: Oby Ezeilo as Lead Trainer, Ismail Atiba and Clement Dike. 70 teachers from 51 secondary schools in the 6 Area Councils of Abuja were trained, and 52 were certified at the end of the implementation in April 2023.

Our third implementation took place in Onitsha, Anambra State. Oby Ezeilo as Lead Trainer, Jane Onuchukwu and another certified trainer, Ngozi Osuchukwu, served as the implementation team. Over 50 teachers from 8 secondary schools in Onitsha were trained, and 39 of them were certified at the end of the implementation in July 2023.

Between July 2022 and July 2023, a total of 245 teachers were trained through the RWIC programme in Nigeria, and 187 of them completed the training, met all the criteria required for certification, and were certified.

Challenges

Lack of digital literacy: Most teachers were not familiar with navigating online resources like Wikipedia.

Lack of Internet and Infrastructure: Access to the internet may be restricted, unreliable or unavailable in many areas of Nigeria. Some schools lack the infrastructure required to enable ongoing online education.

Regulatory, Policy and School Structure: In public schools, students are not allowed to bring digital devices such as mobile phones and laptops to school, but in private schools the reverse is the case.

Other challenges included IP blocks; teachers’ absence from in-person events due to the removal of the fuel subsidy and the political crisis around the presidential election; difficulty securing partnerships with key education stakeholders; and limited funding.

Our success story

Collaboration and Partnership: Partnering with stakeholders gave us leverage in recruiting the right candidates for the programme. It also helped us identify what partners need to do to sustain the programme, such as providing infrastructure that will help bridge the digital divide between private and public schools, and documenting the programme in government facilities. Collaborating with each other and with our partners also generated valuable ideas and a structured division of labour.

Cultural Relevance: Using the local language in instruction increased the course’s relevance, effectiveness and teachers’ mastery of the subject matter.

Wiki Clubs: The establishment of Wiki Clubs in Kwara, Abuja, Onitsha and Lagos was one of our success stories.

EduWiki Nigeria: To grow our RWIC community, we are thrilled to have established EduWiki Nigeria, a diverse community of certified trainers, volunteers, educators, students, and professionals who are eager to integrate Wikipedia smoothly into Nigeria’s educational system while creating a culture of knowledge sharing, skill development, and lifelong learning.

On 7 October 2023, Wikimedia Community User Group Malaysia, Toumon Wikipedian Club Japan and Wikimedia Medan Community held an online Wiktionary Editathon for the Wikimedia Japan-Malaysia Friendship.

Ahmad Ali Karim, CC BY-SA 4.0

Details

Wikimedians from Malaysia, Japan and Indonesia met on Zoom. From 21:00 to 21:15 JST, we introduced ourselves, and then we focused on editing. Taufik Rosman, a veteran Wiktionary editor and Wikimedian of the Year 2023, occasionally gave talks for beginners. Participants edited entries for their friends’ languages in the Wiktionary of their own language. For example, I edited the Malay and Central Dusun entries in the Japanese Wiktionary, using dictionaries and asking our friends for help.

This editathon resulted in the creation of 84 new entries and the improvement of 25 entries. You can see the results on Meta-Wiki at [[Wikimedia Japan-Malaysia Friendship/Wiktionary Editathon 2023]].

Acknowledgement

I would like to thank all the participants, especially Taufik. It was a good opportunity for me to learn Malay and Central Dusun, and I am very happy to see the entries of Japanese words in the Wiktionary of other languages. I would like to do more collaborative projects in the future.

📎 Balancing security and openness

Thursday, 19 October 2023 20:00 UTC

Wikimedia software is quite selective in its dependencies and we often even audit the sources ourselves. Progressive enhancement not only makes for a blazing fast and accessible site, I argue it’s also the cheaper choice in the long run.

How does an open philosophy jibe with performance and security practices? I wrote about it over on the OpenJS Foundation blog; check it out!

→ How the Wikimedia Foundation Balances Security and Open Information


This post appeared on timotijhof.net. Reply via email

An Invitation to Talk

Thursday, 19 October 2023 19:52 UTC

Join the Community Affairs Committee of the Board and Foundation Leadership in Conversation

Please visit Meta-Wiki for translations in العربية, Bahasa Indonesia, 中文, Deutsch, español, français, Kiswahili, polski, português do Brasil, українська. You can also help translate it into more languages.

Wikimedia Foundation CEO Maryana Iskander recently shared a message reflecting on developments since her initial listening tour in 2021 to understand the challenges and needs facing the Wikimedia movement now.

Talking: 2024

Two years later, a lot has changed in the world and in our communities. Talking: 2024 is a series of conversations intended to put more effort and intentionality into communicating the right information, at the right time, and in the right way, even knowing that we can never meet everyone’s expectations. 

It is important for us to talk to each other throughout the year – formally and informally. Over the next few months, the Community Affairs Committee of the Board of Trustees, senior leaders at the Foundation, and Maryana will be available to ask Wikimedians: what is on your mind about consequential events taking place in 2024, about the Foundation’s annual plan, or our longer-range priorities?

Let’s Talk

To take part in Talking: 2024, please sign up on wiki and make sure to check your talk page or have the “email this user” feature activated so someone from the Movement Communications team can get in touch with you. You can also send us an email to movementcomms@wikimedia.org.

Please let us know if you would like an individual call or if there are others you would like to join the conversation with you. Is there a specific person you would like to talk to? Are you comfortable in English, or can we support you with live interpretation? Please also share your availability. We’ll do our best to match.

Looking forward to talking together.

Greetings! We are happy to announce the Indic Wikimedia Hackathon 2023 by Indic MediaWiki Developers User Group! This exciting event is set to take place on 16-17 December 2023. We invite all enthusiastic contributors, developers, translators, designers, technical writers, and anyone passionate about Wikimedia's technical spaces to join us. Event Details: Dates: 16, 17 … Continue reading Announcing Indic Wikimedia Hackathon 2023 and Invitation to Participate

Tech News issue #42, 2023 (October 16, 2023)

Monday, 16 October 2023 00:00 UTC

Tech News: 2023-42

weeklyOSM 690

Sunday, 15 October 2023 14:21 UTC

03/10/2023-09/10/2023


Orientation for traffic signs for JOSM [1] © Stfmani | map data © OpenStreetMap contributors

Mapping

  • [1] Stfmani has created a JOSM map style to display the direction of traffic sign placement.
  • Anne-Karoline Distel has started mapping boot scrapers. She described them in detail, shared her behind-the-scenes thoughts, and even made a video.
  • barefootstache shared his experience of exploring the Croatian countryside and how to connect ‘no outlet’ paths.
  • MapComplete announced the addition of a feature to mark stores and restaurants that serve sugar-free, gluten-free, and lactose-free products.

Mapping campaigns

  • Mapeoabierto invites you to map Ecuador. The group has set up projects for Quito, Guayaquil and Latacunga on the Tasking Manager, and projects for Tena and Loja in MapSwipe.
  • Microsoft is planning an OpenStreetMap mapping campaign for an additional 11 countries (Moldova, Latvia, Lithuania, Estonia, Belarus, Georgia, Poland, Czech Republic, Slovakia, Mongolia, and Russia). Local mapping communities are invited to participate in this activity.

Community

  • Robert Grübler had expressed the wish for a business card that he can hand to astonished passers-by while mapping. The Austrian local chapter has designed a business card for exactly this purpose and finds that it is better received than an explanatory flyer.
  • The OSM India community pushed for India to be a subcategory on the OSM Community forum, which has now been created. The category will be moderated jointly by veteran contributors Contrapunctus, Sahilister, ReueId, Muzirian, and a more recent mapper, NLBRT.
  • Volker Krause shared his experience of participating in the Karlsruhe Hack Weekend September 2023. During the event, he focused on the development of the KDE Itinerary application, specifically on the POI search and map styling features in the indoor map.
  • OpenStreetMap Belgium and Mapillary are teaming up to improve the availability of street-level imagery in the European Union.

OpenStreetMap Foundation

  • Dorothea Kazazi published some additional information about the OpenStreetMap Foundation Board member elections that will take place in December. The submission of questions to candidates is now open and the deadline for nominations is Saturday 21 October.
  • We would like to remind you, once again, that donating to the OSM Foundation is still possible.

Events

  • Trufi Association hosted a webinar ‘Unpacking the Power of Data in Active Mobility’, including a discussion of OSM data, with experts from the World Bank and ITDP.

OSM in action

  • CityLab, a project of the Technologie-Stiftung Berlin (funded by the City of Berlin), has published a map ‘Gieß den Kiez’ (Watering the neighbourhood). The map shows 800,000 urban trees whose watering can be interactively ‘adopted’ by people from the neighbourhood.
  • Geoconfirmed, a crowdsourced conflict zone geolocation website, now has a map of the conflict in Israel and Palestine.
  • In an article on gnulinux.ch, caos described the different ways and approaches to get started with OpenStreetMap.

Open Data

  • Mikhail Sarafanov explained how you can combine OSM and Landsat data to verify areas of green zones.

Software

  • The original Trufi App for public transport in Cochabamba, Bolivia has just surpassed 100,000 installs with help from OSM (and TikTok). The app set the model for using OSM as the basis for transport route open data.

Releases

  • Every Door version 4.0 has been released. It comes with improved GPS accuracy, allows adding breaks to opening hours intervals spanning midnight, switches the default imagery to Mapbox, and disables Maxar (which has forsaken us), plus much more.
  • Sarah Hoffmann announced the release of Nominatim version 4.3.1.
  • OpenStop version 0.5.0 has been released and now has an English version (in addition to its German version). Try it and add information about the bus stops near you!

Upcoming Events

Where What Online When Country
Abuja State of the Map Nigeria 2023 2023-10-11 – 2023-10-14 ng
Lorain County OpenStreetMap Midwest Meetup 2023-10-12
Mapathon in support of peacekeeping 2023-10-12
Amsterdam Mapping the Future with MaptimeAMS 2023-10-12
Gárdony OSM Fonó: Short editing introduction and remote help (online, live) 2023-10-12
Hannover OSM-Stammtisch Hannover 2023-10-12
Montrouge Rencontre contributeurs Sud de Paris 2023-10-12
Anglet Rencontre groupe local Pays Basque Sud Landes 2023-10-13
Berlin OSM-Stammtisch Berlin/Brandenburg 2023-10-13
Berlin OSM Hackweekend Berlin 10/2023 2023-10-14 – 2023-10-15
Budapest Walnut-themed surveying hike around Diósd (in person) 2023-10-14
Henrietta Township Journey to the Centers of Michigan 2023-10-14
Pittsburgh A Synesthete’s Atlas: Cartographic Improvisations Between Eric Theise and Trē Seguritan Abalos 2023-10-15
Kalkaji Tehsil 3rd OSM Delhi Mapping Party 2023-10-15
Waitematā FOSS4G SotM Oceania 2023 Conference 2023-10-15 – 2023-10-19
The Municipal District of Kilkenny City Kilkenny History Mappers MeetUp 2023-10-16
臺北市 OpenStreetMap x Wikidata 月聚會 #57 2023-10-16
Lyon Réunion du groupe local de Lyon 2023-10-17
Bonn 168. OSM-Stammtisch Bonn 2023-10-17
Berlin Missing Maps – DRK Online Mapathon 2023-10-17
Lüneburg Lüneburger Mappertreffen (online) 2023-10-17
OSMF Engineering Working Group meeting 2023-10-18
Salt Lake City Salt Lake City monthly Map Night 2023-10-19
Zürich Missing Maps Zürich Mapathon 2023-10-18
Chambéry Mapathon débutant saison 23/24 CartONG 2023-10-23
San Jose South Bay Map Night 2023-10-25
Düsseldorf Düsseldorfer OpenStreetMap-Treffen 2023-10-25
Wien 69. Wiener Stammtisch 2023-10-25
[Online] OpenStreetMap Foundation board of Directors – public videomeeting 2023-10-26
Lübeck 136. OSM-Stammtisch für Lübeck und Umgebung 2023-10-26

Note:
If you would like to see your event here, please add it to the OSM calendar. Only events listed there will appear in weeklyOSM.

This weeklyOSM was produced by Elizabete, MatthiasMatthias, Nordpfeil, PierZen, Strubbl, Ted Johnson, TheSwavu, barefootstache, derFred, rtnf.
We welcome link suggestions for the next issue via this form and look forward to your contributions.

New videos help student editors move work live

Thursday, 12 October 2023 20:49 UTC

Wiki Education is excited to announce that we have developed video-based training modules to assist students when they’re ready to make their work live on Wikipedia! The videos are now live and can be found in our training module on moving work from the sandbox. They can also be found on the resources tab of any course page. Both videos guide students on how to move work from their sandboxes into the live main space, with one focusing on adding material to an existing article and the other on creating an entirely new entry.

Instructional video “Adding content to an existing article”

Videos have long been one of our most-requested help materials from instructors and student editors alike. The biggest challenge in producing videos to support our students has been the frequency with which Wikipedia is updated, meaning that our videos will need regular revisions. We recognize that students learn best through a variety of different media, and hope these videos provide another avenue for students to learn some of the ins and outs of Wikipedia.

We chose moving work live as the topic of our first videos because it can be one of the biggest technical challenges for student editors. While the Visual Editor eliminates the need for students to learn wiki code, it does make it tricky to move work live. One common mistake is copying while not in edit mode, meaning students lose the references they’ve carefully added to their sandbox. Students creating new articles are also often confused by the different namespaces on Wikipedia. These videos are designed to help address these common obstacles for student editors moving content from a sandbox for the first time.

Special thanks to Wikipedia Expert Brianda Felix and Scholars and Scientists Program Manager Will Kent for all your hard work in making these videos possible!

This Month in GLAM: September 2023

Wednesday, 11 October 2023 15:04 UTC

Reflections on Spring 2023

Wednesday, 11 October 2023 00:18 UTC

Spring 2023 is the first term that I helped oversee from start to finish as a Wikipedia Expert. It is fitting that it be the first term that I summarize for the Wiki Education blog! Usually, I am focused on reviewing thousands of student contributions, but it is always a mind-boggling experience when I take a step back and look at the total numbers for the term. When I say thousands, it’s not casual hyperbole but a reference to the 5,980 students and the 7,430 articles that were edited as part of the Spring 2023 term. That the 351 courses we supported collectively added around 5 million words is a truly impressive feat. Aside from these numerical accomplishments, the real achievements are in the skill building and learning that instructors and students experience throughout the Wikipedia project.

A collaborative, team-learning spirit (everyone’s a learner)

For a first-time editor, editing Wikipedia can feel like a very solitary activity, especially when the onus falls on the editor to seek out the Wikipedia community, either through the Teahouse or a WikiProject that catches their attention. Students who participate in the Student Program are in a unique position, since they are learning the ropes alongside their peers and instructors. One instructor shared, “The Wikipedia assignment helped to create a collaborative, team-learning spirit: I hope that students emerge more aware of how we are all learners, professors included!” Another instructor commented, “Often we were teaching each other, sharing tips, and edits–it really enlivens the classroom because everyone is so invested in the outcome.” Students are invested not only in the outcome of their individual assignments, but in the success of all their peers collaborating to contribute quality information to the largest knowledge repository in the world.

The kind of collaboration fostered among students and instructors through this project is a refreshing change from the typical top-down structure of most college courses. Most of our instructors have little to no experience as Wikipedia editors. Our most successful instructors take a vulnerable step forward alongside their students to learn the processes and, inevitably, to make the mistakes any new editor makes. As one instructor put it, “This helped me challenge the traditional classroom hierarchy of the instructor and student, as I was viewed as a fellow Wikipedia editor!” Fostering that sense of collaboration across traditional power dynamics establishes trust and curiosity between instructors and students, which in turn builds excitement to learn and complete the project.

Establishing instructor & student relationships

Another facet of the Wikipedia assignment that results from its collaborative nature is the opportunity for instructors and students to get to know each other as people. They can share their interests within the course or outside of it, connect over the difficulties of learning new material, and reach the kind of understanding that comes from frequent 1-on-1 discussions. One instructor described the “great impact” that more individualized time with students had on their instruction: “on a weekly basis I get to work closely with them, answer questions about how to research, which sources are reliable, how to cite something, what information is notable, and more. It’s a 1-on-1 situation I’ve never had before and I really value that time.” Whether a course is hybrid or in person, the novelty of the Wikipedia assignment encourages students to engage in more inquisitive discussions with their instructors and can lead to connections over unexpected challenges, as in the case of this instructor, who said the assignment “really helped us to bond over finding reliable sources – students really got it.”

What really stood out to me were the personal connections made between students and instructors. I was surprised at how the Wikipedia assignment cut through the monotony of the usual filing in and out of classes (in person or virtually). The assignment “provided an opportunity to get to know student interests and tailor other work in line with those interests. That is, allowed more opportunity to get to know them as people.” Especially as we recover from the remote-learning years of the COVID pandemic, it’s wonderful to see in-person connections forming as a result of this project. Other instructors shared similar experiences: “The Wikipedia assignment helped me to get to know my students better! Since I let them choose their articles to edit, I learned more about their interests and career plans.”

Each term, we hear from instructors and students alike about the great sense of accomplishment they feel at the global impact and reach of this project. We might sound like a broken record at this point, but it really highlights how empowering it is for our students to add their little grain of knowledge to Wikipedia. As one instructor put it, “I think it got students excited to have ownership over a project. They seemed to appreciate that it was an assignment that had a bigger impact beyond just me and them.” Becoming an active editor of Wikipedia, then deliberately zooming out from the immediate communities and relationships of daily life to grasp one’s part in knowledge building at the global scale on which Wikipedia operates, is not an easy ask of anyone, students included. Yet that very thinking becomes a motivating factor in producing quality work: “The fact that the work they were contributing was going to be seen on a global level made the quality of the work improve significantly over previous assignments.”

Contributing information for a larger, global audience also helps students think critically about the accessibility of what they write. Students who participate in the Wikipedia project come from all over the US and Canada and bring their unique, diverse lenses to the knowledge creation process. One instructor shared a powerful reflection about their students: “One thing that I realized working with a class full of first generation students was that writing for Wikipedia allowed them to share their work with their families, many of whom had limited English skills. In some ways writing for Wikipedia, writing for more general, less educated, global audiences, meant writing in ways that were more accessible for their own families. Many students in my class translated their work into Spanish, making it even more accessible, they were so excited to write for the world, especially when that included people close to them.”

Before I let you go, a quick recap of those fantastic numbers:

  • Number of students: 5,980
  • Number of courses: 351
  • Words added: 5.08 million
  • Refs added: 51,100
  • Articles edited: 7,430
  • New articles created: 466

Thanks to all of the new and returning instructors and students that took that leap of faith to give the Wikipedia assignment a chance! We are grateful for your participation and contributions to Wikipedia. Cheers to future collaborations!

Train the Trainer 2023 – call for participants

Tuesday, 10 October 2023 14:14 UTC

Would you like to share your passion for wiki editing by running Wikipedia editing events to train new editors and communities? If so, our next Train the Trainer (TtT) course is taking place from November to December 2023 and will equip you with the skills and resources needed to deliver editing events. We would love to hear from you if you would like to take part.

TtT trains volunteers who are keen to deliver Wikipedia editing events. Volunteer trainers play a key role in the delivery of Wikimedia UK programmes. They extend our work to underrepresented communities and support them to be part of Wikipedia and the other Wikimedia free knowledge projects. They train new and existing editors across the country, in-person, online or in hybrid sessions. Demand for training often outstrips staff capacity to fulfil, and we’re conscious that our existing networks do not always allow us to reach all the communities with whom we’d like to work.

This training will equip candidate trainers with the skills, experience and resources to deliver a standard ‘Introduction to Wikipedia’ edit-a-thon. In advance of the hybrid sessions in December, participants will complete a four-week WikiLearn course starting 9th November, with a commitment of approximately 1-2 hours per week. We will train volunteers to deliver online and in-person editing events, and this year we will also deliver training specifically on running hybrid events.

The session ‘Running Hybrid Events’ will focus on equipping you with the skills needed to run an event with participants both online and in person. This session is also open to applications from accredited trainers, and it will be an opportunity for new and accredited trainers to get to know each other.

Expressions of interest are welcomed from all. However, given the current demographic mix of our training network, we are particularly interested in hearing from Black, Asian and minority groups; women; and members of the LGBT+ community. For this iteration of TtT we’ll be prioritising applications from areas of the UK where our training network has gaps.

The in-person training will take place in London, so please indicate whether you would like to participate in person or online. We have a budget to cover travel and accommodation costs, and will provide lunch and refreshments.

Course content and key dates

  • 9th November onwards: Four-week self-led WikiLearn course
  • Thursday 7th November 2023, 18:00-20:30, online: Introductory Session
  • Saturday 9th December 2023, 10:00-16:00, in person and online: Train the Trainer Day 1 (new trainers)
  • Sunday 10th December 2023, 10:00-16:00, in person: Running Hybrid Events, open to new and existing WMUK trainers. This session will be run by Bhav Patel.

What you could expect from us

  • Full training and support to deliver a standard ‘Introduction to Wikipedia’ edit-a-thon and similar events
  • Access to event materials
  • Ongoing support
  • Volunteer expenses where appropriate
  • Job references upon request

What we would expect from you

  • Familiarity with, or a desire to increase your knowledge of, the Wikimedia projects, particularly Wikipedia.
  • Full attendance at the training course
  • To complete the online self-led course (1-2 hours per week)
  • To lead training for a minimum of 2 events per year. This would be a mixture of third-party events which we would field to you, and those you would organise yourself. Please note that we do receive requests for training to be delivered within office hours.
  • To be responsive to communication from Wikimedia UK staff, fellow volunteers and event organisers, including in advance of the event.
  • To complete basic reporting, including returning sign up information
  • To represent Wikimedia UK well during the time in which you are volunteering
  • To adhere to our Safe Spaces policy, and the Code of Conduct

How to apply

Places are limited to make sure that each participant gets individual attention and feedback, so please apply via the following forms to express your interest. The call for interest will close on 26th October, and you will hear from us by 2nd November to confirm your place.

Candidate Trainers Application Form (for new trainers)

Accredited Trainers Application Form (for existing trainers)

Further information

The Wikimedia UK Volunteer Trainer Role description.

The post Train the Trainer 2023 – call for participants appeared first on WMUK.

Tech News issue #41, 2023 (October 9, 2023)

Monday, 9 October 2023 00:00 UTC

Tech News: 2023-41

weeklyOSM 689

Sunday, 8 October 2023 11:01 UTC

26/09/2023-02/10/2023

lead picture

Map style "Streetscape Map" [1] © Alex Seidel aka Supaplex030 | map data © OpenStreetMap contributors

Mapping

  • Anne-Karoline Distel blogged about mapping Martello towers.
  • KlausG reported on how he dealt with the direction of validity and the direction of view when tagging road signs with the direction tag. When a cardinal direction is given, e.g. direction=E (east), it indicates the direction the sign faces, which is usually opposite to the direction of travel of the traffic it applies to.
  • Valerie Norton explained some of the alternative data sources they use to map hiking trails on OpenStreetMap.
  • The following proposals are waiting for your comment:

    • cycleway=waiting aid to map street furniture and devices for cyclists that are intended to make waiting, especially at traffic lights, more comfortable.
    • historic=millstone to map a large round stone used for grinding grain.
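
To make the facing-versus-travel distinction concrete, here is a minimal Python sketch. It assumes, as the summary above says, that the direction value records the way the sign faces, and that a sign faces the traffic approaching it; the function name and the eight-point compass table are illustrative, not part of any OSM tool.

```python
# Eight compass points, clockwise from north, each covering a 45-degree sector.
CARDINALS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sign_facing(travel_bearing_deg: float) -> str:
    """Return a cardinal direction=* value for a sign seen by traffic moving at the given bearing.

    Illustrative assumption: the sign faces the traffic it applies to,
    so it points the opposite way to the direction of travel.
    """
    facing = (travel_bearing_deg + 180.0) % 360.0
    # Shift by half a sector so each compass point covers the 45-degree sector centred on it.
    index = int((facing + 22.5) // 45) % 8
    return CARDINALS[index]


# Traffic heading west (270 degrees) sees a sign that faces east.
print(sign_facing(270))  # E
```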

Community

  • Gloria del Puerto, from Paraguay, is the UN Mapper of the Month for October.
  • OpenStreetMap Belgium has announced its October 2023 Mapper of the Month: Lorenzo Stucchi.
  • gvwaa blogged on how they systematically mapped the public bookcases in the Greater Longfellow community of Minneapolis.
  • Koreller blogged about how he motivated himself to map Pyongyang, an urban area 15 km wide by 12 km long. The project, which mainly involved adding buildings and roads, took over a year, from April 2022 to May 2023.
  • mstock explained the workflow of the Programme Committee – Global State of the Map team.
  • OpenStreetMap asked, on Mastodon, for pictures of your preferred map style. Your answer is still missing 😉

Local chapter news

  • OpenStreetMap US announced that AllTrails, a social networking platform for outdoor enthusiasts, has joined as an OpenStreetMap US Supporter Member.

Events

  • On Friday 29 September the DevSeed team, at its office in Ayacucho 🇵🇪, conducted a workshop on OpenStreetMap mapping with the participation of students and professionals from different public and private entities in Ayacucho, Peru.

Humanitarian OSM

  • IHE Delft, in collaboration with HOT, has organised a mapathon to map areas affected by the Moroccan earthquake and Libyan floods.

Maps

  • [1] Alex Seidel, aka Supaplex030, tooted that the map style for the ‘Streetscape Map’ (a detailed map based on OpenStreetMap that focuses on the urban landscape and the spatial design of public and street space) is now available on GitHub. Alex has also published an impressive example from Neukölln, a district of Berlin. (We reported earlier with a gif in issue 598 of January 2022.)
  • Igor Sukhorukov has analysed, with his tool openstreetmap_h3, the neighbourhoods of Berlin, Moscow, and several other cities to see if they are attractive to live in.
  • James Killick reviewed how well several map-making applications, ranging from Esri and Google My Maps to Felt and Microsoft Excel, could map an address list. Meanwhile, Oliver Roick commented on this review.

OSM in action

  • Times change, and so, from time to time, do our categories. ‘Switch2OSM’ is now ‘OSM in action’. In this category we want to highlight examples of OSM data in use. Feel free to recommend similar applications to our readers.

    A good first example is the GPS Hiking Atlas. It is a free travel guide that also handles the OSM licence conditions correctly.

  • On GNU/Linux.ch, caos has published a non-representative survey on the use of navigation apps.

Software

  • David Larlet presented uMap’s new permissions feature, which can be set for individual layers. This helps people with contributive maps keep a stable, fixed base layer, and gives even more control over which objects can be edited.

Programming

  • Manish Mehra has written a tutorial on how to build a real-time location sharing app.
  • Jacek Galowicz explained how to configure the OpenStreetMap rendering server on the NixOS operating system.
  • Mikhail Sarafanov shared how to use OSM data to select a threshold value of Normalised Difference Vegetation Index (NDVI), derived from Landsat data, in order to identify green zones in cities.
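
For readers unfamiliar with NDVI: it is computed per pixel as (NIR - Red) / (NIR + Red) from the near-infrared and red bands. The Python sketch below pairs that formula with one simple way OSM data could inform a threshold: pick the cut-off that best separates pixels labelled green (for instance, inside mapped parks or forests) from the rest. The labelling rule, the agreement criterion and the toy values are our assumptions for illustration, not necessarily the approach taken in the tutorial.

```python
from typing import Sequence


def ndvi(nir: float, red: float) -> float:
    """Normalised Difference Vegetation Index for one pixel: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def best_threshold(values: Sequence[float], is_green: Sequence[bool],
                   candidates: Sequence[float]) -> float:
    """Return the candidate t for which 'NDVI >= t means green' agrees with the most labels.

    `is_green` stands in for hypothetical labels from OSM (pixel inside a mapped park,
    forest, etc.). Ties go to the earliest candidate.
    """
    def agreement(t: float) -> int:
        return sum((v >= t) == g for v, g in zip(values, is_green))

    return max(candidates, key=agreement)


# Toy NDVI values: three built-up pixels, then four pixels inside OSM green areas.
values = [0.05, 0.10, 0.15, 0.30, 0.50, 0.60, 0.70]
labels = [False, False, False, True, True, True, True]
candidates = [round(0.1 * i, 1) for i in range(11)]
print(best_threshold(values, labels, candidates))  # 0.2
```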

Releases

  • Frederik Ramm reported that the Geofabrik regional taginfo servers have been updated to the latest version, which includes a chronological report showing how tag usage has progressed over time.
  • The September 2023 version of Organic Maps has been released.

Did you know …

Other “geo” things

  • OpenCage has released the German edition of their #geoweirdness geography trivia series.
  • James Killick introduced Mappedin Maker, a web application that can automatically convert building floor plans into indoor maps.

Upcoming Events

  • Bengaluru: OSM Bengaluru Mapping Party (2023-10-07)
  • Philadelphia: Bobby Zankel’s Wonderful Sound, Upholstery & Veronica Mercedes Jurkiewicz/Carlos Santiago/Matt Engle/Eric Theise: A Benefit for Fire Museum Presents (2023-10-08)
  • Phường Quán Thánh: OSM Vietnam weekly meeting (2023-10-07)
  • City Of Gosnells: Social mapping Sunday: Beckenham (2023-10-08)
  • København: OSMmapperCPH (2023-10-08)
  • Washington: A Synesthete’s Atlas: Cartographic Improvisations Between Eric Theise, Jim Ryan, and Darien Baiza / Tangent Universes (2023-10-09)
  • Chambéry: Mapathon débutant saison 23/24 CartONG (2023-10-09)
  • HOT Mapathon: Libya Floods 2023 (2023-10-10)
  • Osm2pgsql Virtual Meetup (2023-10-10)
  • München: Münchner OSM-Treffen (2023-10-10)
  • San Jose: South Bay Map Night (2023-10-11)
  • Abuja: State of the Map Nigeria 2023 (2023-10-11 to 2023-10-14)
  • Brest: Rencontre groupe local (2023-10-11)
  • Zürich: OSM-Stammtisch (2023-10-11)
  • Lorain County: OpenStreetMap Midwest Meetup (2023-10-12)
  • Hannover: OSM-Stammtisch Hannover (2023-10-12)
  • Montrouge: Rencontre contributeurs Sud de Paris (2023-10-12)
  • Berlin: OSM Hackweekend Berlin 10/2023 (2023-10-14 to 2023-10-15)
  • Henrietta Township: Journey to the Centers of Michigan (2023-10-14)
  • Pittsburgh: A Synesthete’s Atlas: Cartographic Improvisations Between Eric Theise and Trē Seguritan Abalos (2023-10-15)
  • Waitematā: FOSS4G SotM Oceania 2023 Conference (2023-10-15 to 2023-10-19)
  • 臺北市 (Taipei): OpenStreetMap x Wikidata 月聚會 #57 (2023-10-16)
  • Berlin: Missing Maps – DRK Online Mapathon (2023-10-17)
  • Bonn: 168. OSM-Stammtisch Bonn (2023-10-17)
  • Lüneburg: Lüneburger Mappertreffen (online) (2023-10-17)
  • OSMF Engineering Working Group meeting (2023-10-18)
  • Zürich: Missing Maps Zürich Mapathon (2023-10-18)

Note:
If you would like to see your event here, please add it to the OSM calendar. Only events listed there will appear in weeklyOSM.

This weeklyOSM was produced by MatthiasMatthias, PierZen, Strubbl, TheSwavu, TrickyFoxy, barefootstache, derFred.
We welcome link suggestions for the next issue via this form and look forward to your contributions.

Beacons

Thursday, 5 October 2023 17:42 UTC

The Reading Web team recently discovered a bug in Firefox wherein a load event is fired when Firefox loads certain pages from its Back-Forward Cache (BFCache). To JavaScript on those pages, this is a second load event (the first having been fired before the user navigated away from the page). This proved problematic for the EventLogging extension, the cornerstone of our instrumentation, and delayed the deployment of Page Previews by approximately three months.

Background

The Page Previews instrumentation revolves around the notion of a link interaction. Every time the user hovers over a link with their mouse or focuses on a link with their keyboard, a new interaction begins. Every link interaction has a unique identifier (herein, a “token”).

The token is a 64-bit integer in hexadecimal format that’s generated using crypto#getRandomValues(), which should use a “well-established cryptographic PRNG seeded with high-quality entropy”. If this is the case, then, by the birthday bound, the probability of a token collision should only approach 50% once more than 4 billion (roughly 2^32) tokens have been generated.
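
Both claims in this paragraph can be checked with a short Python sketch. `secrets.token_hex(8)` stands in for `crypto.getRandomValues()` (both are meant to draw on a cryptographically strong random source); how Page Previews actually assembled its hex string may differ. The collision estimate uses the standard birthday approximation p ≈ 1 - exp(-n(n-1)/2N) with N = 2^64.

```python
import math
import secrets


def new_token() -> str:
    """Return a random 64-bit token as 16 lowercase hex digits.

    Python stand-in for crypto.getRandomValues(); `secrets` also uses a
    cryptographically strong source of randomness.
    """
    return secrets.token_hex(8)  # 8 random bytes = 64 bits = 16 hex characters


def collision_probability(n: int, bits: int = 64) -> float:
    """Birthday approximation of P(at least two of n uniform random `bits`-bit tokens match)."""
    space = 2 ** bits
    # p ~= 1 - exp(-n(n - 1) / 2N); expm1 keeps precision when p is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


print(new_token())
print(round(collision_probability(2 ** 32), 3))  # 0.393
```

At 2^32 (about 4.3 billion) tokens the probability is about 39%; it only reaches 50% at roughly 1.18 × 2^32, or about 5.1 billion tokens.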

On Monday, 27th March 2017, @Tbayer reported an unusually high number of events with duplicate tokens ("duplicate events") being inserted into the EventLogging MySQL table. Naturally, we assumed that this was being caused by one or more bugs in the instrumentation. During the following two sprints we tracked down and fixed what we thought were all of them. We even went so far as to instrument the instrumentation so that we could be confident that the deployed fixes weren’t causing more duplicate events to be logged. However, while our instrumentation assured us that Page Previews wasn't generating duplicate events, the number of events with duplicate tokens was unaffected.

While we were tracking down and fixing bugs, @Tbayer investigated further and reported that Firefox v51 and v52 were sending circa 98% of the duplicate events. He also noted that the distribution of operating systems sending duplicate events didn't deviate from the general distribution for pageviews. The Readers Web engineers tried to reproduce the issue consistently in Firefox, but to no avail.

The Discovery

@Jdlrobson's aha! moment came when he noticed that many of the duplicate events were being logged on pages with dense clusters of links, which users might click accidentally before immediately navigating back. When he tried this himself, he saw duplicate events being logged by Firefox v54 and raised a thorough, and startling, bug report.

Under certain conditions, Firefox will serialize an entire page, including the state of the JavaScript VM, to memory in order to make navigating backward and forward between pages very fast. This feature is often referred to as the "Back-Forward Cache" (the BFCache). @Jdlrobson discovered that when Firefox loads a page from the BFCache, the load event is fired again. From the point of view of the JavaScript on that page, this is a second load event, the first having been fired before the page was serialized prior to navigation.

EventLogging subscribes to the event.* topic only after the document and its sub-resources have loaded, so that logging events doesn't consume resources before or while the page renders. When Firefox resumed a page from the BFCache, the second load event caused the subscriber to be registered a second time, so every subsequent event was logged twice, producing exact duplicates.

Because the BFCache only applies under strict conditions, and the user has to navigate backward or forward to a serialized page to trigger the issue, the bug very neatly explains the wild peaks and troughs we were seeing in the daily percentage of Popups events with duplicate tokens.
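
The metric behind those peaks and troughs can be sketched in a few lines of Python. We assume here that an event counts as a duplicate when its token appears more than once on the same day; the real analysis ran against the EventLogging MySQL table and may have defined duplicates differently (for instance, counting only the extra copies).

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def daily_duplicate_rate(events: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Percentage of each day's events whose token occurs more than once that day.

    `events` is an iterable of (day, token) pairs, e.g. ("2017-03-27", "a1b2c3d4e5f60718").
    Assumption: duplicates are only looked for within a single day.
    """
    tokens_by_day = defaultdict(list)
    for day, token in events:
        tokens_by_day[day].append(token)

    rates = {}
    for day, tokens in tokens_by_day.items():
        counts = Counter(tokens)
        duplicated = sum(1 for token in tokens if counts[token] > 1)
        rates[day] = 100.0 * duplicated / len(tokens)
    return rates


sample = [("d1", "a"), ("d1", "a"), ("d1", "b"), ("d1", "c"), ("d2", "x"), ("d2", "y")]
print(daily_duplicate_rate(sample))  # {'d1': 50.0, 'd2': 0.0}
```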

Workaround

Both @Jdlrobson and I suggested a workaround.

While I was trying to reproduce and understand how the bug was affecting the EventLogging codebase, I noticed that the DOMContentLoaded event was behaving correctly and only firing once. I suggested that EventLogging register the subscriber when DOMContentLoaded fires. However, the subscriber is meant to be registered as late as possible so as not to impact page load time on resource-constrained devices.

@Jdlrobson's suggestion, on the other hand, was delightfully simple: make EventLogging register the subscriber exactly once. The workaround itself was a one-character change which, as such changes often do, required a much larger comment to provide much-needed context.
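
The mechanism and the workaround can be illustrated with a toy Python model. It is not EventLogging's code: `ToyPage`, `on_load`, `emit`, `simulate` and the `register_once` flag are hypothetical names, and the real extension deals with JavaScript events and a topic subscription rather than a Python list. With the guard off, the second load event adds a second subscriber and the next event is logged twice; with the guard on, each event is logged once.

```python
from typing import Callable, List


class ToyPage:
    """Toy stand-in for a page whose load event can fire twice (e.g. after a BFCache restore).

    Hypothetical names throughout; this models the behaviour, not EventLogging's code.
    """

    def __init__(self, register_once: bool) -> None:
        self.register_once = register_once
        self.subscribers: List[Callable[[dict], None]] = []
        self.logged: List[dict] = []  # what would be sent to the logging endpoint
        self._registered = False

    def on_load(self) -> None:
        # The logger subscribes only after load, to stay out of the way of rendering.
        if self.register_once and self._registered:
            return  # the workaround: a second load event no longer adds a subscriber
        self.subscribers.append(self.logged.append)
        self._registered = True

    def emit(self, event: dict) -> None:
        # Every registered subscriber logs the event, so two subscribers mean two copies.
        for subscriber in self.subscribers:
            subscriber(event)


def simulate(register_once: bool) -> List[str]:
    page = ToyPage(register_once)
    page.on_load()               # first load event
    page.emit({"token": "a1"})
    page.on_load()               # second load event when the page returns from the BFCache
    page.emit({"token": "b2"})
    return [event["token"] for event in page.logged]


print(simulate(register_once=False))  # ['a1', 'b2', 'b2']
print(simulate(register_once=True))   # ['a1', 'b2']
```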

Just under an hour after the change was deployed, the daily rate of duplicate events dropped from between 10% and 30% to roughly 0.08%. This new rate is consistent with the background level of duplication in our other, much simpler instrumentation, e.g. ReadingDepth and RelatedArticles.

What We Learned

This issue took us a little over three months to track down and fix, so we had a lot of time to reflect on what we could've done better.

How Are We Doing?

The Page Previews instrumentation is complex. The complexity of the instrumentation is proportional to the number of general questions we're asking about the feature. This complexity meant that implementing, testing, and QA'ing the instrumentation all took considerable time. Since our initial hypothesis was that the issue(s) were in the instrumentation itself, we also spent comparable amounts of time trying to re-verify that the instrumentation was working correctly.

Had we answered only one or two questions at a time, i.e. collected less data, we might have saved ourselves some time, as it would've been simpler to test our hypothesis. As software engineers, we regularly sacrifice velocity for confidence in our implementation; I don't see how this is any different.

QA Without A Test Plan

The Readers Web kanban board has a Needs QA column. For better or worse, technical tasks like implementing the instrumentation tend to skip this column, as QA tends to be done at the browser level. Moreover, the Readers Web engineers didn't set an expectation that QA would be done as part of code review and, when it was, no test plan was created before or after the code was merged.

This situation has since greatly improved in response to a variety of problems. We've agreed that all tasks should move from the Needs Code Review column to Needs QA once all of the associated changes have been merged into the codebase; any exception must be documented. We've also agreed that a task must have a test plan in its description before it can be moved into the Needs QA column.

We've yet to talk about setting expectations around planning, executing, and documenting QA as part of code review as a team. When we do, I'm sure that we'll write about it.

Integration Testing

One of the goals of the Page Previews architecture and a driving force behind the design of new changes is testability. The extension is remarkably well covered by its unit tests.

The instrumentation is no exception and is 100% covered by unit tests. However, the unit tests focused, and still focus, on the correctness of the properties of the events produced by the system, a consequence of representing events as POJOs (plain old JavaScript objects) rather than as value objects or types. So while the vanity metric of coverage is maximized, we lack proof that key invariants hold.
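
To illustrate the difference between a POJO and a value object, here is a Python analogue (the original code is JavaScript): a frozen dataclass that refuses to exist unless its invariants hold, so tests can check construction once instead of inspecting every property of every event. The class name, its fields and the 16-hex-digit token rule are hypothetical, chosen to match the token format described earlier.

```python
import re
from dataclasses import dataclass

# Hypothetical invariant: a token is a 64-bit value written as 16 lowercase hex digits.
TOKEN_PATTERN = re.compile(r"[0-9a-f]{16}")


@dataclass(frozen=True)
class LinkInteractionEvent:
    """Hypothetical value object for one logged event; immutable once created."""

    token: str
    action: str

    def __post_init__(self) -> None:
        # Reject invalid events at construction time instead of testing each property later.
        if not TOKEN_PATTERN.fullmatch(self.token):
            raise ValueError(f"token must be 16 lowercase hex digits, got {self.token!r}")
        if not self.action:
            raise ValueError("action must be a non-empty string")


event = LinkInteractionEvent("0123456789abcdef", "hover")
print(event == LinkInteractionEvent("0123456789abcdef", "hover"))  # True
```

Because instances are frozen, assigning to `event.token` afterwards raises `dataclasses.FrozenInstanceError`, so an event that passed validation cannot later drift out of shape, and two events with the same values compare equal.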

Focusing on higher-level integration tests, fuzzing, and mutation testing to prove the instrumentation correct would have allowed us to immediately reject our initial hypothesis that the bug was in the instrumentation. On the other hand, the suite of unit tests will give us confidence when refactoring the system to allow for these changes.

Notes

  1. The title of this post comes from both the Beacon API, which is used to log events for the Page Previews instrumentation, and Cloudkicker’s Beacons album, which is pretty darn rad.
  2. The Discovery, another Cloudkicker album, is equally rad 🤘

The ആം sign in fonts

Wednesday, 4 October 2023 11:00 UTC

Many people have long asked how to get rid of the dotted circle that appears when the ആം sign is written after numerals, as in 16-ാം or 18ാം. This problem does not occur in the latest versions of most applications, but it persists in LibreOffice. As a fix, a small update has been made to the fonts: using the new versions of the Manjari, Gayathri, Chilanka and Nupuram fonts avoids the problem. The new versions are available from the smc.org.in/fonts page.
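
Why does a dotted circle appear at all? In Unicode, ാ (U+0D3E, vowel sign AA) and ം (U+0D02, anusvara) are combining marks that normally follow a consonant; after a digit or a hyphen they have no base, and text rendering commonly shows such an orphan mark on a dotted circle (U+25CC). The Python sketch below flags these marks with a deliberately simplified rule of our own (a mark needs a preceding letter or mark); real shaping engines apply far more detailed, script-specific rules, and this only illustrates the problem, not how the font update fixes it.

```python
import unicodedata
from typing import List


def orphan_marks(text: str) -> List[int]:
    """Indices of combining marks with no letter (or earlier mark in the same cluster) before them.

    Deliberately simplified heuristic for illustration; real shaping engines apply
    detailed, script-specific cluster rules.
    """
    orphans = []
    for i, ch in enumerate(text):
        if not unicodedata.category(ch).startswith("M"):
            continue  # not a combining mark
        prev_cat = unicodedata.category(text[i - 1]) if i > 0 else ""
        # A mark after a letter or another mark has something to attach to.
        if not prev_cat.startswith(("L", "M")):
            orphans.append(i)
    return orphans


# "16-ാം" and "18ാം": U+0D3E (vowel sign AA) follows a hyphen or a digit.
print(orphan_marks("16-\u0d3e\u0d02"))  # [3]
print(orphan_marks("18\u0d3e\u0d02"))   # [2]
```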

MetaPost previewer

Wednesday, 4 October 2023 10:30 UTC

I created a simple MetaPost playground website, mpost.thottingal.in, where people can quickly write MetaPost code and preview the result. This avoids the need to set up MetaPost on your computer. Your code is executed automatically as you edit it. This is part of an exploration into using MetaPost for typeface design. Check out the Nupuram and Malini typefaces designed using MetaPost. I also started a repository of various type design concepts illustrated using MetaPost: https://github.

Wikimania 2023

Wednesday, 4 October 2023 09:30 UTC

I attended Wikimania 2023, an annual conference of people working on Wikipedia and other Wikimedia projects. This year’s conference was in Singapore.

State of Machine Learning on the Wikimedia projects

I presented a talk titled “State of Machine Learning on the Wikimedia projects”. Machine learning is used in many Wikimedia projects, and the talk was a roundup of various projects that use ML. I talked about how machine learning is used in our projects, and its benefits and impact.

SMWCon Fall 2023 announced

Wednesday, 4 October 2023 09:14 UTC

July 31, 2023

SMWCon Fall 2023 will be held in Germany

Save the date! SMWCon Fall 2023 will take place December 11-13, 2023 in Paderborn, Germany. The conference is for everybody interested in wikis and open knowledge, especially in Semantic MediaWiki. You are welcome to propose a related talk, tutorial, workshop and more via the conference page.

Sentence segmentation is a fundamental process in natural language processing. It involves breaking down a given text into individual sentences, a task that finds applications in various contexts. Whether you need to split a paragraph into sentences for further analysis or present sentence boundaries in a user-friendly frontend application, sentence segmentation is crucial. At first glance, identifying sentence boundaries might seem straightforward – just look for a period or full stop.
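
As a sketch of that first-glance rule (our own illustration, not code from this post), the Python function below splits after every full stop, question mark or exclamation mark that is followed by whitespace. Even a short example shows where it breaks: the abbreviation "Dr." is treated as a complete sentence.

```python
import re
from typing import List


def naive_sentences(text: str) -> List[str]:
    """Split wherever '.', '!' or '?' is followed by whitespace: the 'just look for a full stop' rule."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


print(naive_sentences("Dr. Smith arrived. He sat down."))
# ['Dr.', 'Smith arrived.', 'He sat down.']  <- "Dr." is wrongly treated as a sentence
```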