Minecraft Wiki:Wiki rules/Generative AI policy


This policy prohibits the use of generative AI on the Minecraft Wiki and defines the exceptions where its use is allowed. Generative AI includes, but is not limited to, large language models (LLMs, e.g. ChatGPT) and text-to-image models (e.g. DALL-E).

Can I use generative AI to edit the Minecraft Wiki?

TL;DR: No.

Using generative AI (for example, ChatGPT) to edit the wiki is not allowed. This includes writing new article content, rewording articles, and uploading AI-generated images. It also includes writing forum proposals or talk page messages.

Generative AI excels at creating content that looks good at first glance, but may be factually incorrect, unhelpful, or completely hallucinated. This content can be produced incredibly quickly and easily, and people using generative AI to edit the wiki often lack the expertise or awareness to verify whether the output is factual and useful to readers. This creates a burden on our volunteers who moderate new wiki edits.

Additionally, the copyright status of output from large language models (LLMs) is murky, and LLMs used widely by consumers (notably ChatGPT) have been known to output, verbatim, long chunks of content from copyrighted sources. Our copyright policy says: "Work which you do not hold the copyright to, or which is not available under [CC BY-NC-SA] license, should not be added to the wiki." The use of LLMs or other generative AI to contribute to the wiki may violate someone's copyright.

Exceptions

There might be situations where AI tools, if used appropriately, could reduce the human effort required for specific wiki projects, especially when the tools are not used generatively. The currently allowed uses of AI to contribute to this wiki are:

  • Writing audio captions, for a sound or video file
  • Adding transparency to an image
  • Voice generation, for demonstrating pronunciation
  • Machine translation, if checked and edited by a human
  • Grammar and spell checking, except for unsupervised automated edits

If you use any of the above AI tools to make changes to the wiki, you must clearly indicate this in the edit summary. Additional exceptions may be approved by the community.

Can I use the content of the Minecraft Wiki as training data for a generative AI project?

TL;DR: Using wiki content as training data is likely a violation of the CC BY-NC-SA license unless you attribute each output to the sources that contributed to it (the articles and their authors).

Content on the Minecraft Wiki is licensed under Creative Commons BY-NC-SA 3.0. Whether the fair use doctrine applies to generative AI training is an unsettled legal question, and a blockbuster court case (New York Times v. OpenAI) is likely to considerably change the landscape.

This does not constitute legal advice, but the Minecraft Wiki's current position is that using wiki content as training data does not constitute fair use, and therefore any generative AI using wiki content must follow the restrictions of the CC BY-NC-SA license. See the license text for full details; in essence, this means:

  • attribution – you must provide appropriate credit to the copyright holders
  • non-commercial – the generative AI results must not be used for commercial purposes
  • share-alike – the generative AI results must be licensed under the same license (CC BY-NC-SA 3.0)

The "attribution" requirement is very hard to meet for training data. For reuse of a single page, linking to ?action=history is generally considered acceptable – see guidance provided by Creative Commons wiki and Wikipedia for more information. For generative AI models that incorporate the entire wiki into their training, it is not clear how they would give proper credit to the authors of the articles that contributed to particular output. This concept of "interpretability" is one of the central open problems in AI research, and as of this writing, we are not aware of any large language models that are able to effectively attribute their outputs to specific training documents.
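For the single-page case mentioned above, a credit line that links to the page history might be assembled roughly as follows. This is only an illustrative sketch: the base URL `https://minecraft.wiki/w/`, the helper names `history_url` and `attribution_line`, and the exact wording of the credit line are assumptions made for this example and are not prescribed by this policy or by the license.

```python
from urllib.parse import quote

# Assumption: article pages live under this base URL. Adjust if the wiki's
# URL scheme differs.
WIKI_BASE = "https://minecraft.wiki/w/"
LICENSE_NAME = "CC BY-NC-SA 3.0"
LICENSE_URL = "https://creativecommons.org/licenses/by-nc-sa/3.0/"


def history_url(title: str) -> str:
    """Return a link to the page history (?action=history) for a page title."""
    # MediaWiki page URLs use underscores in place of spaces.
    slug = quote(title.replace(" ", "_"), safe=":/")
    return f"{WIKI_BASE}{slug}?action=history"


def attribution_line(title: str) -> str:
    """Build a hypothetical credit line for reusing a single wiki page."""
    return (
        f'"{title}" from the Minecraft Wiki, by its contributors '
        f"({history_url(title)}), licensed under {LICENSE_NAME} ({LICENSE_URL})."
    )


if __name__ == "__main__":
    print(attribution_line("Nether Wart"))
```

Nothing comparable exists for a model trained on the whole wiki, because there is no single page history to point to for a given output.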

It is currently incredibly difficult for the operator of a large language model or other generative AI to provide the level of attribution that is required under the CC BY-NC-SA 3.0 license. Based on Weird Gloop's position, the community strongly recommends against training generative AI on content from the Minecraft Wiki.

References

This policy is adapted from Generative AI policy on Weird Gloop Meta Wiki, licensed under CC BY-NC-SA 3.0.
