Help:Import

From Wikipedia, the free encyclopedia

Importers, transwiki importers, and administrators can apply two types of import using the Special:Import page:

  • Transwiki import (interwiki import): imports pages directly from another WMF wiki. The configuration of the destination wiki determines which source wikis are available, and the interface message 'import-interwiki-text' is displayed. Transwiki imports can be performed by administrators and transwiki importers.
  • Importupload (upload import): imports a file in a special XML format produced by exporting pages from another wiki; the interface message 'importtext' is displayed. This type of import is restricted to importers and stewards.

Other users can request an import at Wikipedia:Requests for page importation.

After importing, you should be able to see any new pages that were in the file. Where a page in the file has the same name as an existing page in the wiki, the existing page will be overwritten by the content from the file if the imported revision's timestamp is newer. If an error occurs during the import, e.g. because of badly formatted XML in the file, the import may be only partially complete (some pages imported, but not all). Because pages are overwritten, attempting the import again should not be a problem.

If you included history information when you performed the export, you should also see the imported edits in the history of the imported pages and in the user contributions. The edits will not show up in recent changes, either at the time of the original edit or at the time of importing.

Editing the import file

For upload import, the XML file is in a simple, readable format, so it can easily be edited between exporting and importing. This should be done with caution and integrity: one can create backdated edits and use false user names, and, in combination with deletion, one can "change history". Applications of this editing include:

  • adding a note to the edit summary about the importing
  • changing user names and/or page names to avoid name conflicts (either only between the title tags and the username tags, or also in links and signatures)
  • changing namespace names into the generic or the applicable ones (again, either only in the tags or also in links)

Note that if two versions of the page have the same timestamp (because one was uploaded with the same timestamp as a preexisting version), the later (imported) version will show up in the edit history but not in the article itself.

Merging histories and other complications

If the import includes history information, and the edits involve a user name that in the importing project belongs to somebody else, then upload import should be used, and the occurrences of that user name in the XML file should first be replaced by another name to avoid ambiguity. If the user name is not yet used in the importing project, the imported edits still appear in that user's contributions, although no account is created automatically.
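
The renaming step can be scripted rather than done by hand. The sketch below is illustrative only: it assumes Python's standard xml.etree.ElementTree, and the helper names local_name, rename_user and save are hypothetical. It changes the name only inside username tags, so links and signatures in the page text still need separate treatment.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so any export schema version matches."""
    return tag.rsplit("}", 1)[-1]


def rename_user(root: ET.Element, old: str, new: str) -> int:
    """Replace the exact name `old` with `new` in every <username> element.

    Links and signatures inside the page text are NOT changed.
    Returns the number of <username> elements updated.
    """
    count = 0
    for elem in root.iter():
        if local_name(elem.tag) == "username" and elem.text == old:
            elem.text = new
            count += 1
    return count


def save(root: ET.Element, target) -> None:
    """Write the tree back, keeping the export namespace as the default one."""
    if root.tag.startswith("{"):
        # Without this, ElementTree would write the namespace as "ns0:".
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    ET.ElementTree(root).write(target, encoding="utf-8", xml_declaration=True)
```

For example, load the file with tree = ET.parse("export.xml") (a hypothetical file name), call rename_user(tree.getroot(), "Alice", "Alice (imported)"), and then save(tree.getroot(), "export-renamed.xml") before uploading.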

As when a page name is used in a link or in a URL, generic namespace names are converted automatically, and if a prefix is not a namespace name, the page will arrive in the main namespace. However, a prefix such as "Meta:" may be ignored (dropped) on a project that uses that prefix for interwiki links. It may be desirable to change it to "Project:" in the XML file before importing.
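
A minimal sketch of the "Meta:" to "Project:" change, assuming the file is edited with Python's xml.etree.ElementTree; the function name replace_title_prefix is hypothetical. It rewrites only the text of title elements. If the export file also records a namespace number for each page, check that by hand, since this sketch leaves it unchanged.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so any export schema version matches."""
    return tag.rsplit("}", 1)[-1]


def replace_title_prefix(root: ET.Element, old_prefix: str, new_prefix: str) -> list[str]:
    """Rewrite <title> values that start with `old_prefix` (e.g. "Meta:").

    Returns the new titles. Links inside the page text are not touched.
    """
    changed = []
    for elem in root.iter():
        if (local_name(elem.tag) == "title" and elem.text
                and elem.text.startswith(old_prefix)):
            elem.text = new_prefix + elem.text[len(old_prefix):]
            changed.append(elem.text)
    return changed
```

Calling replace_title_prefix(root, "Meta:", "Project:") turns a title such as "Meta:About" into "Project:About" and leaves titles with other prefixes alone.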

If a page name already exists, importing revisions of a page with that name merges the two page histories. Note that after a revision is inserted between two existing revisions in the page history, the change made by the user who made the next edit appears different from what it actually was: to see that user's actual change, one has to take the diff between the two previously existing revisions, not the diff against the inserted one. Therefore this should not be done except to reconstruct the true page history.
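
The effect on diffs can be seen with a small worked example. The sketch below uses Python's difflib on three hypothetical one-line revisions: A and B already exist, and an older imported revision is slotted in between them by timestamp. The helper name changed_lines is also hypothetical.

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """Lines removed ('- ') and added ('+ ') going from `before` to `after`."""
    return [line for line in difflib.ndiff(before.splitlines(), after.splitlines())
            if line.startswith(("- ", "+ "))]


# Existing history: revision A, then user B's edit (revision B).
rev_a = "The cat sat."
rev_b = "The dog sat."
# An imported revision whose timestamp falls between A and B.
inserted = "The cow sat."

print(changed_lines(rev_a, rev_b))     # ['- The cat sat.', '+ The dog sat.']
print(changed_lines(inserted, rev_b))  # ['- The cow sat.', '+ The dog sat.']
```

The first diff is what user B actually did; the second is what the merged history now appears to attribute to B's edit.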

A revision is not imported if a revision with the same date and exactly the same time, to the second, already exists. In practice this happens only when the revision has been imported before, when it was originally imported in the opposite direction, or when both copies were imported from a third site.
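
To check in advance which revisions in a file would be skipped, the rule above can be modelled as a comparison of timestamps at one-second precision. This is a toy model of the rule as stated here, not MediaWiki's own code, whose duplicate check may consider more than the timestamp. It assumes timestamps in the export's "2005-01-01T12:00:00Z" form, and the function names are hypothetical.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse an export timestamp such as '2005-01-01T12:00:00Z' (second precision)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def revisions_to_import(existing: list[str], incoming: list[str]) -> list[str]:
    """Keep incoming timestamps that match no existing revision to the second."""
    taken = {parse_ts(ts) for ts in existing}
    return [ts for ts in incoming if parse_ts(ts) not in taken]
```

Here existing would hold the timestamps already in the destination page's history, and incoming those of the same page in the XML file.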

An edit summary may refer to, and possibly link to, another page. This may be confusing when the page has been imported but the target page has not.

The edit summary does not automatically show that the page has been imported, but with upload import a note saying so can be added to the edit summaries in the XML file before importing. This can avoid some potential ambiguity or confusion. When editing the XML file with find/replace, note that adding text to the edit summaries requires distinguishing between edits that already have an edit summary, and therefore comment tags in the XML file, and those without these tags. If there are multiple pairs of comment tags, only the last one is effective.
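
A script can handle both cases that a plain find/replace struggles with. The sketch below assumes Python's xml.etree.ElementTree and the usual export schema order, in which comment precedes text; the names add_import_note and AFTER_COMMENT are hypothetical. Edits with a summary get the note appended to their last comment, and edits without one get a new comment element.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so any export schema version matches."""
    return tag.rsplit("}", 1)[-1]


# Revision children assumed to follow <comment> in the export schema.
AFTER_COMMENT = {"model", "format", "text", "sha1"}


def add_import_note(root: ET.Element, note: str) -> None:
    """Append `note` to each revision's edit summary, creating <comment> if absent."""
    # Collect first so the tree is not modified while it is being iterated.
    revisions = [e for e in root.iter() if local_name(e.tag) == "revision"]
    for rev in revisions:
        comments = [c for c in rev if local_name(c.tag) == "comment"]
        if comments:
            # Edit already has a summary: only the last <comment> is effective.
            last = comments[-1]
            last.text = f"{last.text or ''} {note}".strip()
            continue
        # Edit has no summary: create a <comment> in the revision's namespace.
        ns = rev.tag[: rev.tag.index("}") + 1] if rev.tag.startswith("{") else ""
        comment = ET.Element(ns + "comment")
        comment.text = note
        # Place it before <text> (or model/format/sha1) to keep the schema order.
        pos = next((i for i, c in enumerate(rev) if local_name(c.tag) in AFTER_COMMENT),
                   len(rev))
        rev.insert(pos, comment)
```

A note such as "(imported from another wiki)" then appears in every imported edit summary.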

Large-scale transfer

For a large-scale transfer, somebody with sufficient system privileges can move data within the server, which is more practical than sending large XML files from the server to a user's local computer and then back to the server.

Large files may be rejected for two reasons. The first is the PHP upload limit, set in the PHP configuration file php.ini:

 ; Maximum allowed size for uploaded files.
 upload_max_filesize = 20M

The second is the hidden field that limits the upload size in the input form, found in the MediaWiki source code in includes/specials/SpecialImport.php:

 <input type='hidden' name='MAX_FILE_SIZE' value='20000000' />

You may also need to change the following four directives in php.ini:

 ; Maximum size of POST data that PHP will accept.
 post_max_size = 20M

 ; Maximum execution time of each script, in seconds
 max_execution_time = 1000

 ; Maximum amount of time each script may spend parsing request data
 max_input_time = 2000

 ; Default timeout for socket based streams (seconds)
 default_socket_timeout = 2000
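
To see which of the size limits a given export file would hit, the values can be compared directly. This is a hedged sketch: the function names are hypothetical, and it only covers the size limits, not the time limits. PHP's shorthand suffixes K, M and G are powers of 1024, so upload_max_filesize = 20M is 20,971,520 bytes, slightly more than the 20,000,000 bytes in the hidden MAX_FILE_SIZE field. The post_max_size check is approximate, because the request body is a little larger than the file itself.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def php_size_to_bytes(value: str) -> int:
    """Convert php.ini shorthand such as '20M' to bytes (K, M, G are powers of 1024)."""
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)


def upload_blockers(file_size: int, upload_max_filesize: str, post_max_size: str,
                    max_file_size_field: int) -> list[str]:
    """Name each size limit that a file of `file_size` bytes exceeds.

    post_max_size really applies to the whole request body, which is a bit
    larger than the file, so that comparison is only approximate.
    """
    limits = {
        "upload_max_filesize": php_size_to_bytes(upload_max_filesize),
        "post_max_size": php_size_to_bytes(post_max_size),
        "MAX_FILE_SIZE": max_file_size_field,
    }
    return [name for name, limit in limits.items() if file_size > limit]
```

With the values shown above, a 20,500,000-byte file passes both php.ini limits but is still rejected by the hidden form field. The file size itself can be obtained with os.path.getsize.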

See also