Help:Using the Wayback Machine

From Wikipedia, the free encyclopedia

This page gives information about using the Wayback Machine to cite archived copies of web pages used by articles. This is useful if a web page has changed, moved, or disappeared; links to the original content can be retained.

Editors are also encouraged to add an archive link as a part of each citation, or at least submit the referenced URL for archiving, at the same time that each citation is created or updated.

Visit the web form at http://archive.org/, enter the original URL of the web page of interest in the "Wayback Machine" search box and then select BROWSE HISTORY. The next screen may

  • redirect to the latest archived copy,
  • show a box near the bottom of the page with a link inviting the user to Save this url in the Wayback Machine,
  • show a calendar listing the snapshot dates for all archived copies of that page, or
  • show an error message explaining why the page cannot be archived.

URL formats

A link to the Wayback Machine usually starts with http://web.archive.org/web/ followed either by a single asterisk or a 14-digit datetime reference, then a slash and finally the URL of the original web page.
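
The format described above can be sketched in Python as follows. This is only an illustration of the string layout, not an official Internet Archive API; the names WAYBACK_PREFIX and wayback_url are hypothetical and chosen for this example.

```python
import re

# Prefix described in the paragraph above.
WAYBACK_PREFIX = "http://web.archive.org/web/"


def wayback_url(original_url, timestamp="*"):
    """Build a Wayback Machine link (hypothetical helper).

    timestamp is either "*" (calendar of all archived copies) or a
    14-digit YYYYMMDDhhmmss string (one specific archived copy).
    """
    if timestamp != "*" and not re.fullmatch(r"[0-9]{14}", timestamp):
        raise ValueError("timestamp must be '*' or exactly 14 digits")
    return WAYBACK_PREFIX + timestamp + "/" + original_url


print(wayback_url("http://www.wikipedia.org/"))
print(wayback_url("http://www.wikipedia.org/", "20020930123525"))
```

The helper only concatenates strings; it does not check whether an archived copy actually exists.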

Initial request

The following example usually shows a calendar linking to all archived copies of the main index page of Wikipedia.

http://web.archive.org/web/*/http://www.wikipedia.org/

Use the above URL format to discover the extent to which the requested page has been archived. Click one of the highlighted dates to select that specific archived copy.

If the target web page hasn't yet been archived, a box appears near the bottom of the page with a link inviting the user to Save this url in the Wayback Machine. Clicking this invokes a request to a URL that starts with http://web.archive.org/save/ followed by the URL of the original web page.

The above URL will show the current version of the requested web page and start the process that will attempt to archive the web page. If successful, the archived copy will become available as soon as the process is completed.
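
A save request can also be issued from a script. The sketch below assumes the save URL has the same form as the one used by the bookmarklet later on this page (http://web.archive.org/save/ followed by the page URL); the names save_url and request_archive are hypothetical, and how the service responds is not guaranteed by this example.

```python
import urllib.request

# Prefix taken from the "Wayback Save" bookmarklet on this page.
SAVE_PREFIX = "http://web.archive.org/save/"


def save_url(original_url):
    """Return the URL that asks the Wayback Machine to archive a page."""
    return SAVE_PREFIX + original_url


def request_archive(original_url, timeout=30):
    """Send the save request and return the HTTP status code.

    This performs a real network request. urlopen raises
    urllib.error.HTTPError when the server answers with an error
    status, for example when the page cannot be archived.
    """
    with urllib.request.urlopen(save_url(original_url), timeout=timeout) as response:
        return response.status


print(save_url("http://www.wikipedia.org/"))
```

Only save_url is called here; request_archive is defined but left for the reader to invoke, since it contacts the live service.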

For some requested pages, the Wayback Machine will return an error message explaining why that particular page has not been, and cannot be, archived. In those cases, try a different archiving service such as WebCite.

Specific archive copy

Once the target web page has been archived, each of the specific dated archives can be individually requested using the format shown below.

The next example links to the archived copy of the main index page of Wikipedia exactly as it appeared on 30 September 2002 at 12:35:25 pm in the UTC timezone. The datetime format is YYYYMMDDhhmmss.

http://web.archive.org/web/20020930123525/http://www.wikipedia.org/

Use the above format to link directly to a specific archive copy.
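
The YYYYMMDDhhmmss stamp can be produced from a date and time with standard library formatting. The following is a small sketch; the names STAMP_FORMAT, to_stamp and from_stamp are hypothetical and chosen for this example.

```python
from datetime import datetime, timezone

# strftime/strptime pattern equivalent to YYYYMMDDhhmmss.
STAMP_FORMAT = "%Y%m%d%H%M%S"


def to_stamp(moment):
    """Format a timezone-aware datetime as a 14-digit UTC stamp."""
    return moment.astimezone(timezone.utc).strftime(STAMP_FORMAT)


def from_stamp(stamp):
    """Parse a 14-digit stamp back into a UTC datetime."""
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)


# 30 September 2002 at 12:35:25 UTC, the moment used in the example above.
snapshot = datetime(2002, 9, 30, 12, 35, 25, tzinfo=timezone.utc)
print(to_stamp(snapshot))
```

Converting to UTC before formatting matters because the stamp is interpreted in the UTC timezone, as noted above.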

Adding an asterisk immediately after the date (or in place of it) is a quick way to show the calendar view of all archived copies.

The following flags can be appended to the datetime field to modify the format in which the archived content is displayed:[1][2]

  • id_ Identity - perform no alterations of the original resource, return it as it was archived.
  • js_ JavaScript - return document marked up as JavaScript.
  • cs_ CSS - return document marked up as CSS.
  • im_ Image - return document as an image.

Depending on the circumstances under which the page images were archived, the rendering of these pages may not be consistent; it is therefore recommended that the flags be tested before being incorporated into Wikipedia documents.

When linking to pages which are no longer available, the id_ flag is the most transparent in presenting the intent of the original page. The following example shows the Wikipedia page as it appeared on 30 September 2002 at 12:35:25 pm in the UTC timezone, without the Wayback Machine Toolbar being displayed. The datetime format is YYYYMMDDhhmmss with id_ appended.

http://web.archive.org/web/20020930123525id_/http://www.wikipedia.org/

Use the above format to link directly to a specific archive copy without the display of the Wayback Machine Toolbar.
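
Appending a flag to an existing archive link is a simple string operation. The sketch below assumes the link follows the prefix, 14-digit stamp, slash, original URL layout described on this page; the names KNOWN_FLAGS, ARCHIVE_RE and with_flag are hypothetical.

```python
import re

# The four flags listed above.
KNOWN_FLAGS = {"id_", "js_", "cs_", "im_"}

# Captures the prefix, the 14-digit stamp and the original URL;
# any flag already present after the stamp is matched but dropped.
ARCHIVE_RE = re.compile(
    r"^(https?://web\.archive\.org/web/)([0-9]{14})(?:[a-z]{2}_)?/(.*)$"
)


def with_flag(archive_url, flag):
    """Return archive_url with the given flag appended to its stamp."""
    if flag not in KNOWN_FLAGS:
        raise ValueError("unknown flag: " + flag)
    match = ARCHIVE_RE.match(archive_url)
    if match is None:
        raise ValueError("not a dated Wayback Machine link")
    prefix, stamp, original = match.groups()
    return prefix + stamp + flag + "/" + original


print(with_flag(
    "http://web.archive.org/web/20020930123525/http://www.wikipedia.org/",
    "id_",
))
```

As recommended above, a link built this way should still be tested in a browser before it is used in an article.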

Latest archive copy

The next example links to the most current version of the archived page.

Using the above format is discouraged. The request is redirected to the long-form URL of the latest archive copy, including its 14-digit datetime stamp, which defeats the purpose of using the archive to link directly to a specific old version of the page.

Likewise, a similar archive URL but with the number 1 links to the oldest archive copy.

See also: Advanced URL locator hints and tips – Internet Archive

Limitations

Before October 2013, it would often take weeks or months for an archived copy of a web page to become available. Since then, a request to archive a particular web page is processed immediately and the result is usually available within minutes.

The Internet Archive honors the robots exclusion standard and will not archive sites that disallow access.

For example, The New York Times has a robots.txt page at http://www.nytimes.com/robots.txt which includes:

User-agent: *
Disallow: /aponline/
Disallow: /archives/
Disallow: /reuters/

Thus, archive requests for URLs within those folders, and within any other similarly listed folder of the New York Times website, will be rejected.

The Washington Post uses the file http://www.washingtonpost.com/robots.txt which includes:

User-agent: ia_archiver
Disallow: /

This directive explicitly blocks the Internet Archive from accessing their entire website.
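
The effect of such rules can be checked locally with Python's standard urllib.robotparser module. The sketch below parses the two excerpts quoted above rather than fetching the live files; the helper name allows, the sample page paths and the "examplebot" user agent are hypothetical, and Python's parser is only an approximation of how the Internet Archive itself interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Excerpts quoted in the text above.
NYT_RULES = """User-agent: *
Disallow: /aponline/
Disallow: /archives/
Disallow: /reuters/
"""

WAPO_RULES = """User-agent: ia_archiver
Disallow: /
"""


def allows(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A page inside a disallowed folder, and one outside the listed folders.
print(allows(NYT_RULES, "ia_archiver", "http://www.nytimes.com/archives/story.html"))
print(allows(NYT_RULES, "ia_archiver", "http://www.nytimes.com/index.html"))
# The whole site is closed to ia_archiver but not to other agents.
print(allows(WAPO_RULES, "ia_archiver", "http://www.washingtonpost.com/"))
print(allows(WAPO_RULES, "examplebot", "http://www.washingtonpost.com/"))
```

Running a check like this before submitting a URL can explain why an archive request is rejected.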

JavaScript bookmarklet

For a one-click button that takes you from a dead-link web page to whatever archive.org has saved for it, store the following code in a bookmark on your browser's toolbar, with a label such as Wayback:

javascript:void(location.href='http://web.archive.org/web/*/'+document.location.href)

For a one-click button that archives the web page you are currently viewing, store the following code in a bookmark on your browser's toolbar, with a label such as Wayback Save:

javascript:void(location.href='http://web.archive.org/save/'+document.location.href)

Mozilla Firefox Add-on

If you are using the Mozilla Firefox browser, you can install a 404 error add-on which automatically tries to find a missing page in the Wayback Machine and provides a button similar to the ones described above.

Using the wayback template

{{wayback}} can create these links for you; use the |url=, |title= and |date= parameters to specify the URL, title and date. For example:

  • {{wayback |url=http://www.wikipedia.org/ |title=Wikipedia |date=20010727112808 }}
    Wikipedia at the Wayback Machine (archived July 27, 2001)

Without the date included, the date parameter defaults to *, which links to the calendar view of all archived copies.

Working with cite templates

{{citation}} and all of the Citation Style 1 templates support the |archiveurl= parameter; note that the |archivedate= parameter is then also required. Other citation templates may also support |archiveurl=; see their documentation.

  • {{citation |url=http://www.wikipedia.org/ |title=Wikipedia Main Page |archiveurl=http://web.archive.org/web/20020930123525/http://www.wikipedia.org/ |archivedate=2002-09-30 |accessdate=2005-07-06 }}
    "Wikipedia Main Page". Archived from the original on 2002-09-30. Retrieved 2005-07-06. 
  • Where an archived resource notes its original publication date, use |date= in place of |accessdate=.
  • When adding an archive URL to any citation where the original resource URL is still working, it is useful to add the |deadurl=no parameter. Should the original URL stop working, it is a simple job to either change this to |deadurl=yes or remove the parameter.
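
The citation markup shown above can be generated from a dated archive link, since both the original URL and the archive date are contained in it. This is a sketch only; the names ARCHIVE_RE and citation are hypothetical, and the output simply mirrors the parameter order of the example above.

```python
import re

# Splits a dated archive link into year, month, day and original URL.
# The time of day and any flag such as "id_" are matched but not captured.
ARCHIVE_RE = re.compile(
    r"^https?://web\.archive\.org/web/"
    r"([0-9]{4})([0-9]{2})([0-9]{2})[0-9]{6}(?:[a-z]{2}_)?/(.+)$"
)


def citation(title, archive_url, access_date):
    """Return {{citation}} markup for an archived page."""
    match = ARCHIVE_RE.match(archive_url)
    if match is None:
        raise ValueError("not a dated Wayback Machine link")
    year, month, day, original = match.groups()
    return (
        "{{citation |url=" + original
        + " |title=" + title
        + " |archiveurl=" + archive_url
        + " |archivedate=" + year + "-" + month + "-" + day
        + " |accessdate=" + access_date + " }}"
    )


print(citation(
    "Wikipedia Main Page",
    "http://web.archive.org/web/20020930123525/http://www.wikipedia.org/",
    "2005-07-06",
))
```

Deriving |archivedate= from the link itself avoids a mismatch between the two parameters.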

References

  1. ^ "Wayback Administrator Manual". Internet Archive. Archived from the original on 2014-01-20. 
  2. ^ "How can I view a page without the Wayback code in it?". Internet Archive. Archived from the original on 2013-08-06. 






