LPedia:Revision History Reconstruction Policies

This is a breakdown of requirements, problems, and solutions for reconstructing partial revision histories from the Internet Archive for articles that have been deleted from other wikis.

As of April 2017, this task is prohibitively difficult. The steps described here should be considered experimental and in need of improvement.

Requirements

  1. The resulting article should be tagged according to its licence terms. For example, a Wikipedia article that was deleted prior to Wikipedia's transition to CC-BY-SA 3.0 should have the {{Wikipedia-GFDL}} tag on the final edit at a minimum. If practical, the tag should also be included with each reconstructed edit.
  2. The list of edits for the article, such as this one for our article on the Libertarian Party of Louisiana, needs to be recovered, especially for licences that require full attribution, which is most of them. For Wikipedia, it may be possible to acquire such a list by asking an admin on the #wikipedia-en IRC channel on Freenode.net.
  3. An effort must be made to recover every surviving edit from any available source, such as the Internet Archive. While it is theoretically possible for a Wikipedia admin to retrieve a full history export file, they do not presently seem able to do this for deleted articles without getting themselves into trouble.
  4. An XML file that an admin can import needs to be created, with every edit's author and any comments represented, whether or not the content of the edit survives. Each unavailable revision should include the {{Lost Edit}} tag.
  5. After testing, an admin needs to import the XML file to LPedia.

Process

Acquire Revision History

If the revision history cannot be recovered, it may not be possible to include the article on LPedia without violating the terms under which the page was made available.

Find All Recoverable Edits

For example, the Internet Archive lists these captures of the Carol Moore article: http://web.archive.org/web/20040101000000*/http://en.wikipedia.org/wiki/Carol_Moore
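The capture list can also be fetched in machine-readable form. The following is a minimal sketch, assuming the Wayback Machine's CDX search endpoint and its JSON output (a header row followed by one row per capture). The function names are hypothetical and not part of any LPedia bot. Note that a capture is not the same thing as an edit: several edits may fall between two captures, and each capture still has to be matched against the revision list by hand.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine CDX search service.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url):
    """Return a CDX query URL listing successful captures of page_url."""
    params = {
        "url": page_url,
        "output": "json",
        "filter": "statuscode:200",  # skip redirects and error pages
        "collapse": "digest",        # drop adjacent captures with identical content
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_json(body):
    """Turn a CDX JSON body (header row plus data rows) into snapshot URLs."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    ts = header.index("timestamp")
    orig = header.index("original")
    return ["http://web.archive.org/web/%s/%s" % (row[ts], row[orig]) for row in data]
```

The query URL from build_cdx_query can be fetched with any HTTP client, and the response body passed to parse_cdx_json to get one snapshot URL per distinct capture.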

Reconstruct Each Edit into MediaWiki Format

This is one of the two steps that pose a significant problem.

Markup like this:

<p><b>Carol Moore</b> (b. 1948) is an <a href="/web/20080228101817/http://en.wikipedia.org/wiki/Ethicist" title="Ethicist">ethicist</a> and <a href="/web/20080228101817/http://en.wikipedia.org/wiki/Systems_theory" title="Systems theory">systems theorist</a> best known for her theories of <a href="/web/20080228101817/http://en.wikipedia.org/wiki/Secession" title="Secession">secession</a> and her analysis of <a href="/web/20080228101817/http://en.wikipedia.org/wiki/Mahatma_Gandhi" class="mw-redirect" title="Mahatma Gandhi">Mahatma Gandhi</a>'s methods as an "intuitive systems theorist".<sup id="_ref-0" class="reference"><a href="#_note-0" title="">[1]</a></sup></p>

And this:

<li id="_note-0"><b><a href="#_ref-0" title="">^</a></b> Moore, C: "The Nation State Break Up" NOMOS Magazine, 1985; Carol Moore, <a href="http://web.archive.org/web/20080228101817/http://www.context.org/ICLIB/IC12/Moore.htm" class="external text" title="http://www.context.org/ICLIB/IC12/Moore.htm" rel="nofollow">From Empire To Ecstasy A vision for the coming transition</a>, In Context Magazine, Winter 1985/86, Page 47; Carol Moore, <a href="http://web.archive.org/web/20080228101817/http://www.sacred-texts.com/bos/bos239.htm" class="external text" title="http://www.sacred-texts.com/bos/bos239.htm" rel="nofollow">Consciousness and Politics</a>, reprint at <a href="http://web.archive.org/web/20080228101817/http://sacred-texts.com" class="external text" title="http://sacred-texts.com" rel="nofollow">SacredText.Com</a>.</li>

Needs to be converted to:

'''Carol Moore''' (b. 1948) is an [[ethicist]] and [[systems theorist]] best known for her theories of [[secession]] and her analysis of [[Mahatma Gandhi]]'s methods as an "intuitive systems theorist".<ref>Moore, C: "The Nation State Break Up" NOMOS Magazine, 1985; Carol Moore, [http://www.context.org/ICLIB/IC12/Moore.htm From Empire To Ecstasy A vision for the coming transition], In Context Magazine, Winter 1985/86, Page 47; Carol Moore, [http://www.sacred-texts.com/bos/bos239.htm Consciousness and Politics], reprint at [http://sacred-texts.com/ SacredText.Com].</ref>

While this probably can't be entirely automated, it might be partially automatable. In the absence of automation, one way to do this is to copy the rendered text and then separately add the markup, including the markup for links.
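As an illustration of what partial automation might look like, here is a rough sketch using Python's standard html.parser module. It handles only bold, italics, internal links, and external links, strips the Wayback Machine prefix from link targets, and simply drops footnote markers, so references still have to be restored by hand. The class and function names are hypothetical. The sketch emits a piped link whenever the link title differs from the visible text, which errs on the side of preserving the original target.

```python
import re
from html.parser import HTMLParser

# Wayback Machine prefix on archived links, with or without the host part.
WAYBACK_PREFIX = re.compile(r"^(?:https?://web\.archive\.org)?/web/\d+[a-z_]*/")


def wikilink(title, label):
    """Build [[label]] when it matches the title (first letter aside), else a piped link."""
    def norm(s):
        return s[:1].lower() + s[1:]
    if not label or norm(title) == norm(label):
        return "[[%s]]" % (label or title)
    return "[[%s|%s]]" % (title, label)


class WikitextSketch(HTMLParser):
    """Convert a small subset of rendered MediaWiki HTML back to wikitext."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # pieces of wikitext produced so far
        self.link = None   # (kind, target) while inside an <a> element
        self.label = []    # text collected inside the current <a>
        self.skip = False  # True while inside <sup class="reference">

    def emit(self, text):
        if not self.skip:
            (self.label if self.link is not None else self.out).append(text)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "sup" and "reference" in classes:
            self.skip = True  # footnote markers are dropped, not converted
        elif tag in ("b", "strong"):
            self.emit("'''")
        elif tag in ("i", "em"):
            self.emit("''")
        elif tag == "a" and not self.skip:
            href = WAYBACK_PREFIX.sub("", attrs.get("href") or "")
            if "external" in classes:
                self.link = ("external", href)
            elif attrs.get("title"):
                self.link = ("internal", attrs["title"])
            else:
                self.link = ("plain", "")  # e.g. in-page anchors
            self.label = []

    def handle_endtag(self, tag):
        if tag == "sup" and self.skip:
            self.skip = False
        elif tag in ("b", "strong"):
            self.emit("'''")
        elif tag in ("i", "em"):
            self.emit("''")
        elif tag == "a" and self.link is not None:
            (kind, target), label = self.link, "".join(self.label)
            self.link = None
            if kind == "external":
                self.emit("[%s %s]" % (target, label))
            elif kind == "internal":
                self.emit(wikilink(target, label))
            else:
                self.emit(label)

    def handle_data(self, data):
        self.emit(data)


def html_to_wikitext(fragment):
    parser = WikitextSketch()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

Any output from a converter like this would still need to be proofread against the archived page before being treated as a reconstructed edit.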

Assemble the XML File to Import

This is the other hard part. In addition to correctly following the format for the export/import XML file, the special characters in the markup need to be escaped for XML, converting it from the likes of:

[[Image:Carol Moore at No Armageddon rally.jpg|thumb|Carol Moore speaks at the [[DC Anti-War Network]]'s "No Armageddon For Bush" rally on [[June 6]], [[2006]].]] Carol Moore (b. 1948) is an [[ethicist]] and [[systems theorist]] best known for her theories of [[secession]] and her analysis of [[Mahatma Gandhi]]'s methods as an "intuitive systems theorist".<ref>Moore, C: "The Nation State Break Up" NOMOS Magazine, 1985; Carol Moore, [http://www.context.org/ICLIB/IC12/Moore.htm From Empire To Ecstasy A vision for the coming transition], In Context Magazine, Winter 1985/86, Page 47; Carol Moore, [http://www.sacred-texts.com/bos/bos239.htm Consciousness and Politics], reprint at [http://sacred-texts.com/ SacredText.Com].</ref>

To:

[[Image:Carol Moore at No Armageddon rally.jpg|thumb|Carol Moore speaks at the [[DC Anti-War Network]]'s &quot;No Armageddon For Bush&quot; rally on [[June 6]], [[2006]].]] Carol Moore (b. 1948) is an [[ethicist]] and [[systems theorist]] best known for her theories of [[secession]] and her analysis of [[Mahatma Gandhi]]'s methods as an &quot;intuitive systems theorist&quot;.&lt;ref&gt;Moore, C: &quot;The Nation State Break Up&quot; NOMOS Magazine, 1985; Carol Moore, [http://www.context.org/ICLIB/IC12/Moore.htm From Empire To Ecstasy A vision for the coming transition], In Context Magazine, Winter 1985/86, Page 47; Carol Moore, [http://www.sacred-texts.com/bos/bos239.htm Consciousness and Politics], reprint at [http://sacred-texts.com/ SacredText.Com].&lt;/ref&gt;

This escaping step is probably completely automatable, but the utility or script has yet to be written. A Python subroutine that converts the likes of " to &quot; does exist and is included in many of LPedia's bots.
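A sketch of how such a script might look follows. It is not the subroutine used by LPedia's bots. It relies on the standard library's xml.sax.saxutils.escape for the escaping and writes a bare-bones page in the general shape of a MediaWiki export file. The function names are hypothetical, and the export-0.10 schema version is an assumption that should be checked against a real Special:Export file from the target wiki. Anonymous contributors, which MediaWiki records with an ip element rather than a username, are not handled.

```python
from xml.sax.saxutils import escape


def escape_wikitext(text):
    """Escape &, <, > and double quotes so wikitext can sit inside XML."""
    return escape(text, {'"': "&quot;"})


def revision_xml(timestamp, username, comment, wikitext=None):
    """Build one <revision>; a missing wikitext becomes a {{Lost Edit}} stub."""
    body = "{{Lost Edit}}" if wikitext is None else wikitext
    return (
        "    <revision>\n"
        "      <timestamp>%s</timestamp>\n"
        "      <contributor><username>%s</username></contributor>\n"
        "      <comment>%s</comment>\n"
        '      <text xml:space="preserve">%s</text>\n'
        "    </revision>\n"
    ) % (
        timestamp,
        escape_wikitext(username),
        escape_wikitext(comment),
        escape_wikitext(body),
    )


def page_xml(title, revisions):
    """Wrap (timestamp, username, comment, wikitext) tuples, oldest first, in a page."""
    parts = [
        # Assumed schema version; copy the real one from an export of the target wiki.
        '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" version="0.10" xml:lang="en">\n',
        "  <page>\n",
        "    <title>%s</title>\n" % escape_wikitext(title),
    ]
    parts.extend(revision_xml(*rev) for rev in revisions)
    parts.append("  </page>\n</mediawiki>\n")
    return "".join(parts)
```

Passing None as the wikitext of a revision produces the {{Lost Edit}} placeholder, so the author and comment of an unrecoverable edit can still appear in the imported history.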