<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.ywamit.com" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>YWAM Information Technology - The Wayback Machine - Comments</title>
 <link>http://www.ywamit.com/node/172</link>
 <description>Comments for &quot;The Wayback Machine&quot;</description>
 <language>en</language>
<item>
 <title>Robots work</title>
 <link>http://www.ywamit.com/node/172#comment-376</link>
 <description>&lt;p&gt;I&#039;ve just added the robots.txt file to my personal site and to another ministry site I look after. When I went to check on the Wayback Machine website, the site was gone from there. Thanks!&lt;/p&gt;
</description>
 <pubDate>Wed, 30 Aug 2006 08:53:48 +0100</pubDate>
 <dc:creator>alex.costa</dc:creator>
 <guid isPermaLink="false">comment 376 at http://www.ywamit.com</guid>
</item>
<item>
 <title>Exclude with robots.txt</title>
 <link>http://www.ywamit.com/node/172#comment-375</link>
 <description>&lt;p&gt;I looked into the Wayback Machine again and realized, as Mike has pasted above, that you can easily exclude your site with a simple robots.txt file:&lt;/p&gt;
&lt;p&gt;# Keep the site from being indexed by the Wayback Machine, web.archive.org&lt;br /&gt;
User-agent: ia_archiver&lt;br /&gt;
Disallow: /&lt;/p&gt;
&lt;p&gt;A search for our site now only brings up a screen saying that the webmaster has excluded it from the archive. Not even the older archived versions are available anymore.&lt;br /&gt;
Absolutely worth the effort if you want to hide content in the future.&lt;/p&gt;
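&lt;p&gt;One way to sanity-check rules like these before uploading them is the urllib.robotparser module from the Python standard library. This is only a sketch under assumptions: www.example.org and SomeOtherBot are made-up placeholders, and the check shows how a standards-following parser reads the rules, not how the archive&#039;s own crawler is implemented.&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the comment above, as they would appear in robots.txt.
ROBOTS_TXT = """\
# Keep the site from being indexed by the Wayback Machine, web.archive.org
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL and crawler name, used for illustration only.
PAGE = "http://www.example.org/index.html"
archiver_allowed = is_allowed(ROBOTS_TXT, "ia_archiver", PAGE)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", PAGE)

print(archiver_allowed, other_allowed)  # prints: False True
```

&lt;p&gt;If the rules are written as intended, ia_archiver is refused everywhere on the site while crawlers not named in the file are unaffected.&lt;/p&gt;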
</description>
 <pubDate>Tue, 29 Aug 2006 13:23:24 +0100</pubDate>
 <dc:creator>wileur</dc:creator>
 <guid isPermaLink="false">comment 375 at http://www.ywamit.com</guid>
</item>
<item>
 <title>The Wayback Machine</title>
 <link>http://www.ywamit.com/node/172</link>
 <description>
&lt;p&gt;Browse through 55 billion web pages archived from 1996 to a few months ago.&lt;/p&gt;&lt;a title=&quot;http://www.archive.org/web/web.php&quot; target=&quot;_blank&quot; href=&quot;http://www.archive.org/web/web.php&quot;&gt;http://www.archive.org/web/web.php&lt;/a&gt;&amp;nbsp;&lt;p&gt;This is even copied (mirrored) to the Great Library of Alexandria in &lt;strong&gt;Egypt &lt;/strong&gt;at &lt;a href=&quot;http://archive.bibalex.org/&quot;&gt;http://archive.bibalex.org&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;This has implications for all of us. Long-since &#039;dead&#039; sites can be resurrected and unwanted references restored.&lt;/p&gt;&lt;p&gt;Technically it is rather cool, and sometimes two versions of a page are captured on one day. Check the site to make sure you have an appropriate &#039;robots.txt&#039; file if you do not want to be publicly archived there. (It makes you think, though, about other agencies that may be collecting data...)&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;nbsp;Here are a couple of &#039;tidbits&#039; from the FAQ.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;font size=&quot;2&quot; face=&quot;Arial, Helvetica, sans-serif&quot;&gt;&lt;strong&gt;How large is the Wayback Machine?&lt;/strong&gt;&lt;/font&gt;
                              &lt;/p&gt;
&lt;p&gt;&lt;font size=&quot;2&quot; face=&quot;Arial, Helvetica, sans-serif&quot;&gt;The
Internet Archive Wayback Machine contains almost 2 petabytes of data
and is currently growing at a rate of 20 terabytes per month. This
eclipses the amount of text contained in the world&#039;s largest libraries,
including the Library of Congress.&lt;/font&gt;&lt;/p&gt;

                                                            
&lt;p&gt;&lt;font size=&quot;2&quot; face=&quot;Arial, Helvetica, sans-serif&quot;&gt;&lt;strong&gt;&lt;a name=&quot;10&quot;&gt;&lt;/a&gt;What type of machinery is used in this Internet Archive?&lt;/strong&gt;&lt;/font&gt;&lt;/p&gt;

                              
&lt;p&gt;&lt;font size=&quot;2&quot; face=&quot;Arial, Helvetica, sans-serif&quot;&gt;Much
of the Internet Archive is stored on hundreds of slightly modified x86
servers. The computers run on the Linux operating system. Each computer
has 512 MB of memory and can hold just over 1 terabyte of data on ATA
disks. However, we are developing a new way of storing our data on a
smaller machine. Each machine will store 1 terabyte. For more
information go to &lt;a href=&quot;http://www.petabox.org/&quot;&gt;www.petabox.org&lt;/a&gt;. 
&lt;/font&gt;&lt;/p&gt;

                                                            
&lt;p&gt;&lt;font size=&quot;2&quot; face=&quot;Arial, Helvetica, sans-serif&quot;&gt;&lt;strong&gt;&lt;a name=&quot;11&quot;&gt;&lt;/a&gt;How do you archive dynamic pages?&lt;/strong&gt;&lt;/font&gt;&lt;/p&gt;

                              
&lt;p&gt;&lt;font size=&quot;2&quot; face=&quot;Arial, Helvetica, sans-serif&quot;&gt;There
are many different kinds of dynamic pages, some of which are easily
stored in an archive and some of which fall apart completely. When a
dynamic page renders standard html, the archive works beautifully. When
a dynamic page contains forms, JavaScript, or other elements that
require interaction with the originating host, the archive will not
contain the original site&#039;s functionality.&lt;/font&gt;&lt;/p&gt;

                                                            
&lt;p&gt;&lt;font size=&quot;2&quot; face=&quot;Arial, Helvetica, sans-serif&quot;&gt;&lt;strong&gt;&lt;a name=&quot;12&quot;&gt;&lt;/a&gt;Why are some sites harder to archive than others?&lt;/strong&gt;&lt;/font&gt;&lt;/p&gt;

                              
&lt;p&gt;&lt;font size=&quot;2&quot; face=&quot;Arial, Helvetica, sans-serif&quot;&gt;If
you look at our collection of archived sites, you will find some broken
pages, missing graphics, and some sites that aren&#039;t archived at all.
Here are some things that make it difficult to archive a web site: &lt;/font&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;font size=&quot;2&quot; face=&quot;Arial, Helvetica, sans-serif&quot;&gt;Robots.txt -- We respect robot exclusion headers.&lt;/font&gt;&lt;/li&gt;
&lt;li&gt;&lt;font size=&quot;2&quot; face=&quot;Arial, Helvetica, sans-serif&quot;&gt;Javascript
-- Javascript elements are often hard to archive, but especially if
they generate links without having the full name in the page. Plus, if
javascript needs to contact the originating server in order to work, it
will fail when archived.&lt;/font&gt;&lt;/li&gt;
&lt;li&gt;&lt;font size=&quot;2&quot; face=&quot;Arial, Helvetica, sans-serif&quot;&gt;Server
side image maps -- Like any functionality on the web, if it needs to
contact the originating server in order to work, it will fail when
archived.&lt;/font&gt;&lt;/li&gt;
&lt;li&gt;&lt;font size=&quot;2&quot; face=&quot;Arial, Helvetica, sans-serif&quot;&gt;Unknown
sites -- The archive contains crawls of the Web completed by Alexa
Internet. If Alexa doesn&#039;t know about your site, it won&#039;t be archived.
Use the Alexa Toolbar (available at &lt;a href=&quot;http://www.alexa.com/&quot;&gt;www.alexa.com&lt;/a&gt;), and it will know about your page. Or you can visit Alexa&#039;s Archive Your Site page at &lt;a href=&quot;http://pages.alexa.com/help/webmasters/index.html#crawl_site&quot;&gt;http://pages.alexa.com/help/webmasters/index.html#crawl_site&lt;/a&gt;.&lt;/font&gt;&lt;/li&gt;
&lt;li&gt;&lt;font size=&quot;2&quot; face=&quot;Arial, Helvetica, sans-serif&quot;&gt;Orphan
pages -- If there are no links to your pages, the robot won&#039;t find it
(the robots don&#039;t enter queries in search boxes.)&lt;/font&gt;&lt;/li&gt;&lt;/ul&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description>
 <comments>http://www.ywamit.com/node/172#comments</comments>
 <category domain="http://www.ywamit.com/taxonomy/term/62">Risks</category>
 <pubDate>Mon, 29 May 2006 13:51:07 +0100</pubDate>
 <dc:creator>Mike</dc:creator>
 <guid isPermaLink="false">172 at http://www.ywamit.com</guid>
</item>
</channel>
</rss>
