Just over a week ago, the Internet Archive upgraded their support for Memento in the Wayback Machine. The Wayback Machine has had native Memento support for about 2.5 years, but with a number of recently implemented changes the Wayback Machine and version 08 of the Memento Internet Draft are now synchronized. The changes will be mostly unseen by casual users, but developers should find that they make things even simpler. Perhaps even more importantly, these changes have been reflected in the open source version of the Wayback Machine, so the numerous sites that are running this software (for example, see the IIPC member list) should enjoy native Memento support upon their next upgrade.
The first and most significant change is that there is now just a single URI prefix for mementos (URI-M). Previously, the URI-M discovered through the Wayback Machine's UI was different from the URI-M discovered through the Memento interface (e.g., using the MementoFox add-on). For example, for the original resource thecribs.com (@ 2003-09-30) you used to have both:
Wayback UI: http://web.archive.org/web/20030930231814/http://www.thecribs.com/
Memento: http://api.wayback.archive.org/memento/20030930231814/http://www.thecribs.com/
(The second URI is not linked; the api.wayback.archive.org/* interface is now turned off and those URIs now produce 404s.)
The problem was that the web.archive.org URIs rewrote the URIs in the HTML to point back into the archive (i.e., "Archival Replay Mode"), but lacked the necessary Memento-Datetime and Link HTTP response headers. The api.wayback.archive.org URIs had the necessary HTTP response headers, but lacked the rewritten HTML for Archival Replay Mode. So while both types of URIs (web.archive.org and api.wayback.archive.org) worked in their respective environments, a Memento user could not share (via email, Twitter, etc.) an api.wayback.archive.org URI with a non-Memento user, and likewise a Memento user would not have the additional Memento functionality with a web.archive.org URI.
Long story short: a single URI does it all now:
http://web.archive.org/web/20030930231814/http://www.thecribs.com/
Never noticed the dual URI thing? That's fine, neither did most other people. I included the above details only to document how things used to work in case you run across an old-style api.wayback.archive.org URI. Otherwise, don't worry about it.
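For developers who want to confirm that a response carries the Memento headers described above, here is a minimal sketch in Python. The function name looks_like_memento and the sample header values are my own illustrations, not part of the Wayback Machine or the Internet Draft; the check only confirms that Memento-Datetime parses as an HTTP date and that a Link header is present.

```python
from email.utils import parsedate_to_datetime


def looks_like_memento(headers):
    """Return True if the headers include a parseable Memento-Datetime and a Link header.

    `headers` is a plain dict of response header names to values
    (a hypothetical input shape chosen for this sketch).
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    datetime_value = lowered.get("memento-datetime")
    if not datetime_value or "link" not in lowered:
        return False
    try:
        # Memento-Datetime uses the usual HTTP date format.
        parsedate_to_datetime(datetime_value)
    except (TypeError, ValueError):
        return False
    return True


if __name__ == "__main__":
    # Illustrative header values only; not a captured response.
    sample = {
        "Memento-Datetime": "Tue, 30 Sep 2003 23:18:14 GMT",
        "Link": '<http://www.thecribs.com/>; rel="original"',
    }
    print(looks_like_memento(sample))
```

A response from the old web.archive.org URIs would have failed this check because both headers were missing, which is the gap the merger closes.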
The URI merger also changes the base URIs for the TimeMaps and TimeGates:
TimeMap: http://web.archive.org/web/timemap/link/{URI-R}
TimeGate: http://web.archive.org/web/timegate/{URI-R}
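If you build these URIs in code, the templates above translate directly into a few helpers. This is just a sketch: the function names are mine, and it assumes the URI-R is appended verbatim, as the examples in this post suggest.

```python
# Common prefix shared by URI-Ms, TimeMaps, and TimeGates after the merger.
WAYBACK_PREFIX = "http://web.archive.org/web"


def memento_uri(timestamp, uri_r):
    """Build a URI-M from a 14-digit timestamp (YYYYMMDDhhmmss) and a URI-R."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{uri_r}"


def timemap_uri(uri_r):
    """Build the link-format TimeMap URI for a URI-R."""
    return f"{WAYBACK_PREFIX}/timemap/link/{uri_r}"


def timegate_uri(uri_r):
    """Build the TimeGate URI for a URI-R."""
    return f"{WAYBACK_PREFIX}/timegate/{uri_r}"


if __name__ == "__main__":
    original = "http://www.thecribs.com/"
    print(memento_uri("20030930231814", original))
    print(timemap_uri(original))
    print(timegate_uri(original))
```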
The second change that may impact people is that TimeMaps now support paging. The page size is large (currently 10,000), but popular sites like www.cnn.com have > 14,000 mementos. Instead of having explicit "page 1", "page 2", etc., a paged TimeMap now has a "self" link with "from" and "until" parameters that indicate the left-hand and right-hand temporal endpoints, respectively, of that page. It then links to the next TimeMap with a "from" parameter that indicates the left-hand temporal endpoint of the next page (the "until" value might not be known if the last page is still being "filled", so to speak).
Together, the multiple pages form a single logical TimeMap and the pages are only for convenience of transport. The server determines how many links go into a single page. Most TimeMaps have < 10,000 URI-Ms so you might not notice this change right away, but please be aware that your applications can no longer assume they're getting the entire TimeMap with a single HTTP GET.
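To make the "more than one GET" point concrete, here is a rough sketch of a client loop that stitches the pages back into one logical TimeMap. It rests on several assumptions of mine: that pages are in link format, that the next page is advertised with rel="timemap", and that you supply your own fetch(uri) function returning the page body as text. The example.org URIs in the demo are made up.

```python
import re

# One link-format entry: <target> followed by ;name=value parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^",;\s]+))*)')
_PARAM_RE = re.compile(r'([\w-]+)\s*=\s*(?:"([^"]*)"|([^",;\s]+))')


def parse_links(text):
    """Parse link-format text into a list of (target, params) tuples."""
    links = []
    for target, raw_params in _LINK_RE.findall(text):
        params = {}
        for name, quoted, bare in _PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted or bare
        links.append((target, params))
    return links


def iter_mementos(first_page_uri, fetch):
    """Yield (URI-M, datetime) pairs from every page of a paged TimeMap.

    `fetch` is a caller-supplied function (hypothetical) mapping a URI to text.
    """
    next_uri, seen = first_page_uri, set()
    while next_uri and next_uri not in seen:
        seen.add(next_uri)
        page = parse_links(fetch(next_uri))
        next_uri = None
        for target, params in page:
            rels = params.get("rel", "").split()
            if "memento" in rels:
                yield target, params.get("datetime")
            elif "timemap" in rels and target not in seen and next_uri is None:
                # Assumed convention: the first unseen rel="timemap" link is the next page.
                next_uri = target


if __name__ == "__main__":
    # Two made-up pages standing in for real HTTP responses.
    pages = {
        "http://example.org/timemap/1": (
            '<http://example.org/timemap/1>; rel="self"; '
            'from="Mon, 01 Jan 2001 00:00:00 GMT"; until="Tue, 01 Jan 2002 00:00:00 GMT",\n'
            '<http://example.org/timemap/2>; rel="timemap"; from="Tue, 01 Jan 2002 00:00:01 GMT",\n'
            '<http://example.org/m/1>; rel="first memento"; datetime="Mon, 01 Jan 2001 00:00:00 GMT"'
        ),
        "http://example.org/timemap/2": (
            '<http://example.org/timemap/2>; rel="self"; from="Tue, 01 Jan 2002 00:00:01 GMT",\n'
            '<http://example.org/m/2>; rel="last memento"; datetime="Wed, 01 Jan 2003 00:00:00 GMT"'
        ),
    }
    for uri_m, when in iter_mementos("http://example.org/timemap/1", pages.__getitem__):
        print(uri_m, when)
```

Passing fetch in as a parameter keeps the paging logic separate from whatever HTTP library you use; the seen set is only there to stop the loop if a page links back to one already visited.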
The third change defines a standard way for the archive to tell the client "this is not a memento, so do not attempt memento processing on it"*. This is new in section 4.5.8 of version 08 of the Internet Draft. The idea is that most of the resources embedded in, for example, http://web.archive.org/web/20030930231814/http://www.thecribs.com/ are mementos captured at some point in the past. However, some of the images, JavaScript, etc. are injected by the archive to assist in playback; they are not actual mementos, and thus the client should not attempt negotiation on those resources. Rather than having clients maintain regular expressions for what is and what is not a memento at various archives, the server can now just send back this HTTP response header:
Link: <http://mementoweb.org/terms/donotnegotiate>; rel="type"
One such resource is http://web.archive.org/static/js/jwplayer/jwplayer.js, a JavaScript file injected into the archived HTML to assist in archival playback.
If you study the HTTP responses for both http://web.archive.org/web/20030930231814/http://www.thecribs.com/ and http://web.archive.org/static/js/jwplayer/jwplayer.js, you will see that the former has "X-Archive-Playback: 1" and the latter has "X-Archive-Playback: 0". In summary, section 4.5.8 of the Internet Draft just standardizes the existing "X-Archive-Playback: 0" header as a Link header that is applicable to all kinds of Memento archives (and not just Wayback Machines).
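On the client side, honoring the "do not negotiate" signal could look something like the sketch below. The function name should_negotiate is mine, and the parsing is deliberately simple: it takes the raw value of a Link response header and looks for the donotnegotiate URI with rel="type".

```python
import re

DO_NOT_NEGOTIATE = "http://mementoweb.org/terms/donotnegotiate"

# One Link header entry: <target> followed by ;name=value parameters.
_LINK_VALUE_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^",;\s]+))*)')
_REL_RE = re.compile(r'\brel\s*=\s*(?:"([^"]*)"|([^",;\s]+))', re.IGNORECASE)


def should_negotiate(link_header):
    """Return False if the Link header marks the resource as not subject to negotiation."""
    if not link_header:
        # No Link header at all: nothing tells us to skip negotiation.
        return True
    for target, params in _LINK_VALUE_RE.findall(link_header):
        match = _REL_RE.search(params)
        if not match:
            continue
        rels = (match.group(1) or match.group(2) or "").lower().split()
        if target == DO_NOT_NEGOTIATE and "type" in rels:
            return False
    return True


if __name__ == "__main__":
    print(should_negotiate('<http://mementoweb.org/terms/donotnegotiate>; rel="type"'))
    print(should_negotiate('<http://www.thecribs.com/>; rel="original"'))
```

A client would call this for each embedded resource before attempting datetime negotiation, instead of keeping a per-archive list of regular expressions.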
We hope you will give the new Wayback Memento interfaces a test drive and let us know if you see any errors or have additional comments. The new interfaces were integrated in the LANL and ODU aggregators last week, so if you are using those you should have seen a switch already. We'd like to thank Ilya Kremer (IA) and Lyudmila Balakireva (LANL) for all of their feedback and efforts during this implementation and Kris Carpenter (IA) for her continued support of Memento.
--Michael
* or, if you prefer: "All these URIs are mementos except this one. Attempt no negotiation there. Use them together. Use them in peace."