WS-DL members have addressed other parts of the problem. Alexander Nwala’s research has centered on finding seeds within search engine result pages (SERPs), social media stories, and news feeds. As part of his news research, Nwala developed StoryGraph, a tool that analyzes multiple news sources every hour and automatically determines the news story or stories that dominate the media landscape at that time. Mohamed Aturban developed ArchiveNow, a tool that accepts live web URI-Rs and submits them to web archives to produce memento URI-Ms. I partnered with Alexander Nwala to discuss how to tie StoryGraph together with tools from the Dark and Stormy Archives Toolkit to produce stories summarizing the biggest StoryGraph story of a given day. To honor the WS-DL tools used to generate these stories, I named this the StoryGraph Hypercane ArchiveNow Raintale Integration (SHARI) process.
To experience these SHARI stories, we invite you to visit the DSA Puddles web site. That site contains the stories produced from this and other experiments of the Dark and Stormy Archives Project. For updates on the DSA Puddles site and other DSA projects, follow @StormyArchives on Twitter and @DarkAndStormyArchives on Facebook.
Figure: The DSA Puddles web site demonstrates stories produced by the Dark and Stormy Archives Project. In this screenshot of the current Page 2, the top row consists of StoryGraph's biggest story of the day. The bottom two rows contain links leading to example stories generated for our CIKM study. In the future, we will publish other types of stories to this site.
The SHARI process
Figure: The SHARI process for producing DSA web archive stories from StoryGraph.
StoryGraph-Hypercane-ArchiveNow-Raintale Integration (SHARI) is a storytelling process for automatically creating stories summarizing news for a day. Two of its components are not yet released: the StoryGraph Toolkit and Hypercane. SHARI consists of the following steps, shown in the diagram above:
1. with the StoryGraph Toolkit, query the StoryGraph service for the rank r story of the day, which yields that story's URI-Rs
2. submit these URI-Rs to Hypercane's `hc identify mementos` command to convert URI-Rs into URI-Ms, by
   - first querying the LANL Memento Aggregator
   - for each URI-R that has no existing URI-M, creating one by calling Aturban's ArchiveNow as a library
3. run Hypercane's `hc report entities` command
4. run Hypercane's `hc report terms` command, which itself calls Nwala's sumgram as a library
5. run Hypercane's `hc report image-data` command
6. run Hypercane's `hc order pubdate-else-memento-datetime` command
7. run Hypercane's `hc synthesize raintale-story` command
8. run Raintale's `tellstory` command to generate a Jekyll HTML file for the day's rank r story based on inputs from:
   - the JSON file produced by Hypercane in step #7
   - the SHARI story template in Jekyll format
   - information on each URI-M from MementoEmbed
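The memento identification step above (look up an existing memento first, create one only on a miss) can be sketched as follows. This is a minimal illustration and not Hypercane's actual implementation: `find_memento` and `create_memento` are hypothetical callables standing in for the aggregator query and the ArchiveNow call, and the function name is also an assumption.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def identify_mementos(
    urirs: Iterable[str],
    find_memento: Callable[[str], Optional[str]],
    create_memento: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Map each URI-R to a URI-M, archiving only when no memento exists.

    Both callables are hypothetical stand-ins: find_memento for an
    aggregator lookup, create_memento for an on-demand archiving call.
    Each returns a URI-M string, or None when it cannot supply one.
    """
    urims: Dict[str, str] = {}
    skipped: List[str] = []
    for urir in urirs:
        urim = find_memento(urir)        # prefer an existing capture
        if urim is None:
            urim = create_memento(urir)  # fall back to creating one
        if urim is None:
            skipped.append(urir)         # neither path produced a memento
        else:
            urims[urir] = urim
    return urims, skipped
```

Passing the two operations in as callables keeps the sketch testable without network access and mirrors the idea that the caller does not need to know which archive answered.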
This process works because each component tries to be loosely coupled, have high cohesion, have explicit interfaces, and engage in information hiding. StoryGraph does not need to know about GitHub Pages to make this work. Each command passes data in the expected format to the next. For example, the StoryGraph Toolkit provides URI-Rs to Hypercane. Hypercane does not need to know how StoryGraph generated them. Raintale receives story data in a JSON-formatted file; it does not need to know that Hypercane produced it. MementoEmbed only works with single mementos, whereas Raintale can consider how to assemble the whole story. The diagram below indicates what each tool contributes to the story.
Figure: A diagram displaying how each of these tools contributes to SHARI stories.
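The loose coupling described above, where one tool writes story data as JSON and the next reads it without knowing who produced it, might look like the sketch below. The function names and the JSON keys (`title`, `terms`, `elements`, `type`, `value`) are illustrative assumptions, not the documented Hypercane or Raintale formats.

```python
import json
from typing import List


def synthesize_story(title: str, ordered_urims: List[str], terms: List[str]) -> str:
    """Producer side: serialize story data to JSON text (illustrative keys)."""
    story = {
        "title": title,
        "terms": terms,
        "elements": [{"type": "link", "value": urim} for urim in ordered_urims],
    }
    return json.dumps(story, indent=2)


def render_story(story_json: str) -> str:
    """Consumer side: relies only on the JSON structure, not on its producer."""
    story = json.loads(story_json)
    lines = [story["title"]]
    for element in story["elements"]:
        if element["type"] == "link":
            lines.append("- " + element["value"])
    return "\n".join(lines)
```

Because the only contract between the two functions is the JSON text, either side could be replaced by another tool that honors the same structure.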
Discussion
StoryGraph is a valuable resource that I believe has additional unrealized potential. While developing the SHARI process, I experimented with interesting dates from Nwala's "365 dots in 2018" and "365 dots in 2019." We are not only able to create stories for today or yesterday, but all the way back to August 8, 2017, when StoryGraph was first created. As seen below, we can see how the world has evolved each year on StoryGraph's birthday.
Figure: On StoryGraph's date of birth, the news was reporting on North Korea's nuclear weapons.
Figure: On StoryGraph's first birthday, the news was discussing the results of several US Congressional and gubernatorial primaries and other elections taking place on that date.
Figure: On StoryGraph's second birthday, the news was discussing the shootings in El Paso and Dayton, and their aftermath.
Figure: On my birthday in 2018, the biggest story of the day was about US President Trump's animosity toward the FBI's investigation of him.
Figure: On my birthday in 2019, the biggest story of the day was about presumptive Democratic presidential primary candidate Joe Biden choosing a running mate long before the Democratic primaries.
Figure: This year, the news on my birthday was about COVID-19 and Trump's response to the crisis.
SHARI offers a familiar yet novel way of viewing the news for a day in the past. It is different from other storytelling services like Wakelet because SHARI is entirely automated. The stories produced by SHARI are different from services like Google News or Flipboard because a user did not customize the story topics. Because StoryGraph samples content from multiple sides of the political spectrum, the SHARI process can provide a visualization of articles not tied to one interest area or even a single side's terminology. Historians, journalists, and other researchers could use this method to get a quick glimpse of the biggest story on a given day.
SHARI is not without its issues. While it is clear how to use StoryGraph to produce the biggest news story of the day, we're still discussing how to produce and render the second biggest, third biggest, and other news stories for a given day. The SHARI process skips some resources and tries to complete its story despite this. For a variety of reasons, ArchiveNow cannot create mementos from some live web pages. Sometimes an archive is still preserving a memento, which therefore lacks the proper headers to be evaluated later in the process. Sometimes MementoEmbed unearths images that were never preserved, and thus SHARI cannot evaluate them for the story. We are still working on improvements, such as better stopword choices for sumgrams, faster story pages, and lazy loading of images in the final story. Our eventual goal is to have the SHARI process produce these StoryGraph stories weekly or possibly even daily.
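One way to skip mementos that lack the proper headers while still completing the story is sketched below. The post does not say which headers SHARI checks; this sketch assumes the `Memento-Datetime` response header defined by the Memento standard (RFC 7089), and the function names are hypothetical.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Dict, List, Mapping, Optional, Tuple


def memento_datetime(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the parsed Memento-Datetime header, or None if absent or invalid."""
    for name, value in headers.items():
        if name.lower() == "memento-datetime":  # header names are case-insensitive
            try:
                return parsedate_to_datetime(value)
            except (TypeError, ValueError):     # unparseable date text
                return None
    return None


def usable_mementos(
    responses: Mapping[str, Mapping[str, str]],
) -> Tuple[Dict[str, datetime], List[str]]:
    """Split URI-Ms into those with a usable datetime and those to skip."""
    usable: Dict[str, datetime] = {}
    skipped: List[str] = []
    for urim, headers in responses.items():
        when = memento_datetime(headers)
        if when is None:
            skipped.append(urim)  # record the skip, then keep going
        else:
            usable[urim] = when
    return usable, skipped
```

Returning the skipped URI-Ms alongside the usable ones lets a caller log what was left out instead of failing the whole story.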
The DSA Puddles site exists to showcase the stories produced by the Dark and Stormy Archives Project. The input for these stories could be Archive-It collections, or it might come from other sources, like StoryGraph. StoryGraph stories all use the same Jekyll and Raintale templates. Stories for other data sources may need different templates to help users better understand their content. Below are examples of other types of stories that exist on the site.
Figure: Here is an example story told with browser thumbnails, an example from our CIKM 2019 paper.
Figure: This is a reproduction of one of Yasmin AlNoamany's original human-generated stories from Storify, re-created by submitting data from her original experiment to Hypercane and Raintale.
Figure: Embeds of the Tweets produced by Raintale as featured in the blog post where we introduced Raintale.
The SHARI process is possible due to the attempts by these tools to engage in loose coupling, high cohesion, explicit interfaces, and information hiding. Parts of the process would not be possible without the Memento standard. Most of these tools are available now, and the Dark and Stormy Archives project will release Hypercane later this year. SHARI is one of many different tool combinations possible with the output of the WS-DL group and the Memento standard. How can we improve the stories produced by SHARI? What other combinations can we build?
For more information on these components, please consult:
- For StoryGraph:
  - Nwala's recent blog post: "StoryGraph at Computation + Journalism Symposium 2020 Non-Trip Report"
  - Nwala's blog post: "365 dots in 2018 - top news stories of 2018"
  - Nwala's blog post: "365 dots in 2019 - top news stories of 2019"
  - Nwala's technical report: "365 Dots in 2019: Quantifying Attention of News Stories"
  - follow @storygraphbot on Twitter
- For ArchiveNow:
  - Aturban's blog post: "Archive Now (archivenow): A Python Library to Integrate On-Demand Archives"
  - the ArchiveNow GitHub repository
  - Aturban's JCDL 2018 publication: "ArchiveNow: Simplified, Extensible, Multi-Archive Preservation"
- For sumgram:
  - Nwala's Hypertext 2018 publication: "Bootstrapping Web Archive Collections from Social Media"
  - Nwala's blog post: "Introducing sumgram, a tool for generating the most frequent cojoined ngrams"
  - the sumgram GitHub repository
- For Raintale:
  - my blog post: "Raintale -- A Storytelling Tool For Web Archives"
  - the Raintale web site
  - the Raintale documentation
  - the Raintale GitHub repository
- For MementoEmbed:
  - my blog post: "A Preview of MementoEmbed: Embeddable Surrogates for Archived Web Pages"
  - the MementoEmbed documentation
  - the MementoEmbed GitHub repository
  - our CIKM 2019 publication: "Social Cards Probably Provide For Better Understanding Of Web Archive Collections"
- For other work from the Dark and Stormy Archives Project:
  - the DSA web site
  - follow @StormyArchives on Twitter
  - follow @DarkAndStormyArchives on Facebook
-- Shawn M. Jones