After the success of the fully online C+J 2021, the organizers again used the ohyay.co platform for the online attendees and speakers. In general, it worked great again. My only complaint is that the speakers' slides were sometimes too small because of the space given to the speaker and panel video and other decorative elements (or maybe they were just too small to view on my laptop screen). Most of the sessions ran in parallel, so like last year, I had a little trouble deciding which sessions to attend; they all contained interesting work. I didn't take comprehensive notes, but I'll briefly link to the work from some of the sessions I was able to catch. There were also some tweets about the conference using the #cj2022 hashtag.

A full hybrid session in motion… #cj2022 pic.twitter.com/WhPE7kL6x9
— Bahareh Heravi (@Bahareh360) June 10, 2022
First up was Zhouhan Chen from NYU presenting Information Tracer, a nice interface for exploring information spread across multiple platforms. You can search by URL, hashtag, or string and see how the item has spread across Facebook, Reddit, Twitter, YouTube, and Gab, including the retweet and reply networks that have been detected. I found it interesting that the tool covers the relatively new Gab platform and that it includes information about how the queried item is shared in Facebook groups. A Python API library for Information Tracer is available on GitHub.

Great to see Information Tracer, Localizer, Overtone and SimPPL present their work at the Computation + Journalism 22 conference today! #cj2022 pic.twitter.com/7dZx2Yi9n0
— Matt MacVey (@Matt_MacVey) June 9, 2022
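I haven't dug into the interface of the Python library itself, so the sketch below only illustrates the kind of cross-platform query the tool supports; the class and method names are hypothetical stand-ins, not the actual information-tracer API.

```python
# Hypothetical sketch of querying a cross-platform spread tracker like
# Information Tracer. SpreadTracker, search, and summarize_spread are
# illustrative stand-ins, NOT the real information-tracer API.
from collections import Counter

class SpreadTracker:
    """Toy stand-in for a cross-platform information-spread client."""

    def search(self, query):
        # A real client would call the service here and return posts
        # mentioning the query (a URL, hashtag, or string) from Facebook,
        # Reddit, Twitter, YouTube, and Gab. Placeholder results below.
        return [
            {"platform": "twitter", "type": "retweet", "query": query},
            {"platform": "facebook", "type": "group_share", "query": query},
            {"platform": "gab", "type": "post", "query": query},
        ]

def summarize_spread(posts):
    """Count how often the queried item appeared on each platform."""
    return Counter(post["platform"] for post in posts)

tracker = SpreadTracker()
posts = tracker.search("https://example.com/some-article")
print(summarize_spread(posts))  # e.g. Counter({'twitter': 1, ...})
```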
Another presentation covered tools from the Observatory on Social Media (OSoMe) at Indiana University:

- Botometer - the highly popular tool that provides a score representing the likelihood that a given account is a bot (a usage sketch follows this list)
- BotAmp - a new tool that helps users see what type of information bots are trying to amplify by comparing likely bot activity in two sets of tweets. A user can input two different queries, and BotAmp will compare bot activity in the results (or in the user's home timeline)
- Hoaxy - a tool that visualizes the network of tweet spread for a given hashtag; node colors reflect each account's Botometer score
- Network Tool - similar to Hoaxy in that it allows exploration of how information spreads on Twitter, but with different data. A user can specify start and end dates and explore the network based on retweets/quotes, mentions/replies, or co-occurrence. The dataset comes from the OSoMe decahose archive of 30 billion tweets from the past 3 years
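As an example of the Botometer API, at the time of writing the botometer Python package (pip install botometer) wraps it roughly like this; the keys are placeholders, and exact parameters may have shifted across API versions, so treat this as an outline rather than the definitive usage.

```python
# Rough sketch of checking an account with the botometer Python package.
# Credentials below are placeholders; parameters may differ across
# Botometer API versions.
import botometer

rapidapi_key = "XXXXXXXX"           # placeholder RapidAPI key
twitter_app_auth = {
    "consumer_key": "XXXXXXXX",     # placeholder Twitter API credentials
    "consumer_secret": "XXXXXXXX",
    "access_token": "XXXXXXXX",
    "access_token_secret": "XXXXXXXX",
}

bom = botometer.Botometer(wait_on_ratelimit=True,
                          rapidapi_key=rapidapi_key,
                          **twitter_app_auth)

# Score a single account; the result includes bot-likelihood scores
# such as the Complete Automation Probability (CAP).
result = bom.check_account("@clayadavis")
print(result["cap"])
```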
The results of Bellingcat's work are available at https://ukraine.bellingcat.com/, providing maps and context for incidents of civilian harm in Ukraine.

Tools that @bellingcat uses for forensic archiving: https://t.co/qTfvEVO1yD, https://t.co/HdauXyiYJ9 #cj2022
— Nick Diakopoulos (@ndiakopoulos) June 10, 2022
Journalists from Texty.org.ua next presented "How data journalism is responding to the war: What can satellite images say? How do we detect disinformation? What other data can we use?" Roman Kulchynskyj, Peter Bodnar, and Illya Samoilovych discussed Russian disinformation and how their organization tracks it. They have an English-language Twitter account, @Textyorgua_Eng, that summarizes their data journalism.

always appreciate how thoughtful and careful @bellingcat is about their data production. they bridging judicial, investigate, & news cultures and concerns #cj2022
— Rahul B (@rahulbot) June 10, 2022
The next paper in the session was "Mining Questions for Answers: How COVID-19 Questions Differ Globally" by Jenna Sherman, Smriti Singh, Scott Hale, and Darius Kazemi from the Meedan Digital Health Lab and the Harvard T.H. Chan School of Public Health. They used a database of Bing search queries to investigate how people in different parts of the world were searching for information about COVID-19.

Link to paper: https://t.co/12FOTCYaOj
Link to dataset: https://t.co/cvQorJbktY
Interactive dataset website: https://t.co/lLPKEuXmjj
— Marianne Aubin Le Quéré (@marianneaubin) June 7, 2022
The final session I attended was an Invited Session on "COVID Reporting". The first presentation was "How The Economist estimated the pandemic's true death toll" by Sondre Ulvund Solstad and Martín González. They computed excess deaths, observed deaths minus the deaths expected from pre-pandemic trends, to highlight the impact of COVID-19 around the world. They used data from countries with reliable mortality reporting to help predict excess deaths for countries without it (like China). Their work produced excess death estimates that are 3-4x the officially reported toll. The code for the model they developed is available at https://github.com/TheEconomist/covid-19-the-economist-global-excess-deaths-model.

This was such a fun conversation. Thank you @Bahareh360 @JuliaAngwin @HilkeSchellmann! #CJ2022 https://t.co/cvhw2YulCG
— Meredith Broussard (@merbroussard) June 10, 2022
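The excess-deaths metric itself is simple arithmetic once you have a baseline; the hard statistical work in The Economist's model (linked above) is estimating the expected deaths. A minimal sketch, with made-up numbers purely for illustration:

```python
# Minimal sketch of the excess-deaths arithmetic. All numbers are made up
# for illustration; The Economist's actual model does the hard part of
# estimating expected deaths per country and week.
def excess_deaths(observed, expected):
    """Excess deaths = observed deaths - expected (baseline) deaths."""
    return observed - expected

observed = 54_000        # deaths recorded in some period (illustrative)
expected = 41_000        # baseline from pre-pandemic trends (illustrative)
official_covid = 4_500   # officially attributed COVID-19 deaths (illustrative)

excess = excess_deaths(observed, expected)
print(f"excess deaths: {excess}")
print(f"ratio to official toll: {excess / official_covid:.1f}x")
```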
Next was "Managing the challenges of data reporting and visualization in year three of the COVID-19 pandemic at The New York Times" presented by Lisa Waananen Jones, Aliza Aufrichtig, and Tiff Fehr. This was a fascinating behind-the-scenes look at how they collect data and produce the NY Times COVID-19 case charts that everyone I know has depended on. They showed some of the initial charts that were developed in early Feb/Mar/Apr 2020 and how they evolved into the tracker charts and other visualizations we have today. One thing they encountered, which I'd also heard about in an interview with someone from CDC, is the patchwork state of health statistics in the US. The NY Times folks gathered data from individual counties, sometimes manually because some counties reported data in non-standard formats, like infographics and PDFs.Official statistics paint a different picture of pandemic deaths than when you estimate via @Sondreus#cj2022pic.twitter.com/Lb2gaYNLAS
— Nick Diakopoulos (@ndiakopoulos) June 10, 2022
All of their data is available at https://github.com/nytimes/covid-19-data/, which is a great resource. They built hundreds of scrapers to grab the data from the different counties and even had to build monitoring to catch the scrapers when they inevitably broke (likely due to formatting changes on the websites). I hope some of these presentations will be made available later, because my data visualization students would greatly benefit from hearing about real-world challenges with gathering, cleaning, and analyzing real data.

One takeaway from #cj2022 day 1 - it's hard to over-state how important spreadsheets are to journalism projects being showcased here (specifically Google Sheets) https://t.co/edlaVBnoCh
— Rahul B (@rahulbot) June 10, 2022
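The presenters didn't walk through code, but the monitoring idea generalizes: record each scraper's last successful run and sanity-check its output, then alert when a source goes stale or its shape changes. A simplified, hypothetical sketch (not their actual system):

```python
# Hypothetical sketch of scraper monitoring in the spirit of what the
# NYT team described: catch scrapers that break silently, e.g. when a
# county changes its page format. Not their actual code.
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)  # alert if a source hasn't updated in a day

def check_scraper(name, rows, last_success):
    """Return a list of problems detected for one scraper's latest run."""
    problems = []
    if not rows:
        problems.append(f"{name}: scrape returned no rows (format change?)")
    elif not all({"date", "cases", "deaths"} <= row.keys() for row in rows):
        problems.append(f"{name}: rows are missing expected columns")
    if datetime.now(timezone.utc) - last_success > MAX_AGE:
        problems.append(f"{name}: no successful scrape in over 24 hours")
    return problems

# Example run with placeholder results from one county scraper
rows = [{"date": "2022-06-10", "cases": 12, "deaths": 0}]
last_success = datetime.now(timezone.utc) - timedelta(hours=30)
for problem in check_scraper("example-county", rows, last_success):
    print("ALERT:", problem)
```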
The final presentation I attended was "The Covid Financial Crisis" by Nick Thieme of The Atlanta Journal-Constitution (now with The Baltimore Banner). He used a machine learning approach to investigate bankruptcies in Georgia due to COVID-19 and noted the enormous amount of work required to turn the publicly available data into something that could be analyzed.

NYT built a dashboard of its COVID data scrapers to keep an eye on whether there were data collection issues: #cj2022 pic.twitter.com/IaEXlXYRsW
— Nick Diakopoulos (@ndiakopoulos) June 10, 2022
Even though I wasn't able to attend all of the sessions, everything I saw was super interesting and many of the works combined my own interests in web archiving, social media, visualization, disinformation, and current events. This is definitely a venue that I will continue to track.

The @pulitzercenter let me write about the statistical methods behind our investigation into the effects of COVID on the financial health of Georgians and Georgia. Our work is open source, so I hope this helps people reproduce our journalism elsewhere https://t.co/1OvJ1tfn2n
— Nick Thieme (@FurrierTranform) February 16, 2022