2024-12-17: Are Micro-collections Still Present on Twitter in 2024?

Twitter allows its users to author original content and share links to other web pages. Archivists can mine tweets for these shared URIs and use them as seeds to create web archive collections. These collections of URIs in social media posts were identified as micro-collections (MCs) by Dr. Alexander Nwala in his 2019 study (Nwala et al.). The term micro-collection reflects the scope and size of these sets of URIs. Unlike the URIs collected by scraping search engines (e.g., Google), the collections of URIs in social media (e.g., Twitter) are curated by users around specific topics or events. These collections of external web resources reflect the editorial effort and domain expertise of the people using the platform, making them a vital source of seed URIs for web archive collections. However, with the recent changes to the platform (read more about the changes), including its rebranding to X, we think that this practice may be diminishing. In this blog post, I report quantitative data about MCs from our recent study, which tests our intuition about whether it is still worth scraping Twitter to look for seed URIs for web archive collections.

What is a micro-collection?

Users on social media platforms routinely create and share posts consisting of hand-selected URIs of news stories, tweets, videos, etc. Nwala et al. identified these shared URIs as micro-collections and considered them an important source for archival seeds because the effort taken to create micro-collections is an indication of editorial activity and a demonstration of domain expertise.

Figure 1: Example of a micro-collection from Twitter by a single author (@dtdchange) consisting of a pair of three tweets that are part of a reply thread about the Flint water crisis. (Figure 1 in Nwala et al.)

Figure 1 shows a tweet thread containing external links related to the Flint water crisis. Nwala et al. introduced post-class terminology to understand the MCs:

P₁A₁ - Single post by a single author (Example - a single tweet)
P_nA₁/P_nA_n - Multiple posts by a single author / multiple posts by multiple authors (Example - a tweet thread)

An MC on Twitter can appear in three ways:

1) A single person creates a single tweet including multiple external URIs.

2) A single person creates a tweet thread including multiple URIs.

3) A group of people create a thread with multiple URIs.

The first case is rarely found because of the character limitations for regular users. However, in April 2023, Twitter increased the character limit up to 25000 for premium users, allowing them to create MCs in a single tweet. The example below is a tweet created by a single user with a premium account, containing multiple external links.

These are my Jewish Warriors. 👇

- Jewish authors, books, & podcasts on Palestine -

In the past 6 months I've read 20 books on Israel-Palestine to improve my knowledge of the conflict and have also had the great privilege of talking to 6 of those authors about it on… pic.twitter.com/8ki7jwBkvf
— Omar Nizam (@OmarNizam) June 6, 2024

Data Collection

We replicated Nwala et al. (2019) to study the availability of MCs on Twitter in 2024 after major changes in the platform. We selected six recent topics as queries for our study:

Israel-Palestine conflict
US presidential election 2024
Donald Trump's conviction
Aurora Borealis 2024 (Northern Lights 2024)
Solar eclipse 2024
Paris 2024 Summer Olympics

We issued the queries to Twitter to extract the first 100 tweets (posts) from the Search Engine Result Pages (SERPs). We also collected up to 100 replies per tweet for the first 100 tweets. We scraped tweets from both the top and latest categories. The top category shows tweets that are considered the most popular for a given search term and the latest category shows tweets in reverse chronological order, with the most recent tweets appearing first. Figures 2 and 3 show examples of tweets from the top category and latest category, respectively.

Figure 2: Example of a tweet in the top category for the “Israel Palestine conflict” query. The tweet has 11000 reposts and 9100 likes as of 2024-11-26.

Figure 3: Examples of tweets (tweet 1, tweet 2) in the latest category for the “Israel-Palestine conflict” query. Observed date and time: 2024-11-26T02:38:00. The tweets were posted 34 minutes and 36 minutes prior to the observed time.

We scraped Twitter SERPs using the third-party scraper twscrape and, in the first step, extracted the tweet id, the raw tweet content, and the number of replies. Next, we categorized the collected tweets by the number of replies per tweet. A single tweet is a tweet with no replies (P₁A₁), and a tweet thread is a tweet with at least one reply (P_nA₁ or P_nA_n). Table 1 shows that the majority of the tweets in the latest category are in P₁A₁ and the majority of the tweets in the top category are in P_nA₁/P_nA_n.

Table 1: Tweet counts per query and post-class for the first 100 tweets in each SERP.

	Latest		Top
Query	P₁A₁	P_nA₁/P_nA_n	P₁A₁	P_nA₁/P_nA_n
Israel Palestine conflict	74	26	30	70
US presidential election 2024	67	33	16	84
Donald Trump's conviction	77	23	30	70
Aurora borealis 2024	72	28	48	52
Solar eclipse 2024	86	14	35	65
Paris 2024 Summer Olympics	64	36	31	69
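
The post-class split described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the record layout (`id`, `reply_count`), the function names, and the ASCII labels standing in for P₁A₁ and P_nA₁/P_nA_n are all assumptions, and the sketch treats any tweet with at least one reply as a thread.

```python
from collections import Counter


def post_class(reply_count: int) -> str:
    """Return a post-class label for a tweet, given its number of replies."""
    # Assumption: zero replies means a single tweet; anything else is a thread.
    return "P1A1" if reply_count == 0 else "PnA1/PnAn"


def count_post_classes(tweets):
    """Tally post-classes over an iterable of {'id', 'reply_count'} records."""
    return Counter(post_class(t["reply_count"]) for t in tweets)


# Hypothetical sample records, not data from the study.
sample = [
    {"id": 1, "reply_count": 0},
    {"id": 2, "reply_count": 12},
    {"id": 3, "reply_count": 0},
    {"id": 4, "reply_count": 3},
]
counts = count_post_classes(sample)
print(counts["P1A1"], counts["PnA1/PnAn"])
```

Applied to the first 100 tweets of one SERP, the two counts would correspond to one pair of cells in Table 1.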

To further investigate the number of tweets in a thread, we plotted the lengths of tweet threads in the top and latest categories. Figures 4 and 5 show that the top category contains more tweet threads with more than five replies than the latest category does.

Figure 4: Distribution of tweet threads in the latest category based on the number of replies for 6 queries. The x-axis shows the number of tweets in a thread, where n=1 is a single tweet and n>1 is a thread. The y-axis shows the number of threads.

Figure 5: Distribution of tweet threads in the top category based on the number of replies for 6 queries. The x-axis shows the number of tweets in a thread, where n=1 is a single tweet and n>1 is a thread. The y-axis shows the number of threads.

We extracted the Twitter short URLs (for example, https://t.co/pCgFKaRbUJ) and then dereferenced them to collect the full URLs. For instance, a curl request to https://t.co/pCgFKaRbUJ returns the full URI in the Location response header. Next, we filtered out the URIs that redirect to other tweets or to images and videos hosted at Twitter. Figures 6 and 7 show that, in the latest category, more external URIs appear in single tweets, whereas in the top category more external URIs appear in tweet threads with six or more replies.

Figure 6: Distribution of external URIs in the latest category across tweet threads of varying lengths for 6 queries. The x-axis shows the number of tweets in a thread, where n=1 is a single tweet and n>1 is a thread. The y-axis shows the number of external links.

Figure 7: Distribution of external URIs in the top category across tweet threads of varying lengths for 6 queries. The x-axis shows the number of tweets in a thread, where n=1 is a single tweet and n>1 is a thread. The y-axis shows the number of external links.
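
The dereferencing and filtering step described above could look roughly like the following Python sketch. It is not the study's code. The function names and the list of hosts treated as Twitter-internal are assumptions made for illustration, and `dereference` simply reads the Location header of a single HEAD response without following further redirects.

```python
import http.client
from urllib.parse import urlsplit

# Assumption: hosts considered internal to Twitter/X. The study does not list them.
TWITTER_HOSTS = {"twitter.com", "x.com", "t.co", "twimg.com"}


def dereference(short_url: str, timeout: float = 10.0):
    """Send one HEAD request and return the Location header (or None)."""
    parts = urlsplit(short_url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", target)
        return conn.getresponse().getheader("Location")
    finally:
        conn.close()


def is_external(uri: str) -> bool:
    """True if the URI points outside the assumed Twitter-internal hosts."""
    host = (urlsplit(uri).hostname or "").lower()
    if not host:
        return False
    # Match the host itself or any subdomain (e.g., pbs.twimg.com).
    return not any(host == h or host.endswith("." + h) for h in TWITTER_HOSTS)


# Hypothetical dereferenced URIs, used only to show the filter.
candidates = [
    "https://example.org/news/story",
    "https://twitter.com/someuser/status/1",
    "https://pbs.twimg.com/media/abc.jpg",
]
external = [u for u in candidates if is_external(u)]
print(external)
```

Calling `dereference("https://t.co/pCgFKaRbUJ")` would need network access, and the response may depend on the request headers, so the sketch only exercises the filter.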

Identifying micro-collections

We calculated the number of MCs by observing the collections of URIs in single tweets or tweet threads. Table 2 summarizes the number of MCs found in the first 100 tweets in a SERP.

Table 2: MC counts per query and per post-class observed in the first 100 tweets for each query and category.

	Latest		Top
Query	P₁A₁	P_nA₁/P_nA_n	P₁A₁	P_nA₁/P_nA_n
Israel Palestine conflict	0	2	1	2
US presidential election 2024	2	2	0	7
Donald Trump's conviction	1	2	0	18
Aurora borealis 2024	3	0	1	1
Solar eclipse 2024	0	1	0	11
Paris 2024 Summer Olympics	1	1	1	3

Calculating the precision of a micro-collection

Because some MCs could contain spam or other off-topic links, we calculated the precision of the resulting MCs. For instance, if we have an MC of 5 URIs for query 1 and none of the URIs in the collection contain any relevant information referring to query 1 or the selected event, that MC has precision=0.0 (0/5) and is of no use for generating collections of seed URIs. To calculate the precision, we manually inspected all the URIs in the discovered MCs and made a personal relevance judgment for each. Next, we divided the number of relevant URIs in the MC by the total number of URIs in the MC to obtain a numerical value for the precision.
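
As a small illustration of this calculation, the Python sketch below computes the precision of one MC from a list of manual relevance judgments. The function name and the boolean encoding of the judgments are assumptions for the example; the judgments themselves still come from a human assessor.

```python
def mc_precision(relevance_judgments):
    """Fraction of URIs in one MC that were judged relevant (True)."""
    if not relevance_judgments:
        raise ValueError("an MC must contain at least one URI")
    relevant = sum(1 for judged in relevance_judgments if judged)
    return relevant / len(relevance_judgments)


# The example from the text: 5 URIs, none relevant.
print(mc_precision([False] * 5))  # 0.0
# A hypothetical MC with 3 relevant URIs out of 5.
print(mc_precision([True, True, True, False, False]))  # 0.6
```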

Results

There are 7 MCs in total for the P₁A₁ post-class over all 6 SERPs for the latest category and 3 MCs for the top category. The P_nA₁/P_nA_n post-class has 8 MCs in the latest category and 42 in the top category over all 6 SERPs. More MCs were found in the P_nA₁/P_nA_n post-class than in the P₁A₁ post-class in both categories.
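
These totals can be checked by summing the columns of Table 2. The short Python sketch below does so; the dictionary layout and the column labels are just a convenient encoding of the table for this example.

```python
# MC counts from Table 2: (latest P1A1, latest PnA1/PnAn, top P1A1, top PnA1/PnAn)
mc_counts = {
    "Israel Palestine conflict": (0, 2, 1, 2),
    "US presidential election 2024": (2, 2, 0, 7),
    "Donald Trump's conviction": (1, 2, 0, 18),
    "Aurora borealis 2024": (3, 0, 1, 1),
    "Solar eclipse 2024": (0, 1, 0, 11),
    "Paris 2024 Summer Olympics": (1, 1, 1, 3),
}
columns = ("latest P1A1", "latest PnA1/PnAn", "top P1A1", "top PnA1/PnAn")

# Sum each column across the six queries.
totals = {
    name: sum(row[i] for row in mc_counts.values())
    for i, name in enumerate(columns)
}
print(totals)
# {'latest P1A1': 7, 'latest PnA1/PnAn': 8, 'top P1A1': 3, 'top PnA1/PnAn': 42}
```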

Turning to the precision of the MCs in the P₁A₁ post-class, all 3 MCs from the top category have precision values > 0.5, and 3 out of 7 MCs from the latest category have precision values > 0.5. In the P_nA₁/P_nA_n post-class, 4 out of 8 MCs exceed a precision value of 0.5 in the latest category, and 13 out of 42 MCs in the top category have precision values > 0.5. When considering the total MCs (in both top and latest) for the P₁A₁ post-class, only 4 (out of 10) MCs have precision values > 0.8, and for the P_nA₁/P_nA_n post-class, 9 (out of 50) MCs exceed a precision value of 0.8.

Discussion

The results from this study indicate a notable shift in the occurrence of MCs on the platform formerly known as Twitter in 2024, compared to 2019, when Nwala et al. studied MCs before major changes to the platform. They identified 4549 MCs from Twitter top and 5110 MCs from Twitter latest. For the MCs collected from Twitter, the conditional probabilities of relevance were reported as 0.6 given that an MC contains 1 URI, 0.61 given that an MC contains 2 URIs, 0.46 given that an MC contains 3-4 URIs, and 0.42 given that an MC contains 5 or more URIs. Our findings suggest that the frequency of MCs has declined in the case of individual posts (P₁A₁ post-class). Micro-collections remain, but they are now more commonly found in tweet threads (P_nA₁/P_nA_n post-class), where users or groups of users collectively share external web resources.

When comparing the number of MCs between the top and latest categories, the top category yielded a higher number of MCs in the P_nA₁/P_nA_n post-class, which includes longer conversations and multiple contributors. We can assume that MCs are more likely to be found in top category tweet threads because they contain more extended interactions.

The study also highlights that certain types of events, such as political events like the U.S. presidential election and Trump’s conviction, are more likely to generate a significant number of micro-collections compared to other events, such as the solar eclipse or the aurora borealis. This suggests that events with higher public engagement and discourse tend to foster more MC activity.

Precision is an indicator of the relevance of an MC to a selected query. The results suggest that although we could find MCs, they do not have the expected quality; that is, the tweeted URIs often do not describe the specific event. Nearly half of the MCs in the P₁A₁ post-class (4 of 10 exceed a precision of 0.8) and only about one fifth of the MCs in the P_nA₁/P_nA_n post-class (9 of 50) are useful. We suspect that increased activity of advertising bots in Twitter threads is one reason for the decline in the precision of MCs. Twitter’s rebranding to X and the changes in its algorithm, coupled with users’ shifting behavior, could be contributing factors to the reduced number of MCs. The platform's discouragement of posts with external links, focus on promoting commercialized content, and emphasis on monetized user engagement may be affecting the editorial activity that previously encouraged the organic curation of MCs. A recent development makes this worse: the platform now suppresses tweets with external links so that they receive less engagement. The significant departure of scientists, journalists, and other professionals who authored high-quality content can be another reason for the lack of MCs in 2024.

Ultimately, the study demonstrates that while micro-collections still exist in 2024, their utility as a source for seed URIs is now questionable. However, we can still expect to find some MCs for high-engagement topics. We recommend that anyone using Twitter to extract MCs implement a relevance test to ensure the extracted MCs align with the desired context and purpose. Additionally, it is important to explore methods for mining MCs from other social media platforms, including emerging alternatives to Twitter. Further research could focus on understanding how platform algorithms affect content curation practices and on identifying strategies to efficiently mine valuable editorial content on evolving social media platforms.

Acknowledgements

I would like to express my sincere gratitude to Dr. Michael L. Nelson and Dr. Sampath Jayarathne for their guidance, support, and encouragement throughout this work.

-Kumushini Thennakoon (@KumushiniT)
