Channel: Web Science and Digital Libraries Research Group

2024-12-05: The 28th International Conference on Theory and Practice of Digital Libraries - Ljubljana, Slovenia Trip Report

First Day at TPDL 2024 Conference!


Attending TPDL 2024 in Ljubljana, Slovenia, was an unforgettable experience. The conference brought together a vibrant community of researchers and practitioners exploring the frontiers of digital libraries, cultural heritage, and accessibility. From keynote presentations to paper sessions and networking events, the week was a blend of inspiration, collaboration, and learning.

One of the highlights of the conference was presenting our paper, "Assessing the Accessibility and Usability of Web Archives for Blind Users," for which I am the lead author. This study delves into the unique challenges faced by blind users when navigating web archives and proposes tailored solutions to improve their accessibility and usability. By leveraging a combination of user studies and design recommendations, our research offers practical insights for creating more inclusive digital libraries. The thoughtful feedback and enthusiasm from the audience during my presentation underscored the growing recognition of accessibility as a critical priority in the field. It was incredibly rewarding to see the interest our work sparked among researchers and practitioners alike.


Some interesting takeaways:

1) Wenjun Sun presented "LIT: Label-Informed Transformers on Token-Based Classification" during the poster session, introducing an innovative architecture that enhances transformer-based models for token classification tasks like historical named entity recognition (NER) and automatic term extraction (ATE). The proposed LIT framework integrates label semantics into the encoder-decoder mechanism, enabling a more comprehensive utilization of semantic information. The results showed significant improvements, with up to a 9.5% increase in F1 scores for historical NER and an 11.2% rise for ATE tasks, excluding named entities. This work highlights the potential of label-informed approaches in advancing token-based classification tasks.


2) Rand Alchokr presented "Scholarly Quality Measurements: A Systematic Literature Review," which examined the evolving challenges of assessing the quality and impact of scientific papers in academic publishing. The research systematically reviewed 43 papers and identified 14 quality-assessment methods, analyzing their strengths, weaknesses, and usage in different contexts. By providing a comprehensive overview, the study highlights the advantages and limitations of current approaches while proposing potential enhancements. This work offers valuable insights for researchers navigating quality-assessment methods, supporting informed evaluations of reliability and suitability within various scientific disciplines.


3) Jiro Kikkawa presented "Enhancing Identification of Scholarly References on YouTube," which explored the growing presence of scholarly communication through YouTube videos and the limitations of existing datasets like Altmetric. The study proposed a novel method to identify scholarly references by analyzing domain names and constructing a comprehensive dataset. Applying this approach, the research identified approximately 480,000 references across 230,000 videos from 55,000 channels—resulting in a 150% increase in references when combined with the Altmetric dataset. The analysis revealed PubMed and DOI links as prominent, alongside numerous direct links to publisher platforms, highlighting the scattered nature of references across platforms. This work underscores the method's utility in identifying and analyzing scholarly references on YouTube while raising concerns about the long-term accessibility and reliability of such external links.

4) Christof Bless presented "A Reputation System for Scientific Contributions Based on a Token Economy," which addressed the limitations of traditional bibliographic metrics in evaluating scientific reputation. The study highlighted how reliance on publication counts and impact factors can lead to misaligned incentives and the proliferation of low-quality research. As a solution, the paper proposed a reputation token system, leveraging a token economy to incentivize high-quality contributions and discourage fraudulent practices. The research included a prototypical implementation of this system as a smart contract on the Ethereum blockchain, complete with a user interface and network visualization. This innovative approach reimagines how scientific contributions can be evaluated, aiming to promote integrity and quality in academic research.

5) Tim Wittenborg presented "SWARM-SLR - Streamlined Workflow Automation for Machine-actionable Systematic Literature Reviews," which tackled the inefficiencies and manual effort required in authoring survey and review articles. The study introduced SWARM-SLR, a comprehensive workflow designed to leverage the strengths of various tools and approaches to enhance efficiency while maintaining scientific integrity. Synthesizing 65 requirements from existing literature, the team evaluated current tools against these benchmarks and developed a prototype workflow that supports all stages of a systematic literature review (SLR), excelling in search, retrieval, information extraction, and knowledge synthesis. Evaluations through online surveys validated the workflow’s efficacy, highlighting its potential to enable sustainable collaboration and integrate individual tools into a semi-automated, structured review process. This innovative approach offers a path toward more manageable and efficient literature review workflows, streamlining knowledge discovery and distribution for researchers.

6) Saber Zerhoudi presented "Comparative Analysis: User Interactions in Public and Private Digital Libraries Datasets" during the poster session, addressing the critical challenges of limited public datasets with detailed user interaction data in digital libraries. By analyzing two datasets, EconBiz (a private dataset) and SUSS (a publicly available dataset), the study highlighted significant differences in the granularity of user logs. While both datasets share common patterns, EconBiz offers a richer depiction of diverse user interactions compared to SUSS. The research underscores the importance of enhancing public datasets to improve their utility for research and applications. Additionally, the study explored the potential of leveraging few-shot prompting with large language models (LLMs) to simulate richer user interaction data while preserving anonymity, extending the value of datasets like SUSS. These findings are pivotal in addressing the limitations of public datasets in reflecting real-world user behavior and advancing the development of more responsive and personalized digital library services.

7) Juliane Tiemann presented "Technical Services in Research Libraries – The Backdoor Discussions of Collection Development" during the poster session, offering a detailed look into the complexities of developing digital libraries for cultural heritage collections. Focusing on the University of Bergen Library's Special Collections, which include diverse materials and dissemination methods, the paper outlined the challenges of creating a unified digital solution. Key considerations included structuring metadata, addressing varied user needs, and navigating the constraints of existing technological frameworks. Drawing on work meetings and collaborative discussions, the study highlighted efforts to balance ambitious goals with practical limitations, align strategies, and make informed technological choices. By shedding light on the behind-the-scenes tasks and decision-making processes, the paper emphasized the critical role of developer expertise and the intricate planning required to deliver cohesive and accessible digital libraries to the public.

8) Giorgio Maria Di Nunzio presented "FAIR Terminology Meets CLEAR Global" during the poster session, showcasing the design and implementation of a web application aimed at improving access to multilingual terminology for speakers of marginalized languages. The database contains 9,506 essential terms translated into 39 languages, addressing the need for information accessibility in languages often overlooked. Developed using the R Shiny framework, the web application adheres to the FAIR principles of Open Science, ensuring efficient data organization, retrieval, and reusability. Users can search the database by selecting a source and optional target language, view concept definitions in English, and access filtered lists of language equivalents. Additionally, the interface allows users to export datasets in TermBase eXchange (TBX) format, enhancing its utility. This user-friendly tool not only simplifies navigation but also exemplifies how technology can bridge linguistic gaps and empower speakers of marginalized languages.

9) Giorgio Maria Di Nunzio presented "Exploring Historical Routes and Waypoints with MICOLL Digital Map" during the poster session, introducing a web-based application developed as part of the ERC project “Migrating Commercial Law and Language (MICOLL): Rethinking Lex Mercatoria.” This digital map allows historians and researchers to explore historical routes and waypoints that facilitated the exchange of goods and information between European cities from the 11th to 17th centuries. The project’s aim is to visualize the evolution of these routes over time by linking historical data to spatial waypoints. The interactive and visual interface of the MICOLL digital map enhances the discovery and analysis of historical connections, offering a powerful tool for understanding the dynamics of commerce and communication in medieval and early modern Europe. 

10) Florian Atzenhofer-Baumgartner presented "Is Text Normalization Relevant for Classifying Medieval Charters?" during the poster session, exploring the effects of historical text normalization on the classification of medieval charters. Focusing on document dating and locating tasks, the study analyzed Middle High German charters from the Monasterium.net digital archive using various classifiers, including traditional and transformer-based models, both with and without normalization. The findings revealed that while normalization slightly improved locating tasks, it reduced accuracy in dating, indicating that critical features in the original texts may be obscured by normalization. Support vector machines and gradient boosting outperformed transformers, raising questions about their efficiency in this context. The study advocates for a selective approach to text normalization, emphasizing the need to retain key textual characteristics essential for accurate classification of historical documents. 

11) Adel Aly presented "Database Approaches to the Modelling and Querying of Musical Scores: A Survey" during the poster session, offering a comprehensive overview of digital musical score databases in the context of Music Information Retrieval (MIR). The study focuses on the symbolic representation of music content, as opposed to audio representation, and begins with an introduction to Western classical music notation for readers unfamiliar with the subject. The core analysis categorizes various approaches to the data management layer of digital score libraries, examining ASCII-based, semi-structured, graph-based, and high-level abstract data models. By comparing these models across multiple criteria, the paper highlights their strengths and limitations while identifying opportunities for future research. This survey serves as a valuable resource for advancing symbolic MIR and enhancing digital musical score databases.

12) Hannah Laureen Casey presented "Mapping Techniques for an Automated Library Classification: The Case Study of Library Loans at Bibliotheca Hertziana" during the poster session, showcasing an innovative method for visualizing research library collections. The approach integrates user loan data with deep mapping techniques to reveal usage patterns and thematic clusters, addressing the limitations of traditional classification systems. Using dimensionality reduction, the catalogue was visualized through book loans, while prompt engineering with large language models generated detailed summaries and titles for loan clusters. Applied to the Bibliotheca Hertziana's art history collection in Rome, the method was evaluated through expert interviews and an atlas featuring statistical insights on clusters. The results demonstrate the potential of this framework for visually mapping textual collections, offering an interdisciplinary perspective on their transformation and usage.

13) Muhammad Usman presented "Tracing the Retraction Cascade: Identifying Non-retracted but Potentially Retractable Articles" during the poster session, addressing the ripple effects of retracted scientific articles on the broader research ecosystem. The study focused on identifying citation patterns that contribute to cascading retractions, where articles citing retracted work may themselves become unreliable. Analyzing approximately 5,000 articles citing retracted works, including 953 cases of cascading retractions, the research proposed a retraction-centric ranking method. By measuring similarities to bibliographically coupled retracted articles, the approach prioritizes potentially problematic articles for scrutiny without requiring exhaustive re-examination. This method offers an efficient alternative to traditional reviews, providing a targeted strategy to uphold scientific integrity in the face of increasing retraction rates.
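The core idea of a retraction-centric ranking — scoring citing articles by how strongly their reference lists overlap with retracted work — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the article IDs are invented, and Jaccard similarity over reference sets stands in for whatever similarity measure the paper actually uses.

```python
def jaccard(a, b):
    """Jaccard similarity between two reference sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def retraction_centric_rank(candidates, retracted):
    """Rank citing articles by their strongest bibliographic
    coupling (shared references) with any retracted article,
    so the most suspect articles surface first for scrutiny."""
    scores = {
        art: max(jaccard(refs, r_refs) for r_refs in retracted.values())
        for art, refs in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical reference lists keyed by article ID
retracted = {"R1": ["p1", "p2", "p3"]}
candidates = {
    "A": ["p1", "p2", "p9"],   # shares two references with R1
    "B": ["p7", "p8"],         # shares none
}
print(retraction_centric_rank(candidates, retracted))  # ['A', 'B']
```

The point of such a ranking is triage: reviewers examine the highest-scoring articles first instead of re-examining every citing paper exhaustively.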

14) Carlos-Emiliano González-Gallardo presented "Leveraging Open Large Language Models for Historical Named Entity Recognition" during the poster session, exploring the potential of open-access large language models (LLMs) for named entity recognition (NER) in historical texts. Unlike contemporary texts, historical collections, such as newspapers and classical commentaries, pose challenges due to noise from OCR errors, spelling variations, and storage conditions. The study compared different Instruct variants of LLMs using both deductive (with guidelines) and inductive (without guidelines) prompt engineering approaches against fully supervised benchmarks. Experiments spanned three languages—English, French, and German—with code-switching on Ancient Greek, demonstrating that while Instruct models struggle with noisy inputs and perform below fine-tuned NER systems, their outputs offer valuable entities for further refinement by human annotators. This work highlights the potential of LLMs in supporting historical NER workflows, especially as tools for aiding human tagging processes.
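The deductive-versus-inductive distinction the study draws can be illustrated with a toy prompt builder: a deductive prompt embeds the annotation guidelines, while an inductive one leaves the model to its own notion of an entity. The guideline text and example sentence below are invented for illustration; the paper's actual prompts, tag set, and guidelines may differ.

```python
# Hypothetical annotation guidelines, standing in for a real tag-set spec
GUIDELINES = (
    "Tag PERSON for people and LOC for places; "
    "treat OCR-noisy forms (e.g. 'Par1s') as the intended spelling."
)

def build_prompt(text, deductive=True):
    """Build an NER prompt: deductive prompts include the annotation
    guidelines; inductive prompts omit them entirely."""
    task = "Extract named entities from the historical text below."
    if deductive:
        task += " Follow these guidelines: " + GUIDELINES
    return f"{task}\n\nText: {text}\nEntities:"

sample = "Herr Müller reiste 1848 nach Par1s."
print(build_prompt(sample, deductive=True))
```

Under this framing, comparing the two modes is just a matter of running the same texts through both prompt variants and scoring the extracted entities against a gold standard.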

In conclusion, TPDL 2024 in Ljubljana, Slovenia, was an incredibly enriching experience, offering a perfect blend of cutting-edge research, thought-provoking discussions, and community engagement. Presenting our paper and connecting with fellow researchers underscored the immense potential of digital libraries and information systems to bridge gaps in accessibility and cultural preservation.

Ljubljana itself was a remarkable host city, with its picturesque streets, vibrant culture, and welcoming atmosphere adding a unique charm to the conference. Exploring the city after sessions was as inspiring as the academic exchanges during the event.

Exploring the charm of Ljubljana

Finally, a heartfelt thanks to the organizers of TPDL 2024 for curating such an exceptional conference in a setting as stunning as Ljubljana.

- Mohan Krishna Sunkara (@mk344567)

