Organizational presentations often carry a wealth of data: images, charts, and text that tell a compelling story or present findings. Programs and projects routinely rely on meetings not only to disseminate information about our work but also to address the various problems that arise. However, the information contained in the slide decks associated with these meetings and presentations tends to get lost after the fact. One cause of this loss may be that slides capture only a sparse record of what is actually said during a presentation.
In this blog post, I introduce ALICE (AI Leveraged Information Capture and Exploration), my proposed system for capturing and managing knowledge and data from sources that, historically, have not been comprehensively archived. The primary challenge ALICE addresses is the prevalent loss of information from presentation slide decks after their related meetings or presentations. The system is designed to methodically extract text and visual elements from slides, then employ a large language model (LLM) such as OpenAI's GPT-4 to convert this information into structured, machine-readable formats. The aim is not only to preserve critical data but also to enrich it with comprehensive abstracts, relevant search terms, and a structured JSON for Linked Data (JSON-LD) representation for integration into knowledge graphs. This post explores the inner workings of ALICE and its potential to redefine how presentation data is managed and interpreted within NASA. First, I'll detail the proposed ALICE system, particularly the integration of LLMs with a Knowledge Graph (KG) to enhance the LLM's reasoning over unstructured data. Then I'll discuss the technical aspects of how this system operates, including the text and image extraction scripts, front-end user interface, server environment, and LLM integration.
While humans can visually interpret slide content, converting that information into structured, machine-readable formats presents a challenge, to say nothing of autonomously expanding that sparse content into something more detailed. Essentially, we want a way to effortlessly hand a slide deck to a system that "fills in the blanks" and provides a verifiably accurate description of what was presented relative to the slide deck in question. The mere existence of PowerPoint Karaoke points to the need for this capability.
Furthermore, presentation slide decks tend not to be archived. We routinely email the slides out after the fact to some list of presentation participants, but we rarely commit them to an easily indexed, searchable archive. Previous work has been done on aligning presentations with scholarly publications, such as SlideSeer (Kan, JCDL '07). SlideSeer is a handy tool for researchers and academics who often share their work in two forms: as written papers and as slide presentations. SlideSeer discovers scholarly papers that match the content of the slides in a slide deck and presents the two together, so you see all the information in one place, whether it appears in one format or both. ALICE differs in that, at least initially, it processes slides for any kind of communication, reasoning over their content and indexing it into a knowledge graph. This captures not only scholarly communication but also presentations and meeting slides for various project and program meetings.
Finally, for many projects, the slide deck is the final deliverable. The slides are the product. There is no accompanying paper, so we have no source representing a more detailed and complete record. In many cases, these slide decks contain information from numerous data sources rather than summarizing a single one, as a deck presenting a research paper would. For example, the majority of slide decks in the Military Industrial Powerpoint Complex archive contain information from various briefings and information sessions that have no single source.
In this blog post, we'll explore how you can efficiently scrape both text and visual elements from a presentation slide deck and then harness the capabilities of a large language model to derive meaningful abstracts, extended abstracts, search terms, and hashtags, and even create a structured JSON-LD representation of each slide suitable for integration into knowledge graph software. I have proposed that the ALICE system be developed for use at NASA as part of our digital transformation and knowledge management efforts.
The Proposed ALICE System
Knowledge Graph
[Figure: A knowledge graph created using entity and relation extraction (fig. 3 in Chaudhri et al.)]
[Figure: A knowledge graph created using computer vision techniques (fig. 4 in Chaudhri et al.)]
Pan et al. outline three frameworks for unifying LLMs and KGs:
- KG-enhanced LLMs, which incorporate KGs during the pre-training and inference phases of LLMs, or use KGs to enhance understanding of the knowledge learned by LLMs
- LLM-augmented KGs, which leverage LLMs for different KG tasks such as embedding, completion, construction, graph-to-text generation, and question answering
- Synergized LLMs + KGs, in which LLMs and KGs play equal roles and work in a mutually beneficial way, enhancing both for bidirectional reasoning driven by both data and knowledge
Front-end user interface
Server environment
LLM integration
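To make the front-end, server, and LLM hand-off concrete, here is a minimal sketch of the server side, assuming Flask; the route, storage layout, and pipeline hand-off are my illustrative assumptions, not the actual ALICE implementation:

```python
# A minimal sketch of the server side, assuming Flask; the route and storage
# layout are illustrative assumptions, not the actual ALICE implementation.
import os
from flask import Flask, request, jsonify

app = Flask(__name__)
UPLOAD_DIR = "uploads"

@app.route("/upload", methods=["POST"])
def upload_deck():
    deck = request.files["deck"]              # .pptx file posted by the front end
    os.makedirs(UPLOAD_DIR, exist_ok=True)
    path = os.path.join(UPLOAD_DIR, deck.filename)
    deck.save(path)
    # Hand the saved deck to the extraction and LLM pipeline
    # (the extraction functions and LLM calls are shown later in the post).
    return jsonify({"status": "queued", "deck": deck.filename})

if __name__ == "__main__":
    app.run(debug=True)
```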
Breaking Down the Code
At the heart of our solution lies the ppt-scraper.py script, which leverages the python-pptx library to traverse the slides, extracting the text and any images and charts on each one. I'll step through each stage using a presentation I gave at AIAA SciTech 2021. This presentation was retrieved from the NASA Technical Reports Server (NTRS), which will also serve as the source of the dataset I'll use to fine-tune my own LLM, a topic for my next blog post.
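Below is a minimal sketch of the two extraction functions; the names match those in ppt-scraper.py, but the internals shown here are an illustrative reconstruction using python-pptx, not the actual source:

```python
# Illustrative sketch of ppt-scraper.py's extraction functions using
# python-pptx; the function names match the post, the internals are a
# reconstruction, not the actual source.
import os
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

def extract_text_from_pptx(path):
    """Return a list of strings, one per slide, prefixed with the slide number."""
    prs = Presentation(path)
    texts = []
    for i, slide in enumerate(prs.slides, start=1):
        parts = [s.text_frame.text for s in slide.shapes if s.has_text_frame]
        texts.append(f"Slide {i}: " + " ".join(parts))
    return texts

def extract_images_and_charts_from_pptx(path, out_dir="output"):
    """Save each picture to out_dir/slide_<n>/image_<m>.<ext>, recursing into groups."""
    prs = Presentation(path)
    count = 0

    def walk(shapes, slide_dir):
        nonlocal count
        for shape in shapes:
            if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
                walk(shape.shapes, slide_dir)  # drill into nested group shapes
            elif shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
                count += 1
                image = shape.image
                out = os.path.join(slide_dir, f"image_{count}.{image.ext}")
                with open(out, "wb") as f:
                    f.write(image.blob)
            elif shape.has_chart:
                # Charts are stored as XML rather than image blobs and are
                # handled separately (omitted in this sketch).
                pass

    for i, slide in enumerate(prs.slides, start=1):
        slide_dir = os.path.join(out_dir, f"slide_{i}")
        os.makedirs(slide_dir, exist_ok=True)
        walk(slide.shapes, slide_dir)
```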
- Text Extraction - extract_text_from_pptx: This function delves into each slide and grabs text content. It outputs a list of strings, each prefixed with the slide number for easy referencing.
- Image and Chart Extraction - extract_images_and_charts_from_pptx: This function extracts images and charts from each slide, differentiating between visual elements like pictures and charts. It also drills down into group shapes, ensuring that nested content isn't missed. Here I have ChatGPT reasoning over output/slide_3/image_6.jpg:
[Figure: ChatGPT reasoning over the image from Slide 3]
- And here it is doing the same thing over output/slide_18/image_30.jpg:
[Figure: ChatGPT reasoning over the image from Slide 18]
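- The descriptions above were produced interactively in the ChatGPT UI, but the same step can be scripted against a vision-capable model. A sketch, assuming the openai Python client (the model name and prompt are my assumptions):

```python
# Ask a vision-capable model to describe an extracted slide image; the model
# name and prompt wording are assumptions for illustration.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("output/slide_3/image_6.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this slide image in detail."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```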
- The next version of ppt-scraper.py will have the option to inject the LLM's generated description of each image into the text description to enrich our text prompts. For example, the enriched text for a slide might look something like this (illustrative, with placeholder values):
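```text
Slide 3: <extracted slide text>
[Image 6: <LLM-generated description of output/slide_3/image_6.jpg>]
```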
- After these scripts are run, the output directory will have a structure similar to this (illustrative, based on the image paths shown above):
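```text
output/
├── slide_1/
├── slide_2/
├── slide_3/
│   └── image_6.jpg
├── ...
└── slide_18/
    └── image_30.jpg
```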
From Raw Data to Knowledge
Once the data extraction is complete, the real magic begins. Leveraging a powerful AI language model, like OpenAI's GPT models, we can generate the following (a minimal sketch of the generation call follows this list):
- Abstracts: Using the extracted text, the model can provide a concise summary capturing the essence of the presentation.
- Extended Abstracts: Need a more detailed summary? No problem! The model can be instructed to generate a longer, more detailed abstract, providing deeper insights into the presentation.
- Hashtags: Want to index or socially share your content? The model can generate relevant hashtags based on slide content, aiding in searchability and social media visibility.
- JSON-LD: JSON-LD is a lightweight Linked Data format that is easy for humans to read and write. It is based on the already successful JSON format and provides a way for JSON data to interoperate at Web scale, making it an ideal data format for programming environments, REST Web services, and unstructured databases such as Apache CouchDB and MongoDB. JSON-LD objects are also well suited for feeding into knowledge graphs and semantic web applications, enhancing the interoperability and understandability of your presentation data across software ecosystems. Here is the JSON-LD object generated for this presentation:
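A representative skeleton of the generated object, with placeholder values (the schema.org type and property choices here are illustrative; the actual output uses the presentation's real metadata):

```json
{
  "@context": "https://schema.org",
  "@type": "PresentationDigitalDocument",
  "name": "<presentation title>",
  "author": { "@type": "Person", "name": "<presenter>" },
  "abstract": "<LLM-generated abstract>",
  "keywords": ["<search term 1>", "<search term 2>"],
  "hasPart": [
    {
      "@type": "CreativeWork",
      "name": "Slide 3",
      "text": "<extracted slide text>",
      "image": "output/slide_3/image_6.jpg"
    }
  ]
}
```

Once generated, the JSON-LD can be loaded directly into an RDF graph as a first step toward knowledge graph integration. For example, with rdflib (version 6+, which bundles JSON-LD support; the file name is a placeholder):

```python
# Load the generated JSON-LD into an RDF graph; the file name is a placeholder.
from rdflib import Graph

g = Graph()
g.parse("output/presentation.jsonld", format="json-ld")
print(len(g), "triples loaded")
```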
- Of course, we can also train our own large language model, bootstrapped from an open-source model such as Llama or Llama 2. We want the model to be multi-modal, accepting not just text but also images as input, so that we can extract as much information as possible from each slide. This is especially useful if we can train a model that can accurately interpret charts and graphs with respect to their local context.
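Here is the minimal sketch of the generation step referenced above, assuming the openai Python client; the prompt wording and model choice are illustrative, not the production prompts:

```python
# Generate an abstract, extended abstract, search terms, and hashtags from the
# extracted slide text; prompt wording and model choice are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_deck(slide_texts):
    prompt = (
        "Below is the extracted text of a presentation, one line per slide:\n\n"
        + "\n".join(slide_texts)
        + "\n\nWrite a concise abstract, an extended abstract, a list of "
        "search terms, and relevant hashtags for this presentation."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```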
Future Work
[Figure: Slide 3]
Conclusion
This blog post has provided a comprehensive overview of the ALICE system. We explored the challenge of preserving the information in presentation slide decks and the solution ALICE offers: efficiently extracting text and visual elements from slides and employing a large language model, such as OpenAI's GPT-4, to transform this data into structured, machine-readable formats while inferring the expanded semantic context of the presentation. We delved into the integration of LLMs with Knowledge Graphs, highlighting the potential synergy between these technologies in enhancing data interpretation and retrieval.
The post also outlined the technical specifics of ALICE, including its front-end interface for uploading slide decks, its server environment for data processing, and the crucial role of LLMs in generating abstracts, search terms, and JSON-LD for Knowledge Graph integration. We discussed the key functionalities of the ppt-scraper.py script in extracting diverse data types from presentations and how this technology can be further evolved.
In summary, ALICE represents a significant leap forward in managing and utilizing the wealth of data hidden in presentation slide decks, promising to enhance knowledge preservation and accessibility at NASA and potentially beyond.
You can follow the development of this project at my GitHub repo.
- Jim Ecker
Sources
- Min-Yen Kan. 2007. SlideSeer: a digital library of aligned document and presentation pairs. In Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries (JCDL '07). Association for Computing Machinery, New York, NY, USA, 81–90. https://doi.org/10.1145/1255175.1255192
- Chaudhri, V. K., Baru, C., Chittar, N., Dong, X. L., Genesereth, M., Hendler, J., Kalyanpur, A., Lenat, D., Sequeda, J., Vrandečić, D., and Wang, K. 2022. "Knowledge graphs: Introduction, history, and perspectives." AI Magazine 43: 17–29. https://doi.org/10.1002/aaai.12033
- Pan, Shirui, et al. "Unifying Large Language Models and Knowledge Graphs: A Roadmap." arXiv preprint arXiv:2306.08302 (2023).
- Touvron, Hugo, et al. "Llama: Open and efficient foundation language models." arXiv preprint arXiv:2302.13971 (2023).
- Touvron, Hugo, et al. "Llama 2: Open foundation and fine-tuned chat models." arXiv preprint arXiv:2307.09288 (2023).