
2024-10-07: Leveraging LLMs for Transcript Generation - Summer Internship Experience at Amazon Inc.

This summer, I had the privilege of interning at Amazon with the Alexa Certification Technology (Alexa-Cert-Tech) team in Sunnyvale, California, USA. My internship was a 13-week program that started on May 28, 2024. I worked as a data scientist intern under the supervision of Praveen Chinnusamy, Bheema Rajulu, and Jingya Li. Throughout the program, I attended weekly meetings with the entire Alexa-Cert-Tech team, where I reported my progress, obtained feedback, resolved issues, and refined the solution. I also met one-on-one with my mentors, Bheema Rajulu and Jingya Li, twice a week to discuss my progress and any issues I faced, and with my manager, Praveen Chinnusamy, at least once a week.

My project was focused on automating Behavior-Driven Development (BDD) test statement generation and improving transcript generation using Large Language Models (LLMs). This blog post will dive deep into the challenges, solutions, and achievements of my internship project.

Project Overview

My project was split into two key components:

  • Automating BDD statement generation using the NoCodeAutomation tool.
  • Developing an LLM-based Transcript Generator.
Both of these tasks revolved around improving existing testing frameworks to reduce manual intervention and introduce more efficient automated workflows.

Problem Statement

The primary challenge was the manual generation of BDD test statements. Writing BDD statements by hand is time-consuming and error-prone, which makes it inefficient for large-scale operations. We sought to automate this process using the Open Automation Kit (OAK) User-Defined Test Feature File Generator, a NoCodeAutomation tool designed to simplify test generation through automation. Additionally, testing Alexa skills required manual interaction to verify functionality. My goal was to reduce this manual effort by automating skill interaction through LLM-based transcript generation.
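To make the target output concrete, here is a minimal sketch of the kind of Given/When/Then feature file the automation aims to produce. The function, skill name, and utterances are my own hypothetical examples, not the OAK generator's actual output format.

```python
# Hypothetical illustration, not the OAK generator's actual output format:
# the kind of Given/When/Then feature file the automation aims to produce.
def render_feature_file(skill_name: str, utterance: str, expected_reply: str) -> str:
    """Assemble a minimal Gherkin-style BDD scenario from a test description."""
    return "\n".join([
        f"Feature: {skill_name} smoke test",
        "  Scenario: User invokes the skill and receives the expected response",
        f'    Given the skill "{skill_name}" is enabled',
        f'    When the user says "{utterance}"',
        f'    Then Alexa responds with "{expected_reply}"',
    ])

print(render_feature_file("Weather Helper", "what is the weather today",
                          "Here is today's forecast"))
```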

Solution

To tackle the problem, I developed agents on Amazon Bedrock (Bedrock Agents), which use LLMs to automatically answer user questions and handle complex tasks. These agents served as the "brain" of the generative AI applications we were developing. We adopted them to replace LangChain Agents, which carried infrastructure overhead and were difficult to manage with third-party APIs.
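For readers unfamiliar with Bedrock Agents, the sketch below shows roughly how an agent can be invoked from Python with boto3. The agent ID, alias ID, region, and prompt are placeholders for illustration; the actual pipeline wiring is not shown here.

```python
import uuid
import boto3

# Minimal sketch of invoking an Agent for Amazon Bedrock with boto3.
# The agent ID, alias ID, region, and prompt are placeholders for illustration.
client = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

response = client.invoke_agent(
    agentId="AGENT_ID_PLACEHOLDER",
    agentAliasId="AGENT_ALIAS_PLACEHOLDER",
    sessionId=str(uuid.uuid4()),  # one session per conversation
    inputText="Generate BDD statements for the 'Weather Helper' skill.",
)

# The agent streams its answer back as chunks of bytes.
answer = ""
for event in response["completion"]:
    chunk = event.get("chunk")
    if chunk:
        answer += chunk["bytes"].decode("utf-8")
print(answer)
```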

The key advancements were:

  • Retrieval Augmented Generation (RAG) for NoCodeAutomation: This technique let us generate more accurate feature files by enriching the knowledge base with preprocessed test cases and BDD scenarios, so retrieved examples ground the generated output (a minimal sketch follows after this list).
  • AWS Step Functions: This service was introduced to manage the automation pipeline and improve integration with other teams' APIs, eliminating the third-party management issues we faced.
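As a rough illustration of the retrieval half of RAG, the snippet below queries a Bedrock knowledge base and folds the retrieved, preprocessed scenarios into the generation prompt. The knowledge base ID, query text, and prompt wording are placeholders, not the team's actual configuration.

```python
import boto3

# Sketch of the retrieval half of RAG against a Bedrock knowledge base.
# The knowledge base ID, query, and prompt assembly are placeholders.
runtime = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

results = runtime.retrieve(
    knowledgeBaseId="KB_ID_PLACEHOLDER",
    retrievalQuery={"text": "BDD scenarios for multi-turn Alexa skill tests"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3}},
)

# Fold the retrieved, preprocessed scenarios into the prompt the generation
# model (or agent) will see, grounding its feature file output in known examples.
context_snippets = [r["content"]["text"] for r in results["retrievalResults"]]
prompt = (
    "Using the example scenarios below, write a Gherkin feature file "
    "for the described test.\n\n" + "\n---\n".join(context_snippets)
)
```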

LLM-based Transcript Generator

Figure 1. Multi-turn LLM-based Transcript Generator for Banyan Skill Testing

One of the most exciting parts of my project was creating a Multi-turn LLM-based Transcript Generator for Banyan skill testing, which became an integral part of enhancing automated test processes at Alexa-Cert-Tech. This section will explain the challenges, design, and impact of this tool and how it connects to the broader goals of my internship.

Challenges and Problem Statement

Testing Alexa skills involves a great deal of manual interaction, especially when multiple rounds of dialogue are required to ensure functionality. Each interaction between Alexa's automation service and its skills needs to be carefully documented for automated test cases. This process often creates a bottleneck when transcripts—records of these interactions—are manually reviewed, making the testing less efficient.

A further challenge was the complexity of handling multi-turn conversations, where the system must remember and respond to the context of previous exchanges. This was particularly crucial in testing Banyan skills, a specific skill category for Alexa, which often required testers to input and track multiple turns of dialogue. Additionally, the metadata of these skills was not preprocessed, which negatively impacted the performance of the LLM in generating useful transcripts.

Designing the Multi-turn LLM-based Transcript Generator

To address these challenges, I developed a multi-turn LLM-based Transcript Generator designed to automatically document interactions between the automation service and Alexa Banyan skills. The generator facilitates continuous conversation tracking, allowing testers to maintain context through multiple rounds of interaction.

How it works

  • User Input: Testers can select a skill category, enter the test description, and upload an exemplar file for comparison.
  • Preprocessing: The generator preprocesses the transcripts by cleaning up the metadata, which improves the overall performance of the LLM. Preprocessed transcripts are critical as they ensure that the LLM receives clean, structured input to work with.
  • LLM-powered Conversation: The LLM generates transcripts based on these interactions, ensuring that multiple dialogue turns are recorded accurately, maintaining the conversation context, and producing precise documentation for further testing or manual review.

By introducing multi-turn capabilities, the generator automates the previously manual transcript review process. Testers no longer need to manage transcripts line by line; instead, they can rely on the LLM to capture continuous conversation flows.
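The core idea, sketched below, is to keep the running dialogue history and feed it back to the model on every turn so it always sees the full conversation context. The model ID, the metadata-stripping rule, and the input format are simplified placeholders, not the production implementation.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")
MODEL_ID = "MODEL_ID_PLACEHOLDER"  # e.g., a Claude model available on Bedrock

def strip_metadata(raw_turn: dict) -> str:
    """Hypothetical preprocessing: keep only the utterance text and drop
    skill metadata that would otherwise confuse the model."""
    return raw_turn.get("utterance", "").strip()

def run_transcript(turns: list[dict]) -> list[dict]:
    """Replay multi-turn skill interactions, carrying the full history so each
    generated response sees the prior conversation context."""
    history = []
    for raw_turn in turns:
        history.append({"role": "user", "content": [{"text": strip_metadata(raw_turn)}]})
        reply = bedrock.converse(modelId=MODEL_ID, messages=history)
        history.append(reply["output"]["message"])
    return history  # the accumulated history doubles as the transcript
```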

Challenges Faced

We faced several challenges during the project's implementation. One was managing interactions among multiple Bedrock Agents simultaneously within the AWS console, which prevented us from connecting the agents in a seamless workflow. To solve this, I adopted AWS Step Functions to manage the end-to-end pipeline for feature file generation.
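As an illustration of that orchestration, the sketch below chains a QA Agent step and a Structure Agent step (the two agents described under Key Accomplishments) in one state machine and then starts an execution with boto3. The state definition, Lambda ARNs, and state machine ARN are hypothetical, not the team's actual pipeline.

```python
import json
import boto3

# Hypothetical Amazon States Language definition: a QA Agent step followed by a
# Structure Agent step, each backed by a Lambda that calls the matching Bedrock agent.
definition = {
    "StartAt": "QAAgent",
    "States": {
        "QAAgent": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:qa-agent",
            "Next": "StructureAgent",
        },
        "StructureAgent": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:structure-agent",
            "End": True,
        },
    },
}

# Kick off one run of the pipeline; the state machine ARN is a placeholder for a
# state machine already created from the definition above.
sfn = boto3.client("stepfunctions")
execution = sfn.start_execution(
    stateMachineArn="arn:aws:states:REGION:ACCOUNT_ID:stateMachine:FeatureFilePipeline",
    input=json.dumps({"test_description": "Multi-turn weather query for the Weather Helper skill"}),
)
print(execution["executionArn"])
```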

Another challenge was integrating the NoCodeAutomation tool with third-party APIs; rapidly evolving software versions and configurations made smooth integration difficult. This issue was partially resolved by introducing Step Functions.

Key Accomplishments

By the end of my internship, I had successfully:

  • Developed Bedrock Agents: I created two types of agents, a QA Agent and a Structure Agent, that worked together to generate feature files from user inputs. These agents were integrated with an enhanced knowledge base to ensure more accurate BDD statement generation.

  • Optimized Multi-turn LLM-based Transcript Generator: I developed and optimized the LLM-based Transcript Generator to handle multi-turn conversations with Alexa, simulating real user interactions based on the skill selected and test descriptions provided. By introducing a preprocessor, I significantly improved the quality of the generated transcripts, ensuring that the LLM could accurately capture continuous dialogue flows and reduce latency during testing.

Conclusion

In conclusion, the Multi-turn LLM-based Transcript Generator was one of the most impactful projects of my internship at Amazon. It not only automated a significant portion of the testing process but also paved the way for future innovations in automated skill testing. Through this project, I gained a deep understanding of leveraging LLMs in real-world applications while contributing to the advancement of Amazon's testing infrastructure.


Overall Experience

This was my fourth internship in the United States, following my internship at Microsoft in the summer of 2023, where I formulated a new metric to measure the node interruption rate and characterized the uncertainty around it using a homogeneous Poisson process. At Amazon, I worked with the Alexa-Cert-Tech team as a data scientist intern, focusing on automating BDD testing and developing a Multi-turn LLM-based Transcript Generator to streamline the testing of Alexa Banyan skills. I developed Bedrock Agents and integrated them with an enhanced knowledge base for more accurate feature file generation. The experience I gained in AI-driven automation and large language models (LLMs), along with the opportunity to improve how I communicate technical concepts to diverse audiences, is invaluable for my future career.

Acknowledgments

I would like to express my gratitude to my PhD advisor, Dr. Jian Wu, for his boundless support and encouragement in securing this internship, and to my internship manager, Praveen Chinnusamy, and mentors, Bheema Rajulu and Jingya Li, for guiding me throughout my internship with feedback and suggestions. I am thankful for the opportunity to work as a data scientist intern with the Alexa-Cert-Tech team!


-Kehinde Ajayi (@KennyAj)
