EICS 2024 marks the sixteenth international ACM SIGCHI conference focused on engineering interactive computing systems and their user
interfaces. The conference explores research at the intersection of user
interface design, software engineering, and computational interaction. Our
research paper, "All in One Place: Ensuring Usable Access to Online Shopping Items for Blind Users," was published in the June 2024 issue of the
Proceedings of the ACM on Human-Computer Interaction, Volume 8, in the EICS
category.
In this blog post, I will summarize our research paper, which focuses on
alleviating the significant interaction challenges that blind users
encounter when navigating through dispersed content across multiple sections
and pages on shopping websites. These issues arise because information
related to shopping items is frequently spread across different web page
sections, requiring users to move back and forth—a task that becomes
incredibly tedious and cumbersome for those relying on screen readers.
Additionally, even if users become familiar with the structure of one
shopping site, they must readapt when encountering a different site with a
new layout, as there is no uniform structure across shopping platforms. Our
paper proposes a solution that consolidates this dispersed content into a
single, consistent, and accessible interface, simplifying the browsing
experience for blind users and significantly reducing the time and effort
needed to access essential information.
Introduction
Online shopping involves purchasing goods or services over the Internet,
where customers browse, select, and buy products through websites or apps.
Content on these shopping platforms is often organized across multiple web
pages, such as a 'Query-Results' page summarizing items and 'Details'
pages providing complete information to streamline the user experience
(Figure 1 (A) and (B)). While sighted users benefit from visual cues that
allow quick scanning and information retrieval, blind users face a more
significant challenge. Their reliance on screen reader technology, which
primarily supports linear, one-dimensional content access, requires them
to invest additional time and effort to gather the same information.
When blind users seek comprehensive information from various sections of
the 'Details' page, such as descriptions, specifications, and reviews,
they must not only locate these sections but also mentally retain
information across multiple products to make informed decisions. For
example, imagine a user searching for a television with features like 4K
resolution and smart functionality. They start by navigating to the search
bar using the 'TAB' key or other shortcuts, entering a query like 'TV.'
After submitting the query, they browse through the list of item summaries
on the 'Query-Results' page using basic navigation keys. Upon selecting a
TV, they click the link to access the 'Details' page, where they use
various shortcut keys to check the specifications and then move to the
review section to assess user feedback on picture quality and smart
features. To compare another TV, the user must return to the
'Query-Results' page, repeating this process for each item. This
back-and-forth navigation requires significant manual effort, making it
difficult for blind users to efficiently compare multiple televisions,
adding considerable strain to their shopping experience. Here is a small
demo of basic screen reader navigation on Amazon:
In this paper, we introduce InstaFetch, a browser extension designed to enhance the online e-commerce experience for blind screen reader users, particularly when interacting with web data items. InstaFetch streamlines the information retrieval process by offering a direct query feature, allowing users to input specific queries about any data item and receive immediate responses, as shown in Figure 1 (C). Additionally, InstaFetch consolidates all relevant information—such as product details, specifications, and customer reviews—scattered across multiple pages into a single, consistent, screen reader-friendly interface, as illustrated in Figure 1 (D). Here is a demo showcasing the functionality of InstaFetch:
In a user study with 14 blind participants, InstaFetch was shown to
significantly decrease the need to access 'Details' pages compared to both a
state-of-the-art solution (SaIL) and their preferred screen reader, thereby reducing the burdensome
navigation between 'Details' and 'Query-Results' pages. Additionally,
InstaFetch reduced the average time spent and the number of key presses per
data item, allowing participants to browse more items within the same
timeframe. Participants reported that InstaFetch reduced interaction fatigue
and increased their chances of finding better deals online.
InstaFetch Architectural Workflow
Figures 2(a) and (b) depict the architectural workflow of
InstaFetch, a web browser extension developed using Google's open-source guidelines (the extension is not publicly available due to data privacy concerns and real-world development challenges). Upon loading the 'Query-Results' webpage,
InstaFetch utilizes the
STEM algorithm
to identify item summaries and embeds an 'Options' button within each
summary. When a user selects an item by clicking the 'Options' button, the
Selenium WebDriver
captures snapshots of the entire 'Details' page. These snapshots are then
processed by the
Mask R-CNN
model, which is trained with
Matterport's open-source code, to extract item details such as descriptions, specifications, and
reviews. The model was evaluated on 20 new websites, achieving a Mean
Average Precision (MAP) of 75.4% at a 50% Intersection over Union (IoU)
threshold and 69.7% at a 75% IoU threshold, with a total loss of 0.529 at
convergence, indicating its accuracy in identifying regions of interest. The
Tesseract OCR engine processes the extracted item details, and a custom DOM search
algorithm subsequently retrieves relevant DOM subtrees for these details,
which are then stored as context within the Content Model.
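To make the pipeline above concrete, here is a minimal sketch of the snapshot-capture, region-detection, OCR, and DOM-mapping steps. Only the Selenium and pytesseract calls are real APIs; detect_item_regions() stands in for the trained Mask R-CNN model, the example URL is hypothetical, and the DOM lookup is a simplified stand-in for the paper's custom DOM search algorithm.

```python
# Illustrative sketch of the InstaFetch content-extraction pipeline (not the paper's code).
import io
from PIL import Image
import pytesseract
from selenium import webdriver
from selenium.webdriver.common.by import By

def detect_item_regions(image):
    """Placeholder for the trained Mask R-CNN detector (Matterport implementation).
    Should yield (label, (left, top, right, bottom)) boxes for the 'description',
    'specifications', and 'reviews' regions."""
    raise NotImplementedError

driver = webdriver.Chrome()
driver.get("https://www.example.com/product/tv-123")  # hypothetical 'Details' page

# 1. Snapshot of the 'Details' page (the real system captures the entire page;
#    a viewport screenshot keeps this sketch short).
snapshot = Image.open(io.BytesIO(driver.get_screenshot_as_png()))

# 2-3. Detect item-detail regions and run Tesseract OCR on each crop.
ocr_text = {}
for label, box in detect_item_regions(snapshot):
    ocr_text[label] = pytesseract.image_to_string(snapshot.crop(box))

# 4. Map each OCR'd section back to a DOM subtree so the overlay can reuse live markup
#    (a naive text-containment lookup; the paper uses a custom DOM search algorithm).
content_model = {}
for label, text in ocr_text.items():
    needle = text.strip().splitlines()[0][:40] if text.strip() else ""
    nodes = driver.find_elements(By.XPATH, f'//*[contains(normalize-space(.), "{needle}")]')
    content_model[label] = nodes[0].get_attribute("outerHTML") if nodes else ""
```

The resulting content_model dictionary plays the role of the Content Model described above: one entry per item-detail section, holding the markup that the overlay can later present in a screen reader-friendly form.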
InstaFetch leverages the contextual information stored in the Content Model to
support natural language queries, allowing users to ask product-specific
questions like "What are the battery life details?" or "Does this camera have
a warranty?" A pre-trained
LLaMA large language model (LLM) is guided through prompt engineering to generate accurate, product-specific
responses by drawing on information from various sections of a webpage. By
incorporating
Chain-of-Thought (CoT)
and ReAct prompting
techniques, the LLM can break down complex questions step by step and take
proactive actions, such as retrieving the latest prices or checking reviews.
For example, when asked, "What's the battery life, and is it good?" the system
uses CoT to identify relevant details in specifications or reviews and then
applies ReAct to verify and summarize this information. The LLM responses were
evaluated using BLEU, achieving an average score of 0.78, with annotators rating the
responses 8 for factuality, 6.6 for relevance, and 9.2 for grammaticality.
Additionally, InstaFetch visualizes relevant content when users select
descriptions, specifications, or reviews (Figure 6 (4), (5), and (6)).
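Below is a rough sketch of the query path, assuming the extracted sections are already available as the content_model dictionary from the pipeline sketch above. llm_generate() is a placeholder for the pre-trained LLaMA call, and the CoT/ReAct prompt wording is illustrative rather than the paper's exact prompt.

```python
def llm_generate(prompt: str) -> str:
    """Placeholder for the pre-trained LLaMA call (e.g., a locally hosted model)."""
    raise NotImplementedError

def answer_query(question: str, content_model: dict[str, str]) -> str:
    # Build the context from the consolidated item sections (Content Model).
    context = "\n\n".join(f"### {name}\n{text}" for name, text in content_model.items())
    prompt = (
        "You answer questions about a single shopping item using only the context below.\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        # Chain-of-Thought: ask the model to reason step by step before answering.
        "Think step by step: decide which section(s) contain the answer, "
        # ReAct-style structure: interleave reasoning with explicit lookups.
        "then emit lines of the form 'Thought: ...', 'Action: lookup(<section>)', "
        "'Observation: ...', and finish with 'Answer: <concise answer>'."
    )
    raw = llm_generate(prompt)
    # Return only the final answer line so the screen reader output stays short.
    for line in reversed(raw.splitlines()):
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return raw

# Example: answer_query("What's the battery life, and is it good?", content_model)
```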
InstaFetch User Interface Recap
When a user lands on a shopping platform webpage and clicks the options
button (Figure 6 (1)), the InstaFetch overlay popup appears, offering four
functional tabs: 'Query,' 'Description,' 'Specifications,' and 'Reviews.'
These tabs are designed for easy navigation using standard 'TAB' and 'ARROW'
keys (Figure 6 (2)). Users can submit product-related questions via the
'Query' tab, with responses displayed below the form (Figure 6 (3)). In
contrast, the other tabs reflect and display content directly from the
'Details' page (Figure 6 (4), (5), and (6)). InstaFetch is fully compatible
with standard screen reader shortcuts, allowing seamless integration without
additional keyboard shortcuts.
User Study
A total of 14 blind participants were recruited through email lists and
snowball sampling, selected based on their experience with screen readers,
familiarity with the Chrome browser, and English proficiency. The group had a
balanced gender representation (6 female, 8 male) with a mean age of 31.14
years, and none reported additional impairments that would affect task
completion. Table 1 provides detailed demographic information.
Table 1: Demographics of blind participants in the InstaFetch evaluation study. The table illustrates the diverse backgrounds and online shopping behaviors of the participants.
In a within-subject experimental design, participants were asked to complete
an online shopping task under three conditions: using their preferred screen
reader, with the SaIL state-of-the-art solution, and with the InstaFetch
browser extension. Each condition involved browsing a product list on a
different e-commerce website (Amazon, Etsy,
eBay) and selecting a
product that best matched their preferences, simulating real-world shopping
scenarios.
Task Performance Metrics
Table 2: Comparison of performance metrics of Screen Reader, SaIL, and InstaFetch across average time spent, shortcut presses, and items covered per task.
In the study, participants spent the least time per data item using
InstaFetch (182 seconds) compared to SaIL (310 seconds) and screen readers
(478 seconds). InstaFetch also required fewer keyboard shortcut presses (57) and
allowed more unique items to be explored (6.8 items) than SaIL and screen
readers (Table 2). Statistical analysis confirmed that InstaFetch
significantly outperformed screen readers and SaIL in all metrics,
highlighting its effectiveness in improving the user experience (Figure 4).
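The post does not name the specific tests behind this analysis, so the following is only an illustrative sketch assuming a standard non-parametric, within-subject comparison (a Friedman omnibus test followed by pairwise Wilcoxon signed-rank tests); the per-participant arrays are hypothetical variable names.

```python
# Sketch of a non-parametric repeated-measures comparison across the three conditions.
from scipy.stats import friedmanchisquare, wilcoxon

def compare_conditions(time_sr, time_sail, time_if, alpha=0.05):
    """time_sr, time_sail, time_if: per-participant averages (14 values each)."""
    # Omnibus test across the three within-subject conditions.
    stat, p = friedmanchisquare(time_sr, time_sail, time_if)
    results = {"friedman": (stat, p)}
    if p < alpha:
        # Pairwise follow-ups; a Bonferroni correction would divide alpha by 3.
        results["sr_vs_instafetch"] = wilcoxon(time_sr, time_if)
        results["sail_vs_instafetch"] = wilcoxon(time_sail, time_if)
        results["sr_vs_sail"] = wilcoxon(time_sr, time_sail)
    return results
```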
Query-Related Metrics
In the InstaFetch condition, 12 participants used natural language queries,
with some exploring out of curiosity while others were more focused and
refined their questions as they gained confidence. The system's response
accuracy was less than 50%, leading to varied participant reactions. While
most of the 12 participants brushed off incorrect responses, a few were
frustrated. Among them, some rephrased their queries, while others turned to
manual searches, highlighting both the potential and limitations of the
system in handling user queries effectively. Refer to Table 3
for detailed query-related metrics in the InstaFetch study.
SUS and TLX Scores
Table 4: Comparison of SUS (System Usability Scale) and NASA-TLX (Task Load Index) scores across Screen Reader, SaIL, and InstaFetch, showing average scores and standard deviations (SD).
The
System Usability Scale (SUS)
questionnaire assessed usability by having participants rate various Likert
items, with responses aggregated into a single usability score, where higher
scores indicate better usability. InstaFetch received notably higher
usability ratings compared to both the screen reader and SaIL (Table 4). The
NASA Task Load Index (NASA-TLX)
measured perceived workload, with lower scores indicating less effort.
Participants reported a significantly lower workload for InstaFetch compared
to the other two conditions (Table 4). Overall, InstaFetch outperformed both
the screen reader and SaIL in terms of usability and reducing perceived
workload (Figure 5).
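For readers unfamiliar with how the single SUS number is produced, here is the standard SUS aggregation formula (not code from the paper): odd-numbered items are positively worded and contribute (response - 1), even-numbered items are negatively worded and contribute (5 - response), and the 0-40 sum is scaled to a 0-100 score.

```python
def sus_score(responses: list[int]) -> float:
    """Aggregate the ten SUS Likert responses (1-5 each) into a 0-100 usability score."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5  # scales the 0-40 sum to the 0-100 SUS range

# Example: a strongly positive response pattern yields the maximum usability score.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # -> 100.0
```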
Qualitative Feedback
A common frustration reported by almost all participants was the tedious and
time-consuming process of navigating between pages to find the desired
information. Many appreciated InstaFetch for consolidating item information in
one place, reducing the need for revisits, and improving memory retention.
However, participants found navigation within a single webpage cumbersome,
often requiring them to sift through irrelevant content. While SaIL helped
filter out some of this content, InstaFetch was praised for presenting only
item-related information. Some participants suggested adding navigation
support within long item segments, such as reviews, and expressed a desire for
voice-based input and intelligent assistants to streamline interactions.
Additionally, participants reported experiencing interaction fatigue when
shopping online, which led to missing out on good deals. However, they noted
that InstaFetch significantly reduced these burdens, allowing them to consider
more items before deciding.
Conclusion
The traditional distribution of web data across multiple pages is challenging for blind users, leading to frustrating navigation experiences. Moreover, because there is no uniform structure across shopping platforms, familiarity with one site's layout does not carry over to another. InstaFetch, a browser extension designed for blind users, centralizes essential item information—such as product descriptions, specifications, and reviews—into a single, screen reader-friendly interface. It also features a query function that lets users access specific item-related information simply by posing a question, making the browsing experience more efficient and less cumbersome. In a study involving 14 blind participants, InstaFetch significantly outperformed standard screen readers and a state-of-the-art alternative, demonstrating its potential to substantially enhance the online shopping experience for visually impaired users.
References
Prakash, Y., Nayak, A. K., Sunkara, M., Jayarathna, S., Lee, H. N., and Ashok, V., 2024. All in One Place: Ensuring Usable Access to Online Shopping Items for Blind Users. Proceedings of the ACM on Human-Computer Interaction, 8(EICS), pp. 1–25.
- YASH PRAKASH (@LunaticBugbear)