Introduction
Smartphones have become crucial tools for blind users to access information and communicate, and menu selection is an essential task they frequently perform on these devices. A computational model that replicates blind users' menu selection process can significantly contribute to the development and assessment of accessible interfaces, allowing menu designs to be tested at an early stage, before deployment to users. Despite its importance in HCI research, the study of menu selection has primarily centered on sighted users, leaving the experiences of blind users relatively unexplored.
To choose an item from a menu, a blind user first navigates to it using auditory feedback from a screen reader, such as VoiceOver on iOS or TalkBack on Android, and then selects the item with a double tap. Current smartphones, whether running iOS or Android, typically offer three navigation actions, each with its advantages and drawbacks:
- Swiping: Executed by a quick finger flick on the screen, swiping moves the selection focus by one item at a time. While easy to perform, it allows only sequential menu navigation (Figure 1b).
- Gliding: Involves sliding the finger on the screen to explore. This method lets users check items as their finger passes over them but requires continuous finger contact with the screen (Figure 1c).
- Direct Touch: This entails moving the finger through the air and landing it directly on a menu item. This approach provides rapid access to a specific item, but accurately controlling the finger's landing position can be challenging without visual cues (Figure 1d).
A blind user may use any of these three actions, or a combination of them, to locate and select a menu item, depending on the menu's design. A computational model capable of simulating how a blind user selects menu items across various layouts would be a significant advancement: it would deepen our understanding of users' interaction behaviors in this crucial task and serve as an invaluable tool for designing and optimizing menu interfaces.
In this paper, we develop and evaluate a predictive model based on Reinforcement Learning (RL) for simulating the menu selection behavior of blind users (Figure 1). We conceptualize menu selection as a stochastic sequential decision-making problem in which the blind user, acting as an agent, chooses one action (swiping, gliding, direct touch, or selection) at each step until the intended menu item is selected. We specifically model the process as a Partially Observable Markov Decision Process (POMDP), acknowledging that the agent (representing a blind user) has only partial awareness of the environment through auditory feedback rather than visual cues. The agent makes decisions based on its belief state, selecting actions that maximize the total expected reward, which essentially minimizes the time cost. Within this framework, we train an optimal policy for selecting menu items in a given layout using a Deep Q-Network (DQN); this policy replicates the menu selection behaviors of blind users. Finally, we evaluate the model by comparing its simulated selection times and action compositions with empirical data obtained from a user study. The results demonstrate that the proposed model accurately simulates the menu selection behavior of blind users.
Design
The problem of selecting an item from a vertical linear menu using only touch input and auditory feedback can be described as follows: Given a menu M consisting of N items, represented as M = {m1, m2, ..., mN}, and a target menu item mt, the objective is to quickly and accurately select mt from M. This selection should be made using various actions such as swiping (as illustrated in Figure 1b), gliding (Figure 1c), direct touch (Figure 1d), and the selection action itself (Figure 1e).
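To make this setup concrete, the sketch below encodes a menu M, a target item, and the four action types as simple Python structures. The names and the example menu are illustrative and are not taken from the paper.

```python
from enum import Enum, auto

class Action(Enum):
    """The four action types available to the agent (names are illustrative)."""
    SWIPE = auto()         # flick to move the focus by one item
    GLIDE = auto()         # slide the finger to explore items under it
    DIRECT_TOUCH = auto()  # land the finger directly on an estimated position
    SELECT = auto()        # double-tap to select the focused item

# A menu M = {m1, ..., mN} and a target item mt (here given by its 0-based index).
menu = ["Contacts", "Messages", "Camera", "Settings", "Photos", "Clock"]
target_index = 3  # the task: select "Settings" quickly and accurately
```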
Figure 2 (Li et al.): Quantifying the time cost (MT) of each action in the menu selection process. The modeling is based on empirical data gathered from a user study on menu selection.
We model blind users' menu selection as a Partially Observable Markov Decision Process (POMDP) (Figure 2). The model's state comprises the target item's position, the finger's position, and the focused item's position in the menu. Because the target item's position is not directly observable, the agent observes only the finger and focus positions relative to the phone screen. A memory model updates the agent's belief about the target item's position based on the interaction history, yielding a probability vector that indicates the likelihood of each menu item being the target. The available actions are swiping, gliding, direct touch, and selection, each with different characteristics and a modeled time cost. For swiping and gliding, the focus shifts to the intended position; for direct touch, the actual landing position is modeled with a Gaussian distribution to capture its uncertainty. The reward for each action is the negative of its time cost, and an additional reward for a correct selection guides the agent's behavior.
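The following is a minimal sketch of how one transition of such an environment might be implemented, assuming placeholder time costs and a hypothetical standard deviation for the direct-touch landing error; the paper derives its actual parameters from the user study.

```python
import numpy as np

# Illustrative per-action time costs in seconds; the paper derives its actual
# values empirically from the user study, so treat these numbers as placeholders.
TIME_COST = {"swipe": 0.5, "glide": 1.2, "direct_touch": 0.8, "select": 0.4}
SIGMA_LANDING = 1.0  # assumed std. dev. of the direct-touch landing error, in items

def step(state, action, target_index, n_items, rng=np.random.default_rng()):
    """One simulated transition: returns (next_state, reward, done).

    `state` is a dict holding the currently focused item ("focus") and the
    position the agent intends to reach ("intended"), derived from its belief;
    both keys are hypothetical names used only for this sketch."""
    focus = state["focus"]
    done = False
    reward = -TIME_COST[action]               # reward = negative time cost of the action

    if action == "swipe":                     # focus moves one item at a time
        focus = min(focus + 1, n_items - 1)
    elif action == "glide":                   # finger slides until the intended item is reached
        focus = state["intended"]
    elif action == "direct_touch":            # landing position is uncertain: Gaussian noise
        landing = rng.normal(state["intended"], SIGMA_LANDING)
        focus = int(np.clip(round(landing), 0, n_items - 1))
    elif action == "select":                  # double tap on the focused item ends the episode
        done = True
        if focus == target_index:
            reward += 10.0                    # extra reward for a correct selection (assumed magnitude)

    return {**state, "focus": focus}, reward, done
```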
Selection correctness determines the final reward, with correct selections earning a higher reward. We train an optimal policy for menu selection using a Deep Q-Network (DQN), taking into account the menu layout's effect on memory and the uncertainty of actions, particularly direct touch. This comprehensive approach aims to simulate blind users' menu navigation strategies effectively.
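A compact DQN update over the agent's observation/belief vector might look like the sketch below; the network architecture, hyperparameters, and replay-buffer handling are illustrative assumptions rather than the paper's implementation.

```python
import random
from collections import deque
import torch
import torch.nn as nn

# Replay buffer of (obs_tensor, action_int, reward_float, next_obs_tensor, done_bool),
# filled by interacting with the simulated menu environment.
replay = deque(maxlen=10_000)

class QNet(nn.Module):
    """Small Q-network over the observation/belief vector; layer sizes are illustrative."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def train_step(qnet, target_net, optimizer, replay, batch_size=32, gamma=0.99):
    """One DQN update: regress Q(s, a) toward r + gamma * max_a' Q_target(s', a')."""
    if len(replay) < batch_size:
        return
    obs, act, rew, nxt, done = zip(*random.sample(replay, batch_size))
    obs, nxt = torch.stack(obs), torch.stack(nxt)
    act = torch.tensor(act)
    rew = torch.tensor(rew, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)

    q = qnet(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rew + gamma * (1 - done) * target_net(nxt).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```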
Note 1: The memory model (position memory) plays a crucial role: the recalled position of an item is assumed to follow a Gaussian distribution, becoming more accurate with each visit. The position memory is represented as a probability distribution P = {p1, p2, ..., pN}, where N is the length of the menu and pi is the normalized probability, computed from the Gaussian distribution, that the target menu item is at the i-th position.
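A minimal sketch of such a position memory is shown below, assuming an illustrative base standard deviation and a simple rule in which the deviation shrinks with the number of visits; the paper's exact parameterization may differ.

```python
import numpy as np

def position_memory(recalled_pos, visits, n_items, base_sigma=2.0):
    """Belief over which slot holds the target: a Gaussian around the recalled
    position, normalized over the N menu items. The standard deviation shrinks
    as the item is visited more often, i.e., memory becomes more accurate.
    `base_sigma` and the shrinking rule are illustrative assumptions."""
    sigma = base_sigma / np.sqrt(max(visits, 1))
    positions = np.arange(n_items)
    weights = np.exp(-0.5 * ((positions - recalled_pos) / sigma) ** 2)
    return weights / weights.sum()  # p_i: probability that the target is at position i

# Example: a 10-item menu, target recalled near slot 4, visited 3 times.
print(position_memory(recalled_pos=4, visits=3, n_items=10))
```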
User Study
We conducted a user study on how blind users navigate linear menus. The study had two primary objectives: first, to empirically establish the parameter values for the proposed model, such as the duration of a swiping action; these parameters are essential for the model to accurately simulate interaction behaviors through reinforcement learning. Second, the study aimed to assess the model's effectiveness by comparing its generated interaction behaviors with those observed in real users and evaluating whether the model could accurately predict users' menu selection times and actions.
We recruited ten legally blind participants, four female and six male, aged between 34 and 60. These participants were selected specifically because they cannot rely on visual feedback in menu selection tasks. Detailed demographic information about these participants is provided in Table 1.
Results
- Selection Time: Analysis revealed longer selection times for 10-item menus compared to 6-item menus. Among 10-item menus, the grouped layout had the shortest selection time. A linear mixed-effects model showed significant effects of menu length, arrangement, and their interaction on selection time (Figure 7a); see the fitting sketch after this list.
- Selection Accuracy: The overall accuracy of menu selection was high at 96.02%. Statistical analysis indicated no significant effect of menu length or arrangement on selection accuracy (Figure 4).
- Actions Analysis: The study recorded 13,856 actions (swiping, gliding, direct touch, and selection). Analysis of action sequences showed that direct touch usually preceded gliding, indicating a specific navigation pattern. The action composition (Figure 6) was consistent across conditions, but individual preferences varied among participants. Based on these preferences, participants were categorized into two subgroups for model evaluation, considering different training strategies.
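As referenced in the Selection Time item above, a linear mixed-effects analysis of this kind can be sketched as follows, using hypothetical column names and toy data with a random intercept per participant; the study's actual coding and software may differ.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical long-format trial data: one row per selection trial.
trials = pd.DataFrame({
    "participant": np.repeat([f"P{i}" for i in range(1, 7)], 4),
    "menu_length": np.tile([6, 6, 10, 10], 6),
    "arrangement": np.tile(["alphabetical", "grouped"], 12),
})
# Toy selection times: longer menus take longer, plus noise (illustrative only).
trials["selection_time"] = (
    trials["menu_length"].map({6: 3.2, 10: 4.5})
    + rng.normal(0, 0.4, size=len(trials))
).round(2)

# Selection time as a function of menu length, arrangement, and their
# interaction, with a random intercept per participant.
model = smf.mixedlm(
    "selection_time ~ C(menu_length) * C(arrangement)",
    data=trials,
    groups=trials["participant"],
)
print(model.fit().summary())
```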
Model Implementation
Table 4 (Li et al.): Empirical data on the variability of movement time (MT) for performing gliding and direct touch actions over different distances.
Simulated vs. Observed Action Composition: The action compositions simulated by the models closely matched those observed in the user study, with minor discrepancies in the exact numbers of actions. The models predicted 100% selection accuracy, slightly higher than the observed 96.02%, indicating a small gap in capturing certain user errors (Figure 9).
Figure 10 (Li et al.): Observed selection strategies from the user study (blue) contrasted with the predictions of the two-subgroup model (green).
Train-Test Split Evaluation
To assess the model's generalizability, we conducted a train-test split evaluation, using 80% of the data for training and reserving 20% for testing. This approach ensured that the testing data were completely unseen by the model. Two types of validations were performed: leave-trial-out and leave-user-out.
Leave-trial-out Validation: In this method, 80% of trials were randomly chosen for training, and the remaining 20% were used for testing. This process was repeated 10 times. Each iteration simulated 6 conditions × 60 trials for each of the 10 users. The mean selection time in the testing dataset was compared with that predicted by the one-group model (Figure 11). The results demonstrated the model's ability to predict the effect of menu length on selection time, with a mean absolute error (MAE) of 0.58 seconds and a mean absolute percentage error (MAPE) of 9.71%. As depicted in Figure 12, the examination of action composition further demonstrated the model's capability to accurately predict the distribution of different actions.
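For reference, the two reported error metrics can be computed as sketched below; the per-condition selection times shown are illustrative placeholders, not the study's data.

```python
import numpy as np

def mae_mape(observed, predicted):
    """Mean absolute error (seconds) and mean absolute percentage error (%)
    between observed and model-predicted mean selection times per condition."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mae = np.mean(np.abs(observed - predicted))
    mape = np.mean(np.abs(observed - predicted) / observed) * 100
    return mae, mape

# Illustrative per-condition mean selection times (seconds), not the study's data.
observed = [3.2, 4.1, 4.8, 3.0, 4.3, 4.6]
predicted = [3.5, 4.4, 4.4, 3.3, 4.0, 5.1]
print(mae_mape(observed, predicted))  # -> (MAE in seconds, MAPE in %)
```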
Discussion
While extensive research has focused on how sighted users perform menu selection, there is a gap in knowledge regarding blind users' menu selection behaviors. This study introduces a computational model that simulates the menu selection process of blind users. The findings emphasize the importance of understanding how these users recall the positions of menu items, which is crucial in the context of non-visual feedback. Additionally, this research expands upon previous studies of blind users by incorporating a range of actions, namely swiping, gliding, direct touch, and selection, into the model rather than limiting it to a single action.
The model successfully captures several effects observed in user studies, such as longer selection times for extended menus, decreasing selection times with practice, varied action compositions, and the users' menu selection strategies. Notably, it outperforms input action model-based methods like Fitts' law in predicting selection times, and uniquely, it also predicts menu selection strategies and action compositions.
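For context, a Fitts' law baseline of the kind mentioned here predicts movement time from target distance and width alone; the sketch below uses the Shannon formulation with illustrative coefficients, since baseline methods fit these from pointing data.

```python
import math

def fitts_mt(distance, width, a=0.2, b=0.15):
    """Fitts' law (Shannon formulation): MT = a + b * log2(D / W + 1).
    The coefficients a and b here are illustrative; baseline methods fit them
    from empirical pointing data. MT is in seconds; D and W share a length unit."""
    return a + b * math.log2(distance / width + 1)

# Example: reaching an item 400 px away whose touch target is 80 px tall.
print(round(fitts_mt(distance=400, width=80), 3))
```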
The model holds significant applications in interface design, allowing for the simulation of user behavior and thus aiding in interface optimization and evaluation. For instance, it can quantitatively assess menu performance for both novice and expert users, as shown in Figure 15, which illustrates the predicted selection times across different menu lengths and practice blocks.
Figure 15 (Li et al.): The model's predictions of average selection times for menus with varying item counts, arranged alphabetically, across different blocks of trials.
However, the model also has limitations and areas for future exploration. While it effectively uses the mixture pointing model for gliding actions, applying Fitts' law for direct touch actions has room for improvement. Future research could explore adjustments in model assumptions, such as the probability distributions for parameters like the focused menu item position, to enhance its accuracy and applicability.
Conclusion
- We developed a computational model simulating blind users' interactions with linear menus using screen readers, focusing on actions like swiping, gliding, and direct touch. The model, grounded in boundedly optimal control theory, treats menu selection as a stochastic sequential decision problem under Partially Observable Markov Decision Processes, factoring in auditory feedback and memory constraints.
- Trained using a Deep Q-Network in a simulated environment, the model's predictions were validated against empirical data from a user study with ten legally blind participants. It accurately represented the impact of menu length and arrangement on selection time and the proportion of different user actions.
- This research is a valuable resource for designing accessible interfaces, offering insights into blind users' menu selection behaviors, and aiding in developing optimized, user-friendly designs for visually impaired users.