Channel: Web Science and Digital Libraries Research Group

2023-12-29: Paper Summary: "Modeling Touch-based Menu Selection Performance of Blind Users via Reinforcement Learning"


The ACM CHI Conference on Human Factors in Computing Systems, pronounced 'kai,' is the foremost global conference in Human-Computer Interaction (HCI). This annual event unites a diverse array of researchers and practitioners from around the globe, encompassing a variety of cultures, backgrounds, and perspectives. Their common objective is to enhance the world by developing and applying interactive digital technologies. In this blog post, I explore the research paper co-authored by Dr. Vikas Ashok, titled "Modeling Touch-based Menu Selection Performance of Blind Users via Reinforcement Learning," published in Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems.
In this study, we developed a computational model to simulate the menu selection methods of blind users, incorporating techniques such as swiping, gliding, and direct touch. A vital feature of this model is its emulation of long-term memory, predicting users' recall or forgetfulness of menu item positions influenced by their previous menu interactions. To validate the model's accuracy, its predictions were compared with data from an empirical study of ten blind users. The model successfully mirrored several factors, including how menu length and layout affected selection time, the nature of actions performed, and the general strategies used by the participants in menu selection.

A video presentation was prepared and recorded before the CHI 2023 conference. This format allows presenters to share their content digitally, facilitating broader accessibility and convenience, and the pre-recorded video ensures a polished and concise delivery of the material for conference attendees.

Introduction

Smartphones have become crucial for blind users for information access and communication, with menu selection being an essential task they frequently perform on these devices. A computational model that replicates blind users' menu selection process can significantly contribute to the development and assessment of accessible interfaces, allowing for early-stage testing of menu designs before user deployment. Despite its importance in HCI research, the study of menu selection has primarily centered on sighted users, leaving the experiences of blind users relatively unexplored.

Figure 1 Li et al.: We developed a computational model using a reinforcement learning (RL) agent to simulate non-sighted users' menu selection strategies using screen readers. This model partially perceives the menu through auditory feedback and then employs a memory model that accounts for long-term interaction history to track and update menu item positions. Subsequently, the agent uses a Deep-Q Network to associate the memory-based position beliefs with the optimal action, such as swiping, gliding, direct touch, or selection. The goal is to minimize the selection time, thereby optimizing the menu navigation experience for non-sighted users.


To choose an item from a menu, a blind user initially navigates to it using auditory feedback from a screen reader, such as VoiceOver on iPhone or TalkBack on Android, then selects the item with a double tap. Current smartphones, like iPhone and Android, typically offer three navigation actions, each with its advantages and drawbacks:

  1. Swiping: Executed by a quick finger flick on the screen, swiping moves the selection focus by one item at a time. While easy to perform, it allows only sequential menu navigation (Figure 1b).
  2. Gliding: Involves sliding the finger on the screen to explore. This method lets users check items as their finger passes over them but requires continuous finger contact with the screen (Figure 1c).
  3. Direct Touch: This entails moving the finger through the air and then landing it on a menu item. This approach provides rapid access to a specific item, but accurately controlling the finger's landing position can be challenging without visual cues (Figure 1d).

A blind user may use any of these three actions, or a combination, to locate and select a menu item, depending on the menu's design. Possessing a computational model capable of simulating how a blind user selects menu items on various layouts would be a significant advancement. Not only would it deepen our understanding of users' interaction behaviors in this crucial task, but it would also serve as an invaluable tool in designing and optimizing menu interfaces.


In this paper, we develop and evaluate a predictive model based on Reinforcement Learning (RL) for simulating the menu selection behavior of blind users (Figure 1). We conceptualize menu selection as a stochastic sequential decision-making problem, where the blind user, functioning as an agent, chooses one action (like swiping, gliding, direct touch, or selection) at each step until the intended menu item is selected. We specifically model the process as a Partially Observable Markov Decision Process (POMDP), acknowledging that the agent (representing a blind user) has only partial environmental awareness through auditory feedback instead of visual cues. The agent makes decisions based on a belief system, selecting actions that maximize total expected rewards, essentially minimizing the time cost. Within this framework, we train an optimal policy for menu item selection in a given layout using a Deep Q-network (DQN). This optimal policy effectively replicates the menu selection behaviors of blind users. Finally, we evaluated the model by comparing its simulated selection times and action compositions with empirical data obtained from a user study. The results of this evaluation demonstrate that the proposed model is effective in accurately simulating the menu selection behavior of blind users.


Design


The problem of selecting an item from a vertical linear menu using only touch input and auditory feedback can be described as follows: Given a menu M consisting of N items, represented as M = {m1, m2, ..., mN}, and a target menu item mt, the objective is to quickly and accurately select mt from M. This selection should be made using various actions such as swiping (as illustrated in Figure 1b), gliding (Figure 1c), direct touch (Figure 1d), and the selection action itself (Figure 1e).

Figure 2 Li et al.: In this figure we quantify the time cost (MT) of each action in the menu selection process. The modeling is based on empirical data gathered from a user study on menu selection.

We model blind users' menu selection as a Partially Observable Markov Decision Process (POMDP) (Figure 2). The model's states include the target item's position, the finger's position, and the focus item's position in the menu. Since the target item position is not directly observable, the agent estimates the finger and focus positions relative to the phone screen. A memory model updates beliefs about the target item's position based on interaction history. This leads to a probability vector indicating the likelihood of each menu item being the target. Actions include swiping, gliding, direct touch, and selection, each with different characteristics and modeled time costs. For swiping and gliding, the focus shifts to the target position; for direct touch, the actual landing position is modeled with a Gaussian distribution due to uncertainty. The reward for each action is the negative value of its time cost, and for correct selections, additional rewards are assigned to guide the agent's behavior.

Selection correctness affects rewards, with correct selections earning higher rewards. We train an optimal policy for menu selection using a Deep Q-Network (DQN), considering the menu layout's impact on memory and action uncertainty, especially in direct touch, which is modeled with a Gaussian distribution to account for landing item uncertainty. This comprehensive approach aims to simulate blind users' menu navigation strategies effectively.
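To make this framing concrete, here is a minimal, runnable sketch of the decision process: a toy menu environment with the four actions and negative time-cost rewards, trained with tabular Q-learning as a simplified stand-in for the paper's DQN. The time costs, landing-noise model, and reward values below are illustrative assumptions, not the study's fitted parameters.

```python
import random

N_ITEMS = 6                      # the 6-item menu condition from the study
ACTIONS = ["swipe", "glide", "direct_touch", "select"]
# Illustrative per-action time costs in seconds (assumed, not fitted values)
COST = {"swipe": 0.4, "glide": 0.6, "direct_touch": 0.5, "select": 0.3}

def step(focus, target, action, rng):
    """Apply one action; return (new_focus, reward, done)."""
    if action == "swipe":                    # move focus one item down (wrap)
        return (focus + 1) % N_ITEMS, -COST["swipe"], False
    if action == "glide":                    # simplified: gliding ends on the target
        return target, -COST["glide"], False
    if action == "direct_touch":             # noisy landing around the target
        land = min(N_ITEMS - 1, max(0, target + rng.choice([-1, 0, 0, 1])))
        return land, -COST["direct_touch"], False
    # select: episode ends; bonus reward only if the focus is on the target
    return focus, (5.0 if focus == target else -1.0) - COST["select"], True

def train(episodes=3000, alpha=0.2, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning over (focus, target) states, standing in for the DQN."""
    rng = random.Random(seed)
    Q = {(f, t): [0.0] * len(ACTIONS)
         for f in range(N_ITEMS) for t in range(N_ITEMS)}
    for _ in range(episodes):
        target, focus = rng.randrange(N_ITEMS), 0
        done = False
        while not done:
            s = (focus, target)
            a = (rng.randrange(len(ACTIONS)) if rng.random() < eps
                 else max(range(len(ACTIONS)), key=lambda i: Q[s][i]))
            nxt, r, done = step(focus, target, ACTIONS[a], rng)
            best_next = 0.0 if done else max(Q[(nxt, target)])
            Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
            focus = nxt
    return Q

Q = train()
# With the focus already on the target, the learned policy typically prefers "select"
best = max(range(len(ACTIONS)), key=lambda i: Q[(3, 3)][i])
print(ACTIONS[best])
```

The paper's actual model additionally conditions on the memory-based belief over item positions; this sketch collapses that belief into a fully observed target for brevity.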

Note 1: The memory model (position memory) plays a crucial role: the recalled position of an item is assumed to follow a Gaussian distribution, becoming more accurate with each visit. The position memory is represented as a probability vector P = {p1, p2, ..., pN}, where N is the length of the menu and pi is the normalized probability that the target menu item is at the ith position, calculated using a Gaussian distribution.
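The position memory in Note 1 can be sketched as a discretized Gaussian over the menu slots that sharpens with repeated visits. The 1/sqrt(visits) schedule below is an illustrative assumption, not the paper's fitted memory model.

```python
import math

def position_belief(n_items, recalled_pos, visits, sigma0=2.0):
    """Discretized Gaussian belief over menu slots.

    The recalled position is Gaussian-distributed around the remembered
    slot; its spread shrinks as the item is visited more often (the
    1/sqrt schedule is illustrative, not the paper's fitted model).
    """
    sigma = sigma0 / math.sqrt(visits)           # more visits -> sharper recall
    weights = [math.exp(-((i - recalled_pos) ** 2) / (2 * sigma ** 2))
               for i in range(n_items)]
    total = sum(weights)
    return [w / total for w in weights]          # normalized probabilities p1..pN

belief_first = position_belief(10, recalled_pos=4, visits=1)
belief_later = position_belief(10, recalled_pos=4, visits=9)
print(max(belief_first), max(belief_later))      # the peak grows as memory sharpens
```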

User Study

We conducted a user study on how blind users navigate linear menus. The study had two primary objectives: first, to empirically establish the parameter values for the proposed model, such as the duration of a swiping action. These parameters are essential for the model to accurately simulate interaction behaviors through reinforcement learning. Second, the study aimed to assess the model's effectiveness by comparing its generated interaction behaviors with those observed in real users and evaluating whether the model could accurately predict users' menu selection times and actions.

We engaged ten legally blind participants, four female and six male individuals aged between 34 and 60. These participants were selected specifically for their inability to use visual feedback in menu selection tasks. Detailed demographic information about these participants is provided in Table 1.

Table 1 Li et al.: Demographic information of the study participants, all of whom are legally blind and regular phone users. The label "NA" indicates that a participant chose not to disclose that particular piece of information.

Figure 3 Li et al.: Participant comfortably seated at a table, holding a phone in one hand and interacting with it using the other. The setup was used to confirm participants' familiarity with the system-supported actions – swiping, gliding, direct touch, and selection – as demonstrated in Fig. 1, through several trial runs in a warm-up session.

We implemented a 2×3 within-subject design for the study. The design included two independent variables: (1) menu length, with two levels - a 6-item and a 10-item linear menu, and (2) menu arrangement, with three levels - Alphabetic, Grouped, and Random.

Result


Figure 4 Li et al.: This figure displays the mean selection accuracy across different conditions. The overall selection accuracy in the study was remarkably high at 96.02%, with 3523 out of 3669 selections being successfully executed.

In the study, we focused on analyzing the selection time and actions used by participants during menu selection. Selection time was defined as the duration from the start of a trial to the selection of the target menu item.

  1. Selection Time: Analysis revealed longer selection times for 10-item menus compared to 6-item menus. Among 10-item menus, the grouped layout had the shortest selection time. A linear mixed-effects model showed significant effects of menu length and arrangement and their interaction on selection time (Figure 7a).
  2. Selection Accuracy: The overall accuracy of menu selection was high at 96.02%. Statistical analysis indicated no significant effect of menu length or arrangement on selection accuracy (Figure 4).
  3. Actions Analysis: The study recorded 13,856 actions (swiping, gliding, direct touch, and selection). Analysis of action sequences showed that direct touch usually preceded gliding, indicating a specific navigation pattern. The action composition (Figure 6) was consistent across conditions, but individual preferences varied among participants. Based on these preferences, participants were categorized into two subgroups for model evaluation, considering different training strategies.
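The action-sequence analysis behind Figure 5 amounts to counting bigrams of consecutive actions, with a 'BoT' marker for the beginning of each trial. A minimal sketch, using hypothetical logged trials rather than the study's data:

```python
from collections import Counter

def transition_matrix(trials):
    """Count how often each action follows another, with 'BoT' marking
    the beginning of a trial (the structure of Figure 5)."""
    counts = Counter()
    for actions in trials:
        prev = "BoT"
        for a in actions:
            counts[(prev, a)] += 1
            prev = a
    return counts

# Hypothetical logged trials, not the study's data
trials = [["direct_touch", "glide", "select"],
          ["swipe", "swipe", "select"],
          ["direct_touch", "glide", "glide", "select"]]
m = transition_matrix(trials)
print(m[("direct_touch", "glide")])   # direct touch preceding gliding -> 2
```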
Figure 5 Li et al.:  This matrix displays the sequence of actions, with the action on the Y-axis occurring before the one on the X-axis. The numbers represent the frequency of each sequence, and 'BoT' stands for 'Beginning of Trial'.

Figure 6 Li et al.: This figure demonstrates that each participant employed distinct menu selection strategies. Notably, participants 2, 6, and 7 (indicated by an asterisk) did not utilize the swiping action, while the others engaged in all four actions.

Model Implementation

In the methodology, we employed a two-phase process to determine the model's parameters (Table 2). Initially, we configured the parameters for input actions and memory models, aligning them with empirical data from the user study. Subsequently, we trained the reinforcement learning policy within a simulated menu environment that resembles an actual menu, allowing the model to learn through experimentation.

Table 2 Li et al.: This table details the parameters used in the models. 'MT' denotes the movement time for each action, modeled in the Fitts' law form MT = a + b·log2(D/W + 1), where D is the absolute moving distance and W, set at 1 cm, represents the menu item height. 'R²' measures the model's fit, indicating the coefficient of determination. 'AP' signifies the standard deviation, derived empirically from the user study, with further details in Table 4.

Table 4 Li et al.: This table presents empirical data on the variability of time taken (MT) for performing gliding and direct touch actions over different distances.

We consolidated the optimized parameters obtained from all user data for the input actions and memory model in Table 2. Additionally, we provided a summary of the standard deviations for the gliding and direct touch actions in Table 4.
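The Fitts-style movement-time formula from Table 2 can be sketched as follows. The intercept a and slope b below are illustrative placeholders, not the paper's fitted coefficients; only W = 1 cm (the menu item height) comes from the table.

```python
import math

def movement_time(d_cm, w_cm=1.0, a=0.2, b=0.3):
    """Fitts-style movement time: MT = a + b * log2(D/W + 1).

    W = 1 cm is the menu item height from Table 2; a and b are
    illustrative placeholders, not the study's fitted values.
    """
    return a + b * math.log2(d_cm / w_cm + 1)

for d in (1, 3, 7):                      # moving 1, 3, and 7 item heights
    print(f"D={d} cm -> MT={movement_time(d):.2f} s")
```

Longer movements cost more time, but only logarithmically, which is why direct touch can beat item-by-item swiping on long menus despite its landing uncertainty.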

Model Evaluation

To evaluate the reinforcement learning framework for simulating blind users' menu selection behavior, we compared the model's simulated behaviors and performance predictions with actual user study data. We performed rigorous tests, splitting the user data into training and testing sets and conducting leave-trial-out and leave-user-out validations, ensuring the model's generalizability. 

Simulating Menu Selection Behaviors: Simulations were run using one-group and two-subgroup models. For the one-group model, simulations replicated the user study's 10 users, each undergoing 6 conditions × 60 trials. For the two-subgroup model, subgroup A's model was applied to 7 users and subgroup B's to 3 users, each with 6 conditions × 60 trials. This matched the distribution of users in the study.

Simulated vs. Observed Selection Time: Figures 7b and 7c in the study display the simulated selection times for the one-group and two-subgroup models, showing mean absolute errors of 0.61 seconds and 0.39 seconds, respectively. These results demonstrate the model's accuracy in simulating the effect of menu length on selection times, with longer times for 10-item menus. Additionally, the model accurately reflected the impact of menu arrangements on selection times, particularly for 10-item menus. The study also observed a learning effect in both models, where selection times decreased as participants gained more experience, aligning with trends observed in the user study (Figure 8). 
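The reported errors are the mean absolute error (MAE) and, later, the mean absolute percentage error (MAPE) between simulated and observed per-condition mean selection times. A small sketch with hypothetical numbers (not the study's data):

```python
def mae(pred, obs):
    """Mean absolute error between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def mape(pred, obs):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(p - o) / o for p, o in zip(pred, obs)) / len(obs)

# Hypothetical per-condition mean selection times in seconds, not the study's data
observed  = [3.1, 3.4, 3.0, 4.8, 4.2, 5.0]
simulated = [3.4, 3.1, 3.2, 4.5, 4.6, 4.7]
print(f"MAE={mae(simulated, observed):.2f} s, "
      f"MAPE={mape(simulated, observed):.1f}%")
```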

 Figure 7 Li et al.: This figure compares the mean selection times as simulated by the one-group and two-subgroup models with those observed in the user study. The mean absolute error (MAE) between the model simulations and actual user study data is 0.61 seconds for the one-group model and 0.39 seconds for the two-subgroup model.
Figure 8 Li et al.: This figure displays the mean selection times divided into six blocks, with each block consisting of 10 trials out of the total 60. The division into blocks facilitates a detailed analysis of how selection times evolved over the course of the trials.

Comparison with Input Action Model-based Methods: We compared the reinforcement learning (RL)-based model with traditional input action models, namely Fitts' law, gliding, and swiping models. As summarized in Table 3, the RL-based model outperformed all three input action model-based methods in accurately modeling menu selection times across all trials. A detailed examination of the results, particularly in Figure 16, revealed that the input action model-based methods failed to capture the menu arrangement's impact on selection time.
Table 3 Li et al.: This table presents a comparison between the reinforcement learning (RL)-based model and three input action model-based methods, showcasing their effectiveness in predicting menu selection times.
Figure 16 Li et al.: This figure illustrates the predicted mean selection times for each condition according to input action model-based methods. Dashed bars represent the observed mean selection times from the user study for comparison.

Simulated vs. Observed Action Composition: The action compositions simulated by the models closely matched those observed in the user study, with minor discrepancies in the exact numbers of actions. The models predicted 100% selection accuracy, slightly higher than the observed 96.02%, indicating a small gap in capturing certain user errors (Figure 9). 
Figure 9 Li et al.: This figure displays the distribution of different actions across conditions, as simulated by the model. It compares these simulations with actual user study data, noting a mean absolute error (MAE) of 3.78% for the one-group model and 3.89% for the two-subgroup model.

Simulated vs. Observed Menu Selection Strategy: The models successfully predicted the observed menu selection strategies in the user study (Figure 10). Strategies included swiping only, gliding only, and a combination of both. The two-subgroup model's predictions were particularly close to the observed data, affirming the model's accuracy in simulating different user behaviors and preferences.

Figure 10 Li et al.: This figure contrasts the observed selection strategies in the user study (shown in blue) with the predictions made by the two-subgroup model (illustrated in green).

Train-Test Split Evaluation

To assess the model's generalizability, we conducted a train-test split evaluation, using 80% of the data for training and reserving 20% for testing. This approach ensured that the testing data were completely unseen by the model. Two types of validations were performed: leave-trial-out and leave-user-out.
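The leave-user-out protocol described below can be sketched as repeated random holdouts of whole users, so that test users' trials are never seen during training. The helper name and seed are illustrative, not from the paper.

```python
import random

def leave_user_out_splits(user_ids, n_test=2, repeats=10, seed=0):
    """Generate train/test user splits: each repeat holds out n_test users
    whose trials are entirely unseen during training."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        test = rng.sample(user_ids, n_test)
        train = [u for u in user_ids if u not in test]
        splits.append((train, test))
    return splits

splits = leave_user_out_splits(list(range(1, 11)))   # the 10 study participants
train, test = splits[0]
print(len(train), len(test))    # → 8 2
```

Leave-trial-out works the same way at the trial level: 80% of trials are sampled for training and the remaining 20% held out, repeated 10 times.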

Leave-trial-out Validation: In this method, 80% of trials were randomly chosen for training, and the remaining 20% were used for testing. This process was repeated 10 times. Each iteration simulated 6 conditions × 60 trials for each of the 10 users. The averaged mean selection time in the testing dataset was compared to that predicted by the one-group model (Figure 11). The results demonstrated the model's ability to predict the effects of menu length on selection time, with a mean absolute error (MAE) of 0.58 seconds and a mean absolute percentage error (MAPE) of 9.71%. As depicted in Figure 12, the examination of action composition further demonstrated the model's capability to predict the distribution of different actions accurately.

Figure 11 Li et al.: This figure displays the aggregated average selection times for each condition from ten iterations of the leave-trial-out validation, highlighting the mean MAE (Standard Deviation) across these repetitions as 0.58 seconds (±0.09 seconds).

Figure 12 Li et al.: This figure presents the consolidated average distribution of actions for each condition, based on ten repetitions of the leave-trial-out validation. It notes a mean MAE (Standard Deviation) of 5.32% (±1.00%) across these repetitions.

Leave-user-out Validation: Here, data from 8 randomly chosen users were used for training, and data from the remaining 2 users were used for testing. This procedure was also repeated 10 times. The one-group model's predictions were compared to the testing dataset (Figure 13). The analysis showed that the model accurately captured the effect of menu length on selection time, though the prediction error was higher than in the leave-trial-out validation, likely due to variations in individual user behavior.

Figure 13 Li et al.: This figure illustrates the aggregated mean selection times for each condition, derived from ten iterations of the leave-user-out validation, with the mean MAE (Standard Deviation) across these iterations reported as 0.93 seconds (±0.54 seconds).

Figure 14 Li et al.: This figure shows the combined average frequencies of different actions for each condition, based on ten cycles of the leave-user-out validation. It highlights a mean MAE (Standard Deviation) of 9.04% (±2.38%) across these repetitions.

Overall, these evaluations confirmed the model's ability to predict selection times, effects of menu length, and action composition. The leave-trial-out validation indicated close accuracy to results presented in previous sections, while the leave-user-out validation demonstrated effective capture of the effect of menu length, albeit with higher prediction error.

Discussion

While extensive research has focused on how sighted users navigate menu selection, there's a gap in knowledge regarding blind users' menu selection behaviors. This study introduces a computational model that simulates menu selection processes by blind users. The findings emphasize the importance of understanding how these users recall the positions of menu items, crucial in the context of non-visual feedback. Additionally, this research expands upon previous studies about blind users by incorporating a range of actions like swiping, gliding, direct touch, and selection into the model, rather than limiting it to a single action.

The model successfully captures several effects observed in user studies, such as longer selection times for extended menus, decreasing selection times with practice, varied action compositions, and the users' menu selection strategies. Notably, it outperforms input action model-based methods like Fitts' law in predicting selection times, and uniquely, it also predicts menu selection strategies and action compositions.

The model holds significant applications in interface design, allowing for the simulation of user behavior and thus aiding in interface optimization and evaluation. For instance, it can quantitatively assess menu performance for both novice and expert users, as shown in Figure 15, which illustrates the predicted selection times across different menu lengths and practice blocks.

Figure 15 Li et al.: This figure illustrates the model's predictions of the average selection times for menus with varying item counts, organized alphabetically, across different blocks of trials.

However, the model also has limitations and areas for future exploration. While it effectively uses the mixture pointing model for gliding actions, applying Fitts' law for direct touch actions has room for improvement. Future research could explore adjustments in model assumptions, such as the probability distributions for parameters like the focused menu item position, to enhance its accuracy and applicability.


Conclusion


  • We developed a computational model simulating blind users' interactions with linear menus using screen readers, focusing on actions like swiping, gliding, and direct touch. The model, grounded in boundedly optimal control theory, treats menu selection as a stochastic sequential decision problem under Partially Observable Markov Decision Processes, factoring in auditory feedback and memory constraints.
  • Trained using a Deep Q-Network in a simulated environment, the model's predictions were validated against empirical data from a user study with ten legally blind participants. It accurately represented the impact of menu length and arrangement on selection time and the proportion of different user actions.
  • This research is a valuable resource for designing accessible interfaces, offering insights into blind users' menu selection behaviors, and aiding in developing optimized, user-friendly designs for visually impaired users.

References

Li, Z., Ko, Y.J., Putkonen, A., Feiz, S., Ashok, V., Ramakrishnan, I.V., Oulasvirta, A. and Bi, X., 2023, April. Modeling Touch-based Menu Selection Performance of Blind Users via Reinforcement Learning. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (pp. 1-18).

- YASH PRAKASH (@LunaticBugbear)
