Figure 1. An illustration of the differences between aleatoric and epistemic uncertainties (Yang et al., 2023).
Introduction
Table Structure Recognition (TSR) is a document analysis task that focuses on identifying the rows and columns in digital table images [4]. While current TSR methods can identify cell locations, they cannot predict the uncertainties in their results [1]. This limitation has hindered real-world applications of TSR, such as automatically extracting data from table images in the physical sciences.
In this blog post, we summarize our paper titled "Uncertainty Quantification (UQ) for Table Structure Recognition", presented at the 2024 IEEE International Conference on Information Reuse and Integration for Data Science. In this paper, we proposed a method called TTA-m (Test-Time Augmentation with multiple models) that aims to quantify uncertainties in TSR predictions, potentially enhancing how we extract and verify tabular data from digital documents.
Figure 2. A schematic illustration of the proposed UQ pipeline (TTA-m) (Figure 1 in our paper)
Dataset
We utilized the ICDAR 2019 dataset, which consists of both modern and historical tables:
Modern dataset: Includes tables from scientific papers, forms, and financial documents.
Historical dataset: Comprises hand-written accounting ledgers and train schedules.
Following the approach of Prasad et al. [2], we selected 543 table images from the ICDAR 2019 dataset. We randomly selected 443 table images for training our models and the remaining 100 table images for evaluation. This dataset provides diverse table structures and complexities, allowing for a robust evaluation of the proposed UQ method.
Method
The key components of our proposed UQ pipeline consist of the following:
Data Augmentation
Data augmentation has become a common practice for developing robust, transformation-invariant models. We implemented a set of M = 4 distinct data augmentation techniques in both the training and testing stages: the elimination of all lines (NLT), the addition of horizontal lines (HLT), the addition of vertical lines (VLT), and the addition of both horizontal and vertical lines (HLT + VLT). Figure 3 presents examples of these augmented table images.
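The line-based augmentations can be illustrated with a minimal NumPy sketch. This is a simplified illustration, not the paper's actual implementation: the function names and the toy image are hypothetical, and the real pipeline would operate on actual table scans.

```python
import numpy as np

def add_horizontal_lines(img: np.ndarray, rows, thickness=1, value=0):
    """HLT-style augmentation: draw dark horizontal rules at the given row positions."""
    out = img.copy()
    for r in rows:
        out[r:r + thickness, :] = value
    return out

def add_vertical_lines(img: np.ndarray, cols, thickness=1, value=0):
    """VLT-style augmentation: draw dark vertical rules at the given column positions."""
    out = img.copy()
    for c in cols:
        out[:, c:c + thickness] = value
    return out

# Toy grayscale "table image": a white background with no ruling lines
table = np.full((100, 200), 255, dtype=np.uint8)

hlt = add_horizontal_lines(table, rows=[25, 50, 75])       # HLT
hlt_vlt = add_vertical_lines(hlt, cols=[50, 100, 150])     # HLT + VLT
```

NLT (eliminating all lines) would go in the opposite direction, detecting existing rules and painting them with the background color.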
Figure 3. Augmentation examples of a table image (Figure 2 in our paper)

Model Training
We fine-tuned CascadeTabNet, a TSR model originally proposed by Prasad et al. [2], on both original and augmented table images.
Test-Time Augmentation (TTA)
Our baseline model is the vanilla TTA. TTA is an ensemble method that applies various data augmentations to the input during inference, generating multiple predictions that are then combined to produce a more robust final output. In this paper, we modified the vanilla TTA: during inference, each fine-tuned model was applied to test data augmented with the same technique used during its fine-tuning. We then ensembled the outputs of these multiple models.
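The pairing of each fine-tuned model with its matching test-time augmentation can be sketched as follows. This is a hypothetical illustration: the stand-in "models" and "augmentations" below are toy lambdas, not CascadeTabNet.

```python
def tta_m_predict(models, augmentations, image):
    """TTA-m inference sketch: the i-th fine-tuned model predicts on the input
    transformed by the i-th augmentation (the same one used in its fine-tuning).
    Returns one list of predicted cell boxes per model, ready for ensembling."""
    return [model(augment(image)) for model, augment in zip(models, augmentations)]

# Toy stand-ins (hypothetical): each "model" returns fixed boxes, and each
# "augmentation" here is just an identity transform over the raw image.
models = [lambda img: [(0, 0, 10, 10)], lambda img: [(1, 1, 11, 11)]]
augmentations = [lambda img: img, lambda img: img]
predictions = tta_m_predict(models, augmentations, image="raw-table-image")
```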
Confidence Estimation via Ensembles
The process of uncertainty estimation involves combining predictions from multiple models to generate a set of merged cells, each with an associated confidence score. Here’s a more detailed summary:
Initial Setup: We randomly selected a model from a set of M+1 models as the base model.
Bounding Box Collection: We gathered all the predicted bounding boxes from the base model.
Comparison and Merging: We took the predicted bounding boxes from the second model and compared them with the predicted bounding boxes by the base model using the Intersection over Union (IoU) metric. If the IoU between a pair of cells from the base and second model meets or exceeds a predefined threshold θ > 0, we merge the two cells into one. We removed the merged cell from the second model’s list.
Iterative Merging: We repeated the comparison and merging process for the remaining models (from the third to the M+1 models), always comparing with the base model’s cells.
Sequential Model Use: After processing with the initial base model, we sequentially selected the other models as new base models and repeated the merging process for any cells that had not yet been merged.
Confidence Score Calculation: For each merged cell combination created during this process, we counted how many distinct models contributed to that combination and calculated the confidence score by dividing this count by M+1.
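The steps above can be sketched in a few lines of Python. This is a minimal sketch of the merging logic, assuming axis-aligned boxes in (x1, y1, x2, y2) form; the function names are hypothetical and details such as tie-breaking may differ from the paper's implementation.

```python
def iou(a, b):
    """Intersection over Union for two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def ensemble_confidence(model_boxes, theta=0.5):
    """Merge predicted cells across M+1 models; the confidence of each merged
    cell is (#contributing models) / (M+1)."""
    n_models = len(model_boxes)
    used = [[False] * len(boxes) for boxes in model_boxes]
    merged = []
    for base in range(n_models):                      # sequential base models
        for i, base_box in enumerate(model_boxes[base]):
            if used[base][i]:                         # already merged earlier
                continue
            used[base][i] = True
            count = 1
            for other in range(n_models):             # compare with the rest
                if other == base:
                    continue
                for j, box in enumerate(model_boxes[other]):
                    if not used[other][j] and iou(base_box, box) >= theta:
                        used[other][j] = True         # remove merged cell
                        count += 1
                        break
            merged.append((base_box, count / n_models))
    return merged

# Three toy models: two agree on one cell; the third also predicts an extra cell.
boxes = [[(0, 0, 10, 10)], [(0, 2, 10, 12)], [(0, 0, 10, 10), (20, 20, 30, 30)]]
merged = ensemble_confidence(boxes, theta=0.5)
```

In this toy example the shared cell is matched by all three models (confidence 1.0), while the extra cell is matched by only one model (confidence 1/3), which is exactly the behavior the confidence score is designed to capture.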
Baseline Methods
We compared our proposed UQ pipeline with the vanilla TTA and other variants such as TTA-t and TTA-tm as shown in Figure 3. TTA-t adds a small cell filter to the vanilla TTA to exclude small cells predicted by fine-tuned models. TTA-tm combines TTA-t and TTA-m, incorporating training data augmentation and the small cell filter.
We also compared our work with an active learning model proposed by Choi et al. [3]. Their model aims to reduce labeling costs by selecting only the most informative samples in a dataset. It uses a mixture density network that estimates a probabilistic distribution for each localization and classification head’s output to explicitly assess the aleatoric and epistemic uncertainty in a single forward pass of a single model. It uses a scoring function that aggregates these uncertainties for both heads to obtain every image’s informativeness score.
Evaluation
Because the test set lacks annotated ground-truth table cells, we proposed two novel evaluation techniques to assess the effectiveness of our UQ pipeline: a masking method and a cell complexity quantification method.
Masking Method
This involves modifying the difficulty of table image recognition by adjusting pixel intensity. Specifically, we doubled and tripled the pixel intensities (capped at 255) of table images in our training set. Then, we evaluated the confidence scores of the TSR model at each intensity level.
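The intensity scaling can be expressed as a one-line NumPy operation. This is a minimal sketch; the function name and the toy pixel row are hypothetical.

```python
import numpy as np

def mask_intensity(img: np.ndarray, factor: int) -> np.ndarray:
    """Scale pixel values by `factor`, capped at 255 (m2: factor=2, m3: factor=3).
    Larger factors wash dark cell boundaries toward white, making them fainter."""
    return np.minimum(img.astype(np.int32) * factor, 255).astype(np.uint8)

# Toy row of pixels: a mid-gray boundary, a light-gray region, and white background
pixels = np.array([[100, 200, 255]], dtype=np.uint8)
m2 = mask_intensity(pixels, 2)  # doubled, capped at 255
m3 = mask_intensity(pixels, 3)  # tripled, capped at 255
```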
Cell Complexity Quantification
We modeled the table images as undirected graphs, with cells as nodes and adjacencies as edges. We considered four types of adjacency: left, top, right, and bottom. We then manually annotated the relations between cells to construct a graph for each table in the test set.

Evaluation Results
Performance Comparison with Baseline Methods
We compared the TSR results of our proposed model, TTA-m, with the TTA variants and the active learning method. Based on the results in Table 1, TTA-m outperformed the baseline methods, and the combined TTA-tm approach showed further improvements.
Table 1. Comparing TSR results of models used in our study (Table 1 in our paper)
Masking Method Results
Cell pixel intensity significantly influenced the distribution of confidence scores in our TSR model. As masking scaled pixel values upward, cell boundaries were washed toward white and became fainter, making accurate detection more difficult. This should lead to higher levels of uncertainty from the TSR model, as shown in Figure 4.
Figure 4. Effects of masking on UQ in TSR. m1: no masking applied; m2: pixel values doubled; m3: pixel values tripled (Figure 7 in our paper)
Cell Complexity Quantification Results
Based on Table 2, the mean confidence level decreased as the degree of relationships between cells increased from 1 to 6, with an exception at degree 5. This suggests that the model's confidence is inversely related to the complexity of cell relationships in a table.
Table 2. Quantifying cell complexity based on the adjacency degree of table cells (Table 2 in our paper)
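The adjacency-degree computation behind Table 2 can be sketched as a small graph routine. This is a minimal illustration; the cell names and adjacency pairs below are hypothetical.

```python
from collections import defaultdict

def cell_degrees(adjacencies):
    """Build an undirected cell graph from (cell_a, cell_b) adjacency pairs
    (left/top/right/bottom relations) and return each cell's degree."""
    neighbors = defaultdict(set)
    for a, b in adjacencies:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {cell: len(ns) for cell, ns in neighbors.items()}

# Toy 2x2 table (cells A B / C D) with left-right and top-bottom adjacencies
edges = [("A", "B"), ("C", "D"), ("A", "C"), ("B", "D")]
degrees = cell_degrees(edges)
```

Mean confidence can then be reported per degree bucket, as in Table 2, by grouping each cell's confidence score under its degree.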
Conclusion
This paper investigated uncertainty quantification (UQ) in table structure recognition (TSR) by adapting the traditional test-time augmentation (TTA) technique and applying it to a customized CascadeTabNet [2] model. To assess the effectiveness of our UQ method, we employed masking and cell complexity quantification techniques. These techniques adjust cell pixel intensity and determine cell complexity based on relationships among cells in table images at different confidence levels. Our experiments demonstrated that the proposed UQ method offers more reliable uncertainty estimation compared to the standard TTA approach.
Unlike the vanilla TTA, which only considers data variation, our method extends data augmentation to the training phase, factoring in both data and model variations. While this increases the computational cost, it provides a more robust uncertainty estimation for TSR models. When applying our pipeline to datasets without ground truth labels, pre-fine-tuned models can be used, requiring only test-time augmentation.

Acknowledgment
I would like to acknowledge Leizhen Zhang for running key experiments crucial to our published paper, and to express my gratitude to Dr. Yi He for co-advising me, alongside my advisor Dr. Jian Wu, throughout this work.
References

[1] K. Ajayi, L. Zhang, Y. He, and J. Wu, "Uncertainty Quantification in Table Structure Recognition," 2024 IEEE International Conference on Information Reuse and Integration for Data Science, 2024.

[2] D. Prasad, A. Gadpal, K. Kapadni, M. Visave, and K. Sultanpure, "CascadeTabNet: An Approach for End to End Table Detection and Structure Recognition from Image-Based Documents," IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020.

[3] J. Choi, H. Yoo, S. Lee, S. Yoon, and H. Yang, "Uncertainty-Aware Attention Gate U-Net for Active Learning in Medical Image Segmentation," IEEE Access, vol. 9, pp. 36170–36181, 2021.

[4] P. Fischer, A. Smajic, G. Abrami, and A. Mehler, "Multi-Type-TD-TSR: Extracting Tables from Document Images Using a Multi-Stage Pipeline for Table Detection and Table Structure Recognition: From OCR to Structured Table Representations," German Conference on Artificial Intelligence (Künstliche Intelligenz), pp. 95–108, Springer, 2021.

[5] C.-I. Yang and Y.-P. Li, "Explainable Uncertainty Quantifications for Deep Learning-Based Molecular Property Prediction," Journal of Cheminformatics, 2023.