BenchCouncil Transactions on Benchmarks, Standards and Evaluations (TBench)

Volume 6, Issue 1

Full Length Articles/Research articles

ABWS: The Arabic Boundary-aware Word Segmentation Benchmark for Reproducible Evaluation

Huda AlShuhayeb, Behrouz Minae-Bidgoli

Abstract

With the rapid adoption of natural language processing (NLP) systems for morphologically rich languages, it has become increasingly imperative to standardize a common set of measures and evaluation practices to ensure reproducibility and fair comparison. Arabic word segmentation serves as a foundational layer in the NLP software stack; however, the field remains fragmented due to inconsistent datasets and an overreliance on opaque, aggregate metrics that mask systemic architectural biases.

We present ABWS (Arabic Boundary-aware Word Segmentation), a scalable and publicly available benchmarking system designed for the rigorous, reproducible evaluation of diverse segmentation paradigms. To enable paradigm-agnostic comparison across rule-based, statistical, and neural models, ABWS introduces a canonical boundary vector abstraction that normalizes disparate system outputs into a unified evaluation interface. The benchmarking harness includes a manually verified gold-standard workload of 212,873 words across diverse genres and integrates seven widely used segmentation systems as reproducible baselines.

Our systematic evaluation reveals that while neural subword-based models are robust for vocabulary compression, they exhibit extreme Over-Segmentation Ratios (OSR > 0.58), leading to a significant drop in word-level exact match accuracy compared to rule-based engines. We further introduce Critical Boundary Accuracy (CBA), a linguistically weighted metric that prioritizes high-impact morphological boundaries. Our cross-layer analysis demonstrates that CBA is highly predictive of downstream performance in Machine Translation and Named Entity Recognition (ρ > 0.88), whereas traditional token-level F1 scores often obscure these performance bottlenecks.

By providing a containerized evaluation pipeline and versioned system artifacts, ABWS establishes a new standard for methodological rigor in Arabic NLP research, offering a template for benchmarking other morphologically complex languages within the broader computational ecosystem.
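The canonical boundary vector abstraction mentioned in the ABWS abstract is not specified in detail here. As a rough illustration only, the sketch below assumes one common encoding: a word of n characters maps to a binary vector of n-1 slots, where 1 marks a segment boundary after that character. The function names `boundary_vector` and `boundary_f1`, the transliterated example word, and the plain boundary-level F1 are all assumptions made for this illustration; they are not the ABWS interface, and they are not the paper's OSR or CBA metrics.

```python
from typing import List


def boundary_vector(segments: List[str]) -> List[int]:
    """Encode one segmented word as a binary boundary vector.

    Hypothetical encoding: one slot per inter-character position,
    1 where a segment ends, 0 elsewhere.
    """
    vec: List[int] = []
    for i, seg in enumerate(segments):
        if not seg:
            raise ValueError("segments must be non-empty strings")
        # No boundaries inside a segment.
        vec.extend([0] * (len(seg) - 1))
        # A boundary follows every segment except the last one.
        if i < len(segments) - 1:
            vec.append(1)
    return vec


def boundary_f1(gold: List[int], pred: List[int]) -> float:
    """Boundary-level F1 between two vectors for the same word."""
    if len(gold) != len(pred):
        raise ValueError("vectors must cover the same character string")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    n_gold, n_pred = sum(gold), sum(pred)
    if n_gold == 0 and n_pred == 0:
        return 1.0  # both agree that the word is unsegmented
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Transliterated example (assumed for illustration): w+b+Al+ktAb
gold = boundary_vector(["w", "b", "Al", "ktAb"])
# A subword-style split of the same character string
pred = boundary_vector(["wb", "Al", "kt", "Ab"])
score = boundary_f1(gold, pred)
```

Because both outputs are reduced to vectors over the same character string, a rule-based analyzer and a subword tokenizer can be compared position by position regardless of how each system represents its segments internally.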

TraceRTL: Agile Performance Evaluation for Microarchitecture Exploration

Zifei Zhang, Yinan Xu, Kaichen Gong, Sa Wang, Dan Tang, Yungang Ba

Abstract

While agile chip development methodologies have accelerated RTL design and simulation, performance evaluation remains constrained by two challenges: (1) limited benchmark availability due to incomplete peripheral/software simulation environments or unavailable source code; (2) inefficient feature prototyping caused by the tight coupling between functional correctness and performance evaluation, particularly for large-scale, error-prone microarchitectures. To address these challenges, we propose TraceRTL, an agile, trace-driven performance evaluation methodology that decouples the functional and performance components of CPU RTL designs. It introduces three contributions to the benchmarking community: (1) a trace-driven exploration framework that bypasses full functional correctness while preserving performance behavior and supports replaying workload traces on RTL designs; (2) a quantitative analysis and mitigation methodology to identify and reduce trace-driven performance discrepancies; (3) a trace transformation technique, TraceBridge, that converts benchmark traces between different formats and instruction sets. Using TraceRTL, we have developed the first trace-driven RTL CPU derived from XiangShan, a high-performance out-of-order RISC-V processor. TraceRTL achieves performance accuracy of 99.87% and 99.86% on SPECint2017 and SPECfp2017, respectively. With TraceBridge, we evaluate x86 Google workload traces on a RISC-V RTL CPU and reveal distinct memory-bound behavior.

Mapping the Intellectual Landscape of Blockchain in the Banking Industry: A Hybrid Bibliometric and Systematic Review (2015–2025)

Sadeq Abdullah Aladeeb, Fatima Zohra Sossi Alaoui

Abstract

The advent of blockchain technology has introduced new alternatives to traditional banking systems, providing a decentralized, secure, and transparent framework. However, its adoption is still complex and uneven for many reasons. This study provides a comprehensive mapping of the intellectual trajectory, thematic structure, and development of blockchain technology research in the banking sector. Using a hybrid literature review methodology that combines bibliometric analysis and systematic content review, the study analyzes 389 peer-reviewed publications retrieved from Scopus (2015–May 2025). VOSviewer was employed to conduct performance analysis and science mapping, including co-authorship, co-citation, keyword co-occurrence, and bibliographic coupling analyses. In parallel, qualitative thematic analysis identified six clusters: (1) blockchain in banking and financial intermediation to enhance operational efficiency, (2) decentralized finance and cryptocurrencies, (3) integration of blockchain with other digital innovations, (4) trust-related dimensions, (5) institutional and regulatory aspects, and (6) strategies for modernizing banking business models. The findings reveal a steady rise in research output, regional disparities in collaboration, and thematic evolution from early conceptualization to recent signs of diversification of applied research. By integrating quantitative and qualitative insights, this study highlights key research gaps, offers directions for future work, and provides guidance for academics, practitioners, and policymakers on the transformative potential and challenges of blockchain in banking.