Giuseppe Abrami , Markos Genios , Filip Fitzermann , Daniel Baumartz , Alexander Mehler
Feb 2025
SoftwareX

Abrami, G., Genios, M., Fitzermann, F., Baumartz, D., & Mehler, A. (2025). Docker unified UIMA interface: New perspectives for NLP on big data. SoftwareX, 29, 102033. https://doi.org/10.1016/j.softx.2024.102033

ENTAILab
Processing large amounts of natural language text using machine learning-based models is becoming important in many disciplines. This demand is being met by a variety of approaches, resulting in the heterogeneous deployment of separate, partly incompatible, not natively scalable applications. To overcome the technological bottleneck involved, we have developed Docker Unified UIMA Interface, a system for the standardized, parallel, platform-independent, distributed and microservices-based solution for processing large and extensive text corpora with any NLP method. We present DUUI as a framework that enables automated orchestration of GPU-based NLP processes beyond the existing Docker Swarm cluster variant, and in addition to the adaptation to new runtime environments such as Kubernetes. Therefore, a new driver for DUUI is introduced, which enables the lightweight orchestration of DUUI processes within a Kubernetes environment in a scalable setup. In this way, the paper opens up novel text-technological perspectives for existing practices in disciplines that deal with the scientific analysis of large amounts of data based on NLP.