Bundan, D., Abrami, G., & Mehler, A. (2025, September). MULTIMODAL DOCKER UNIFIED UIMA INTERFACE: New Horizons for Distributed Microservice-Oriented Processing of Corpora using UIMA. In Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Long and Short Papers (pp. 257-268).

ENTAILab - 2 (CIRCLET)
In addition to textual corpora, there are multimodal corpora that contain a significant amount of data from a variety of codes (e.g., iconographic, textual) that only a few tools can currently process. What the research community needs here is an effective, distributed system that provides a processing pipeline for the integration of reusable tools for analyzing such corpora. Such systems currently exist for text corpora, but rarely for video corpora. We present MULTIMODAL DOCKER UNIFIED UIMA INTERFACE as an extension of the Docker Unified UIMA Interface (DUUI) that fills this gap by enabling the annotation and processing of video corpora based on the UIMA standard.
Koch, T., Jaehne, M. F., Riediger, M., Rauers, A., & Holtmann, J. (2025). Idiographic Interrater Reliability Measures for Intensive Longitudinal Multirater Data. PsychArchives.

SHERPA
Interrater reliability plays a crucial role in various areas of psychology. In this article, we propose a multilevel latent time series model for intensive longitudinal data with structurally different raters (e.g., self-reports and partner reports). The new MR-MLTS model enables researchers to estimate idiographic (person-specific) rater consistency coefficients at both the dynamic and momentary level. Additionally, the model allows rater consistency coefficients to be linked to external explanatory or outcome variables. It can be implemented in Mplus as well as in the newly developed R package mlts. We illustrate the model using data from an intensive longitudinal multirater study involving 100 heterosexual couples (200 individuals) assessed across 86 time points. Our findings show that relationship duration and partner cognitive resources positively predict momentary, but not dynamic, rater consistency. Results from a simulation study indicate that the number of time points is critical for accurately estimating idiographic rater consistency coefficients, whereas the number of participants is important for accurately recovering the random effect variances. We discuss advantages, limitations, and future extensions of the MR-MLTS model.
Bönisch, K., Abrami, G., & Mehler, A. (2025, April). Towards Unified, Dynamic and Annotation-based Visualisations and Exploration of Annotated Big Data Corpora with the Help of UNIFIED CORPUS EXPLORER. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations) (pp. 522-534).

ENTAILab - 2 (CIRCLET)
The annotation and exploration of large text corpora, both automatic and manual, present significant challenges across multiple disciplines, including linguistics, digital humanities, biology, and legal science. These challenges are exacerbated by the heterogeneity of processing methods, which complicates corpus visualization, interaction, and integration. To address these issues, we introduce the Unified Corpus Explorer (UCE), a standardized, dockerized, open-source and dynamic Natural Language Processing (NLP) application designed for flexible and scalable corpus navigation. UCE uses the UIMA format for NLP annotations as a standardized input, constructing interfaces and features around those annotations while dynamically adapting to the corpora and their extracted annotations. We evaluate UCE based on a user study and demonstrate its versatility as a corpus explorer based on generative AI. The paper received the Best Demo Award.

Leonard, M. M. (2025). Conducting Respondent-Driven Sampling with Ethnic Minority Populations: The State of the Field. Survey Practice, 19.

RDS
Ethnic minorities are often underrepresented in survey research, due to the challenges many researchers face in including these populations. Respondent-driven sampling (RDS) was developed in the late 1990s in order to investigate populations otherwise "hidden" from researchers due to a lack of extant sampling frames. RDS relies on individuals who recruit their fellow population members, allowing samples to grow through network linkages. RDS holds promise for recruiting ethnic minority respondents, and its use has steadily increased since the early 2000s. However, practicable guidance for implementing RDS with these populations is scarce. To address this methodological gap, I present the results of a scoping review of RDS studies targeting ethnic minority populations. I find that it is possible to conduct successful RDS studies with a range of ethnic minority populations. However, researchers intending to work with these populations must consider the intersectional nature of these populations' "hiddenness", including economic, educational, linguistic, legal, political, and social vulnerabilities, through all stages of the study design process.
Abrami, G., Genios, M., Fitzermann, F., Baumartz, D., & Mehler, A. (2025). Docker Unified UIMA Interface: New perspectives for NLP on big data. SoftwareX, 29, 102033.

ENTAILab - 2 (CIRCLET)
Processing large amounts of natural language text using machine learning-based models is becoming important in many disciplines. This demand is being met by a variety of approaches, resulting in the heterogeneous deployment of separate, partly incompatible, not natively scalable applications. To overcome this technological bottleneck, we have developed the Docker Unified UIMA Interface (DUUI), a standardized, parallel, platform-independent, distributed, and microservice-based solution for processing large and extensive text corpora with any NLP method. We present DUUI as a framework that enables the automated orchestration of GPU-based NLP processes beyond the existing Docker Swarm cluster variant and that adapts to new runtime environments such as Kubernetes. To this end, we introduce a new DUUI driver that enables the lightweight orchestration of DUUI processes within a Kubernetes environment in a scalable setup. In this way, the paper opens up novel text-technological perspectives for existing practices in disciplines that deal with the scientific analysis of large amounts of data based on NLP.
Hase, V., Jef, A., Laura, B., Nico, P., Heleen, J., Theo, A., Thijs, C., Claes, D.V., Jörg, H., Felicia, L. and Kmetty, Z. (2024). Fulfilling data access obligations: How could (and should) platforms facilitate data donation studies? Internet Policy Review.

Hase, V., Jef, A., Laura, B., Nico, P., Heleen, J., Theo, A., ... & Mario, H. (2024). Fulfilling data access obligations: How could (and should) platforms facilitate data donation studies? Internet Policy Review: Journal on Internet Regulation, 13(3).

Data Donation
Research into digital platforms has become increasingly difficult. One way to overcome these difficulties is to build on data access rights in EU data protection law, which requires platforms to offer users a copy of their data. In data donation studies, researchers ask study participants to exercise this right and donate their data to science. However, there is increasing evidence that platforms do not comply with designated laws. We first discuss the obligations of data access from a legal perspective (with accessible, transparent, and complete data as key requirements). Next, we compile experiences from social scientists engaging in data donation projects as well as a study on data request/access. We identify 14 key challenges, most of which are a consequence of non-compliance by platforms. They include platforms’ insufficient adherence to (a) providing data in a concise and easily accessible form (e.g. the lack of information on when and how subjects can access their data); (b) being transparent about the content of their data (e.g. the lack of information on measures); and (c) providing complete data (e.g. the lack of all available information platforms process related to platform users). Finally, we formulate four central recommendations for improving the right to access.