Author: Dr. Kyriaki Kalimeri is a Senior Research Scientist at UNICEF in New York and the ISI Foundation in Turin, Italy, where she works at the intersection of artificial intelligence and humanitarian action.

Her research focuses on developing machine learning and AI systems that transform large-scale, real-world data into actionable insights for crisis response, public health, and support to vulnerable populations. Her work spans computational social science, natural language processing, and statistical machine learning, with a strong emphasis on humanitarian applications, from extracting information from crisis data to improving situational awareness and decision-making in emergencies. 

Humanitarian situation reports are among the most consequential documents produced during a crisis. They inform resource allocation, shape coordination among responders, and guide decisions affecting millions of vulnerable people. Yet a persistent paradox haunts the sector: many of these reports, particularly within the UN system, are seldom read by their intended audience. The process of producing them is manual, time-consuming, and resource-intensive, often requiring analysts to synthesise hundreds of fragmented sources under severe time pressure. The result is a system where the demand for timely, accurate information consistently outpaces the capacity to deliver it.

Can AI Replicate Analytical Reasoning, Not Just Summarise?

The critical question is not whether large language models can generate text faster than humans. They can. The question is whether they can replicate the analytical reasoning that makes a situation report operationally useful: identifying what matters, grounding claims in verifiable evidence, and structuring information so that decision-makers can act on it. Simply prompting an LLM to summarise a crisis produces generic output lacking the structure, citations, and thematic depth that humanitarian professionals require.

To address this gap, Ivan Decostanzi, Yelena Mejova and I, in collaboration with experts from UNICEF and Data Friendly Space, developed a modular framework that mirrors how human analysts actually work. The system ingests heterogeneous humanitarian documents, clusters them into thematic subtopics, automatically generates targeted questions, extracts evidence-backed answers with citations to source material, and produces multi-level summaries organised by both subtopics and Sustainable Development Goals. An executive summary provides rapid situational awareness, while four complementary visualisation modes allow analysts to navigate the report according to their specific needs.
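To make the flow of the pipeline concrete, here is a minimal sketch of the cluster, question, and cited-answer stages. It is an illustration only, not the code in our repository: the class names, the `SUBTOPICS` keyword table, and every function below are hypothetical stand-ins, and simple keyword matching and a fixed question template take the place of the automatic clustering and LLM calls used in the actual system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class Answer:
    question: str
    text: str
    citations: list[str]  # ids of the documents that support the answer


# Hypothetical keyword table; assumed here only to keep the sketch runnable.
SUBTOPICS = {
    "health": ["cholera", "clinic", "vaccine"],
    "shelter": ["shelter", "displaced", "tents"],
}


def cluster_by_subtopic(docs):
    """Toy stand-in for thematic clustering: assign documents by keyword match."""
    clusters = defaultdict(list)
    for doc in docs:
        lowered = doc.text.lower()
        for topic, keywords in SUBTOPICS.items():
            if any(keyword in lowered for keyword in keywords):
                clusters[topic].append(doc)
    return dict(clusters)


def generate_questions(topic):
    """Toy stand-in for LLM question generation: one templated question."""
    return [f"What is the current {topic} situation?"]


def extract_answer(question, docs):
    """Toy stand-in for evidence extraction: join the texts and cite each source."""
    evidence = " ".join(doc.text for doc in docs)
    return Answer(question, evidence, [doc.doc_id for doc in docs])


def build_report(docs):
    """Run the stages in order and return answers grouped by subtopic."""
    report = {}
    for topic, members in cluster_by_subtopic(docs).items():
        report[topic] = [
            extract_answer(question, members)
            for question in generate_questions(topic)
        ]
    return report


docs = [
    Document("d1", "Cholera cases are rising near the clinic."),
    Document("d2", "Thousands of displaced families need shelter."),
]
report = build_report(docs)
```

The point of the sketch is the shape of the data: every answer carries the identifiers of the documents it draws on, which is what allows each claim in a report to be traced back to its source.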

What the Evidence Shows

We evaluated the framework across 13 humanitarian events, including natural disasters and armed conflicts, processing over 1,100 documents from verified sources such as ReliefWeb. The generated questions achieved 84.7% relevance, 84.0% importance, and 76.4% urgency as rated by humanitarian experts. Extracted answers reached 86.3% relevance, with citation precision and recall both exceeding 76%. When five senior humanitarian professionals compared our system against existing alternatives in a blind evaluation, three out of four selected it as their preferred tool, citing its comprehensiveness, navigability, and analytical depth. A complementary finding concerns the emerging role of LLMs as evaluators. Agreement between human expert assessments and automated LLM-based evaluations surpassed an F1 score of 0.80, suggesting that properly guided automated evaluation can serve as a scalable proxy for expert review.
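For readers less familiar with these metrics, the sketch below shows one common way to compute citation precision and recall, and an F1 score for agreement between human and LLM judgements. The function names and the toy inputs are assumptions made for illustration; the exact definitions used in the paper may differ in detail.

```python
def citation_precision_recall(predicted, gold):
    """Compare the sources an answer cites with the sources that support it."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def f1_agreement(human, llm):
    """F1 of LLM judgements, treating the human expert labels as reference."""
    pairs = list(zip(human, llm))
    tp = sum(1 for h, l in pairs if h and l)
    fp = sum(1 for h, l in pairs if not h and l)
    fn = sum(1 for h, l in pairs if h and not l)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy example: the answer cites three documents, two of which are truly supporting.
precision, recall = citation_precision_recall(
    ["d1", "d2", "d3"], ["d1", "d2", "d4", "d5"]
)

# Toy example: binary "relevant" labels from a human expert and an LLM judge.
agreement = f1_agreement(
    [True, True, False, True], [True, False, False, True]
)
```

In this toy case, precision is 2/3 (two of three cited documents are correct), recall is 1/2 (two of four supporting documents are cited), and the agreement F1 is 0.8.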

Risks That Remain

Transparency demands acknowledging what the system cannot yet do. Citation reliability, while strong overall, is not perfect, and in humanitarian operations even a single unsupported claim can erode trust. The framework currently processes only text, excluding potentially critical information from images, tables, and geospatial data. Output quality remains dependent on the completeness and clarity of available source documents, meaning that data gaps in the field propagate through the pipeline. These are not minor caveats; they define the boundary between a decision-support tool and a decision-making one.

Looking Forward

The path ahead involves extending the framework to multimodal and multilingual inputs, exploring offline functionality, embedding continuous expert feedback throughout the pipeline, and integrating with existing humanitarian platforms for real-time monitoring. The goal is not to replace human analysts but to fundamentally shift how their time is spent: from information gathering to critical interpretation and action. In a sector where the gap between available information and operational capacity continues to widen, structured AI-assisted reporting offers a concrete bridge, one that must be built with the same rigour and accountability that the humanitarian mandate demands.

Link to the paper: https://arxiv.org/abs/2512.19475

Code: https://github.com/idecost/LLM-SituationalReports

Demo page: https://idecost.github.io/LLM-SituationalReports/Viewer/viewer_v2.html