Flash floods rank among the deadliest weather events globally, killing more than 5,000 people each year. Despite their devastating impact, they remain notoriously difficult to forecast. Now Google believes it has found a way to address the problem. By using its Gemini artificial intelligence to read millions of old newspaper articles, the company is turning decades of historical journalism into a tool for predicting flash floods.
The Challenge of Data Scarcity
The core issue in forecasting these sudden disasters is a massive lack of historical data. While meteorologists have assembled extensive data for general weather patterns, temperature changes, and major river flows, flash floods present a unique challenge. They are incredibly short-lived and highly localized. Because they happen quickly and in specific areas, they are rarely measured comprehensively.
This data gap has traditionally prevented modern deep learning models from accurately anticipating when and where a flash flood might strike. The problem is especially severe in vulnerable regions where local governments cannot afford expensive weather-sensing infrastructure or simply do not possess extensive meteorological records. Machine learning approaches struggle when training data is sparse, because a model has few examples from which to learn phenomena that have not been systematically measured.
Transforming Old News Into Structured Data
To overcome this hurdle, Google researchers deployed Gemini, the company’s large language model. The team used the AI to sort through 5 million news articles gathered from around the world. From this massive archive, the AI isolated reports detailing 2.6 million different floods.
The AI’s job was to transform qualitative narrative accounts found in old newspapers into structured datasets. For instance, an old article describing floodwaters reaching “waist-high” near a specific bridge is converted by the large language model into a quantifiable data point.
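As a rough illustration of what "structured" means here, the sketch below shows the kind of record such a conversion might produce. It is not Google's pipeline: a simple keyword lookup stands in for the language model, and the `FloodRecord` fields, the `DEPTH_PHRASES_M` table, and its depth values are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical mapping from qualitative depth phrases to rough depths in meters.
# These values are illustrative assumptions, not figures from Groundsource.
DEPTH_PHRASES_M = {
    "ankle-deep": 0.1,
    "knee-deep": 0.5,
    "waist-high": 1.0,
    "chest-high": 1.3,
}


@dataclass(frozen=True)
class FloodRecord:
    """One geo-tagged, dated flood observation (hypothetical schema)."""
    event_date: date
    latitude: float
    longitude: float
    depth_m: Optional[float]  # None when the text gives no usable depth cue


def depth_from_text(text: str) -> Optional[float]:
    """Return an approximate depth for the first known phrase found in the text."""
    lowered = text.lower()
    for phrase, depth in DEPTH_PHRASES_M.items():
        if phrase in lowered:
            return depth
    return None


def to_record(text: str, event_date: date,
              latitude: float, longitude: float) -> FloodRecord:
    """Combine a narrative description with its date and location."""
    return FloodRecord(event_date, latitude, longitude, depth_from_text(text))


# Example with made-up date and coordinates.
record = to_record("Floodwaters were waist-high near the bridge.",
                   date(1987, 5, 3), 10.0, 20.0)
```

A language model can handle far more varied phrasing than a fixed lookup table, but the output has the same shape: a date, a location, and a number that a forecasting model can train on.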
The Groundsource Dataset
The culmination of this effort is a geo-tagged time series dataset named Groundsource. According to Gila Loike, a product manager at Google Research, this represents the first time the company has used language models for this specific kind of work. The research and resulting dataset were shared publicly on a Thursday morning to assist with disaster prevention efforts.
How the AI Forecasting Model Works
With Groundsource serving as a real-world baseline, researchers trained a new forecasting model built on a Long Short-Term Memory neural network. This system ingests global weather forecasts to generate the probability of flash floods occurring in specific areas. The approach bridges the gap between human narrative observations and the structured mathematical inputs that deep learning models require.
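To make the idea concrete, here is a minimal sketch of an LSTM forward pass that turns a sequence of forecast values into a single probability. It is an assumption-laden toy, not Google's model: it uses a single hidden unit, random untrained weights, and two hypothetical input features (rainfall rate and soil moisture) that the article does not specify.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_flood_probability(sequence, params) -> float:
    """Run a single-unit LSTM over feature vectors and return a probability."""
    h, c = 0.0, 0.0  # hidden state and cell state
    for x in sequence:
        pre = {}
        for name in ("i", "f", "o", "g"):
            w, u, b = params[name]  # input weights, recurrent weight, bias
            pre[name] = sum(wi * xi for wi, xi in zip(w, x)) + u * h + b
        i = sigmoid(pre["i"])    # input gate
        f = sigmoid(pre["f"])    # forget gate
        o = sigmoid(pre["o"])    # output gate
        g = math.tanh(pre["g"])  # candidate cell value
        c = f * c + i * g
        h = o * math.tanh(c)
    w_out, b_out = params["out"]
    return sigmoid(w_out * h + b_out)


def random_params(n_features: int, seed: int = 0):
    """Untrained placeholder weights; a real model would learn these from data."""
    rng = random.Random(seed)
    params = {
        name: ([rng.uniform(-0.5, 0.5) for _ in range(n_features)],
               rng.uniform(-0.5, 0.5),
               0.0)
        for name in ("i", "f", "o", "g")
    }
    params["out"] = (rng.uniform(-0.5, 0.5), 0.0)
    return params


# Hypothetical input: three forecast steps of (rainfall in mm/h, soil moisture fraction).
forecast = [(2.0, 0.3), (15.0, 0.5), (40.0, 0.8)]
probability = lstm_flood_probability(forecast, random_params(n_features=2))
```

With untrained weights the output carries no meaning. In training, records like those in Groundsource would supply the labels (flood or no flood at a given place and time) used to fit the weights.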
Google’s flash flood forecasting model is currently highlighting risks for urban areas across 150 countries. This information is accessible on the company’s Flood Hub platform and shared directly with global emergency response agencies. António José Beleza, an emergency response official at the Southern African Development Community, trialed the forecasting model. He reported that the AI tool successfully helped his organization respond to flood emergencies more quickly.
Limitations and Global Impact
Despite its innovative approach, the forecasting model has notable limitations. Currently, the system operates at a relatively low resolution, identifying flood risks across broad 20-square-kilometer areas rather than pinpointing exact neighborhoods. Additionally, it is not as precise as the existing flood alert system operated by the United States National Weather Service, partly because Google’s model does not incorporate local radar data for the real-time tracking of precipitation.
However, the project was designed to provide life-saving alerts in areas that lack advanced infrastructure entirely. Juliet Rothenberg, a program manager on Google’s Resilience team, explained that aggregating millions of historical news reports helps to rebalance the map. She noted that Groundsource enables the company to extrapolate predictions to other regions where measured information is largely unavailable.
The broader scientific community is taking note. Marshall Moutenot, the CEO of Upstream Tech, a company that uses deep learning models to forecast river flows, highlighted the achievement’s significance. Moutenot, who also co-founded a group called dynamical.org to curate machine learning-ready weather data, pointed out that data scarcity remains a difficult challenge in geophysics. He praised Google’s strategy as a highly creative approach to acquiring training data.
Future Applications in Climate Adaptation
Looking ahead, this development signals an expanding role for artificial intelligence in climate adaptation as extreme weather becomes more frequent. Google researchers hope this method of using large language models to build quantitative datasets from written sources can be extended. The team envisions applying the same technique to other ephemeral weather phenomena, specifically mentioning heat waves and mudslides.
