Generative AI models like GenCast now create weather ensembles in minutes instead of hours, changing how forecasts are built and used.

Imagine waiting six hours for a weather center's supercomputer to finish crunching one forecast cycle, only to get 50 slightly different guesses about whether a storm will hit your coastline. That's how forecasting has worked for decades. It's accurate, but it's slow and expensive, and only a handful of national agencies can afford to run it, a stark manifestation of the compute divide in modern science.
Now picture getting that same kind of forecast, an ensemble of dozens of possible weather futures, in minutes, on a single chip. That's not a thought experiment anymore. It's what generative AI models are doing right now.
This shift doesn't mean supercomputers are getting thrown out tomorrow. But it does mean the old assumption, that you need a building full of hardware to forecast uncertainty, is no longer true. Here's what changed, how it works, and where it still falls short.
Numerical Weather Prediction (NWP) is physics simulation. It solves equations that describe how air, heat, and moisture move through the atmosphere.
To capture uncertainty, agencies don't run the simulation once. They run it 50 to 100 times, each with slightly different starting conditions. This is called an ensemble.
Each of those runs needs serious compute. That's why only a few agencies (ECMWF, NOAA, the UK Met Office) can run full global ensembles, and why each forecast cycle takes hours and draws heavily on energy-hungry supercomputing facilities. That environmental cost is a central theme in discussions about the carbon footprint of AI and the push for green algorithms.
| Step | What it does | Cost |
|---|---|---|
| Data assimilation | Estimates current atmosphere state from observations | Needs satellite, radar, station data |
| Single simulation run | Solves physics equations forward in time | Hours on a supercomputer |
| Ensemble (50+ runs) | Repeats the run with tiny variations | Massive compute, only a few agencies can afford it |
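To make the ensemble idea concrete, here is a minimal sketch that uses the Lorenz-63 system as a toy stand-in for the atmosphere. It is not how any agency's NWP code works: the three-variable model, the perturbation size, the step count, and all function names are assumptions chosen for illustration. It shows only the core pattern: perturb the starting state slightly, re-run the same physics, and collect the spread.

```python
import random


def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz-63 toy system by one forward-Euler step."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dt * dx, y + dt * dy, z + dt * dz)


def run_forecast(initial_state, n_steps):
    """One 'simulation run': step the toy physics forward in time."""
    state = initial_state
    for _ in range(n_steps):
        state = lorenz_step(state)
    return state


def run_ensemble(analysis, n_members=50, perturbation=1e-3, n_steps=1500, seed=0):
    """Re-run the forecast from slightly perturbed copies of the analysis.

    `analysis` plays the role of the data-assimilation output; the Gaussian
    perturbation is a hypothetical stand-in for real perturbation schemes.
    """
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        perturbed = tuple(v + rng.gauss(0.0, perturbation) for v in analysis)
        members.append(run_forecast(perturbed, n_steps))
    return members


if __name__ == "__main__":
    members = run_ensemble(analysis=(1.0, 1.0, 1.0))
    xs = [m[0] for m in members]
    print(f"{len(members)} members, x ranges from {min(xs):.2f} to {max(xs):.2f}")
```

The cost structure is the point: every member repeats the full time-stepping loop, so 50 members cost roughly 50 times one run. In a real NWP system that inner loop is the part that needs a supercomputer.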
Generative models don't simulate physics. They learn patterns from decades of historical weather data, then generate plausible future weather states directly.
The key idea: instead of re-running a physics simulation 50 times, a generative model draws 50 different possible outcomes from a learned probability distribution, and those draws can run in parallel.
Google DeepMind's GenCast is the clearest example. It's a diffusion model, the same family of model behind AI image generators, adapted to the sphere of the Earth instead of a flat image.
GenCast was trained on roughly four decades of ERA5 reanalysis data, learning the relationships between more than 80 atmospheric variables across different altitudes. Once trained, it can generate a full ensemble member, a 15-day global forecast covering 84 weather variables, in about 8 minutes on a single Cloud TPU v5 chip. This ability to run sophisticated predictions on modest, single-chip setups aligns with the small model renaissance, where highly specialized architectures deliver massive efficiency gains over general-purpose systems.
That speed means large ensembles become cheap to produce, something traditional physics-based methods structurally can't match.
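For intuition about what "sampling from a learned distribution" means in a diffusion model, here is a toy one-dimensional sketch. It is not GenCast's architecture or sampler. The "historical" distribution is assumed to be a simple Gaussian (the constants `DATA_MEAN` and `DATA_STD` are made up), which lets the score function be written in closed form instead of being learned by a neural network. The sampler takes plain Euler steps along the probability-flow ODE, starting from pure noise and ending at a sample.

```python
import random

DATA_MEAN = 1.5  # hypothetical mean of the toy "historical" distribution
DATA_STD = 2.0   # hypothetical spread of that distribution


def score(x, sigma):
    """Gradient of the log-density of the data blurred with noise of std sigma.

    Closed form only because the toy data is Gaussian; a real diffusion model
    approximates this quantity with a trained neural network.
    """
    return -(x - DATA_MEAN) / (DATA_STD ** 2 + sigma ** 2)


def noise_schedule(sigma_max=80.0, sigma_min=0.01, n_steps=200):
    """Geometrically decreasing noise levels, from very noisy to nearly clean."""
    ratio = (sigma_min / sigma_max) ** (1.0 / (n_steps - 1))
    return [sigma_max * ratio ** k for k in range(n_steps)]


def sample_member(rng, sigmas):
    """Turn one draw of pure noise into one sample by stepping the noise down."""
    x = rng.gauss(0.0, sigmas[0])
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        # Euler step of dx/dsigma = -sigma * score(x, sigma)
        x += (sigma_next - sigma) * (-sigma * score(x, sigma))
    return x


def sample_ensemble(n_members=50, seed=0):
    """Each member starts from different noise, so each lands somewhere different."""
    rng = random.Random(seed)
    sigmas = noise_schedule()
    return [sample_member(rng, sigmas) for _ in range(n_members)]


if __name__ == "__main__":
    ensemble = sample_ensemble()
    print(f"{len(ensemble)} members, min {min(ensemble):.2f}, max {max(ensemble):.2f}")
```

Members are independent of one another, so nothing stops them from being generated at the same time on separate chips. That is the structural difference from stepping a physics model forward 50 times.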
This is the part that surprised a lot of meteorologists. In DeepMind's own evaluation against ECMWF's operational 50-member ensemble (called ENS), GenCast came out ahead on the vast majority of measured targets, showing greater skill on 97.2% of the 1,320 targets evaluated.
It also held up well on extreme weather scoring, beating both the traditional ensemble and an earlier deterministic AI model on rare-event metrics.
That said, "outperforms on most metrics" isn't the same as "replaces entirely." Independent reviews note that as of 2026, no major meteorological agency has actually shut down its NWP system.
This is the part that gets skipped in a lot of hype articles, so let's be direct about it.
They need NWP for their starting point. AI models still rely on the same data assimilation process traditional forecasting uses to estimate the atmosphere's current state, since that snapshot comes from satellite, radar, and station observations processed the classical way. In other words, AI models are downstream of classical meteorology, not a full replacement for it.
They struggle with conditions outside their training data. Climate change is pushing weather toward patterns with no close historical match, and models trained mostly on past decades of data can underperform in today's warmer, structurally different atmosphere.
They tend to underestimate the worst events. Several models still systematically underpredict high-impact precipitation and other extreme conditions that matter most for safety and planning. This shows up even in commercially deployed models tuned for industries like energy trading, where underpredicting record-breaking conditions can mean missing the events that drive the biggest price swings.
Short-range, fine-grained detail is still a weak spot. For "nowcasting," the next 0 to 12 hours, high-resolution physics models are still ahead. However, for continuous time-series adaptation on the edge, architectures like liquid neural networks are showing promise in handling dynamic, real-time environmental changes.
They're hard to interpret. When a generative model makes an unusual call, it's difficult to trace why, which is a real problem in operational settings where forecasters need to explain and trust a prediction.
| Factor | Traditional NWP Ensemble | Generative AI Model |
|---|---|---|
| Method | Re-runs physics simulation 50+ times | Samples ensemble from a learned distribution |
| Speed per ensemble member | Hours | About 8 minutes on one chip |
| Hardware needed | Supercomputer cluster | Single GPU/TPU |
| Accuracy (medium-range, most variables) | Strong, proven baseline | Often better on CRPS, RMSE, Brier score |
| Extreme event handling | More conservative spread, well-understood | Tends to underpredict severity |
| Interpretability | Physics-based, explainable | Largely a black box |
| Independence from classical NWP | Fully self-contained | Still needs NWP for initial conditions |
| Best use case today | Operational, safety-critical forecasting | Fast probabilistic forecasting, large-scale risk modeling |
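The accuracy row mentions CRPS and the Brier score, so here is a short sketch of what those numbers measure. The CRPS function uses the standard ensemble estimator (mean absolute error of the members minus half the mean absolute difference between members). The function names are illustrative, and this is not the exact evaluation code used in any published comparison.

```python
def ensemble_crps(members, observation):
    """Continuous ranked probability score for one scalar forecast. Lower is better.

    First term: how far members are from what actually happened.
    Second term: credit for spread, so an ensemble is not punished
    for honestly representing uncertainty.
    """
    n = len(members)
    accuracy = sum(abs(m - observation) for m in members) / n
    spread = sum(abs(a - b) for a in members for b in members) / (2 * n * n)
    return accuracy - spread


def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


if __name__ == "__main__":
    observed_temp = 2.0
    print(ensemble_crps([1.0, 3.0], observed_temp))
    print(brier_score([0.8, 0.2], [1, 0]))
```

With a single member, CRPS reduces to plain absolute error, which is why it is a convenient way to compare deterministic and ensemble forecasts on the same scale.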
If you want to try this yourself rather than just read about it, ECMWF maintains an open plugin that runs GenCast through their ai-models framework.
Project structure once installed looks roughly like this:

    ai-models-gencast/
    ├── src/
    │   └── ai_models_gencast/
    ├── tests/
    ├── requirements.txt
    ├── requirements-gpu.txt
    └── pyproject.toml

Install the package:

    pip install ai-models-gencast

GenCast runs on Jax, so install the right backend for your hardware. For GPU (recommended, since GenCast is resource-heavy):

    pip install -r requirements-gpu.txt -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

For CPU only (slower, but works for testing):

    pip install -r requirements.txt

You control how many ensemble members you generate with one flag. A single deterministic-style forecast:

    ai-models gencast --num-ensemble-members 0

A 50-member ensemble in one run:

    ai-models gencast --num-ensemble-members 50

Or split a large ensemble across multiple machines with controlled member IDs:

    ai-models gencast --num-ensemble-members 50 --member-number 1,2,3,4,5

This single flag is doing the job that used to require an entire supercomputer scheduling system: each member gets its own ID and runs as part of the same batch.
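If you do split members across machines, a small helper can generate the per-machine commands. This sketch only builds command strings and does not run anything. It assumes member IDs start at 1 and that the two flags combine exactly as in the example above; check the plugin's own help output before relying on that. The function names are made up for illustration.

```python
def split_member_ids(num_members, num_machines):
    """Divide member IDs 1..num_members into near-equal contiguous chunks."""
    ids = list(range(1, num_members + 1))
    base, extra = divmod(num_members, num_machines)
    chunks = []
    start = 0
    for machine in range(num_machines):
        # The first `extra` machines take one additional member each.
        size = base + (1 if machine < extra else 0)
        chunks.append(ids[start:start + size])
        start += size
    return chunks


def build_commands(num_members, num_machines):
    """Return one command string per machine that has members to run."""
    commands = []
    for chunk in split_member_ids(num_members, num_machines):
        if not chunk:
            continue  # more machines than members
        member_list = ",".join(str(i) for i in chunk)
        commands.append(
            f"ai-models gencast --num-ensemble-members {num_members} "
            f"--member-number {member_list}"
        )
    return commands


if __name__ == "__main__":
    for command in build_commands(num_members=50, num_machines=10):
        print(command)
```

Contiguous chunks keep each machine's output easy to identify afterwards, since the member IDs on every machine form an unbroken range.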
Smaller countries and agencies. Nations without supercomputing budgets can now run credible global ensemble forecasts on modest hardware instead.
Energy and trading desks. Specialized commercial models like EPT-2 are already being benchmarked head-to-head against ECMWF's flagship deterministic model, with claims of beating it across most lead times and variables relevant to trading, including wind speed, temperature, and solar radiation.
Disaster planners. Cheap, large ensembles mean more samples of rare, high-impact events like cyclone paths, which is exactly where traditional methods used to be thin, often producing only a small handful of storm-track scenarios per cycle.
National weather services. For now, they're not switching off their physics models. AI is being layered on top, not replacing the core infrastructure.
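The disaster-planning point comes down to simple sampling arithmetic. The sketch below assumes ensemble members behave like independent draws from a well-calibrated forecast distribution, which real ensembles only approximate, and the 2% event probability is a made-up example. It shows why a 50-member ensemble can miss a rare scenario entirely while a much larger one almost never does.

```python
import math


def chance_of_capturing(p_event, n_members):
    """Probability that at least one member contains the rare event."""
    return 1.0 - (1.0 - p_event) ** n_members


def standard_error(p_event, n_members):
    """Sampling uncertainty of the event probability estimated from the ensemble."""
    return math.sqrt(p_event * (1.0 - p_event) / n_members)


if __name__ == "__main__":
    p = 0.02  # hypothetical: a 2% chance of landfall at a given location
    for n in (50, 1000):
        print(
            f"{n:>5} members: "
            f"P(at least one hit) = {chance_of_capturing(p, n):.3f}, "
            f"standard error = {standard_error(p, n):.4f}"
        )
```

Under these assumptions, a 50-member ensemble has roughly a one-in-three chance of containing no member at all with the 2% event, and its estimate of that probability carries a standard error about as large as the probability itself. Going to 1,000 members shrinks that error by a factor of about 4.5.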
1. Is generative AI actually replacing supercomputer weather forecasting?
Not entirely. It's replacing the ensemble generation step in many use cases, but it still depends on classical NWP for its starting data, and no major agency has decommissioned its physics-based system.
2. What is GenCast?
A diffusion-based generative AI model from Google DeepMind that produces probabilistic weather ensembles instead of a single forecast, trained on decades of historical reanalysis data.
3. How is a diffusion model used for weather different from one used for images?
The core idea (learning to generate samples from a probability distribution) is the same, but GenCast is built for the geometry of a sphere instead of a flat image grid, and it predicts physical atmospheric variables instead of pixels.
4. How fast is an AI-generated weather ensemble compared to a traditional one?
GenCast can produce one ensemble member, a full 15-day forecast, in about 8 minutes on a single chip, and members can be generated in parallel. A traditional ensemble run takes hours on a supercomputer cluster.
5. Is AI weather forecasting more accurate than traditional methods?
On many medium-range metrics, yes. GenCast beat ECMWF's operational ensemble on the large majority of tested targets. But it still lags on short-range nowcasting and tends to underpredict the most extreme events.
6. Can I run GenCast myself?
Yes, through the open-source ai-models-gencast plugin. It needs a capable GPU and the Jax framework, and you can control ensemble size with a single command-line flag.
7. Why do AI weather models still need traditional weather data?
They need an accurate snapshot of the atmosphere's current state to start from. That snapshot still comes from data assimilation, a classical NWP process built on satellite, radar, and station observations.
8. What is the biggest current weakness of generative weather AI?
Handling extreme, record-breaking events. Models trained mostly on historical data can underestimate the severity of conditions that have no close match in the past.
9. Will national weather agencies stop using supercomputers?
Not in the near term. As of 2026, every major agency still runs its physics-based NWP system alongside any AI tools it has adopted.
10. Who benefits most from cheaper AI-generated ensembles?
Smaller countries without supercomputing budgets, energy and trading firms that need fast probabilistic forecasts, and disaster planners who need many more samples of rare events like cyclone tracks.