MARCH :// 2026
March was wild.
This felt like one of those months where every lane of AI moved at once. Big frontier models got stronger, open model teams kept shipping fast, video and image tools kept getting sharper, and a lot of the most interesting work started drifting into agents, world models, spatial understanding, and tools that feel more like real systems than one-off demos. ComfyUI also had a genuinely big month, with updates that push it further toward becoming an actual app layer for creative AI workflows.
Major model releases
OpenAI dropped GPT-5.4, and that was easily one of the biggest releases of the month. OpenAI positions it as its newest frontier model for professional work, with stronger reasoning, coding, and agentic workflows, plus native computer-use support and up to 1 million tokens of context. They also shipped GPT-5.4 mini and nano right after, which makes the whole release feel more like a full platform move than just one flagship model.
Google had a strong month too. Gemini 3.1 Flash Live is clearly aimed at real-time multimodal apps, with low-latency voice, video, and screen-sharing support for conversational agents. Then Gemini Embedding 2 pushed Google’s embedding stack forward with native multimodality and support for more than 100 languages. That is a big sign of where search, retrieval, and context systems are going.
The open side kept moving too. Qwen 3.5 showed up as a full model family on Hugging Face, with a proper range of sizes instead of just one giant release. NVIDIA also released Nemotron 3 Super, a hybrid Mamba-Transformer-MoE model built around agentic reasoning workflows. And GLM-5.1 is no longer a “coming soon” item: the official repo now frames it as the new flagship for agentic engineering.
Video, image, and editing kept getting stronger
A lot of the most interesting work this month was not just about generating an image from scratch. It was about fixing, editing, restoring, harmonizing, or reshaping images after the fact.
Some of the image-editing projects worth checking out were FireRed Image Edit 1.1, RealRestorer, RealMaster, HiFi-Inpaint, Kiwi-Edit, Free-Edit, Artifixer, Diffusion Harmonizer, and MatAnyone2.
On the video side, LTX 2.3 kept pushing quality and coherence, and Google added another useful piece with Veo 3.1 Lite, which it describes as the cheapest model in the Veo 3.1 family while keeping the same speed as Veo 3.1 Fast. That matters because it shows video generation moving into a more practical, tiered product space instead of staying stuck as an expensive novelty.
Also worth checking here: FLUX.2 Klein 9B KV, Wan2GP, RetimeGS, and EffectMaker.
World models and agents are getting way more interesting
This was probably my favorite category of the month.
Projects like Video to World, WorldAgents, WorldFM, CUA Suite, CUDA Agent, ActionPlan, and LumosX all point in the same direction: models are being pushed to understand environments, not just answer prompts. They are being asked to reason across space, time, tools, and action. That feels like a much bigger shift than just “chatbot gets smarter.”
That is also why ARC-AGI-3 mattered this month. ARC Prize launched it on March 25 as a benchmark built around interactive environments and world-model learning instead of static puzzle solving. It is a good reminder that real general reasoning is still hard, especially once systems need to explore, adapt, and build their own internal model of a space.
Audio had a really solid month
Suno v5.5 was one of the bigger consumer-facing releases, with Suno leaning hard into more expressive vocals, better identity control, and more personalized generation. Whether that is your thing or not, it is obvious the music side is getting more refined.
On the speech side, Fish Audio open-sourced S2, which is a pretty big deal for people following open voice tooling. Fish describes it as a fine-grained TTS model with emotional and prosody control, open weights, fine-tuning code, and streaming inference. Cohere also entered the space with Cohere Transcribe, pitched as a state-of-the-art speech recognition model for enterprise use.
Also worth a look: PrismAudio, Hume TADA, and Foundation-1 by RoyalCities plus its ComfyUI node.
Science, biology, 3D, and spatial AI kept quietly cooking
Evo 2 still deserves a mention. The model itself is older than March, but Arc Institute highlighted it again this month as its use and visibility keep growing, so it still belongs in the conversation.
On the 3D and spatial side, March was packed. Good examples were daVinci MagiHuman, Mobile-GS, Holi-Spatial, LagerNVS, ShotVerse, WildActor, Helios-Page, SpatialT2I, and CubeComposer.
And yeah, 4DV is worth dropping in here too. 4D Gaussians are still one of the coolest things happening around scene representation right now.
Motion and tracking kept getting deeper
This was another category that maybe flew under the radar a bit, but there was a lot going on.
Worth checking: Pulse of Motion, MegaFlow, and Track4World.
Also in the broader motion category, NVIDIA released Kimodo, focused on scaling controllable human motion generation.
Efficiency and compression mattered more than people think
One of the more important technical stories this month was TurboQuant. Google Research describes it as a new compression approach aimed at reducing memory overhead while preserving quality. Stuff like this is not as flashy as a new frontier model, but it matters a lot because every jump in model capability turns into a deployment and memory problem almost immediately.
Also worth checking if you want the implementation side: the TurboQuant GitHub repo and this YouTube explainer.
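If you want a feel for why compression matters so much for memory, here is a minimal sketch of plain symmetric 8-bit quantization. To be clear, this is not TurboQuant's method, just the generic textbook idea of storing small integer codes plus a scale instead of full floats. The function names (quantize_int8, dequantize) and the random toy weights are made up for illustration.

```python
import random
import struct


def quantize_int8(values):
    """Symmetric uniform quantization: map floats to integer codes in [-127, 127].

    Generic illustration only, not the TurboQuant algorithm.
    """
    # One shared scale for the whole vector; fall back to 1.0 if all values are zero.
    scale = max(abs(v) for v in values) / 127 or 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Toy "weights": 4096 values drawn from a standard normal (hypothetical data).
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]

codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Compare storage cost: 4 bytes per float32 value vs 1 byte per int8 code.
fp32_bytes = len(struct.pack(f"{len(weights)}f", *weights))
int8_bytes = len(struct.pack(f"{len(codes)}b", *codes))

# Worst-case rounding error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"max reconstruction error: {max_error:.5f} (step size {scale:.5f})")
```

The trade is the whole story here: a quarter of the memory in exchange for a rounding error of at most half a step per value. Real methods are much smarter about where that error goes, but the pressure they are responding to is the same.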
ComfyUI had a genuinely important month
March was a big month for ComfyUI.
The Dynamic VRAM Management update pushed memory handling forward. ComfyUI says it reduces RAM usage, helps avoid some out-of-memory issues caused by offloading, and makes better use of available memory during workflow execution.
Then there was From Workflow to App, which introduced App Mode, App Builder, and ComfyHub. That one feels like a much bigger deal than it first sounds. The basic idea is simple: turn workflows into something shareable and app-like, so not every user has to stare directly into a node graph. That is a real step toward ComfyUI becoming more than just a power-user image tool.
Other projects worth checking
A few more things from the month that are worth digging into:
Meta Tribe v2
Matrix Game V3
Anima
BrandFusion
Loger
RealWonder
Spectrum
Utonia
DiagDistill
RL3DEdit
Tencent Research project
Extreme Humanoid