
For most of the past two years, AI coding agents have been tethered to a developer's local machine. They run when you run them, stop when you close the terminal, and compete for your attention at every step. The model that powers them has mattered, but the infrastructure around them has remained stubbornly personal and synchronous. That constraint is starting to break.
Mistral is pushing directly at that limitation today with the launch of Mistral Medium 3.5 in public preview, alongside remote agent capabilities in its Vibe coding environment and a new Work mode inside Le Chat. The release reframes what a foundation model can do when it is built for long-horizon tasks rather than single-turn responses.
One Model, Three Jobs
Mistral describes Medium 3.5 as its first flagship merged model: a dense 128B-parameter model with a 256k context window that handles instruction-following, reasoning, and coding within a single set of weights. That consolidation matters for agencies and studios managing diverse workloads: the same model can field a fast conversational reply or grind through a complex agentic run, with reasoning effort configurable per request. It is also multimodal; the vision encoder was trained from scratch to handle variable image sizes and aspect ratios.
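In practice, a per-request reasoning knob would look something like the sketch below, which builds a chat-completions-style request body. The `reasoning_effort` field name, its accepted values, and the model identifier are assumptions for illustration, not documented parts of Mistral's API.

```python
import json

# The standard Mistral chat-completions endpoint; the request fields below
# marked "assumed" are illustrative, not confirmed by the announcement.
API_URL = "https://api.mistral.ai/v1/chat/completions"

def build_request(prompt: str, effort: str = "low") -> dict:
    """Build a request body; `effort` would tune how hard the model reasons."""
    return {
        "model": "mistral-medium-3.5",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,     # assumed per-request knob
    }

# Same weights, two very different workloads:
quick = build_request("Summarize this changelog.", effort="low")
deep = build_request("Refactor the payment module.", effort="high")
print(json.dumps(quick, indent=2))
```

The point of the sketch is the shape of the trade-off: one deployment, with the cost and latency of reasoning dialed up or down per call rather than by swapping models.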
Benchmark Performance Worth Noting
On SWE-Bench Verified, a standard measure of real-world software engineering capability, Mistral Medium 3.5 scores 77.6%. The company states this places it ahead of its own Devstral 2 and ahead of Qwen3.5 397B A17B. It also scores 91.4 on the τ³-Telecom benchmark, which the company cites as evidence of strong agentic capability. Self-hosting is possible on as few as four GPUs, which lowers the barrier for teams that want to run it on their own infrastructure rather than through a managed API.
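A back-of-the-envelope check makes the four-GPU claim plausible. The precision and card size below are assumptions (FP8 weights on 80 GB cards), not figures from Mistral; only the 128B parameter count comes from the announcement.

```python
# Rough memory budget for self-hosting a dense 128B model on four GPUs.
params_billion = 128           # dense parameter count from the announcement
bytes_per_param = 1            # assumed FP8 quantization (1 byte per weight)
weights_gb = params_billion * bytes_per_param  # 1e9 params * 1 byte = 1 GB

gpus, gb_per_gpu = 4, 80       # e.g., four H100-class cards (assumption)
total_gb = gpus * gb_per_gpu
headroom_gb = total_gb - weights_gb  # left over for KV cache and activations

print(f"weights: {weights_gb} GB, cluster: {total_gb} GB, headroom: {headroom_gb} GB")
```

Under those assumptions the weights fit with room to spare for the KV cache, which is what makes the long 256k context usable at all on a four-card box.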
Agents That Run While You Step Away
The headline infrastructure shift is the move to async cloud execution in Mistral Vibe. Coding sessions can now run in the cloud, be spawned from either the Vibe CLI or Le Chat, and continue processing while the developer is doing something else. Multiple sessions run in parallel. Local CLI sessions can be teleported to the cloud, carrying session history, task state, and approval queues with them. When a session finishes, the agent can open a pull request on GitHub and notify the developer, so the developer reviews the finished result rather than every intermediate step.
Where This Sits in an Existing Stack
Vibe connects to the tooling most engineering teams already use. It integrates with GitHub for code and pull requests, Linear and Jira for issue tracking, Sentry for incidents, and Slack or Teams for reporting. Each session runs in an isolated sandbox. The company positions it for high-volume, well-defined work that consumes developer time without requiring developer judgment: module refactors, test generation, dependency upgrades, CI investigations, and bug fixes. Mistral notes it originally built this for its own internal coding environment before rolling it out to enterprise customers, and it is now available broadly.
Work Mode Extends the Logic to Knowledge Tasks
The new Work mode in Le Chat applies the same async, multi-step agent logic to research and productivity workflows. Powered by Mistral Medium 3.5, Work mode lets Le Chat read and write across connected tools simultaneously, persist across many turns, and work through projects to completion. Connectors are on by default rather than requiring manual selection. Supported workflows include cross-tool catchups across email, calendar, and messages; research and synthesis into structured briefs; inbox triage; and Jira issue creation from team discussions. Every tool call and reasoning step is visible, and the agent requests explicit approval before taking sensitive actions such as sending a message or modifying data.
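The approval gate described above might reduce to logic like the following: every tool call is surfaced, and sensitive ones block until the user explicitly approves. The sensitive-action categories and function shapes are assumptions for illustration, not Le Chat internals.

```python
# Hypothetical sketch of an approval gate for agent tool calls.
SENSITIVE = {"send_message", "modify_data", "create_jira_issue"}

def execute(tool: str, args: dict, approve) -> str:
    """Run a tool call, pausing for approval on sensitive actions."""
    print(f"tool call: {tool}({args})")          # every step stays visible
    if tool in SENSITIVE and not approve(tool):  # explicit approval first
        return "skipped: user declined"
    return f"ran {tool}"

# Read-only work proceeds; anything sensitive defers to the user:
safe = execute("search_email", {"q": "Q3 brief"}, approve=lambda t: False)
risky = execute("send_message", {"to": "team"}, approve=lambda t: False)
print(safe, "|", risky)
```

The asymmetry is the point: connectors being on by default is tolerable precisely because writes, not reads, are the actions that wait for a human.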
Mistral Medium 3.5 is the new default model in both Le Chat and Vibe CLI, replacing Devstral 2 in the coding agent. Through the API it is priced at $1.50 per million input tokens and $7.50 per million output tokens. Open weights are available on Hugging Face under a modified MIT license. The model is also available on NVIDIA GPU-accelerated endpoints at build.nvidia.com and as a containerized inference microservice through NVIDIA NIM. Remote agents and Work mode are available on Pro, Team, and Enterprise plans.
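At those rates, long agentic runs are easy to price out. The token counts below are invented for illustration; only the per-million prices come from the announcement.

```python
# API cost at the listed rates: $1.50 per million input tokens,
# $7.50 per million output tokens.
INPUT_PER_M, OUTPUT_PER_M = 1.50, 7.50

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single run at the announced rates."""
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# e.g., a refactor session that reads 2M tokens of code and writes 400k:
cost = run_cost(2_000_000, 400_000)
print(f"${cost:.2f}")  # 2 * 1.50 + 0.4 * 7.50 = $6.00
```

Output tokens dominate quickly at a 5x premium, which is one reason unattended agents that write a lot of code are worth metering per session.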
The response from developers and power users will determine whether the async model sticks. Parallel agent sessions and cloud teleportation are compelling on paper, but trust in the agent's judgment during unsupervised runs will take time to build. If Mistral's approval and visibility layer proves reliable in practice, this architecture could shift how creative and media teams think about delegating not just code generation but research, briefing, and cross-platform reporting to AI systems that work while the team is doing something else.