Running large-scale AI models just became dramatically more accessible thanks to a new release of a Rust-based tool. Shimmy now enables 42B parameter models to run on consumer GPUs with as little as 8GB of VRAM.
By offloading inactive model layers to system RAM, the tool sidesteps the massive VRAM typically required for models of this size. Does this open the door for Rust to become a key player in the local AI inference space?
In today’s Rust recap:
> Rust tool runs 42B AI models on 8GB VRAM
> A first look at the Macros 2.0 proposal
> How a blocking database call stalled an async runtime
> Slint UI framework targets desktop applications
AI's Consumer Moment
The Recap: A new release of the Rust-based tool Shimmy now allows massive 42B+ parameter AI models to run on consumer GPUs with as little as 8GB of VRAM, a feat previously requiring expensive enterprise hardware. The v1.7.0 release introduces a clever memory-saving technique for Mixture of Experts (MoE) models.
Unpacked:
The tool uses MoE CPU offloading to keep only the active "expert" layers of a model on the GPU, swapping inactive ones to standard system RAM as needed (the sketch after this list illustrates the pattern).
This technique drastically reduces memory requirements, allowing a 42B parameter model like Phi-3.5-MoE to run with just 4GB of VRAM instead of the typical 80GB+.
To simplify adoption, the developer has also released a collection of curated models on Hugging Face, pre-optimized for this new offloading feature.
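For intuition, here is a minimal conceptual sketch of the offloading pattern in Rust. It is not Shimmy's actual code: every type and function below (Expert, MoeLayer, upload_to_gpu) is invented for illustration, with a placeholder router standing in for the real expert-selection logic.

```rust
/// Conceptual sketch only -- not Shimmy's implementation. All types
/// and functions here are invented for illustration.
struct Expert {
    weights: Vec<f32>, // expert weights stay resident in system RAM
}

struct MoeLayer {
    experts: Vec<Expert>,
}

impl MoeLayer {
    /// Stand-in for the router that scores experts per token and
    /// selects the top-k to activate.
    fn route(&self, _token: &[f32]) -> Vec<usize> {
        vec![0, 1] // placeholder: pretend experts 0 and 1 were selected
    }

    fn forward(&self, token: &[f32]) {
        // Only the experts the router selects for this token are
        // copied to the GPU; the inactive ones never touch VRAM,
        // which is what shrinks the memory footprint.
        for &idx in &self.route(token) {
            upload_to_gpu(&self.experts[idx].weights);
        }
        // ...compute on the GPU, then reuse the same VRAM buffers
        // for the next layer's active experts.
    }
}

fn upload_to_gpu(_weights: &[f32]) {
    // Stand-in for a real host-to-device transfer (CUDA, Metal, etc.).
}

fn main() {
    let layer = MoeLayer {
        experts: (0..8).map(|_| Expert { weights: vec![0.0; 1024] }).collect(),
    };
    layer.forward(&[0.0; 16]);
}
```

The key design point is that expert weights default to host memory, and VRAM only ever holds the small working set the router selects for the current token.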
Bottom line: This development makes state-of-the-art AI more accessible, allowing students and researchers to experiment without needing costly cloud instances or enterprise GPUs. It's a powerful demonstration of how Rust's performance enables efficient, high-impact software on everyday hardware.
A Glimpse of Macros 2.0
The Recap: An upcoming Macros 2.0 proposal is set to deliver one of the biggest developer experience upgrades in recent memory. It aims to make Rust's powerful macro system more intuitive and tool-friendly.
Unpacked:
The new system enables significantly better IDE support, allowing tools like rust-analyzer to provide hover, goto-definition, and autocomplete inside macro bodies.
Macros will finally get intuitive visibility rules, behaving like any other item with standard pub and use keywords and eliminating the strange scoping quirks of macro_rules!.
It also introduces proper path resolution at the definition site, which means you can use items in scope without resorting to absolute paths like $crate::... everywhere. Both changes are shown in the sketch after this list.
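As a rough preview of what this could look like, the sketch below uses the existing nightly decl_macro feature, which prototypes the Macros 2.0 direction; the module and macro names are made up, and the final syntax may differ from what ships.

```rust
// Nightly-only preview of the Macros 2.0 direction via the existing
// `decl_macro` feature; the names (`helpers`, `log_twice`) are
// illustrative.
#![feature(decl_macro)]

mod helpers {
    pub fn log(msg: &str) {
        println!("[log] {msg}");
    }

    // `macro` items take ordinary visibility: `pub` works here just
    // like it does on a function, with no #[macro_export] dance.
    pub macro log_twice($msg:expr) {
        // Paths resolve at the definition site, so `log` is found in
        // this module -- no `$crate::helpers::log` needed.
        log($msg);
        log($msg);
    }
}

// The macro is imported with a plain `use`, like any other item.
use helpers::log_twice;

fn main() {
    log_twice!("hello");
}
```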
Bottom line: This overhaul promises to make writing and debugging macros feel like writing any other Rust code. By lowering the barrier to entry, it will empower more developers to leverage one of the language's most distinctive features safely and productively.
The Async Performance Killer
The Recap: A developer's deep-dive investigation into random black screens in a WebRTC streaming server reveals a surprising culprit: blocking database calls from Diesel were stalling Tokio's async runtime, causing a cascade of failures across the application.
Unpacked:
The initial symptoms pointed to the WebRTC stack, leading the developer down a path of debugging network congestion and even submitting a community patch for a subtle issue in Tokio's timer logic, yet the root cause remained hidden.
The true villain was not in the streaming code but in the database layer: synchronous Diesel calls were freezing entire Tokio worker threads, creating a domino effect that crippled the server's responsiveness under load.
The solution involved switching to diesel_async, a crate that provides a non-blocking interface for Diesel. This single change resolved the stalls and doubled overall server performance (the sketch after this list shows the underlying failure mode).
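A minimal sketch of the failure mode: a sleep stands in for a synchronous Diesel query, and tokio::task::spawn_blocking shows the general-purpose escape hatch. The article's actual fix was diesel_async, which makes the query itself awaitable.

```rust
use std::time::Duration;

// Stand-in for a synchronous Diesel query: it blocks the calling
// thread until the "database" responds.
fn blocking_query() -> Vec<String> {
    std::thread::sleep(Duration::from_millis(500)); // simulated DB latency
    vec!["row".to_string()]
}

#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() {
    // BAD: calling the blocking function directly pins a Tokio worker
    // thread for the full 500ms; every task scheduled on that worker
    // stalls with it -- the cascade described in the article.
    let _rows = blocking_query();

    // BETTER: hand the blocking work to Tokio's dedicated blocking
    // pool so the async workers stay free. (diesel_async goes further
    // by making the query itself non-blocking; spawn_blocking is the
    // general-purpose escape hatch.)
    let _rows = tokio::task::spawn_blocking(blocking_query)
        .await
        .expect("blocking task panicked");
}
```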
Bottom line: This investigation is a powerful reminder that in an async environment, a single blocking call can have system-wide consequences, even in seemingly unrelated parts of an application. Ensuring your entire stack is fully non-blocking is critical for building resilient and performant systems in Rust.
Slint's Desktop Push
The Recap: The Slint UI framework, popular for embedded systems, is making a major push to become production-ready for desktop apps. This initiative is driven by a partnership to build the next version of a major open-source application.
Unpacked:
The effort is anchored by a collaboration with the LibrePCB project, which is transitioning its Qt-based GUI to Slint for the upcoming 2.0 release.
Key features on the roadmap include rich text support, modal dialogs, and global drag-and-drop, with planned contributions to the underlying winit windowing library.
While this is a new focus, Slint already powers demanding commercial applications like WesAudio's DAW plugins, whose developers praise its high performance and low CPU usage; the minimal example below gives a feel for Slint's declarative style.
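For a sense of the developer experience, here is a minimal hello-world using Slint's inline slint! macro, sketched from the framework's documented patterns; the component name and window contents are invented for illustration.

```rust
// Minimal Slint desktop window; the UI is declared inline and the
// macro generates a Rust struct (`MainWindow`) to drive it.
slint::slint! {
    export component MainWindow inherits Window {
        title: "Hello from Slint";
        Text {
            text: "Rust-native desktop UI";
        }
    }
}

fn main() -> Result<(), slint::PlatformError> {
    let window = MainWindow::new()?;
    window.run() // enters the event loop until the window closes
}
```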
Bottom line: This move positions Slint as a compelling Rust-native alternative for developers looking beyond established UI toolkits. The focus on features driven by real-world projects signals a significant step towards maturity for cross-platform desktop development in Rust.
The Shortlist
Linux merged the initial framework for Rust-based USB driver bindings in kernel 6.18, a major step toward building memory-safe drivers for critical subsystems.
Avian released version 0.4 of its ECS-based physics engine for Bevy, delivering a 3x performance boost through multi-threaded constraint solving and improved data structures.
Niko argued that Rust should prioritize making explicit Rc/Arc handle cloning more ergonomic before considering automatic cloning, aligning with the language's goal of providing control.
CodeQL added a new security query for Rust in version 2.23.2 that detects the use of non-HTTPS URLs, helping developers prevent potential man-in-the-middle vulnerabilities.