Running Qwen 397B Locally on a MacBook Pro with Apple’s “LLM in a Flash”

TECH March 19, 2026

Research demonstrates that a custom version of Qwen3.5-397B-A17B can run locally on a MacBook Pro M3 Max at more than 5.5 tokens/second. The result relies on Apple’s “LLM in a Flash” technique, which keeps model weights on flash storage and loads them into memory on demand. This works around the main obstacle, the model’s size (209GB, 120GB quantized).
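The core idea of reading weights from storage only when they are needed can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the implementation used in the research: the file layout, the sizes, and the names `write_toy_experts`, `load_expert`, and `moe_forward` are all hypothetical, and the optimizations described for “LLM in a Flash” are not reproduced here. It only shows a mixture-of-experts layer whose expert weights live in a file and are read per token for the selected experts.

```python
import mmap
import os
import tempfile
from array import array

# Toy sizes (hypothetical; a real model is many orders of magnitude larger).
NUM_EXPERTS = 8
D_IN, D_OUT = 4, 3
FLOATS_PER_EXPERT = D_IN * D_OUT
BYTES_PER_EXPERT = FLOATS_PER_EXPERT * 4  # float32 values


def write_toy_experts(path):
    """Write all expert matrices to one flat file; expert e is filled with e + 1."""
    with open(path, "wb") as f:
        for e in range(NUM_EXPERTS):
            array("f", [float(e + 1)] * FLOATS_PER_EXPERT).tofile(f)


def load_expert(mm, expert_id):
    """Read only one expert's bytes from the memory-mapped file."""
    start = expert_id * BYTES_PER_EXPERT
    w = array("f")
    w.frombytes(mm[start:start + BYTES_PER_EXPERT])
    return w


def moe_forward(x, expert_ids, gates, mm):
    """Gate-weighted sum of the selected experts' outputs (row-major weights)."""
    out = [0.0] * D_OUT
    for eid, g in zip(expert_ids, gates):
        w = load_expert(mm, eid)  # the other experts are never read
        for j in range(D_OUT):
            out[j] += g * sum(x[i] * w[i * D_OUT + j] for i in range(D_IN))
    return out


def demo():
    path = os.path.join(tempfile.mkdtemp(), "experts.bin")
    write_toy_experts(path)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        x = [1.0, 2.0, 3.0, 4.0]
        # Pretend a router picked experts 1 and 5 with equal weight.
        return moe_forward(x, expert_ids=[1, 5], gates=[0.5, 0.5], mm=mm)


result = demo()
print(result)  # [40.0, 40.0, 40.0]
```

In this sketch only two of the eight expert matrices are ever read from the file, which is the property that lets a model larger than available memory run at all; the per-token read cost is what storage bandwidth then has to cover.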