A demonstration shows that a custom version of Qwen3.5-397B-A17B can run locally on a MacBook Pro M3 Max at over 5.5 tokens/second. It relies on Apple's "LLM in a Flash" technique to work around the model's large size (209GB, or 120GB quantized).
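As a rough illustration of the general idea behind "LLM in a Flash" (keep the weights on storage and read only the slices needed for the current step), here is a toy Python sketch. It is not Apple's method or the demonstration's actual code. The file layout, the names `write_toy_weights`, `load_expert`, and `run_demo`, and the tiny sizes are all hypothetical. It assumes a mixture-of-experts layout in which only a few experts are active per token.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical toy sizes; a real model has vastly larger expert blocks.
NUM_EXPERTS = 8
FLOATS_PER_EXPERT = 4
EXPERT_BYTES = FLOATS_PER_EXPERT * 4  # float32 is 4 bytes


def write_toy_weights(path):
    """Write NUM_EXPERTS contiguous float32 blocks; expert i is filled with float(i)."""
    with open(path, "wb") as f:
        for i in range(NUM_EXPERTS):
            values = [float(i)] * FLOATS_PER_EXPERT
            f.write(struct.pack(f"<{FLOATS_PER_EXPERT}f", *values))


def load_expert(mm, expert_id):
    """Read a single expert's block from the memory-mapped weight file."""
    start = expert_id * EXPERT_BYTES
    raw = mm[start:start + EXPERT_BYTES]
    return list(struct.unpack(f"<{FLOATS_PER_EXPERT}f", raw))


def run_demo(active_experts):
    """Map the weight file and load only the experts listed in active_experts."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_toy_weights(path)
        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            loaded = {e: load_expert(mm, e) for e in active_experts}
            bytes_read = len(active_experts) * EXPERT_BYTES
            return loaded, bytes_read, mm.size()
    finally:
        os.remove(path)


if __name__ == "__main__":
    loaded, bytes_read, total = run_demo([1, 5])
    print(f"read {bytes_read} of {total} bytes for experts {sorted(loaded)}")
```

Memory-mapping lets the operating system page in only the regions that are actually touched, so the working set can stay far smaller than the file on disk. A real implementation would also need caching and careful read scheduling, which this sketch leaves out.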
Source: Simon Willison Blog