Canadian hardware startup Taalas has announced its first product: a custom hardware implementation of the Llama 3.1 8B model that runs at 17,000 tokens per second. That throughput marks a significant leap in local AI inference performance.
Source: Simon Willison