Mistral AI Unveils Mistral Large 2

Mistral AI has unveiled its latest large language model, Mistral Large 2, boasting significant advancements in multilingual capabilities, reasoning, and coding. With 123 billion parameters and a 128,000-token context window, the model aims to compete with industry leaders like OpenAI’s GPT-4 and Meta’s Llama 3.1, excelling particularly in code generation and mathematical tasks.

Mistral Large 2 Capabilities

With its 128,000-token context window, Mistral Large 2 demonstrates significant improvements in reasoning, knowledge, and coding. It excels at code generation, outperforming Llama 3.1 405B and scoring just below GPT-4 on benchmarks such as HumanEval and MultiPL-E. Its mathematical ability is evident on the MATH benchmark, where it ranks second only to GPT-4 in the zero-shot setting, without chain-of-thought reasoning.

Performance-to-Cost Ratio Analysis

Mistral Large 2 sets a new standard in performance-to-cost ratio among open models, achieving 84.0% accuracy on the MMLU benchmark while remaining more cost-effective than many competitors. At $4.50 per 1M tokens (blended at a 3:1 input-to-output ratio), it offers a competitive balance of performance and cost, with an output speed of 43.5 tokens per second and a latency of 0.29 seconds to first token. Despite having fewer parameters (123B) than models such as Llama 3.1 405B, Mistral Large 2 delivers comparable or superior performance on many tasks, particularly code generation and mathematics, demonstrating its optimization for cost-effective deployment and operation.
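For context, the blended figure is simply a 3:1 weighted average of input and output token prices. The short sketch below reproduces the $4.50 number under the assumption that launch pricing was roughly $3 per 1M input tokens and $9 per 1M output tokens; these per-token prices are illustrative assumptions, not quoted from the article, so check Mistral's current pricing page.

```python
# Blended price at a 3:1 input:output token ratio.
# Assumed per-token prices (illustrative; verify against Mistral's pricing page):
input_price_per_m = 3.00   # USD per 1M input tokens (assumed)
output_price_per_m = 9.00  # USD per 1M output tokens (assumed)

# Weighted average: 3 parts input to 1 part output.
blended = (3 * input_price_per_m + 1 * output_price_per_m) / 4
print(f"Blended price: ${blended:.2f} per 1M tokens")  # -> $4.50
```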

Multilingual and Coding Performance

The model supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, Polish, Arabic, and Hindi. On the Multilingual MMLU benchmark, it surpasses Llama 3.1 70B base by an average of 6.3% across nine languages. In coding tasks, it covers more than 80 programming languages, including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran, letting developers apply it to a wide range of coding tasks across domains and platforms.
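As a concrete illustration of using the model for code generation, the sketch below sends a chat-completion request to Mistral's public API with Python's requests library. The endpoint path and the mistral-large-latest model alias follow Mistral's API documentation but should be verified against the current docs; the prompt and helper function are purely illustrative.

```python
# Minimal sketch of a code-generation request against Mistral's chat completions API.
# Assumes MISTRAL_API_KEY is set in the environment and that the endpoint and
# "mistral-large-latest" alias match the current API documentation.
import os
import requests

API_URL = "https://api.mistral.ai/v1/chat/completions"

def generate_code(prompt: str) -> str:
    """Ask the model to write code for the given prompt (illustrative helper)."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        json={
            "model": "mistral-large-latest",
            "messages": [
                {"role": "system", "content": "You are a precise coding assistant."},
                {"role": "user", "content": prompt},
            ],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(generate_code("Write a Fortran subroutine that multiplies two matrices."))
```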

Availability and Licensing

Mistral Large 2 is available on Mistral AI’s platform, la Plateforme, and through cloud providers such as Amazon Bedrock, Microsoft Azure, and Google Cloud’s Vertex AI, offering flexible deployment options. The model is released under the Mistral Research License for research and non-commercial purposes, with a separate Commercial License required for business applications. Weights for the instruct model are available on Hugging Face, further expanding access for researchers and developers interested in exploring its capabilities.
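For readers who want to experiment with the released weights, here is a minimal sketch of loading the instruct checkpoint with Hugging Face transformers. The repository id mistralai/Mistral-Large-Instruct-2407 is assumed; access is gated behind acceptance of the Mistral Research License, and at 123B parameters the model needs several high-memory GPUs (or quantization) to run.

```python
# Sketch: loading the instruct weights with Hugging Face transformers.
# The repo id below is an assumption; the model is gated and very large,
# so this requires accepted license terms and substantial GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Large-Instruct-2407"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision still needs hundreds of GB of GPU memory
    device_map="auto",           # shard the 123B parameters across all visible GPUs
)

messages = [{"role": "user", "content": "Summarize the Mistral Research License in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```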

Competitor Comparison and Development Focus

With its strong performance-to-cost ratio on evaluation metrics, the model positions itself as a serious competitor to leading AI systems from OpenAI, Google, and Meta. Mistral AI emphasized minimizing hallucinations during development, training the model to acknowledge when it lacks sufficient information. This focus on reasoning and instruction-following has produced a more discerning and accurate system, one that admits uncertainty rather than generating plausible but incorrect responses.

Source: Perplexity