It can generate up to 16k output tokens per request, and the total context window holds up to 128k tokens. In terms of knowledge, the model is trained on data up to October 2023. Although GPT-4o mini is a lightweight version of the model, it outperforms most of the competition, including OpenAI's own older GPT-3.5 Turbo. For example, in reasoning on the MMLU benchmark it scores 82.0%, while Gemini Flash reaches 77.9% and Claude Haiku only 73.8%. The previous GPT-3.5 Turbo achieved just 69.8% on the same test (the full-fledged GPT-4o, for comparison, reaches 88.7%).
In mathematics (the MGSM benchmark) the newcomer achieves an excellent 87.0% (GPT-4o even 90.5%), while Gemini Flash manages only 75.5% and Claude Haiku 71.7%; the older GPT-3.5 Turbo sat at just 56.3%. In writing source code (HumanEval), GPT-4o mini scores 87.2% (90.2% for GPT-4o), Gemini Flash has 71.5%, Claude Haiku reaches 75.9%, and as usual the original GPT-3.5 Turbo lags behind today at 68.0%. GPT-4o mini also leads in multimodal reasoning (MMMU) with a score of 59.4% (the full version gets 69.1%). The competing Gemini Flash is slightly worse at 56.1%, and Claude Haiku has 50.2%.
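For developers, GPT-4o mini is reached through the same API as OpenAI's other models. The snippet below is only a minimal sketch, assuming the official openai Python SDK and an API key in the environment; the prompt text and the max_tokens value are illustrative assumptions, not part of the announcement.

```python
# Minimal sketch: calling GPT-4o mini via the standard OpenAI Python SDK.
# The prompt and max_tokens value here are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the MMLU benchmark in two sentences."},
    ],
    max_tokens=1024,  # the model allows up to ~16k output tokens per request
)

print(response.choices[0].message.content)
```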
The model also brings new safety features that better block hate speech, “adult” content, collection of personal data, spam and the like. OpenAI has also worked on a new method to resist attempts to jailbreak the chatbot and various attempts to extract the models' system data.
Source: www.svethardware.cz