Teuken-7B, the large language model developed by the OpenGPT-X research project, is now available for download on Hugging Face. The model was trained from scratch on all 24 official languages of the European Union and is released as open source, including for commercial use in artificial intelligence applications. The project is funded by the German Federal Ministry for Economic Affairs and Climate Action (BMWK) and led by the Fraunhofer institutes IAIS and IIS.
The aim of OpenGPT-X is to create models particularly suited to the European linguistic and cultural context, and to provide a European alternative to AI models developed by large non-European companies such as OpenAI or Google, one that guarantees control and technological transparency. Practical applications of the new LLM include accurate translation between the 24 official EU languages, useful for companies with international markets or for government organizations; virtual assistants (chatbots) that can understand and respond in multiple languages; and the automatic drafting of documents, emails or articles in different languages, with sensitivity to cultural context.
The distinctive way Teuken-7B was trained makes it particularly well suited to multilingual settings, including those involving languages with especially complex structures. Approximately 50% of Teuken-7B's training data comes from non-English texts, a much higher proportion than in comparable models.
Thanks to an optimized multilingual tokenizer, the model requires fewer computational resources to process texts in non-English languages. Furthermore, because it is open source, companies can customize and maintain their AI applications while keeping sensitive data in-house. According to Professor Stefan Wrobel, director of Fraunhofer IAIS, Teuken-7B represents a valid public alternative for the academic and industrial world. It offers more control and transparency than closed commercial models, which also makes it well suited to critical sectors such as medicine, finance and robotics.
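The tokenizer claim above can be made concrete with a simple measure: the average number of tokens a tokenizer produces per word (sometimes called "fertility"). Fewer tokens per word means fewer steps for the model to process the same text. The sketch below is only an illustration under stated assumptions: the metric name, the function names and the two toy tokenizers are hypothetical stand-ins, not Teuken-7B's actual tokenizer or the project's evaluation method.

```python
from typing import Callable, List


def fertility(text: str, tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    return len(tokenize(text)) / len(words)


# Two toy tokenizers, purely hypothetical stand-ins for real ones.
def char_chunks(text: str, size: int = 3) -> List[str]:
    """Split each word into fixed-size character chunks (a poor vocabulary fit)."""
    return [w[i:i + size] for w in text.split() for i in range(0, len(w), size)]


def whole_words(text: str) -> List[str]:
    """Treat every word as a single token (an ideal vocabulary fit)."""
    return text.split()


sample = "Künstliche Intelligenz verändert Europa"
print("chunk tokenizer:", fertility(sample, char_chunks))
print("word tokenizer: ", fertility(sample, whole_words))
```

With a real model, the same `fertility` function could be given the `tokenize` method of a loaded tokenizer and run on sample texts in several languages; a lower value for non-English text would be consistent with the efficiency gain described above.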
The model was trained on the JUWELS supercomputer at Forschungszentrum Jülich, with the support of partners such as TU Dresden, DFKI and others. Additionally, Teuken-7B is available through the Gaia-X infrastructure, which guarantees data security and compliance with European privacy regulations.
The LLM can be downloaded for free in two versions: one for research purposes and one under the Apache 2.0 license for commercial use. Further download details can be found at this address.
Source: www.hwupgrade.it