[{"summary":"Tech Report GitHub Hugging Face ModelScope DISCORD\nIntroduction We are excited to introduce Qwen3Guard, the first safety guardrail model in the Qwen family. Built upon the powerful Qwen3 foundation models and fine-tuned specifically for safety classification, Qwen3Guard ensures responsible AI interactions by delivering precise safety detection for both prompts and responses, complete with risk levels and categorized classifications for accurate moderation.\nQwen3Guard achieves state-of-the-art performance on major safety benchmarks, demonstrating strong capabilities in both prompt and response classification tasks across English, Chinese, and multilingual environments.","title":"Qwen3Guard: Real-time Safety for Your Token Stream"},{"summary":"QWEN CHAT GITHUB HUGGING FACE MODELSCOPE DISCORD\nWe are excited to introduce Qwen-Image-Edit, the image editing version of Qwen-Image. Built upon our 20B Qwen-Image model, Qwen-Image-Edit successfully extends Qwen-Image&rsquo;s unique text rendering capabilities to image editing tasks, enabling precise text editing. Furthermore, Qwen-Image-Edit simultaneously feeds the input image into Qwen2.5-VL (for visual semantic control) and the VAE Encoder (for visual appearance control), achieving capabilities in both semantic and appearance editing.","title":"Qwen-Image-Edit: Image Editing with Higher Quality and Efficiency"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD\nWe are thrilled to release Qwen-Image, a 20B MMDiT image foundation model that achieves significant advances in complex text rendering and precise image editing. To try the latest model, feel free to visit Qwen Chat and choose \u201cImage Generation\u201d.\nThe key features include:\nSuperior Text Rendering: Qwen-Image excels at complex text rendering, including multi-line layouts, paragraph-level semantics, and fine-grained details. 
It supports both alphabetic languages (e.","title":"Qwen-Image: Crafting with Native Text Rendering"},{"summary":"PAPER DISCORD\nIntroduction Reinforcement Learning (RL) has emerged as a pivotal paradigm for scaling language models and enhancing their deep reasoning and problem-solving capabilities. To scale RL, the foremost prerequisite is maintaining stable and robust training dynamics. However, we observe that existing RL algorithms (such as GRPO) exhibit severe instability issues during long training and lead to irreversible model collapse, hindering further performance improvements with increased compute.\nTo enable successful RL scaling, we propose the Group Sequence Policy Optimization (GSPO) algorithm.","title":"GSPO: Towards Scalable Reinforcement Learning for Language Models"},{"summary":"DEMO API DISCORD\nIntroduction Here we introduce the latest update of Qwen-MT (qwen-mt-turbo) via Qwen API. This update builds upon the powerful Qwen3, leveraging trillions multilingual and translation tokens to comprehensively enhance the model\u2019s multilingual understanding and translation capabilities. By integrating reinforcement learning techniques, the model achieves significant improvements in translation accuracy and linguistic fluency.\nKey Features:\nMultilingual Support for 92 Languages: Qwen-MT enables high-quality translation across 92 major official languages and prominent dialects, covering over 95% of the global population to meet diverse cross-lingual communication needs.","title":"Qwen-MT: Where Speed Meets Smart Translation"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DISCORD\nToday, we&rsquo;re announcing Qwen3-Coder, our most agentic code model to date. 
Qwen3-Coder is available in multiple sizes, but we&rsquo;re excited to introduce its most powerful variant first: Qwen3-Coder-480B-A35B-Instruct \u2014 a 480B-parameter Mixture-of-Experts model with 35B active parameters which supports the context length of 256K tokens natively and 1M tokens with extrapolation methods, offering exceptional performance in both coding and agentic tasks. Qwen3-Coder-480B-A35B-Instruct sets new state-of-the-art results among open models on Agentic Coding, Agentic Browser-Use, and Agentic Tool-Use, comparable to Claude Sonnet 4.","title":"Qwen3-Coder: Agentic Coding in the World"},{"summary":"API DISCORD\nIntroduction Here we introduce the latest update of Qwen-TTS (qwen-tts-latest or qwen-tts-2025-05-22) through Qwen API . Trained on a large-scale dataset encompassing over millions of hours of speech, Qwen-TTS achieves human-level naturalness and expressiveness. Notably, Qwen-TTS automatically adjusts prosody, pacing, and emotional inflections in response to the input text. Notably, Qwen-TTS supports the generation of 3 Chinese dialects, including Pekingese, Shanghainese, and Sichuanese.\nAs of now, Qwen-TTS supports 7 Chinese-English bilingual voices, including Cherry, Ethan, Chelsie, Serena, Dylan (Pekingese), Jada (Shanghainese) and Sunny (Sichuanese).","title":"Time to Speak Some Dialects, Qwen-TTS!"},{"summary":"QWEN CHAT DISCORD\nIntroduction The evolution of multimodal large models is continually pushing the boundaries of what we believe technology can achieve. From the initial QwenVL to the latest Qwen2.5 VL, we have made progress in enhancing the model&rsquo;s ability to understand image content. Today, we are excited to introduce a new model, Qwen VLo, a unified multimodal understanding and generation model. 
This newly upgraded model not only &ldquo;understands&rdquo; the world but also generates high-quality recreations based on that understanding, truly bridging the gap between perception and creation.","title":"Qwen VLo: From \"Understanding\" the World to \"Depicting\" It"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DISCORD\nWe release Qwen3 Embedding series, a new proprietary model of the Qwen model family. These models are specifically designed for text embedding, retrieval, and reranking tasks, built on the Qwen3 foundation model. Leveraging Qwen3\u2019s robust multilingual text understanding capabilities, the series achieves state-of-the-art performance across multiple benchmarks for text embedding and reranking tasks. We have open-sourced this series of text embedding and reranking models under the Apache 2.","title":"Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models"},{"summary":"QWEN CHAT GitHub Hugging Face ModelScope Kaggle DEMO DISCORD\nIntroduction Today, we are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general capabilities, etc., when compared to other top-tier models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. Additionally, the small MoE model, Qwen3-30B-A3B, outcompetes QwQ-32B with 10 times of activated parameters, and even a tiny model like Qwen3-4B can rival the performance of Qwen2.","title":"Qwen3: Think Deeper, Act Faster"},{"summary":"QWEN CHAT GITHUB HUGGING FACE MODELSCOPE DISCORD\nIntroduction Last December, we launched QVQ-72B-Preview as an exploratory model, but it had many issues. Today, we are officially releasing the first version of QVQ-Max, our visual reasoning model. 
This model can not only &ldquo;understand&rdquo; the content in images and videos but also analyze and reason with this information to provide solutions. From math problems to everyday questions, from programming code to artistic creation, QVQ-Max has demonstrated impressive capabilities.","title":"QVQ-Max: Think with Evidence"},{"summary":"QWEN CHAT HUGGING FACE MODELSCOPE DASHSCOPE GITHUB PAPER DEMO DISCORD\nWe release Qwen2.5-Omni, the new flagship end-to-end multimodal model in the Qwen series. Designed for comprehensive multimodal perception, it seamlessly processes diverse inputs including text, images, audio, and video, while delivering real-time streaming responses through both text generation and natural speech synthesis. To try the latest model, feel free to visit Qwen Chat and choose Qwen2.5-Omni-7B. The model is now openly available on Hugging Face, ModelScope, DashScope,and GitHub, with technical documentation available in our Paper.","title":"Qwen2.5 Omni: See, Hear, Talk, Write, Do It All!"},{"summary":"QWEN CHAT GITHUB HUGGING FACE MODELSCOPE DISCORD\nIntroduction At the end of January this year, we launched the Qwen2.5-VL series of models, which received widespread attention and positive feedback from the community. Building on the Qwen2.5-VL series, we continued to optimize the model using reinforcement learning and open-sourced the new VL model with the beloved 32B parameter scale under the Apache 2.0 license \u2014 Qwen2.5-VL-32B-Instruct. Compared to the previously released Qwen2.","title":"Qwen2.5-VL-32B: Smarter and Lighter"},{"summary":"QWEN CHAT Hugging Face ModelScope DEMO DISCORD\nScaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. 
For instance, DeepSeek R1 has achieved state-of-the-art performance by integrating cold-start data and multi-stage training, enabling deep thinking and complex reasoning.\nOur research explores the scalability of Reinforcement Learning (RL) and its impact on enhancing the intelligence of large language models.","title":"QwQ-32B: Embracing the Power of Reinforcement Learning"},{"summary":"QWEN CHAT DISCORD\nThis is a blog created by QwQ-Max-Preview. We hope you enjoy it!\nIntroduction &lt;think&gt;\nOkay, the user wants me to create a title and introduction for their blog announcing the release of QwQ-Max-Preview. Let me start by understanding the key points they mentioned. First, the model is part of the Qwen series, built on Qwen2.5-Max. It&rsquo;s a preview version, so they probably want to highlight that it&rsquo;s a sneak peek before the full release.","title":"<think>...<\/think> QwQ-Max-Preview"},{"summary":"QWEN CHAT API DEMO DISCORD\nIt is widely recognized that continuously scaling both data size and model size can lead to significant improvements in model intelligence. However, the research and industry community has limited experience in effectively scaling extremely large models, whether they are dense or Mixture-of-Expert (MoE) models. Many critical details regarding this scaling process were only disclosed with the recent release of DeepSeek V3. Concurrently, we are developing Qwen2.","title":"Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model"},{"summary":"Tech Report HuggingFace ModelScope Qwen Chat HuggingFace Demo ModelScope Demo DISCORD\nIntroduction Two months after upgrading Qwen2.5-Turbo to support context length up to one million tokens, we are back with the open-source Qwen2.5-1M models and the corresponding inference framework support. 
Here&rsquo;s what you can expect from this release:\nOpensource Models: We&rsquo;re releasing two new checkpoints, Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, marking the first time we&rsquo;ve upgraded our opensource Qwen models to handle 1M-token contexts.","title":"Qwen2.5-1M: Deploy Your Own Qwen with Context Length up to 1M Tokens"},{"summary":"QWEN CHAT GITHUB HUGGING FACE MODELSCOPE DISCORD\nWe release Qwen2.5-VL, the new flagship vision-language model of Qwen and also a significant leap from the previous Qwen2-VL. To try the latest model, feel free to visit Qwen Chat and choose Qwen2.5-VL-72B-Instruct. Also, we open both base and instruct models in 3 sizes, including 3B, 7B, and 72B, in both Hugging Face and ModelScope.\nThe key features include:\nUnderstand things visually: Qwen2.","title":"Qwen2.5 VL! Qwen2.5 VL! Qwen2.5 VL!"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DISCORD\nBackground The Mixture-of-Experts (MoEs) architecture has become a popular model-parameter-scale-up technique. Typically, one MoE layer consists of a router (often parameterized as one single Linear layer) and a group of experts (for transformer-based models, each expert is one feedforward layer). Given an input, only a subset of experts will be activated, and then their outputs will be aggregated based on the scores the router assigned.","title":"Global-batch load balance almost free lunch to improve your MoE LLM training"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DISCORD\nIntroduction In recent years, Large Language Models (LLMs) have made remarkable advances in mathematical reasoning, yet they can make mistakes, such as miscalculations or logical errors, leading to wrong conclusions. 
Moreover, even when achieving correct final answers, these powerful models can still regularly make up plausible reasoning steps, where the final answers build upon flawed calculations or derivations, which undermine the reliability and trustworthiness of LLMs&rsquo; reasoning processes.","title":"Towards Effective Process Supervision in Mathematical Reasoning"},{"summary":"GITHUB HUGGING FACE MODELSCOPE KAGGLE DEMO DISCORD\nLanguage and vision intertwine in the human mind, shaping how we perceive and understand the world around us. Our ability to reason is deeply rooted in both linguistic thought and visual memory - but what happens when we extend these capabilities to AI? Today&rsquo;s large language models have demonstrated remarkable reasoning abilities, but we wondered: could they harness the power of visual understanding to reach new heights of cognitive capability?","title":"QVQ: To See the World with Wisdom"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD\nNote: This is the pronunciation of QwQ: \/kwju:\/ , similar to the word &ldquo;quill&rdquo;.\nWhat does it mean to think, to question, to understand? These are the deep waters that QwQ (Qwen with Questions) wades into. Like an eternal student of wisdom, it approaches every problem - be it mathematics, code, or knowledge of our world - with genuine wonder and doubt. QwQ embodies that ancient philosophical spirit: it knows that it knows nothing, and that&rsquo;s precisely what drives its curiosity.","title":"QwQ: Reflect Deeply on the Boundaries of the Unknown"},{"summary":"API Documentation (Chinese) HuggingFace Demo ModelScope Demo\nIntroduction After the release of Qwen2.5, we heard the community&rsquo;s demand for processing longer contexts. In recent months, we have made many optimizations for the model capabilities and inference performance of extremely long context. 
Today, we are proud to introduce the new Qwen2.5-Turbo version, which features:\nLonger Context Support: We have extended the model&rsquo;s context length from 128k to 1M, which is approximately 1 million English words or 1.","title":"Extending the Context Length to 1M Tokens!"},{"summary":"GITHUB HUGGING FACE MODELSCOPE KAGGLE DEMO DISCORD\nIntroduction Today, we are excited to open source the &ldquo;Powerful&rdquo;, &ldquo;Diverse&rdquo;, and &ldquo;Practical&rdquo; Qwen2.5-Coder series, dedicated to continuously promoting the development of Open CodeLLMs.\nPowerful: Qwen2.5-Coder-32B-Instruct has become the current SOTA open-source code model, matching the coding capabilities of GPT-4o. While demonstrating strong and comprehensive coding abilities, it also possesses good general and mathematical skills; Diverse: Building on the previously open-sourced two sizes of 1.","title":"Qwen2.5-Coder Series: Powerful, Diverse, Practical."},{"summary":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD\nIntroduction In the past three months since Qwen2&rsquo;s release, numerous developers have built new models on the Qwen2 language models, providing us with valuable feedback. During this period, we have focused on creating smarter and more knowledgeable language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5. We are announcing what might be the largest opensource release in history!","title":"Qwen2.5: A Party of Foundation Models!"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD\nIntroduction In this blog, we delve into the details of our latest Qwen2.5 series language models. We have developed a range of decoder-only dense models, with seven of them open-sourced, spanning from 0.5B to 72B parameters. Our research indicates a significant interest among users in models within the 10-30B range for production use, as well as 3B models for mobile applications. 
To meet these demands, we are open-sourcing Qwen2.","title":"Qwen2.5-LLM: Extending the boundary of LLMs"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD\nIntroduction In early April, we introduced CodeQwen1.5, which garnered significant attention from the community. Since then, we have been working to enhance the coding model. Today, we are excited to announce the release of the next generation of open-source coding models, Qwen2.5-Coder, and officially rename CodeQwen to Qwen-Coder. We think &ldquo;Coder&rdquo; is more human-like and agile, reflecting our vision of it becoming a true coding partner in the future.","title":"Qwen2.5-Coder: Code More, Learn More!"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DISCORD\n\ud83d\udea8 Qwen2.5-Math mainly supports solving English and Chinese math problems through CoT and TIR. We do not recommend using this series of models for other tasks. Introduction A month ago, we released the first series of mathematical LLMs - Qwen2-Math - of our Qwen family. Today, we have upgraded it and open-sourced Qwen2.5-Math series, including base models Qwen2.5-Math-1.5B\/7B\/72B, instruction-tuned models Qwen2.5-Math-1.5B\/7B\/72B-Instruct, and mathematical reward model Qwen2.","title":"Qwen2.5-Math: The world's leading open-sourced mathematical LLMs"},{"summary":"DEMO GITHUB HUGGING FACE MODELSCOPE API DISCORD\nAfter a year&rsquo;s relentless efforts, today we are thrilled to release Qwen2-VL! Qwen2-VL is the latest version of the vision language models based on Qwen2 in the Qwen model families. 
Compared with Qwen-VL, Qwen2-VL has the capabilities of:\nSoTA understanding of images of various resolution &amp; ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.\nUnderstanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.","title":"Qwen2-VL: To See the World More Clearly"},{"summary":"DEMO PAPER GITHUB HUGGING FACE MODELSCOPE DISCORD\nTo achieve the objective of building an AGI system, the model should be capable of understanding information from different modalities. Thanks to the rapid development of large language models, LLMs are now capable of understanding language and reasoning. Previously we have taken a step forward to extend our LLM, i.e., Qwen, to more modalities, including vision and audio, and built Qwen-VL and Qwen-Audio. Today, we release Qwen2-Audio, the next version of Qwen-Audio, which is capable of accepting audio and text inputs and generating text outputs.","title":"Qwen2-Audio: Chat with Your Voice!"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DISCORD\n\ud83d\udea8 This model mainly supports English. We will release bilingual (English and Chinese) math models soon. Introduction Over the past year, we have dedicated significant effort to researching and enhancing the reasoning capabilities of large language models, with a particular focus on their ability to solve arithmetic and mathematical problems. Today, we are delighted to introduce a series of math-specific large language models of our Qwen2 series, Qwen2-Math and Qwen2-Math-Instruct-1.","title":"Introducing Qwen2-Math"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD\nIntroduction After months of efforts, we are pleased to announce the evolution from Qwen1.5 to Qwen2. 
This time, we bring to you:\nPretrained and instruction-tuned models of 5 sizes, including Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B; Having been trained on data in 27 additional languages besides English and Chinese; State-of-the-art performance in a large number of benchmark evaluations; Significantly improved performance in coding and mathematics; Extended context length support up to 128K tokens with Qwen2-7B-Instruct and Qwen2-72B-Instruct.","title":"Hello Qwen2"},{"summary":"We&rsquo;ve created an agent using Qwen2 models with an 8k context size to understand documents with 1M tokens, surpassing RAG and native long-context models. This agent was also used to generate data for training new long-context Qwen models.","title":"Generalizing an LLM from 8k to 1M Context using Qwen-Agent"},{"summary":"API DEMO DISCORD\nPreviously, we opensourced a series of Qwen1.5 models ranging from 0.5 to 110 billion parameters. Now, we release a larger model, Qwen-Max-0428. Qwen-Max-0428 is an instruction-tuned model for chat service. It recently became available via Chatbot Arena and has now entered the top 10 on the leaderboard. Furthermore, our evaluation of MT-Bench also demonstrates that the new model outperforms our previous largest model Qwen1.5-110B-Chat.\nModels MT-Bench Arena Qwen1.","title":"Notes on Qwen-Max-0428"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD\nIntroduction Recently we have witnessed a burst of large-scale models with over 100 billion parameters in the opensource community. These models have demonstrated remarkable performance in both benchmark evaluation and chatbot arena. 
Today, we release the first 100B+ model of the Qwen1.5 series, Qwen1.5-110B, which achieves comparable performance with Meta-Llama3-70B in the base model evaluation, and outstanding performance in the chat evaluation, including MT-Bench and AlpacaEval 2.","title":"Qwen1.5-110B: The First 100B+ Model of the Qwen1.5 Series"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD\nIntroduction The advent of advanced programming tools, which harnesses the power of large language models (LLMs), has significantly enhanced programmer productivity and accuracy. Notwithstanding these advancements, dominant coding assistants like Github Copilot, built upon proprietary LLMs, pose notable challenges in terms of cost, privacy, security, and potential copyright infringement. Recognizing the imperative for a more transparent and accessible alternative, the open-source community has embarked on a concerted endeavor to develop open codeLLMs.","title":"Code with CodeQwen1.5"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD\nIntroduction The open-source community has long sought a model that strikes an ideal balance between performance, efficiency, and memory footprint. Despite the emergence of cutting-edge models like Qwen1.5-72B and DBRX, the models have faced persistent challenges such as large memory consumption, slow inference speed, and substantial finetuning costs.\nA growing consensus within the field now points to a model with approximately 30 billion parameters as the optimal &ldquo;sweet spot&rdquo; for achieving both strong performance and manageable resource requirements.","title":"Qwen1.5-32B: Fitting the Capstone of the Qwen1.5 Language Model Series"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD\nIntroduction Since the surge in interest sparked by Mixtral, research on mixture-of-expert (MoE) models has gained significant momentum. 
Both researchers and practitioners are keenly interested in understanding how to effectively train such models and assessing their efficiency and effectiveness. Today, we introduce Qwen1.5-MoE-A2.7B, a small MoE model with only 2.7 billion activated parameters yet matching the performance of state-of-the-art 7B models like Mistral 7B and Qwen1.","title":"Qwen1.5-MoE: Matching 7B Model Performance with 1\/3 Activated Parameters"},{"summary":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD\nIntroduction In recent months, our focus has been on developing a &ldquo;good&rdquo; model while optimizing the developer experience. As we progress towards Qwen1.5, the next iteration in our Qwen series, this update arrives just before the Chinese New Year.\nWith Qwen1.5, we are open-sourcing base and chat models across eight sizes: 0.5B, 1.8B, 4B, 7B, 14B, 32B, 72B, and 110B, and also an MoE model (see blog for more information).","title":"Introducing Qwen1.5"},{"summary":"Along with the rapid development of our large language model Qwen, we leveraged Qwen\u2019s capabilities and unified multimodal pretraining to address the limitations of multimodal models in generalization, and we opensourced multimodal model Qwen-VL in Sep. 2023. Recently, the Qwen-VL series has undergone a significant upgrade with the launch of two enhanced versions, Qwen-VL-Plus and Qwen-VL-Max. 
The key technical advancements in these versions include:\nSubstantial boost in image-related reasoning capabilities; Considerable enhancement in recognizing, extracting, and analyzing details within images and texts contained therein; Support for high-definition images with resolutions above one million pixels and images of various aspect ratios.","title":"Introducing Qwen-VL"},{"summary":"4 months after our first release of Qwen-7B, which is the starting point of our opensource journey of large language models (LLM), we now provide an introduction to the Qwen series to give you a whole picture of our work as well as our objectives. Below are important links to our opensource projects and community.\nPAPER GITHUB HUGGING FACE MODELSCOPE DISCORD\nAdditionally, we have WeChat groups for chatting and we invite you to join the groups through the provided link in our GitHub readme.","title":"Introducing Qwen"},{"summary":"2022 is a year of generalist models! With the bloom of multimodal pretraining, especially the unified model, we have witnessed the opportunity to build a generalist model that is capable of processing tasks of different modalities or multi-modalities! Thus, we propose OFA, namely One-For-All, a unified multimodal pretrained model that unifies understanding and generation tasks concerning modalities into a single framework, and we pretrain OFA with the instruction-based multitask-pretraining that endows it with multiple capabilities.","title":"OFA: Towards Building a One-For-All Model"},{"summary":"Intro Generalist Models are hot! We all see an opportunity towards a real generalist model by multimodal multitask learning. We previously released an opensourced unified multimodal pretrained model OFA for this goal. However, we actually met a lot of difficulties in our implementation. 
For example, it is hard to set up multiple tasks concerning multiple modalities, and it is hard to organize multitask learning, e.g., how to batchify your data and how to make your training stable.","title":"OFASys: Enabling Multitask Learning with One Line of Code! "},{"summary":"CLIP is a phenomenal playmaker in vision and multimodal representation learning. It plays not only as a foundation model but also a bridge between vision and language. It has triggered a series of research in different fields, especially text-to-image generation. However, we find that there is a necessity for a language-specific CLIP for applications, especially cross-modal retrieval, and there is no opensourced Chinese CLIP with good performance. We therefore launched this project to promote Chinese multimodal representation learning.","title":"Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese"}]