
    Qwen

    Browse models from Qwen

    51 models

    Per-model token counts below reflect tokens processed on OpenRouter.

  1. Qwen: Qwen3 VL 32B Instruct

    Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding. It offers robust OCR in 32 languages and enhanced multimodal fusion through the Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance for complex real-world multimodal tasks.

    by qwen · 262K context
  2. Qwen: Qwen3 VL 8B Thinking
    8.16M tokens

    Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and long-context processing (native 256K, expandable to 1M tokens) for tasks such as scientific visual analysis, causal inference, and mathematical reasoning over image or video inputs. Compared to the Instruct edition, the Thinking version introduces deeper visual-language fusion and deliberate reasoning pathways that improve performance on long-chain logic tasks, STEM problem-solving, and multi-step video understanding. It achieves stronger temporal grounding via Interleaved-MRoPE and timestamp-aware embeddings, while maintaining robust OCR, multilingual comprehension, and text generation on par with large text-only LLMs.

    by qwen · 256K context · $0.18/M input tokens · $2.10/M output tokens
  3. Qwen: Qwen3 VL 8B Instruct
    4.5M tokens

    Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon temporal reasoning, DeepStack for fine-grained visual-text alignment, and text-timestamp alignment for precise event localization. The model supports a native 256K-token context window, extensible to 1M tokens, and handles both static and dynamic media inputs for tasks like document parsing, visual question answering, spatial reasoning, and GUI control. It achieves text understanding comparable to leading LLMs while expanding OCR coverage to 32 languages and enhancing robustness under varied visual conditions.

    by qwen · 256K context · $0.08/M input tokens · $0.50/M output tokens
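    Vision-language entries like this one are served through OpenRouter's OpenAI-compatible chat completions endpoint, which accepts mixed text-and-image message content. Below is a minimal sketch of an image question against this model; the model slug "qwen/qwen3-vl-8b-instruct", the example image URL, and the OPENROUTER_API_KEY environment variable are illustrative assumptions, not confirmed by this page.

    ```python
    # Minimal sketch: one image + one question to a Qwen3-VL model via OpenRouter.
    # Assumptions (not confirmed by this page): the model slug
    # "qwen/qwen3-vl-8b-instruct" and an OPENROUTER_API_KEY environment variable.
    import os
    import requests

    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "qwen/qwen3-vl-8b-instruct",  # assumed slug
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text", "text": "What does this chart show?"},
                    {"type": "image_url",
                     "image_url": {"url": "https://example.com/chart.png"}},
                ],
            }],
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
    ```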
  4. Qwen: Qwen3 VL 30B A3B Thinking
    13.2M tokens

    Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research.

    by qwen · 0 context · $0.20/M input tokens · $1/M output tokens
  5. Qwen: Qwen3 VL 30B A3B Instruct
    7.18M tokens

    Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research.

    by qwen · 0 context · $0.20/M input tokens · $0.70/M output tokens
  6. Qwen: Qwen3 VL 235B A22B Thinking
    2.44M tokens

    Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math. The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows, turning sketches or mockups into code and assisting with UI debugging, while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.

    by qwen · 131K context · $0.30/M input tokens · $1.20/M output tokens
  7. Qwen: Qwen3 VL 235B A22B Instruct
    511K tokens

    Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows—turning sketches or mockups into code and assisting with UI debugging—while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.

    by qwen · 131K context · $0.30/M input tokens · $1.20/M output tokens
  8. Qwen: Qwen3 Max
    265M tokens

    Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode.

    by qwen · 256K context · $1.20/M input tokens · $6/M output tokens
  9. Qwen: Qwen3 Coder Plus
    176M tokens

    Qwen3 Coder Plus is Alibaba's proprietary version of the open-source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities.

    by qwen · 128K context · $1/M input tokens · $5/M output tokens
  10. Qwen: Qwen3 Coder Flash
    48.5M tokens

    Qwen3 Coder Flash is Alibaba's fast, cost-efficient version of its proprietary Qwen3 Coder Plus. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities.

    by qwen · 128K context · $0.30/M input tokens · $1.50/M output tokens
  11. Qwen: Qwen3 Next 80B A3B Thinking
    35.5M tokens

    Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” traces by default. It is designed for hard multi-step problems such as math proofs, code synthesis/debugging, logic, and agentic planning, and it reports strong results across knowledge, reasoning, coding, alignment, and multilingual evaluations. Compared with prior Qwen3 variants, it emphasizes stability under long chains of thought and efficient scaling during inference, and it is tuned to follow complex instructions while reducing repetitive or off-task behavior. The model is suitable for agent frameworks and tool use (function calling), retrieval-heavy workflows, and standardized benchmarking where step-by-step solutions are required. It supports long, detailed completions and leverages throughput-oriented techniques (e.g., multi-token prediction) for faster generation. Note that it operates in thinking-only mode.

    by qwen · 262K context · $0.15/M input tokens · $1.20/M output tokens
  12. Qwen: Qwen3 Next 80B A3B Instruct
    191M tokens

    Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual use, while remaining robust on alignment and formatting. Compared with prior Qwen3 instruct variants, it focuses on higher throughput and stability on ultra-long inputs and multi-turn dialogues, making it well-suited for RAG, tool use, and agentic workflows that require consistent final answers rather than visible chain-of-thought. The model employs scaling-efficient training and decoding to improve parameter efficiency and inference speed, and has been validated on a broad set of public benchmarks where it reaches or approaches larger Qwen3 systems in several categories while outperforming earlier mid-sized baselines. It is best used as a general assistant, code helper, and long-context task solver in production settings where deterministic, instruction-following outputs are preferred.

    by qwen · 262K context · $0.10/M input tokens · $0.80/M output tokens
  13. Qwen: Qwen Plus 0728 (Thinking variant)
    2.35M tokens

    Qwen Plus 0728, based on the Qwen3 foundation model, is a hybrid reasoning model with a 1M-token context, offering a balanced combination of performance, speed, and cost.

    by qwen · 1M context · $0.40/M input tokens · $4/M output tokens
  14. Qwen: Qwen3 30B A3B Thinking 2507
    25.2M tokens

    Qwen3-30B-A3B-Thinking-2507 is a 30B parameter Mixture-of-Experts reasoning model optimized for complex tasks requiring extended multi-step thinking. The model is designed specifically for “thinking mode,” where internal reasoning traces are separated from final answers. Compared to earlier Qwen3-30B releases, this version improves performance across logical reasoning, mathematics, science, coding, and multilingual benchmarks. It also demonstrates stronger instruction following, tool use, and alignment with human preferences. With higher reasoning efficiency and extended output budgets, it is best suited for advanced research, competitive problem solving, and agentic applications requiring structured long-context reasoning.

    by qwen · 131K context · $0.08/M input tokens · $0.29/M output tokens
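    Thinking-mode models like this one return an internal reasoning trace ahead of the final answer; the Qwen3 235B A22B Thinking 2507 entry below mentions the closing </think> tag explicitly. A minimal sketch of separating the two in a raw completion, assuming the <think>...</think> convention holds for the provider you route to:

    ```python
    # Minimal sketch: split a thinking-mode completion into reasoning trace and
    # final answer. Assumes the <think>...</think> tag convention, which this
    # page mentions (</think>) but which can vary by provider.
    def split_thinking(completion: str) -> tuple[str, str]:
        close_tag = "</think>"
        end = completion.find(close_tag)
        if end == -1:
            return "", completion.strip()  # no trace: whole output is the answer
        open_tag = "<think>"
        start = completion.find(open_tag)
        trace_start = start + len(open_tag) if start != -1 else 0
        trace = completion[trace_start:end].strip()
        answer = completion[end + len(close_tag):].strip()
        return trace, answer

    trace, answer = split_thinking("<think>2 + 2 = 4</think>The answer is 4.")
    print(trace)   # 2 + 2 = 4
    print(answer)  # The answer is 4.
    ```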
  15. Qwen: Qwen3 Coder 30B A3B Instruct
    43.4M tokens

    Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the Qwen3 architecture, it supports a native context length of 256K tokens (extendable to 1M with YaRN) and performs strongly in tasks involving function calls, browser use, and structured code completion. This model is optimized for instruction following without “thinking mode” and integrates well with OpenAI-compatible tool-use formats (see the sketch after this entry).

    by qwen · 0 context · $0.06/M input tokens · $0.25/M output tokens
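    Because this model targets OpenAI-compatible tool-use formats, a function-calling request looks like any other OpenAI-style call. A minimal sketch using the openai Python SDK pointed at OpenRouter; the model slug "qwen/qwen3-coder-30b-a3b-instruct" and the read_file tool are illustrative assumptions, not confirmed by this page.

    ```python
    # Minimal sketch: OpenAI-compatible tool calling with a Qwen3 Coder model.
    # Assumptions (not confirmed by this page): the model slug below and the
    # hypothetical read_file tool.
    from openai import OpenAI

    client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

    tools = [{
        "type": "function",
        "function": {
            "name": "read_file",  # hypothetical tool, for illustration only
            "description": "Read a file from the repository.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="qwen/qwen3-coder-30b-a3b-instruct",  # assumed slug
        messages=[{"role": "user", "content": "Summarize what main.py does."}],
        tools=tools,
    )
    # The model either answers directly or requests one or more tool calls.
    for call in resp.choices[0].message.tool_calls or []:
        print(call.function.name, call.function.arguments)
    ```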
  16. Qwen: Qwen3 30B A3B Instruct 2507
    12.3M tokens

    Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. It operates in non-thinking mode and is designed for high-quality instruction following, multilingual understanding, and agentic tool use. Post-trained on instruction data, it demonstrates competitive performance across reasoning (AIME, ZebraLogic), coding (MultiPL-E, LiveCodeBench), and alignment (IFEval, WritingBench) benchmarks. It outperforms its non-instruct variant on subjective and open-ended tasks while retaining strong factual and coding performance.

    by qwen · 131K context · $0.08/M input tokens · $0.33/M output tokens
  17. Qwen: Qwen3 235B A22B Thinking 2507
    2.32B tokens

    Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144 tokens of context. This "thinking-only" variant enhances structured logical reasoning, mathematics, science, and long-form generation, showing strong benchmark performance across AIME, SuperGPQA, LiveCodeBench, and MMLU-Redux. It enforces a special reasoning mode (</think>) and is designed for high-token outputs (up to 81,920 tokens) in challenging domains. The model is instruction-tuned and excels at step-by-step reasoning, tool use, agentic workflows, and multilingual tasks. This release represents the most capable open-source variant in the Qwen3-235B series, surpassing many closed models in structured reasoning use cases.

    by qwen · 262K context · $0.11/M input tokens · $0.60/M output tokens
  18. Qwen: Qwen3 Coder 480B A35B
    5.67B tokens

    Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts). Pricing for the Alibaba endpoints varies by context length: once a request exceeds 128K input tokens, the higher pricing applies (see the cost sketch after this entry).

    by qwen · 1.05M context · $0.22/M input tokens · $0.95/M output tokens
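    Because pricing for this model is tiered by request size, a cost estimate has to check the input length before picking rates. A minimal sketch using the listed base rates; the >128K-tier rates below are placeholders, since this page does not show the higher tier's prices.

    ```python
    # Minimal sketch: tiered cost estimate for Qwen3 Coder 480B A35B.
    # Base rates are the listed ones; HIGH_RATES are PLACEHOLDERS, since the
    # higher tier's actual prices are not shown on this page.
    TIER_THRESHOLD = 128_000                      # input tokens; above this, higher tier

    BASE_RATES = {"input": 0.22, "output": 0.95}  # $ per 1M tokens (listed)
    HIGH_RATES = {"input": 0.44, "output": 1.90}  # placeholder values

    def estimate_cost(input_tokens: int, output_tokens: int) -> float:
        rates = HIGH_RATES if input_tokens > TIER_THRESHOLD else BASE_RATES
        return (input_tokens * rates["input"]
                + output_tokens * rates["output"]) / 1_000_000

    # A 200K-token prompt exceeds the threshold, so the request is billed
    # at the higher tier.
    print(f"${estimate_cost(200_000, 4_000):.4f}")
    ```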
  19. Qwen: Qwen3 235B A22B Instruct 2507
    1.18B tokens

    Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench.

    by qwen · 262K context · $0.08/M input tokens · $0.55/M output tokens
  20. Qwen: Qwen3 0.6B

    Qwen3-0.6B is a lightweight, 0.6 billion parameter language model in the Qwen3 series, offering support for both general-purpose dialogue and structured reasoning through a dual-mode (thinking/non-thinking) architecture. Despite its small size, it supports long contexts up to 32,768 tokens and provides multilingual, tool-use, and instruction-following capabilities.

    by qwen · 32K context
  21. Qwen: Qwen3 1.7B

    Qwen3-1.7B is a compact, 1.7 billion parameter dense language model from the Qwen3 series, featuring dual-mode operation for both efficient dialogue (non-thinking) and advanced reasoning (thinking). Despite its small size, it supports 32,768-token contexts and delivers strong multilingual, instruction-following, and agentic capabilities, including tool use and structured output.

    by qwen · 32K context
  22. Qwen: Qwen3 4B (Free variant)
    715K tokens

    Qwen3-4B is a 4 billion parameter dense language model from the Qwen3 series, designed to support both general-purpose and reasoning-intensive tasks. It introduces a dual-mode architecture—thinking and non-thinking—allowing dynamic switching between high-precision logical reasoning and efficient dialogue generation. This makes it well-suited for multi-turn chat, instruction following, and complex agent workflows.

    by qwen · 128K context · $0/M input tokens · $0/M output tokens
  23. Qwen: Qwen3 30B A3B
    1.21B tokens

    Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.

    by qwen · 131K context · $0.06/M input tokens · $0.22/M output tokens
  24. Qwen: Qwen3 8B
    348M tokens

    Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between "thinking" mode for math, coding, and logical inference, and "non-thinking" mode for general conversation. The model is fine-tuned for instruction-following, agent integration, creative writing, and multilingual use across 100+ languages and dialects. It natively supports a 32K token context window and can extend to 131K tokens with YaRN scaling.

    by qwen · 131K context · $0.035/M input tokens · $0.138/M output tokens
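    The 32K-to-131K extension mentioned in this entry uses YaRN rope scaling rather than retraining. When self-hosting the open weights, this is a configuration change; a minimal sketch with Hugging Face transformers, following Qwen's published YaRN recipe (key names vary across transformers versions, so treat it as illustrative):

    ```python
    # Minimal sketch: extend Qwen3-8B's native 32K window toward ~131K via YaRN.
    # Follows Qwen's published recipe; key names ("rope_type" vs. "type") vary
    # across transformers versions, so treat this as illustrative.
    from transformers import AutoConfig, AutoModelForCausalLM

    model_id = "Qwen/Qwen3-8B"
    config = AutoConfig.from_pretrained(model_id)
    config.rope_scaling = {
        "rope_type": "yarn",
        "factor": 4.0,  # 4.0 x 32768 native positions ≈ 131K
        "original_max_position_embeddings": 32768,
    }
    model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
    ```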
  25. Qwen: Qwen3 14B
    582M tokens

    Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, programming, and logical inference, and a "non-thinking" mode for general-purpose conversation. The model is fine-tuned for instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling.

    by qwen · 132K context · $0.05/M input tokens · $0.22/M output tokens
  26. Qwen: Qwen3 32B
    952M tokens

    Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, coding, and logical inference, and a "non-thinking" mode for faster, general-purpose conversation. The model demonstrates strong performance in instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling.

    by qwen · 131K context · $0.05/M input tokens · $0.20/M output tokens
  27. Qwen: Qwen3 235B A22B
    998M tokens

    Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a "thinking" mode for complex reasoning, math, and code tasks, and a "non-thinking" mode for general conversational efficiency. The model demonstrates strong reasoning ability, multilingual support (100+ languages and dialects), advanced instruction-following, and agent tool-calling capabilities. It natively handles a 32K token context window and extends up to 131K tokens using YaRN-based scaling.

    by qwen · 131K context · $0.18/M input tokens · $0.54/M output tokens
  28. Qwen: Qwen2.5 Coder 7B Instruct
    1.81M tokens

    Qwen2.5-Coder-7B-Instruct is a 7B parameter instruction-tuned language model optimized for code-related tasks such as code generation, reasoning, and bug fixing. Based on the Qwen2.5 architecture, it incorporates enhancements like RoPE, SwiGLU, RMSNorm, and GQA attention with support for up to 128K tokens using YaRN-based extrapolation. It is trained on a large corpus of source code, synthetic data, and text-code grounding, providing robust performance across programming languages and agentic coding workflows. This model is part of the Qwen2.5-Coder family and offers strong compatibility with tools like vLLM for efficient deployment. Released under the Apache 2.0 license.

    by qwen · 131K context · $0.03/M input tokens · $0.09/M output tokens
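    For the vLLM compatibility mentioned in this entry, a minimal offline-inference sketch using vLLM's Python API ("Qwen/Qwen2.5-Coder-7B-Instruct" is the Hugging Face model ID; the chat helper assumes a recent vLLM release):

    ```python
    # Minimal sketch: run Qwen2.5-Coder-7B-Instruct locally with vLLM.
    # Assumes a GPU machine with vLLM installed; llm.chat() applies the
    # model's chat template and requires a recent vLLM release.
    from vllm import LLM, SamplingParams

    llm = LLM(model="Qwen/Qwen2.5-Coder-7B-Instruct")
    params = SamplingParams(temperature=0.2, max_tokens=256)

    out = llm.chat(
        [{"role": "user", "content": "Write a Python function that reverses a linked list."}],
        params,
    )
    print(out[0].outputs[0].text)
    ```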
  29. Qwen: Qwen2.5 VL 3B Instruct

    Qwen2.5 VL 3B is a multimodal LLM from the Qwen Team with the following key enhancements:

    - SoTA understanding of images of various resolutions and ratios: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, and MTVQA.
    - Agent that can operate your mobiles, robots, etc.: with complex reasoning and decision-making abilities, Qwen2.5-VL can be integrated with devices like mobile phones and robots for automatic operation based on the visual environment and text instructions.
    - Multilingual support: to serve global users, besides English and Chinese, Qwen2.5-VL now supports understanding text in different languages inside images, including most European languages, Japanese, Korean, Arabic, and Vietnamese.

    For more details, see the blog post and GitHub repo. Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

    by qwen · 64K context
  30. Qwen: Qwen2.5 VL 32B Instruct
    68.9M tokens

    Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance across multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity in text-based tasks like MMLU, mathematical problem-solving, and code generation.

    by qwen · 33K context · $0.05/M input tokens · $0.22/M output tokens
  31. Qwen: QwQ 32B
    548M tokens

    QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which can think and reason, achieves significantly enhanced performance on downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, capable of competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.

    by qwen · 131K context · $0.15/M input tokens · $0.40/M output tokens
  32. Qwen: Qwen2.5 32B Instruct

    Qwen2.5 32B Instruct is the instruction-tuned variant of the latest Qwen large language model series. It provides enhanced instruction-following capabilities, improved proficiency in coding and mathematical reasoning, and robust handling of structured data and outputs such as JSON. It supports long-context processing up to 128K tokens and multilingual tasks across 29+ languages. The model has 32.5 billion parameters, 64 layers, and utilizes an advanced transformer architecture with RoPE, SwiGLU, RMSNorm, and attention QKV bias. For more details, refer to the Qwen2.5 blog.

    by qwen · 131K context
  33. Qwen: Qwen VL Plus
    6.48M tokens

    Qwen's enhanced large visual language model, significantly upgraded for detailed recognition and text recognition, supporting ultra-high resolutions of millions of pixels and extreme aspect ratios for image input. It delivers strong performance across a broad range of visual tasks.

    by qwen · 8K context · $0.21/M input tokens · $0.63/M output tokens · $0.269/K input imgs
  34. Qwen: Qwen VL Max
    22.4M tokens

    Qwen VL Max is a visual understanding model with a 7,500-token context length. It excels in delivering optimal performance for a broader spectrum of complex tasks.

    by qwen · 131K context · $0.80/M input tokens · $3.20/M output tokens · $1.024/K input imgs
  35. Qwen: Qwen-Turbo
    72M tokens

    Qwen-Turbo, based on Qwen2.5, is a 1M-context model offering fast responses at low cost, suitable for simple tasks.

    by qwen · 1M context · $0.05/M input tokens · $0.20/M output tokens
  36. Qwen: Qwen2.5 VL 72B Instruct
    120M tokens

    Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.

    by qwen · 131K context · $0.08/M input tokens · $0.33/M output tokens
  37. Qwen: Qwen-Plus
    12.3M tokens

    Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K-context model offering a balanced combination of performance, speed, and cost.

    by qwen · 131K context · $0.40/M input tokens · $1.20/M output tokens
  38. Qwen: Qwen-Max
    42.2M tokens

    Qwen-Max, based on Qwen2.5, provides the best inference performance among Qwen models, especially for complex multi-step tasks. It's a large-scale MoE model that has been pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) methodologies. The parameter count is unknown.

    by qwen · 33K context · $1.60/M input tokens · $6.40/M output tokens
  39. Qwen: QwQ 32B Preview

    QwQ-32B-Preview is an experimental research model focused on AI reasoning capabilities, developed by the Qwen Team. As a preview release, it demonstrates promising analytical abilities while having several important limitations:

    1. Language mixing and code-switching: the model may mix languages or switch between them unexpectedly, affecting response clarity.
    2. Recursive reasoning loops: the model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer.
    3. Safety and ethical considerations: the model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it.
    4. Performance and benchmark limitations: the model excels in math and coding but has room for improvement in other areas, such as common-sense reasoning and nuanced language understanding.

    by qwen · 33K context
  40. Qwen2.5 Coder 32B Instruct
    303M tokens

    Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:

    - Significant improvements in code generation, code reasoning, and code fixing.
    - A more comprehensive foundation for real-world applications such as Code Agents, not only enhancing coding capabilities but also maintaining strengths in mathematics and general competencies.

    To read more about its evaluation results, check out the Qwen2.5-Coder blog.

    by qwen · 128K context · $0.04/M input tokens · $0.16/M output tokens
  41. Qwen: Qwen2.5 7B Instruct
    471M tokens

    Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2:

    - Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
    - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.
    - Long-context support up to 128K tokens, with generation of up to 8K tokens.
    - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

    Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

    by qwen · 131K context · $0.04/M input tokens · $0.10/M output tokens
  42. Qwen2.5 72B Instruct
    525M tokens

    Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2:

    - Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
    - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.
    - Long-context support up to 128K tokens, with generation of up to 8K tokens.
    - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

    Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

    by qwen · 131K context · $0.07/M input tokens · $0.26/M output tokens
  43. Qwen: Qwen2.5-VL 7B Instruct
    47.6M tokens

    Qwen2.5 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements:

    - SoTA understanding of images of various resolutions and ratios: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, and MTVQA.
    - Understanding videos of 20 min+: Qwen2.5-VL can understand videos over 20 minutes long for high-quality video-based question answering, dialog, and content creation.
    - Agent that can operate your mobiles, robots, etc.: with complex reasoning and decision-making abilities, Qwen2.5-VL can be integrated with devices like mobile phones and robots for automatic operation based on the visual environment and text instructions.
    - Multilingual support: to serve global users, besides English and Chinese, Qwen2.5-VL now supports understanding text in different languages inside images, including most European languages, Japanese, Korean, Arabic, and Vietnamese.

    For more details, see the blog post and GitHub repo. Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

    by qwen · 33K context · $0.20/M input tokens · $0.20/M output tokens · $0.145/K input imgs
  44. Qwen 2 7B Instruct

    Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning. It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization. For more details, see this blog post and GitHub repo. Usage of this model is subject to Tongyi Qianwen LICENSE AGREEMENT.

    by qwen · 33K context
  45. Qwen 2 72B Instruct

    Qwen2 72B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning. It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization. For more details, see this blog post and GitHub repo. Usage of this model is subject to Tongyi Qianwen LICENSE AGREEMENT.

    by qwen · 33K context
  46. Qwen 1.5 72B Chat

    Qwen1.5 72B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

    - Significant performance improvement in human preference for chat models
    - Multilingual support of both base and chat models
    - Stable support of 32K context length for models of all sizes

    For more details, see the blog post and GitHub repo. Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

    by qwen · 33K context
  47. Qwen 1.5 110B Chat

    Qwen1.5 110B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

    - Significant performance improvement in human preference for chat models
    - Multilingual support of both base and chat models
    - Stable support of 32K context length for models of all sizes

    For more details, see the blog post and GitHub repo. Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

    by qwen · 33K context
  48. Qwen 1.5 32B Chat

    Qwen1.5 32B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

    - Significant performance improvement in human preference for chat models
    - Multilingual support of both base and chat models
    - Stable support of 32K context length for models of all sizes

    For more details, see the blog post and GitHub repo. Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

    by qwen · 33K context
  49. Qwen 1.5 14B Chat

    Qwen1.5 14B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

    - Significant performance improvement in human preference for chat models
    - Multilingual support of both base and chat models
    - Stable support of 32K context length for models of all sizes

    For more details, see the blog post and GitHub repo. Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

    by qwen · 33K context
  50. Qwen 1.5 7B Chat

    Qwen1.5 7B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

    - Significant performance improvement in human preference for chat models
    - Multilingual support of both base and chat models
    - Stable support of 32K context length for models of all sizes

    For more details, see the blog post and GitHub repo. Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

    by qwen · 33K context
  51. Qwen 1.5 4B Chat

    Qwen1.5 4B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

    - Significant performance improvement in human preference for chat models
    - Multilingual support of both base and chat models
    - Stable support of 32K context length for models of all sizes

    For more details, see the blog post and GitHub repo. Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

    by qwen · 33K context