Step 3.7 Flash favicon

Step 3.7 Flash

Step 3.7 Flash: A High-Efficiency Multimodal AI Model for Advanced Agentic Coding and Enterprise Tasks

Introduction:

Step 3.7 Flash is a cutting-edge, high-efficiency multimodal model designed for real-world agents. Delivering up to 400 TPS, it excels in agentic coding, autonomous enterprise task execution, and deep visual search. With native multimodal understanding, it can see, think, and act across diverse environments, including web, mobile GUIs, and professional software tools. Step 3.7 Flash integrates seamlessly with mainstream agent frameworks and supports an innovative Advisor Mode for cost-effective, high-tier performance.

Added On:

2026-06-01

Monthly Visitors:

--K

Step 3.7 Flash - AI Tool Screenshot and Interface Preview

Step 3.7 Flash Product Information

Step 3.7 Flash: The New Frontier of Agentic Efficiency and Multimodal Intelligence

Released on May 29, 2026, Step 3.7 Flash represents a significant leap forward in the evolution of artificial intelligence. As a high-efficiency Flash model specifically designed for real-world agents, Step 3.7 Flash is built on the philosophy of See. Think. Act. This model is not just about processing text; it is engineered to achieve high-speed performance, reaching up to 400 TPS (Tokens Per Second), while maintaining the complex reasoning capabilities required for autonomous task execution.

What is Step 3.7 Flash?

Step 3.7 Flash is an agentic foundation model characterized by its multimodal understanding and high-speed efficiency. It features a robust architecture with 196B total parameters (plus a 1.8B ViT for vision tasks) and 11B active parameters, placing it at the forefront of the "Flash-tier" model category. Unlike traditional models that focus solely on answering queries, Step 3.7 Flash is designed to take action. Whether it is navigating a complex web interface, writing production-grade code, or orchestrating various software tools, Step 3.7 Flash provides a reliable substrate for digital agency.

By focusing on agent efficiency, Step 3.7 Flash bridges the gap between general intelligence and professional expertise. It is purpose-built to operate within an ecosystem of agents, supporting native multimodal acting and advanced tool use across enterprise environments.

Key Features of Step 3.7 Flash

Native Multimodal Understanding & Acting

One of the standout features of Step 3.7 Flash is its ability to understand images across a vast range—including product UIs, complex documents, dense charts, and natural scenes. Beyond simple recognition, the model can write code or call tools based on the visual information it perceives. This makes Step 3.7 Flash an ideal choice for tasks requiring visual reasoning and immediate action.

Enhanced Web and Visual Search

Step 3.7 Flash turns search into a native part of its reasoning process. It features:

  • Web Search Enhancement: Reaches deeper follow-up sources and a broader range of information.
  • Visual Search: Recognizes long-tail entities and freshly emerged concepts that other systems might miss.
  • Deep Retrieval: Scores an impressive 92.82% F1 score on DeepSearchQA, proving its research and synthesis capabilities.

Reliable Tool Use & Orchestration

Step 3.7 Flash is built for long-horizon tasks. It can drive terminals, browsers, Office tools, and search engines with high coherence. This reliability results in less drift, fewer broken tool calls, and fewer failed runs, even during extended workflows. On the Toolathlon benchmark for multi-tool coordination, Step 3.7 Flash achieved a score of 49.5%.

Agent Ecosystem Compatibility

The model is designed to work seamlessly with mainstream harnesses such as Claude Code, KiloCode, Hermes Agent, and OpenClaw. This compatibility ensures lower integration costs and less workflow rewiring for developers already utilizing these agentic frameworks.

The Innovative Advisor Mode

To push quality further without sacrificing Flash-tier efficiency, Step 3.7 Flash supports Advisor Mode. In this mode, Step 3.7 Flash drives the execution end-to-end, consulting a larger advisor model only at critical inflection points (such as planning or recovering from repeated failures). This strategy allows Step 3.7 Flash to reach 97% of Claude Opus 4.6's coding performance at approximately one-ninth the cost ($0.19 vs. $1.76 per task).

Step 3.7 Flash Benchmarks and Performance

Step 3.7 Flash consistently outperforms or matches industry leaders across various benchmarks:

  • Agentic Coding: On SWE-Bench Pro, Step 3.7 Flash scored 56.3, and on Terminal-Bench 2.1, it reached 59.6%.
  • Multimodal Tasks: The model scored 79.2% on SimpleVQA and an exceptional 95.29% on V* (using the Python tool).
  • General Agency: It achieved 67.1% on ClawEval-1.1 for daily autonomous tasks and 45.8% on GDPval across 44 different occupations.
  • Search Capabilities: On BrowseComp, it scored 75.8%, approaching the performance of much larger "Pro" level models.

Use Cases for Step 3.7 Flash

Agentic Coding and Development

Step 3.7 Flash is a powerhouse for developers. It excels in the plan-execute-observe-iterate loop. For example, it can take a sketch and turn it into a functional web page or take a draft and convert it into code. Its ability to autonomously turn to a GUI to test the code it just produced highlights its emergent compositional behavior.

Enterprise and Specialized Domains

Step 3.7 Flash is purpose-built for enterprise tasks that require domain-specific knowledge. Use cases include:

  • Finance and Accounting: Detailed data analysis and reporting.
  • Manufacturing: Complex production scheduling.
  • Engineering: Heat treatment analysis and technical trace tracking.
  • Legal: Conflict-of-interest analysis using domain-specific rules and case materials.

GUI and Phone Operation

With its Phone-use stack, Step 3.7 Flash can operate graphical user interfaces to complete long-horizon tasks across multiple mobile apps. It achieves a 61.87% score on the Android Daily benchmark, showcasing its stability and robustness in mobile environments.

How to Use and Deploy Step 3.7 Flash

Availability

Step 3.7 Flash is widely accessible through the following platforms:

  • StepFun Open Platform: platform.stepfun.ai (Global) and platform.stepfun.com (China).
  • Third-Party Providers: OpenRouter and NVIDIA NIM.
  • Chat Platforms: Available on Web (EN/中文) and mobile apps (iOS/Android).

Deployment Options

The model supports flexible deployment scenarios:

  • Cloud and Data Centers: Optimized for modern data center infrastructure and large-scale production.
  • Local/Workstation: Can run on high-memory devices such as NVIDIA DGX Station, AMD Ryzen AI Max+ 395 systems, and Mac Studio/Macbook Pro (minimum 128GB unified memory).

Developer Ecosystem

Developers can utilize various open-source infrastructures for inference and serving, including vLLM, SGLang, Hugging Face Transformers, and llama.cpp. For customization, Step 3.7 Flash is supported within the NVIDIA Nemo ecosystem.

FAQ

Q: What is the token speed of Step 3.7 Flash? A: Step 3.7 Flash is a high-efficiency model capable of reaching up to 400 TPS.

Q: Does Step 3.7 Flash support vision inputs? A: Yes, it is an agentic foundation model with native vision input support, capable of using visual tools like Visual Search and a Python-based cropping/zooming tool.

Q: How does Advisor Mode save money? A: By using Step 3.7 Flash as the primary executor and only calling a larger "Advisor" model at critical points, users can achieve near-frontier performance at roughly 1/9th the cost of using a large model alone.

Q: What are the hardware requirements for local deployment? A: For local or workstation scenarios, the model requires devices with at least 128GB of unified memory, such as Mac Studio or specialized NVIDIA/AMD systems.

Q: Is Step 3.7 Flash compatible with existing agent frameworks? A: Yes, it is designed for compatibility with mainstream harnesses like Claude Code, Hermes Agent, and KiloCode.

Loading related products...