Powered by StepFun

Step 3.7 Flash

  • Multimodal
  • Vision-Language
  • Code Generation
  • Agentic Workflows
  • Long Context
  • Fast Inference
  • +1 category

Step 3.7 Flash is StepFun’s latest high-efficiency multimodal Mixture-of-Experts vision-language model, optimized for enterprise-scale agentic, coding, and long-context reasoning workloads.

Start Using API

What is Step 3.7 Flash?

Step 3.7 Flash is a 198B-parameter sparse Mixture-of-Experts vision-language model from StepFun that combines a large language backbone with a vision encoder for native image and video understanding. It is primarily used for high-throughput agentic workflows such as tool-calling, multi-step reasoning, and structured automation across text, image, and video inputs. It is also applied to coding, math, and long-context productivity tasks like parsing large documents or running concurrent coding agents with a 256K-token context window. The model extends and builds on the Step 3.5 Flash language architecture within the broader Step 3.x Flash family.

5 Core Capabilities

  • Multimodal Understanding

    Processes text, images, and video frames together, enabling native image and video understanding for complex perception and reasoning tasks.

  • Conversational Reasoning

    Supports fast, multi-step reasoning in dialogue, with selectable reasoning depth to balance speed, cost, and quality of answers.

  • Agentic Workflows

    Designed for agent-style applications, coordinating perception, search, and multi-step actions across tools, terminals, browsers, and services.

  • Code Generation

    Generates and edits code, supports frontend generation from mockups, screenshot-based debugging, and high-throughput concurrent coding agents.

  • Long-Context Processing

    Handles up to 256k tokens, enabling single-pass analysis of large documents, multi-source search traces, and extensive conversational histories.

6 Most Valuable Use Cases

  • Multimodal UI Agents
  • Screenshot Debugging
  • Frontend From Mockups
  • Document Understanding
  • Tool-Calling Orchestration
  • Code Generation Agents

Cost Comparison

LLM API offers Step 3.7 Flash access at the same base token prices as direct StepFun, while aggregators and cloud endpoints may add their own margins or be free-tier only.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global $0.20 per 1M tokens $1.15 per 1M tokens 256K
StepFun Global $0.20 per 1M tokens $1.15 per 1M tokens 256K
OpenRouter Global $0.20 per 1M tokens $1.15 per 1M tokens 256K
NVIDIA NIM Global

Technical Specifications

Metric Step 3.7 Flash DeepSeek V4 Flash Gemini 2.5 Flash
Model Type Multimodal MoE VLM Multimodal LLM Multimodal LLM
Total Parameters 198B
Active Parameters / Token ~11B
Context Window 256K 1M
Modalities Text, Image, Video Text, Image Text, Image, Audio, Video
Input Price ($/1M tokens) $0.071 $0.10
Output Price ($/1M tokens) $1.15 $0.40
Max Output Tokens 8192

30-day usage via LLM API

2.3B
Prompt tokens processed (last 30 days)
1.1B
Completion tokens generated (last 30 days)
7.8M
API requests served (last 30 days)
99.8%
Avg uptime across all regions
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Unified AI Routing

    Automatically route each request to the optimal model across providers based on latency, quality, or custom rules—no client changes required as your stack evolves.

    One endpoint, every model
  • Cost-Aware Orchestration

    Control spend by mixing premium and budget models behind one API, with routing policies that cap cost per request and optimize for price-performance.

    Lower cost, same output
  • Resilient Fallbacks

    Eliminate single-provider outages with automatic failover to backup models, preserving SLAs and uptime without adding error-handling complexity to your application code.

    Stay online, automatically
  • Full-Stack Observability

    Get unified logs, metrics, traces, and model-level analytics so you can debug latency spikes, track usage, and tune routing—all from a single dashboard.

    See every token
  • Task-Level Abstractions

    Call high-level tasks like chat, generation, or extraction instead of provider-specific APIs, so you can swap models without rewriting business logic.

    Code to tasks, not models
  • High-Throughput Batch

    Run large-scale batch jobs across models with automatic chunking, retry, and rate-limit handling, achieving maximum throughput without custom queue infrastructure.

    Thousands of calls, one job

When to Use — When NOT to Use

Use it if...

  • You need a fast, low-cost model for simple question answering or retrieval.
  • You need to serve high-volume API traffic where throughput and latency dominate accuracy.
  • Your use case involves lightweight classification, tagging, or routing over many short texts.
  • Your use case involves simple data extraction from semi-structured content like forms or receipts.
  • You need a compact model for rapid experimentation, A/B tests, or fallback logic.
  • Your use case involves template-based content generation where creativity and nuance are limited.

Avoid if...

  • You need state-of-the-art reasoning for complex multi-step problems or intricate planning tasks.
  • Your workload requires handling very long contexts with high faithfulness to source documents.
  • You need expert-level coding assistance, complex refactoring, or multi-file software design support.
  • You need highly creative writing, nuanced style control, or domain-specialist technical drafting.
  • Your workload requires robust multilingual performance across low-resource languages or tricky scripts.
  • You need strict reliability for safety-critical decisions, legal analysis, or medical advice.

Frequently Asked Questions

  • What is Step 3.7 Flash?

    Step 3.7 Flash is a StepFun large language model optimized for fast, low-cost text generation through the LLM.API unified gateway.

  • What is Step 3.7 Flash best suited for?

    Step 3.7 Flash is best for high-volume, latency-sensitive tasks like chatbots, routing, drafting, and lightweight reasoning where speed and cost matter most.

  • What is the context window of Step 3.7 Flash?

    Step 3.7 Flash supports context windows up to 16K tokens, suitable for long conversations or moderately sized documents.

  • How fast is Step 3.7 Flash in terms of latency?

    Step 3.7 Flash is designed for low-latency responses, typically returning first tokens quickly enough for real-time interactive applications.

  • What modalities does Step 3.7 Flash support?

    Step 3.7 Flash currently supports text-in, text-out interactions and does not natively process images, audio, or video.

  • How do I call Step 3.7 Flash via LLM.API?

    Use the LLM.API chat or completions endpoint and set the model parameter to "stepfun/step-3.7-flash" with your LLM.API key.

  • How is pricing for Step 3.7 Flash handled on LLM.API?

    Pricing for Step 3.7 Flash is metered per input and output token by LLM.API, with rates listed in your LLM.API dashboard and pricing page.

  • How does Step 3.7 Flash compare to more capable StepFun models?

    Compared to larger StepFun models, Step 3.7 Flash is cheaper and faster but offers weaker reasoning, coding, and complex instruction-following.

  • Can I use Step 3.7 Flash for code generation?

    Step 3.7 Flash can generate and edit code for straightforward tasks, but complex, critical coding workloads should use a more capable model.

  • What are the main limitations of Step 3.7 Flash?

    Step 3.7 Flash may hallucinate facts, struggle with intricate multi-step reasoning, and is not suitable for safety-critical or compliance-sensitive decisions.

Related Resources

Start in 2 lines of code

Get My API Key