Kimi K2.7 Code
Kimi-K2.7-Code: A High-Performance 1T Parameter MoE Coding Agent by Moonshot AI
Kimi-K2.7-Code is an advanced coding-focused agentic model built by Moonshot AI, featuring 1T parameters, 256K context length, and superior long-horizon task completion.
2026-06-15
Kimi K2.7 Code Product Information
Kimi-K2.7-Code: The Next Generation of Agentic Coding Models by Moonshot AI
Kimi-K2.7-Code is a coding-focused agentic model developed by Moonshot AI and built on the foundations of Kimi K2.6. It is designed for complex software engineering workflows and targets real-world, long-horizon coding tasks, completing them end to end while keeping token usage low. It uses approximately 30% fewer thinking tokens than its predecessor, which makes it a more cost-effective option for developers and enterprises.
What's Kimi-K2.7-Code?
Kimi-K2.7-Code is a specialized Mixture-of-Experts (MoE) model engineered to act as a coding agent. Unlike general-purpose large language models, it is optimized for software development work such as debugging, refactoring, and architectural planning. The model has 1 trillion total parameters, of which 32 billion are activated per token, so each forward pass uses only a small fraction of the full model.
With a 256K context length, Kimi-K2.7-Code can take in large codebases in a single prompt, which makes it well suited to long-context software projects. It uses the MLA (Multi-head Latent Attention) mechanism and the SwiGLU activation function, and it handles both text and vision-related coding tasks.
Key Features of Kimi-K2.7-Code
1. Advanced Architecture
Kimi-K2.7-Code is built using a sophisticated MoE framework:
- Total Parameters: 1T
- Activated Parameters: 32B
- Layers: 61 (including dense layers)
- Experts: 384 total experts, with 8 selected per token.
- Vision Encoder: MoonViT with 400M parameters, enabling the model to handle image and video inputs.
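To make the expert counts above concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not Moonshot AI's implementation: the softmax gate, the random logits, and the function name route_token are assumptions made for illustration. Only the figures 384 and 8 come from the list above.

```python
import math
import random

NUM_EXPERTS = 384  # total experts (from the architecture list)
TOP_K = 8          # experts selected per token (from the architecture list)


def route_token(gate_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return (index, weight) pairs.

    Generic softmax-then-top-k gating. The gating function the model really
    uses is not described in this document, so this is an assumption.
    """
    # Numerically stable softmax over all expert logits.
    peak = max(gate_logits)
    exps = [math.exp(x - peak) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the top_k most probable experts.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Renormalize so the selected experts' weights sum to 1.
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # placeholder gate scores
routing = route_token(logits)
expert_fraction = TOP_K / NUM_EXPERTS  # share of experts active per token
```

With these numbers, only 8 of 384 experts (about 2%) run for each token, which is consistent with the model activating a small share of its total parameters per token.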
2. Enhanced Token Efficiency
Efficiency is at the core of Kimi-K2.7-Code. It uses approximately 30% fewer thinking tokens than Kimi K2.6, allowing for faster response times and reduced operational costs without sacrificing reasoning quality.
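As a rough illustration of what a reduction of about 30% means in practice, the sketch below scales a thinking-token count and a cost. The baseline token count and the price per million tokens are made-up placeholders, not published figures.

```python
REDUCTION = 0.30  # approximate thinking-token reduction stated above


def estimate_thinking_tokens(baseline_tokens, reduction=REDUCTION):
    """Estimate thinking tokens after applying a fractional reduction."""
    return round(baseline_tokens * (1.0 - reduction))


def estimate_cost(tokens, price_per_million):
    """Cost of `tokens` tokens at a hypothetical per-million-token price."""
    return tokens / 1_000_000 * price_per_million


baseline = 10_000                              # hypothetical K2.6 thinking tokens
reduced = estimate_thinking_tokens(baseline)   # estimated K2.7 thinking tokens
saving = estimate_cost(baseline - reduced, price_per_million=2.0)  # hypothetical price
```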
3. Native INT4 Quantization
Following the same path as Kimi-K2-Thinking, Kimi-K2.7-Code adopts native INT4 quantization. Storing weights at 4 bits reduces the memory footprint and bandwidth needed for deployment across hardware configurations, while aiming to preserve model accuracy.
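For readers unfamiliar with INT4, the sketch below shows a generic symmetric 4-bit quantization of one small group of weights. It is a textbook-style illustration, not the scheme used by Kimi-K2.7-Code: the group size, the scaling rule, and the function names are assumptions.

```python
def quantize_int4(weights):
    """Symmetric INT4 quantization of one weight group.

    Returns (codes, scale), where each code is an integer in [-8, 7].
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the signed 4-bit range.
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int4(codes, scale):
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


group = [0.12, -0.53, 0.07, 0.91, -0.28, 0.0, 0.44, -0.91]  # illustrative weights
codes, scale = quantize_int4(group)
restored = dequantize_int4(codes, scale)
max_error = max(abs(a - b) for a, b in zip(group, restored))
```

Each code fits in 4 bits, so a weight stored this way takes half a byte plus a shared scale per group. The rounding error per weight is bounded by half the scale.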
4. Agentic Performance Benchmarks
Kimi-K2.7-Code reports the following scores on agentic benchmarks:
- Kimi Code Bench v2: Scored 62.0 (up from 50.9 in K2.6).
- MCP Mark Verified: Scored 81.1.
- Kimi Claw 24/7 Bench: Scored 46.9.
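The Kimi Code Bench v2 comparison above can be expressed as an absolute and a relative gain. The short sketch below only does arithmetic on the two scores listed. The helper name relative_gain is illustrative.

```python
def relative_gain(new_score, old_score):
    """Relative improvement of new_score over old_score, in percent."""
    return (new_score - old_score) / old_score * 100.0


k27_score = 62.0  # Kimi Code Bench v2, Kimi-K2.7-Code (from the list above)
k26_score = 50.9  # Kimi Code Bench v2, K2.6 (from the list above)

absolute_gain = k27_score - k26_score           # points gained
percent_gain = relative_gain(k27_score, k26_score)  # gain relative to K2.6
```

This works out to a gain of 11.1 points, or roughly 21.8% relative to the K2.6 score.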
Use Cases for Kimi-K2.7-Code
Kimi-K2.7-Code is versatile enough to support a wide range of professional software development scenarios:
- Complex Software Engineering: Managing long-horizon tasks that require multi-step reasoning and deep codebase understanding.
- Multi-Modal Coding Support: Utilizing its vision capabilities to describe images or analyze video content related to UI/UX design or technical demonstrations.
- Automated Debugging: Leveraging its thinking mode to trace errors across large files thanks to its 256K context window.
- Coding Agent Frameworks: It works optimally with the Kimi Code CLI to act as a fully autonomous coding assistant.
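Before sending a codebase to the model, it can help to estimate whether it fits in the 256K context window. The sketch below is a rough heuristic under stated assumptions: it treats 256K as 262,144 tokens, approximates one token as four characters instead of using the model's tokenizer, and reserves a hypothetical 8,192 tokens for the reply.

```python
CONTEXT_TOKENS = 256 * 1024  # assumes "256K" means 262,144 tokens
CHARS_PER_TOKEN = 4          # rough heuristic, not the model's tokenizer


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Rough token estimate; real counts depend on the tokenizer."""
    return -(-len(text) // chars_per_token)  # ceiling division


def fits_in_context(files, reserve_for_output=8_192, limit=CONTEXT_TOKENS):
    """Return (fits, total_tokens) for a mapping of path -> file contents."""
    total = sum(estimate_tokens(body) for body in files.values())
    return total + reserve_for_output <= limit, total


# Small in-memory stand-in for a project tree.
files = {
    "app/main.py": "print('hello')\n" * 200,
    "app/utils.py": "def add(a, b):\n    return a + b\n" * 50,
}
ok, total = fits_in_context(files)
```

For real projects, replace the heuristic with counts from the model's own tokenizer, since code often tokenizes differently from prose.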
How to Use Kimi-K2.7-Code
There are several ways to deploy and interact with Kimi-K2.7-Code, ranging from high-level libraries to direct API calls.
Using Transformers Library
You can quickly implement Kimi-K2.7-Code using the Hugging Face transformers library. Ensure your version is >=4.57.1 and <5.0.0.
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="moonshotai/Kimi-K2.7-Code", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
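The version requirement for transformers mentioned above (>=4.57.1 and <5.0.0) can be checked before loading the model. The following is a minimal sketch in plain Python. The helper names are illustrative, and the parser is deliberately simple: it ignores pre-release suffixes, so a dedicated version-parsing library is a better choice in production code.

```python
def parse_version(version):
    """Turn '4.57.1' into (4, 57, 1); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_supported(version, minimum="4.57.1", upper="5.0.0"):
    """True when minimum <= version < upper, compared numerically."""
    return parse_version(minimum) <= parse_version(version) < parse_version(upper)


checks = {v: is_supported(v) for v in ("4.57.0", "4.57.1", "4.60.2", "5.0.0")}
```

In a real environment, pass transformers.__version__ to is_supported after importing the library.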
Deployment with vLLM
For high-throughput inference, vLLM is a recommended engine for Kimi-K2.7-Code.
# Install vLLM:
pip install vllm
# Start the server:
vllm serve "moonshotai/Kimi-K2.7-Code"
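Once the server is running, it exposes an OpenAI-compatible HTTP API, by default at http://localhost:8000/v1. The sketch below builds a chat request using only the Python standard library and applies the sampling settings recommended in the FAQ (temperature 1.0, top_p 0.95). The helper names and the example prompt are illustrative, and the address assumes vLLM's default host and port.

```python
import json
import urllib.request

# Default address of a local `vllm serve` process; adjust if host or port differ.
BASE_URL = "http://localhost:8000/v1"
MODEL = "moonshotai/Kimi-K2.7-Code"


def build_payload(prompt, max_tokens=4096):
    """Build a chat-completions request body with the recommended sampling settings."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        "max_tokens": max_tokens,
    }


def chat(prompt, base_url=BASE_URL):
    """POST the payload to the server and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_payload("Write a function that reverses a linked list.")
# reply = chat("Write a function that reverses a linked list.")  # needs a running server
```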
Official Moonshot AI API
Moonshot AI provides an OpenAI-compatible API for Kimi-K2.7-Code. Note that this model forces thinking and preserve_thinking modes to be active.
Chat Completion with Thinking Mode
import openai

def simple_chat(client: openai.OpenAI, model_name: str):
    messages = [
        {'role': 'system', 'content': 'You are Kimi, an AI assistant created by Moonshot AI.'},
        {'role': 'user', 'content': 'which one is bigger, 9.11 or 9.9? think carefully.'},
    ]
    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=4096
    )
    print(f'Reasoning: {response.choices[0].message.reasoning}')
    print(f'Response: {response.choices[0].message.content}')
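The example above reads the reasoning text from message.reasoning. Some OpenAI-compatible servers expose it under a different attribute, such as reasoning_content, so a defensive helper can avoid an AttributeError. This is a hedged sketch: the list of field names is an assumption, and the stand-in message objects exist only so the snippet runs without network access.

```python
from types import SimpleNamespace


def extract_reasoning(message, field_names=("reasoning", "reasoning_content")):
    """Return the first non-empty reasoning field on a message, else None."""
    for name in field_names:
        value = getattr(message, name, None)
        if value:
            return value
    return None


# Stand-in objects that mimic the shape of a chat completion message.
with_reasoning = SimpleNamespace(reasoning="9.9 = 9.90 > 9.11", content="9.9 is bigger.")
alt_field = SimpleNamespace(reasoning_content="compare decimals", content="9.9")
no_reasoning = SimpleNamespace(content="9.9")

results = [extract_reasoning(m) for m in (with_reasoning, alt_field, no_reasoning)]
```

Inside simple_chat, the same helper could be applied to response.choices[0].message before printing.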
FAQ
Q: What is the context length of Kimi-K2.7-Code? A: The model supports a massive context length of 256K tokens.
Q: Does Kimi-K2.7-Code support multi-modal inputs? A: Yes, Kimi-K2.7-Code supports both image and video inputs. Note that video support is currently experimental and primarily available through the official Moonshot AI API.
Q: What license is Kimi-K2.7-Code released under? A: Both the code repository and the model weights are released under the Modified MIT License.
Q: What are the recommended settings for inference? A: When serving the model through third-party inference engines such as vLLM or SGLang, a temperature of 1.0 and a top_p of 0.95 are recommended for Thinking mode. Instant mode is not supported.
Q: Where can I get support for Kimi-K2.7-Code? A: You can reach out to the Moonshot AI team at [email protected] for any questions regarding the model or its implementation.
Kimi-K2.7-Code is a major milestone for developers looking for an agentic model that truly understands the complexities of modern software development workflows.








