NVIDIA GB10 / DGX Spark

Real-hardware benchmarks: NVIDIA GB10 / DGX Spark

The figures below were measured by AIMA on NVIDIA GB10 / DGX Spark, and each one can be traced back to the corresponding YAML file in the AIMA model catalog.

Data updated: 2026-04. Data source: AIMA catalog.

Measurement method

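AIMA's built-in agent runs the whole process. After a model is deployed, it samples decode throughput (tok/s), time to first token (TTFT), time per output token (TPOT), peak GPU memory, and maximum context length. Speech recognition (ASR) models are additionally measured for real-time factor (RTF), speech synthesis (TTS) models for synthesis time, image generation models for end-to-end latency, and vision-language models (VLM) for vision-token processing time. All figures come from measured logs and are written back to the catalog. See the AIMA documentation for the commands to reproduce them.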
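To make the three latency metrics concrete, here is a minimal Python sketch of how TTFT, TPOT, and decode throughput can be derived from per-token arrival timestamps. It is not AIMA's implementation: `DecodeStats`, `summarize_stream`, and the timestamp inputs are hypothetical names. It assumes TTFT runs from request send to the first token, and that TPOT is averaged over the tokens after the first.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DecodeStats:
    ttft_ms: float           # time to first token
    tpot_ms: float           # mean time per output token after the first
    decode_tok_per_s: float  # decode throughput, excluding the first token


def summarize_stream(request_sent_s: float,
                     token_times_s: Sequence[float]) -> DecodeStats:
    """Derive latency metrics from one streamed response.

    request_sent_s: wall-clock time the request was sent (seconds).
    token_times_s:  wall-clock arrival time of each output token (seconds).
    """
    if len(token_times_s) < 2:
        raise ValueError("need at least two tokens to measure decode speed")

    ttft_s = token_times_s[0] - request_sent_s
    decode_s = token_times_s[-1] - token_times_s[0]
    decode_tokens = len(token_times_s) - 1  # first token belongs to prefill
    if decode_s <= 0:
        raise ValueError("token timestamps must be increasing")

    return DecodeStats(
        ttft_ms=ttft_s * 1000.0,
        tpot_ms=decode_s / decode_tokens * 1000.0,
        decode_tok_per_s=decode_tokens / decode_s,
    )
```

For example, a request sent at t = 0 s whose tokens arrive at 0.10, 0.15, 0.20, and 0.25 s has a TTFT of about 100 ms, a TPOT of about 50 ms, and a decode throughput of about 20 tok/s.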

gemma-4-26b-a4b-it

VLM, benchmark + validated, 2026-04-04
Metric                      Value      Notes
Decode throughput (tok/s)   24–28
TTFT (ms)                   127–489
TPOT (ms)                   45–59
Peak GPU memory (MiB)       92,800
Max context (tokens)        155,648

Validated on NVIDIA GB10 / DGX Spark. Aggregate tok/s and TTFT come from the catalog summary; the AIMA matrix columns come from the same validated note.

glm-4.7-flash

LLM, benchmark, 2026-03-01
Metric                      Value       Notes
Decode throughput (tok/s)   14.5–25.8
TTFT (ms)                   71–10266
TPOT (ms)                   39–69.3
Peak GPU memory (MiB)       60,400
Max context (tokens)        65,536

128K context causes OOM on GB10; concurrency columns are from the file's explicitly labeled concurrency_1k block.
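As a rough sanity check on these figures, single-stream decode throughput is approximately the reciprocal of TPOT. The Python sketch below applies that relation to the glm-4.7-flash TPOT range. The helper name `tok_per_s_from_tpot` is hypothetical. The source does not say that the two ranges pair up this way, so that pairing is an assumption, and the ranges may also span different concurrency levels.

```python
def tok_per_s_from_tpot(tpot_ms: float) -> float:
    """Approximate single-stream decode throughput from time per output token."""
    if tpot_ms <= 0:
        raise ValueError("tpot_ms must be positive")
    return 1000.0 / tpot_ms


# TPOT range reported for glm-4.7-flash on GB10 (ms).
tpot_best_ms, tpot_worst_ms = 39.0, 69.3

slow = tok_per_s_from_tpot(tpot_worst_ms)
fast = tok_per_s_from_tpot(tpot_best_ms)
print(f"{slow:.1f}-{fast:.1f} tok/s")  # prints "14.4-25.6 tok/s"
```

The implied 14.4 to 25.6 tok/s sits close to the reported 14.5–25.8 tok/s. The relation is only approximate, so small gaps like this do not by themselves point to an error in the catalog.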

qwen3.5-35b-a3b

LLM/VLM, benchmark, 2026-02-28
Metric                        Value       Notes
Decode throughput (tok/s)     24.2–30
TTFT (ms)                     96–34015
TPOT (ms)                     31–34
Peak GPU memory (MiB)         67,100
Max context (tokens)          131,072
1024 image tokens             1025
Max single-image resolution   3424x3424
Max images at 1024 res        255

ttft_ms_max uses the 131072-token context-scaling result (34.015s); vision columns come from the measured vision section.

qwen3-coder-next-fp8

Code LLM, benchmark
Metric                      Value        Notes
Decode throughput (tok/s)   0.2–42.5
TTFT (ms)                   239–141200
TPOT (ms)                   22.3–46.6
Peak GPU memory (MiB)       113,691
Max context (tokens)        262,144

Source file does not label the 2nd value in concurrency_1k; it is preserved in the raw columns without further inference.

qwen3-asr-1.7b

ASR, benchmark + validated, 2026-03-20
Metric                  Value        Notes
Peak GPU memory (MiB)   3,870
RTF                     0.076–0.17   approx 6–13x realtime
1.3s audio latency      0.22s
7s audio latency        0.59s
20s audio latency       1.56s
Total footprint         ~10.5 GB     model + KV cache + CUDA graphs
Tests passed: ASR

Audio timings are from the validated GB10 note; total footprint about 10.5 GB comes from model + KV cache + CUDA graphs in notes.
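The real-time factor follows directly from the audio timings listed for this model: RTF is processing time divided by audio duration, and its reciprocal is the speed-up over realtime. The Python sketch below recomputes it from the three measured clips. The function name `real_time_factor` is hypothetical, and the sketch assumes the listed latencies are full transcription times for each clip.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than realtime."""
    if audio_s <= 0:
        raise ValueError("audio_s must be positive")
    return processing_s / audio_s


# (audio duration in s, measured latency in s) from the qwen3-asr-1.7b rows.
clips = [(1.3, 0.22), (7.0, 0.59), (20.0, 1.56)]

for audio_s, latency_s in clips:
    rtf = real_time_factor(latency_s, audio_s)
    print(f"{audio_s:>4.1f}s audio: RTF {rtf:.3f}, {1.0 / rtf:.1f}x realtime")
```

The three clips come out at roughly 0.17, 0.08, and 0.08, in line with the reported 0.076–0.17 range. The shortest clip has the highest RTF, as expected when fixed per-request overhead is a larger share of the total.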

qwen3-tts-0.6b

TTS, benchmark + validated, 2026-03-20
Metric                           Value     Notes
Peak GPU memory (MiB)            2,048
GPU RTF                          0.8–0.9
7s audio synthesis               5.4s
20s audio synthesis              16.2s
CPU ARM64 RTF (GB10 reference)   ~5.0
Tests passed: TTS

GPU RTF is from the GB10 CUDA note; the same source also mentions CPU ARM64 on GB10 at about RTF 5.0, which is kept in this note rather than mixed into GPU metric columns.

z-image

ImageGen, benchmark + validated, 2026-03-31
Metric                  Value        Notes
Peak GPU memory (MiB)   22,000
512×512 / 28 steps      20 s/image
Tests passed: image generation

Image generation latency is the GB10 512x512 / 28 steps figure from the model catalog; end-to-end pass comes from the GB10 OpenClaw E2E report.

qwen3.5-9b

VLM/LLM, validated, 2026-03-20 / 2026-03-31
Metric                      Value     Notes
Decode throughput (tok/s)   13–17
TTFT (ms)                   30–200
Peak GPU memory (MiB)       18,000
Max context (tokens)        65,536
Tests passed: LLM chat, VLM

This row is included because GB10 validation is explicit, but the tok/s and TTFT numbers come from catalog estimates rather than a standalone measured benchmark log.

Measured data for more chips is being compiled. Follow the GitHub repository or the blog for updates.