NVIDIA GB10 / DGX Spark

Real-Hardware Benchmarks: NVIDIA GB10 / DGX Spark

The numbers below were measured by AIMA on NVIDIA GB10 / DGX Spark. Each one can be traced back to the corresponding YAML file in the AIMA model catalog.

Data updated: 2026-04. Data source: AIMA catalog.

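Measurement method

AIMA's built-in agent runs the whole process. Once a model is deployed, it samples decode throughput (tok/s), first-token latency (TTFT), per-token latency (TPOT), peak GPU memory, and maximum context length. Speech recognition (ASR) models are additionally measured for real-time factor (RTF), speech synthesis (TTS) models for synthesis time, image generation models for end-to-end latency, and vision-language models (VLM) for vision-token processing time. All numbers come from measured logs and are written back to the catalog. See the AIMA documentation for the reproduction commands.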
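The three LLM metrics above are related, and a small sketch can make the relationship concrete. The code below uses the common definitions (TTFT as the delay from request start to the first token, TPOT as the average gap between later tokens, decode throughput as the inverse of TPOT). How AIMA's agent actually samples these values is not described on this page, so the function name `summarize_stream`, the `DecodeMetrics` type, and the timestamp inputs are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DecodeMetrics:
    ttft_ms: float           # time to first token, in milliseconds
    tpot_ms: float           # average time per output token after the first
    decode_tok_per_s: float  # single-stream decode throughput


def summarize_stream(request_start_s: float,
                     token_times_s: Sequence[float]) -> DecodeMetrics:
    """Derive TTFT, TPOT and decode tok/s from token arrival timestamps.

    Hypothetical helper: timestamps are in seconds on the same clock as
    request_start_s, one entry per generated token.
    """
    if len(token_times_s) < 2:
        raise ValueError("need at least two tokens to compute TPOT")
    first, last = token_times_s[0], token_times_s[-1]
    if last <= first:
        raise ValueError("token timestamps must be increasing")

    ttft_ms = (first - request_start_s) * 1000.0
    # The first token is covered by TTFT, so only the remaining gaps count.
    tpot_ms = (last - first) * 1000.0 / (len(token_times_s) - 1)
    return DecodeMetrics(ttft_ms, tpot_ms, 1000.0 / tpot_ms)


if __name__ == "__main__":
    # Synthetic timestamps, not taken from any AIMA log.
    times = [0.127 + 0.040 * i for i in range(5)]
    print(summarize_stream(0.0, times))
```

Under these definitions a TPOT of 40 ms corresponds to 25 tok/s for a single stream. Aggregate throughput under concurrency can be higher than this single-stream figure, so the decode and TPOT rows in the tables below do not have to invert exactly.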

gemma-4-26b-a4b-it

VLM benchmark + validated 2026-04-04

Metric	Value	Notes
Decode throughput (tok/s)	24–28
TTFT (ms)	127–489
TPOT (ms)	45–59
Peak GPU memory (MiB)	92,800
Max context (tokens)	155,648

Source: gemma-4-26b-a4b-it.yaml

Validated on NVIDIA GB10 / DGX Spark; aggregate tok/s and TTFT come from catalog summary, AIMA matrix columns come from the same validated note.

glm-4.7-flash

LLM benchmark 2026-03-01

Metric	Value	Notes
Decode throughput (tok/s)	14.5–25.8
TTFT (ms)	71–10266
TPOT (ms)	39–69.3
Peak GPU memory (MiB)	60,400
Max context (tokens)	65,536

Source: glm-4.7-flash.yaml

128K context causes OOM on GB10; concurrency columns are from the file's explicitly labeled concurrency_1k block.

qwen3.5-35b-a3b

LLM/VLM benchmark 2026-02-28

Metric	Value	Notes
Decode throughput (tok/s)	24.2–30
TTFT (ms)	96–34015
TPOT (ms)	31–34
Peak GPU memory (MiB)	67,100
Max context (tokens)	131,072
1024 image tokens	1025
Max single-image resolution	3424x3424
Max images at 1024 res	255

Source: qwen3.5-35b-a3b.yaml

ttft_ms_max uses the 131072-token context-scaling result (34.015s); vision columns come from the measured vision section.

qwen3-coder-next-fp8

Code LLM benchmark

Metric	Value	Notes
Decode throughput (tok/s)	0.2–42.5
TTFT (ms)	239–141200
TPOT (ms)	22.3–46.6
Peak GPU memory (MiB)	113,691
Max context (tokens)	262,144

Source: qwen3-coder-next-fp8.yaml

Source file does not label the 2nd value in concurrency_1k; it is preserved in the raw columns without further inference.

qwen3-asr-1.7b

ASR benchmark + validated 2026-03-20

Metric	Value	Notes
Peak GPU memory (MiB)	3,870
RTF	0.076–0.17	approx 6–13x realtime
1.3s audio latency	0.22s
7s audio latency	0.59s
20s audio latency	1.56s
Total footprint	~10.5 GB	model + KV cache + CUDA graphs

Tests passed: ASR

Sources: qwen3-asr-1.7b.yaml, openclaw-multi.yaml

Audio timings are from the validated GB10 note; total footprint about 10.5 GB comes from model + KV cache + CUDA graphs in notes.
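As a quick cross-check of the RTF row, the real-time factor can be recomputed from the three audio latency rows in the table above. This is a minimal sketch assuming the usual definition, RTF = processing time / audio duration. The helper name `real_time_factor` is illustrative and not part of AIMA.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than realtime."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


# (audio duration in s, measured latency in s), copied from the table above.
ASR_SAMPLES = [(1.3, 0.22), (7.0, 0.59), (20.0, 1.56)]

for audio_s, latency_s in ASR_SAMPLES:
    rtf = real_time_factor(latency_s, audio_s)
    print(f"{audio_s:>5.1f}s audio: RTF {rtf:.3f}, about {1 / rtf:.1f}x realtime")
```

The three computed values fall inside the reported 0.076–0.17 range, with the shortest clip at the slow end. Fixed per-request overhead weighing more heavily on short inputs would produce that pattern, though the page does not say what causes it.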

qwen3-tts-0.6b

TTS benchmark + validated 2026-03-20

Metric	Value	Notes
Peak GPU memory (MiB)	2,048
GPU RTF	0.8–0.9
7s audio synthesis	5.4s
20s audio synthesis	16.2s
CPU ARM64 RTF (GB10 reference)	~5.0

Tests passed: TTS

Sources: qwen3-tts-0.6b.yaml, openclaw-multi.yaml

GPU RTF is from the GB10 CUDA note; the same source also mentions CPU ARM64 on GB10 at about RTF 5.0, which is kept in this note rather than mixed into GPU metric columns.
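To see what the gap between the GPU and CPU figures means in practice, the RTF values can be turned into rough synthesis-time estimates. The sketch below assumes synthesis time scales linearly with audio length (time = audio duration × RTF), which is an approximation. The helper `synthesis_seconds` is hypothetical, and the printed values are estimates rather than measurements.

```python
def synthesis_seconds(audio_s: float, rtf: float) -> float:
    """Estimate synthesis time for a clip, assuming time scales linearly with RTF."""
    if audio_s < 0 or rtf <= 0:
        raise ValueError("audio_s must be >= 0 and rtf must be > 0")
    return audio_s * rtf


GPU_RTF_RANGE = (0.8, 0.9)  # GB10 CUDA range from the table above
CPU_ARM64_RTF = 5.0         # approximate GB10 CPU reference from the note above

for audio_s in (7.0, 20.0):
    gpu_lo = synthesis_seconds(audio_s, GPU_RTF_RANGE[0])
    gpu_hi = synthesis_seconds(audio_s, GPU_RTF_RANGE[1])
    cpu = synthesis_seconds(audio_s, CPU_ARM64_RTF)
    print(f"{audio_s:.0f}s audio: GPU ~{gpu_lo:.1f}-{gpu_hi:.1f}s, CPU ~{cpu:.0f}s")
```

The measured GPU rows (5.4 s and 16.2 s) sit at or just below the low end of the estimated GPU range. At an RTF of about 5.0, the CPU path would take several times longer than the audio it produces.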

z-image

ImageGen benchmark + validated 2026-03-31

Metric	Value	Notes
Peak GPU memory (MiB)	22,000
512×512 / 28 steps	20s/image

Tests passed: image generation

Sources: z-image.yaml, BUG-gb10-openclaw-multi-e2e-20260331.md

Image generation latency is the GB10 512x512 / 28 steps figure from the model catalog; end-to-end pass comes from the GB10 OpenClaw E2E report.

qwen3.5-9b

VLM/LLM validated 2026-03-20 / 2026-03-31

Metric	Value	Notes
Decode throughput (tok/s)	13–17
TTFT (ms)	30–200
Peak GPU memory (MiB)	18,000
Max context (tokens)	65,536

Tests passed: LLM chat, VLM

Sources: openclaw-multi.yaml, BUG-gb10-openclaw-multi-e2e-20260331.md, qwen3.5-9b.yaml

This row is included because GB10 validation is explicit, but the tok/s and TTFT numbers come from catalog estimates rather than a standalone measured benchmark log.

Measured data for more chips is being compiled. Follow the GitHub repository or the blog for updates.
