4 posts tagged with "AI Chip"

AI chip industry dynamics and trends


Huawei Ascend 910C Deep Dive: Specifications, Deployments, and Performance

June 10, 2026 · 10 min read

AI Compute Cards Wiki Editorial

Industry Research Team

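The Huawei Ascend 910C is Huawei's third-generation Ascend AI chip. It uses a dual-die chiplet package and entered volume shipment in May 2025, making it a mainstay of China's domestic AI compute.

This article examines the chip along four dimensions: technical specifications, deployment cases, performance comparisons, and market positioning.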

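1. Core Technical Specifications

1.1 Chip Architecture and Process

Item	Specification
Architecture	Da Vinci (dual-chiplet package)
Process	SMIC N+2 (7nm-class)
Packaging	Chiplet (2× Ascend 910B compute dies)
Transistor count	~53 billion
Die size	~800 mm² (estimated)

Technical highlights:

Dual-die chiplet packaging combines two 910B dies, working around the yield limits of a single large die
No central I/O die: the two compute dies connect directly, which lowers communication latency
Built on SMIC N+2 for 7nm-class performance with a domestically controlled supply chain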

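1.2 Compute Performance

Precision	Compute	Reference
BF16	800 TFLOPS	About 60% of NVIDIA H100
FP16	~800 TFLOPS	About 60% of NVIDIA H100 at the same precision
INT8	~1,600 TOPS	Strong in inference workloads
FP32	Not disclosed	Training mainly uses BF16/FP16

Performance notes:

800 TFLOPS at BF16 sets a new high for domestic AI chips
Roughly 2× the compute of the 910B (two dies plus architectural tuning)
No FP8 support (an area where NVIDIA Blackwell is strong)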

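1.3 Memory and Interconnect

Item	Specification
HBM type	HBM2E (8 stacks)
Memory capacity	~128 GB (both dies combined)
Memory bandwidth	784 GB/s
Interconnect protocol	Huawei AscendLink (proprietary)
Interconnect bandwidth	400 GB/s per direction (800 GB/s bidirectional)

Memory notes:

The 128 GB capacity supports end-to-end training of 100-billion-parameter-class models
784 GB/s is at the high end for HBM2E-based designs
The proprietary AscendLink protocol supports all-optical interconnect across 384 chips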
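
As a rough check on what 128 GB per chip means for training a 100-billion-parameter model, the sketch below estimates the minimum number of chips needed just to hold the training state. The 16 bytes per parameter figure is an assumption (a common rule of thumb for mixed-precision Adam: 2 bytes of weights, 2 bytes of gradients, 12 bytes of FP32 optimizer state); it is not from the article, and it ignores activations and communication buffers.

```python
import math

# Assumption: mixed-precision Adam, ~16 bytes of state per parameter
# (BF16 weights + BF16 gradients + FP32 master weights and two Adam moments).
BYTES_PER_PARAM_TRAINING = 16
HBM_PER_CHIP_GB = 128  # Ascend 910C capacity as quoted in the table above


def min_chips_for_training(params_billion: float) -> int:
    """Lower bound on chips needed to hold weights, gradients and optimizer state."""
    # 1e9 parameters * N bytes = N GB, so billions of parameters map directly to GB.
    total_state_gb = params_billion * BYTES_PER_PARAM_TRAINING
    return math.ceil(total_state_gb / HBM_PER_CHIP_GB)


if __name__ == "__main__":
    for size in (100, 671, 1600):
        print(f"{size}B parameters: at least {min_chips_for_training(size)} chips")
```

Under these assumptions a 100B-parameter model needs its state sharded across at least 13 chips, so the capacity claim is best read at the cluster level rather than per card.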

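1.4 Power and Efficiency

Item	Specification
TDP (dual-die)	~310 W
Efficiency (BF16)	~2.58 TFLOPS/W
vs. H100	About 44% of H100's power draw, with higher compute per watt on the figures quoted in this article

Efficiency notes:

Delivers about 60% of H100's BF16 compute at roughly 44% of its power (310 W vs. 700 W)
Built on a 7nm-class process, with efficiency about 30% better than the 910B
Suited to large cluster deployments, easing data center PUE pressure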
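
The efficiency figure above is simply peak compute divided by TDP. A minimal sketch of that arithmetic, using only numbers quoted in this article (the H100 value of ~1,300 TFLOPS is the one the article uses in its comparison table, not an independently verified figure):

```python
def tflops_per_watt(peak_tflops: float, tdp_watts: float) -> float:
    """Peak compute per watt of rated board power."""
    return peak_tflops / tdp_watts


# Figures as quoted in this article.
ASCEND_910C = {"bf16_tflops": 800, "tdp_w": 310}
H100 = {"bf16_tflops": 1300, "tdp_w": 700}

if __name__ == "__main__":
    for name, chip in (("Ascend 910C", ASCEND_910C), ("NVIDIA H100", H100)):
        eff = tflops_per_watt(chip["bf16_tflops"], chip["tdp_w"])
        print(f"{name}: {eff:.2f} TFLOPS/W")
    print(f"Power ratio (910C / H100): {ASCEND_910C['tdp_w'] / H100['tdp_w']:.0%}")
```

Peak-per-TDP is only a paper metric; sustained efficiency depends on utilization, memory bandwidth, and the workload.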

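2. Key Deployment Cases

2.1 CloudMatrix 384 Supernode

System specifications:

Item	Configuration
Chip count	384 Ascend 910C
Racks	16 (12 compute racks + 4 network racks)
Total HBM capacity	~49 TB (128 GB × 384)
Interconnect	All-optical mesh network
Optical modules	6,912 LPO optical modules
System BF16 compute	~300 PFLOPS

Performance comparison:

Total BF16 compute of CloudMatrix 384 exceeds that of NVIDIA GB200 NVL72 (72 B200 GPUs)
In large-model training, linear scaling efficiency across the 384 chips is above 85%
Scales out to clusters of tens of thousands of cards for very large training runs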
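
The system-level totals follow from the per-chip figures. The sketch below reproduces them and applies the quoted 85% scaling efficiency as a simple multiplier; treating scaling efficiency as a flat factor on peak compute is a simplifying assumption made here for illustration.

```python
CHIPS = 384
HBM_PER_CHIP_GB = 128
BF16_TFLOPS_PER_CHIP = 800
SCALING_EFFICIENCY = 0.85  # lower bound quoted for 384-chip training

# Aggregate capacity and peak compute (decimal units: 1 TB = 1000 GB).
total_hbm_tb = CHIPS * HBM_PER_CHIP_GB / 1000
peak_pflops = CHIPS * BF16_TFLOPS_PER_CHIP / 1000
# Assumption: scaling efficiency applied as a flat factor on peak compute.
effective_pflops = peak_pflops * SCALING_EFFICIENCY

if __name__ == "__main__":
    print(f"Total HBM:           {total_hbm_tb:.1f} TB")
    print(f"Peak BF16 compute:   {peak_pflops:.1f} PFLOPS")
    print(f"At 85% scaling:      {effective_pflops:.1f} PFLOPS")
```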

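Deployment progress:

As of June 2026, more than 500 CloudMatrix 384 supernodes had been deployed
Main customers: China Telecom, China Mobile, China Unicom, Huawei Cloud, iFlytek, and others
Use cases: large-model training, intelligent customer service, autonomous-driving simulation, scientific computing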

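2.2 DeepSeek-V4-Pro Full-Parameter Post-Training

Why it matters:

On June 5, 2026, the AI training platform of 深圳河套学院 (Shenzhen Hetao), together with Harbin Institute of Technology (Shenzhen), the Shenzhen Research Institute of Big Data, Huawei, and the 深智城 AI computing platform, completed full-parameter post-training of the 1.6-trillion-parameter DeepSeek-V4-Pro model on an Ascend 910C cluster.

Technical highlights:

Among the world's first full-parameter post-training runs of a trillion-parameter model on a domestic compute platform
Validates the maturity of the Ascend 910C for very-large-scale model training
Shows that domestic AI chips can substitute for imported ones

Performance data (officially disclosed):

Training throughput: about 60% of an H100 cluster (BF16)
Memory utilization: 92% (helped by the 128 GB HBM2E capacity)
Interconnect efficiency: 85%+ linear scaling across 384 chips
Stability: 30 days of continuous training without failure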

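2.3 Commercial Deployments

Case 1: A provincial big-data center (300 PFLOPS compute center)

Scale: 300 PFLOPS of AI compute (about 1,000 910C chips)
Use cases: government large models, city brain, smart transportation
Deployed: September 2025
Investment: about ¥200 million (120 servers)

Case 2: Huawei Cloud AI training platform

Chip count: more than 10,000 Ascend 910C
Customers served: more than 500 enterprises
Model support: Pangu models and third-party open-source models (LLaMA, ChatGLM, and others)
Regions: China, Southeast Asia, the Middle East, Latin America

Case 3: iFlytek smart education

Scale: 256 Ascend 910C
Use cases: smart-education large models, speech recognition, machine translation
Performance gain: training is 90% faster than on the 910B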

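3. Performance Comparison

3.1 Versus NVIDIA H100

Item	Ascend 910C	NVIDIA H100	Notes
BF16 compute	800 TFLOPS	~1,300 TFLOPS	910C is about 60% of H100
HBM capacity	128 GB	80 GB	910C has 60% more
HBM bandwidth	784 GB/s	3.35 TB/s	Clear H100 advantage
TDP	310 W	700 W	910C draws about 44% of H100's power
Process	7nm (SMIC N+2)	4nm (TSMC)	H100 is on a more advanced node
Software ecosystem	CANN (CUDA-compatible)	CUDA	H100 ecosystem is more mature
Supply	Domestically controlled in China	Subject to export controls	No supply-chain risk for the 910C

Conclusions:

On raw compute, the 910C is about 60% of the H100
On memory capacity, the 910C leads by 60%, which suits large-model training
On efficiency, the 910C is clearly ahead of the H100
On supply-chain security, the 910C wins outright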

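3.2 Versus Ascend 910B

Item	Ascend 910C	Ascend 910B	Improvement
Architecture	Dual-die chiplet	Single die	-
BF16 compute	800 TFLOPS	~400 TFLOPS	+100%
HBM capacity	128 GB	64 GB	+100%
TDP	310 W	310 W	Unchanged (single-chip power)
Process	SMIC N+2	SMIC N+2	Same
Yield	~40%	~30%	+33%

Conclusions:

Dual-die packaging doubles both compute and memory capacity
Yield rises from 30% on the 910B to 40%, lowering manufacturing cost
At the same power, performance is up 100%, a substantial efficiency gain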

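3.3 Inference Performance (DeepSeek Model Test)

Test setup:

Model: DeepSeek-V3 (671B parameters)
Hardware: Ascend 910C vs. NVIDIA H100
Precision: BF16
Batch size: 64

Results:

Metric	Ascend 910C	NVIDIA H100	910C / H100
Inference speed (tokens/s)	8,500	14,200	60%
Time to first token (ms)	120	85	141%
Power (W)	310	700	44%
Cost (¥10,000 per card)	~10	~18	56%

Conclusions:

The 910C reaches 60% of H100 inference speed at 44% of the power
In cost-sensitive deployments, the 910C has a clear price-performance advantage
For buyers in China with domestic-sourcing requirements, the 910C is the only option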
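
The price-performance conclusion can be made concrete by normalizing throughput by power and by card cost. The sketch below uses only the figures from the results table; the class and property names are illustrative, and card cost is the only cost considered (no servers, networking, or electricity).

```python
from dataclasses import dataclass


@dataclass
class InferenceResult:
    name: str
    tokens_per_s: float
    power_w: float
    cost_10k_yuan: float  # card price in units of 10,000 yuan, as in the table

    @property
    def tokens_per_joule(self) -> float:
        # (tokens/s) / (J/s) = tokens per joule
        return self.tokens_per_s / self.power_w

    @property
    def tokens_per_s_per_10k_yuan(self) -> float:
        return self.tokens_per_s / self.cost_10k_yuan


ascend = InferenceResult("Ascend 910C", 8500, 310, 10)
h100 = InferenceResult("NVIDIA H100", 14200, 700, 18)

if __name__ == "__main__":
    for r in (ascend, h100):
        print(
            f"{r.name}: {r.tokens_per_joule:.1f} tokens/J, "
            f"{r.tokens_per_s_per_10k_yuan:.0f} tokens/s per 10,000 yuan"
        )
```

On these numbers the 910C leads on throughput per watt by a wider margin than on throughput per yuan, while the H100 keeps the lower time to first token.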

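4. Market Positioning and Competitive Advantages

4.1 Target Markets

Core markets:

Chinese government and state-owned enterprises: domestic substitution, data security, supply-chain independence
Large-model startups: cost-sensitive, with heavy compute needs
Telecom operators and cloud providers: large-scale deployment and strict efficiency requirements
Research and education: very-large-scale computing, talent development

Secondary markets:

Autonomous driving: end-to-end large-model training
Smart healthcare: medical image analysis, drug discovery
Fintech: risk control, robo-advisory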

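4.2 Competitive Advantages

Advantage	Description
Supply-chain independence	SMIC N+2 process plus Huawei's own architecture; no supply-chain risk
Large memory capacity	128 GB HBM2E, supporting end-to-end training of 100-billion-parameter-class models
Efficiency	800 TFLOPS at 310 W TDP (~2.58 TFLOPS/W)
System-level scaling	CloudMatrix 384 supernode, with total compute above GB200 NVL72
Software ecosystem	CANN is CUDA-compatible, lowering migration cost
Cost	About ¥100,000 per card, roughly 44% below H100

4.3 Weaknesses and Planned Improvements

Weakness	Planned improvement
Per-chip compute	The next-generation 910D will move to a 3nm process, targeting 2× compute
HBM bandwidth	The 950 series will use Huawei's self-developed HBM (HiBL 1.0), raising bandwidth to 4 TB/s
Software ecosystem	Continued investment in CANN and MindSpore to grow the developer community
Process node	Close collaboration with SMIC to bring N+3 (5nm-class) into volume production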

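5. 2026 Shipment Plan and Market Forecast

5.1 Shipment Plan

Period	Shipments (units)	Cumulative (units)	Main customers
2025 Q2-Q4	200,000	200,000	Huawei Cloud, China Telecom
2026 Q1-Q2	300,000	500,000	China Mobile, China Unicom, iFlytek
2026 Q3-Q4	300,000	800,000	Government projects, large-model startups
2027	1,000,000	1,800,000	Global markets (Southeast Asia, Middle East, Latin America)

Capacity constraints:

SMIC N+2 capacity is about 100,000 wafers per month, of which the Ascend 910C takes about 30%
Reaching the planned 800,000 cumulative units by the end of 2026 requires about 400,000 wafers, which calls for capacity utilization of 80% or more
Huawei works closely with SMIC to give 910C production priority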

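5.2 Market Forecast

China AI chip market (2026):

Total size: about ¥50 billion
Domestic chip share: about 35% (¥17.5 billion)
Ascend 910C share: about 60% of the domestic segment (¥10.5 billion, about 800,000 units)

Global AI chip market (2026):

Total size: about $200 billion
Huawei share: about 5% ($10 billion)
Growth drivers: domestic substitution in China plus exports to Belt and Road countries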
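
The forecast shares chain together multiplicatively. A short sketch of that arithmetic, using the article's figures (¥50 billion, i.e. 500亿元, for the China market) and reading the 60% Ascend 910C share as a share of the domestic-chip segment, which is the reading under which the quoted amounts are consistent:

```python
CHINA_MARKET_BN_YUAN = 50.0   # 500亿元, total China AI chip market (2026 forecast)
DOMESTIC_SHARE = 0.35         # domestic chips as a share of the China market
ASCEND_SHARE_OF_DOMESTIC = 0.60

domestic_bn = CHINA_MARKET_BN_YUAN * DOMESTIC_SHARE
ascend_bn = domestic_bn * ASCEND_SHARE_OF_DOMESTIC
ascend_share_of_total = ascend_bn / CHINA_MARKET_BN_YUAN

GLOBAL_MARKET_BN_USD = 200.0
HUAWEI_GLOBAL_SHARE = 0.05
huawei_global_bn_usd = GLOBAL_MARKET_BN_USD * HUAWEI_GLOBAL_SHARE

if __name__ == "__main__":
    print(f"Domestic segment:        {domestic_bn:.1f} bn yuan")
    print(f"Ascend 910C:             {ascend_bn:.1f} bn yuan")
    print(f"910C share of China mkt: {ascend_share_of_total:.0%}")
    print(f"Huawei global revenue:   {huawei_global_bn_usd:.1f} bn USD")
```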

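6. Summary and Outlook

6.1 Key Conclusions

The Ascend 910C is a milestone for domestic AI chips, with advances in compute, memory, efficiency, and system-level scaling
The CloudMatrix 384 supernode shows that domestic chips can substitute for imported ones
The DeepSeek-V4-Pro run validates the 910C for very-large-scale model training
Cumulative shipments are planned to reach 800,000 units by the end of 2026, about 60% of the domestic-chip segment of China's AI chip market

6.2 Outlook

Short term (2026-2027):

910C volume continues to ramp, with shipments passing 1 million units
More than 1,000 CloudMatrix 384 systems deployed
Software ecosystem (CANN + MindSpore) reaches about 70% of CUDA's maturity

Medium term (2028-2029):

Next-generation 910D enters volume production on a 3nm process, targeting 1.6 PFLOPS BF16
The 950 series (PR/DT) becomes the mainstay of the inference market, with share above 30%
960/970 launch on the N+3 process, supporting trillion-parameter models

Long term (2030+):

Huawei Ascend becomes a top-3 player in the global AI chip market
Domestic AI chips exceed 20% of the global market
A progression from "following" to "keeping pace" to "leading"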


Last updated: June 10, 2026

Computex 2026 AI Compute Card Major Events: DGX Station for Windows, Intel Crescent Island, and More Major Launches

June 5, 2026 · 4 min read

AI Compute Cards Wiki Editorial

Industry Research Team

June 1-5, 2026, Taipei — Computex 2026 (Taipei International Information Technology Show) wrapped up successfully this week. With the theme "AI Together," industry giants including NVIDIA, Intel, AMD, and Qualcomm unveiled numerous AI compute products in rapid succession. Below, MirrorFrog brings you a roundup of the most noteworthy developments in the compute card space this week.

① NVIDIA DGX Station for Windows: A Desktop AI Supercomputer

NVIDIA officially launched the DGX Station for Windows during its Computex 2026 keynote, calling it "the world's most powerful desktop AI supercomputer."

Core Specifications

Item	Specification
Chip	GB300 Grace Blackwell Ultra Desktop Superchip
GPU Memory	252 GB HBM3e (7.1 TB/s)
CPU Memory	496 GB LPDDR5X (396 GB/s)
Unified Memory	748 GB (NVLink-C2C interconnect)
FP4 Compute	20 PFLOPS (sparse)
FP8 Compute	10 PFLOPS (sparse)
Network	ConnectX-8 SuperNIC, up to 800 Gb/s
Model Capacity	Can run 1 trillion parameter models
System Power	1,600 W
Operating System	Microsoft Windows
Shipping	Q4 2026

Significance: DGX Station compresses AI compute power (20 PFLOPS FP4) that previously required datacenter-class clusters into a single desktop workstation. 748GB of unified memory means developers can run models with hundreds of billions or even trillions of parameters locally, without cloud dependency.
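
Whether a model "fits" in 748 GB depends mostly on weight precision. The sketch below estimates weight storage for a 1-trillion-parameter model at several precisions. It counts weights only (no KV cache, activations, or runtime overhead), which is a simplifying assumption rather than anything stated in the announcement.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}

UNIFIED_MEMORY_GB = 748  # DGX Station unified memory from the table above


def weights_gb(num_params: float, precision: str) -> float:
    """Storage for model weights alone, in decimal GB."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    params = 1e12  # 1 trillion parameters
    for precision in ("fp16", "fp8", "fp4"):
        size = weights_gb(params, precision)
        verdict = "fits" if size <= UNIFIED_MEMORY_GB else "does not fit"
        print(f"{precision}: {size:.0f} GB of weights, {verdict} in {UNIFIED_MEMORY_GB} GB")
```

On this estimate a 1-trillion-parameter model fits only at 4-bit precision, which lines up with the FP4 compute figure being the headline number.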

② Intel Crescent Island: Inference-Specialized AI GPU

At Computex, Intel disclosed detailed specifications for its next-generation datacenter AI inference GPU, Crescent Island.

Item	Specification
Memory	Up to 480 GB LPDDR5x
Power	350 W (PCIe form factor)
Precision Support	FP4/MXFP4 → FP64 (full precision coverage)
Target	AI inference workloads (Agentic Inference)
Positioning	Better price-performance than HBM solutions
Shipping	H2 2026

Significance: Crescent Island represents Intel's key strategic move in the AI inference market. 480GB of massive LPDDR5x memory (non-HBM) means significantly lower cost compared to NVIDIA H200/B200 and other competing products, targeting enterprise inference deployment scenarios.

③ Intel Xeon 6+ (Clearwater Forest): First Intel 18A Datacenter CPU

Intel also unveiled the new Xeon 6+ processor, codenamed Clearwater Forest, its first datacenter CPU built on the 18A process:

288 Darkmont architecture cores
288 MB L2 + 576 MB L3 cache
12-channel DDR5-8000 memory
Foveros Direct 3D advanced packaging
Positioning: in the AI agent era, the CPU returns to the center of infrastructure

④ NVIDIA RTX Spark Ecosystem Takes Shape

This week, the RTX Spark super chip developed in collaboration between NVIDIA and MediaTek continued to generate buzz. Multiple OEMs showcased RTX Spark-based laptop and compact desktop prototypes:

ASUS, Dell, HP, Lenovo, Microsoft Surface, MSI all confirmed as launch partners
Equipped with 20-core Grace CPU + Blackwell GPU (6144 CUDA cores)
AI compute 1 PFLOPS
Retail availability Fall 2026

⑤ Intel × Foxconn AI Infrastructure Partnership

Intel and Foxconn announced a joint AI infrastructure initiative, covering the complete chain from chip → server → rack-scale system, targeting the datacenter market opportunity driven by surging AI inference demand.

⑥ Domestic AI Chip Developments

According to the IDC 2025 annual report, total AI accelerator card shipments in China reached approximately 4 million units, with domestic vendors shipping approximately 1.65 million units, capturing a market share exceeding 41%. Huawei's Ascend 950 series has entered mass production and delivery, while Cambricon's MLU690 has begun shipping to internet customers.

This Week's Compute Roundup

Vendor	Product	Highlight	Timeline
NVIDIA	DGX Station for Windows	20 PFLOPS, 748GB unified memory	Q4 2026
NVIDIA	RTX Spark	1 PFLOPS AI PC chip	Fall 2026
Intel	Crescent Island GPU	480GB LPDDR5x, 350W	H2 2026
Intel	Xeon 6+ (Clearwater Forest)	288 cores, Intel 18A	H2 2026
Intel + Foxconn	AI infrastructure partnership	Chip→rack full chain	Strategic partnership
Huawei	Ascend 950PR/DT	1 PFLOPS FP8, self-developed HBM	In mass production
Cambricon	MLU690	2 PFLOPS FP8, 192GB HBM3E	Shipping

Sources: NVIDIA GTC Taipei 2026 / Computex 2026 official announcements, Intel press releases, ifeng Tech, IT Home.

Huawei Ascend 950 Mass Production and the Full Picture of China's AI Chip Ecosystem

June 4, 2026 · 4 min read

AI Compute Cards Wiki Editorial

Industry Research Team

June 2026 — Huawei's Ascend 950 series (950PR / 950DT) has entered formal mass production and delivery, a landmark event for China's AI chip industry in 2026. Meanwhile, Cambricon's MLU690 has begun shipping and Moore Threads has announced MTT S5000 specifications, formally establishing China's tri-polar AI chip landscape.

Ascend 950 Series: A Historic Breakthrough with Self-Developed HBM

Huawei HiSilicon's Ascend 950 series is the fourth-generation Ascend AI chip, first revealed at Huawei Connect 2025 in September and entering mass production in Q1 2026.

950PR (Prefill Inference Specialized)

Item	Specification
Architecture	Da Vinci v5 (SIMD + SIMT dual-model)
Process	N+2 (SMIC domestic)
HBM	HiBL 1.0 (Huawei self-developed), 128 GB
FP8 Compute	1 PFLOPS (HiF8 format)
TDP	~400 W
Target	Inference Prefill (video recommendation, real-time interaction)

950DT (Decode + Training Specialized)

Item	Specification
Architecture	Da Vinci v5 (SIMD + SIMT dual-model)
Process	N+2 (SMIC domestic)
HBM	HiZQ 2.0 (Huawei self-developed), 144 GB, 4 TB/s
FP8 Compute	1 PFLOPS (HiF8 format)
TDP	~500 W
Target	Inference Decode + Model Training

Historical Significance

Self-developed HBM (HiBL 1.0 / HiZQ 2.0) is the most important technical breakthrough of the Ascend 950. It is the first time a Chinese company has brought its own HBM memory into mass production, removing the dependence on HBM supply from SK Hynix and Samsung. Combined with the domestic N+2 process, the Ascend 950 is produced domestically across the full chain: HBM → compute die → packaging → system.

Cambricon MLU690: China's Only Native FP8 Support

Cambricon's seventh-generation AI chip MLU 690 (Siyuan 690) began volume production and shipping in H1 2026. This is the first domestic AI chip with native FP8 precision support.

Item	MLU 690
Process	5nm (TSMC / SMIC)
FP8 dense	2 PFLOPS
HBM	192GB HBM3E, 5 TB/s
TDP	~500 W
Unit Price (OAM)	~$8,000-12,000

On paper, the MLU 690's 2 PFLOPS of dense FP8 compute is comparable to NVIDIA Blackwell: the B200 figure cited here, 4.5 PFLOPS FP8, is a sparse rating, and sparse ratings are conventionally about twice the dense number. Drawing on its financing advantage as a STAR Market listed company, Cambricon targets 2026 revenue of ¥15-20B (2025: ¥7.2B).

Moore Threads MTT S5000: From Graphics to Training-Inference Unified

Moore Threads publicly disclosed detailed specifications of the MTT S5000 in February 2026. The card is built on the fourth-generation MUSA "Pinghu" architecture and offers 1,000 TFLOPS of single-card AI compute, 80 GB of GDDR6X memory, and 1.6 TB/s of memory bandwidth.

Moore Threads pursues a full-function GPU path (graphics rendering + AI compute + general-purpose compute), the closest of the three to NVIDIA's strategy. Its founding team came from NVIDIA China, and its MUSIFY toolchain helps migrate CUDA code to the MUSA platform automatically, lowering ecosystem migration costs.

China's Tri-Polar AI Chip Landscape

Dimension	Huawei Ascend	Cambricon	Moore Threads
Core Architecture	Da Vinci v5	MLUv07	MUSA 4th Gen
Process	N+2 domestic	5nm	6nm
FP8 Compute	~1 PFLOPS	2 PFLOPS	0.5 PFLOPS (estimated)
HBM Self-Sufficiency	✅ Self-developed HiBL/HiZQ	❌ Purchased	❌ Purchased
Ecosystem	CANN + MindSpore	NeuWare + MindSpore	MUSA + MUSIFY
Advantage	Full-chain domestic	Highest FP8 compute	Full-function + CUDA migration
2025 Revenue	(Huawei internal)	¥7.2B	¥2.2B

Global Market Comparison (Q2 2026 Update)

Tier	Vendor	Flagship Chip	FP8 Compute (PFLOPS)	HBM	Mass Production
Tier 1	NVIDIA	Rubin R200	25 PF (sparse)	288GB HBM4	2026 H2
Tier 2	AMD	MI400	20 PF (dense)	432GB HBM4	2026
	Huawei	Ascend 950DT	1 PF (dense)	144GB self-developed HBM	2026 Q1
	Cambricon	MLU690	2 PF (dense)	192GB HBM3E	2026 H1
	AWS	Trainium 3	5.7 PF (dense)	144GB HBM	2025 Q4 GA
Tier 3	Intel	Gaudi 3	1.8 PF	128GB HBM2e	In production
	Google	TPU v7	4.6 PF	192GB HBM	2025
	Moore Threads	MTT S5000	1 PF	80GB GDDR6X	2025 Q1

Note: NVIDIA quotes sparse compute by default, while AMD, Huawei, and Cambricon quote dense figures, so the numbers in this table are not directly comparable.
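
One way to put the table on a common footing is to convert sparse ratings to a dense equivalent. The sketch below assumes a 2× sparsity uplift (the conventional ratio for 2:4 structured sparsity); that factor is an assumption for illustration, not a vendor-confirmed figure for each chip, and rows whose mode the table does not state are left out.

```python
# Assumption: sparse headline figures are 2x the dense rating (2:4 structured sparsity).
SPARSITY_UPLIFT = 2.0

# (vendor/chip, FP8 PFLOPS as listed in the table, quoted mode)
CHIPS = [
    ("NVIDIA Rubin R200", 25.0, "sparse"),
    ("AMD MI400", 20.0, "dense"),
    ("Huawei Ascend 950DT", 1.0, "dense"),
    ("Cambricon MLU690", 2.0, "dense"),
    ("AWS Trainium 3", 5.7, "dense"),
]


def dense_equivalent(pflops: float, mode: str) -> float:
    """Return a dense-equivalent rating; dense figures pass through unchanged."""
    if mode == "sparse":
        return pflops / SPARSITY_UPLIFT
    if mode == "dense":
        return pflops
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    ranked = sorted(CHIPS, key=lambda c: dense_equivalent(c[1], c[2]), reverse=True)
    for name, pflops, mode in ranked:
        print(f"{name}: {dense_equivalent(pflops, mode):.1f} PFLOPS dense-equivalent")
```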

Outlook for H2 2026

NVIDIA Rubin R200: Official shipment in H2 2026, 288GB HBM4, 6-chip CoWoS-L packaging
Huawei Ascend 960: Roadmap H2 2027, expected FP8 compute doubled to 2 PFLOPS
Cambricon MLU790: Expected 2027, 3nm, 384GB HBM4, 2.5 PFLOPS
Moore Threads: Next-gen GPU expected with HBM3, 2× MTT S5000 compute

By 2026, China's AI chip industry has formed a complete product matrix from Training (Cambricon MLU690 / Ascend 950DT) → Inference (Ascend 950PR / Moore Threads S5000) → Systems (CloudMatrix / Distributed Clusters).

This article is based on public information from Huawei Connect 2025 (2025-09-18), industry analysis reports from April 2026, and the latest market data as of June 2026.

NVIDIA Launches RTX Spark: AI Compute Enters the Personal Computer Era

June 1, 2026 · 3 min read

AI Compute Cards Wiki Editorial

Industry Research Team

June 1, 2026, Taipei — During the Computex 2026 opening keynote, NVIDIA CEO Jensen Huang officially unveiled the RTX Spark super chip, marking NVIDIA's formal entry into the personal computer processor market dominated by Intel, AMD, Qualcomm, and Apple.

RTX Spark: The "Heart" of the Personal AI Computer

RTX Spark was developed in collaboration between NVIDIA and MediaTek, featuring a heterogeneous package with a 20-core Grace CPU + Blackwell RTX GPU, equipped with 6144 CUDA cores. AI compute reaches 1 PFLOPS (one quadrillion floating-point operations per second), meaning personal computers now possess computing power comparable to a datacenter-class H100 GPU for the first time.

Specification	RTX Spark
CPU	20-core Grace (MediaTek collaboration, Arm architecture)
GPU	Blackwell RTX (6144 CUDA cores)
AI Compute	1 PFLOPS
Target	Personal AI Agent, local LLM inference
Launch OEMs	ASUS, Dell, HP, Lenovo, Microsoft Surface, MSI
Availability	Fall 2026
Form Factor	Laptop SoC + compact desktop workstation

Jensen Huang's "Full-Stack AI" Strategy

The launch of RTX Spark is a key step in NVIDIA's "full-stack AI" strategy. Jensen Huang stated during the keynote: "AI should not only run in the cloud. Everyone's computer should have the ability to run AI agents."

RTX Spark transforms NVIDIA from a datacenter GPU monopolist into a full competitor in the personal computing market. Following the announcement, shares of AMD, Intel, and Qualcomm fell accordingly.

Market Impact

Intel: Personal computer AI processor business faces direct threat
AMD: Ryzen AI series must compete at the same level
Qualcomm: Snapdragon X Elite's Copilot+ PC positioning challenged
Apple: M-series chips are no longer the only high-performance AI PC option

Vera Rubin Platform Enters Full Mass Production

During the same keynote, Jensen Huang also announced that the NVIDIA Vera Rubin platform has entered full mass production. Rubin R200 features a 6-chip CoWoS-L package (1× Vera CPU + 2× Rubin GPU die + I/O/HBM die), equipped with 288GB HBM4, 22 TB/s bandwidth, and 50 PFLOPS FP4 compute (sparse).

The Rubin NVL72 rack (72 Rubin GPUs + 36 Vera CPUs) will begin shipping in H2 2026.
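
Rack-level totals can be estimated from the per-GPU figures above. The sketch assumes each of the 72 GPUs in the rack carries the quoted R200 numbers (288 GB HBM4, 50 PFLOPS sparse FP4); the announcement as summarized here does not state rack totals, so these are derived values.

```python
GPUS_PER_RACK = 72
HBM_PER_GPU_GB = 288
FP4_SPARSE_PFLOPS_PER_GPU = 50

# Assumption: every GPU in the rack matches the R200 figures quoted above.
rack_hbm_tb = GPUS_PER_RACK * HBM_PER_GPU_GB / 1000
rack_fp4_eflops = GPUS_PER_RACK * FP4_SPARSE_PFLOPS_PER_GPU / 1000

if __name__ == "__main__":
    print(f"Rack HBM4 capacity:      {rack_hbm_tb:.1f} TB")
    print(f"Rack FP4 compute (sparse): {rack_fp4_eflops:.1f} EFLOPS")
```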

Other Highlights from Computex 2026

AMD: Showcased the MI350 series (192GB HBM3e, 5 PFLOPS FP8 dense), officially launching in June
Intel: Jaguar Shores publicly unveiled for the first time
Qualcomm: AI 200 / 300 series inference card roadmap updated
Domestic AI Chip Zone: Huawei, Cambricon, Moore Threads, and others showcased their latest products

Industry Significance

The launch of RTX Spark means AI compute is no longer confined to datacenters. Individual developers, designers, and researchers will be able to run large model tasks locally that previously required cloud GPUs, potentially redefining the market landscape for personal AI computing.

The mass production of Vera Rubin further consolidates NVIDIA's absolute leadership in datacenter AI training. Together, both product lines form NVIDIA's full-stack AI computing landscape of "cloud training + personal inference."

This report is based on official NVIDIA announcements from Computex 2026 / GTC Taipei on June 1, 2026.
