Tencent’s tech team has optimized DeepSeek’s open-source DeepEP communication framework, boosting its performance across different network environments, according to the Chinese AI startup. Testing showed a 100% improvement on RoCE networks and a 30% gain on InfiniBand (IB), offering more efficient solutions for AI model training. On GitHub, DeepSeek acknowledged that the Chinese tech giant’s contribution had delivered a “huge speedup.”

DeepEP is a communication library tailored for mixture-of-experts (MoE) models and expert parallelism (EP), supporting high-throughput, low-latency GPU kernels and low-precision computing, including FP8. Tencent’s Starlink Networking team identified two main bottlenecks: underutilized dual-port NIC bandwidth and CPU control latency. After targeted optimizations, performance doubled on RoCE and improved by 30% on IB.

The enhanced framework is now fully open source and has been deployed in training Tencent’s Hunyuan large model, demonstrating strong versatility in environments built on Tencent’s Starlink network and H20 servers, Chinese tech media outlet iThome reported. [iThome, in Chinese]
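For context on the communication pattern involved, the sketch below illustrates the expert-parallel “dispatch/combine” flow that a library like DeepEP accelerates, written with plain PyTorch collectives rather than DeepEP’s own GPU kernels. It is a minimal conceptual example, not DeepEP’s API: the function name, tensor shapes, and the assumption of equal per-rank token splits are illustrative only.

```python
# Conceptual sketch of the MoE expert-parallel "dispatch / combine" pattern.
# This uses generic torch.distributed all-to-all collectives, NOT DeepEP's
# optimized kernels; names and shapes are illustrative assumptions.
import torch
import torch.distributed as dist


def moe_dispatch_combine(tokens: torch.Tensor, expert_fn) -> torch.Tensor:
    """tokens: [num_local_tokens, hidden], pre-routed so that an equal slice
    is destined for each expert-parallel rank (equal splits assumed)."""
    world_size = dist.get_world_size()
    assert tokens.shape[0] % world_size == 0, "sketch assumes equal splits"

    # Dispatch: each rank sends its routed token slices to the ranks that
    # host the corresponding experts (all-to-all over RoCE or InfiniBand).
    received = torch.empty_like(tokens)
    dist.all_to_all_single(received, tokens)

    # Each rank runs its local expert(s) on the tokens it received.
    expert_out = expert_fn(received)

    # Combine: return expert outputs to the ranks that own the original
    # tokens, reversing the dispatch.
    combined = torch.empty_like(expert_out)
    dist.all_to_all_single(combined, expert_out)
    return combined
```

In production frameworks these two all-to-all exchanges dominate MoE training traffic, which is why optimizations such as fully utilizing dual-port NIC bandwidth and cutting CPU control latency, the two bottlenecks Tencent’s team targeted, translate directly into end-to-end speedups.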