Increasing Momentum-Like Factors: A Method for Reducing Training Errors on Multiple GPUs

Source: Journal of Tsinghua University, Natural Science Edition (English Edition)
In distributed training, increasing the batch size can improve parallelism, but it can also make training more difficult and cause training errors. In this work, we investigate the occurrence of training errors theoretically and train ResNet-50 on CIFAR-10 using Stochastic Gradient Descent (SGD) and Adaptive moment estimation (Adam), keeping the total batch size in the parameter server constant while lowering the batch size on each Graphics Processing Unit (GPU). We propose a new method that considers momentum to eliminate training errors in distributed training. We define a Momentum-like Factor (MF) to represent the influence of former gradients on parameter updates in each iteration. We then modify the MF values and conduct experiments to explore how different MF values influence training performance with SGD, Adam, and Nesterov accelerated gradient. Experimental results reveal that increasing MFs is a reliable method for reducing training errors in distributed training. This paper also presents an analysis of convergence conditions in distributed training with a large batch size and multiple GPUs.
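To make the idea of "the influence of former gradients on parameter updates" concrete, here is a minimal sketch using classical (heavy-ball) SGD with momentum. It is an illustration only: the abstract does not give the paper's exact definition of the MF, so the coefficient `mu` below merely stands in for a momentum-like factor, and the function names `sgd_momentum_step`, `past_gradient_weights`, and `average_worker_gradients` are hypothetical helpers introduced for this example.

```python
def sgd_momentum_step(w, v, grad, lr=0.1, mu=0.9):
    """One heavy-ball SGD update on a scalar parameter.

    mu is an assumed stand-in for a momentum-like factor; the paper's
    own MF definition is not reproduced here.
    """
    v = mu * v + grad   # velocity accumulates former gradients
    w = w - lr * v      # parameter moves along the accumulated velocity
    return w, v


def past_gradient_weights(mu, steps):
    """Weight that the gradient from k iterations ago carries in the
    current velocity: mu**k for k = 0 .. steps-1."""
    return [mu ** k for k in range(steps)]


def average_worker_gradients(worker_grads):
    """Parameter-server style aggregation: mean of per-worker gradients
    (assumes every worker used the same local batch size)."""
    return sum(worker_grads) / len(worker_grads)


if __name__ == "__main__":
    # Toy objective f(w) = 0.5 * w**2, whose gradient is w.
    w, v = 1.0, 0.0
    for step in range(3):
        grads = [w for _ in range(4)]   # four simulated workers
        g = average_worker_gradients(grads)
        w, v = sgd_momentum_step(w, v, g, lr=0.1, mu=0.9)
        print(step, w, v)
    print(past_gradient_weights(0.9, 4))
```

A larger `mu` makes the weights `mu**k` decay more slowly, so older gradients keep more influence on each update; this is the sense in which a momentum-like factor can be "increased" in this sketch.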