Efficient Handling of Lock Hand-off in DSM Multiprocessors with Buffering Coherence Controllers

Source: Journal of Computer Science & Technology | Cited by: 0 | Uploaded by: xingyongxiao


【Authors】: Benjamín Sahelices, Agustín de Dios, Pablo Ibáñez, Víctor Viñals-Yúfera, José María Llabería

【Affiliation】: Computer Science Department and HiPEAC European Network of Excellence, University of Valladolid, Val

【Source】: Journal of Computer Science & Technology

【Publication date】: 2012, Issue 01

【Keywords】: distributed shared memory multiprocessors; synchronization; buffer coherence contr


Excerpt from the paper

Synchronization in parallel programs is a major performance bottleneck in multiprocessor systems. Shared data is protected by locks, and a lot of time is spent on the competition that arises at lock hand-off. To be serialized, requests to the same cache line can either be bounced (NACKed) or buffered in the coherence controller. In this paper, we focus mainly on systems whose coherence controllers buffer requests. In a lock hand-off, a burst of requests to the same line arrives at the coherence controller. During lock hand-off, only the requests from the winning processor contribute to the progress of the computation, since the winning processor is the only one that will advance the work. This key observation leads us to propose a hardware mechanism we call request bypassing, which allows requests from the winning processor to bypass the requests buffered in the coherence controller that holds the lock line. We present an inexpensive implementation of request bypassing that reduces the time spent in all the execution phases of a critical section (acquiring the lock, accessing shared data, and releasing the lock) and that, as a consequence, speeds up the whole parallel computation. This mechanism requires neither compiler nor programmer support, nor changes to the ISA or the coherence protocol. By simulating a 32-processor system, we show that request bypassing does not degrade, but rather improves, performance in three applications with low synchronization rates, while in those with a large amount of synchronization activity (the remaining four), we see reductions in execution time and in lock stall time ranging from 14% to 39% and from 52% to 71%, respectively. We compare request bypassing with a previously proposed technique called read combining and with a system that bounces requests, observing a significantly lower execution time with the bypassing scheme. Finally, we analyze the sensitivity of our results to some key hardware and software parameters.
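To make the idea of request bypassing concrete, here is a minimal Python sketch of the requests buffered for a single lock line in a coherence controller, comparing plain FIFO servicing with bypassing. It is an illustrative model under stated assumptions, not the authors' hardware design. The names `LineController`, `Request`, `service_order` and `winner_wait` are hypothetical, as is the `winner` field, which assumes the controller already knows which processor won the lock (the abstract does not say how the hardware identifies it). The scenario is also hypothetical: seven losing processors have requests buffered ahead of one request from the winner.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    proc: int  # id of the requesting processor
    kind: str  # "read" for a loser's request, "write" for the winner's request


class LineController:
    """Hypothetical model of the requests buffered for one cache line
    (the lock line) in a buffering coherence controller."""

    def __init__(self, bypass: bool) -> None:
        self.bypass = bypass
        self.buffer: deque[Request] = deque()  # pending requests, arrival order
        # Assumption: the controller already knows which processor won the
        # lock; how real hardware identifies it is not modeled here.
        self.winner: int | None = None

    def enqueue(self, req: Request) -> None:
        self.buffer.append(req)

    def next_request(self) -> Request:
        """Pick the next buffered request to service."""
        if self.bypass and self.winner is not None:
            for i, req in enumerate(self.buffer):
                if req.proc == self.winner:
                    del self.buffer[i]  # winner jumps ahead of older requests
                    return req  # return at once, so iteration never resumes
        return self.buffer.popleft()  # plain FIFO servicing


def service_order(bypass: bool, losers: int = 7) -> list[Request]:
    """Hypothetical hand-off scenario: processor 0 won the lock, processors
    1..losers already have requests buffered, and processor 0's request
    arrives last."""
    ctrl = LineController(bypass)
    ctrl.winner = 0
    for p in range(1, losers + 1):
        ctrl.enqueue(Request(p, "read"))
    ctrl.enqueue(Request(0, "write"))
    order = []
    while ctrl.buffer:
        order.append(ctrl.next_request())
    return order


def winner_wait(bypass: bool) -> int:
    """Number of requests serviced before the winner's request."""
    order = service_order(bypass)
    return next(i for i, r in enumerate(order) if r.proc == 0)


if __name__ == "__main__":
    print("FIFO:", winner_wait(bypass=False))
    print("bypassing:", winner_wait(bypass=True))
```

Run as a script, it prints `FIFO: 7` and `bypassing: 0`. Under FIFO servicing, the winner's request waits behind all seven buffered requests. With bypassing, it is serviced first, and the losers' requests keep their relative order. The sketch does not model timing, fairness, or starvation of the bypassed requests.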
