In the field of supercomputing, one key issue for scalable shared-memory multiprocessors is the design of the directory, which records the sharing state of each cache block. A good directory design aims to balance three key attributes: reasonable memory overhead, precise identification of sharers, and low implementation complexity. However, improving one attribute often comes at the expense of another. This paper proposes an elastic pointer directory (EPD) structure based on an analysis of shared-memory applications, exploiting the fact that the number of sharers for each directory entry is typically small. Analysis shows that for 4096 nodes, the memory overhead of the EPD is 2.7% of that of the full-map directory. Theoretical analysis and cycle-accurate, execution-driven simulations of 16- and 64-node cache-coherent non-uniform memory access (CC-NUMA) multiprocessors show that the corresponding pointer overflow probability is reduced significantly. The observed performance is better than that of a limited-pointers directory and almost identical to that of the full-map directory, at the cost of slightly higher implementation complexity. Using a directory cache to exploit directory access locality is also studied. The experimental results show that this is a promising approach for state-of-the-art high-performance computing.
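To make the memory-overhead trade-off concrete, the following is a minimal sketch comparing the per-entry storage of a full-map directory with that of a conventional limited-pointers directory. It does not model the EPD itself, whose entry layout is not given in the abstract, and so it does not reproduce the 2.7% figure. The function names, the choice of four pointers, and the decision to ignore state bits are assumptions made purely for illustration.

```python
def full_map_bits(num_nodes: int) -> int:
    """Full-map entry: one presence bit per node (state bits ignored)."""
    return num_nodes


def limited_pointer_bits(num_nodes: int, num_pointers: int) -> int:
    """Limited-pointers entry: each pointer stores one node ID.

    A node ID needs ceil(log2(num_nodes)) bits, computed here with
    integer arithmetic to avoid floating-point rounding.
    """
    pointer_width = max(1, (num_nodes - 1).bit_length())
    return num_pointers * pointer_width


def overhead_ratio(num_nodes: int, num_pointers: int) -> float:
    """Limited-pointers entry size as a fraction of the full-map entry size."""
    return limited_pointer_bits(num_nodes, num_pointers) / full_map_bits(num_nodes)


if __name__ == "__main__":
    # Hypothetical configuration: 4 pointers per entry (an assumption).
    for nodes in (16, 64, 4096):
        ratio = overhead_ratio(nodes, num_pointers=4)
        print(f"{nodes:>5} nodes: {ratio:.2%} of full-map entry size")
```

The sketch illustrates why pointer-based schemes scale better in storage: a full-map entry grows linearly with the node count, whereas a pointer grows only logarithmically. The price is that an entry with a fixed number of pointers overflows when a block has more sharers than pointers, which is the overflow probability the abstract refers to.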