①(Department of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China) ②(College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China) ③(Department of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China)
Most threaded data prefetching methods for pointer applications control the prefetch distance poorly. To address this deficiency, a prefetch distance control strategy based on cache behavior characteristics is proposed. The prefetch distance control model is constructed from the runtime data cache characteristics of pointer applications in order to reduce cache pollution and contention for system resources. By skipping data accesses that carry no loop-carried dependence, the workload is balanced between the main thread and the helper thread, and the timeliness of threaded prefetching is improved. The experimental results show that the proposed approach improves the performance of the threaded prefetching mechanism.
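To make the term prefetch distance concrete for pointer-chasing code, the following is a minimal single-threaded sketch. It is not the control model proposed here. The node layout, the function name `sum_with_prefetch`, and the fixed `distance` parameter are all assumptions introduced for illustration, and `__builtin_prefetch` is the GCC/Clang builtin.

```c
#include <stddef.h>

/* Hypothetical list node; the layout is an assumption, not taken from the paper. */
struct node {
    long value;
    struct node *next;
};

/* Sum the list while a run-ahead cursor stays up to `distance` nodes
 * in front of the working cursor and issues a prefetch hint per node.
 * `distance` is fixed here; an adaptive scheme would tune it at runtime. */
long sum_with_prefetch(const struct node *head, size_t distance)
{
    const struct node *ahead = head;

    /* Move the run-ahead cursor forward by at most `distance` nodes. */
    for (size_t i = 0; i < distance && ahead != NULL; i++)
        ahead = ahead->next;

    long sum = 0;
    for (const struct node *cur = head; cur != NULL; cur = cur->next) {
        if (ahead != NULL) {
            /* Read prefetch (rw = 0) with low temporal locality (1). */
            __builtin_prefetch(ahead, 0, 1);
            ahead = ahead->next;
        }
        sum += cur->value;
    }
    return sum;
}
```

In this sketch the run-ahead cursor shares a thread with the computation, so its own `ahead->next` loads can stall the loop. A threaded scheme would move that cursor into a helper thread. A distance that is too small hides little latency, while one that is too large risks evicting lines before they are used, which is the cache pollution trade-off that prefetch distance control targets.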