IJE TRANSACTIONS A: Basics Vol. 31, No. 10 (October 2018) 516-525   

downloaded Downloaded: 0   viewed Viewed: 807

Jitendra V. Tembhurne and S. Sathe
( Received: January 15, 2017 – Accepted: March 08, 2018 )

Abstract    This paper aims to present and analyze a GPU based fast and effective parallel algorithm to compute Multi-Precision Integer (MPI) multiplication using Fast Fourier Transform (FFT). Multiplying long integer is a fundamental operation in public-key cryptography such as RSA, ECC, Diffe-Hellman key exchange and other advanced cryptosystem etc. The speed of these algorithms directly depends on the computation of multiplying long integers. In this work, a newly proposed algorithm adapts the model of parallel processing and design of algorithm aims to accelerate the computation of MPI multiplication for public-key algorithm on the GPU. We explore the modern parallel hardware for computing MPI multiplication using FFT, a best asymptotic time complexity of O(nlogn) algorithm. We claim that, the best use of GPU for the efficient computation to accelerate the MPI multiplication using FFT for public-key cryptosystem is demonstrated in this work.


Keywords    FFT, RSA, GPU, CUDA, multi-precision multiplication


References    References[1] Rivest, R., Shamir, A., Adleman, L.,  “A Method for Obtaining Digital Signatures and Public-Key cryptosystems”, Communications of the ACM, Vol. 21(2), (1978), 120-126.[2] Koblitz, N., Elliptic curve cryptosystems, Journal of Mathematics of Computation, Vol. 48(177), (1987), 203- 209.[3] Diffie, W., Hellman, M., New directions in cryptography, IEEE Transactions on Information Theory, Vol.  22(6), (1976), 644-654.[4] Karatsuba, A., Ofman, Yu., Multiplication of  Many-Digital Numbers by Automatic Computers,  Proceedings of the USSR Academy of Sciences, Vol. 145, 293-294, (1962). [5] Toom, L., The complexity of a scheme of functional elements realizing the multiplication of integers, DAN USSR, Vol. 150(3), (1963), 496-498 (in Russian). Eng. Transl. in Soviet Math. Dokl., Vol. 3, (1963), 714-716.[6] Cook, S.A., On minimum computation time of function, Ph.D. dissertation, Department of Mathematics, Harvard University, 1966.[7] Schönhage, A., Strassen, V., Schnelle Multiplikation grosser Zahlen, Computing, Vol. 7, (1971), 281-292. [8] Emeliyanenko, P., Efficient Multiplication of Polynomials on Graphics Hardware, APPT 2009, LNCS 5737, 134-149, (2009).[9] Liu, H.J., Tong, C., GMP Implementation on CUDA-A Backward Compatibale Design with Performance Tuning, Technical report, University of Toronto, 2011. [10]  Chen, C., Covanov,  S., Mansouri, F., Maza, M.M.,  Xie, N., Xie, Y., The Basic Polynomial Algebra Subprograms, ICMS 2014, 669-676, (2014). [11]  NVIDIA Corporation, CUDA C Programming guide, October 2012. [12]  Cooley, J. W., Tukey, J. W., An algorithm for the machine calculation of complex Fourier series, Mathematics of Computation, Vol. 19(90), (1965), 297-301. [13]  Loan, C. V., Computational Frameworks for the Fast Fourier Transform, SIAM Publications, 3600 University City Science Center, Philadelphia, PA 19104-2688, (1992). [14]  Chu, E., George, A., Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms, CRC Press, (1999). [15]  Khezri, R., Hosseini, R., Mazinani, M.,  A Fuzzy Rule-based Expert System for the Prognosis of the Risk of Development of the Breast Cancer, International Journal of Engineering-Transactions A: Basics, Vol. 27(10), (2014),1557-1564. [16]  Hassanpour, H., Rostami Ghadi, A., Image Enhancement via Reducing Impairment Effects on Image Components, International Journal of Engineering-Transactions B: Applications, Vol. 26(11), 2013, 1267-1274. [17]  Bazoobandi, H. A., Eftekhari, M., A Differential Evolution and Spatial Distribution based Local Search for Training Fuzzy Wavelet Neural Network, International Journal of Engineering-Transactions B:Applications, Vol. 27(8), 2014, 1185-1194. [18]  Takahashi, D., An algorithm for multiple-precision floating-point multiplication, Applied Mathematics and Computation, vol. 166, (2005), 291-298. [19]  Emerencia, A., Multiplying Huge Integers Using Fourier Transforms, http://www.cs.rug.nl/~ando/pdfs/Ando_Emerencia_multiplying_ huge_integers_using_fourier_transforms_paper.pdf, 2007.  [20] Liu, C.-B., Huang, C.H., Lei, C.-L., Design and Implementation of Long-Digit Karatsuba’s Multiplication Algorithm Using Tensor Product Formulation, The Ninth Workshop on Compiler Techniques for HighPerformance Computing, (2003). [21]  Furer, M.,  Faster Integer Multiplication, SIAM Journal of Computing, Vol. 39(3), (2009), 979-1005. [22]  Brent, R. P., Gaudry, P., Thom´e, E., Zimmermann, P., Faster Multiplication in GF(2)[x], ANTS-VIII 2008, LNCS 5011, 153-166, (2008). [23]  Fuguo, D., Lingling, S., Yuxin, T., YanTai, S., Multiplication Algorithm of Large Integers using Finite Discrete Convolution, Journal of Convergence Information Technology, Vol. 6(7), 2011. [24]  Jamieson,  L. H., Mueller, Jr. P. T., Siegel, H. J., FFT Algorithms for SIMD Parallel Processing Systems, Journal of Parallel and Distributed Computing, Vol. 3, 1986, 48-71. [25]  Giorgi, P., Imbert, L., Izard, T., Parallel modular multiplication on multi-core processors, 21st IEEE Symposium on Computer Arithmetic (ARITH), (2013), 135- 142, (2013). [26]  Baktir S., Sava, E., Highly-Parallel Montgomery Multiplication for Multi-core General-Purpose Microprocessors, Computer and Information Sciences III (2013), 467-476. [27]  Maza, M. M., Xie, Y., Balanced Dense Polynomial Multiplication on Multi-cores, International Conference on Parallel and Distributed Computing, Applications and Technologies, (2009), 1-9, 2009. [28]  Takahashi, D., Parallel Implementation of Multiple-Precision Arithmetic and 1,649,267,440,000 Decimal Digits of π Calculation, Parallel Computing, Vol. 36(8), 2010, 439-448. [29]  Maza, M. M., Pan, W., Fast polynomial multiplication on a GPU, Journal of Physics: Conference Series, Vol. 256(1) 2010, 012009, High Performance Computing Symposium (HPCS2010). [30]  Chmielowiec, A., Fast, Parallel Algorithm for Multiplying Polynomials with Integer Coefficients, Proceedings of the World Congress on Engineering,  2012 Vol II, WCE-2012, (2012). [31]  Giorgi, P.,  Izard, T., Tisserand, A., Comparison of Modular Arithmetic Algorithms of GPUs, Proceedings of ParCo'09: International Conference on Parallel Computing, 315-322, (2009). [32]  Zhao, K., GPUMP: A Multiple-Precision Integer Library for GPUs, IEEE 10th International Conference on Computer and Information Technology (CIT), 2010, 1164-1168, (2010). [33] Emmart, N., Weems, C. C., High Precision Integer Multiplication with a GPU using Strassen’s Algorithm with Multiple FFT sizes, Parallel Processing Letters, Vol. 21(3),  2011, 359-375. [34]  Sze, T. W., Schönhage-Strassen Algorithm with MapReduce for Multiplying Terabit Integers, Proceedings of the 2011 International Workshop on Symbolic-Numeric Computation,  54-62, (2011). [35]  Marino, F., Swartzlander, E. E., Parallel Implementation of Multidimensional Transforms without Interprocessor Communication, IEEE Transactions on Computers, Vol. 48(9), 1999, 951-961. [36] Na’mneh, R. A., Pan, D. W., Two-Step 1-D Fast Fourier Transform without Inter-Processor Communications, Proceedings of the 38th South eastern Symposium on System Theory Tennessee Technological University, 2006, 529-533, (2006).

International Journal of Engineering
E-mail: office@ije.ir
Web Site: http://www.ije.ir