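1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450001, China
2. State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, Zhengzhou, Henan 450001, China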
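Bin LI (1986-), male, born in Zhengzhou, Henan. Ph.D., lecturer at Zhengzhou University. His main research interests are information security and reconfigurable computing.
Xiaojie CHEN (1993-), male, born in Wuzhi, Henan. Ph.D. candidate at Information Engineering University. His main research interests are information security and reconfigurable computing.
Feng FENG (1990-), male, born in Xinxiang, Henan. Ph.D. candidate at Zhengzhou University. His main research interest is information security.
Qinglei ZHOU (1962-), male, born in Xinxiang, Henan. Ph.D., professor at Zhengzhou University. His main research interests are information security, automata theory, and computational complexity theory.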
Online publication date: 2022-02
Print publication date: 2022-02-25
Bin LI, Xiaojie CHEN, Feng FENG, et al. FPGA multi-unit parallel optimization and implementation of post-quantum cryptography CRYSTALS-Kyber[J]. Journal on Communications, 2022, 43(2): 196-207. DOI: 10.11959/j.issn.1000-436x.2022026.
In lattice-based post-quantum cryptography, polynomial multiplication is complex and time-consuming. To improve the computational efficiency of lattice-based cryptography in practical applications, an FPGA multi-unit parallel optimized implementation of the post-quantum scheme CRYSTALS-Kyber is proposed. First, the flow of the Kyber algorithm is described, and the execution of the number theoretic transform (NTT), the inverse NTT (INTT), and coefficient-wise multiplication (CWM) is analyzed. Second, the overall FPGA architecture is presented: the butterfly unit is designed as a pipeline, and Barrett modular reduction together with optimized CWM scheduling improves computational efficiency. In addition, 32 butterfly units execute in parallel, which shortens the overall computation cycle. Finally, storage across multiple RAM channels is optimized, using alternating data access control and RAM resource reuse to improve memory access efficiency. A loosely coupled architecture is adopted, with DMA communication scheduling the overall computation. Experimental results and analysis show that the proposed scheme completes the NTT, INTT, and CWM operations within 44, 49, and 163 clock cycles, respectively, outperforming other schemes and achieving a high energy efficiency ratio.
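The abstract names two building blocks, Barrett modular reduction and the butterfly unit, without showing them. The following is a minimal software sketch of both for the Kyber modulus q = 3329, not the paper's hardware design: the function names, the constant width `K = 12`, and the choice of a Cooley-Tukey style butterfly are assumptions made for illustration only.

```python
Q = 3329                  # Kyber modulus
K = 12                    # bit length of Q (assumed word width for this sketch)
M = (1 << (2 * K)) // Q   # precomputed Barrett constant floor(2^24 / Q)


def barrett_reduce(x: int) -> int:
    """Return x mod Q for 0 <= x < 2**24 using a multiply, a shift and one correction."""
    t = (x * M) >> (2 * K)  # estimate of floor(x / Q); low by at most 1 in this range
    r = x - t * Q           # hence 0 <= r < 2*Q
    if r >= Q:
        r -= Q
    return r


def ct_butterfly(a: int, b: int, w: int) -> tuple:
    """Cooley-Tukey style butterfly: (a + w*b, a - w*b) mod Q, all inputs in [0, Q)."""
    t = barrett_reduce(w * b)  # w*b < Q*Q < 2**24, so the reduction is valid
    hi = a + t
    if hi >= Q:
        hi -= Q
    lo = a - t
    if lo < 0:
        lo += Q
    return hi, lo
```

For instance, `ct_butterfly(1, 2, 17)` returns `(35, 3296)`, since 1 + 17·2 = 35 and 1 − 34 ≡ 3296 (mod 3329). In hardware the division is replaced by the fixed multiply and shift, which is what makes this reduction suitable for a pipelined datapath.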