High Performance Encryption FPGA Requirements

I studied the fifteen candidates from the Advanced Encryption Standard (AES) competition held by the National Institute of Standards and Technology (NIST). The competition winner, Rijndael, is the new NIST encryption standard and will eventually replace the Data Encryption Standard (DES). This competition provided a unique opportunity for a case study.

AES Competition Timeline

January 1997: NIST issued a public call for encryption algorithms that would be faster AND more secure than DES.
August 1998: From many submissions, fifteen algorithms were accepted to compete in an eight-month review period.
August 1999: Of those, five finalists were reviewed further during a nine-month second round.
October 2000: Rijndael was selected as the winner.


[Image: one of the gifts given to the winners of the AES competition (Yikes!)]

Since the new encryption standard would secure countless financial, governmental, and other sensitive data transfers, the demand for a fast hardware implementation is easy to see. In addition, because all of the candidates were known more than two years before the winner was announced, it would have been advantageous to begin the design process as early as possible.

An application-specific FPGA would be the ideal platform for this type of operation. To have a marketable ASIC ready as soon as the winner was announced, the design process would likely have had to start very early, before many of the final details of the algorithms were known, let alone before the field was narrowed to finalists. That uncertainty would add considerably to the already high cost of producing custom ICs. On the other hand, while conventional commercially available FPGAs have proven themselves as flexible, easy-to-program prototyping platforms, they leave a lot to be desired in terms of clock rate and power consumption.

Our idea was that by using two techniques we could approach ASIC performance while keeping the flexibility to accommodate minor algorithm revisions, future algorithmic improvements, or even completely different types of encryption. First, we would replace the normal LUT-based FPGA logic with coarse-grain functional units. Modern encryption algorithms commonly use a fairly small set of complex operations, but building these functions from the tiny 4-input lookup tables (LUTs) provided by commodity reconfigurable parts is incredibly wasteful. Instead, we would consolidate these operations into a set of less flexible but far more efficient coarse-grain functional units.

Second, we would replace the normal island-style FPGA interconnect with a structure better suited to the purely linear flow of encryption algorithms. The interconnect of conventional FPGAs was born out of the need to support a wide range of circuits. While this high connectivity is useful for random logic, encryption algorithms have a very simple and primarily unidirectional datapath: plaintext enters from one side, is computed upon in word-sized chunks over and over, and then exits the other side as ciphertext. Our solution was to replace this overly flexible interconnect with a much simpler architecture similar to that found in RaPiD (Reconfigurable Pipelined Datapath).
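To make these two ideas a little more concrete, here is a minimal behavioral sketch in Python. It rests on our own assumptions, not on the actual architecture: 32-bit words, a small hypothetical menu of coarse-grain units (addition and multiplication modulo 2^32, XOR, rotate), and a one-directional chain of stages in the spirit of RaPiD. The names `add32`, `rotl32`, `Stage`, `run_pipeline`, and `ROUND` are illustrative only, and the example round is merely patterned on RC6's x(2x+1) <<< 5 step with made-up key constants, not a real cipher.

```python
from dataclasses import dataclass
from typing import Callable, List

MASK32 = 0xFFFFFFFF  # every hypothetical unit works on 32-bit words


# A small, hypothetical menu of coarse-grain functional units.
def add32(a: int, b: int) -> int:
    """Addition modulo 2**32."""
    return (a + b) & MASK32


def xor32(a: int, b: int) -> int:
    """Bitwise XOR of two 32-bit words."""
    return (a ^ b) & MASK32


def mul32(a: int, b: int) -> int:
    """Multiplication modulo 2**32 (keeps the low 32 bits of the product)."""
    return (a * b) & MASK32


def rotl32(a: int, n: int) -> int:
    """Rotate a 32-bit word left by n; only the low 5 bits of n are used."""
    n &= 31
    return ((a << n) | (a >> (32 - n))) & MASK32


@dataclass
class Stage:
    """One slot in a linear, RaPiD-style datapath (illustrative only)."""
    name: str
    op: Callable[[int], int]


def run_pipeline(stages: List[Stage], words: List[int]) -> List[int]:
    """Stream each word through every stage in order; data never flows backward."""
    out = []
    for w in words:
        for stage in stages:
            w = stage.op(w)
        out.append(w)
    return out


# An illustrative round loosely patterned on RC6's x*(2x+1) <<< 5 step.
# The two key constants are made-up placeholders, not real round keys.
ROUND = [
    Stage("x*(2x+1)", lambda x: mul32(x, add32(add32(x, x), 1))),
    Stage("rotate by 5", lambda x: rotl32(x, 5)),
    Stage("xor key", lambda x: xor32(x, 0x5A5A5A5A)),
    Stage("add key", lambda x: add32(x, 0x01234567)),
]

if __name__ == "__main__":
    # Repeating the round models words being "computed upon over and over".
    for word in run_pipeline(ROUND * 4, [0x00000000, 0xDEADBEEF]):
        print(f"{word:08x}")
```

As a quick check, a single round maps an all-zero word to 0x5B7D9FC1: the multiply and rotate stages leave zero unchanged, so only the two key stages act on it. Each stage is one native unit, which is the point of the coarse-grain approach; none of these operations is assembled from individual LUTs.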

Toward both of these goals, we examined all fifteen encryption algorithms and produced a set of baseline hardware requirements. Based on these, we attempted to build an application-specific FPGA that would allow us to implement all fifteen initial AES candidates. Unfortunately, this research reached a dead end. Although we exposed some interesting aspects of application-specialized architectures, we basically failed to show that our system was more efficient than commodity FPGAs. Why? First, Xilinx and Altera both have legions of circuit designers working to perfect their devices. This project had one lonely engineer: me.

Beyond that, we found that while our coarse-grained functional units worked very well for the majority of operations, we lost a lot of performance emulating the small number of non-native functions that these ciphers used. This argues for reserving at least a portion of the device for LUT-based logic. Put another way, a mix of LUT fabric and specialized blocks looks best, and that is the direction commodity FPGAs have taken by adding larger monolithic blocks, such as embedded multipliers and block memory, over their last few device generations. It makes sense for this trend to continue, and FPGA designers should look toward incorporating other types of specialized functional units.

Another failure point was the interconnect structure we used. Although encryption algorithms have a very unidirectional data flow, our specialized functional units were spread throughout the device, requiring quite a bit of back-and-forth communication. This leads to the same conclusion: specialized functional units should be embedded within a more flexible interconnect structure. Essentially, we believe future FPGA devices will evolve toward relatively more complex logic and an interconnect that is less flexible than today's island-style routing, though not as restrictive as the purely linear structure we tried.
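To illustrate why even a handful of non-native functions can be so costly, consider an arbitrary fixed bit permutation, such as the 32-bit P permutation in DES's round function. The choice of function is our assumption for illustration; the text does not say which operations caused the trouble. In a LUT-based fabric a fixed permutation is essentially free wiring, but a datapath built only from word-wide shift, AND, and OR units has to move one bit at a time. The hypothetical `permute_bits` below models that naive emulation and counts the word operations it issues.

```python
from typing import List, Tuple

MASK32 = 0xFFFFFFFF


def permute_bits(x: int, perm: List[int]) -> Tuple[int, int]:
    """Emulate a fixed 32-bit bit permutation with word-wide units only.

    perm[i] is the source bit position that lands in output bit i.
    Returns (result, word_ops), where word_ops counts every shift, AND,
    and OR this naive emulation issues (shifts by zero included).
    """
    if sorted(perm) != list(range(32)):
        raise ValueError("perm must be a permutation of 0..31")
    result, word_ops = 0, 0
    for dst, src in enumerate(perm):
        bit = (x >> src) & 1    # one shift unit + one AND unit
        result |= bit << dst    # one shift unit + one OR unit
        word_ops += 4
    return result & MASK32, word_ops


if __name__ == "__main__":
    bit_reverse = [31 - i for i in range(32)]
    value, ops = permute_bits(0x00000001, bit_reverse)
    print(f"{value:08x} after {ops} word operations")  # 80000000 after 128 word operations
```

In this simplified model, a permutation that costs no logic at all in a LUT fabric consumes 128 sequential word operations on coarse-grain units. Smarter emulations can do better, but the imbalance is the kind that argues for keeping some fine-grained logic on the device.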


Contact me at eguro@eguro.com