IP Cores for DES Encryption

EE290A Homework #2: IP Shopping
Michael Shilman


Introduction

For this assignment I chose to research IP cores for data encryption. Originally I looked for RSA cores, but I could find only one implementation. I was able to find a small assortment of DES cores and also a variety of software implementations (Java, C, and assembly). Here I give an outline of the algorithm, describe its computational characteristics, and then give estimates of cost metrics for each implementation. I have not included links to any of the software implementations, because it seems to be illegal to distribute DES freely.

The DES Algorithm

(Courtesy http://ece.wpi.edu/infoeng/misc/student_projects/mack/index.html)

There are several variations of DES for different applications. These variants differ only in how they use the data encryption algorithm; the underlying algorithm is the same in all of them.

A computer password system is an example application of DES. Password files are often created using DES: the password is the key and the user name is the plaintext. At login, the result of this encryption is compared to the entry in the password file. If it matches, the user is permitted access.

Main Algorithm

There are two permutations of the data: one at the beginning of the algorithm, and another, the inverse of the first, at the end. The body of the algorithm divides the 64-bit data block into two 32-bit halves. In each round, a new left half and a new right half are produced from the previous halves and a 48-bit subkey, which is calculated by a key scheduling algorithm.
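
The permutation step can be sketched in C as follows. This is a minimal illustration, not code from any of the cores below: the table is the standard DES initial permutation, while the helper names permute64 and invert_table are made up for this example. The inverse table is derived from the forward table rather than typed in, which shows directly that the final permutation undoes the initial one.

```c
#include <assert.h>
#include <stdint.h>

/* Standard DES initial permutation (IP). Output bit i+1 is taken from
   input bit IP[i], where bit 1 is the most significant bit of the block. */
static const uint8_t IP[64] = {
    58, 50, 42, 34, 26, 18, 10, 2,
    60, 52, 44, 36, 28, 20, 12, 4,
    62, 54, 46, 38, 30, 22, 14, 6,
    64, 56, 48, 40, 32, 24, 16, 8,
    57, 49, 41, 33, 25, 17,  9, 1,
    59, 51, 43, 35, 27, 19, 11, 3,
    61, 53, 45, 37, 29, 21, 13, 5,
    63, 55, 47, 39, 31, 23, 15, 7
};

/* Hypothetical helper: apply a 64-entry bit-selection table to a block. */
static uint64_t permute64(uint64_t in, const uint8_t table[64])
{
    uint64_t out = 0;
    for (int i = 0; i < 64; i++) {
        assert(table[i] >= 1 && table[i] <= 64);
        uint64_t bit = (in >> (64 - table[i])) & 1u;
        out = (out << 1) | bit;   /* output bits are produced MSB first */
    }
    return out;
}

/* Hypothetical helper: build the table that undoes `table`. If output
   bit i+1 came from input bit table[i], the inverse must send bit i+1
   back to position table[i]. */
static void invert_table(const uint8_t table[64], uint8_t inverse[64])
{
    for (int i = 0; i < 64; i++)
        inverse[table[i] - 1] = (uint8_t)(i + 1);
}
```

Applying permute64 with IP and then with the table produced by invert_table returns the original block. In hardware these permutations are only wiring, so they cost no gates.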

The Encryption Scheme

In the Data Encryption Standard there are 16 such rounds using 16 different subkeys. In each round the function F, described below, is computed from the right half and the subkey. The left half is then XOR'ed with the output of F. Between rounds the left and right halves are switched.
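
The round structure can be sketched as below. This is only an illustration of the 16-round data flow: toy_f is a made-up stand-in and is NOT the real DES F function, feistel16 is a hypothetical name, and the initial and final permutations and the key schedule are left out. The point is that the same loop decrypts when the subkeys are applied in reverse order, whatever F is.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up stand-in for F (not DES's F): mixes the 32-bit right half
   with a 48-bit subkey. Any function of (right, subkey) would do here. */
static uint32_t toy_f(uint32_t right, uint64_t subkey)
{
    assert((subkey >> 48) == 0);              /* subkeys are 48 bits wide */
    uint32_t k = (uint32_t)(subkey ^ (subkey >> 24));
    return (right * 0x9E3779B1u) ^ k;
}

/* Sixteen rounds over a 64-bit block. With decrypt != 0 the subkeys are
   used in reverse order, which inverts the encryption. */
static uint64_t feistel16(uint64_t block, const uint64_t subkeys[16], int decrypt)
{
    uint32_t left  = (uint32_t)(block >> 32);
    uint32_t right = (uint32_t)block;

    for (int round = 0; round < 16; round++) {
        uint64_t k = subkeys[decrypt ? 15 - round : round];
        uint32_t new_right = left ^ toy_f(right, k);  /* L XOR F(R, K) */
        left  = right;                                /* halves switch */
        right = new_right;
    }
    /* The halves are not switched after the last round, so the output
       is the final right half followed by the final left half. */
    return ((uint64_t)right << 32) | left;
}
```

Because F is only ever XOR'ed into the other half, it does not need to be invertible itself. This is why a hardware core can use one datapath for both encryption and decryption.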

The F Function

The F function expands the 32-bit right half to 48 bits by duplicating 16 of its bits. The result is XOR'ed with the 48-bit subkey. This is then passed through eight S-boxes, each of which maps six bits into four using its own look-up table. The outputs of the S-boxes are concatenated and permuted once to give the final output of the F function.
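
Two pieces of F are sketched below: the 32-to-48-bit expansion and a lookup in the first S-box. The tables are the standard DES E table and S-box S1. The function names expand and sbox1 are made up for this example, and the subkey XOR, the other seven S-boxes, and the final permutation are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Standard DES expansion table E. Output bit i+1 is input bit E[i],
   where bit 1 is the most significant bit of the 32-bit half. Sixteen
   of the input bits appear twice. */
static const uint8_t E[48] = {
    32,  1,  2,  3,  4,  5,
     4,  5,  6,  7,  8,  9,
     8,  9, 10, 11, 12, 13,
    12, 13, 14, 15, 16, 17,
    16, 17, 18, 19, 20, 21,
    20, 21, 22, 23, 24, 25,
    24, 25, 26, 27, 28, 29,
    28, 29, 30, 31, 32,  1
};

/* Expand the 32-bit right half to 48 bits (returned in the low 48 bits). */
static uint64_t expand(uint32_t right)
{
    uint64_t out = 0;
    for (int i = 0; i < 48; i++)
        out = (out << 1) | ((right >> (32 - E[i])) & 1u);
    return out;
}

/* Standard DES S-box S1: 4 rows of 16 four-bit values. */
static const uint8_t S1[4][16] = {
    {14,  4, 13,  1,  2, 15, 11,  8,  3, 10,  6, 12,  5,  9,  0,  7},
    { 0, 15,  7,  4, 14,  2, 13,  1, 10,  6, 12, 11,  9,  5,  3,  8},
    { 4,  1, 14,  8, 13,  6,  2, 11, 15, 12,  9,  7,  3, 10,  5,  0},
    {15, 12,  8,  2,  4,  9,  1,  7,  5, 11,  3, 14, 10,  0,  6, 13}
};

/* Map six bits to four: the outer two bits pick the row and the middle
   four bits pick the column. */
static uint8_t sbox1(uint8_t six_bits)
{
    assert(six_bits < 64);
    unsigned row = ((six_bits >> 4) & 2u) | (six_bits & 1u);
    unsigned col = (six_bits >> 1) & 0xFu;
    return S1[row][col];
}
```

For example, sbox1(0x1B) (binary 011011) selects row 1, column 13 and returns 5. Each S-box is a 64-entry by 4-bit table, so the eight of them together hold 2 Kbits, which can be built as ROM or as random logic.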

Computational Characteristics of DES

FIXME

IP Cores

This summary is based on datasheets downloaded from several IP vendors. In many cases, this information was not explicitly given in the datasheets, so I made estimates where possible.

Mentor Graphics Inventra DES Encryption Core
Description Synthesizable VHDL-RTL. This is the core DES encryption unit. It is used in several other cores that provide more complex and feature-rich interfaces (for multi-key encryption).
Interface

Performance (Mbps) FMax = 100MHz
64-bit input blocks
16 Cycles / block + 1 Cycle load/unload
64 bits * 100 MHz / 16 cycles = 400 Mbps (about 376 Mbps if the load/unload cycle is also counted)
Power (mW/frame) ???
Cost ($/unit,area,etc.) 4000 gates
Design effort (man-months) One week to incorporate into design?
References http://www.mentorg.com/inventra/cores/catalog/prod_desc/des_core_pd.pdf

Memec/Xilinx Alliance XF-DES Core
Description VHDL RTL or Verilog.
Interface

Performance (Mbps) 100Mbps / 172Mbps (depends on FPGA)
Power (mW/frame) ???
Cost ($/unit,area,etc.) 316 CLBs / 200 IOBs (assuming all core signals are routed off chip).

To convert this to gates, note that the Xilinx XC4013XL data sheet rates that chip (which contains 576 CLBs and 192 IOBs) at 10K-30K logic gates, or roughly 17 to 52 gates per CLB.

Assuming no IOBs are needed, since we are doing SOC, a very rough estimate is:
316 * (17 to 52) = 5372 to 16432 gates
Design effort (man-months) One week to incorporate into design?
References http://www.xilinx.com/products/logicore/alliance/memec/xf_des.pdf
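
The CLB-to-gate conversion used for this core can be written as a small helper. This is just the proportional scaling described above, with a made-up function name. It does not round the gates-per-CLB figure first, so its results differ slightly from a hand calculation that does.

```c
#include <assert.h>

/* Hypothetical helper: scale a reference device's quoted gate count by
   the fraction of that device's CLBs which the core occupies. */
static double gates_for_clbs(int core_clbs, int device_clbs, double device_gates)
{
    assert(device_clbs > 0);
    return device_gates * (double)core_clbs / (double)device_clbs;
}
```

With the XC4013XL figures (576 CLBs, 10K to 30K gates), gates_for_clbs(316, 576, 10000.0) gives the low end of the range and gates_for_clbs(316, 576, 30000.0) the high end. The estimate attributes all of the device's gates to its CLBs, so it is only a rough bound.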

SICAN DesignObject DES Core
Description VHDL RTL or Verilog.
Interface

Performance (Mbps) 4Mbps / MHz clock
Power (mW/frame) ???
Cost ($/unit,area,etc.) 3000 gates + 2Kbits ROM

Approximate that ROM = DRAM and use figures from the 1997 National Technology Roadmap for Semiconductors, Table 14. Then the total is ~5K gates.
Design effort (man-months) One week to incorporate into design?
References http://www.sican.de/do/do_list/crypto/des.pdf

Java DES Implementation
Interface
public class Des implements DesEncrypt {
   public Des(DesKey key) { ... }
   public void cfb_encrypt(byte [] input, 
                           int input_start, int length,
                           byte [] output, int output_start,
                           int numbits, byte []ivec,
                           boolean encrypt) {  ... }
   ...
}
Performance (Mbps) ???
Power (mW/frame) ???
Cost ($/unit,area,etc.) 35.1Kbytes compiled with Sun JDK1.2 javac -O
Design effort (man-months) Negligible design effort.
References http://www.cryptography.org

Portable C DES Implementation
Interface
void
des(ks,block)
unsigned long ks[16][2];	/* Key schedule */
unsigned char block[8];		/* Data block */
{
   ...
}
Performance (Mbps) ???
Power (mW/frame) ???
Cost ($/unit,area,etc.) ???
Design effort (man-months) Negligible design effort.
References http://www.cryptography.org

Comments

I had trouble getting the x86 assembler working, so I could not try out the assembly code implementation. The algorithm is simple enough that the smaller and faster the implementation, the better (as opposed to a more complex and/or flexible algorithm, which might be better expressed in a more modular fashion to decrease design time).

As far as the IP cores are concerned, I could detect no discernible difference between the implementations. All run at the same speed (16 clock cycles / block). The only difference is the implementation medium; I expect that the Xilinx implementation is optimized for FPGAs, while the other two are not.
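
Since all three cores take 16 cycles per 64-bit block, their throughput depends only on the clock rate. A minimal helper (the function name is made up for this note) makes the relationship explicit:

```c
#include <assert.h>

/* Throughput in Mbps of a core that processes one block of
   `block_bits` bits every `cycles_per_block` clock cycles. */
static double throughput_mbps(double clock_mhz, int block_bits, int cycles_per_block)
{
    assert(cycles_per_block > 0);
    return clock_mhz * block_bits / cycles_per_block;
}
```

For example, throughput_mbps(100.0, 64, 16) gives 400 Mbps, matching the Mentor figure, and throughput_mbps(1.0, 64, 16) gives the 4 Mbps per MHz quoted by SICAN. Passing 17 cycles instead of 16 shows the effect of counting a load/unload cycle per block.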

As a further experiment I would like to test the throughput of the x86 implementation. I think the Java/C implementations are not worth looking at since, as noted earlier, modularity is not much of an issue here.