Still to be updated.

Performance testing - How it works

Goal

Measure the time (in ms) of a selected operation running on a card (e.g., AESKey.setKey()). Problem: the selected operation cannot be measured directly (e.g., elapsed = timeEnd - timeStart). The card has no on-card timer, and its protected environment offers no way to take timestamps at the start and end of the operation from the surrounding code. As a result, one can only measure, on the host system, the overall time between sending the input data that triggers the operation (PC/SC SCardTransmit()) and receiving the response.
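A minimal host-side sketch of what can actually be timed: the wall-clock duration of one command/response round trip. The class and method names below are hypothetical. In a real setup the timed action would be a call such as `CardChannel.transmit(CommandAPDU)` from `javax.smartcardio`, which is backed by PC/SC on common platforms.

```java
import java.util.function.Supplier;

// Hypothetical helper: times one host-side round trip (send command, wait for response).
// Everything between the two timestamps is included: PC/SC stack, transmission,
// on-card dispatch and the target operation itself.
final class RoundTripTimer {

    // Result of a timed call: the value returned by the action and the elapsed time in ms.
    static final class Timed<T> {
        final T value;
        final double elapsedMs;

        Timed(T value, double elapsedMs) {
            this.value = value;
            this.elapsedMs = elapsedMs;
        }
    }

    // In a real setup, 'action' would wrap e.g. channel.transmit(new CommandAPDU(...)).
    static <T> Timed<T> time(Supplier<T> action) {
        long start = System.nanoTime();   // monotonic host clock
        T value = action.get();           // blocking call to the card
        long end = System.nanoTime();
        return new Timed<>(value, (end - start) / 1_000_000.0);
    }
}
```

Because `CardChannel.transmit` throws the checked `CardException`, the lambda passed to `time()` has to catch it and rethrow it unchecked.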

General principles

Operations are performed on RAM-allocated arrays (unless specifically targeting EEPROM).
Time is measured on the PC side, so communication costs are also included. This can be mitigated either by enough repetitions of the measured operation (so that the communication cost becomes negligible) or by two runs with different numbers of repetitions whose times are subtracted.
Allocation of a new instance of the measured algorithm is NOT included in the measurement (good practice is to pre-allocate all objects during the applet's install time).
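The two mitigations for communication cost can be illustrated with a simple linear model, assuming the measured time of one command is T(n) = C + n * t_op, where C is the fixed per-command cost (host, transmission, dispatch) and n is the number of on-card repetitions. The model and the helper names are assumptions for illustration, not the tool's code.

```java
// Hypothetical helpers for the linear model T(n) = C + n * t_op,
// where C is the fixed per-command overhead and t_op the target operation time.
final class OverheadMath {

    // Mitigation 1: many repetitions. Dividing one measurement by n gives
    // t_op + C / n, so the residual error C / n shrinks as n grows.
    static double amortizedPerOperation(double totalMs, int repetitions) {
        if (repetitions <= 0) {
            throw new IllegalArgumentException("repetitions must be positive");
        }
        return totalMs / repetitions;
    }

    // Mitigation 2: two runs with different repetition counts n1 != n2.
    // T(n2) - T(n1) = (n2 - n1) * t_op, so the fixed overhead C cancels out.
    static double perOperationFromTwoRuns(double t1Ms, int n1, double t2Ms, int n2) {
        if (n1 == n2) {
            throw new IllegalArgumentException("repetition counts must differ");
        }
        return (t2Ms - t1Ms) / (n2 - n1);
    }
}
```

With C = 30 ms and t_op = 1 ms, a single run with n = 50 takes 80 ms and reports 1.6 ms per operation, while runs with n = 10 (40 ms) and n = 60 (90 ms) give exactly 1.0 ms.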

What is measured

Time necessary to execute a single method on data of a given length (e.g., AESKey.setKey()) [SINGLE_METHOD]
Time necessary to perform a whole logical cryptographic operation on-card (e.g., transmit X bytes of data to the card, encrypt them with AES and transmit the result back) [FULL_OP].
- Test scenario 1: (short packets - 5 to 32B?, key unchanged)
- Test scenario 2: (longer packets - 128 to 256B?, key unchanged)
- Test scenario 3: Test scenario 1 + key changed every call
- Test scenario 4: Test scenario 2 + key changed every call
- Test scenario 5: Custom software implementation of AES / SHA-2 (test of bytecode algorithmic performance)
Some operations can be measured without additional preparation of the target object (e.g., AESKey.setKey()). The target method call is then iterated on-card according to TEST_SETTINGS.numRepeatWholeOperation to suppress the significance of the overhead (APDU transmission etc.) in the measurement.
Some operations require pre-processing (e.g., Cipher.update() requires init() with a proper key). This preparation is done in a separate command and is not included in the measurement.
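The split between an unmeasured preparation command and a measured command that loops on-card can be sketched in plain Java. This is only a host-side simulation: a real applet would use the Java Card API inside `process()`. The class name, method names and command layout (a two-byte big-endian repetition count) are hypothetical, and standard `javax.crypto` AES stands in for the card's cipher.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Host-side simulation only (class, method names and command layout are hypothetical).
// prepare()  ~ separate preparation command: allocation + Cipher.init(), NOT measured
// perfTest() ~ measured command: parse repetition count, loop the target operation
final class SimulatedPerfApplet {
    private Cipher cipher;                          // allocated during preparation
    private final byte[] dataIn = new byte[256];    // default on-card data length (256B)
    private final byte[] dataOut = new byte[256];

    void prepare(byte[] aesKey) {
        try {
            cipher = Cipher.getInstance("AES/ECB/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(aesKey, "AES"));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("preparation failed", e);
        }
    }

    // command[0..1] = number of repetitions, big-endian (hypothetical layout).
    // Returns the number of executed target operations.
    int perfTest(byte[] command) {
        if (cipher == null) {
            throw new IllegalStateException("prepare() must be called first");
        }
        int numRepeats = ((command[0] & 0xFF) << 8) | (command[1] & 0xFF);
        try {
            for (int i = 0; i < numRepeats; i++) {
                // Only the target operation runs inside the loop.
                cipher.update(dataIn, 0, dataIn.length, dataOut, 0);
            }
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("target operation failed", e);
        }
        return numRepeats;
    }
}
```

Sending a repetition count of 0 turns the same measured command into a baseline run that exercises everything except the target operation.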

List of tested properties

Card information - Version of JavaCard, Answer To Reset (ATR), Card Production Life Cycle (CPLC)
Size of memory - Persistent (MEMORY_TYPE_PERSISTENT), Transient (MEMORY_TYPE_TRANSIENT_RESET, MEMORY_TYPE_TRANSIENT_DESELECT)
Classes - AESKey, Checksum, Cipher, DESKey, DSAKey, DSAPrivateKey, DSAPublicKey, ECKey, ECPrivateKey, ECPublicKey, HMACKey, KeyPair, KoreanSEEDKey, MessageDigest, RandomData, RSAPrivateCrtKey, RSAPrivateKey, RSAPublicKey, Signature, Util
Software reimplementation - AES, XOR

Performance testing methodology

This measurement then includes time to:

send input to card reader (PC/SC stack)
transmit input data (T=1/T=0)
dispatch command and select on-card method (process())
execute code preceding the target operation
execute the target operation
execute trailing code after the target operation
transmit response (dataOut, status word) (T=1/T=0)
receive response (PC/SC stack)
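Written as a sum with one term per listed step (the symbols are introduced here for illustration), and with n = numRepeatWholeOperation on-card repetitions of the target operation, the host-side measurement is modelled below. The model assumes that all other steps take the same time whether or not the loop runs, so a run with n = 0 yields the baseline; any per-iteration loop overhead is attributed to T_op.

```latex
\begin{aligned}
T_{\mathrm{meas}}(n) &= T_{\mathrm{send}} + T_{\mathrm{in}} + T_{\mathrm{disp}} + T_{\mathrm{pre}} + n\,T_{\mathrm{op}} + T_{\mathrm{post}} + T_{\mathrm{out}} + T_{\mathrm{recv}} \\
T_{\mathrm{meas}}(n) - T_{\mathrm{meas}}(0) &= n\,T_{\mathrm{op}}
\end{aligned}
```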

Additionally, the target operation usually takes only a small fraction of the measured time, with the majority taken by operations we would like to exclude from the measurement (~one ms vs. ~tens of ms). The situation is further worsened by possible non-deterministic time fluctuations on the host side.

Note: Measurement fluctuations on the host side can be mitigated if a simpler host architecture is available (e.g., a microcontroller-based card reader).

Note: A very precise measurement of the elapsed time can be obtained from a power trace if the start and end of the selected operation can be identified. Such a measurement requires access to a setup with an oscilloscope and significant time to identify the target operation. We verified selected operations using this method.

Note: Some operations cannot be meaningfully measured without an additional operation executed together with them (e.g., setKey before the target operation Cipher.init). We measure both operations together and then subtract the time of the additional operation.
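The subtraction for such paired operations can be sketched as follows, assuming the auxiliary operation (e.g., setKey) is also measured on its own with the same settings. The class and method names are hypothetical.

```java
// Hypothetical helper: the target operation (e.g. Cipher.init) cannot run without an
// auxiliary one (e.g. setKey), so both are measured together, the auxiliary operation
// is measured separately, and its average time is subtracted.
final class CombinedOperation {

    static double mean(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    // t(target) ~= mean(t(aux + target)) - mean(t(aux))
    static double targetOnlyMs(double[] combinedMs, double[] auxiliaryMs) {
        return mean(combinedMs) - mean(auxiliaryMs);
    }
}
```

For example, combined samples {5.0, 7.0} ms and auxiliary samples {2.0, 2.0} ms give 4.0 ms for the target operation alone.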

Note: We intentionally did not exclude outlier measurements, as they may contain interesting information regarding non-deterministic behaviour of a card.

Measurement procedure:

start/stop time measured on host
repetitions on the host with the same data (5x NUM_REPEAT_WHOLE_MEASUREMENT, 10x NUM_REPEAT_WHOLE_MEASUREMENT_KEYPAIRGEN)
default length of data for on-card operation (256B)
length of variable data for on-card operation (16-512B)
repetitions of target operation on card (fixed length => 50x NUM_REPEAT_WHOLE_OPERATION, variable length => 5x (to keep overall running time reasonable))
perftest_measure method (outer):
- check if not already measured before
- call perftest_measure (inner); on exception, try again, then ask for user intervention (physical removal of the card, re-upload of the applet)
perftest_measure method (inner):
- for every repetition of the measurement, prepare a fresh set of objects on the card and reset them (prepare_class_XXX, APDU, not measured)
- Measure processing time without actually calling the measured operation (achieved by setting the number of repetitions to 0, i.e., testSet.numRepeatWholeOperation = 0); repeat 5x NUM_BASELINE_CALIBRATION_RUNS
  - ResetApplet (APDU, not measured)
  - PerfTestCommand
  - => baseline avg time
- Measure target operation (testSet.numRepeatWholeOperation = 50x for fixed data or 5x for variable data):
  - ResetApplet (APDU, not measured)
  - PerfTestCommand => time
  - measurement time = time - baseline avg time
  - => min, max, avg measurement time
prepare_class_XXX (on-card)
- allocate new objects required for testing target operation
- erase or set RAM/EEPROM arrays into default values
perftest_class_XXX (on-card)
- receive apdu data
- parse incoming settings (num_repeats, data length...)
- switch (algorithm type)
- initializations done only once before for(num_repeats)
- for(num_repeats)
  - target operation
  - (optional) alternate between two different objects to prevent a (too quick) reuse of the existing one, where the card may decide not to execute the operation at all because the object is still the same (e.g., Cipher.init())
- end, send apdu out
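The host side of this procedure (outer and inner perftest_measure) can be sketched as follows. The `PerfCard` interface and its methods are hypothetical stand-ins for the real APDU exchanges, and the constants mirror the repetition counts listed above. This sketch recalibrates the baseline in every repetition, which is one possible reading of the steps; retrying once and then throwing stands in for asking the user to intervene.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction of the APDU commands used by the measurement.
interface PerfCard {
    void prepareClass();                       // prepare_class_XXX, not measured
    void resetApplet();                        // ResetApplet, not measured
    double perfTestCommandMs(int numRepeats);  // PerfTestCommand, timed on the host
}

final class PerfTestRunner {
    static final int NUM_BASELINE_CALIBRATION_RUNS = 5;
    static final int NUM_REPEAT_WHOLE_MEASUREMENT = 5;

    private final Map<String, double[]> results = new HashMap<>();

    // Outer method: skip already measured operations, retry once on failure.
    double[] measureOuter(String operation, PerfCard card, int numRepeatWholeOperation) {
        double[] cached = results.get(operation);
        if (cached != null) {
            return cached;                     // already measured before
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                double[] stats = measureInner(card, numRepeatWholeOperation);
                results.put(operation, stats);
                return stats;
            } catch (RuntimeException e) {
                last = e;                      // try again
            }
        }
        // Real tool: ask the user to remove the card or re-upload the applet.
        throw new IllegalStateException("manual intervention required for " + operation, last);
    }

    // Inner method: baseline calibration, then the target measurement.
    // Returns {min, max, avg} of (time - baseline avg); outliers are kept on purpose.
    static double[] measureInner(PerfCard card, int numRepeatWholeOperation) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0;
        for (int rep = 0; rep < NUM_REPEAT_WHOLE_MEASUREMENT; rep++) {
            card.prepareClass();               // fresh objects for each repetition

            double baselineSum = 0;
            for (int i = 0; i < NUM_BASELINE_CALIBRATION_RUNS; i++) {
                card.resetApplet();
                baselineSum += card.perfTestCommandMs(0);   // 0 => target op not executed
            }
            double baselineAvg = baselineSum / NUM_BASELINE_CALIBRATION_RUNS;

            card.resetApplet();
            double t = card.perfTestCommandMs(numRepeatWholeOperation) - baselineAvg;
            min = Math.min(min, t);
            max = Math.max(max, t);
            sum += t;
        }
        return new double[] {min, max, sum / NUM_REPEAT_WHOLE_MEASUREMENT};
    }
}
```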

The listed segments show all parts of a performance test execution. Part 5 executes the target operation we want to measure (green). Since we cannot time this execution directly, we obtain a fairly accurate run time of the operation by subtracting the execution time of all operations in the left column from that of all operations in the right column. The subtracted time consists of host PC processing (yellow), data transmission between the host PC and the card (yellow-blue) and smart card processing (blue).