Data File Specifications

File specification for exchange of sifting / key information between two crypto clients

Summary

For the purpose of data storage and communication, several filetypes are named and refer to specific data in steps of the QKD signal flow.

Type-1 holds the detector clicks (results + basis) from the high count side. The data in this file is not compressed. Generated by chopper2.
Type-2 holds the basis of the detector clicks from the low count side. This file is sent to the high count side. Generated by chopper.
Type-3 holds the detector clicks (results + basis) on either side. They are
1. Generated by chopper. For storage before sifting by splicer on the low count side.
2. Generated by costream. For rawkeys after coincidence sifting on the high count side.
3. Generated by splicer. For rawkeys after coincidence sifting on the low count side.
Type-4 holds the basis match result for the coincidence sifting. This file is sent to the low count side. Generated by costream
Type-7 holds the final error corrected keys. Generated by errcd/ecd2

Rationale

The key generation procedure with either BB84 or entanglement based protocols is streamed data which is prepared by physical hardware. Assuming that usage time of the physical system is the most expensive element in a key generation procedure, any communication protocol should be as efficient as possible to ensure that no data is being lost or available time is wasted. Furthermore the classical communication for synchronization and sifting probably requires the largest bandwidth in communication between two partners. The following is an attempt to define file and data stream formats used to communicate detected photon events on several sides. It focuses on a hardware which uses a timestamping card to detect events, as a first method of keeping the accumulated information per click in a scenario with up to 10^6 detection events per second and a timing requirement on the order of 1 nsec small on a hardware side.

The raw file format

The structure of events as emitted by the timestamp cards is a series of 32 or 64 bit wide structures containing timing and detector event information. The selection is done by choosing to pass the option -f to readevents7.c [source]. If the option is passed, ‘fastmode’ is enabled and the timestamp output is a 32 bit unsigned integer. If the option is not passed, the timestamp outputs 64 bit unsigned integers instead.

uint64_t rawevent;

/* If fastmode */
rawevent = (newevent & 0x3f) | 0x300 |
           ((newevent & ~0x3f)| (((uint64_t)rolloveroffset)<<32))<<4;

/* No fastmode */
rawevent = rbbuffer[i] | (((uint64_t)rbbuffer[i+1])<<32);

For the 32 bit output, the timing data is contained in the most significant 26 bits in multiples of 1/256 nsec. The detector status is contained in the least significant four bits.

Timing Info	Selftest flag	Dummy flag	Detector Info
26 bits	1 bit	1 bit	4 bits

For the 64 bit output, the timing data is contained in the most significant 54 bits in multiples of 1/256 nsec. The detector status is contained in the least significant four bits.

Timing Info	Fixed pattern (0b1101)	Selftest flag	Dummy flag	Detector Info
54 bits	4 bits	1 bit	1 bit	4 bits

At the time of writing, this format only works with four possible detectors. For future use, a combination of up to eight detectors might be considered, contained in the lower 8 bits of the unsigned integer. (?)

The origin of the time may be varying according to the timestamp save procedure: one option is to define the start of the acquisition as the origin of time, another option adds the time of the day (microseconds since start of the day - please check with reading program??) such that two computers can compare timestamps with a raw accuracy supplied by their local clocks.

All subsequent data will be partitioned, transferred and processed according to epoch. An epoch is defined by all events with the same most significant 17 bits of raw timing information. That will correspond to a time span of 2^(49-17)*1/8 nsec = 2^29 nsec approx 537 msec. For the numbering of an epoch there are two possibilities: the first one takes the epoch just as the most significant 17 bits of the timing event. An epoch is a unsigned 32bit wide integer type, and the least significant bit of that integer contains the least significant bit of the epoch. In a second version, the epoch data from the raw timestamp data format is corrected by the Unix definition of the origin of time, such that the extended timestamp information (exceeding the original 49 bits of timing info) is referring to the Unix time origin definition of midnight January 1, 1970. The correction is carried out and the epoch contains the least significant 32 bits of that time in multiples of 2^29 nsec. This will ensure no rollover during 73 years or before the year 2043. In the following, the second definition of epoch will be referred to as extended epoch.

Epochs are supposed to be used as filenames or reference handles for any set of data in subsequent processing. For the different communication and storage jobs within a key establishing sequence, several tagged data file or stream packet formats are specified. Each packet of file contains an identifier describing the type of data file, the corresponding epoch and possibly some length and encoding information. To ensure that subsequent packets can be separated from a continuous stream even without special external separation indicators (like EOF marks or closures of a socket), each packet should have a content-based termination criterium. That can either be a number of entries information, or a specific codeword to terminate the current packet.

The following file types seem to be useful:

packaged raw data file
The intent of this format is for local storage purposes with minimal processing overhead and the ability to partition raw data into epochized packets. format (as saved in binary version on a x86 architecture; ints and unsigned ints refer to 32 bit wide integers):
header format: struct header_1 { int tag; unsigned int epoc; unsigned int length; int bitsperentry; int basebits; }
<tag> is either 1 for simple raw data or 0x101 for raw data with an extended epoch. <length> is optional and counts the number of 64 bit entries to follow. If length is zero, the packet termination has to be determined by an event with all 64 bits set to zero (such an event is unlikely to occur). The <bitsperentry> is fixed to a value of 49. The <basebits> entry contains the number of bits encoding detector clicks and coincidences and depends on the timestamp card used. The native format of the event reader program is used, and contains currently 4 for four detectors.

The header is followed by 64 bit entries corresponding to the rawevent type definition. A rawevent entry with the value {0,0} indicates the end of a packet. If there is a conflict of termination precedence between the <length> entry and the termination code, the file is considered inconsistent. A file always has to have a termination pair {0,0} at the end; this is not counted towards the entries.
Timing info and base choice file for initial sifting
This packet definition is supposed to be used for communication of coincidences and matching base recognition in the first sifting procedure. It should contain the timing information in a compressed manner; compression is based on the idea that timing events are incremental, and only the difference of time information is transmitted. The algorithm should be implemented with little computational overhead and is hopefully reasonably efficient (less than 20% excess size compared to the informational optimum for this stream).
header format: struct header_2 { int tag; unsigned int epoc; unsigned int length; int timeorder; int basebits; int protocol; }
The <tag> entry is either 0x2 for local epoch or 0x102 for an extended epoch definition. The <length> entry is optional and counts the number of events encoded in the whole stream. The <timeorder> entry contains the number of bits used for basic time-difference encoding in the data section. The <basebits> entry gives the number of basis bits transmitted in this stream and is 1 for BB84 type protocols. protocol contains info about the protocol used in the compressor. Currently supported are:
1. service protocol. both type-2 stream and type-3 stream contain the raw detector information.
2. BB84 standard protocol. The type-2 stream contains one bit of basis information, the type-3 stream one bit of value information. The detector sequence is hard coded in the header. Header format: (LSB) V,-,H,+ (MSB).
3. rich BB84. As before, but two bits are transmitted. if the msb is 0, the lsb has BB84 meaning, if msb is 1, a multi- or no-coincidence event was recorded (lsb=1), or a pair coincidence was detected (lsb=0).
4. extended devindep, 3 bits (1 out of 5)
5. extended devindep, no base info is sent.
The data section contains a stream of bit-packed entries with a length of <timeorder>+<basebits> bits each, where the <timeorder> bits are the least significant bits and the base bits the most significant ones. bit packing is performed in a 32bit word wise way with the first entry aligned to the most significant bit of the 32 bit word. data is saved in 32 bit wide chunks, and the data section has a multiple length of an 32bit integer. A timing entry of 0 is interpreted as an extension word, indicating a following of a 32 bit value for time differences in case the announced <timeorder> is too small to encode a particular time difference. The combination of the 0 codeword and the 32bit hires timing info is counted as a single entry for the entry count. A codeword of 1 (in the timing information) is indicating the end of the packet. The end codeword also emits a basebit information containing 0.

In the unlikely case that the real time difference is either 0 or 1, the absolute time of this particular event is shifted by 1/4 nsec such that no time differences of 0 to 1 can appear in this transmission. This error should not lead to any significant data corruption or spoil the coincidence tracking.

restrictions: the sum of bits used for difference encoding and the sum of bits used for data encoding cannot exceed 32 bits. checks have to be made both in the compression and decompression. (TODO!!!)
local sift storage
While preparing the packets with the time difference information for a coincidence and sift check on Alice side, the detailed timing information is not strictly necessary anymore on that side. For the storage of the base/result information of a particular click until a response from bob is received, a compact file structure on Alice side is needed. This will be the information with the maximum storage requirement on Alice side; therefore, it makes sense to keep this information stored efficiently. Since the identification of the individual events in an epoch are identified by the time sequence in the transmitted file type 2 already, only the packed bit information has to be stored.
header format: struct header_3 { int tag; unsigned int epoc; unsigned int length; int bitsperentry; }
The <tag> entry is either 0x3 for local epoch or 0x103 for an extended epoch definition. The <length> entry is optional and counts the number of events encoded in the whole stream. However, if the length argument is not specified, there is a possible security hole in the packet structure in the sense that an eavesdropper could insert wrong responses and therefore force Alice to use possibly predictable entries at the end of a file, since the data itself contains no termination character. It is therefore recommended to either use the length in a mandatory way, or check the consistency of the timing response in another way. The <bitsperentry> could be either one or two bits; one is minimal if the base were not to be saved on Alice side in a simple bb84 protocol. Since this is the largest storage requirement (round-trip response time times the detection rate), it might be considered worth not storing the base.

The data section contains the bits in a packed order, with the first entry being aligned to the most significant 32 bit word; packing takes place in a 32 bit wide variable. The data section consists of an integer multiple of 32bit wide words, the possibly unused last bits in the data field are set to zero.
Coincidence/sift check response
To respond from Bob to Alice with a coincidence/base-match file for the events in one epoch, only the index of the entry in the query file (type 2) of matching events have to be returned. For a given epoch, this index increases monotonously, so again a differential encoding may be the most efficient way. For a single/pair efficiency of 20% on the generation side, and a loss of 0 dB to 30 dB, a typical index spacing will be between 5 and 5000 entries, leading to a optimal word size between 3 and 13 bit. Therefore, index submission is always more efficient than yes/no encoding for all queried events. The encoding is very similar to file format type 2.
header format: struct header_4 { int tag; unsigned int epoc; unsigned int length; int timeorder; int basebits; }
The <tag> entry is either 0x4 for local epoc or 0x104 for an extended epoc definition. The <length> entry is optional and counts the number of events encoded in the whole stream. The <timeorder> entry contains the number of bits used for basic time-difference encoding in the data section. The <basebits> entry gives the number of basis bits transmitted in this stream and is 0 for BB84 type protocols.

Data is again encoded in packed bit versions, and packing is done similarly as in file type 2. The two reserved control words 0 and 1 have the same meaning as in file type 2, therefore the first useful timing index is 2. To accommodate for the first two possible indices, the index is just increased by 2 before encoding, and has to be reduced by 2 upon decoding. The datacontent bit pattern is emitted together with every event or entry in the data section, including the termination word (even there it has no information content).
Final key storage file
This file type is used essentially by the error correction/privacy amplification scheme; since EC/PA is typically carried out on more than one epoch, the header contains information on how many epochs were concatenated to obtain this particular key sequence. Futhermore, the file contains only one bit per entry, so this information needs not to be encoded. This file format is therefore a simplification of the type 3 format.
header format: struct header_7 { int tag; unsigned int epoc; unsigned int numberofepochs; int numberofbits; }
The <tag> entry is either 0x7 for local epoc or 0x107 for an extended epoc definition. The <epoch> entry represents the first data epoch used for generating this key file; <numberofepochs> contains the number of processed epochs for this file, and represents what was passed as a parameter to the error correction deamon. Finally, <numberofbits> contains the number of bits following the header. Bits get filled in 32 bit wide words, starting from the most significant bit in each word. Thus, the length of the data section of this file is given by floor((<numberofbits>+31)/32).