Abstract

Big data is a term used for very large data sets. Digital equipment produces vast amounts of images every day, and the need for image encryption is increasingly pronounced, for example, to safeguard the privacy of patients’ medical imaging data stored on cloud disks. There is an obvious tension between security and privacy on the one hand and the widespread use of big data on the other. Nowadays, encryption is the most important means of providing confidentiality. However, block ciphering is not suitable for huge data volumes in a real-time environment because of the strong correlation among pixels and high redundancy; stream ciphering is considered a lightweight solution for ciphering high-definition images (i.e., high data volume). For a stream cipher, since the encryption algorithm is deterministic, the only option is to make the key “look random.” This article proves that the probability that the digit 1 appears in the midsection of a Zeckendorf representation is constant, which can be utilized to generate pseudorandom numbers. Then, a novel stream cipher key generator (ZPKG) is proposed to encrypt high-definition images that need to be transferred. The experimental results show that the proposed stream ciphering method, whose keystream satisfies Golomb’s randomness postulates, is faster than RC4 and LFSR with no significant difference in hardware depletion, and that the method is highly key sensitive and shows good resistance against noise attacks and statistical attacks.

1. Introduction

The development of digital sensor technology and storage devices has led to the rapid expansion of digital image libraries, and all kinds of digital equipment produce vast amounts of images every day. Though image compression reduces the required bandwidth, transferring compressed images alone is still not secure. Thus, how to transfer these images effectively and securely has become a hot research direction in recent years. A variety of encryption algorithms have been investigated for image cryptosystems. Most of them are based on a permutation and diffusion architecture [1]. The permutation process alters the location of image pixels, and the diffusion process changes the pixel values so that a small change in one pixel can spread to almost all pixels in the entire image [2]. These two procedures are independent. Modern cryptography includes symmetric encryption, asymmetric encryption, and hash functions, among which symmetric encryption is divided into two types: block ciphers and stream ciphers. Block ciphers, such as DES, AES, and IDEA, are not suitable for practical image encryption because of intrinsic features of images such as mass data capacity, strong correlation among pixels, and high redundancy [3].

A stream cipher is a symmetric key cipher in which the key bits used to encrypt the binary image change pseudorandomly, so that the resulting cipher image is hard to break. Each bit of data is encrypted with one bit of the keystream. The keystream keeps changing so that no pattern is repeated that could give a cracker a clue for breaking the cipher image. The advantage of a stream cipher is that its execution speed is higher than that of block ciphers and its hardware complexity is lower. Unlike block ciphers, a stream cipher will not produce the same ciphertext even for repetitive blocks of plaintext, since the key changes constantly for every bit of plaintext. In the simplest stream ciphers, encryption is a bitwise XOR of the plaintext with the keystream, and decrypting a ciphertext simply applies the same XOR once more. The exclusive or (XOR or $\oplus$) operation, which is simple to implement in hardware, gives a ray of hope for fast image encryption.

However, if multiple data are encrypted with the same key, the attacker can recover information about the data without guessing the key. For example, suppose that two strings of plaintext data, $A$ and $B$, are encrypted using the same key, $K$. The ciphertexts are as follows: $E(A) = A \oplus K$ and $E(B) = B \oplus K$. Because $K \oplus K = 0$, XORing the two ciphertexts gives $E(A) \oplus E(B) = A \oplus B$. At this point, the attacker has obtained the XOR of the two plaintexts without the key, so knowing or guessing either plaintext immediately reveals the other.
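The XOR encryption and the key-reuse weakness described above can be illustrated with a minimal Python sketch. The plaintexts, the keystream bytes, and the helper name `xor_bytes` are hypothetical and chosen only for illustration.

```python
def xor_bytes(data: bytes, keystream: bytes) -> bytes:
    """XOR each data byte with the keystream byte at the same position."""
    if len(keystream) < len(data):
        raise ValueError("keystream must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, keystream))


# Hypothetical plaintexts and keystream (14 bytes each), for illustration only.
plain_a = b"ATTACK AT DAWN"
plain_b = b"RETREAT AT SIX"
keystream = bytes([0x3A, 0x91, 0x5C, 0x07, 0xE2, 0x4D, 0xB8,
                   0x19, 0x66, 0xF0, 0x2B, 0x8E, 0x73, 0xC4])

cipher_a = xor_bytes(plain_a, keystream)
cipher_b = xor_bytes(plain_b, keystream)

# Decryption is the same XOR applied once more.
recovered_a = xor_bytes(cipher_a, keystream)

# Key reuse: the keystream cancels out of the XOR of the two ciphertexts.
leak = xor_bytes(cipher_a, cipher_b)

# An attacker who knows (or guesses) plain_a recovers plain_b without the key.
guessed_b = xor_bytes(leak, plain_a)
```

The sketch shows why a keystream must never be reused: `leak` depends only on the two plaintexts, not on the key.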

Therefore, in stream ciphering, the difficulty of cracking depends on the randomness and unpredictability of the keystream. In other words, a keystream generated by a specified generator should at least “look random.” The motivation of this paper is to generate such pseudorandom keystreams to resist chosen plaintext attacks and statistical attacks.

This paper is organized as follows: Section 2 introduces preliminary knowledge. Section 3 reviews the related work. Section 4 elaborates on generating a pseudorandom keystream that satisfies Golomb’s randomness postulates. Section 5 analyzes the randomness of the keystream theoretically. Section 6 reports the experiments. Finally, Section 7 draws the conclusion.

2. Preliminaries

2.1. Golomb’s Randomness Postulates

Golomb’s randomness postulates [4] define the properties a binary sequence must have to look sufficiently random. Runs of 0’s are called “gaps”, and runs of 1’s are called “blocks”. The postulates are as follows:
(1) In a cycle, the number of 1’s differs from that of 0’s by at most 1.
(2) At least half the runs have length 1, at least one-fourth have length 2, at least one-eighth have length 3, and so forth. Moreover, for each of these lengths, there are (almost) equally many gaps and blocks. In other words, the number of gaps (or of blocks) of length $\rho$ is approximately equal to $N/2^{\rho+2}$, where $N$ denotes the length of the keystream.
(3) The autocorrelation function $C(\tau)$ is two-valued. With indices taken modulo $N$ and $K$ a constant,
$$C(\tau) = \sum_{i=0}^{N-1} (2s_i - 1)(2s_{i+\tau} - 1) = \begin{cases} N, & \tau = 0, \\ K, & 1 \le \tau \le N-1. \end{cases}$$
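As an illustration of the three postulates, the following Python sketch checks them on one period of a 15-bit maximal-length sequence. The generating recurrence, the function names, and the unnormalized form of the autocorrelation are choices made for this sketch, not details taken from the paper.

```python
from itertools import groupby


def lfsr_m_sequence() -> list:
    """One period (15 bits) of the m-sequence with s[t+4] = s[t+1] XOR s[t]."""
    state = [1, 0, 0, 0]
    out = []
    for _ in range(15):
        out.append(state[0])
        state = state[1:] + [state[0] ^ state[1]]
    return out


def cyclic_runs(bits):
    """Return (symbol, length) for every run, treating the sequence as a cycle."""
    n = len(bits)
    # Rotate so that the cycle starts at a run boundary (if there is one).
    start = next((i for i in range(n) if bits[i] != bits[i - 1]), 0)
    rotated = bits[start:] + bits[:start]
    return [(sym, len(list(group))) for sym, group in groupby(rotated)]


def autocorrelation(bits, shift):
    """Unnormalized periodic autocorrelation: agreements minus disagreements."""
    n = len(bits)
    return sum((2 * bits[i] - 1) * (2 * bits[(i + shift) % n] - 1)
               for i in range(n))


seq = lfsr_m_sequence()
runs = cyclic_runs(seq)

ones, zeros = sum(seq), len(seq) - sum(seq)           # postulate 1
run_lengths = sorted(length for _, length in runs)    # postulate 2
off_peak = {autocorrelation(seq, t) for t in range(1, len(seq))}  # postulate 3
```

For this sequence the counts of 1's and 0's differ by one, half of the eight runs have length 1, and the off-peak autocorrelation takes a single value.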

2.2. Zeckendorf Representation

It is known from Zeckendorf’s theorem [5] that each nonnegative integer can be expressed as a sum of distinct Fibonacci numbers. For instance, $17$ is the sum of the 7th, 4th, and 1st Fibonacci numbers, viz. $17 = 13 + 3 + 1$. Every nonnegative integer admits a representation:

with , and as usual, , , and for all . We call this a Zeckendorf representation or -addend representation of . It is convenient to write this representation as a word/sequence of length with each oscillating over the alphabet , where 1 indicates the respective Fibonacci addend appears in the sum, and 0 otherwise. Let

be the aforementioned Zeckendorf representation, for instance, ; then,

If we impose the additional requirement that consecutive 1s are not allowed (viz. ) and cannot be ‘1’ (, provided only admissible), then we obtain the canonical version of the definition [6–9]. Such a “canonical Zeckendorf representation” always exists and is unique [5].
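A canonical Zeckendorf representation can be computed greedily by repeatedly subtracting the largest Fibonacci number that still fits. The sketch below is a minimal Python illustration; the function names and the choice of digit weights 1, 2, 3, 5, 8, ... (most significant digit first) are assumptions of this sketch.

```python
def zeckendorf_digits(n: int) -> str:
    """Canonical Zeckendorf digits of n, most significant first.

    Digit weights are 1, 2, 3, 5, 8, ... (an assumed convention).
    """
    if n < 0:
        raise ValueError("n must be nonnegative")
    if n == 0:
        return "0"
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = []
    for f in reversed(fibs):
        if f <= n:          # greedy step: take the largest weight that fits
            digits.append("1")
            n -= f
        else:
            digits.append("0")
    return "".join(digits).lstrip("0")


def zeckendorf_value(digits: str) -> int:
    """Inverse mapping: sum the weights at the positions holding a 1."""
    fibs = [1, 2]
    while len(fibs) < len(digits):
        fibs.append(fibs[-1] + fibs[-2])
    return sum(f for f, d in zip(fibs, reversed(digits)) if d == "1")


example = zeckendorf_digits(17)   # 17 = 13 + 3 + 1
```

The greedy choice guarantees that no two adjacent digits are both 1, which is exactly the canonical condition.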

3. Related Work

3.1. Stream Ciphers and PRNG

A stream cipher is a symmetric key cipher in which the key is altered pseudorandomly so that the resulting cipher image is hard to break. Compared with block ciphers, stream ciphers execute faster and have lower hardware complexity. Unlike block ciphers, stream ciphers will not generate the same ciphertext even for repetitive blocks of plaintext, since the key changes constantly for every element of plaintext [10].

Image encryption using existing standard stream cipher methods such as RC4 and the Vernam cipher has drawbacks. The RC4 algorithm is vulnerable to analytic attacks on the state table. One in every 256 keys can be a weak key [11]. These keys are identified by cryptanalysis that finds circumstances under which one or more generated bytes are strongly correlated with a few bytes of the key [12]. Also, the same sequence of keys is repeated, which would enable an attacker to break the ciphertext. Moreover, the first three words of the secret key can be found, and by iteration, each word of the key used in RC4 can be obtained. The Vernam cipher, which is considered a perfect cipher, is a type of one-time pad. The drawback of this method is that it needs an unlimited number of keys, and distributing a large number of random keys becomes a problem [13]. Recently, chaotic systems have been used in cryptography to encrypt images because of their random characteristics [14].

As the core component of stream ciphers, the generation of random numbers is essential. There are two basic types of generators used to produce random sequences: random number generators (RNGs) and pseudorandom number generators (PRNGs). For cryptographic applications, both of these generator types produce a stream of zeros and ones that may be divided into substreams or blocks of random numbers. A random bit sequence could be interpreted as the result of the flips of an unbiased “fair” coin with sides that are labeled “0” and “1.” Obviously, the use of unbiased coins for cryptographic purposes is impractical. An RNG uses a nondeterministic source (i.e., the entropy source). The source typically consists of some physical quantity, such as the noise in an electrical circuit, the timing of user processes (e.g., keystrokes or mouse movements), or the quantum effects in a semiconductor. The outputs of an RNG may be used directly as random numbers or may be fed into a PRNG. However, producing high-quality random numbers in this way may be too time-consuming. Inputs to PRNGs are called seeds. The outputs of a PRNG are deterministic functions of the seed, which is the origin of the term “pseudorandom.” Ironically, pseudorandom numbers often appear to be more random than the numbers obtained from physical sources, because a series of transformations can eliminate statistical autocorrelations between input and output [10].

3.2. Applications of Zeckendorf Representation

Zeckendorf representation works well in certain situations. For example, Leroy et al. [15] count the number of distinct (scattered) subwords occurring in a given word. More precisely, they consider the generalization of the Pascal triangle to the binomial coefficients of words and the Zeckendorf representation counting the number of positive entries on each row. Epifanio et al. [16] proved that Zeckendorf representation has deep connections with the Sturmian graph, and Bernat [17] connected Zeckendorf representation with continued fractions.

In addition, stream cipher research that resorts to Zeckendorf representation has a long history. For example, besides LFSRs, feedback with carry shift registers (FCSRs) play a vital role in the hardware design of stream ciphers. The Galois representation is often considered the first choice for FCSRs; however, a new representation that generalizes both Galois and Zeckendorf representations for FCSR automata was recently presented [18]. It is immune to previous attacks and can dramatically improve internal diffusion. Later, Lin [19] further improved the aforementioned FCSR circuit.

Similarly, FCSRs in Zeckendorf representation have been used with the U-Quark hash function [20]. Fish (Fibonacci shrinking), a fast software stream cipher, was proposed and achieved solid performance in simulations on an Intel 486 processor [21]. Nevertheless, these studies focus on the hardware level rather than on generating the keystream of a stream cipher at the software level. Our research aims to add a small stone to the wall of applications of Zeckendorf representation. Recently, several studies have applied Zeckendorf representation to blockchain and big data encryption [22, 23].

4. The Proposed Method

4.1. Probability Structure of Zeckendorf Representation

Suppose there exist ones in a canonical Zeckendorf representation, denoted as . It is known from [15] that the quantity of is given by

where is the ceiling function.

Let be the number of representations that have a 1 in the th position. Filipponi and Wolfowicz [24] proved that

where , or resting on the assumption [25]:

It can be easily inferred from (7) that the addend disappears for . And, from (6), the following are found.

Theorem 1. , and it is constant for any .

Proof. By using (7), we compute and using (8), we get then

Therefore, the probability that the th position of a Zeckendorf representation containing 1s holds the digit 1 is

In a similar fashion to (10) and (12), we have

Let , , then, the value of is shown in Figure 1, where the straight line in the midsection means that the probability is constant.
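The flatness of the midsection can also be observed empirically. The Python sketch below enumerates every integer whose canonical Zeckendorf representation has a given length and records how often each position holds a 1. It averages over all representations of that length rather than over representations with a fixed number of 1s, so it illustrates the phenomenon rather than reproducing Theorem 1; the digit convention (weights 1, 2, 3, 5, ..., most significant first), the length 20, and the comparison value $(5-\sqrt{5})/10 \approx 0.2764$ (the asymptotic density of 1s among words without adjacent 1s) are assumptions of this sketch.

```python
import math


def zeckendorf_digits(n: int) -> str:
    """Canonical Zeckendorf digits of n > 0 (weights 1, 2, 3, 5, ...)."""
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = []
    for f in reversed(fibs):
        if f <= n:
            digits.append("1")
            n -= f
        else:
            digits.append("0")
    return "".join(digits).lstrip("0")


def ones_frequency_by_position(length: int) -> list:
    """Fraction of length-`length` representations with a 1 at each position."""
    fibs = [1, 2]
    while len(fibs) <= length:
        fibs.append(fibs[-1] + fibs[-2])
    # Integers lo .. hi-1 are exactly those with `length` Zeckendorf digits.
    lo, hi = fibs[length - 1], fibs[length]
    counts = [0] * length
    for n in range(lo, hi):
        for pos, d in enumerate(zeckendorf_digits(n)):
            if d == "1":
                counts[pos] += 1
    return [c / (hi - lo) for c in counts]


LENGTH = 20
LIMIT = (5 - math.sqrt(5)) / 10          # about 0.2764
freq = ones_frequency_by_position(LENGTH)
mid = freq[LENGTH // 3: 2 * LENGTH // 3]  # middle third of the positions
max_deviation = max(abs(p - LIMIT) for p in mid)
```

The leading position always holds a 1 and the second never does, whereas the positions in the middle third all stay close to the same value.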

4.2. Pseudorandom Keystream Generation

The following procedures could generate a reasonably satisfying keystream :

Suppose that both the sender and the receiver know a pair of keys , each of which consists of three integers:

where and are primes of the same order of magnitude. There exist pseudorandom integral sequences

with starting values satisfying

and each item of and is obtained by the algorithm described in [26] that

where is decided by the message length. The integers , () are converted into canonical Zeckendorf representations:

Then, we carry out the bitwise logical addition (OR) on their midsection (where the probability is constant) in this way acquiring of length :

where is given by (see next section for the value of )

By juxtaposing , we finally obtain the keystream sequence of length :

The sequence is exactly what we need. For convenience, we refer to this pseudorandom keystream generation algorithm as “ZPKG” hereinafter; its pseudocode is shown as Algorithm 1.
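The overall pipeline (generate two integer sequences, convert to Zeckendorf representations, cut out the midsections, OR them, and juxtapose the pieces) can be sketched in Python as follows. This is only a rough illustration under stated assumptions, not the authors' implementation: the integer generator is a simple multiplicative congruential recurrence standing in for the algorithm of [26], the midsection is taken as the middle third of the digits, the two midsections are truncated to the shorter one before the OR, and all key values and names are hypothetical.

```python
def zeckendorf_digits(n: int) -> str:
    """Canonical Zeckendorf digits of n > 0 (weights 1, 2, 3, 5, ...)."""
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = []
    for f in reversed(fibs):
        if f <= n:
            digits.append("1")
            n -= f
        else:
            digits.append("0")
    return "".join(digits).lstrip("0")


def midsection(digits: str) -> str:
    """Middle third of the digit string (ASSUMED window, not from the paper)."""
    third = len(digits) // 3
    return digits[third:len(digits) - third]


def congruential(seed: int, multiplier: int, modulus: int):
    """ASSUMED stand-in for the generator of [26]: x <- multiplier * x mod modulus."""
    x = seed
    while True:
        x = (multiplier * x) % modulus
        yield x


def zpkg_sketch(key_a, key_b, n_bits: int) -> list:
    """Keystream bits obtained by OR-ing midsections of two Zeckendorf words."""
    gen_a, gen_b = congruential(*key_a), congruential(*key_b)
    bits = []
    while len(bits) < n_bits:
        mid_a = midsection(zeckendorf_digits(next(gen_a)))
        mid_b = midsection(zeckendorf_digits(next(gen_b)))
        # Position-wise OR; zip truncates to the shorter midsection (ASSUMED).
        bits.extend(int(a == "1" or b == "1") for a, b in zip(mid_a, mid_b))
    return bits[:n_bits]


# Hypothetical keys (seed, multiplier, prime modulus), for illustration only.
KEY_A = (12345, 16807, 2147483647)
KEY_B = (67890, 48271, 1000000007)
keystream = zpkg_sketch(KEY_A, KEY_B, 2048)
ones_fraction = sum(keystream) / len(keystream)
```

Because both parties hold the same keys, the receiver can regenerate the identical keystream by running the same procedure.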

5. Randomness Analysis

Let be the greatest Fibonacci number no greater than , then the length of the shortest Zeckendorf representation will be . It can be proved that

where denotes the golden ratio. The number of 1’s in is most likely to be [15]:

Algorithm 1: Pseudorandom keystream generation (ZPKG).
Require: a pair of keys; the message length
Ensure: the keystream
  initialize
  for each round do
    generate the next items of the two integral sequences
    convert both integers into Zeckendorf representations
    for each of the two representations do
      count the number of 1s in the sequence
      determine the midsection
      intercept the midsection
    end for
    apply bitwise OR to the two midsections
    take a random piece of the prescribed length and append it to the keystream
  end for
  return the keystream

The probability that a digit ‘1’ lies in the th position in the midsection of is

Similarly, it holds for .

It follows from [15] that as approaches infinity (in practice, would be enough), is expected to approach the limit of (see Figure 2)

Then, the probability that a ‘0’ lies in the th position in both and is readily given as below: where

In this way, is given by

From (26) and (28), it follows that, for , Golomb’s first postulate is sufficiently fulfilled.
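As a worked numerical check of the first postulate, suppose (an assumption of this note) that each midsection digit of either Zeckendorf sequence equals 1 with probability $p = (5-\sqrt{5})/10 \approx 0.2764$, the asymptotic density of 1s in Zeckendorf representations, and that the two sequences are independent. A keystream bit produced by the OR is 0 only when both digits are 0, so

```latex
\begin{align*}
P(0) &= (1-p)^2 = \left(\frac{5+\sqrt{5}}{10}\right)^2 = \frac{3+\sqrt{5}}{10} \approx 0.5236,\\
P(1) &= 1 - P(0) = \frac{7-\sqrt{5}}{10} \approx 0.4764.
\end{align*}
```

Under these assumptions the two symbols are close to balanced, although not exactly equiprobable.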

Golomb’s second postulate does not seem to be fulfilled quite as well. We therefore assess the probabilities , , , and of every possible pair in .

First, we consider the probabilities , , and and note that a ‘1’ is necessarily preceded or followed by ‘0’, because a canonical Zeckendorf representation contains no pair (11) at all. Therefore, we have

We can apply some bitwise logical additions (OR or ) to get each pair of : (1) or or (2) or or (3) or

Next, from (29), (30), and 1-4, we have

6. Experiment

6.1. Randomness Test
6.1.1. Golomb’s Postulate Testing

We take the initial values , , , , . Substituting them into the formula gives , , and . Therefore, . The probability of -runs is obtained by counting their occurrences in and dividing by . Table 1 shows the results. The results of and confirm that (26), (28), and (32) hold for relatively large (); in other words, the experimental estimates are close to the theoretical values when .

We randomly extract a segment of length from and then compute the estimate of for , which gives

where .

A few more cases obtained by varying the parameters , , , and showed insignificant differences from the preceding case, which supports the claim that the ZPKG algorithm satisfies Golomb’s randomness postulates.

6.1.2. NIST-800-22 Statistical Testing

NIST-800-22 [27] is a statistical test suite for random and pseudorandom number generators for cryptographic applications. This test standard was published by the Information Technology Laboratory (ITL) at the National Institute of Standards and Technology (NIST). The test suite describes 16 statistical tests, including the longest run test, cumulative sums, and the linear complexity test, which are useful in detecting deviations of a binary sequence from randomness. The $P$-value summarizes the strength of the evidence against the null hypothesis in each statistical test. If $P$-value ≥ $\alpha$ (the level of significance), then the null hypothesis is accepted; i.e., the sequence appears to be random. If $P$-value < $\alpha$, the null hypothesis is rejected; i.e., the sequence appears nonrandom. Typically, $\alpha$ is chosen in the range $[0.001, 0.01]$. A common value of $\alpha$ in cryptography is about 0.01, based on the NIST-800-22 test standard.

We configured four sets of initial parameters of , , , and . The experimental results of the NIST-800-22 test are shown in Table 2. Table 2 shows that the sequences generated from all four sets of parameters passed all the tests. These sequences show good randomness and meet the requirements of a stream cipher.
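To make the decision rule concrete, the following Python sketch implements the simplest test of the suite, the frequency (monobit) test, and applies the threshold $\alpha = 0.01$. The function name and the 10-bit input are illustrative only; a real evaluation would use much longer sequences and the full suite.

```python
import math


def monobit_p_value(bits) -> float:
    """P-value of the frequency (monobit) test for a sequence of 0/1 values."""
    n = len(bits)
    if n == 0:
        raise ValueError("the sequence must not be empty")
    # Map 0 -> -1 and 1 -> +1, then sum.
    s_n = sum(1 if b else -1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


ALPHA = 0.01  # level of significance

example = [int(c) for c in "1011010101"]
p_value = monobit_p_value(example)
appears_random = p_value >= ALPHA
```

A sequence with far more 1s than 0s (or the reverse) yields a small P-value and is rejected by this test.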

6.2. Performance Evaluation

The $m$-sequence based on a linear feedback shift register (LFSR) is a widely used keystream generator because of its long period, good statistical characteristics, ease of analysis by algebraic methods, and suitability for hardware implementation. Another type is word-based stream ciphers, for example, RC4 [28, 29]. RC4 has a variable key length and is based on word-driven operations using random permutations. Unlike LFSR, RC4 works better in software implementations. RC4 consists of two parts: the key-scheduling algorithm (KSA), which initializes the state permutation from the key, and the pseudorandom generation algorithm (PRGA), which produces the pseudorandom keystream. RC4 has been extensively used in the secure sockets layer/transport layer security (SSL/TLS) and WEP protocols, the latter being part of the IEEE 802.11 wireless LAN standard.

In this section, the ZPKG algorithm, RC4 algorithm, and LFSR algorithm are applied to the encryption of more than 50 images of the CVG-UGR test image set, including gray images, biometric images, medical images, and magnetic resonance images (MRI). The experimental simulation platform is an FPGA, and the simulation software is ModelSim SE-64 10.4.

6.2.1. Hardware Depletion

As shown in Table 3, the ZPKG circuit employs slightly more logic gates and registers than RC4 and LFSR; it occupies the same RAM as LFSR, which is slightly more than RC4. However, the ZPKG circuit uses only 1/3 of the I/O pins of RC4, slightly more than LFSR. Its number of PLLs is similar to that of RC4 but higher than that of LFSR. This shows that the ZPKG and RC4 circuits prioritize hardware resources differently, and both occupy more hardware resources than LFSR. From a power distribution perspective, all three are primarily based on static power. ZPKG has the highest I/O power other than dynamic power (see Figure 3 and Table 4). Since, as shown in Table 3, ZPKG has considerably fewer I/O pins than RC4, this implies that ZPKG performs frequent I/O operations. In addition, the clock power consumption of RC4 is much higher than that of ZPKG and LFSR, which indicates that RC4 requires more clock cycles and suggests that RC4 generates its pseudorandom keystream more slowly than ZPKG.

6.2.2. Key Generation Speed

With a crystal oscillator frequency of 50 MHz, the ZPKG circuit takes 4670 ns to generate a 64-bit pseudorandom keystream, while RC4 takes 9790 ns and LFSR takes 25500 ns (see Table 5). In other words, ZPKG is approximately twice as fast as RC4, and both ZPKG and RC4 are much faster than LFSR.

Twenty simulations were completed, each generating a pseudorandom 64-bit key. Figure 4 presents the statistics, which suggest that the results are stable across simulations.
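For reference, the ratios implied by the three reported times, and the corresponding ZPKG throughput, work out as follows (simple arithmetic on the figures quoted above; the symbols $t_{\mathrm{ZPKG}}$, $t_{\mathrm{RC4}}$, and $t_{\mathrm{LFSR}}$ are introduced here only for this calculation):

```latex
\frac{t_{\mathrm{RC4}}}{t_{\mathrm{ZPKG}}} = \frac{9790}{4670} \approx 2.10, \qquad
\frac{t_{\mathrm{LFSR}}}{t_{\mathrm{ZPKG}}} = \frac{25500}{4670} \approx 5.46, \qquad
\frac{64\ \text{bit}}{4670\ \text{ns}} \approx 13.7\ \text{Mbit/s}.
```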

6.3. Security Analysis

Security is important not only for the encrypted objects but also for the encryption algorithms themselves.

In what follows, we discuss some security issues of the ZPKG algorithm, such as scrambling effect, statistical histogram analysis, key sensitivity testing, robustness, and noise attacks.

6.3.1. Scrambling Effect

Figure 5 shows the results of the ZPKG algorithm encrypting and decrypting different types of grayscale images. The encrypted images are visually close to noise images. The structural similarity (SSIM) index is a quantitative assessment method for measuring the similarity between two images [29]; it reflects whether the original images are completely reconstructed or not. An SSIM value of 1 indicates that the two measured images are identical. SSIM is computed as where

where represents the gray value of the th pixel of the first image, and represents the gray value of the th pixel of the second image.
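As an illustration of how such an index behaves, the Python sketch below computes a single-window (global) SSIM from the pixel means, variances, and covariance of two equally sized grayscale images given as flat lists. The stabilizing constants $C_1 = (0.01 \cdot 255)^2$ and $C_2 = (0.03 \cdot 255)^2$ are the conventional choices and, like the function name and the sample pixel values, are assumptions of this sketch rather than values taken from the paper.

```python
def ssim_global(x, y, dynamic_range: float = 255.0) -> float:
    """Single-window SSIM of two equally sized grayscale images (flat lists)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("images must have the same size and at least 2 pixels")
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2   # conventional constants (assumed)
    c2 = (0.03 * dynamic_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 8-pixel images, for illustration only.
original = [12, 200, 45, 45, 90, 255, 0, 130]
inverted = [255 - v for v in original]

same = ssim_global(original, original)
different = ssim_global(original, inverted)
```

Identical images give an index of 1, while structurally different images give a clearly lower value.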

Figures 5(i)–5(l) show the grayscale images decrypted by ZPKG. Their SSIM index equals 1, which confirms that decryption recovers the original images exactly.

3D images such as color images contain several 2D data matrices called 2D components. Color images, for example, contain three color channels called R, G, and B. Each color channel is a 2D component. In this manner, a 3D image can be considered the combination of several 2D images, and 3D image encryption can be accomplished by encrypting all of its 2D components one by one. Figure 6 shows three examples of the ZPKG algorithm encrypting color images. The encrypted images (Figures 6(d)–6(f)) visually look like noise images.

6.3.2. Histogram Analysis

An image histogram is a graphic representation of the pixel intensity distribution of an image. To withstand statistical attacks, the encrypted image should have a histogram with random behavior and uniform distribution.

The histograms of the images encrypted with the ZPKG algorithm, LFSR algorithm, and RC4 algorithm are completely different from that of the original image (see Figure 7). Nevertheless, the image encrypted with RC4 still retains some visual information of the original image, and the intensity distribution of the corresponding histogram is uneven. By contrast, the histograms of the images encrypted with the LFSR and ZPKG algorithms follow a nearly uniform distribution, indicating that the ZPKG algorithm performs better against statistical attacks than RC4.

6.3.3. Key Sensitivity

We flip one bit of the key and then decrypt the images with the flipped key. The decoded picture is in a state of chaos (see Figure 8) and deviates completely from the original image, which shows that the ZPKG algorithm has good key sensitivity.

6.3.4. Robustness

Communication and networking channels are generally subject to different types of noise. To test the robustness of the ZPKG algorithm against noise attacks, salt and pepper noise with density 0.05 is added to the encrypted images. We then try to reconstruct the original images from these noisy encrypted images. The SSIM index is used to quantitatively evaluate the similarity between the reconstructed images and the original images. The results are shown in Figure 9. The SSIM index of ZPKG (0.9021) is higher than that of RC4 (0.7037 and 0.4548) and LFSR (0.5068), which indicates that ZPKG is more robust against noise attacks.

7. Conclusion

Stream ciphers cannot really work without keystream randomness. We proved that the probability of occurrence of the digit 1 in the midsection of a Zeckendorf representation is constant, which can be used to generate pseudorandom numbers. The pseudorandom numbers generated by the proposed algorithm satisfy Golomb’s randomness postulates. Experimental results show that our method is faster than the RC4 and LFSR algorithms (about twice as fast as RC4 according to Table 5), has no significant difference in hardware occupation, and has high key sensitivity and good resistance to noise attacks and statistical attacks. Our research has its limitations, for example, the lack of theoretical analysis of its cryptographic properties and the lack of comparison with recent research such as chaotic stream ciphering algorithms. We will devote more time to theoretical research on Zeckendorf representation in the future. Furthermore, more experimental investigations are needed to evaluate the performance of ZPKG by comparing it with other stream ciphers such as RC4A, VMPC, and Spritz.

Data Availability

The CVG-UGR test image data used to support the findings of this study are available in the repository (https://ccia.ugr.es/cvg/dbimagenes/).

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work is sponsored by the National Natural Science Foundation of China under grant number 61832014.