How? It is very simple: I will not embed anything in the stego container at all. After all, if we embed nothing, then an empty container is indistinguishable from a stego container, right?
"Wait, but if we embed nothing at all, then we transmit nothing at all!" the reader will reasonably object.
Absolutely right! And embed we will not. There is a way to transmit information without distorting the container. How?
Schematically, hash steganography can be represented like this:
The text explanation of the picture is below the cut.
The idea is very simple. We take a large set of pictures. In the modern world, with its surplus of gadgets, selfie lovers, and cheap storage for the resulting photos, generating a large number of pictures should not, I think, be a particular problem …
If you are too lazy to generate them, there are plenty of websites offering a huge number of photos (and other pictures).
We write a simple Python script and download the data.
Now we take good old MD5, which has deservedly lost the public's trust.
We run all the pictures through the hash function and enter the results in a table:
picture -> hash
- kotik1.jpg -> 0xd131dd02c5e6eec4693d9a0698aff95c
- kotik2.jpg -> 0x55ad340609f4b30283e488832571415a
We fix some integer n.
For example, n=2.
We build a new table by taking the first (or the last, whichever you prefer) n nibbles (half-bytes) of the hash.
For our example:
- kotik1.jpg -> 0xd1
- kotik2.jpg -> 0x55
Since we can no longer do without definitions, let us introduce the concept of a hash code.
The hash code is the first n nibbles of the image's hash.
We will denote the function that computes it by H(...).
For example: H(kotik1.jpg) = 0xd1; H(kotik2.jpg) = 0x55
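A minimal sketch of H in Python, assuming the picture is already loaded as bytes. The name hash_code and the from_end switch are my own illustrative choices, and the hash code is returned as a hex string without the 0x prefix.

```python
import hashlib


def hash_code(data: bytes, n: int = 2, from_end: bool = False) -> str:
    """Return n nibbles (hex digits) of the MD5 hash of data.

    By default the first n nibbles are taken; from_end=True takes the last n.
    """
    if not 1 <= n <= 32:
        raise ValueError("an MD5 hash has 32 nibbles, so n must be in 1..32")
    digest = hashlib.md5(data).hexdigest()  # 32 hex digits, one per nibble
    return digest[-n:] if from_end else digest[:n]
```

For a file on disk, the bytes can be obtained with pathlib.Path("kotik1.jpg").read_bytes() and passed to hash_code.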
We build an index on the hash code in our table. We need to be able to find the required row in the database by its hash code quickly enough, namely in O(log(M)), where M is the number of pictures in the database.
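As a rough sketch, the table and its index could live in SQLite, whose indexes are B-trees and therefore give logarithmic lookups. The table name pictures, the column names, and the helper names build_db and find_picture are assumptions made only for this illustration.

```python
import hashlib
import sqlite3


def hash_code(data: bytes, n: int = 2) -> str:
    """First n nibbles of the MD5 hash, as a hex string."""
    return hashlib.md5(data).hexdigest()[:n]


def build_db(pictures: dict, n: int = 2) -> sqlite3.Connection:
    """Store (name, hash code) rows in memory and index them by hash code.

    pictures maps a file name to the raw bytes of the picture.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pictures (name TEXT, code TEXT)")
    conn.executemany(
        "INSERT INTO pictures (name, code) VALUES (?, ?)",
        [(name, hash_code(data, n)) for name, data in pictures.items()],
    )
    # The index is what makes the lookup by hash code fast.
    conn.execute("CREATE INDEX idx_code ON pictures (code)")
    conn.commit()
    return conn


def find_picture(conn: sqlite3.Connection, word: str):
    """Return the name of any picture whose hash code equals word, or None."""
    row = conn.execute(
        "SELECT name FROM pictures WHERE code = ? LIMIT 1", (word,)
    ).fetchone()
    return row[0] if row else None
```

An ordinary Python dict keyed by hash code would also do for small experiments; the database version only mirrors the table-plus-index wording used here.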
Suppose we want to transmit the message (in hexadecimal) 55d134237598. We split it into groups of n nibbles. Since we chose n=2 for our example, the message 55d134237598 has to be split into groups of 2 nibbles. If we separate the groups with spaces, it looks like this: 55 d1 34 23 75 98. We will call the resulting pieces, each of which contains n nibbles, words. So in total we have 6 words.
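The splitting step can be sketched in a few lines. The function name split_words is mine, and rejecting messages whose length is not a multiple of n is an assumption, since nothing has been said yet about padding.

```python
def split_words(hex_message: str, n: int = 2) -> list:
    """Split a hexadecimal message into words of n nibbles each."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if len(hex_message) % n != 0:
        raise ValueError("message length must be a multiple of n nibbles")
    return [hex_message[i:i + n] for i in range(0, len(hex_message), n)]


words = split_words("55d134237598", 2)
# words is ['55', 'd1', '34', '23', '75', '98']
```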
Next we do the following.
- We take the first word. We look for a picture whose hash code matches our word. If there are several such pictures, we take any of them.
- We send the picture to the channel.
- We take the second word and look for a picture whose hash code matches it.
- We send the picture to the channel.
- We move on to the third word, and so on.
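The steps above can be sketched as follows, with a Python list standing in for the channel and raw bytes standing in for pictures. The names build_index, encode, and decode are illustrative, and raising LookupError when no picture matches is my assumption about what to do in that case.

```python
import hashlib
from collections import defaultdict


def hash_code(data: bytes, n: int = 2) -> str:
    """First n nibbles of the MD5 hash, as a hex string."""
    return hashlib.md5(data).hexdigest()[:n]


def build_index(pool, n: int = 2):
    """Map each hash code to the list of pictures that produce it."""
    index = defaultdict(list)
    for picture in pool:
        index[hash_code(picture, n)].append(picture)
    return index


def encode(words, index):
    """Sender: for every word, pick any picture with a matching hash code."""
    channel = []
    for word in words:
        candidates = index.get(word)
        if not candidates:
            raise LookupError(f"no picture with hash code {word}")
        channel.append(candidates[0])  # any candidate would do
    return channel


def decode(channel, n: int = 2):
    """Receiver: recompute the hash code of every picture, in order."""
    return [hash_code(picture, n) for picture in channel]
```

The pictures themselves are never modified: the receiver recovers the words only by hashing what arrives, in the order it arrives.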
Once more, for clarity, let us look at the GIF that shows the principle of transmitting 3 words: 55 d1 34
Please note that if we take an arbitrary picture from the database, the probability that its hash code matches a randomly chosen word is 1/2^(4*n).
For n=1 this value is 1/16; for n=2 it is already 1/256.
The lower this probability, the more photos have to be placed in the database to satisfy our steganographic needs.
On the other hand, the larger n is, the higher the code rate.
Unfortunately, as n grows, the rate increases linearly while the required number of photos grows exponentially.
For this reason hash steganography is very slow. Nevertheless, it is quite acceptable for transmitting short messages.
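To put numbers on this trade-off, here is a small sketch. It assumes hash codes are uniformly distributed, which is what we expect from MD5 output; the function names are illustrative.

```python
from fractions import Fraction


def match_probability(n: int) -> Fraction:
    """Chance that a random picture's hash code equals a given word: 1/2^(4n)."""
    return Fraction(1, 2 ** (4 * n))


def bits_per_picture(n: int) -> int:
    """Each transmitted picture carries one word of n nibbles, i.e. 4n bits."""
    return 4 * n


def expected_pictures_per_word(n: int) -> int:
    """Mean number of random pictures tried until one matches (1 / probability)."""
    return 2 ** (4 * n)


for n in (1, 2, 3, 4):
    print(n, bits_per_picture(n), match_probability(n), expected_pictures_per_word(n))
```

Going from n=2 to n=4 doubles the payload per picture from 8 to 16 bits, while the expected number of candidates per word grows from 256 to 65536.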
Now let us give a formal description of the stegosystem.
Let us define the stego containers as a set of objects I.
Let us define a set of words S.
Let us define a function H(i) that maps each i from I to a word s from S.
We call this function the hash code.
To transmit the message s1, s2, s3, ..., sm we have to find i1, i2, i3, ..., im
such that H(i1) = s1, H(i2) = s2, ..., H(im) = sm.
That is the whole mathematical model. As you can see, there is nothing complicated.
So as not to go through pictures for each word s until H(i) equals s,
it makes sense to select a fairly large number of images in advance, compute their hash codes, store them in a database, and build an index on the hash codes.
How many images are needed so that any message s1, s2, s3, ..., sm of length m can be transmitted with probability 99.999%? Let us denote this number by M.
For me this question is open … I see no way of computing M from m and n other than the Monte Carlo method.
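A rough Monte Carlo sketch of that estimate follows. It rests on several assumptions of mine: hash codes are uniform over the 16^n possible values, the message is random, and a picture may be reused for repeated words. The function name estimate_success and its parameters are illustrative only.

```python
import random


def estimate_success(M: int, m: int, n: int, trials: int = 2000, seed: int = 0) -> float:
    """Estimate the probability that a random m-word message can be sent
    with a database of M random pictures (pictures may be reused)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    codes = 16 ** n            # number of distinct hash codes
    successes = 0
    for _ in range(trials):
        # Hash codes present in a randomly filled database of M pictures.
        available = {rng.randrange(codes) for _ in range(M)}
        # The message succeeds if every one of its m words has a picture.
        if all(rng.randrange(codes) in available for _ in range(m)):
            successes += 1
    return successes / trials
```

To look for M one could increase it step by step until the estimate reaches the target. Resolving a target as tight as 99.999% needs far more than the default 2000 trials, so this is only a starting point.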
Note that our hash steganography is typical pure steganography, so it would be a good idea to encrypt the data. For example, a stream cipher can be used; in this case, before the stego message is fed to the input of the stegosystem, it has to be combined with a keystream (gamma). For example, the message 55 d1 34 23 75 98 can be encrypted with the Vernam cipher, and the already encrypted sequence is then fed to the input of the stegosystem.
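A minimal sketch of the Vernam step, assuming the gamma is a shared secret exactly as long as the message. The gamma value below is an arbitrary illustration, not a recommendation; in practice it must be random and never reused.

```python
def vernam(data: bytes, gamma: bytes) -> bytes:
    """XOR data with a gamma of the same length (the same call decrypts)."""
    if len(data) != len(gamma):
        raise ValueError("gamma must be exactly as long as the data")
    return bytes(d ^ g for d, g in zip(data, gamma))


message = bytes.fromhex("55d134237598")
gamma = bytes.fromhex("0f0f0f0f0f0f")   # illustrative only
ciphertext = vernam(message, gamma)     # this is what gets split into words
restored = vernam(ciphertext, gamma)    # the receiver applies the same gamma
```

The receiver first recovers the words from the hash codes and only then removes the gamma.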
Another upgrade, important in my opinion, is to remove a picture from the database after it has been used, so that it is not reused later.
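A sketch of this use-once policy, assuming the index is a plain dict that maps a hash code to a list of pictures. The name encode_once and the choice to raise LookupError when a code is exhausted are mine.

```python
def encode_once(words, index):
    """Pick a picture for every word and remove it from the index,
    so that no picture is ever transmitted twice."""
    channel = []
    for word in words:
        candidates = index.get(word)
        if not candidates:
            raise LookupError(f"pictures with hash code {word} have run out")
        channel.append(candidates.pop())  # pop() removes the picture for good
    return channel
```

With this variant the database drains as messages are sent, so a repeated word needs a fresh picture every time.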
One can also try to play with probabilities and with the reliability of the system. For example, what happens if we want to transmit the word 0x55, but the images with hash code 0x55 in our database have already run out?
So far there are two options: either transmit with an error, or somehow generate many new photos in the hope that at least one of them has a hash code equal to 0x55 …
However, another option can be devised: use error-correcting coding. Then we can afford several errors, and the ECC will correct them.
And of course, the message can be split not only into nibbles but also into bytes, bits, or digits of any numeral system.
An explanation concerning the channel is also needed. By channel I mean an abstraction in the most general sense of the word. The most important thing is that the order of the words, one after another, must be defined. In principle, one can simply dump the files into a folder, having numbered them. In this case the folder with the files is the channel.
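The folder-as-channel idea might look like this in code. The zero-padded file names and the .bin extension are my assumptions; any naming scheme that preserves the order would do.

```python
import hashlib
from pathlib import Path


def hash_code(data: bytes, n: int = 2) -> str:
    """First n nibbles of the MD5 hash, as a hex string."""
    return hashlib.md5(data).hexdigest()[:n]


def send_to_folder(pictures, folder) -> None:
    """Write the chosen pictures into an existing folder, numbered in order."""
    folder = Path(folder)
    for position, data in enumerate(pictures):
        # Zero padding keeps alphabetical order equal to transmission order.
        (folder / f"{position:06d}.bin").write_bytes(data)


def receive_from_folder(folder, n: int = 2):
    """Read the files back in name order and recover the words."""
    files = sorted(Path(folder).iterdir())
    return [hash_code(f.read_bytes(), n) for f in files]
```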
Offhand, here are several examples of concrete implementations of hash steganography.
- Publishing images on a social network in a certain order.
- Transferring data over the BitTorrent protocol in the required order (so that the hash codes needed to transmit the right words come out).
- If protocol X allows the same data to be encoded in different ways, then choosing the encoding so as to transmit the right words. The ZIP format, for example, would do.
- Absolutely any meaning-preserving substitution in human language until we get the required hash code. For example, the sentence "I went to the wood and saw a gray wolf" can be replaced with the similar meaningful sentence "I walked into the forest and noticed a gray wolf". The meaning is the same, but the hash code will probably be different. We keep making substitutions until we get the hash code we need.
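The last item can be sketched as a search over interchangeable phrasings. The slot lists below are made up for illustration, and find_variant is a hypothetical helper name; with only 16 variants a match is not guaranteed even for n=1.

```python
import hashlib
from itertools import product


def hash_code(text: str, n: int = 1) -> str:
    """First n nibbles of the MD5 hash of the UTF-8 encoded text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:n]


def find_variant(slots, word: str, n: int = 1):
    """Try every combination of alternatives; return the first sentence
    whose hash code equals word, or None if no combination matches."""
    for choice in product(*slots):
        sentence = " ".join(choice)
        if hash_code(sentence, n) == word:
            return sentence
    return None


slots = [
    ["I went", "I walked"],
    ["to the wood", "into the forest"],
    ["and saw", "and noticed"],
    ["a gray wolf", "a grey wolf"],
]
```

The more interchangeable phrasings each slot has, the more variants there are to search, and the larger n can be.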
Let us sum up the results:
- Hash steganography does not distort the container's data. Therefore it is perfectly secure (in Cachin's sense).
- Hash steganography is rather slow. As n grows, the rate increases linearly while the complexity increases exponentially.
- Hash steganography is insensitive to the type of container data. There is no LSB for MP3, and steganography in prosody cannot be applied to JPEG … For hash steganography the only thing that matters is that the data can be represented in digital form.
- Because of points 1, 2, and 3, hash steganography can be quite a good solution for transmitting short messages.
- Since hash steganography is pure steganography (steganography without a key), protecting the data requires encryption.
- Hash steganography can be a "sub-algorithm" of another steganographic algorithm. True, in this case we will most likely have to sacrifice perfection. For example, in LSB the image can be split into U parts. We take a part, change 1 bit of data in it, and compute the hash code of that part. If it matches the word, we move on to the next part and take the next word. If not, we try changing a different bit. Thus we change only U bits of data but transmit 4*U*n bits of data. By increasing n and U, very good results can be achieved.
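A sketch of the LSB variant from the last point, working on one part of the image given as raw bytes. Treating every byte as a sample whose lowest bit may be flipped is a simplification of mine, and embed_word is a hypothetical name.

```python
import hashlib


def hash_code(data: bytes, n: int = 1) -> str:
    """First n nibbles of the MD5 hash, as a hex string."""
    return hashlib.md5(data).hexdigest()[:n]


def embed_word(part: bytes, word: str, n: int = 1):
    """Flip at most one least significant bit of part so that its hash code
    equals word. Returns the modified part, or None if no single flip works."""
    if hash_code(part, n) == word:
        return part  # lucky case: nothing has to be changed
    for i in range(len(part)):
        candidate = bytearray(part)
        candidate[i] ^= 1  # flip the lowest bit of byte i
        if hash_code(bytes(candidate), n) == word:
            return bytes(candidate)
    return None
```

The receiver needs to know how the image was split into parts and simply hashes each part. A part of 64 bytes gives only 64 candidates, so for n=1 a word is occasionally unreachable and larger parts or a fallback are needed.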
This article is a translation of the original post at habrahabr.ru/post/272935/