Within the next laboratory work we with colleagues faced a problem of analysis of a hexadecimal dump of the PNG file. According to the RFC 2083 standard the PNG format stores the pixel data squeezed by algorithm of Deflate. Therefore at analysis of a dump we needed to unpack compressed data algorithm of Inflate.
Analysis is carried out on the following image of the size of 4х4 pixels:
1951 data streams squeezed by means of RFC standard Deflate are stored in PNG in the zlib format of the RFC 1950 standard, we also will use this standard at analysis. The heading zlib defines the Deflate settings. It has the following structure:
1 byte 
1 byte 
4 bytes 

CMF 
FLG 
DICTID 

4 bits 
4 bits 
2 bits 
1 bits 
5 bits 

CINFO 
CM 
FLEVEL 
FDICT 
FCHECK 
In more detail about fields:
 CMF (Compression method and flags). This byte is separated into 2 cells on 4 bits in everyone: CM (Compression method), CINFO (Compression info).
 CM (Compression method). This field defines a method of the compression used in the file. = 8 designates CM that Deflate with a size of window up to 32 kilobyte is used.
 CINFO (Compression info). When CM = 8, CINFO is the logarithm with the basis 2 setting value of the size of a window minus 8 (CINFO = 7, sets the window size, to equal 32 kilobytes).
 FLG (Flags). This byte is separated into parts as follows: FCHECK, FDICT, FLEVEL.
 FCHECK (check bits for CMF and FLG). Let's consider value of expression sum = (CMF * 256 + FLG) as a sixteenbit integral number without sign. This field supplements FLG so that sum value was multiple 31.
 FDICT (Preset dictionary). If this flag is set, then the descriptor of the DICT dictionary follows FLG byte at once. Raspakovshchik can use value of the given field for determination of the dictionary which was used at compression.
 FLEVEL (Compression level). This field is used by special shrinking algorithms. Deflate (CM = 8): 0 — the fastest compression, 1fast compression, 2 — compression by default, 3 — the maximum compression (most slowly). Value of this field is not considered when unpacking. It is used to specify whether it makes sense, additional compression.
After DICT (if FDICT is set) there is a flow of compressed data which length for PNG depends on the CHUNK_LENGTH field of structure of IDAT. At the end of the portion of zlib there is a checksum of ADLER32 calculated according to uncompressed data on algorithm of Adler32.
In our case the heading zlib has the following appearance:
78 DA 
0111 
1000 
11 
0 
11010 
Window size = 32K 
Compression method = Deflate 
Compression level = fastest 
Dictionary is used = false 
Check bits 
From heading we find out that the dictionary is not used (FDICT = 0).
Compressed data for our picture:
63 F8 3F 93 E1 3F 03 C3 CC FF 20 1A C8 00 22 24 0E 58 12 85 33 D3 F8 3F 03 32 07 44 03 00 AA 05 23 77
Bits are read out further continuously on bytes. In byte we go from the last bit to the first (for example, in a data stream consistently there are 2 bytes: 63 F8 (0b01100011 0b11111000), then have to receive the following bit sequence: 1100011000011111).
BFINAL flag specifying whether this portion of data of the last is goes in the first bit in the readout sequence. The following 2 bits of BTYPE specify compression type: 00 — without compression, 01 — compression with the fixed Huffman codes, 10 — compression with dynamic Huffman codes, 11 — value is reserved.
The table for the fixed Huffman codes:
The unpacked value 
Number of bits 
Codes 
base 
0 — 143 
8 
From 00110000 to 10111111 
00110000 
144 — 255 
9 
From 110010000 to 111111111 
110010000 
256 — 279 
7 
From 0000000 to 0010111 
0000000 
280 — 287 
8 
From 11000000 to 11000111 
11000000 
Table of shifts:
Codes 
Additional bits 
Distance 
Codes 
Additional bits 
Distance 
Codes 
Additional bits 
Distance 
0 
0 
1 
10 
4 
33 — 48 
20 
9 
1025 — 1536 
1 
0 
2 
11 
4 
49 — 64 
21 
9 
1537 2048 
2 
0 
3 
12 
5 
65 — 96 
22 
10 
2049 — 3072 
3 
0 
4 
13 
5 
97 — 128 
23 
10 
3073 — 4096 
4 
1 
5, 6 
14 
6 
129 — 192 
24 
11 
4097 — 6144 
5 
1 
7, 8 
15 
6 
193 — 256 
25 
11 
6145 — 8192 
6 
2 
9 — 12 
16 
7 
257 — 384 
26 
12 
8193 — 12288 
7 
2 
13 — 16 
17 
7 
385 — 512 
27 
12 
12289 — 16384 
8 
3 
17 — 24 
18 
8 
513 — 768 
28 
13 
16385 — 24576 
9 
3 
25 — 32 
19 
8 
769 — 1024 
29 
13 
24577 — 32768 
Table of lengths:
Codes 
Additional. Bits 
Length 
Codes 
Additional. Bits 
Length 
Codes 
Additional. Bits 
Length 
257 
0 
3 
267 
1 
15, 16 
277 
4 
67 — 82 
258 
0 
4 
268 
1 
17, 18 
278 
4 
83 — 98 
259 
0 
5 
269 
2 
19 — 22 
279 
4 
99 — 114 
260 
0 
6 
270 
2 
23 — 26 
280 
4 
115 — 130 
261 
0 
7 
271 
2 
27 — 30 
281 
5 
131 — 162 
262 
0 
8 
272 
2 
31 — 34 
282 
5 
163 — 194 
263 
0 
9 
273 
3 
35 — 42 
283 
5 
195 — 226 
264 
0 
10 
274 
3 
43 — 50 
284 
5 
227 — 257 
265 
1 
11, 12 
275 
3 
51 — 58 
285 
0 
258 
266 
1 
13, 14 
276 
3 
59 — 66 



When unpacking data can be submitted by two types: fixed value and length of reverse bias.
Data which we read out have to be transferred to fixed values. The transfer formula is given below:
lit value = data — base + left bound, where
lit value — fixed value,
base — a column from table 1,
left bound — the left number from a column of the unpacked values from table 1.
At data reading, we can unambiguously define to what interval of fixed values from table 1 it corresponds thanks to the fact that prefixes of each interval are different.
For example, we will select 2 bytes of F8 and 3F from our sequence. Sequence of bits which turned out 0001111111111100. Let current position of reading 4, we read out further 7 bits (minimum the number of bits), we receive a prefix 1111111 which corresponds to an interval 2. From this it turns out that length of the readout code is equal to 9.
0b111111111 = 0d511
Using the formula which is stated above, we will receive that number which was coded equally 255 = 511 — 400 (0b110010000) + 144.
Let's consider a case when at us instead of fixed value the data stream contains information on length and reverse bias, i.e. the readout value gets to the 3rd interval. In our option it will be a subsequence:
20 1A C8 (00000100 01011000 00010011).
We read out the first 7 bits (0b0000010) which get to an interval 3. On a formula we transfer to number. It will be number 257 = 1 — 0 + 256. Further we use table 3. The code 257 means that the number of additional bits which needs to be considered equally in 0. And length on the third column is equal to 3. If additional bits are, then they are read out upsidedown. These bits define number which we add to length.
Further we read out 5 bits (0b00101 = 0d5) which define reverse bias. In our case this number 5. From table 2 clear that we should consider 1 additional bit. We read out it upsidedown. It turned out 1 (0d1). We add this number to the minimum distance from a column 3. And it follows from this that our reverse bias is equal 7 + 1 = 8.
What does it mean? For example, we considered 9 values: 0 255 153 0 255 0 0 153 255. The return distance shows, on how many values we should be displaced in this flow back, i.e. we will be on the second value — 255. Length which at us is equal 3 shows that we should take 3 values in from a data stream, since value on which we stand, i.e. 255 153 0. If length is more, than shift, then we return to an initial position and we read out again in an initial order until we consider the amount of values equal to length. Let's say at us length 7 turned out, and reverse bias is equal to 2. We are displaced on 2 values, i.e. our position on penultimate number — 153. The sequence which turned out is equal to 153 255 153 255 153 255 153. Eventually, when the raspakovshchik reads out the next fixed value equal to zero, it completes the work.
Complete analysis of a dump:
01100011 FINAL CHUNK, FIXED HUFFMAN 11111000 48 — 48 + 0 = 0 — FILTER: NONE 00111111 511 — 400 + 144 = 255 10010011 409 — 400 + 144 = 153 11100001 48 — 48 + 0 = 0 00111111 511 — 400 + 144 = 255 00000011 48 — 48 + 0 = 0 11000011 48 — 48 + 0 = 0 11001100 409 — 400 + 144 = 153 11111111 511 — 400 + 144 = 255 00100000 2 — 0 + 256 = 258 LENGTH=4 00011010 5 DISTANCE=7+1=8 11001000 1 — 0 + 256 = 257 LENGTH=3 00000000 6 DISTANCE=9+0=9 00100001 1 — 0 + 256 = 257 LENGTH=3, 2 DISTANCE=3 00100100 9 — 0 + 256 = 265 LENGTH=11+0=11 00001110 7 DISTANCE=13+0=13 01011000 3 — 0 + 256 = 259 LENGTH=5 00010010 9 DISTANCE=25+4=29 10000101 10 — 0 + 256 = 266 LENGTH=13+0=13 00110011 7 DISTANCE=13+0=13 11010011 409 — 400 + 144 = 153 11111000 99 — 48 + 0 = 51 00111111 511 — 400 + 144 = 255 00000011 48 — 48 + 0 = 0 00110010 9 — 0 + 256 = 265 LENGTH=11+1=12 00000111 7 DISTANCE=13+0=13 01000100 2 — 0 + 256 = 258 LENGTH=4 00000011 5 DISTANCE=7+1=8 00000000 0 — 0 + 256 = 256 END 10101010 ADLER 00000101 ADLER 00100011 ADLER 01110111 ADLER 
0 255 153 0 255 0 0 153 255 255 153 0 255 255 0 0 255 0 0 0 153 255 255 153 0 255 255 0 0 255 255 153 0 255 0 255 153 0 255 255 0 0 255 255 153 0 255 0 153 51 255 0 255 0 0 255 255 153 0 255 0 153 51 255 255 153 0 255 
Finally on laboratory work a test and approval of the teacher was passed. And now according to its sentence this article is also written. Searches of similar materials led eventually to standards. We hope that this article already in mother and clear Russian will help persons in need with their own undertakings.
Source image
This article is a translation of the original post at habrahabr.ru/post/274825/
If you have any questions regarding the material covered in the article above, please, contact the original author of the post.
If you have any complaints about this article or you want this article to be deleted, please, drop an email here: sysmagazine.com@gmail.com.We believe that the knowledge, which is available at the most popular Russian IT blog habrahabr.ru, should be accessed by everyone, even though it is poorly translated.
Shared knowledge makes the world better.
Best wishes.