Developers Club geek daily blog

1 year, 6 months ago
Once during an ice cold winter time … exactly a year ago, we had a nontrivial task. There is a screen on electronic ink, there is a MHz processor 16 (yes, in the built-in electronics, especially ultralow energy consumption, also such meet) and at all there is no memory. Well, i.e. kilobytes of 8 RAM and 256 Flash. Kilobytes, Carle. And it is necessary to push several images in these sad kilobytes 800х600 in four shades of gray. Having quickly multiplied in mind 800 on 600 and on 2 bits per pixel we receive 120 thousand bytes. Does not get a little. It is necessary to squeeze.

So we saw a task: "how to squeeze a flat cat"? Why cat? Yes because on cats tested on what to check black-and-white pictures. Not on dollar banknotes.

The first answer sounded so: let's squeeze a cat of RLE. Cat flat … That is ploskotsvetny — only four shades. Of blank spaces on the screen it is complete, i.e. the repeating pixels will be to the devil. Has to contract.

Has to contract — contracted. With the only difficulty: to squeeze to us all the same as, we squeeze on PC, and even on the server. And here we should unclench consistently, a stream image: the byte was pulled out — the byte was displayed. We have no video buffer on 8 kilobytes of RAM, the unclenched cat could not be stored.

Coped. The cat contracts.

Compression of RLE
/****************************************************************************/
/* Common Utilities Library        *            (C) Componentality Oy, 2015 */
/****************************************************************************/
/* RLE compression implementation                                           */
/****************************************************************************/
#include "rle.h"
#include <memory.h>

using namespace Componentality::Common;

// Do 8-bit RLE encoding
void Componentality::Common::RLE_Encode(unsigned char* source, size_t source_size, unsigned char* target, size_t&target_size)
{
	target_size = 0;
	unsigned char previous_character = source[0];
	unsigned char series_counter = 1;
	bool same = false;
	size_t i;
	for (i = 1; i < source_size; i++)
	{
		// If current byte is equal to previous
		if (source[i] == previous_character)
		{
			// If we process sequence of the same characters
			if (same)
			{
				if (series_counter < 127)
					series_counter++;
				else
				{
					target[target_size++] = 0x80 | series_counter;
					target[target_size++] = previous_character;
					series_counter = 1;
				}
			}
			else
			{
				if (series_counter > 1)
				{
					target[target_size++] = series_counter - 1;
					memcpy(target + target_size, source + i - series_counter, series_counter - 1);
					target_size += series_counter - 1;
				}
				series_counter = 2;
				same = true;
			}
		}
		else
		{
			if (same)
			{
				if (series_counter > 1)
				{
					target[target_size++] = 0x80 | series_counter;
					target[target_size++] = previous_character;
					series_counter = 1;
				}
				else
					series_counter += 1;
				same = false;
			}
			else
			{
				if (series_counter > 127)
				{
					target[target_size++] = series_counter - 1;
					memcpy(target + target_size, source + i - (series_counter - 1), series_counter - 1);
					target_size += series_counter - 1;
					series_counter = 1;
				}
				else
					series_counter++;
			}
		}
		previous_character = source[i];
	}
	if (same)
	{
		target[target_size++] = 0x80 | series_counter;
		target[target_size++] = previous_character;
	}
	else
	{
		target[target_size++] = series_counter;
		memcpy(target + target_size, source + i - (series_counter), series_counter);
		target_size += series_counter;
	}
}

// Do buffered RLE decoding
void Componentality::Common::RLE_Decode(unsigned char* source, size_t source_size, unsigned char* target, size_t&target_size)
{
	target_size = 0;
	for (size_t i = 0; i < source_size;)
	{
		unsigned char size = source[i] &~0x80;
		if (source[i] &0x80)
		{
			memset(target + target_size, source[i + 1], size);
			i += 2;
		}
		else
		{
			memcpy(target + target_size, source + i + 1, size);
			i += size + 1;
		}
		target_size += size;
	}
}

// Check where two buffers are different
size_t Componentality::Common::isDiff(unsigned char* left, unsigned char* right, size_t size)
{
	for (size_t i = 0; i < size; i++)
	{
		if (left[i] != right[i])
			return i;
	}
	return (size_t)-1;
}

// Incremental decoding initialization
void Componentality::Common::RLE_InitDecoder(RLE_DECODE* handler, unsigned char* source)
{
	handler->mDecodedPortion = 0;
	handler->mPtr = 0;
	handler->mOffset = 0;
	handler->mSource = source;
}

// Decode next byte incrementally
unsigned char Componentality::Common::RLE_Fetch(RLE_DECODE* handler)
{
	if (handler->mDecodedPortion > handler->mPtr)
	{
		handler->mPtr += 1;
		if (handler->mMode == 0x00)
			handler->mOffset += 1;
		return handler->mSource[handler->mOffset - 1];
	}
	handler->mDecodedPortion = handler->mSource[handler->mOffset] &~0x80;
	handler->mMode = handler->mSource[handler->mOffset] &0x80;
	handler->mOffset += 2;
	handler->mPtr = 1;
	return handler->mSource[handler->mOffset - 1];
}



Well the cat contracts. On average on hospital of time in 4. But we greedy, to us are more dense. A little we have memories, very little, and it is necessary not for only one cats, we still need to build a peer-to-peer network, to store routes and other any nonsense.

Thought once again. So thought, syak thought, decided that LZ77 will approach too, it is only necessary to think up how to unclench data in the flow without intermediate storage. Thought up. It turned out here so:

Compression by embedded-modification of LZ77 (algorithm of Scanback)
/****************************************************************************/
/* Common Utilities Library        *       (C) Componentality Oy, 2014-2015 */
/****************************************************************************/
/* Scanback compression implementation                                      */
/****************************************************************************/
/* Scanback - LZ77 for embedded systems                                     */
/* Designed and developed by Konstantin Khait                               */
/****************************************************************************/

#include "scanback.h"
extern "C"
{
#include "bitmem.h"
}

using namespace Componentality::Common;

// Scan buffer (buf) back from position <index> - 1 for byte <wtf> from <minfind> to <maxfind> index
static unsigned char _sb__findback(unsigned char* buf, unsigned long index, unsigned char wtf, unsigned char minfind, unsigned char maxfind)
{
	unsigned char i;
	for (i = minfind; i < maxfind; i++)
	{
		if (buf[index - i] == wtf)
			return i;
	}
	return 0;
}

// Compare <buf1> and <buf2> for maximum length of <size> and return length of identical fragment
static unsigned char _sb__match(unsigned char* buf1, unsigned char* buf2, unsigned char size)
{
	unsigned char i;
	for (i = 0; i < size; i++)
	{
		if (buf1[i] != buf2[i])
			break;
	}
	return i;
}

// Find maximum matching sequence in buffer <buf> to sequence starting from <index>
// <winsize> - size of window to be scanned in bytes, <matchlen> - maximum length of matching area in bytes, <bufsize> - size of <buf>
SB_PTR _sb_scanback(unsigned char* buf, unsigned long index, unsigned char winsize, unsigned char matchlen, unsigned long bufsize)
{
	SB_PTR result = { 0, 0 };
	unsigned char i;
	if (winsize > index)
		winsize = (unsigned char)index;
	if (matchlen > winsize)
		matchlen = winsize;
	for (i = 1; i < winsize; i++)
	{
		register unsigned char offset = _sb__findback(buf, index, buf[index], i, winsize);
		if (offset)
		{
			register unsigned char matchsize = (unsigned char)(matchlen < (bufsize - index) ? matchlen : bufsize - index);
			if (matchsize > offset)
				matchsize = offset;
			register unsigned char len = _sb__match(buf + index, buf + index - offset, matchsize);
			if (len > result.length)
			{
				result.offset = offset;
				result.length = len;
			}
			i = offset;
		}
	}
	return result;
}

// Do compression of buffer <buf> of size <size> to bitwise memory <mem>. Returns number of produced bits
unsigned long Componentality::Common::SB_Compress(unsigned char* mem, unsigned char* buf, unsigned long size)
{
	unsigned long bitcount = 0, i;
	SB_PTR cptr;

	for (i = 0; i < (1 << LENGTH_BITS); i++)
		mem[i] = buf[i];
	bitcount += (1 << LENGTH_BITS) * 8;

	for (i = 1 << LENGTH_BITS; i < size;)
	{
		cptr = _sb_scanback(buf, i, 1 << WINDOW_BITS, 1 << LENGTH_BITS, size);
		if (!cptr.offset)
		{
			bitmem_put1(mem, bitcount++, 0);
			bitmem_put(mem, bitcount, buf[i], 8);
			bitcount += 8;
			i += 1;
		}
		else
		{
			bitmem_put1(mem, bitcount++, 1);
			bitmem_put(mem, bitcount, cptr.offset - 1, WINDOW_BITS);
			bitcount += WINDOW_BITS;
			bitmem_put(mem, bitcount, cptr.length - 1, LENGTH_BITS);
			bitcount += LENGTH_BITS;
			i += cptr.length;
		}
	}
	return bitcount;
}

// Initialize decoder context
void Componentality::Common::SB_InitDecoder(SB_DECODER* decoder, unsigned char* mem)
{
	decoder->bitindex = 0;
	decoder->mem = mem;
	decoder->total_decoded = 0;
	decoder->index = 0;
	decoder->brb = 0;
}

// Initialize decoder with ringbuffer
void SB_InitDecoderRB(SB_DECODER* decoder, void* ringbuffer)
{
	decoder->bitindex = 0;
	decoder->mem = 0;
	decoder->total_decoded = 0;
	decoder->index = 0;
	decoder->brb = ringbuffer;
}

// Unpack next byte from the packed stream
unsigned char Componentality::Common::SB_Fetch(SB_DECODER* decoder)
{
	register unsigned char result;
	if (decoder->index < (1 << LENGTH_BITS))
	{
		if (!decoder->brb)
			result = decoder->decoded_buf[decoder->index % (1 << WINDOW_BITS)] = decoder->mem[decoder->index];
		else
			result = decoder->decoded_buf[decoder->index % (1 << WINDOW_BITS)] = (unsigned char)bitmem_read_ringbuf((BIT_RINGBUF*)decoder->brb, 8);
		decoder->index += 1;
		decoder->bitindex += 8;
		decoder->total_decoded += 1;
		return result;
	}
	if (decoder->index >= decoder->total_decoded)
	{
		bit isref = bitmem_get1(decoder->mem, decoder->bitindex++);
		if (!isref)
		{
			if (!decoder->brb)
				decoder->decoded_buf[decoder->total_decoded % (1 << WINDOW_BITS)] = (unsigned char)bitmem_get(decoder->mem, decoder->bitindex, 8);
			else
				decoder->decoded_buf[decoder->total_decoded % (1 << WINDOW_BITS)] = (unsigned char)bitmem_read_ringbuf((BIT_RINGBUF*)decoder->brb, 8);;
			decoder->bitindex += 8;
			decoder->total_decoded += 1;
		}
		else
		{
			register SB_PTR ptr;
			register unsigned char i;
			if (!decoder->brb)
				ptr.offset = (unsigned char)bitmem_get(decoder->mem, decoder->bitindex, WINDOW_BITS) + 1;
			else
				ptr.offset = (unsigned char) bitmem_read_ringbuf((BIT_RINGBUF*)decoder->brb, WINDOW_BITS) + 1;
			decoder->bitindex += WINDOW_BITS;
			if (!decoder->brb)
				ptr.length = (unsigned char)bitmem_get(decoder->mem, decoder->bitindex, LENGTH_BITS) + 1;
			else
				ptr.length = (unsigned char) bitmem_read_ringbuf((BIT_RINGBUF*)decoder->brb, LENGTH_BITS) + 1;
			decoder->bitindex += LENGTH_BITS;
			for (i = 0; i < ptr.length; i++)
			{
				register unsigned long srcptr = decoder->total_decoded - ptr.offset;
				decoder->decoded_buf[decoder->total_decoded % (1 << WINDOW_BITS)] = decoder->decoded_buf[srcptr % (1 << WINDOW_BITS)];
				decoder->total_decoded += 1;
			}
		}
	}
	result = decoder->decoded_buf[decoder->index % (1 << WINDOW_BITS)];
	decoder->index += 1;
	return result;
}


Especially we were pleased by footprint for a decompression, to approximately equal 150 bytes (at algorithm "window" in 127 bytes). Originally in algorithms Lempelya-Ziva strongly confused us need of memory allocation under the dictionary. RLE that the dictionary at all without need … But you will not frighten us of 150 bytes.

Another frightened us — in spite of the fact that from the theory it is known that LZ77 is a generalization of RLE, replacement of the second by the first gave improvement of result on the verge of a statistical error: sometimes better, sometimes worse, but in general the same compression ratio, what parameters do not set.

Began to think about entropy methods: Huffman, arithmetic coding, wrote couple of prototypes … did not think up. All decompressors need tables quite decent, and to our measures — really indecent size.

And then … And then we started compression of scanback'om over RLE. And, about a miracle, the compression ratio with 3-4 jumped up till 7-10 depending on "a wooliness of a cat", that is from degree of a ploskotsvetnost of the picture and the number of gradient areas. It is possible to live. And, the most important, RLE + SB perfectly is unclenched by a stream decompressor in one pass.

Here so:

Stream decompressor of RLE + Scanback
/****************************************************************************/
/* Common Utilities Library        *            (C) Componentality Oy, 2015 */
/****************************************************************************/
/* Combined RLE + Scanback implementation (compression is to be done        */
/* sequentially, decompression is optimized                                 */
/****************************************************************************/
#include "rlesbc.h"

using namespace Componentality::Common;

// Decode next byte incrementally for stream compressed by both RLE and Scanback
unsigned char Componentality::Common::RLESB_Fetch(RLE_DECODE* handler, SB_DECODER* sb_handler, unsigned char* repeating_value)
{
	if (handler->mDecodedPortion > handler->mPtr)
	{
		handler->mPtr += 1;
		if (handler->mMode == 0x00)
			*repeating_value = SB_Fetch(sb_handler);
		return *repeating_value;
	}
	*repeating_value = SB_Fetch(sb_handler);
	handler->mDecodedPortion = *repeating_value &~0x80;
	handler->mMode = *repeating_value &0x80;
	*repeating_value = SB_Fetch(sb_handler);
	handler->mPtr = 1;
	return *repeating_value;
}


Already a year as our cats perfectly live almost "in complete unconsciousness", to spite of any ZLIB'AM. Which, certainly, press far more densely, but also resources consume incomparably more. And we found out meanwhile that our RLE + perfectly is suitable SB for many other tasks, for example, it bitmap fonts, even with transparency and an antialiasing, and also any network text traffic perfectly contract. Naturally, degree of a compression makes ridiculous 2.5-6, but and resources are almost not consumed, especially on a release which is, as a rule, executed more often, and to speed memory where is more critical.

So we decided not to cheap out and upload a code publicly (on condition of observance of the license MIT). Suddenly and to you it is necessary to unclench something in the conditions of catastrophic memory contention and a processor resource.

This article is a translation of the original post at habrahabr.ru/post/273047/
If you have any questions regarding the material covered in the article above, please, contact the original author of the post.
If you have any complaints about this article or you want this article to be deleted, please, drop an email here: sysmagazine.com@gmail.com.

We believe that the knowledge, which is available at the most popular Russian IT blog habrahabr.ru, should be accessed by everyone, even though it is poorly translated.
Shared knowledge makes the world better.
Best wishes.

comments powered by Disqus