
3 years, 7 months ago
Testing in C

Rather recently an article "Semi-automatic registration of unit tests in pure C" appeared, in which the author solved the problem using counters from Boost. Following the same principle, a (successful) attempt was made to repeat the experiment without Boost, because a Boost dependency in a C project is illogical, all the more so for such a small piece of functionality. As a result, the tests ended up with plenty of auxiliary preprocessor directives. Everything would have stayed that way, but practically at the final stage an alternative way of registration was found that gets rid of the extra actions completely. It is a C89 solution for registering tests, plus a solution for registering test fixtures that is a bit more demanding of the build system.

The motivation for all this is simple and clear, but for completeness it is worth stating briefly. Without auto-registration one has to deal either with copy-pasting repetitive code or with generators external to the compiler. The first is tedious and error-prone; the second adds extra dependencies and complicates the build process. Using C++ in tests just for this feature, when everything else is written in C, feels like shooting sparrows with a cannon. Besides, it is interesting in principle to solve the problem at the same level at which it arose.

Let's define the ultimate goal as something similar to the code below, with the additional condition that test names are not repeated anywhere except at the place of their definition, i.e. they are typed once and only once and are not copied afterwards by any generator.

TEST(test1) { /* Do the test. */ }
TEST(test2) { /* Do the test. */ }

After a small digression to make the terminology precise, we can start looking for a solution.

Terminology and expected structure of tests


Different test frameworks use inconsistent words for individual tests and their groups. Therefore let us define some words explicitly, and at the same time show their meaning using a fairly common test structure as an example.

A test collection ("suite") is understood as a group of test sets ("fixtures"). It is the largest structural unit of the hierarchy. Fixtures, in turn, group the tests of a suite. Tests are the smallest units. The number of elements of each type is arbitrary.

The same, graphically:

[Figure: the hierarchy of a suite, its fixtures and their tests]

Each higher level groups elements of the level below and optionally adds preparation ("setup") and cleanup ("teardown") procedures for the tests.
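To make the hierarchy concrete, here is a minimal runtime sketch. The types `struct suite` and `struct fixture` and the function `run_suite()` are hypothetical, not taken from seatest or stic, and the choice to run fixture hooks around every test (and suite hooks once around the whole run) is an assumption; frameworks differ on this point.

```c
#include <stddef.h>

/* Hypothetical types modelling suite -> fixtures -> tests. */
typedef void hook_func(void);

struct fixture {
    hook_func *setup;                /* optional, may be NULL */
    hook_func *teardown;             /* optional, may be NULL */
    hook_func **tests;               /* NULL-terminated list */
};

struct suite {
    hook_func *setup;                /* optional, may be NULL */
    hook_func *teardown;             /* optional, may be NULL */
    const struct fixture **fixtures; /* NULL-terminated list */
};

static void call_if_set(hook_func *f)
{
    if (f != NULL) {
        f();
    }
}

/* Suite hooks wrap the whole run, fixture hooks wrap every test.
 * Returns the number of tests executed. */
int run_suite(const struct suite *s)
{
    const struct fixture **fx;
    hook_func **t;
    int count = 0;

    call_if_set(s->setup);
    for (fx = s->fixtures; *fx != NULL; ++fx) {
        for (t = (*fx)->tests; *t != NULL; ++t) {
            call_if_set((*fx)->setup);
            (*t)();
            call_if_set((*fx)->teardown);
            ++count;
        }
    }
    call_if_set(s->teardown);
    return count;
}

/* Example data: one suite, one fixture, two tests; every call is logged. */
static char trace[16];
static int trace_len;
static void mark(char c) { if (trace_len < 15) trace[trace_len++] = c; }

static void suite_setup(void)   { mark('S'); }
static void fixture_setup(void) { mark('s'); }
static void test_a(void)        { mark('a'); }
static void test_b(void)        { mark('b'); }

static hook_func *addition_tests[] = { test_a, test_b, NULL };
static const struct fixture addition = { fixture_setup, NULL, addition_tests };
static const struct fixture *fixtures[] = { &addition, NULL };
static const struct suite demo_suite = { suite_setup, NULL, fixtures };

int run_demo(void) { return run_suite(&demo_suite); }
const char *demo_trace(void) { return trace; }
```

After one call to `run_demo()` the trace reads "Ssasb": suite setup once, then fixture setup before each of the two tests.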

Registering tests in fixtures

Never let your sense of morals prevent you from doing what is right.
— ISAAC ASIMOV, Foundation

Individual tests are added more often than whole fixtures, so auto-registration matters more for them. Also, all of them are located within one translation unit, which simplifies the task.

So, the list of tests has to be stored by means of the language itself, without using the preprocessor as the main controlling element. Giving up the preprocessor means we are left without explicit counters. But a counter is almost mandatory if tests need to be uniquely identified and, in general, addressed somehow rather than merely declared. The built-in macro __LINE__ is always at hand, though; we just need to figure out how to apply it here. There is one more restriction: explicit assignment to elements of a global array, along the lines of

test_type tests[];

static void test(void) { /* Do the test. */ }
tests[__LINE__] = &test;

does not work, because such operations are simply not supported outside of functions at the language level. The starting point does not look very bright:

  1. There is no way to store either intermediate or final state.
  2. There is no way to define unconnected elements and then collect them.
  3. As a result, there is no way to define a connected structure (an array in general, though a list would do too, if only there were a way), because it is impossible to refer to the previous entity.

But not everything is as hopeless as it may seem. Let's imagine the ideal case, as if we had what is missing. Then the code after expanding the helper macros could look approximately as follows:

MagicDataStructure MDS;

static void test1(void) { /* Do the test. */ }
MDS[__LINE__] = &test1;

static void test2(void) { /* Do the test. */ }
MDS[__LINE__] = &test2;

static void fixture(void)
{
    int i;
    for (i = 0; i < MDS.length; ++i) {
        MDS.func[i]();
    }
}

So the task is simple: implement a "magic" structure which, by the way, looks suspiciously like an array of predetermined size. It makes sense to think about what we would do if it really were an array:

  1. Define the array, initializing all elements to NULL.
  2. Assign values to individual elements.
  3. Walk the whole array and call every non-NULL element.
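With an ordinary array the three steps are trivial, as long as they run inside a function. The sketch below (all names hypothetical) does exactly that; step 2 is precisely the kind of assignment C does not allow at file scope, which is what the rest of this section has to emulate.

```c
#include <stddef.h>

/* Illustrative size: the indices play the role of line numbers. */
#define MAX_LINES 16

typedef void test_func(void);

static void first(void)  { /* Do the test. */ }
static void second(void) { /* Do the test. */ }

/* Returns how many tests were called. */
int run_registered(void)
{
    /* 1. Define the array; the initializer makes every element NULL. */
    test_func *tests[MAX_LINES] = { NULL };
    int i, ran = 0;

    /* 2. Assign individual elements (legal here, inside a function). */
    tests[4] = &first;
    tests[9] = &second;

    /* 3. Walk the whole array and call every non-NULL element. */
    for (i = 0; i < MAX_LINES; ++i) {
        if (tests[i] != NULL) {
            tests[i]();
            ++ran;
        }
    }
    return ran;
}
```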

This set of operations is all we need, and it does not look too unrealistic; perhaps arrays really will come in handy here. By definition, an array is a set of elements of the same type. Normally it is a single entity supporting indexing, but the same array can also be viewed as a group of separate elements. That is, either

int arr[4];

or

int arr0, arr1, arr2, arr3;

At this point, given the earlier mention of the __LINE__ macro, it should already be clear where the author is heading. We need to figure out how to implement a pseudo-array that supports assignment at compile time. This is an entertaining exercise, so let's hold off a little longer on showing the finished solution and ask the following questions:

  1. What entity in C can appear more than once without causing a compilation error?
  2. What can the compiler interpret differently depending on context?

Think of header files. After all, what they contain usually also appears somewhere else in the code. For example:

/* file.h */
int a;

/* file.c */
#include "file.h"
int a = 4;
/* ... */

And everything works perfectly. Here is an example closer to the task:

static void run(void);

int main(int argc, char *argv[])
{
    run();
    return 0;
}

static void run(void) { /* ... */ }

A quite ordinary piece of code, which can be extended a little to get the desired functionality:

#include <stdio.h>

static void (*run_func)(void);

int main(int argc, char *argv[])
{
    if (run_func) run_func();
    return 0;
}

static void run(void) { puts("Run!"); }
static void (*run_func)(void) = &run;

The reader is invited to verify that moving or commenting out the last mention of run_func behaves as expected: if run_func has not been reassigned, the only element of this "single-element array" is NULL; otherwise it points to the function run(). This independence from order is an important property, because it allows hiding all the "magic" in a header file.

From here it is easy to write an auto-registration macro that declares a test function and stores a pointer to it in a variable numbered by the value of the __LINE__ macro. Besides the macro, all possible names of the pointer variables must be listed and called one by one. Here is an almost complete solution, apart from the "extra" code that should be hidden in a header file, but that is a detail:

/* test.h */
#define CAT(X, Y) CAT_(X, Y)
#define CAT_(X, Y) X##Y

typedef void test_func_type(void);

#define TEST(name) \
    static test_func_type CAT(name, __LINE__); \
    static test_func_type *CAT(test_at_, __LINE__) = &CAT(name, __LINE__); \
    static void CAT(name, __LINE__)(void)

/* test.c */
#include "test.h"
#include <stdio.h>

TEST(A) { puts("Test1"); }
TEST(B) { puts("Test2"); }
TEST(C) { puts("Test3"); }

typedef test_func_type *test_func_pointer;
static test_func_pointer test_at_1, test_at_2, test_at_3, test_at_4, test_at_5, test_at_6;
int main(int argc, char *argv[])
{
    /* This is a simplified version for clarity; in practice the pointers
     * should be placed in an array. */
    if (test_at_1) test_at_1();
    if (test_at_2) test_at_2();
    if (test_at_3) test_at_3();
    if (test_at_4) test_at_4();
    if (test_at_5) test_at_5();
    if (test_at_6) test_at_6();
    return 0;
}

For clarity, it may help to look at the result of macro substitution. It also shows that more than one test cannot be placed on a single line, which is, however, perfectly acceptable:

static test_func_type A4; static test_func_type *test_at_4 = &A4; static void A4(void) { puts("Test1"); }
static test_func_type B5; static test_func_type *test_at_5 = &B5; static void B5(void) { puts("Test2"); }
static test_func_type C6; static test_func_type *test_at_6 = &C6; static void C6(void) { puts("Test3"); }

A link to the full implementation is given below.

Why it works


Now it is time to look in more detail at what happens here and to answer the question of why it works.

Recalling the header example, we can single out several ways in which a variable can appear in code:

int data = 0;    /* (1) */
extern int data; /* (2) */
int data;        /* (3) */

(1) is unambiguously a definition (and therefore also a declaration) because of the initializer.

(2) is only declaration.

(3) (our case) is a declaration and, possibly, a definition; the C standard calls this a tentative definition. With neither the extern keyword nor an initializer, the compiler has no choice but to postpone deciding what kind of statement this is. This "hesitation" of the compiler is what we exploit to emulate auto-registration.

Just in case, a few more examples with comments to make the situation completely clear:

int data1; /* Definition, since it appears nowhere else. */

int data2 = 1; /* Definition, because of the initializer. */
int data2;     /* Declaration, since the definition has already appeared. */

int data3;     /* Unknown at first, but after the next line is processed it
                * becomes clear that this is a declaration. */
int data3 = 1; /* Definition, because of the initializer. */

/* The static keyword changes nothing in this respect. */
static int data4;     /* Unknown at first, but after the next line is
                       * processed it becomes clear that this is a declaration. */
static int data4 = 1; /* Definition, because of the initializer. */
static int data4;     /* Declaration, since the definition has already appeared. */

int data5; /* Unknown, but with no definitions present it counts as a definition. */
int data5; /* Likewise; these two "unknowns" count as one. */

int data6 = 0; /* Definition, because of the initializer. */
int data6 = 0; /* Error: redefinition. */

Two cases matter for us:

  • There are only declarations. The variable is then initialized to zero, which tells us that there is no test on the corresponding line.
  • There is at least one declaration and exactly one definition. The variable then holds the address of the test function.
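Both cases can be checked with a small sketch in which two function-pointer "slots" stand in for two lines of a test file: `slot_a` only ever gets declarations, while `slot_b` gets exactly one definition. The names are hypothetical, and the "header" and "test file" parts are merged into one file so that the sketch compiles on its own.

```c
#include <stddef.h>

typedef void test_func(void);

static int test_b_calls;
static void test_b(void) { ++test_b_calls; }

/* "Header" part: declarations only.  Without a definition elsewhere each
 * becomes a definition initialized to zero, i.e. a null pointer. */
static test_func *slot_a;
static test_func *slot_b;

/* "Test file" part: exactly one definition, for slot_b only. */
static test_func *slot_b = &test_b;

/* Further declarations after the definition are still allowed. */
static test_func *slot_b;

/* Calls every non-null slot and returns how many were called. */
int run_slots(void)
{
    int ran = 0;
    if (slot_a != NULL) { slot_a(); ++ran; }
    if (slot_b != NULL) { slot_b(); ++ran; }
    return ran;
}

int slot_a_is_null(void) { return slot_a == NULL; }
int test_b_call_count(void) { return test_b_calls; }
```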

That is, in fact, everything needed to implement the required operations and get working automatic registration. This dual nature of some statements lets us unroll an array element by element and "assign" values to some of its elements.

Features and shortcomings


Clearly, unless we want to insert a macro at the end of every test file to mark its last line, we have to settle in advance on some maximum number of lines. Not the best option, but not the worst. Say, a single test file will hardly contain more than a thousand lines, so this can be chosen as the upper bound. There is one unpleasant point: if tests are defined on a line with a number greater than 1000, they become dead weight and are never called. Fortunately, there is a simple fix: it is enough to compile tests with the -Werror flag (a less strict option: -Werror=unused-function), and such files will not compile. (UPD2: commenters suggested a simpler way to resolve this issue that stops compilation automatically using STATIC_ASSERT: it is enough to insert a check of the allowed __LINE__ value into each TEST macro.)
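A possible shape of the STATIC_ASSERT check from UPD2, as a minimal sketch. To stay short it assumes a limit of 99 lines instead of 1000, generates the slot declarations with helper macros (`ROW`, `EACH_LINE`, `DECLARE_SLOT`, all hypothetical) instead of listing them by hand, and uses a C89-style negative-array-size assertion; stic's actual macros may differ.

```c
#include <stddef.h>

#define CAT(X, Y) CAT_(X, Y)
#define CAT_(X, Y) X##Y

/* C89-style compile-time check: a negative array size stops compilation. */
#define STATIC_ASSERT(name, cond) typedef char name[(cond) ? 1 : -1]

/* Illustrative bound: the pseudo-array below covers lines 1..99. */
#define MAX_TEST_LINE 99

typedef void test_func_type(void);

/* EACH_LINE(M) expands to M(1) M(2) ... M(99). */
#define ROW(M, n) M(n##0) M(n##1) M(n##2) M(n##3) M(n##4) \
                  M(n##5) M(n##6) M(n##7) M(n##8) M(n##9)
#define EACH_LINE(M) M(1) M(2) M(3) M(4) M(5) M(6) M(7) M(8) M(9) \
    ROW(M, 1) ROW(M, 2) ROW(M, 3) ROW(M, 4) ROW(M, 5) \
    ROW(M, 6) ROW(M, 7) ROW(M, 8) ROW(M, 9)

/* One declaration per line: stays a null pointer unless TEST() defines it. */
#define DECLARE_SLOT(n) static test_func_type *test_at_##n;
EACH_LINE(DECLARE_SLOT)

/* TEST() refuses to compile past MAX_TEST_LINE instead of silently
 * registering into a slot that is never called. */
#define TEST(name) \
    STATIC_ASSERT(CAT(test_line_ok_, __LINE__), __LINE__ <= MAX_TEST_LINE); \
    static test_func_type CAT(name, __LINE__); \
    static test_func_type *CAT(test_at_, __LINE__) = &CAT(name, __LINE__); \
    static void CAT(name, __LINE__)(void)

TEST(first)  { /* Do the test. */ }
TEST(second) { /* Do the test. */ }

/* Pointers to every slot, so the loop below can see every line. */
#define SLOT_ADDRESS(n) &test_at_##n,
static test_func_type **const all_tests[] = { EACH_LINE(SLOT_ADDRESS) };

/* Calls every registered test and returns how many ran. */
int run_all_tests(void)
{
    size_t i;
    int ran = 0;
    for (i = 0; i < sizeof all_tests / sizeof all_tests[0]; ++i) {
        if (*all_tests[i] != NULL) {
            (*all_tests[i])();
            ++ran;
        }
    }
    return ran;
}
```

Moving a TEST() to line 100 or later would make its array size -1 and stop the build, rather than leaving the test silently unregistered.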

The sufficiency of a fixed-size array is not the only reason to fix the maximum number of lines in advance. Otherwise, the corresponding declarations (at the place where the tests are called) would have to be generated at compile time, which can noticeably slow compilation down (this is not a guess but the result of actual attempts). It is simpler not to complicate things here: the benefit of being able to compile files of any size apparently is not worth it.

The TEST() macro example above uses a function pointer. That is only one piece of information about a test, and most likely you will want to add more. The wrong way to do it is to add parallel pseudo-arrays, which will only increase compilation time. The right way is to use a structure; then adding new fields comes almost for free.

To actually process the elements of the pseudo-array (rather than copy code for each of them), a real array has to be created. Placing the function pointer values themselves into this array (or copying the structures with test information) is not the best solution, because it makes the initializer non-constant. Placing pointers to the pointers, on the other hand, allows the array to be static, which frees the compiler from generating code that fills it on the stack at run time and also reduces compilation time.
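Both recommendations can be combined in one sketch: each pseudo-array element is a structure, and the real array holds pointers to those structures, so it can be static with a constant initializer. The `struct test_info` fields, the helper macros and `run_tests()` are hypothetical, and the 99-line bound is chosen only to keep the example short.

```c
#include <stddef.h>
#include <string.h>

#define CAT(X, Y) CAT_(X, Y)
#define CAT_(X, Y) X##Y

typedef void test_func_type(void);

/* One record per pseudo-array element; the field set is illustrative.
 * Adding a field later does not add another pseudo-array. */
struct test_info {
    const char *name;
    test_func_type *func;
};

/* EACH_LINE(M) expands to M(1) M(2) ... M(99). */
#define ROW(M, n) M(n##0) M(n##1) M(n##2) M(n##3) M(n##4) \
                  M(n##5) M(n##6) M(n##7) M(n##8) M(n##9)
#define EACH_LINE(M) M(1) M(2) M(3) M(4) M(5) M(6) M(7) M(8) M(9) \
    ROW(M, 1) ROW(M, 2) ROW(M, 3) ROW(M, 4) ROW(M, 5) \
    ROW(M, 6) ROW(M, 7) ROW(M, 8) ROW(M, 9)

/* Declarations only: each record is all-zero (func == NULL) by default. */
#define DECLARE_SLOT(n) static struct test_info test_at_##n;
EACH_LINE(DECLARE_SLOT)

/* &test_at_N is an address constant, so this array of pointers can be
 * static; an array of copies of the records could not be, because the
 * value of another object is not a constant expression. */
#define SLOT_ADDRESS(n) &test_at_##n,
static const struct test_info *const all_tests[] = { EACH_LINE(SLOT_ADDRESS) };

#define TEST(name) \
    static test_func_type CAT(name, __LINE__); \
    static struct test_info CAT(test_at_, __LINE__) = \
        { #name, &CAT(name, __LINE__) }; \
    static void CAT(name, __LINE__)(void)

TEST(adds_fine_first_time)      { /* ... */ }
TEST(errors_on_second_addition) { /* ... */ }

/* Runs every registered test, or only the one named `only` when it is not
 * NULL; returns how many tests ran. */
int run_tests(const char *only)
{
    size_t i;
    int ran = 0;
    for (i = 0; i < sizeof all_tests / sizeof all_tests[0]; ++i) {
        const struct test_info *t = all_tests[i];
        if (t->func == NULL) {
            continue;
        }
        if (only != NULL && strcmp(only, t->name) != 0) {
            continue;
        }
        t->func();
        ++ran;
    }
    return ran;
}
```

The `name` field is what makes the selective run possible; further fields (a file name, a predicate) would be added to the same structure.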

This solution was originally created for transparent registration of setup()/teardown() functions and only later applied to tests. In principle it suits any functionality that can be overridden: it is enough to insert a declaration of a pointer and provide a macro that redefines it. If the macro is not used, the pointer is null; otherwise it holds the value defined by the user.
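The same mechanism applied to setup()/teardown(), as a minimal sketch: the "header" declares the hook pointers without initializers, and an override macro supplies the definition. The macro names follow the SETUP()/TEARDOWN() style used later in this article, but the expansion shown here is an assumption, not stic's code.

```c
#include <stddef.h>

typedef void hook_type(void);

/* "Header" part: declarations only, so both hooks are null by default. */
static hook_type *fixture_setup;
static hook_type *fixture_teardown;

/* Override macros: each defines the pointer and opens the hook's body. */
#define SETUP() \
    static hook_type setup_body; \
    static hook_type *fixture_setup = &setup_body; \
    static void setup_body(void)

#define TEARDOWN() \
    static hook_type teardown_body; \
    static hook_type *fixture_teardown = &teardown_body; \
    static void teardown_body(void)

/* "Test file" part: only SETUP() is used; TEARDOWN() is not. */
static int setup_calls;
SETUP() { ++setup_calls; }

/* What a runner might do around each test. */
static void run_with_hooks(hook_type *test)
{
    if (fixture_setup != NULL) fixture_setup();
    test();
    if (fixture_teardown != NULL) fixture_teardown();
}

static void sample_test(void) { /* Do the test. */ }

void run_sample(void) { run_with_hooks(&sample_test); }
int setup_call_count(void) { return setup_calls; }
int teardown_is_null(void) { return fixture_teardown == NULL; }
```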

Compiler error messages about top-level errors in tests can be surprisingly long, but this happens only in fairly exceptional cases, such as a missing terminating semicolon and similar syntax errors.

Finally, we can evaluate the result of the effort:
Test fixture before:
static void
teardown(void)
{
    /* ... */
}

static void
test_adds_fine_first_time(void)
{
    /* ... */
}

static void
test_errors_on_second_addition(void)
{
    /* ... */
}

void
addition_tests(void)
{
    test_fixture_start();

    fixture_teardown(teardown);

    run_test(test_adds_fine_first_time);
    run_test(test_errors_on_second_addition);

    test_fixture_end();
}
Test fixture after:
TEARDOWN()
{
    /* ... */
}

TEST(adds_fine_first_time)
{
    /* ... */
}

TEST(errors_on_second_addition)
{
    /* ... */
}

Registering fixtures in suites

A trick is a clever idea that can be used once, while a technique is a trick that can be used at least twice.
— D. KNUTH, The Art Of Computer Programming 4A

This task is similar to the previous one, but there are a couple of essential differences:

  1. The symbols of interest (functions/data) are defined in different translation units.
  2. As a consequence, there is no counter similar to __LINE__.

Because of the first point, the trick from the previous section will not work here in its pure form, but the main idea stays the same, while the means of implementing it change a little.

As mentioned at the beginning, this part imposes an additional requirement on the environment, namely on the build system, which must be able to assign each file an identifier in the range [0, N), where N is the maximum number of fixtures. This is an upper bound, but, say, a hundred fixtures per suite should be enough for most purposes.

Last time the compiler did all the "dirty work" for us; this time it is the linker's turn. In each translation unit an entry point has to be defined using the file's id, and the suite's main file has to check these symbols for presence and call them.

One possible option is to use "weak symbols". In this case the functions are defined as usual almost everywhere, but in the main file they are marked with the weak attribute (something like __attribute__((weak))). The obvious drawback is that both the compiler and the linker must support weak symbols.

If you think a little about how weak symbols work, their similarity to function pointers becomes apparent: undefined weak symbols are equal to zero. It turns out we can do without them altogether: it is enough to define function pointers as before, but without the static keyword. Using explicit pointers also has an extra benefit: no automatically generated names appear in stack traces.

With that, the first difference from registering tests can be considered reduced to an already known solution. What remains is to define an order relation between translation units. A single file does not contain enough information for this, so the information has to come from outside. The implementation details will differ for each build system; an example for GNU Make is given below.

Defining the order is rather trivial: let it be the position of the file name in the sorted list of all files that make up the suite. There is no need to worry about helper files without tests: they will not get in the way and at most will create gaps in the numbering, which is insignificant. This information is passed in as a macro definition via a compiler flag (-D in this case).

Here is the function that computes the identifier:

pos = $(strip $(eval T := ) \
              $(eval i := 0) \
              $(foreach elem, $1, \
                        $(if $(filter $2,$(elem)), \
                             $(eval i := $(words $T)), \
                             $(eval T := $T $(elem)))) \
              $i)

The first argument is expected to be the list of all file names, and the second the name of the current file. The function returns the index. It does not look the most straightforward, but it does its job properly.
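For readers less fluent in GNU Make, here is the same logic re-expressed in C, assuming a list of unique names; it only illustrates what `pos` computes, and the build itself still uses the Make function. Note that, as in the Make version, a name that is not in the list yields 0, the same index as the first file.

```c
#include <string.h>

/* Same logic as the GNU Make `pos` function: the zero-based position of
 * `name` in `list`, or 0 when `name` is absent (as in the Make version,
 * where `i` starts at 0). */
int pos(const char *const list[], int count, const char *name)
{
    int i;
    for (i = 0; i < count; ++i) {
        if (strcmp(list[i], name) == 0) {
            return i;
        }
    }
    return 0;
}
```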

Adding the TESTID identifier (here $(OBJ) holds the list of object files):

%.o: %.c
	$(CC) -DTESTID=$(call pos, $(OBJ), $@) -c -o $@ $<

With that, practically all difficulties are overcome, and all that remains is to use the identifier in code, for example like this:

#define FIXTURE() \
    static void fixture_body(void); \
    void (*CAT(fixture_number_, TESTID))(void) = &fixture_body; \
    static void fixture_body(void)

The suite's main file must contain the corresponding declarations and iterate over them.
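A single-file sketch of both halves: a fixture file that uses the FIXTURE() macro with TESTID fixed to 1, and the main file's declarations and traversal. The three-slot main file and `run_fixtures()` are hypothetical. Across real translation units this relies on the linker merging the main file's uninitialized declarations with the one initialized definition (so-called common symbols), which ISO C does not guarantee; GCC 10 and later default to -fno-common, so -fcommon would presumably be needed there.

```c
#include <stddef.h>

#define CAT(X, Y) CAT_(X, Y)
#define CAT_(X, Y) X##Y

typedef void fixture_func_type(void);

/* Same shape as the FIXTURE() macro above; TESTID comes from -DTESTID=n. */
#define FIXTURE() \
    static void fixture_body(void); \
    void (*CAT(fixture_number_, TESTID))(void) = &fixture_body; \
    static void fixture_body(void)

/* --- Stand-in for one fixture file, as if compiled with -DTESTID=1. --- */
#define TESTID 1
FIXTURE() { /* Run this fixture's tests. */ }
#undef TESTID

/* --- Stand-in for the suite's main file (3 slots for brevity). --- */
/* No `static` and no initializer: slots without a fixture stay null. */
fixture_func_type *fixture_number_0;
fixture_func_type *fixture_number_1;
fixture_func_type *fixture_number_2;

static fixture_func_type **const all_fixtures[] = {
    &fixture_number_0, &fixture_number_1, &fixture_number_2
};

/* Calls every present fixture and returns how many ran. */
int run_fixtures(void)
{
    size_t i;
    int ran = 0;
    for (i = 0; i < sizeof all_fixtures / sizeof all_fixtures[0]; ++i) {
        if (*all_fixtures[i] != NULL) {
            (*all_fixtures[i])();
            ++ran;
        }
    }
    return ran;
}
```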

Remaining difficulties


If the number of files grows beyond the set limit, some of them may "drop out" of view, just as could happen with tests. This time the solution requires an additional compile-time check. With the number of files in the suite known in advance, it is easy to check that there are not too many of them. In fact, it is enough to give each translation unit access to this information via one more macro:

    ... -DMAXTESTID=$(words $(OBJ)) ...

All that remains is to check that enough declarations exist, using something like:

#define STATIC_ASSERT(msg, cond) \
    typedef int msg[(cond) ? 1 : -1]; \
    /* Fake use of the typedef to suppress "unused local typedef" warnings; \
     * sizeof keeps this an integer constant expression (CAT as above). */ \
    enum { CAT(msg, _use) = sizeof(msg) }
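A sketch of the check itself, assuming (hypothetically) that the main file declares 100 slots, fixture_number_0 to fixture_number_99. It uses a variant of the STATIC_ASSERT above whose "fake use" is `sizeof(msg)`, an integer constant expression in standard C; the fallback MAXTESTID only lets the snippet compile without the -D flag.

```c
#define CAT(X, Y) CAT_(X, Y)
#define CAT_(X, Y) X##Y

#define STATIC_ASSERT(msg, cond) \
    typedef int msg[(cond) ? 1 : -1]; \
    enum { CAT(msg, _use) = sizeof(msg) }

/* Normally passed by the build as -DMAXTESTID=$(words $(OBJ)); this
 * fallback only lets the sketch compile on its own. */
#ifndef MAXTESTID
#define MAXTESTID 5
#endif

/* Hypothetical limit: the main file declares fixture_number_0 ..
 * fixture_number_99. */
#define MAX_FIXTURES 100

/* Breaks the build once the suite has more files than declared slots. */
STATIC_ASSERT(too_many_fixture_files, MAXTESTID <= MAX_FIXTURES);

int fixture_slots_left(void) { return MAX_FIXTURES - MAXTESTID; }
```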

There is a somewhat less obvious problem: a conflict (double definition) of functions when fixture files are added or removed. Such changes shift the indexes and require recompiling every file affected by the shift. Here it is worth remembering that build systems compare file modification times, and that a directory's modification time is updated when its structure changes; so, in effect, each compiled file needs a dependency on the directory it is located in.

As a result, the compilation rule for a test file takes roughly the following form:

%.o: %.c $(dir %.c)/.
	$(CC) -DTESTID=$(call pos, $(OBJ), $@) -DMAXTESTID=$(words $(OBJ)) -c -o $@ $<

Putting it all together, the suite definition is transformed as follows:
Test suite before:
void addition_tests(void);
void deletion_tests(void);
void expansion_tests(void);

static void
setup(void)
{
    /* ... */
}

static void
all_tests(void)
{
    addition_tests();
    deletion_tests();
    expansion_tests();
}

int
main(int argc, char *argv[])
{
    suite_setup(setup);
    return run_tests(all_tests) == 0;
}
Test suite after:
DEFINE_SUITE();

SETUP()
{
    /* ... */
}

Additional optimization


The need for periodic recompilation and some slowdown in processing each file make one think about ways to offset these costs. Let's recall some of the available options.

Precompiled headers. Since complex code takes the compiler a long time to process, it makes sense to process it once and reuse the result.

Using ccache to speed up repeated compilation. A good idea in itself: for example, it lets you switch between repository branches any number of times without waiting for a full rebuild; the total time is determined primarily by how fast data is pulled from the cache.

The -pipe compiler flag (if supported). It reduces the number of file operations by passing data between compilation stages through pipes in memory instead of temporary files.

Disabling optimization and debug information. Normally this should not affect the tests in any way, apart from speeding up compilation somewhat.

Why all this? A possible drop in compilation performance was mentioned above several times, so it is worth offering ways to fight it, and also to soften the impression a little with a couple of notes:

  • The slowdown is mostly noticeable on a full rebuild of the tests and is not so critical in the normal workflow.
  • Before the approach described above was applied, a full rebuild of the tests (followed by running them) took 6.5 s in the author's case. Afterwards it grew to 13 s, but optimizing both the test declaration code and the build process fixed the situation, bringing the figure down to 5.5 s. Applying the same build speedups to the previous version of the tests brought its time to 5.7 s, which (surprisingly) is even slightly longer than that of the current version.

Links


Initially the tests were written with seatest, which had practically everything except auto-registration. Based on the work described above, stic was built on top of seatest (it uses a little C99, but that is not essential in general), adding what the author found missing. That is where the implementation details omitted here can be found, namely in the stic.h header file. Selected intermediate sketches are available in a separate repository. An integration example can be found in this Makefile (understanding it requires knowledge of Make syntax).

Results


Judging by the list on Wikipedia, stic may be the first successful attempt to implement auto-registration by means of C itself (with the caveat of the restrictions described, of course). All the alternatives checked rely on external generators of the test list. (UPD: commenters pointed out a way of registering tests close to how static constructors are invoked in C++; it does require corresponding support from the compiler and the linker, but the approach definitely deserves attention.) The advantage of this approach lies not only in the absence of extra dependencies but also in its universality (the compiler will not be misled by #ifdef, unlike a third-party script) and in how relatively easy it is to collect optional data about tests. For example, it was quite simple to add a predicate that decides whether a test runs, in the following form:

TEST(os_independent)
{
    /* ... */
}

TEST(unix_only, IF(not_windows))
{
    /* ... */
}
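One possible way to accept the optional second argument, sketched with C99 variadic macros (stic itself uses some C99). `IF()`, `NO_PREDICATE`, `not_windows()` and `struct test_info` are hypothetical, and line-based registration is left out so that each record is reachable directly as `<name>_info`.

```c
#include <stddef.h>

typedef void test_func_type(void);
typedef int predicate_type(void);

/* Illustrative record: a test plus an optional run predicate. */
struct test_info {
    test_func_type *func;
    predicate_type *pred;   /* NULL means "always run" */
};

/* TEST(name) or TEST(name, IF(pred)): the appended NO_PREDICATE fills the
 * second slot when no predicate is given; `~` keeps `...` non-empty. */
#define NO_PREDICATE NULL
#define IF(pred) &pred
#define TEST(...) TEST_(__VA_ARGS__, NO_PREDICATE, ~)
#define TEST_(name, pred, ...) \
    static test_func_type name; \
    static const struct test_info name##_info = { &name, pred }; \
    static void name(void)

/* Hypothetical predicate. */
static int not_windows(void)
{
#ifdef _WIN32
    return 0;
#else
    return 1;
#endif
}

TEST(os_independent) { /* ... */ }
TEST(unix_only, IF(not_windows)) { /* ... */ }

/* Runs a test unless its predicate says no; returns 1 if it ran. */
static int run_one(const struct test_info *t)
{
    if (t->pred != NULL && !t->pred()) {
        return 0;
    }
    t->func();
    return 1;
}

int run_both(void)
{
    return run_one(&os_independent_info) + run_one(&unix_only_info);
}
```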

Everyone can decide for themselves, but the author definitely liked the approach, the process and the result: it has now replaced seatest, simplified adding tests and reduced the size of the tests by 3911 lines, about 16% of their former size.

This article is a translation of the original post at habrahabr.ru/post/252439/