Developers Club geek daily blog

1 year, 10 months ago

A little protection against code injection
This method is not a panacea, but it makes code injection somewhat harder.

Preamble


With every leap in high-level programming, fewer people understand assembly, so it makes sense to ask:
What if the program you are running is not really that program?
Or what if a virus replaces pieces of the program you are using?
Back in the '80s, smart people came up with a recipe for confirming the integrity of executable files and their individual pieces: hashes. Library releases usually ship with a hash or a digital signature, so you can check that the library (or application) really comes from its author and that nobody else has modified it.
Some languages (C, C++) do not support this at runtime (unlike, say, Oberon, which has several sensible ideas, modules for example), but the nice thing about C is that, with straight hands, you can bolt it on yourself. With enough determination you could also modify the C compiler, but that is another story.
Why shouldn't you trust anybody?
There are many possible answers to this question, some of them humorous.

Theoretical part


Any executable code or data is just information (bytes).
A hash is a compression (convolution) function: given n bytes as input, it produces m bytes, where m, the hash length, is constant and n, the input length, varies.
Here we need a cryptographically strong hash function: the harder it is to find a "matching" collision, the better for us, because it becomes less likely that injected code can still produce the correct hash.
Code injection is a type of attack in which something goes wrong and an unauthorized user adds executable data to the program.
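To make the "n bytes in, m bytes out" definition concrete, here is a minimal sketch of a hash with a fixed 8-byte output: 64-bit FNV-1a. Note that FNV-1a is not cryptographically strong (collisions are easy to construct), so it does not satisfy the requirement above; it only illustrates the shape of a hash function. For real use you would pick something like SHA-256 or SHA-512 from a vetted library.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a: any number of input bytes (n) -> exactly 8 output bytes (m).
   NOT a cryptographic hash; used here for illustration only. */
uint64_t fnv1a64(const void *data, size_t n)
{
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];                        /* mix in one input byte */
        h *= 0x100000001b3ULL;            /* FNV 64-bit prime */
    }
    return h;
}
```

Whatever the input length, fnv1a64 returns exactly 64 bits; the empty input, for example, maps to the offset basis 0xcbf29ce484222325.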

Practical part (computing the hash)


How do you find the start address and the size of a function? Myshchkh wrote about this long ago; in practice it depends heavily on the compiler and the optimization level. With weak optimization, functions stay in the order the programmer wrote them, i.e. if we write:

#include <stdio.h>

void some(int *trains) {
	printf("cho, chooo, motherfucker\n");
	++*trains;
}

/* empty marker: its address marks where some() ends */
void endSome(void) {}


then the function endSome will be placed after some (at a higher address; compare the addresses of the two labels).
This lets us find the size of the opcodes/bytecode of the some function.
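A minimal sketch of that size computation, assuming (as above) that the compiler keeps the two functions in source order. The helper function_span is hypothetical: it returns 0 instead of a garbage size when the order is not what we expect, which is worth checking because optimizing compilers (GCC's -freorder-functions, for example) may place functions differently. Converting a function pointer to an integer is implementation-defined in C, but it works on the mainstream x86-64 toolchains the article uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: distance from start to end, or 0 if end is not after start. */
size_t function_span(uintptr_t start, uintptr_t end)
{
    return end > start ? (size_t)(end - start) : 0;
}

void some(int *trains)
{
    printf("cho, chooo, motherfucker\n");
    ++*trains;
}

void endSome(void) {}   /* empty marker function expected right after some() */

/* Size of some() including any padding before endSome(); 0 if the order differs. */
size_t size_of_some(void)
{
    return function_span((uintptr_t)&some, (uintptr_t)&endSome);
}
```

With the addresses from the lldb session that follows (some at 0x10231e7b0, endSome at 0x10231e7f0), function_span gives 0x40, i.e. 64 bytes.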

A quick check with LLDB

We use
disassemble -n "function_name" to get the opcodes of a function body:

(lldb) disassemble -n some
RayLanguage`some:
    0x10231e7b0 <+0>:  pushq  %rbp
    0x10231e7b1 <+1>:  movq   %rsp, %rbp
    0x10231e7b4 <+4>:  subq   $0x10, %rsp
    0x10231e7b8 <+8>:  leaq   0x2be9(%rip), %rax        ; "cho, chooo, motherfucker\n"
    0x10231e7bf <+15>: movq   %rdi, -0x8(%rbp)
    0x10231e7c3 <+19>: movq   %rax, %rdi
    0x10231e7c6 <+22>: movb   $0x0, %al
    0x10231e7c8 <+24>: callq  0x102321050               ; symbol stub for: printf
    0x10231e7cd <+29>: movq   -0x8(%rbp), %rdi
    0x10231e7d1 <+33>: movl   (%rdi), %ecx
    0x10231e7d3 <+35>: addl   $0x1, %ecx
    0x10231e7d9 <+41>: movl   %ecx, (%rdi)
    0x10231e7db <+43>: movl   %eax, -0xc(%rbp)
    0x10231e7de <+46>: addq   $0x10, %rsp
    0x10231e7e2 <+50>: popq   %rbp
    0x10231e7e3 <+51>: retq   

(lldb) disassemble -n endSome
RayLanguage`endSome:
    0x10231e7f0 <+0>: pushq  %rbp
    0x10231e7f1 <+1>: movq   %rsp, %rbp
    0x10231e7f4 <+4>: popq   %rbp
    0x10231e7f5 <+5>: retq   



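And a little arithmetic:

0x10231e7f0 - 0x10231e7e3 = 0xd


That is 13 bytes. The first of them is the 1-byte retq at 0x10231e7e3, which leaves 12 bytes between the end of some and the start of endSome. What could they be?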

Most likely it is NOP padding inserted by the assembler for alignment; lldb comes to the rescue.
We use disassemble -s "address in hex" to check whether that is the case.


(lldb) disassemble -s 0x10231e7e3
RayLanguage`some:
    0x10231e7e3 <+51>: retq   
    0x10231e7e4 <+52>: nopw   %cs:(%rax,%rax)

RayLanguage`endSome:
    0x10231e7f0 <+0>:  pushq  %rbp
    0x10231e7f1 <+1>:  movq   %rsp, %rbp
    0x10231e7f4 <+4>:  popq   %rbp
    0x10231e7f5 <+5>:  retq   
    0x10231e7f6 <+6>:  nopw   %cs:(%rax,%rax)

RayLanguage`main:
    0x10231e800 <+0>:  pushq  %rbp

And yes, RTFM says: "the assembler (not the compiler) pads code up to the next alignment boundary with the longest NOP instruction it can find that fits. This is what you're seeing."
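The padding arithmetic can be checked directly. A sketch assuming a 16-byte function alignment, which is consistent with the addresses above (endSome and main both start on addresses ending in 0): some starts at 0x10231e7b0 and its last instruction, the 1-byte retq, sits at offset +51, so its code ends at 0x10231e7e4; rounding up to the next 16-byte boundary gives 0x10231e7f0, the start of endSome, i.e. 12 bytes of NOP padding. The helper names are hypothetical.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Padding bytes the assembler would insert after code ending at code_end. */
uintptr_t padding_after(uintptr_t code_end, uintptr_t align)
{
    return align_up(code_end, align) - code_end;
}
```

The same rule explains the nopw after endSome: its code ends at 0x10231e7f6, so 10 bytes of padding bring main to 0x10231e800.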

In any case, unless you use a pile of assembly hacks for self-modifying code, this NOP padding should stay as it is, i.e. it is effectively part of the some function too. So the size of some is the distance from the start of some to the start of endSome:

size_t sizeOfSome = (size_t)&endSome - (size_t)&some;


The three preconditions for hashing are met (the opcodes of the function are contiguous, and we know its start address and its size).
So we can take any cryptographically strong hash and hash the function body:


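#include <stdlib.h>
#include <string.h>

size_t sizeOfSome = (size_t)&endSome - (size_t)&some; /* includes trailing NOP padding */

unsigned char *body = malloc(sizeOfSome);
/* function-to-object pointer cast: not ISO C, but works on POSIX/x86-64 */
memcpy(body, (const void *)&some, sizeOfSome);

unsigned char *hash = someHash(body, sizeOfSome);
free(body); /* the copy is no longer needed */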


where someHash stands for the chosen hash function and hash is the resulting digest.
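Putting the pieces together, a self-contained sketch under these assumptions: the two functions stay in source order (typical without aggressive optimization, but not guaranteed), the platform lets a program read its own code pages (true for ordinary x86-64 Linux and macOS executables), and 64-bit FNV-1a, which is not cryptographic, stands in for someHash, whose implementation the article does not show. Instead of copying the body with malloc/memcpy, it hashes the bytes in place; since hashing only reads them, the digest is the same.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

void some(int *trains)
{
    printf("cho, chooo, motherfucker\n");
    ++*trains;
}

void endSome(void) {}   /* marker: its address is taken as the end of some() */

/* Stand-in for someHash(): 64-bit FNV-1a (NOT cryptographic). */
static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Size of some() including trailing alignment padding; 0 if the compiler
   did not place endSome() after some(). */
size_t size_of_some(void)
{
    uintptr_t start = (uintptr_t)&some, end = (uintptr_t)&endSome;
    return end > start ? (size_t)(end - start) : 0;
}

/* Hash the machine code of some() directly from memory. */
uint64_t hash_of_some(void)
{
    const unsigned char *body = (const unsigned char *)(uintptr_t)&some;
    return fnv1a64(body, size_of_some());
}
```

Calling hash_of_some() twice should give the same value as long as nobody patches some() in between; a changed value is the signal this whole scheme is after.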

Verification of hashes


Three points need to be considered here:
  1. verification should happen not only at program start but should also be repeated after a while (ideally at random intervals)
  2. checking at start-up would require modifying the loader and the compiler + assembler
  3. the reference hashes have to be stored somewhere
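Point 1 can be sketched as a check function plus a randomized delay before the next check. The names region_intact and next_check_delay are hypothetical, FNV-1a stands in for a cryptographic hash, and rand() only illustrates the idea: a real implementation would want a less predictable randomness source. To keep the example testable, it checks a plain data buffer holding the 64 bytes of some() from the program's dump further below rather than live code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in hash: 64-bit FNV-1a (NOT cryptographic). */
static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Hypothetical check: does the region still hash to its reference value? */
bool region_intact(const void *start, size_t len, uint64_t reference)
{
    return fnv1a64(start, len) == reference;
}

/* Hypothetical scheduler helper: next delay in [min_ms, max_ms], so an
   attacker cannot rely on a fixed gap between checks.
   Requires min_ms <= max_ms < UINT_MAX; modulo bias is ignored. */
unsigned next_check_delay(unsigned min_ms, unsigned max_ms)
{
    return min_ms + (unsigned)rand() % (max_ms - min_ms + 1);
}

/* Bytes of some() as dumped by the program in the article (64 bytes). */
unsigned char some_body[64] = {
    0x55,0x48,0x89,0xE5,0x48,0x83,0xEC,0x10,0x48,0x8D,0x05,0x99,0x2E,0x00,0x00,0x48,
    0x89,0x7D,0xF8,0x48,0x89,0xC7,0xB0,0x00,0xE8,0x35,0x2B,0x00,0x00,0x48,0x8B,0x7D,
    0xF8,0x8B,0x0F,0x81,0xC1,0x01,0x00,0x00,0x00,0x89,0x0F,0x89,0x45,0xF4,0x48,0x83,
    0xC4,0x10,0x5D,0xC3,0x66,0x66,0x66,0x2E,0x0F,0x1F,0x84,0x00,0x00,0x00,0x00,0x00
};
```

A single changed byte (here, simulating an injected patch) is enough to make region_intact fail.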


Let's dwell on the 3rd point, which also has several options:
  1. Store the hashes in a separate file
  2. Embed them in the program's memory (encrypted or in plain form)
  3. Embed them in a hypervisor program


It is easy to see that the third option is the most adequate, since in theory a user program has no access to the hypervisor program, i.e. it has no way to change the hashes.
The second option is so-so: you can encrypt the hashes and decrypt them only when you need to verify, which is a pain for an analyst taking your code apart. In plain form, you can protect the memory pages holding the hashes by making them read-only.
The first option is the weakest, since there are many ways to replace data in a file, but you can use POSIX ACLs to make it read-only.
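For the plain-form variant of option 2, a POSIX sketch of making the hash pages read-only: copy the reference hashes into their own anonymous mapping (mmap), then drop write permission with mprotect. seal_reference_hashes is a hypothetical name. This only raises the bar: code running inside the same process can call mprotect itself. MAP_ANONYMOUS is a widely supported extension (Linux, macOS, the BSDs) rather than part of older POSIX editions.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy `count` reference hashes into a fresh page and
   make that page read-only. Returns NULL on failure. */
const uint64_t *seal_reference_hashes(const uint64_t *hashes, size_t count)
{
    size_t bytes = count * sizeof *hashes;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (bytes + page - 1) / page * page;   /* whole pages */
    if (len == 0)
        len = page;

    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, hashes, bytes);
    if (mprotect(mem, len, PROT_READ) != 0) {   /* writes now fault */
        munmap(mem, len);
        return NULL;
    }
    return mem;
}
```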

Either way, this hash check can be patched out. As one well-known figure from the demoscene put it: if you can't make a keygen, you can always make a patch.

For the third option, one can also list the conditions under which the program runs:
  1. The program is not allowed to run without the provided data (the list of pointers + lengths)
  2. Nor without access to the function bodies being granted (a hypervisor has it by default)
  3. Hash checks are randomized in time.
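Condition 1 mentions a list of pointers plus lengths. A minimal sketch of such a table and one pass over it, assuming each entry also carries its reference hash. In the hypervisor variant, the table and the loop would live outside the guarded program; here they are plain C in one process for illustration. checked_region and first_tampered are hypothetical names, and FNV-1a stands in for a cryptographic hash.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash: 64-bit FNV-1a (NOT cryptographic). */
static uint64_t fnv1a64(const void *data, size_t n)
{
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* One entry of the "list of pointers + lengths", plus its reference hash. */
typedef struct {
    const void *start;
    size_t      length;
    uint64_t    reference;
} checked_region;

/* Index of the first region whose hash no longer matches, or -1 if all are intact. */
long first_tampered(const checked_region *regions, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (fnv1a64(regions[i].start, regions[i].length) != regions[i].reference)
            return (long)i;
    return -1;
}
```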


Checking the implementation with lldb


To see the opcodes, use the -b option of disassemble:

(lldb) disassemble -b -n some
RayLanguage`some:
    0x1079ac500 <+0>:  55                    pushq  %rbp
    0x1079ac501 <+1>:  48 89 e5              movq   %rsp, %rbp
    0x1079ac504 <+4>:  48 83 ec 10           subq   $0x10, %rsp
    0x1079ac508 <+8>:  48 8d 05 99 2e 00 00  leaq   0x2e99(%rip), %rax        ; "cho, chooo, motherfucker\n"
    0x1079ac50f <+15>: 48 89 7d f8           movq   %rdi, -0x8(%rbp)
    0x1079ac513 <+19>: 48 89 c7              movq   %rax, %rdi
    0x1079ac516 <+22>: b0 00                 movb   $0x0, %al
    0x1079ac518 <+24>: e8 35 2b 00 00        callq  0x1079af052               ; symbol stub for: printf
    0x1079ac51d <+29>: 48 8b 7d f8           movq   -0x8(%rbp), %rdi
    0x1079ac521 <+33>: 8b 0f                 movl   (%rdi), %ecx
    0x1079ac523 <+35>: 81 c1 01 00 00 00     addl   $0x1, %ecx
    0x1079ac529 <+41>: 89 0f                 movl   %ecx, (%rdi)
    0x1079ac52b <+43>: 89 45 f4              movl   %eax, -0xc(%rbp)
    0x1079ac52e <+46>: 48 83 c4 10           addq   $0x10, %rsp
    0x1079ac532 <+50>: 5d                    popq   %rbp
    0x1079ac533 <+51>: c3                    retq   


Data from the program:


RData object - 0x7f83ebc054a0 [64] {
55 48 89 E5 48 83 EC 10 48 8D 05 99 2E 00 00 48 89 7D F8 48 89 C7 B0 00 E8 35 2B 00 00 48 8B 7D 
F8 8B 0F 81 C1 01 00 00 00 89 0F 89 45 F4 48 83 C4 10 5D C3 66 66 66 2E 0F 1F 84 00 00 00 00 00 
} - 0x7f83ebc054a0 

The hash, Base64-encoded (88 characters, i.e. 64 bytes):
wRagc6tuimNnTlTSbyOe+BT6QAkWVDXVEjWktrG4a+Zm/2U2mgeeTr286yLE2lB3rVqihtQ2Fsb7eEvTocnEqg==
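For reference, a minimal standard Base64 encoder (RFC 4648 alphabet, with '=' padding) of the kind used to print the digest above. The article's library has its own encoder, whose API is not shown; base64_encode here is a hypothetical name.

```c
#include <stddef.h>

/* Standard Base64 (RFC 4648) with '=' padding. `out` must have room for
   4 * ((n + 2) / 3) + 1 bytes; returns the encoded length (without the NUL). */
size_t base64_encode(const unsigned char *in, size_t n, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t o = 0;
    for (size_t i = 0; i < n; i += 3) {
        /* pack up to three input bytes into a 24-bit group */
        unsigned long v = (unsigned long)in[i] << 16;
        if (i + 1 < n) v |= (unsigned long)in[i + 1] << 8;
        if (i + 2 < n) v |= in[i + 2];
        /* emit four 6-bit symbols, padding missing bytes with '=' */
        out[o++] = tbl[(v >> 18) & 0x3F];
        out[o++] = tbl[(v >> 12) & 0x3F];
        out[o++] = (i + 1 < n) ? tbl[(v >> 6) & 0x3F] : '=';
        out[o++] = (i + 2 < n) ? tbl[v & 0x3F] : '=';
    }
    out[o] = '\0';
    return o;
}
```

A 64-byte digest always encodes to 88 characters ending in "==", which matches the strings printed in this article.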


Since the opcodes copied by the program match those shown by lldb (the dump additionally contains the 12 padding bytes after C3), we can say that the function was copied from memory correctly.
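To compare copied bytes with lldb's disassemble -b output by eye, it helps to print them the same way the program's dump does (uppercase hex, space-separated). A minimal sketch; hex_line is a hypothetical helper, not the article's RData printer.

```c
#include <stddef.h>
#include <stdio.h>

/* Write n bytes as "55 48 89 E5" (uppercase hex, single spaces) into out,
   which must hold at least 3 * n bytes (or 1 byte when n == 0). */
void hex_line(const unsigned char *bytes, size_t n, char *out)
{
    out[0] = '\0';
    for (size_t i = 0; i < n; i++)
        sprintf(out + 3 * i, i + 1 < n ? "%02X " : "%02X", bytes[i]);
}
```

Feeding it the first four bytes of some() yields "55 48 89 E5", the pushq/movq prologue shown by lldb.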

Pros and cons


Pros:
  1. Hash-based version control of executable components: we split the program into modules and submodules and compare them against reference hashes; if a library's hash has changed and the programmer forgot to account for it, that is bad. This approach looks especially good for dynamically loadable modules, as in higher-level languages.
  2. Integrity control of executable components.
  3. Identification and authentication of individual modules/functions.
  4. It puts a spoke in the wheels of anyone analyzing the code in order to inject into it (priceless).


Cons:
  1. Lowers performance (by how much depends on the hashing time and the number of hashed functions)
  2. For the hypervisor option, the pain of writing a correct hypervisor.


A few obvious experiments, or how not to shoot yourself in the foot



void some(int *trains) {
    byte some[6] = "hello";
    printf("cho, chooo, motherfucker\n");
    ++*trains;
}

RData object - 0x7ffe6ae00050 [80] {
55 48 89 E5 48 83 EC 20 48 8D 05 AF 2E 00 00 48 89 7D F8 8B 0D 9F 2E 00 00 89 4D F2 66 8B 15 99 
2E 00 00 66 89 55 F6 48 89 C7 B0 00 E8 31 2B 00 00 48 8B 7D F8 8B 0F 81 C1 01 00 00 00 89 0F 89 
45 EC 48 83 C4 20 5D C3 0F 1F 84 00 00 00 00 00 
} - 0x7ffe6ae00050 

aDO1L9KThmWe3NPBQuxgqkqcd72TkCxa2bJmzSiLdNq8KXbjls7cd38FPSQJQ82RTitb1qwZZcdlf1l5MP521A==

void some(int *trains) {
    byte some[6] = "lol";
    printf("cho, chooo, motherfucker\n");
    ++*trains;
}

RData object - 0x7fa222c057e0 [80] {
55 48 89 E5 48 83 EC 20 48 8D 05 AF 2E 00 00 48 89 7D F8 8B 0D 9F 2E 00 00 89 4D F2 66 8B 15 99 
2E 00 00 66 89 55 F6 48 89 C7 B0 00 E8 31 2B 00 00 48 8B 7D F8 8B 0F 81 C1 01 00 00 00 89 0F 89 
45 EC 48 83 C4 20 5D C3 0F 1F 84 00 00 00 00 00 
} - 0x7fa222c057e0 

aDO1L9KThmWe3NPBQuxgqkqcd72TkCxa2bJmzSiLdNq8KXbjls7cd38FPSQJQ82RTitb1qwZZcdlf1l5MP521A== EQUAL



Two conclusions follow immediately from the above:
  1. Changing the constants does not change the bytecode, since they are stored elsewhere (the code only loads them through RIP-relative offsets), so hashing the function body alone does not protect that data.
  2. Adding a variable (even a constant one) changes the bytecode.
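A small sketch of conclusion 1, using the two 80-byte dumps above. They are byte-for-byte identical because the "hello"/"lol" bytes are loaded from elsewhere by RIP-relative movl/movw instructions (8B 0D ... and 66 8B 15 ...), so a single array represents both. Folding the constant data into the same hash is one possible remedy; it is our assumption, not something the article does. hash_with_data is a hypothetical name, and FNV-1a stands in for a cryptographic hash.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 0xcbf29ce484222325ULL

/* Stand-in hash: 64-bit FNV-1a (NOT cryptographic), seeded so it can be chained. */
static uint64_t fnv1a64_seeded(uint64_t h, const void *data, size_t n)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Code bytes of some() from both experiments above (identical for "hello" and "lol"). */
static const unsigned char some_code[80] = {
    0x55,0x48,0x89,0xE5,0x48,0x83,0xEC,0x20,0x48,0x8D,0x05,0xAF,0x2E,0x00,0x00,0x48,
    0x89,0x7D,0xF8,0x8B,0x0D,0x9F,0x2E,0x00,0x00,0x89,0x4D,0xF2,0x66,0x8B,0x15,0x99,
    0x2E,0x00,0x00,0x66,0x89,0x55,0xF6,0x48,0x89,0xC7,0xB0,0x00,0xE8,0x31,0x2B,0x00,
    0x00,0x48,0x8B,0x7D,0xF8,0x8B,0x0F,0x81,0xC1,0x01,0x00,0x00,0x00,0x89,0x0F,0x89,
    0x45,0xEC,0x48,0x83,0xC4,0x20,0x5D,0xC3,0x0F,0x1F,0x84,0x00,0x00,0x00,0x00,0x00
};

/* Hypothetical: hash the code, then the constant data it references. */
uint64_t hash_with_data(const void *code, size_t code_len,
                        const void *data, size_t data_len)
{
    return fnv1a64_seeded(fnv1a64_seeded(FNV_OFFSET, code, code_len), data, data_len);
}
```

Hashing some_code alone gives one value for both versions of the function; including the 6-byte initializer makes the "hello" and "lol" versions hash differently.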



void some(int *trains) {
    byte some[4] = "lol";
    printf("cho, chooo, motherfucker\n");
    ++*trains;
}

RData object - 0x7ffb3a600040 [64] {
55 48 89 E5 48 83 EC 10 48 8D 05 9D 2E 00 00 48 89 7D F8 8B 0D 8F 2E 00 00 89 4D F4 48 89 C7 B0
00 E8 2C 2B 00 00 48 8B 7D F8 8B 0F 81 C1 01 00 00 00 89 0F 89 45 F0 48 83 C4 10 5D C3 0F 1F 00
} - 0x7ffb3a600040

Wiq45/M7ES5TOgkUNVUdn04OxsTF/ej76Wj9B7ItE/eYrU1f18nX5IT696fymYtlYj8drf9AtgPCStQPR0CEpg==


3. Changing the size of a constant-initialized array on the stack changes the bytecode too.

P.S


the debugger used: LLDB + lldb help
the library used for hashing and other operations
some source code can be found in commit d3c3d5d.

This article is a translation of the original post at habrahabr.ru/post/272561/
