Developers Club geek daily blog

1 year, 6 months ago
Hi all. I have wanted to touch on this subject and write something like this for a long time, but never got around to it. Today I finally decided to do it: we will go through the structure of an ELF file (the executable file format on *nix-like systems) and write a simple program for x86 Linux directly in machine code that prints a message. But not everything here is as straightforward as it seems, believe me.

I wanted to begin with the end, namely with what our program will do. Our program is nothing more than a pile of machine code that the system will then execute. As a replacement for hexadecimal notation I will use "Wct", because it is much more convenient: there is an online compiler, and you can insert strings on the fly and use decimal numbers. Our program will print one line of text.

In general, ELF is a binary file format used in many modern UNIX-like operating systems, such as FreeBSD, Linux, Solaris, and others. But, as they say, it is better to see once than to hear many times. So, below you can study a breakdown of an executable file for Linux.

I should say right away that you can find the opcode table for the x86 architecture here.
Before we plunge into this "nyasha" (in both the literal and the figurative sense), I would like to warn you that this article can blow your mind, or at least partially damage it. You have been warned. If you dare, read on.

As I said earlier, ELF is the executable file type on *nix-like systems. But how does the system determine whether an executable is suitable for it, and how does it tell that the file has an ELF structure and not, say, PE?

It is extremely simple: right at the beginning there is the sync byte "Ho", followed by the ASCII characters "ELF", which helps the OS understand what kind of animal our file is.

		// Executable file header
Ho "ELF"	// Signature .ELF, where "Ho" is a special byte and
		// "ELF" are ASCII characters


Important: the "Ho" byte is byte number zero. You need to know this by heart, because it is easy to lose count and get lost.

So, congratulations: we have met the part of the program called the "header". It contains information that is important for executing the file, such as the word size of the system, the version of the ELF structure, the OS ABI ... but we will talk about these strange words a little later.

Well, since we are using Wct, let me explain what is what.
Wct uses 16 characters in total: A B C D E F G H I J K L M N P O, where O comes after P. This confuses beginners, but you get used to it over time. By the way, the abbreviation "Wct" stands for "Weird Coding Tool", a strange thing for programming. And I use Wct only because it is easier to remember and more convenient to use: being able to insert decimal numbers is handy, isn't it?
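
The digit table above can be sketched as a small decoder. This is a minimal sketch, not the author's tool: the helper name `wct_to_bytes` is hypothetical, it handles only letter pairs (not quoted strings or decimal numbers), and treating upper and lower case as the same digit is an assumption inferred from listings such as "Ab" and "aa".

```python
# Digit order as listed in the article: note that P comes before O.
WCT_DIGITS = "ABCDEFGHIJKLMNPO"


def wct_to_bytes(text: str) -> bytes:
    """Decode pairs of Wct letters into bytes (hypothetical helper)."""
    # Assumption: case does not matter and whitespace only separates bytes.
    letters = [c for c in text.upper() if not c.isspace()]
    if len(letters) % 2:
        raise ValueError("Wct digits must come in pairs")
    out = bytearray()
    for high, low in zip(letters[0::2], letters[1::2]):
        out.append(WCT_DIGITS.index(high) * 16 + WCT_DIGITS.index(low))
    return bytes(out)


print(wct_to_bytes("Ho").hex())           # 7f
print(wct_to_bytes("Ab aa aa aa").hex())  # 01000000
```

With this mapping "Ho" becomes the byte 7f, which is the first byte of the ELF signature.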

So, we have set foot on the land of evil. Let us continue exploring our file.
Next comes the word size of the system: a single byte that can be either "B" or "C", where "B" means the system is 32-bit and "C" means it is 64-bit. This is very important, because further on we will need to use the header table either for a 32-bit system or for a 64-bit one, and the two differ radically.

Ab		// or 01, where 1 (B) is a 32-bit architecture
		// and 2 (C) is a 64-bit one; I think this much is clear


Well, well. Yes, the word size is an important parameter, but we also need to know which byte order we will use: little-endian or big-endian. What is the difference between them, and what is this all about?

Little-endian is a byte order that runs from the least significant byte to the most significant, like this: Ab aa aa aa, which is the number "Bw", that is, one. With big-endian it is exactly the opposite: the order runs from the most significant byte to the least significant (English big-endian, literally "blunt-ended"), A_n ... A_0, so the record begins with the most significant byte and ends with the least significant. By the way, little-endian and big-endian have already been covered on Habrahabr here.

Ab		// B = Little Endian, C = Big Endian
		// This is the byte order. In our case it is
		// little-endian, that is, the order from the least
		// significant byte to the most significant: A_0 ... A_n
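
The two byte orders can be compared with Python's standard `struct` module. This is only an illustration of the idea; the address bytes at the end are the "JD ja ae ai" sequence from the code section of this article, written in ordinary hex.

```python
import struct

value = 1
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

print(little.hex(" "))  # 01 00 00 00
print(big.hex(" "))     # 00 00 00 01

# Reading four little-endian bytes back into a number.
(address,) = struct.unpack("<I", bytes.fromhex("93 90 04 08"))
print(hex(address))     # 0x8049093
```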


So, we have dealt with that. But this is by no means all; take my word for it, we have not covered even half.
Now the most interesting part begins.

Ab		// Version of the ELF file structure


We use the original version of the ELF structure, so we simply leave "B".
And now we have reached the "OS ABI".

An application binary interface (ABI) is a set of conventions between programs, libraries, and the operating system that lets these components interact at a low level on a given platform. It defines such things as the sizes of data types, how arguments and return values are passed in a function call, and the structure and format of system calls and files. We will use the "System V" ABI to write a program for Linux. (Strictly speaking, the byte value 3 used in the listing is the one the ELF specification assigns to Linux; System V itself is 0.) From a program's point of view, an ABI is nothing other than the operating system: having fully implemented the ABI of some operating system in your own system, you will be able to run "non-native" programs just as they run on their "native" platform.

Ad		// This is our "OS ABI": the application binary interface,
		// a set of conventions between programs, libraries,
		// and the operating system that lets these components
		// interact at a low level on a given platform.
		// In this case it is the ABI for 32-bit Linux.


Next come reserved bytes that are used for padding, or are not used at all.

Aa aa aa aa	// Not used...
aa aa aa aa	// In any case, it is needed for something
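
Put together, the bytes described so far form the 16-byte identification block at the start of the file. A minimal sketch that assembles them in Python, with the values taken from the listings above; the variable name `e_ident` follows the ELF specification, and the rest is illustrative.

```python
e_ident = bytes([
    0x7F, ord("E"), ord("L"), ord("F"),  # signature: "Ho" "ELF"
    0x01,  # word size: 1 = 32-bit ("Ab")
    0x01,  # byte order: 1 = little-endian ("Ab")
    0x01,  # ELF structure version ("Ab")
    0x03,  # OS ABI byte as written in the article ("Ad")
]) + bytes(8)  # eight reserved bytes ("Aa" repeated eight times)

assert len(e_ident) == 16
print(e_ident.hex(" "))
# 7f 45 4c 46 01 01 01 03 00 00 00 00 00 00 00 00
```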


Here we really cannot do without it: these two bytes indicate what kind of file this is.
Our file is an executable, but there are other file types too; see below.

An executable module, or executable file, is a file containing a program in a form in which it can be run by the computer. Before execution the program is loaded into memory and some preparatory operations are performed.

Ac aa		// File type, where
	        	// B = relocatable, C = executable, D = shared, E = core


A very interesting part: the instruction set. The instruction set is what the processor will use when executing the program. In this case the instruction set for the x86 architecture is used.
We will see the processor commands at the very end, where the program code is located.

Instruction set.
To use the x86 instruction set, the first byte must be "Ad"; for x86_64 it is "Dp", for ARM it is "Ci", and so on.

Ad aa		// Instruction set. Right now we are working with the
		// instruction set of an "x86" processor, but if we want
		// to write a program for another CPU, the instruction set
		// there will be different.
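
The file type and instruction set fields are two 16-bit little-endian numbers. A small sketch that decodes the values used here; the two lookup tables contain only the values mentioned in this article, and their names are made up for the example.

```python
import struct

# Only the values named in the article; the table names are illustrative.
FILE_TYPES = {1: "relocatable", 2: "executable", 3: "shared", 4: "core"}
INSTRUCTION_SETS = {0x03: "x86", 0x28: "ARM", 0x3E: "x86_64"}

# "Ac aa" followed by "Ad aa": two little-endian 16-bit fields.
raw = bytes([0x02, 0x00, 0x03, 0x00])
file_type, instruction_set = struct.unpack("<HH", raw)

print(FILE_TYPES[file_type], INSTRUCTION_SETS[instruction_set])
# executable x86
```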


We have reached the entry point of the program. The entry point is a pointer that shows the system the place where the headers end and the program begins. Our program begins by placing the number "Ae" (4) in EAX, but more on that a little later.

Ab aa aa aa	// ELF structure version, repeated...
He IA AE Ai	// Entry point of the program. One of the most
		// important parts of the program.

DE AA aa aa	// Location of the program header table		>───────┐
		//							│
Aa aa aa aa	// Location of the section header table (none)		│
Aa aa aa aa	// Flags						│
		//							│
De aa		// Size of this header					│
Ca aa		// Size of one entry in the program header table	│
Ac aa		// Number of entries in the program header table	│
Ci aa		// Size of one entry in the section header table	│
Aa aa		// Number of entries in the section header table	│
Aa aa		// Index of the section header entry with the names	│
		//							│
/*		//							│
		//							│
Part 2 -								│
Program header								│
									│
*/		//							│
		//							│
Ab aa aa aa 	// Segment type; ours is B, which means		<───────┘
		// p_memsz bytes at address p_vaddr are
		// cleared, after which p_filesz bytes are
		// copied from offset p_offset				>───────────────┐
		// to p_vaddr...							│
		//									│
He aa aa aa	// Offset in the file at which the data		>───────┐		│
		// for this segment can be found (p_offset)		│		│
		//							│		│
He ia ae ai	// Where this segment should be placed		>───────┼───────┐	│
		// in virtual memory (p_vaddr)				│	│	│
		//							│	│	│
He ia ae ai	// UNDEFINED for the System V ABI (p_paddr)		│	│	│
Bo aa aa aa	// Size of the segment in the file (p_filesz)	<───────┼───────┼───────┘
Bo aa aa aa	// Size of the segment in memory (p_memsz)		│	│
		//							│	│
Af aa aa aa	// Flags: 5 = EXECUTABLE + READABLE			│	│
aa Ba aa aa	// Required alignment for this segment			│	│
			//						│	│
			// Information needed by the system		│	│
ABAAAAAAJDAAAAAA	// It simply does not work without this..	│	│
JDJAAEAIJDJAAEAI	// In fact, these are the p_* fields		│	│
ANAAAAAAANAAAAAA	// of the second program header entry.		│	│
AGAAAAAAAABAAAAA	//					<───────┴───────┘
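
To check the header values, the same bytes can be read back with Python's `struct` module. This is a sketch under the assumption that the listing follows the standard 32-bit ELF layout; the field names (`e_entry`, `p_vaddr`, and so on) come from the ELF specification, and the hex strings are my transcription of the Wct bytes above.

```python
import struct

# Header fields that follow the 16-byte identification block.
rest_of_header = bytes.fromhex(
    "0200" "0300" "01000000"   # file type, instruction set, version
    "74800408"                 # entry point
    "34000000" "00000000"      # program / section header table offsets
    "00000000"                 # flags
    "3400" "2000" "0200"       # header size, phdr entry size, phdr count
    "2800" "0000" "0000"       # shdr entry size, shdr count, names index
)
(e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
 e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
 e_shstrndx) = struct.unpack("<HHIIIIIHHHHHH", rest_of_header)

# First program header entry: eight little-endian 32-bit words.
phdr = bytes.fromhex(
    "01000000" "74000000" "74800408" "74800408"
    "1f000000" "1f000000" "05000000" "00100000"
)
(p_type, p_offset, p_vaddr, p_paddr,
 p_filesz, p_memsz, p_flags, p_align) = struct.unpack("<8I", phdr)

# Decode the permission bits: 4 = read, 2 = write, 1 = execute.
permissions = "".join(
    name for bit, name in ((4, "R"), (2, "W"), (1, "X")) if p_flags & bit
)

print(f"entry point {e_entry:#010x}")
print(f"{e_phnum} program headers of {e_phentsize} bytes at offset {e_phoff}")
print(f"segment: file offset {p_offset:#x}, address {p_vaddr:#010x}, "
      f"{p_filesz} bytes, {permissions}")
```

The header size (52) plus two program header entries of 32 bytes each gives 116, or 0x74, which is exactly the file offset of the first segment and the low byte of the entry point.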


So, we have come to the center of events. This place is full of secrets and riddles ... all right, we know there are no riddles left for us. Let's start. The name of this place is the code section.

To perform any function with machine code, you need to know what registers and interrupts are. I will explain plainly. Registers store arbitrary values and the results of operations, while interrupts are a whole story of their own. Modern processors execute program code very quickly, and at a level as low as machine code everything is built the same way. Look: commands are executed sequentially, that is, one after another. And in the OS, in our case Linux, any code is executed by being "slotted" into the general flow of commands, interrupting their execution.

That is, an interrupt stops the processor's execution of the current program and calls a subroutine, which runs to completion, after which normal operation resumes. To be more precise, it is a signal that informs the processor that some event has occurred. Execution of the current sequence of commands is suspended, and control is transferred to the interrupt handler, which reacts to the event, services it, and then returns control to the interrupted code.

To call a function from the Linux interrupt table, we must place the number of the function we want in the EAX register (eXtended AX, the extended 32-bit AX register), and the other information the function needs in the other registers (EBX, ECX, and EDX).

Thus, we get:

Li AE aa aa aa		// Put the number 4 (AE) into the EAX register
Ll AB aa aa aa		// Put the number 1 (AB) into the EBX register

Lj JD ja ae ai		// Put the address of our message into ECX
Lk AN aa aa aa		// Put the message size, 13 (N), into EDX

Mn IA			// Execute interrupt IA.. Why?
			// To perform a particular function.
			// We have 4 in EAX, so we will perform
			// the "print a string to the screen" action, where
			// EAX holds the function number. After interrupt
			// "IA" runs, the line "Wct One Love" is printed
			//
Li AB aa aa aa		// And again we put a one into EAX
			//
Db NL			// Zero the EBX register
Mn IA			// The interrupt IA we already know..
			// By the way, that is the name of the donkey in the "Winnie the Pooh" cartoon, only there it is "IA-IA"
			// I think you know what I am talking about
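
For readers who prefer ordinary hex, here is the same code section as a Python byte string. The mnemonics in the comments are my reading of the standard x86 encodings, not something produced by the Wct compiler, and `CODE_OFFSET` is just a name for the p_offset value from the program header.

```python
code = bytes.fromhex(
    "b8 04 00 00 00"  # Li AE aa aa aa -> mov eax, 4
    "bb 01 00 00 00"  # Ll AB aa aa aa -> mov ebx, 1
    "b9 93 90 04 08"  # Lj JD ja ae ai -> mov ecx, 0x08049093
    "ba 0d 00 00 00"  # Lk AN aa aa aa -> mov edx, 13
    "cd 80"           # Mn IA          -> int 0x80
    "b8 01 00 00 00"  # Li AB aa aa aa -> mov eax, 1
    "31 db"           # Db NL          -> xor ebx, ebx
    "cd 80"           # Mn IA          -> int 0x80
)

CODE_OFFSET = 0x74  # p_offset of the first segment ("He aa aa aa")

print(len(code))                     # 31, the segment size "Bo" (0x1f)
print(hex(CODE_OFFSET + len(code)))  # 0x93, where the message text starts
```

The message therefore sits at file offset 0x93, which matches the low bits of the address 0x08049093 loaded into ECX.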


And the final stage of writing our program is the data declaration; in our case it is text.
To write text you can use quotes or the ASCII character table.
"Ca" is a space, by the way.

"Wct" Ca "One" Ca "Love"	// "Wct One Love"
Ak				// The end... (a newline byte)
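
As a cross-check, the data bytes and the first system call can be mirrored in Python. This is only an analogy, assuming that function number 4 with EBX = 1 corresponds to writing to standard output; the helper name `emit` is made up for the example.

```python
import os

# The data section from the listing: "Wct" Ca "One" Ca "Love" Ak
MESSAGE = b"Wct" + b"\x20" + b"One" + b"\x20" + b"Love" + b"\x0a"


def emit(fd: int = 1) -> int:
    """Write MESSAGE to a file descriptor (1 is standard output)."""
    # Plays the role of EAX = 4, EBX = fd, ECX = address, EDX = length.
    return os.write(fd, MESSAGE)


if __name__ == "__main__":
    emit()
```

The message is 13 bytes long, which is the value "AN" placed in EDX.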


And here is the result of our program: when run, it prints the line "Wct One Love".

That is all. I am afraid I may have made mistakes in the text or errors in the explanations ... please forgive me, I am writing this late at night and tired. If you find errors, please tell me and I will certainly fix them! Thank you for reading my article! Use the online compiler to assemble the source code. All the best to you, buddy.

Download the source code.
Online compiler.
Resources.

For any questions, write to mihip@yandex.ru.

This article is a translation of the original post at habrahabr.ru/post/271519/