
2 years ago
What could be difficult about taking a screenshot? It would seem you just call a function kindly provided by the OS and get the finished picture. Many of you have surely done this more than once. Yet you cannot simply screenshot a full-screen DirectX or OpenGL application that way. More precisely, you can, but the result is a black rectangle rather than an image of the application.

This happens because the frame of a full-screen game is rendered by the video card and may never reach ordinary RAM. As a result nobody, including the OS, knows the contents of the frame.

Perhaps the only reliable way to obtain a frame is to inject code into the game process and use the DirectX or OpenGL API to make the process fetch the frame from video memory and hand it to the application taking the screenshot. Most screen-recording and streaming programs use this technique. The same approach also works if you need to draw something on top of the game.

Code is traditionally injected into another process with a method called DLL injection. You write a DLL that contains the code to execute. The DLL looks roughly like this:

#include <windows.h>

DWORD WINAPI MainLoop(LPVOID) {
    // Start our event loop here
    return 0;
}

extern "C"
{

__declspec (dllexport) BOOL __stdcall DllMain(HMODULE, DWORD ul_reason_for_call, LPVOID) {
    if (ul_reason_for_call == DLL_PROCESS_ATTACH) {
        DWORD thrID;
        CreateThread(0, 0, MainLoop, 0, 0, &thrID);
    }
    return TRUE;
}

}


To inject the DLL, you allocate memory in the target process, write the path of the DLL there, and start a thread in that process which loads the DLL:

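#include <windows.h>
#include <string>

bool InjectDll(int pid, const std::string& dll) {
    HANDLE hProcess = ::OpenProcess(PROCESS_CREATE_THREAD | PROCESS_QUERY_INFORMATION | PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_VM_READ, FALSE, pid);
    if (!hProcess) {
        return false;
    }
    // The ANSI variant matches the narrow string and the LoadLibraryA call below
    HMODULE hKernel32 = ::GetModuleHandleA("kernel32.dll");
    // Room for the DLL path plus its terminating zero, inside the target process
    void* remoteMemoryBlock = ::VirtualAllocEx(hProcess, NULL, dll.size() + 1, MEM_COMMIT, PAGE_READWRITE);
    if (!remoteMemoryBlock) {
        ::CloseHandle(hProcess);
        return false;
    }
    ::WriteProcessMemory(hProcess, remoteMemoryBlock, dll.c_str(), dll.size() + 1, NULL);
    // The remote thread runs LoadLibraryA(path) inside the target process
    HANDLE hThread = ::CreateRemoteThread(hProcess, NULL, 0,
                    (LPTHREAD_START_ROUTINE)::GetProcAddress(hKernel32, "LoadLibraryA"),
                    remoteMemoryBlock, 0, NULL);
    if (hThread == NULL) {
        // With MEM_RELEASE the size argument must be 0
        ::VirtualFreeEx(hProcess, remoteMemoryBlock, 0, MEM_RELEASE);
        ::CloseHandle(hProcess);
        return false;
    }
    ::CloseHandle(hThread);
    ::CloseHandle(hProcess);
    return true;
}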


Next we have to choose how the injected code and the main application will talk to each other. Windows offers many means of inter-process communication: files, sockets, shared memory, named pipes, and others. I develop with Qt, which has the QLocalSocket and QLocalServer classes; on Windows they work over named pipes, which is exactly what we need. To begin with, we start a Qt event loop in the DLL:

DWORD WINAPI MainLoop(LPVOID) {
    if (QCoreApplication::instance()) { // In case we were injected into a Qt application
        QEventLoop loop;
        TInjectedApp myApp;
        return loop.exec();
    } else {
        int argc = 0;
        char** argv = nullptr;
        QCoreApplication loop(argc, argv);
        TInjectedApp myApp;
        return loop.exec();
    }
}


Now we can implement the TInjectedApp class, which can use everything Qt offers. On the main application's side we create a QLocalServer and wait for connections; on the DLL's side we create a QLocalSocket and connect through it to the main application. I will not dwell on how to use QLocalSocket: there are plenty of examples, and you can also look at the complete source code referenced at the end of the article.
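
Since a local socket delivers a byte stream, the receiving side has to know where one message ends and the next begins. The sketch below shows one common way to do that, a 4-byte length prefix. It is only an illustration under that assumption: EncodeFrame and FrameReader are hypothetical names, this is not the protocol LibQtScreen uses, and the Qt socket calls themselves are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Prefix the payload with its length as 4 little-endian bytes.
std::vector<uint8_t> EncodeFrame(const std::vector<uint8_t>& payload) {
    const uint32_t n = static_cast<uint32_t>(payload.size());
    std::vector<uint8_t> frame;
    frame.reserve(4 + payload.size());
    for (int i = 0; i < 4; ++i) {
        frame.push_back(static_cast<uint8_t>((n >> (8 * i)) & 0xFF));
    }
    frame.insert(frame.end(), payload.begin(), payload.end());
    return frame;
}

// Accumulates bytes as they arrive and hands out complete messages.
class FrameReader {
public:
    // Append whatever the socket delivered, possibly a partial frame.
    void Feed(const uint8_t* data, size_t size) {
        buffer_.insert(buffer_.end(), data, data + size);
    }

    // Returns true and fills `out` once a whole frame is buffered.
    bool Next(std::vector<uint8_t>& out) {
        if (buffer_.size() < 4) {
            return false;
        }
        uint32_t n = 0;
        for (int i = 0; i < 4; ++i) {
            n |= static_cast<uint32_t>(buffer_[i]) << (8 * i);
        }
        if (buffer_.size() - 4 < n) {
            return false;  // payload not fully received yet
        }
        out.assign(buffer_.begin() + 4, buffer_.begin() + 4 + n);
        buffer_.erase(buffer_.begin(), buffer_.begin() + 4 + n);
        return true;
    }

private:
    std::vector<uint8_t> buffer_;
};
```

The sender would write the result of EncodeFrame to its socket, and the receiver would pass every chunk it reads to Feed and then call Next until it returns false.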

So, we have dealt with injecting our code into the process and communicating with it. Now we need to take the actual screenshot from inside the process. Let's look at DirectX 9 as an example. Through the DirectX API we can get the video card's backbuffer, but for that we need a pointer to the IDirect3DDevice9. Two things complicate the task. First, DirectX has no API for obtaining a pointer to an existing IDirect3DDevice9, only for creating a new one. Second, we have no access to the source code of the applications we inject into, so we do not know where the device is created or which variable holds it.

How do we find the device after all? The first option is to walk through all of the application's memory and look for an object whose contents resemble what we are after. Most likely all objects of this class share many identical members and an identical or similar virtual function table, which is enough for a search. But this method has drawbacks. First, it is unreliable (the members we search by may differ in some application). Second, it is slow (a full pass over all memory allocated to the application can take a long time).
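
The brute-force search can be sketched as follows. This is a minimal illustration, not code from the article's library: FindVtableCandidates and its parameters are hypothetical, and it scans a plain byte buffer rather than enumerating the memory regions of a live process.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Scan a memory region for pointer-sized, pointer-aligned slots whose value
// equals a known virtual table address. An object with virtual functions
// usually begins with such a pointer, so every hit is a candidate object.
// `region` and `knownVtable` are hypothetical inputs for illustration.
std::vector<size_t> FindVtableCandidates(const uint8_t* region, size_t size,
                                         uintptr_t knownVtable) {
    std::vector<size_t> hits;
    const size_t step = sizeof(uintptr_t);
    for (size_t off = 0; off + step <= size; off += step) {
        uintptr_t value;
        std::memcpy(&value, region + off, step);  // safe for any alignment
        if (value == knownVtable) {
            hits.push_back(off);  // offset of the candidate inside the region
        }
    }
    return hits;
}
```

Even in this simplified form both drawbacks are visible: the loop touches every pointer-sized slot of the region, and any unrelated value that happens to equal the table address produces a false hit.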

There is another way. We do not know the address of the IDirect3DDevice9 object, but we can easily determine the addresses of the functions that work with it. For example, DirectX 9 applications call IDirect3DDevice9::Present to display a frame, and the first argument (this) passed to it is the pointer to the IDirect3DDevice9. Knowing the address of this function, we can hook it: intercept the call and run our own function instead, which receives the IDirect3DDevice9 pointer as its first argument and takes the screenshot through it.

On Windows a function call can be intercepted roughly like this (for 32-bit applications):

#include <windows.h>
#include <stdint.h>
#include <iostream>

void Foo() {
    std::cerr << "Foo()\n";
}

void Bar() {
    std::cerr << "Bar()\n";
}

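int main() {
    uint8_t* f = (uint8_t*)Foo;
    uint8_t* b = (uint8_t*)Bar;

    // Make the first 5 bytes of Foo writable
    DWORD t;
    VirtualProtect(f, 5, PAGE_EXECUTE_READWRITE, &t);
    // The jump distance is counted from the end of the 5-byte jmp instruction
    uint32_t distance = (uint32_t)(b - f - 5);
    *f = 0xE9;
    *(uint32_t*)(f + 1) = distance;

    Foo();
    return 0;
}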


First we make the 5 bytes at the address of Foo writable. Then we compute the distance in bytes to jump. Then we write the jmp opcode (1 byte) and the jump distance (4 bytes) at the function's address. When this code runs, Bar is executed instead of Foo. For practical use the method needs some finishing: first, save the old contents of memory somewhere and restore them after the interception; second, add support for 64-bit applications.
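
Both finishing touches can be sketched on an ordinary byte buffer. The names below (EncodeJmpRel32, EncodeJmpAbs64, ApplyPatch, RestorePatch) are hypothetical and are not the Hook and UnHook helpers used later in the article. The 64-bit variant assumes one common encoding, `mov rax, imm64` followed by `jmp rax`; other encodings exist. On real code the target bytes would first have to be made writable with VirtualProtect, as in the listing above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// 32-bit style: jmp rel32 (opcode 0xE9). The displacement is counted from
// the end of the 5-byte instruction, hence the "- 5".
std::array<uint8_t, 5> EncodeJmpRel32(uint64_t from, uint64_t to) {
    const uint32_t rel = static_cast<uint32_t>(to - from - 5);
    std::array<uint8_t, 5> code{};
    code[0] = 0xE9;
    for (int i = 0; i < 4; ++i) {
        code[1 + i] = static_cast<uint8_t>((rel >> (8 * i)) & 0xFF);
    }
    return code;
}

// 64-bit style: mov rax, imm64 (48 B8 + 8 bytes), then jmp rax (FF E0).
// Works for any distance but overwrites rax and takes 12 bytes.
std::array<uint8_t, 12> EncodeJmpAbs64(uint64_t to) {
    std::array<uint8_t, 12> code{};
    code[0] = 0x48;
    code[1] = 0xB8;
    for (int i = 0; i < 8; ++i) {
        code[2 + i] = static_cast<uint8_t>((to >> (8 * i)) & 0xFF);
    }
    code[10] = 0xFF;
    code[11] = 0xE0;
    return code;
}

// The original bytes, kept so that the patch can be undone.
struct SavedBytes {
    uint8_t* target;
    std::vector<uint8_t> original;
};

// Save the bytes about to be overwritten, then write the jump over them.
template <size_t N>
SavedBytes ApplyPatch(uint8_t* target, const std::array<uint8_t, N>& code) {
    SavedBytes saved{target, std::vector<uint8_t>(target, target + N)};
    std::memcpy(target, code.data(), N);
    return saved;
}

// Put the original bytes back.
void RestorePatch(const SavedBytes& saved) {
    std::memcpy(saved.target, saved.original.data(), saved.original.size());
}
```

For two functions that lie within a 32-bit distance of each other, EncodeJmpRel32 produces the same five bytes that the listing above writes by hand.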

But how do we find the address of Present? Present is not a function exported by the DLL, so its address is not available to us either (at least not directly). We can, however, use the fact that Present is implemented in a DLL, and once the DLL is loaded the function always sits at the same offset from the DLL's base address. So, knowing the base address of the DLL and the offset of Present, we get the address of Present by adding the two.

And yet, again, things are not as simple as we would like. The offsets can differ depending on the version of the DLL installed on the system, so we cannot hardcode them in our program; they have to be determined anew every time the program starts. C++ has no ready-made way to get the address of a virtual function. For an ordinary function it is easy; for a virtual one it is not. So we proceed as follows: create an IDirect3DDevice9 object in our own application, look up the address of Present in the virtual function table of that object, and compute the offset between the base address of the DLL and the address of Present. Knowing this offset and the base address of the DLL already loaded in the foreign application, we find the address of Present there and can hook it.

uint64_t GetVtableOffset(uint64_t module, void* cls, uint32_t offset) {
    uintptr_t* virtualTable = *(uintptr_t**)cls;
    return (uint64_t)(virtualTable[offset] - module);
}


Here module is the base address of the loaded DLL (what LoadLibrary returns), cls is a pointer to the previously created IDirect3DDevice9, and offset is the index of the function in the virtual function table of IDirect3DDevice9 (Present is the 17th). It is best to determine the offset in our own process and then pass it to the injected DLL. The injected DLL can now hook Present and take the screenshot there by extracting the contents of the backbuffer.
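
The arithmetic of the two steps, measuring the offset in our own process and applying it in the target, can be illustrated without DirectX. In this sketch FakeObject stands in for a COM object (its first field points to a table of function addresses), and all names and addresses are made up for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Same idea as GetVtableOffset above: read slot `index` of the object's
// virtual table and express it relative to the module base address.
uint64_t SlotOffset(uintptr_t moduleBase, const void* object, size_t index) {
    const uintptr_t* vtable = *static_cast<const uintptr_t* const*>(object);
    return static_cast<uint64_t>(vtable[index] - moduleBase);
}

// In the target process the DLL may be loaded at a different base address,
// but the function sits at the same offset from it.
uintptr_t ResolveInTarget(uintptr_t targetModuleBase, uint64_t offset) {
    return targetModuleBase + static_cast<uintptr_t>(offset);
}

// Stand-in for a COM object: the first field points to the function table.
struct FakeObject {
    const uintptr_t* vtable;
};
```

With a made-up base of 0x10000000 and slot 17 holding 0x10004CE0, SlotOffset yields 0x4CE0; if the same DLL sits at 0x6A000000 in the target process, ResolveInTarget gives 0x6A004CE0.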

void* PresentFun = nullptr;

void GetDX9Screenshot(IDirect3DDevice9* device) {
    // The current render target is the backbuffer holding the frame
    IDirect3DSurface9* backbuffer;
    device->GetRenderTarget(0, &backbuffer);
    D3DSURFACE_DESC desc;
    backbuffer->GetDesc(&desc);
    // A surface of the same size and format in system memory
    IDirect3DSurface9* buffer;
    device->CreateOffscreenPlainSurface(desc.Width, desc.Height, desc.Format, D3DPOOL_SYSTEMMEM, &buffer, nullptr);
    // Copy the frame from video memory to the system-memory surface
    device->GetRenderTargetData(backbuffer, buffer);
    D3DLOCKED_RECT rect;
    buffer->LockRect(&rect, NULL, D3DLOCK_READONLY);
    QImage img = ConvertToQImage(desc.Format, (char*)rect.pBits, desc.Height, desc.Width);
    // ...
}

static HRESULT STDMETHODCALLTYPE HookPresent(IDirect3DDevice9* device,
                CONST RECT* srcRect, CONST RECT* dstRect,
                HWND overrideWindow, CONST RGNDATA* dirtyRegion)
{
    UnHook(PresentFun);
    GetDX9Screenshot(device);
    return device->Present(srcRect, dstRect, overrideWindow, dirtyRegion);
}

void MakeDX9Screen(uint64_t presentOffset) {
    HMODULE dx9module = GetModuleHandleA("d3d9.dll");
    PresentFun = (void*)((uintptr_t)dx9module + (uintptr_t)presentOffset);
    Hook(PresentFun, HookPresent);
}


The retrieved backbuffer is converted into the format we need (for example, QImage), and that is the screenshot we have been trying so long to get. The process is built the same way for other versions of DirectX and for OpenGL. For OpenGL the general scheme is even simpler, since there is no need to look up offsets of virtual functions: glBegin is exported by the DLL, and its address is known.
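
One detail of the conversion is worth illustrating: the rows of a locked surface may be padded, so the distance between rows (D3DLOCKED_RECT::Pitch) can be larger than the visible row. The sketch below copies a 32-bit surface into a tightly packed buffer. PackRows32 is a hypothetical helper, not the ConvertToQImage function from the listing above, and it assumes a 4-bytes-per-pixel format such as D3DFMT_X8R8G8B8.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy a locked 32-bit surface into a tightly packed buffer.
// `bits` and `pitch` correspond to D3DLOCKED_RECT::pBits and ::Pitch.
std::vector<uint8_t> PackRows32(const uint8_t* bits, int pitch,
                                int width, int height) {
    const size_t rowBytes = static_cast<size_t>(width) * 4;
    std::vector<uint8_t> packed(rowBytes * static_cast<size_t>(height));
    for (int y = 0; y < height; ++y) {
        // Source rows are `pitch` bytes apart, destination rows `rowBytes`
        std::memcpy(packed.data() + static_cast<size_t>(y) * rowBytes,
                    bits + static_cast<size_t>(y) * pitch,
                    rowBytes);
    }
    return packed;
}
```

The packed buffer can then be handed to whatever image class the application uses.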

You can find the complete source code in LibQtScreen, a library I made for one of my projects. It implements the screenshot method described in this article. It supports MinGW and MSVC, 32-bit and 64-bit applications, OpenGL, and DirectX from version 8 through 11.

The main source of information for this article and the library was the source code of obs-studio, a streaming program.

This article is a translation of the original post at habrahabr.ru/post/272989/