A modern OS's program loader basically uses mmap, not read. https://en.wikipedia.org/wiki/Memory-mapped_file#Common_uses says:
Perhaps the most common use for a memory-mapped file is the process loader in most modern operating systems (including Microsoft Windows and Unix-like systems.)
This creates a file-backed private mapping. (https://en.wikipedia.org/wiki/Virtual_memory).
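As a rough illustration of what a file-backed private mapping looks like from user space, here is a minimal sketch in C, assuming a POSIX system. The helper names map_file_private and looks_like_elf are made up for this example, and a real loader maps individual segments with the permissions the executable's headers ask for instead of mapping the whole file read-only.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only with MAP_PRIVATE.
 * No file data is read here; pages are brought in on first access.
 * Returns NULL on failure. */
const unsigned char *map_file_private(const char *path, size_t *len_out)
{
    assert(len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}

/* Hypothetical helper: returns 1 if the bytes start with the ELF magic. */
int looks_like_elf(const unsigned char *p, size_t len)
{
    return len >= 4 && p[0] == 0x7f && p[1] == 'E' && p[2] == 'L' && p[3] == 'F';
}
```

On Linux you could call map_file_private("/proc/self/exe", &len) to map the running program's own executable (that path is Linux-specific). Reading p[0] is then the first access that can page-fault and pull data in from the page cache or disk.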
- ... In other words, how does the OS recognize that it must stop executing the instruction to avoid jumping to an irrelevant address, and how does it calculate which page the related address is situated in?
In that case, code-fetch causes a page fault, just as if your code loaded from part of a big static array that hadn't been loaded from disk yet. After possibly loading the page from disk (if it wasn't already present in the page cache) and updating the page tables, execution resumes at the address that faulted, to retry the instruction.
The CPU's virtual memory hardware ("MMU", although that's not actually a separate thing in a modern CPU) handles detection of loads/stores/code-fetch from unmapped addresses. (Unmapped according to the actual page tables the HW can see. When a process "logically" has some memory mapped, but the OS is being lazy about it, we say the memory isn't "wired" into the page tables. A page fault will bring it into memory if it's not there already, and will wire it up in the page tables so the HW can access it, after a TLB miss triggers a hardware page-walk.)
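To see this lazy wiring from user space, here is a small sketch in C that counts the page faults taken when freshly mapped memory is touched for the first time. It assumes a POSIX system where getrusage fills in ru_minflt and ru_majflt (Linux does; some sandboxes report zeros). The function names are invented for the example, and it uses an anonymous mapping for simplicity; a code fetch from a file-backed mapping goes through the same fault-then-retry path.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Page faults (minor + major) this process has taken so far, or -1 on error. */
long page_faults_so_far(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt + ru.ru_majflt;
}

/* Map npages of anonymous memory, write one byte to each page, and return
 * how many faults were counted while doing so. Returns -1 on error. */
long faults_from_first_touch(size_t npages)
{
    long psz = sysconf(_SC_PAGESIZE);
    if (npages == 0 || psz <= 0)
        return -1;

    size_t len = npages * (size_t)psz;
    void *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return -1;

    /* volatile so the compiler cannot drop the stores */
    volatile unsigned char *p = raw;

    long before = page_faults_so_far();
    for (size_t i = 0; i < npages; i++)
        p[i * (size_t)psz] = 1; /* first touch: the page is not wired in yet */
    long after = page_faults_so_far();

    munmap(raw, len);
    if (before < 0 || after < 0)
        return -1;
    assert(after >= before); /* the counters only ever grow */
    return after - before;
}
```

On a typical Linux system, faults_from_first_touch(64) comes out close to one fault per page, but the exact number depends on the kernel (fault-around, transparent huge pages), so treat it as an indication only.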
If there are any runtime symbol relocations (aka fixups) to account for the program being loaded at a base address other than the one it was linked for, which happens if it needs any absolute addresses in memory, they may require writing to pages of code or otherwise-read-only data. That dirties the virtual memory page, so it's backed by the pagefile instead of the executable on disk. e.g. if your C source includes int *foo = &bar; at global scope, or the C++ equivalent int &foo = bar;
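The "dirtied page stops being backed by the file" behaviour can be seen with any private file mapping, not just an executable. Below is a minimal sketch in C, assuming a POSIX system with a writable /tmp. The function name is made up, and a one-byte temp file stands in for the executable.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write 'A' to a temp file, map it MAP_PRIVATE, overwrite the mapped byte
 * with 'B', then read the file itself back with pread.
 * Returns the byte found in the file, or -1 on error. */
int file_byte_after_private_write(void)
{
    char path[] = "/tmp/cow_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file disappears once fd is closed */

    int result = -1;
    char on_disk = 'A';
    if (write(fd, &on_disk, 1) == 1) {
        char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            p[0] = 'B';          /* write fault: kernel gives us a private copy */
            assert(p[0] == 'B'); /* our view of the page has changed */

            char readback;
            if (pread(fd, &readback, 1, 0) == 1)
                result = (unsigned char)readback;
            munmap(p, 1);
        }
    }
    close(fd);
    return result;
}
```

The function returns 'A': the write went to a private copy of the page, which from then on has to live in RAM or swap because the file no longer matches it. A loader applying a relocation to a privately mapped page of an executable ends up in the same situation.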
- How many pages of code does the OS copy into RAM before jumping to the entry point of the program?
The program loader probably has some heuristics to make sure the entry point and maybe some other pages are mapped before jumping there for the first time. Other than that, IDK if there are any special heuristics in the virtual-memory code for executables / libraries vs. non-executable mappings.