Memory Management
   Due to Murphy's Law, system must still manage memory since some programs will be bigger than available physical memory.

No swapping or paging
   monoprogramming
      IBM PC, Fig 3-1(c): device drivers in ROM, one user program and OS in RAM
      absolute load
   multiprogramming
      allows multiprocess user application programs
      supports timesharing of interactive users
      improves utilization of CPU and IO devices, Fig 3-2
   multiprogramming with fixed partitions, Fig 3-4
      absolute load: adjust all addresses in program to partition loaded
      does not prevent program from referencing outside its partition
      small job in large partition wastes space
   multiprogramming with variable partitions, Fig 3-5
      relocation (base and limit registers) and compaction
      process can grow only if moved or adjacent to hole, so do Fig 3-6(b) with some room to grow between heap and stack
      allocation algorithms: first-, best-, worst-, next-fit
      better use of memory than fixed partitions, but still problems
         not good for interactive use since number of users limited to number that can squeeze into memory
         not good if a high priority job enters the system and there is not enough free memory even after compaction

Swapping
   so far, we keep programs for which there is no partition in main memory big enough on disk until such time as one becomes available, then load it
   but what if it is an interactive user or a high priority job?
   can make a big enough partition available by swapping out some executing jobs in main memory
   multiprogramming with variable partitions
      implementing unallocated and allocated memory maps
         bit maps, Fig 3-7(b), slow search to find contiguous memory hole of a given size
         linked lists, Fig 3-7(c)
      allocation algorithms (apply both to main memory and swap space)
         first-, best-, worst-, next-fit
         best-fit actually performs the worst since it leaves lots of small fragments
      coalesce (merge) holes when process exits, Fig 3-8
      problem is (external) fragmentation, also called checkerboarding
         fifty percent rule (number of holes = half number of processes) since holes are merged on process exit
         unused memory rule (fraction of memory in holes = k/(k+2) where k = avg hole size / avg size of allocated processes)
         if holes and processes are about same size (k = 1), 1/3 of memory is wasted
   if program is too big for main memory then do manual overlays

Virtual memory
   paging with whole program loaded into main memory
      extension of base/limit register idea used in variable partition multiprogramming
      separate logical address space from physical address space
      program need not be contiguous in main memory
      no external fragmentation since any free page frame can be used
      BUT internal fragmentation since part of a page may be wasted
      pages of VAS, page frames of physical memory, ~4KB
      CPU generates effective address translated into physical address by MMU, Fig 3-10
      programmer address space called virtual address space, mapped into main memory by address map stored in MMU called page table: for each page, it has protection bits, page frame #
      PDP-11 example, early model, hardware page table
         CPU has MMU with 8 registers, pages and page frames are 8K, so VAS limited to 64K, main memory is 256K
         16-bit CPU address translated by hardware: left 3 bits choose MMU register, right 13 bits are offset, MMU register contains page frame number, concatenate with offset to get main memory address
      Intel 8088
         segment registers CS, SS, DS, ES are an extension of base/limit relocation
         16-bit CPU address translated by hardware: shift segment register left 4 bits and add to CPU address to get 20-bit main memory address
         which segment register is used is determined by memory operation: fetch instruction uses CS, stack push/pop uses SS, data operand uses DS
   now drop requirement that whole program be loaded into main memory
      so that programs bigger than main memory can be run
      so that parts of program ("dead" code and data) need not be loaded into main memory
      or to increase the degree of multiprogramming by giving each process a smaller area of main memory in which to run
      OS automates the old Fortran program overlay idea: some VAS pages in main memory page frames, others in swap area
      programmer address space called virtual address space, mapped partially into main memory by address map stored in MMU, Fig 3-11
      add present/absent bit (also called valid bit) to page table entry
      trap generated by MMU if page not mapped (valid bit off)
      page fault: OS picks free or little-used page frame, copies out if dirty, copies in missing page, updates MMU memory map
      this is software overhead so page fault rate better not be too big
      add referenced bit, modified bit to page table entry
      Fig 3-12 shows hardware mapping in MMU and why page size is a power of 2
   page table can be in dedicated MMU memory or in main memory
      if in main memory, address translation requires extra memory reference which slows program to half speed
         CPU will have a page table base address register that is changed on every context switch
         a translation lookaside buffer can cache recently used page table entries to speed up address translation
         associative memory, Fig 3-21 shows improved address translation speed if hit ratio is high
      if in MMU special memory, must be loaded on context switch or have room for several process page tables
   page tables can be large
      32 bit VAS / 4KB pages = million PTE's per process, 4B PTE's -> 4MB of page table per process, yow!
      64 bit VAS / 8KB pages = zillion (2^51) PTE's per process, ....
      multi-level page tables, Fig 3-13, allow only part of page table to be in memory at cost of extra memory reference in address translation and possibly another page fault
         page table for hole between stack and heap is not in memory
      VAX-11 keeps user page tables in OS virtual memory, so user page tables are themselves paged in and out, Fig 3-17
      inverted page table, Fig. 3-23, can be kept in memory with process page tables on disk
         one entry of inverted page table per page frame
   page table entry, Fig 3-14
      protection field rwx, modified, referenced bits, valid bit, page frame number
      may be a disable caching bit used for such things as semaphores or IO DMA buffers
   page replacement algorithms
      characteristics
         fixed sized memory allocation with local page replacement (process pages against itself)
         variable memory allocation can result from global page replacement (one process can take a page frame away from another process) or from OS increasing/decreasing process's fixed allocation based on observing process's behavior, e.g. page fault rate
      OPT, MIN: fixed memory allocation, replace page not needed for the longest time in the future, not a realizable policy
      not-recently-used: replace in order !R!M, !RM, R!M, RM; R bits cleared periodically on clock interrupts
      FIFO: linked list of pages ordered FIFO; replace page at head of list and put new page at tail
      2nd chance, clock: modify above, replace page if !R else clear R and move it to end of list, repeat until find a !R, Fig 3-24, 25
         4.3BSD UNIX has two-handed clock since physical memories have gotten so big
      LRU: fixed allocation, replace page referenced longest ago; expensive since LRU stack must be updated on each memory reference
         can be implemented more cheaply with a field in each PTE that has current time copied into it by hardware each time the page is referenced; PTE with smallest such field is LRU
      not-frequently-used with aging: software approximation to LRU, Fig 3-27, at each clock tick, shift PTE counter right one bit and put R bit into leftmost bit of counter, replace page with smallest counter
      Belady's anomaly: FIFO page faults can increase if memory is increased, Fig 3-28
      stack replacement algorithms: those with property that M(m,r) is a subset of M(m+1,r) where M(m,r) is the set of pages in memory of m page frames after r references have been processed, Fig 3-29
         LRU, OPT are stack algorithms, FIFO is not, no stack algorithm has Belady's anomaly
         reference strings: page numbers referenced
         distance strings: in a stack algorithm, depth in stack of pages referenced, Fig 3-29
         one pass over distance string can compute number of page faults for any main memory size in a stack algorithm, Fig 3-31
         LRU is a stack algorithm because for a memory allocation of m page frames, we keep in memory the top m pages of the LRU stack
         MIN/OPT is a stack algorithm because we can keep a stack ordered from soonest next reference to furthest-in-future next reference and keep in memory the top m pages
      working set of a process is set of pages referenced during last n clock ticks
         give a process enough main memory to contain its working set else it will generate lots of page faults, called thrashing
         working set << total VAS of a process due to locality of reference, the tendency of a process to reference in near future those pages referenced in the near past
      local v. global allocation, Fig 3-32
         LRU, FIFO are local, fixed allocation replacement policies since process pages against itself
         working set is global, variable allocation since working set sizes change over time and page frame from process with shrinking working set can be given to process with growing working set
         if not enough main memory to hold all working sets then reduce degree of multiprogramming by swapping some processes out (this is how CPU scheduling intermediate level and memory management are related)
         when swapping a process in, reload all of working set swapped out (prepaging) rather than demand paging it in
      page-fault-frequency allocation: give a process enough page frames to reduce its page fault rate to an acceptable level (take some away if page fault rate is very low), Fig 3-33
   how big should the page size be?
      internal fragmentation
      page table size
      swapping time (IO per page)
      the larger the page size the more unused information gets loaded on each page fault
   implementation issues
      instruction backup: after page fault handled, restore intermediate state of instruction execution to original state before instruction started
      locking pages into memory: OS kernel pages, page frames into which DMA IO operation is being done
      sharing pages among processes: read-only pages, i.e. code
      paging daemons: used with clock, when free page frame pool runs low, it scans circular linked list of all page frames and clears set R bits and moves !R pages to end of free list where they can be reclaimed if referenced again before they rise to top of free list and get overwritten (minfree and maxfree)
         if main memory is very large, can use two hands: first hand turns off R bits and second hand moves !R page frames to free list

Segmentation
   remove constraint that pages are fixed in size
   paging eliminated external fragmentation; segmentation reintroduces it but eliminates internal fragmentation
   segment tables contain segment sizes
   segmentation with paging: segments themselves can be paged, each segment has a page table, Fig 3-41
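
Worked example for the page replacement notes above (FIFO, LRU, Belady's anomaly): a minimal Python sketch that counts page faults under fixed allocation with local replacement. The names fifo_faults, lru_faults and REFS are made up for illustration, and the 12-reference string is a commonly used demonstration string, assumed here and not taken from the figures cited in these notes.

```python
from collections import OrderedDict, deque

# Illustrative reference string (page numbers referenced); an assumption,
# not copied from any figure in the notes.
REFS = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]


def fifo_faults(refs, frames):
    """Count page faults under FIFO replacement with `frames` page frames."""
    queue = deque()      # pages in load order, oldest at the left (head)
    resident = set()     # pages currently in memory
    faults = 0
    for page in refs:
        if page in resident:
            continue     # hit: FIFO order is not changed by a reference
        faults += 1
        if len(queue) == frames:
            resident.discard(queue.popleft())  # evict page at head of list
        queue.append(page)                     # new page goes to the tail
        resident.add(page)
    return faults


def lru_faults(refs, frames):
    """Count page faults under LRU replacement with `frames` page frames."""
    stack = OrderedDict()  # least recently used page first
    faults = 0
    for page in refs:
        if page in stack:
            stack.move_to_end(page)    # hit: page becomes most recently used
            continue
        faults += 1
        if len(stack) == frames:
            stack.popitem(last=False)  # evict least recently used page
        stack[page] = True
    return faults


if __name__ == "__main__":
    for m in (3, 4):
        print(m, "frames: FIFO", fifo_faults(REFS, m), "LRU", lru_faults(REFS, m))
```

With this string FIFO takes 9 faults in 3 frames but 10 in 4 (Belady's anomaly), while LRU, a stack algorithm, drops from 10 faults to 8.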