File Systems
  What are files and why do we have them:
    permanent information, potentially voluminous, that stays around when a process terminates and that can be shared among processes.
  User view and user interface to files and directories vs. implementation details, the system view.
  User view
    file naming schemes
      letters, digits, special characters, case sensitivity, root and extension in Fig 4-1
    structure of a file
      Fig 4-2: sequence of bytes, sequence of fixed-sized records, (sorted) tree of variable-sized records with key field in each
    types of files
      regular, directory, character-special (IO device that produces or accepts stream of characters), block special (IO device that stores blocks of bytes)
      ASCII vs. binary (Fig 4-3)
    ways of accessing files: sequential vs. random access
    attributes of files, Fig 4-4
      permissions, creator, owner, ASCII/binary, random/sequential, locked, key, record length, creation-modification-access times, current size, maximum size
    operations on files
      create, delete, open, close, read, write, append, seek, get attributes, set attributes, rename
    accessing files with system calls, Fig 4-5
    accessing files by mapping them into memory
      mmap(file_name, virtual_address, length) causes the file to serve as the swap space for this part of process virtual memory
      reading and writing the file are now memory load and store instructions
      do a "man mmap" in SunOS
  Directories
    hierarchical organization of files
    a directory is a file containing (Fig 4-7)
      file names and attributes/disk addresses, or
      file names and pointers to disk structures containing attributes/disk addresses
    can be single system directory (CP/M), one directory per user, or general directory/subdirectory tree (UNIX, MS-DOS), Fig 4-8
    path names of files
      absolute from root, e.g. /usr/local/src/Java
      relative from current working directory, e.g. MCS720/Java/bakery.java
      . is current directory and ..
      is parent directory
    operations on directories
      create, delete, opendir, closedir, readdir, rename, link, unlink
  Implementation schemes -- system view
    ways of implementing files
      contiguous allocation
        simple (each file is described by a beginning disk address), fast IO, but must preallocate space and there will be disk fragmentation
      linked list allocation
        Fig 4-10, no disk fragmentation, directory entry is beginning disk address, but random access is slow and the data stored in a block is no longer a power of two in size because the link takes up space (storage block != 2**k)
      linked list through index in memory (MS-DOS FAT)
        pull links out of file into memory to alleviate the above two problems, but large disks have large tables (multi MB), table is indexed by disk block number, Fig 4-11
      index stored on disk (i-nodes)
        Fig 4-12, each file has a disk structure of attributes and disk block addresses called i-node (UNIX), single indirect block address, double indirect, triple indirect
    ways of implementing directories, map ASCII name into disk blocks containing data
      CP/M way: Fig 4-13, one systemwide directory containing file name, user, and disk block numbers, additional directory entries for more disk blocks
      MS-DOS way: Fig 4-14, directory entry contains file name, attributes, first block of file (entry into FAT for rest of blocks)
      UNIX way: Fig 4-15, directory entry contains file name and i-node number (i-nodes are stored in known areas of the disk), i-nodes contain attributes (type, size, times, owner) and disk block numbers, direct and indirect; Fig 4-16 shows path lookup example
    sharing files -- same file has multiple directory entries
      can be done with hard links in UNIX (same i-node number in different directories, i-node contains link count) or with soft or symbolic links (special type of file containing a path name)
      the two schemes have different semantics when a remove file is done
    block size
      Fig 4-19, larger means fewer seeks thus faster IO but more internal fragmentation, smaller means larger directory entries or i-nodes
    keeping track of free disk blocks, Fig 4-20
      linked list of free
      blocks, each block containing free block numbers
      bit map of free blocks: faster if it can be kept in memory and will generally be smaller unless disk is nearly full
    quotas for users
      quota file contains an entry for each user, total number of disk blocks allocated to user is updated when a user's file size increases, total checked against quota limit
    reliability issues (bad blocks, backups, consistency checks)
      bad blocks can be remapped by the disk controller ("reformatting" the disk) or avoided by the OS in file allocation (put all bad blocks in one file)
      monthly full backups of large disks, with weekly and daily incremental dumps
      since not all file system operations are atomic, file system can be in an inconsistent state after a crash, e.g. buffer cache not flushed before the crash
      a file system consistency checker can scan all i-nodes and build counts of how many times each disk block appears in a file and appears in the free list
        a block should have a count of exactly one in exactly one category, zero count in the other category, Fig 4-23
        all errors are easy to fix except the same block appearing in two or more files
        the checker then scans directories and counts how many times a file (i-node) appears in some directory, then compares these counts to the link count field in each i-node
        if link count is higher or lower than actual count, change link count
    improving file system performance
      block buffer cache: linked list of recently accessed disk blocks, ordered from LRU to MRU
        some blocks can be forced immediate write-through if modified (i-nodes, directories, indirect blocks) for better consistency after a crash
        other blocks (double indirect, full data blocks) can be forced LRU by linking into front of list rather than rear when access finished
        buffer cache should be flushed to disk periodically, every 30 seconds or so (sync() system call and /etc/update daemon)
        MS-DOS buffer cache is always write-through
      cylinder groups keep i-nodes near data blocks on disk to reduce seek time, Fig 4-24
  Security and
  protection
    security: overall issue and policies of who gets to access what
    protection mechanisms: how the OS implements policy decisions
    for a collection of security anecdotes, see "A Taxonomy of Computer Security Flaws" by C. E. Landwehr, et al., ACM Computing Surveys, v 26, n 3, Sept 1994
    UNIX security hole anecdotes
      when you walk up to a terminal to log in, how do you know it is really /bin/login that printed "login:" on the screen rather than a user-written password stealing program?
      mkdir was a root setuid program and not atomic so a user could switch in the password file and have it chown-ed to them
      the original /bin/mail program would allow you to append a message to any file, so you could append a passwordless entry for root uid to /etc/passwd
      doing a "stty 0" on somebody else dialed in will log them off if their /dev/tty is world writable (fixed in 4.3 BSD)
      kmem group so you can't traipse through kernel data structures in /dev/kmem except through setgid kmem programs like w and ps
      path variable in shell scripts (particularly setuid root ones) must be set explicitly by the shell script or it might execute your cat or ls rather than the system one
      there are still security holes in passing arguments to shell scripts: Bourne shell sh and IFS variable
      if you put a line like "vi:set tabstop=3" in the first or last five lines of your file, then vi used to execute the command; now imagine what would happen if you edited somebody else's file which contained "vi:rm -r $HOME"
      "#! /bin/csh -f" setuid shell scripts are still a security problem (using symbolic links) in 4.3 BSD UNIX
      sh/csh have path variable; you can Trojan Horse someone ls-ing in your directories if .
      is first in their path by placing "rm -r $HOME" into a file named ls
      intelligent terminals can be told with an escape sequence to read something on the screen into the command interpreter, thereby executing it; a malicious user could find root logged in and use the write command to place "cp /bin/sh ~user; chmod u+s ~user/sh" on root's screen, then write the appropriate escape sequence to the screen
      talk and write commands now filter out control characters for the same reason, but mail does not so you can still send "letter bombs" (e-mail messages containing escape and control sequences that draw pretty pictures on a vt100 or lock the keyboard)
      the remote finger server, fingerd, can be configured to run under a specified uid, and root was chosen, of course, for lack of a better choice; somebody finally discovered that you could (symbolically) link your .plan file to any unreadable file, and then do "finger yourself@yourhost" and see the contents of the unreadable file displayed; now fingerd runs as uid nobody
      Internet worm used a bug in sendmail (debug option) to execute an arbitrary program on a remote machine; also a bug in the finger daemon, which used gets() rather than fgets() so there was no buffer overflow checking, so a hand-crafted stack frame could be downloaded into the finger daemon, causing the return to be into the downloaded stack frame where /bin/sh was executed
      the sendmail program has many bugs, which result in many "security alert" messages on the Internet; a "feature" of sendmail is that one can telnet to port 25 of a remote machine and converse in SMTP with the sendmail daemon on that machine; effectively you can send mail that is "From:" anybody you want to fake; the dead give-away is the "Apparently-To:"
        From Fake Tue Dec 28 11:41:52 1993
        Date: Tue, 28 Dec 93 11:41:04 EST
        From: Fake
        Apparently-To:
        This is a test!
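The search-path anecdotes above (a script that does not set its own path, or a user with . first in the path) both come down to lookup order: the first directory that contains a matching name wins. The following is a minimal sketch of that lookup as a pure simulation; it touches no real files, and the directory contents, the user name mallory, and the helper resolve are all hypothetical names chosen for illustration.

```python
# Simulated file system: directory -> names of executables in it.
# All entries here are made up for the example.
FILES = {
    "/bin": {"ls", "cat", "sh"},
    "/usr/bin": {"vi"},
    "/home/mallory": {"ls"},   # a planted file named ls
}
CWD = "/home/mallory"          # the victim has cd-ed into this directory


def resolve(command, search_path, files_in, cwd):
    """Return the first match for command, scanning search_path left to right.

    A "." entry (or an empty entry) stands for the current directory.
    """
    for entry in search_path.split(":"):
        directory = cwd if entry in (".", "") else entry
        if command in files_in.get(directory, set()):
            return directory.rstrip("/") + "/" + command
    return None


# "." first: the planted ls shadows the system one.
print(resolve("ls", ".:/bin:/usr/bin", FILES, CWD))
# "." last, or absent: the system ls is found first.
print(resolve("ls", "/bin:/usr/bin:.", FILES, CWD))
print(resolve("ls", "/bin:/usr/bin", FILES, CWD))
```

Putting . last narrows the exposure to names that exist nowhere else on the path; leaving it out entirely, and setting the path explicitly at the top of any privileged script, removes it.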
    authentication of users
      passwords must be chosen from a big space but be easy to remember
      UNIX stores encrypted passwords with a 12 bit salt for each one to increase the time needed by crackers that encrypt the dictionary
      not very user-friendly if each terminal has a funnel connected to a spectrograph with instructions "deposit skin, hair, blood, urine sample here"
  Protection mechanisms
    object/domain model
      objects: memory, devices, files
      each process executes in a domain which determines what objects it can access and with what permissions or rights, Fig 4-27
    protection matrix
      rows are domains, columns are objects, entries are rights, Fig 4-29
      domains can also be objects in the protection matrix to allow a process executing in one domain to enter another one, Fig 4-30
      since the protection matrix is sparse, it is stored by rows or by columns, keeping only the non-empty entries
    access control lists
      store by columns, i.e. each object has a list of (domain, rights) pairs
      UNIX just stores (owner, rwx), (owner's group, rwx), (all others, rwx)
    capabilities
      store by rows, i.e. each process has a list of (object, rights) pairs, called its capability list, Fig 4-31
      the capability list is protected from tampering by storing the actual capabilities in kernel memory and giving the user a list of indexes into kernel memory, or the capabilities can be encrypted by the OS and stored in user memory
    covert channels
      even if the protection matrix does not let process1 and process2 share files, process1 can send a bit stream to process2 by generating 1000 page faults to signal a 1 and sleeping for one second to signal a 0 (process2 can monitor system load to read the bit)
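The two ways of storing the sparse protection matrix can be made concrete with a small sketch. The domains, objects, and rights below are invented for illustration and are not taken from the figures; the function names are likewise hypothetical. The sketch builds access control lists (by columns) and capability lists (by rows) from the same non-empty entries and shows that both answer an access check the same way.

```python
# Non-empty entries of a hypothetical protection matrix:
# (domain, object) -> set of rights. A domain can itself be an object.
MATRIX = {
    ("domain1", "file1"): {"read"},
    ("domain1", "file2"): {"read", "write"},
    ("domain2", "file2"): {"read"},
    ("domain2", "printer"): {"write"},
    ("domain1", "domain2"): {"enter"},
}


def access_control_lists(matrix):
    """Store by columns: object -> {domain: rights}."""
    acls = {}
    for (domain, obj), rights in matrix.items():
        if rights:
            acls.setdefault(obj, {})[domain] = set(rights)
    return acls


def capability_lists(matrix):
    """Store by rows: domain -> {object: rights}."""
    caps = {}
    for (domain, obj), rights in matrix.items():
        if rights:
            caps.setdefault(domain, {})[obj] = set(rights)
    return caps


def allowed_by_acl(acls, domain, obj, right):
    """Check access by looking at the object's list."""
    return right in acls.get(obj, {}).get(domain, set())


def allowed_by_capability(caps, domain, obj, right):
    """Check access by looking at the domain's own list."""
    return right in caps.get(domain, {}).get(obj, set())


ACLS = access_control_lists(MATRIX)
CAPS = capability_lists(MATRIX)

print(sorted(ACLS["file2"].items()))    # who may touch file2
print(sorted(CAPS["domain1"].items()))  # what domain1 may touch
print(allowed_by_acl(ACLS, "domain2", "file2", "write"))
```

The choice between the two is about which question is cheap to answer: an access control list makes it easy to see or revoke everyone's rights to one object, while a capability list makes it easy to see everything one domain can reach.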