FATX - Technical Design - File Allocation Table

File Allocation Table

A volume's data area is divided up into identically sized clusters, small blocks of contiguous space. Cluster sizes vary depending on the type of FAT file system being used and the size of the partition; typically cluster sizes lie somewhere between 2 KB and 32 KB.

Each file may occupy one or more of these clusters depending on its size; thus, a file is represented by a chain of these clusters (referred to as a singly linked list). However these clusters are not necessarily stored adjacent to one another on the disk's surface but are often instead fragmented throughout the Data Region.

The File Allocation Table (FAT) is contiguous number of sectors immediately following the area of reserved sectors. It represents a list of entries that map to each cluster on the volume. Each entry records one of five things:

  • the cluster number of the next cluster in a chain
  • a special end of cluster-chain (EOC) entry that indicates the end of a chain
  • a special entry to mark a bad cluster
  • a zero to note that the cluster is unused

Each version of the FAT file system uses a different size for FAT entries. Smaller numbers result in a smaller FAT, but waste space in large partitions by needing to allocate in large clusters. The FAT12 file system uses 12 bits per FAT entry, thus two entries span 3 bytes. It is consistently little-endian: if those three bytes are considered as one little-endian 24-bit number, the 12 least significant bits represent the first entry (cluster 0) and the 12 most significant bits the second (cluster 1).

For very early versions of DOS to recognize the file system, the FAT must start with the volume's second sector (logical sector 1 with physical CHS address 0/0/2 or LBA address 1), that is, immediately following the boot sector. Operating systems assume this hard-wired location of the FAT in order to find the media descriptor in the FAT's cluster 0 entry on DOS 1.0-1.1 FAT diskettes, where no valid BPB is found.

The first two entries in a FAT store special values:

The first entry (cluster 0 in the FAT) holds the media descriptor (allowed values 0xF0-0xFF with 0xF1-0xF7 reserved) in bits 7-0, which is also copied into the BPB of the boot sector, offset 0x015 since DOS 2.0. The remaining 4 bits (if FAT12), 8 bits (if FAT16) or 20 bits (if FAT32) of this entry are always 1. For media descriptors other than 0xFF (and 0x00) it is possible to determine the correct nibble and byte order (to be) used by the file system driver, however, the FAT file system officially uses a little-endian representation only and there are no known implementations of variants using big-endian values instead.

The second entry (cluster 1 in the FAT) nominally stores the end-of-cluster-chain marker as used by the formater, but typically always holds 0xFFF / 0xFFFF / 0x0FFFFFFF, that is, with the exception of bits 31-28 on FAT32 volumes these bits are normally always set. Some Microsoft operating systems, however, set these bits if the volume is not the volume holding the running operating system (that is, use 0xFFFFFFFF instead of 0x0FFFFFFF here). (In conjunction with alternative end-of-chain markers the lowest bits 2-0 can become zero for the lowest allowed end-of-chain marker 0xFF8 / 0xFFF8 / 0x?FFFFFF8. Some operating systems may not be able to mount some volumes if any of these bits are not set, therefore the default end-of-chain marker should not be changed.)

Since DOS 7.1 the two most-significant bits of this cluster entry may hold two optional bitflags representing the current volume status on FAT16 and FAT32, but not on FAT12 volumes. These bitflags are not supported by all operating systems, but operating systems supporting this feature would set these bits on shutdown and clear the most significant bit on startup:
If bit 15 (FAT16) respectively bit 27 (FAT32) is not set when mounting the volume, the volume was not properly unmounted before shutdown or ejection and thus is in an unknown and possibly "dirty" state. On FAT32 volumes, the FS Information Sector may hold outdated data and thus should not be used. The operating system would then typically run SCANDISK or CHKDSK on the next startup (but not on insertion of removable media) to ensure and possibly reestablish the volume's integrity.
If bit 14 (FAT16) respectively bit 26 (FAT32) is cleared, the operating system has encountered disk I/O errors on startup, a possible indication for bad sectors. Operating systems aware of this extension will interpret this as a recommendation to carry out a surface scan (SCANDISK) on the next boot. (A similar set of bitflags exists in the FAT12/FAT16 EBPB at offset 0x1A or the FAT32 EBPB at offset 0x36. While the cluster 1 entry can be accessed by file system drivers once they have mounted the volume, the EBPB entry is available even when the volume is not mounted and thus easier to use by disk block device drivers or partitioning tools.)

If the number of FATs in the BPB is not set to 2, the second cluster entry in the first FAT (cluster 1) may also reflect the status of a TFAT volume for TFAT-aware operating systems. If the cluster 1 entry in that FAT holds the value 0, this may indicate that the second FAT represents the last known valid transaction state and should be copied over the first FAT, whereas the first FAT should be copied over the second FAT if all bits are set.

Because these first two FAT entries store special values, there are no data clusters 0 or 1. The first data cluster (after the root directory if FAT12/FAT16) is cluster 2, and cluster 2 is by definition the first cluster in the data area.

FAT entry values:

FAT12 FAT16 FAT32 Description
0x000 0x0000 0x0000000 Free Cluster; also used by DOS to refer to the parent directory starting cluster in ".." entries of subdirectories of the root directory on FAT12/FAT16 volumes.

Otherwise, if this value occurs in cluster chains (f.e. in directory entries of zero length or deleted files), file system implementations should treat this like an end-of-chain marker.

0x001 0x0001 0x0000001 Reserved for internal purposes; MS-DOS/PC DOS use this cluster value as a temporary non-free cluster indicator while constructing cluster chains during file allocation (only seen on disk if there is a crash or power failure in the middle of this process).

If this value occurs in on-disk cluster chains, file system implementations should treat this like an end-of-chain marker.

0x002 - 0xFEF 0x0002 - 0xFFEF 0x0000002 - 0xFFFFFEF Used as data clusters; value points to next cluster. MS-DOS/PC DOS accept values up to 0xFEF / 0xFFEF / 0x0FFFFFEF (sometimes more; see below), whereas for Atari GEMDOS only values up to 0x7FFF are allowed on FAT16 volumes.
0xFF0 MS-DOS/PC DOS 3.3 and higher treats a value of 0xFF0 on FAT12 (but not on FAT16 or FAT32) volumes as additional end-of-chain marker similar to 0xFF8-0xFFF. For compatibility with MS-DOS/PC DOS, file systems should avoid to use data cluster 0xFF0 in cluster chains on FAT12 volumes (that is, treat it as a reserved cluster similar to 0xFF7). (NB. The correspondence of the low byte of the cluster number with media descriptor values is the reason, why these cluster values are reserved.)
0xFF1 - 0xFF5 0xFFF0 - 0xFFF5 0xFFFFFF0 - 0xFFFFFF5 Reserved in some contexts, or also used as data clusters. Volume sizes, which would utilize these values as data clusters, should be avoided, but if these values occur in existing volumes, the file system must treat them as normal data clusters in cluster-chains (ideally applying additional sanity checks), similar to what MS-DOS, PC DOS and DR-DOS do, and should avoid to allocate them for files otherwise.
0xFF6 0xFFF6 0xFFFFFF6 Reserved; do not use. (NB. Corresponds with the default format filler value 0xF6 on IBM compatible machines.) Volumes should not be created which would utilize this value as data cluster, but if this value occurs in existing volumes, the file system must treat it as normal data cluster in cluster-chains (ideally applying additional sanity checks), and should avoid to allocate it for files otherwise.
0xFF7 0xFFF7 0xFFFFFF7 Bad sector in cluster or reserved cluster.

The cutover values for the maximum number of clusters for FAT12 and FAT16 file systems are defined as such that the highest possible data cluster values (0xFF5 and 0xFFF5, respectively) will always be smaller than this value. Therefore, this value cannot normally occur in cluster-chains, but if it does, it may be treated as a normal data cluster, since 0xFF7 could have been a non-standard data cluster on FAT12 volumes before the introduction of FAT16 with DOS 3.0, and 0xFFF7 could have been a non-standard data cluster on FAT16 volumes before the introduction of FAT32 with DOS 7.10. Theoretically, 0x0FFFFFF7 can be part of a valid cluster chain on FAT32 volumes, but disk utilities should avoid creating FAT32 volumes, where this condition could occur. The file system should avoid to allocate this cluster for files.

Disk utilities must not attempt to restore "lost clusters" holding this value in the FAT, but count them as bad clusters.

0xFF8 - 0xFFF 0xFFF8 - 0xFFFF 0xFFFFFF8 - 0xFFFFFFF Last cluster in file (EOC). File system implementations must treat all these values as end-of-chain marker at the same time. Most file system implementations use 0xFFF / 0xFFFF / 0x0FFFFFFF as end-of-file marker when allocating files, but versions of Linux before 2.5.40 used 0xFF8 / 0xFFF8 / 0x0FFFFFF8, whereas some disk repair and defragment tools utilize other values in the set (f.e. SCANDISK may use 0xFF8 / 0xFFF8 / 0x0FFFFFF8 instead). Some faulty file system implementations only accept 0xFFF / 0xFFFF / 0x?FFFFFFF as valid end-of-chain marker.

File system implementations should check cluster values in cluster-chains against the maximum allowed cluster value calculated by the actual size of the volume and treat higher values as if they were end-of-chain markers as well.

Despite its name FAT32 uses only 28 bits of the 32 possible bits. The upper 4 bits are usually zero, but are reserved and should be left untouched. A standard conformant FAT32 file system driver or maintenance tool must not rely on the upper 4 bits to be zero and it must strip them off before evaluating the cluster number in order to cope with possible future expansions where these bits may be used for other purposes. They must not be cleared by the file system driver when allocating new clusters, but should be cleared during a reformat.

Read more about this topic:  FATX, Technical Design

Famous quotes containing the words file and/or table:

    I have been a soreheaded occupant of a file drawer labeled “Science Fiction” ... and I would like out, particularly since so many serious critics regularly mistake the drawer for a urinal.
    Kurt Vonnegut, Jr. (b. 1922)

    Comes the time when it’s later
    and onto your table the headwaiter
    puts the bill,
    Robert Creeley (b. 1926)