Lustre (file System) - Locking

Locking

Lustre has a distributed lock manager in the OpenVMS style to protect the integrity of each file's data and metadata. Access and modification of a Lustre file is completely cache coherent among all of the clients. Metadata locks are managed by the MDT that stores the inode for the file, using the 128-bit Lustre File Identifier (FID, composed of the Sequence number and Object ID) as the resource name. The metadata locks are split into multiple bits that protect the lookup of the file (file owner and group, permission and mode, and access control list (ACL)), the state of the inode (directory size, directory contents, link count, timestamps), and layout (file striping). A client can fetch multiple metadata lock bits for a single inode with a single RPC request, but currently they are only ever granted a read lock for the inode. The MDS manages all modifications to the inode in order to avoid lock resource contention and is currently the only node that gets write locks on inodes.

File data locks are managed by the OST on which each object of the file is striped, using byte-range extent locks. Clients can be granted both overlapping read extent locks for part or all of the file, allowing multiple concurrent readers of the same file, and/or non-overlapping write extent locks for regions of the file. This allows many Lustre clients to access a single file concurrently for both read and write, avoiding bottlenecks during file I/O. In practice, because Linux clients manage their data cache in units of pages, the clients will request locks that are always an integer multiple of the page size (4096 bytes on most clients). When a client is requesting an extent lock the OST may grant a lock for a larger extent than requested, in order to reduce the number of lock requests that the client makes. The actual size of the granted lock depends on several factors, including the number of currently-granted locks, whether there are conflicting write locks, and the number of outstanding lock requests. The granted lock is never smaller than the originally-requested extent. OST extent locks use the Lustre FID as the resource name for the lock. Since the number of extent lock servers scales with the number of OSTs in the filesystem, this also scales the aggregate locking performance of the filesystem, and of a single file if it is striped over multiple OSTs.

Read more about this topic:  Lustre (file System)