Lustre (file System) - Architecture

Architecture

A Lustre file system has three major functional units:

  • A single metadata server (MDS) that has a single metadata target (MDT) per Lustre filesystem that stores namespace metadata, such as filenames, directories, access permissions, and file layout. The MDT data is stored in a single local disk filesystem, which may be a bottleneck under some metadata intensive workloads. However, unlike block-based distributed filesystems, such as GPFS and PanFS, where the metadata server controls all of the block allocation, the Lustre metadata server is only involved in pathname and permission checks, and is not involved in any file I/O operations, avoiding I/O scalability bottlenecks on the metadata server.
  • One or more object storage servers (OSSes) that store file data on one or more object storage targets (OSTs). Depending on the server’s hardware, an OSS typically serves between two and eight OSTs, with each OST managing a single local disk filesystem. The capacity of a Lustre file system is the sum of the capacities provided by the OSTs.
  • Client(s) that access and use the data. Lustre presents all clients with a unified namespace for all of the files and data in the filesystem, using standard POSIX semantics, and allows concurrent and coherent read and write access to the files in the filesystem.

The MDT, OST, and client can be on the same node, but in typical installations these functions are on separate nodes communicating over a network. The Lustre Network (LNET) layer supports several network interconnects, including native Infiniband verbs, TCP/IP on Ethernet and other networks, Myrinet, Quadrics, and other proprietary network technologies. Lustre will take advantage of remote direct memory access (RDMA) transfers, when available, to improve throughput and reduce CPU usage.

The storage used for the MDT and OST backing filesystems is partitioned, optionally organized with logical volume management (LVM) and/or RAID, and normally formatted as ext4 file systems. The Lustre OSS and MDS servers read, write, and modify data in the format imposed by these file systems.

An OST is a dedicated filesystem that exports an interface to byte ranges of objects for read/write operations. An MDT is a dedicated filesystem that controls file access and tells clients which object(s) make up a file. MDTs and OSTs currently use an enhanced version of ext4 called ldiskfs to store data. Work started in 2008 at Sun to port Lustre to Sun's ZFS/DMU for back-end data storage and continues as an open source project.

When a client accesses a file, it completes a filename lookup on the MDS. As a result, a file is created on behalf of the client or the layout of an existing file is returned to the client. For read or write operations, the client then interprets the layout in the logical object volume (LOV) layer, which maps the offset and size to one or more objects, each residing on a separate OST. The client then locks the file range being operated on and executes one or more parallel read or write operations directly to the OSTs. With this approach, bottlenecks for client-to-OST communications are eliminated, so the total bandwidth available for the clients to read and write data scales almost linearly with the number of OSTs in the filesystem.

Clients do not directly modify the objects on the OST filesystems, but, instead, delegate this task to OSSes. This approach ensures scalability for large-scale clusters and supercomputers, as well as improved security and reliability. In contrast, shared block-based filesystems such as Global File System and OCFS must allow direct access to the underlying storage by all of the clients in the filesystem and increase the risk of filesystem corruption from misbehaving/defective clients.

Read more about this topic:  Lustre (file System)

Famous quotes containing the word architecture:

    I don’t think of form as a kind of architecture. The architecture is the result of the forming. It is the kinesthetic and visual sense of position and wholeness that puts the thing into the realm of art.
    Roy Lichtenstein (b. 1923)

    No architecture is so haughty as that which is simple.
    John Ruskin (1819–1900)

    In short, the building becomes a theatrical demonstration of its functional ideal. In this romanticism, High-Tech architecture is, of course, no different in spirit—if totally different in form—from all the romantic architecture of the past.
    Dan Cruickshank (b. 1949)