Pick Operating System - Overview

Overview

The Pick database is a 'hash-file' data management system: a collection of dynamic associative arrays that are organized, linked and controlled through associative files. Because it is hash-file oriented, Pick offers fast data access. Originally, all data structures in Pick were hash-files at the lowest level, meaning that records are stored as pairs associating a primary key with a set of values. Today a Pick system can also natively access host files in Windows or Unix in any format.

A Pick database is divided into one or more accounts, master dictionaries, dictionaries, files and sub-files, each of which is a hash-table oriented file. These files contain records made up of fields, sub-fields and sub-sub-fields. In Pick, records are called items, fields are called attributes, sub-fields are called values, and sub-sub-fields are called sub-values (hence the present-day label "multivalued database"). All elements are variable-length, with fields and values marked off by special delimiters, so that any file, record, or field may contain any number of entries of the next lower level. As a result, a single Pick item (record) can hold one complete entity (an entire invoice, purchase order, sales order, etc.) and so plays the role that a file plays on most conventional systems. Entities that are stored as 'files' in other commonplace systems (e.g. source programs and text documents) must be stored as records within files on Pick.
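The delimiter-based layout can be illustrated with a minimal Python sketch. It assumes the conventional Pick mark characters (character codes 254, 253 and 252 for the attribute, value and sub-value marks); the text above does not name them, and the function names and the sample invoice are hypothetical.

```python
# Conventional Pick delimiter characters (an assumption; the text above only
# says "special delimiters").
AM = chr(254)   # attribute mark: separates attributes (fields)
VM = chr(253)   # value mark: separates values within an attribute
SVM = chr(252)  # sub-value mark: separates sub-values within a value


def parse_item(raw: str) -> list[list[list[str]]]:
    """Split a raw item body into attributes -> values -> sub-values."""
    return [
        [value.split(SVM) for value in attribute.split(VM)]
        for attribute in raw.split(AM)
    ]


def build_item(parsed: list[list[list[str]]]) -> str:
    """Inverse of parse_item: join the nested lists back into one string."""
    return AM.join(
        VM.join(SVM.join(subvalues) for subvalues in values)
        for values in parsed
    )


# A hypothetical invoice: customer number, product codes, quantities.
raw_invoice = AM.join([
    "C1001",
    VM.join(["WIDGET", "GADGET"]),
    VM.join(["2", "5" + SVM + "3"]),
])
invoice = parse_item(raw_invoice)
```

Because every level is just a delimited string, an attribute can gain another value without any change to a schema; the cost is that the application must know what each position means.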

The file hierarchy is roughly equivalent to the common Unix-like hierarchy of directories, sub-directories, and files. The master dictionary is similar to a directory in that it stores pointers to other dictionaries, files and executable programs. The master dictionary also contains the command-line language.

All files (accounts, dictionaries, files, sub-files) are organized identically, as are all records. This uniformity is exploited throughout the system, both by system functions, and by the system administration commands. For example, the 'find' command will find and report the occurrence of a word or phrase in a file, and can operate on any account, dictionary, file or sub-file.

Each record must have a unique primary key, which determines where in a file that record is stored. To retrieve a record, its key is hashed and the resulting value specifies which of a set of discrete "buckets" (called "groups") to look in for the record. (Within a bucket, records are scanned sequentially.) Therefore, most records (e.g. a complete document) can be read with a single disk-read operation. The same method is used to write the record back to its correct "bucket".
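The read and write paths described above can be sketched as a toy in-memory model. The class name, the hash function (sum of the key's bytes modulo the group count) and the use of Python lists for groups are all assumptions made for illustration; they are not Pick's actual on-disk format or hashing algorithm.

```python
class HashedFile:
    """Toy model of a hashed file with a fixed number of groups (buckets)."""

    def __init__(self, modulo: int) -> None:
        self.modulo = modulo
        # Each group is a list of (key, record) pairs scanned sequentially.
        self.groups = [[] for _ in range(modulo)]

    def _group_for(self, key: str) -> int:
        # Hypothetical hash: sum of the key's bytes modulo the group count.
        return sum(key.encode("utf-8")) % self.modulo

    def write(self, key: str, record: str) -> None:
        group = self.groups[self._group_for(key)]
        for i, (existing_key, _) in enumerate(group):
            if existing_key == key:
                group[i] = (key, record)  # overwrite in place
                return
        group.append((key, record))

    def read(self, key: str):
        # Only one group is inspected, whatever the size of the file.
        for existing_key, record in self.groups[self._group_for(key)]:
            if existing_key == key:
                return record
        return None
```

In this model a read touches exactly one group, which corresponds to the single disk read mentioned above as long as the group fits in one disk page.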

In its initial implementation, at a time when a 10 MB hard disk cost US$5000, Pick records were limited to 32 KB in total; this limit was removed in the 1980s. Files can contain an unlimited number of records, but retrieval efficiency is determined by the number of records relative to the number of buckets allocated to the file. Each file may be initially allocated as many buckets as required, although changing this extent later may (for some file types) require the file to be quiescent. All modern multivalue databases have a special file type that changes its extent dynamically as the file is used. These files use a technique called linear hashing, whose cost is proportional to the change in file size, not (as in typical hashed files) to the file size itself. All files start as a contiguous group of disk pages, and grow by linking additional "overflow" pages from unused disk space.
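As a rough illustration of why linear hashing keeps resizing cheap, the following is a textbook-style sketch, not the implementation used by any particular multivalue product. The class name, the CRC-32 stand-in hash and the load threshold of 2.0 records per group are assumptions.

```python
import zlib


def hash_key(key: str) -> int:
    # Deterministic stand-in hash (an assumption, not Pick's own function).
    return zlib.crc32(key.encode("utf-8"))


class LinearHashFile:
    """Toy linear-hashing file that grows one group at a time."""

    def __init__(self, initial_groups: int = 4, max_load: float = 2.0) -> None:
        self.initial_groups = initial_groups
        self.level = 0            # completed doubling rounds
        self.split = 0            # next group to be split in this round
        self.max_load = max_load  # hypothetical records-per-group threshold
        self.groups = [[] for _ in range(initial_groups)]
        self.count = 0

    def _address(self, key: str) -> int:
        size = self.initial_groups << self.level
        group = hash_key(key) % size
        if group < self.split:
            # This group was already split in the current round.
            group = hash_key(key) % (size * 2)
        return group

    def read(self, key: str):
        for existing_key, record in self.groups[self._address(key)]:
            if existing_key == key:
                return record
        return None

    def write(self, key: str, record: str) -> None:
        group = self.groups[self._address(key)]
        for i, (existing_key, _) in enumerate(group):
            if existing_key == key:
                group[i] = (key, record)
                return
        group.append((key, record))
        self.count += 1
        if self.count / len(self.groups) > self.max_load:
            self._split_one_group()

    def _split_one_group(self) -> None:
        # Only the records of ONE group are rehashed, never the whole file.
        size = self.initial_groups << self.level
        stay, move = [], []
        for key, record in self.groups[self.split]:
            target = hash_key(key) % (size * 2)
            (stay if target == self.split else move).append((key, record))
        self.groups[self.split] = stay
        self.groups.append(move)  # new group lands at index split + size
        self.split += 1
        if self.split == size:    # every group split: start the next round
            self.level += 1
            self.split = 0
```

Each write triggers at most one split, and a split rehashes only the records of a single group, so the work done tracks the growth of the file rather than its total size.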

Initial Pick implementations had no index structures, as they were not deemed necessary. Around 1990, a B-tree indexing feature was added. With this feature, secondary-key look-ups operate much like keyed inquiries in any other database system, requiring at least two disk reads (a key read, then a data-record read).
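The two-step look-up can be sketched as follows. A sorted list searched with Python's bisect module stands in for the B-tree here, and a plain dict stands in for the hashed data file; the customer records and the function name are hypothetical.

```python
import bisect

# Primary key -> record; stands in for the hashed data file.
data_file = {
    "C1": {"name": "Acme", "city": "Irvine"},
    "C2": {"name": "Bolt", "city": "Austin"},
    "C3": {"name": "Cogs", "city": "Irvine"},
}

# Index entries are (secondary key, primary key) pairs kept in sorted order.
# A real B-tree keeps the same ordering but in a paged tree structure.
city_index = sorted((rec["city"], key) for key, rec in data_file.items())


def lookup_by_city(city: str) -> list[dict]:
    """Step 1: search the index. Step 2: read each record by primary key."""
    start = bisect.bisect_left(city_index, (city, ""))
    matches = []
    for indexed_city, primary_key in city_index[start:]:
        if indexed_city != city:
            break
        matches.append(data_file[primary_key])  # the second read
    return matches
```

A look-up by primary key needs only the second step, which is why the original design could do without indexes for keyed access.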

Pick data files usually have two levels. The first level is known as the "dictionary" level and is mandatory. It contains:

  • Dictionary items – the optional items that serve as definitions for the names and structure of the items in the data fork, used in reporting
  • The data-level identifier – a pointer to the second or "data" level of the file

Files created with only one level are, by default, dictionary files. Some versions of the Pick system allow multiple data levels to be linked to one dictionary level file, in which case there would be multiple data-level identifiers in the dictionary file.

A Pick database has no data typing since all data is stored as characters, including numbers (which are stored as character decimal digits). Data integrity, rather than being controlled by the system, is controlled by the applications and the discipline of the programmers. Because a logical document in Pick is not fragmented (as it would be in SQL), intra-record integrity is automatic.

In contrast to many SQL database systems, Pick allows for multiple, pre-computed field aliases. For example, a date field may have an alias definition for the format "12 Oct 1999", and another alias formatting that same date field as "10/12/99". File cross-connects or joins are handled as a synonym definition of the foreign key. A customer's data, such as name and address, are "joined" from the customer file into the invoice file via a synonym definition of "customer number" in the "invoice" dictionary.
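The alias and synonym ideas can be approximated in Python as a table of named definitions applied to a stored record. This is only an analogy: the definition names (DATE.LONG, DATE.SHORT, CUST.NAME), the sample files and the use of Python functions in place of Pick dictionary items are all assumptions, and the values here are computed when the report runs.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical files: a customer file and an invoice file.
customers = {"C1001": {"name": "Acme Ltd", "address": "1 Main St"}}
invoices = {
    "INV1": {"customer_number": "C1001", "invoice_date": date(1999, 10, 12)},
}


def long_date(d: date) -> str:
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


def short_date(d: date) -> str:
    return f"{d.month:02d}/{d.day:02d}/{d.year % 100:02d}"


# Stand-in for the invoice dictionary: several names map onto the same
# stored field, and one name follows the foreign key into another file.
invoice_dictionary = {
    "DATE.LONG": lambda inv: long_date(inv["invoice_date"]),
    "DATE.SHORT": lambda inv: short_date(inv["invoice_date"]),
    "CUST.NAME": lambda inv: customers[inv["customer_number"]]["name"],
}


def report(invoice_id: str, columns: list[str]) -> list[str]:
    """Evaluate the requested dictionary definitions for one invoice."""
    invoice = invoices[invoice_id]
    return [invoice_dictionary[column](invoice) for column in columns]
```

For example, `report("INV1", ["DATE.LONG", "DATE.SHORT", "CUST.NAME"])` evaluates all three definitions against the single stored invoice record, without the invoice file itself holding the customer's name.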

Pick record structure favors a non-first-normal-form composition, where all of the data for an entity is stored in a single record, obviating the need to perform joins. This is why these databases are sometimes called NF2 or NF-squared databases. Managing large, sparse data sets in this way can result in efficient use of storage space.
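To make the contrast with first normal form concrete, the sketch below takes one nested invoice record and produces the header row and line rows that a normalized relational design would store separately. The field names and the sample invoice are hypothetical.

```python
# One non-first-normal-form record: the line items live inside the invoice.
invoice_item = {
    "id": "INV1",
    "customer": "C1001",
    "products": ["WIDGET", "GADGET", "SPROCKET"],
    "quantities": ["2", "5", "1"],
}


def to_first_normal_form(item: dict) -> tuple[dict, list[dict]]:
    """Flatten a nested invoice into a header row plus one row per line."""
    header = {"invoice_id": item["id"], "customer": item["customer"]}
    pairs = zip(item["products"], item["quantities"])
    lines = [
        {"invoice_id": item["id"], "line": n, "product": p, "quantity": q}
        for n, (p, q) in enumerate(pairs, start=1)
    ]
    return header, lines
```

Reassembling the invoice from the flattened rows requires matching on `invoice_id`, which is the join that the single nested record avoids.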
