Transaction Processing Over XML

Transaction Processing over XML (TPoX) is a benchmark for XML database systems. It is used to test the performance of database management systems that can store, search, modify and retrieve XML data. The goal of TPoX is to let database designers, developers and users evaluate the performance of XML database features such as the query languages XQuery and SQL/XML, XML storage, XML indexing, XML Schema support, XML updates, transaction processing and logging, and concurrency control. TPoX includes XML update tests based on the XQuery Update Facility.

The TPoX benchmark exercises the processing of data-centric XML, in contrast to content- or document-centric XML.

TPoX was originally developed and tested by IBM and Intel and became an open source project on SourceForge in January 2007. TPoX 1.1 was released in June 2007, and TPoX 2.0 in July 2009.

The TPoX benchmark package contains the following:

  • XML Schemas that define the XML data used in the benchmark.
  • An XML data generation tool that generates an arbitrary number of XML documents with well-defined value distributions and referential integrity across documents. To model real-world applications, the generated data conforms to industry schemas such as FIXML.
  • Workloads that run against the generated data. A workload is a set of transactions; a transaction can be a query in XQuery or SQL/XML notation, or an insert, update or delete operation.
  • A Java application that acts as the workload driver. It is configurable and can spawn 1 to n parallel threads to simulate concurrent database users. Each user connects to the database and executes a random sequence of the transactions defined in the workload. Parameter markers in the transactions are replaced by concrete values drawn from random value distributions. The workload driver collects and reports performance metrics such as transaction throughput and minimum, maximum and average response times.
  • Documentation.
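The driver behavior described in the list, choosing transactions at random and filling in their parameter markers with drawn values, can be sketched in a few lines of Python. This is a minimal single-user illustration under stated assumptions: the `?` marker syntax, the `custacc` and `orders` table names, the weights and the value generators are all hypothetical and are not taken from the TPoX workload files. The real driver is a Java application that executes the resulting statements against a database instead of only producing strings.

```python
import random
import re
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical representation of one workload transaction. The "?" marker
# syntax, table names and weights are illustrative assumptions, not the
# actual TPoX workload file format.
@dataclass
class Transaction:
    name: str
    template: str
    weight: float  # relative frequency of this transaction in the mix
    generators: List[Callable[[random.Random], object]] = field(default_factory=list)


def literal(value: object) -> str:
    """Render a drawn value as an SQL literal (integers bare, text quoted)."""
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def bind(tx: Transaction, rng: random.Random) -> str:
    """Replace each "?" marker, left to right, with a freshly drawn value."""
    values = [literal(gen(rng)) for gen in tx.generators]
    if tx.template.count("?") != len(values):
        raise ValueError(f"{tx.name}: marker/value count mismatch")
    it = iter(values)
    return re.sub(r"\?", lambda _m: next(it), tx.template)


def pick(workload: List[Transaction], rng: random.Random) -> Transaction:
    """Choose the next transaction at random, honoring the relative weights."""
    return rng.choices(workload, weights=[t.weight for t in workload], k=1)[0]


# Hypothetical value distributions (uniform IDs over assumed ranges).
def cust_id(rng: random.Random) -> int:
    return rng.randint(1, 100_000)


def order_id(rng: random.Random) -> int:
    return rng.randint(1, 500_000)


def order_doc(rng: random.Random) -> str:
    return f'<Order ID="{rng.randint(1, 500_000)}"/>'


WORKLOAD = [
    Transaction("get_customer",
                "SELECT doc FROM custacc WHERE XMLEXISTS("
                "'$d/Customer[@id = $id]' PASSING doc AS \"d\", "
                "CAST(? AS INTEGER) AS \"id\")",
                weight=5, generators=[cust_id]),
    Transaction("insert_order",
                "INSERT INTO orders (id, doc) VALUES (?, ?)",
                weight=2, generators=[order_id, order_doc]),
    Transaction("delete_order",
                "DELETE FROM orders WHERE id = ?",
                weight=1, generators=[order_id]),
]

if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so a run is reproducible
    for _ in range(3):
        print(bind(pick(WORKLOAD, rng), rng))
```

Drawing a fresh value for every marker on every execution is what keeps repeated runs of the same transaction from hitting the same cached rows; a seeded `random.Random` makes a given sequence reproducible for debugging.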

The TPoX workload consists of seven XML queries, two inserts, two deletes and six XML update operations. The primary performance metric of the benchmark is TTPS (TPoX Transactions Per Second), the throughput of the multi-user read/write workload at a given scale factor. The smallest TPoX scale factor uses 10 GB of raw XML documents; the largest uses 1 PB.
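As its name says, TTPS is a throughput: transactions completed by all concurrent users divided by elapsed time. The Python sketch below assumes exactly that simplified definition and measures it together with the minimum, maximum and average response times that the workload driver reports. The `execute` callback is a hypothetical stand-in for sending one transaction to the database; scale factors, ramp-up and the official TPoX measurement procedure are not modeled.

```python
import statistics
import threading
import time
from typing import Callable, Dict, List


def summarize(latencies: List[float], elapsed: float) -> Dict[str, float]:
    """Turn per-transaction response times and wall-clock time into metrics."""
    if not latencies or elapsed <= 0:
        raise ValueError("need at least one transaction and a positive elapsed time")
    return {
        "transactions": len(latencies),
        "tps": len(latencies) / elapsed,  # simplified transactions per second
        "min_s": min(latencies),
        "max_s": max(latencies),
        "avg_s": statistics.mean(latencies),
    }


def run_users(n_users: int, tx_per_user: int,
              execute: Callable[[int], None]) -> Dict[str, float]:
    """Run n_users concurrent simulated users, each issuing tx_per_user
    transactions through the hypothetical `execute` callback."""
    latencies: List[float] = []
    lock = threading.Lock()

    def user(uid: int) -> None:
        local: List[float] = []
        for _ in range(tx_per_user):
            t0 = time.perf_counter()
            execute(uid)  # one transaction against the database (stubbed)
            local.append(time.perf_counter() - t0)
        with lock:  # merge this user's timings once, at the end
            latencies.extend(local)

    threads = [threading.Thread(target=user, args=(u,)) for u in range(n_users)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return summarize(latencies, time.perf_counter() - start)


if __name__ == "__main__":
    stats = run_users(8, 50, lambda uid: time.sleep(0.002))
    print(f"{stats['tps']:.1f} transactions/s, avg {stats['avg_s'] * 1000:.2f} ms")
```

Python threads are adequate for this kind of driver because a real `execute` spends most of its time waiting on database I/O, during which the interpreter lock is released; a CPU-bound stub would not scale across threads.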