In the multi-version concurrency control (MVCC) model within the in-memory OLTP engine, a row going through updates may have multiple versions in memory simultaneously. Rows inserted and deleted in pre- and post-commit states are also present. Transactions running concurrently and serially may access different row versions at once, depending on rules, toward correct (or sometimes incorrect) outcomes.
In this article I’ll try to give you a solid understanding of what row versions a transaction may see under various conditions. We’ll start with groundwork concepts and an access rule for in-memory rows. Then I’ll expose a timing issue inherent in the model. With this understanding, we can explore further concepts and constructs that fit together to make row visibility consistent in the MVCC.
Row Visibility: Basics

In-memory rows are structured differently from those in disk-based tables. Among other differences, there is no page structure with its need for logical locking and physical latching; instead, each row has a metadata header and a payload.
The payload comprises the actual table columns and their values for the row.
I’ve left out most of the header fields to focus on the two directly involved in row visibility. Both Begin Timestamp (Begin-Ts) and End Timestamp (End-Ts) are monotonically increasing database-wide numbers that serialize the time at which transactions that create and delete rows commit their work. I’ll refer to row versions only by their transaction serial numbers: <Begin-Ts, End-Ts>; the payload does not play a role in visibility.
We’ll soon see where these numbers come from, but for now let’s try an example. I’ll also use the terms row and row version interchangeably throughout.
<10, ∞ >
<31, 240>
The upper row was inserted by a transaction that committed at timestamp 10, and the infinity symbol for the end timestamp means that no transaction has deleted this row; this row version is currently in the table. The lower row was inserted by a transaction that committed at a later point in time―at transaction serial time 31―and was deleted by a transaction having commit timestamp 240.
The payload in an in-memory row can never change. To update column values, the End-Ts for the current row version (remember, End-Ts is ∞) is marked with the transaction’s commit timestamp, and a new row with the updated payload columns is inserted with Begin-Ts having the same commit timestamp and infinity for the End-Ts. These two operations are done in one atomic step.
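As a rough illustration of that update rule, here is a minimal Python sketch. The `RowVersion` class and `update_row` helper are hypothetical names invented for this example, not engine structures, and the sketch makes no attempt to model the atomicity of the two steps; it only shows how the timestamps are stamped.

```python
import math
from dataclasses import dataclass

INFINITY = math.inf  # stands in for the "not deleted" end timestamp


@dataclass
class RowVersion:
    """Hypothetical model of a row version: two header fields plus payload."""
    begin_ts: float
    end_ts: float
    payload: tuple  # a tuple, so the column values themselves cannot change


def update_row(versions, current, new_payload, commit_ts):
    """Model an update as a logical delete plus an insert.

    The current version is closed at commit_ts, and a new version with the
    new payload is appended with interval <commit_ts, infinity>.
    """
    if current.end_ts != INFINITY:
        raise ValueError("only the current version of a row can be updated")
    current.end_ts = commit_ts  # logical delete of the old version
    versions.append(RowVersion(commit_ts, INFINITY, new_payload))  # insert


# One insert at timestamp 10, then two updates in separate transactions.
chain = [RowVersion(10, INFINITY, ("a",))]
update_row(chain, chain[-1], ("b",), 20000)
update_row(chain, chain[-1], ("c",), 31467)
intervals = [(v.begin_ts, v.end_ts) for v in chain]
```

After the two updates, `intervals` holds one closed interval per superseded version and a single open-ended interval for the current one, and the payload of each old version is untouched.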
<31467, ∞ >
<20000, 31467>
<10, 20000>
In this sample, if all three rows have the same primary key (which is immutable in in-memory tables), then the lowest row is the row’s insertion, and those above it resulted from two updates in separate transactions committing at timestamps 20000 and 31467 respectively. If the keys are different, then the bottom row was inserted by an early transaction and deleted by a later one, which also inserted the middle row; the top row was inserted by the transaction with the latest commit timestamp, which also deleted the middle row.
The samples imply that although the in-memory OLTP engine works with INSERT, UPDATE, and DELETE modification statements, they reduce to operations insert <C-Ts, ∞> and (logical) delete <original C-Ts, C-Ts>, where C-Ts is the commit timestamp of the transaction doing the work.
Timestamps Begin-Ts and End-Ts are drawn from a counter of type bigint known as the Global Transaction Timestamp (GTTs). Its value at any moment is the one given to the latest transaction that issued a commit. (For implicit and explicit transactions, the COMMIT TRAN statement is part of the code, and for single-statement autocommit and atomic block statements used with natively compiled procedures, it is written in automatically.)
Here are its rules:
The counter value is persisted through instance restarts to maintain transaction serial order and correct row version visibility.

At execution start, the transaction is assigned the GTTs current value, which becomes its logical read time. This determines its single point-in-time row version visibility. Any number of concurrent transactions can have the same read timestamp.

At commit, the transaction is assigned the post-incremented GTTs value, which becomes its commit timestamp, making the number always one or more higher than any currently active transaction’s logical read time. This value is written into Begin-Ts and End-Ts fields in row headers as was discussed. Commit timestamps are unique across all committing transactions.

We now have enough information to understand row visibility. The <Begin-Ts, End-Ts> pair is the validity interval. A transaction may access a row version if its position in the transaction serial order is at least as great as that of the transaction that created the row, and less than the transaction that deleted it, if any; i.e. its order falls within the validity interval of the row:
Begin-Ts <= logical read time < End-Ts.
In the three-row sample above, validity intervals for all row versions are disjoint, so a transaction with logical read time of at least 10 could see exactly one of the rows. By contrast, given intervals <50, 80>, <60, ∞>, a transaction with logical read time from 60 to 79 could see both rows (and the versions can’t share the same primary key).
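The visibility rule is small enough to check mechanically. The following Python sketch applies it to the two samples just discussed; `is_visible` and `visible_versions` are hypothetical helper names chosen for this illustration, and row versions are reduced to bare (Begin-Ts, End-Ts) pairs because the payload plays no role.

```python
import math

INFINITY = math.inf  # end timestamp of a version that has not been deleted


def is_visible(begin_ts, end_ts, read_ts):
    """Visibility rule: Begin-Ts <= logical read time < End-Ts."""
    return begin_ts <= read_ts < end_ts


def visible_versions(versions, read_ts):
    """Return the (begin, end) pairs that a given logical read time can see."""
    return [v for v in versions if is_visible(v[0], v[1], read_ts)]


# The three-row sample: disjoint intervals, so at most one version is seen.
three_rows = [(31467, INFINITY), (20000, 31467), (10, 20000)]
seen_at_10 = visible_versions(three_rows, 10)
seen_at_20000 = visible_versions(three_rows, 20000)
seen_at_5 = visible_versions(three_rows, 5)

# Overlapping intervals: both versions are seen for read times 60 through 79.
overlapping = [(50, 80), (60, INFINITY)]
seen_at_79 = visible_versions(overlapping, 79)
seen_at_80 = visible_versions(overlapping, 80)
```

Note the half-open interval: a read time equal to End-Ts does not see the version, which is why a read time of 20000 lands on the middle row of the three-row sample rather than on the bottom one.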
A Timing Issue

These concepts are sufficient for row visibility, but only in a static database. In-memory OLTP is meant to increase performance in (highly) transactional systems, though, so let’s look at the behavior of transactions running together.
For the examples, rows x and y have these begin and end timestamps in their current versions before the transactions start:
X: <10, ∞ >
Y: <676, ∞ >
I’ll use a modified schedule representation to show interleaving of operations:
When transaction i (txn-i for short) commits, say it gets timestamp 12345; these are the new row versions:
X: <10, 12345>
―――――
Y: <12345, ∞ >
<676, 12345>
If no other transactions commit between the time txn-i commits and txn-k starts, the latter will get the logical read time 12345; else it will be greater. In either case, by the formula txn-k reads the newly inserted y row version having interval <12345, ∞ > and x is not visible.
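To make the serial case concrete, here is a small Python sketch of the timestamp counter and the visibility check. Everything in it is an assumption made for illustration: the `GlobalTimestamp` class is a hypothetical stand-in for the GTTs, the starting value 12344 is picked only so that the commit receives 12345, and txn-i's operations (delete x, update y) are inferred from the resulting row versions listed above.

```python
import math

INFINITY = math.inf


class GlobalTimestamp:
    """Hypothetical stand-in for the GTTs counter."""

    def __init__(self, value):
        self.value = value

    def begin(self):
        # A starting transaction takes the current value as its read time.
        return self.value

    def commit(self):
        # A committing transaction takes the incremented value.
        self.value += 1
        return self.value


def is_visible(interval, read_ts):
    begin_ts, end_ts = interval
    return begin_ts <= read_ts < end_ts


gtts = GlobalTimestamp(12344)  # assumed starting value
x = [[10, INFINITY]]           # versions of row x as [Begin-Ts, End-Ts]
y = [[676, INFINITY]]          # versions of row y

# txn-i commits: x is deleted, y is updated (logical delete plus insert).
commit_ts = gtts.commit()
x[0][1] = commit_ts
y[0][1] = commit_ts
y.append([commit_ts, INFINITY])

# txn-k starts only after txn-i has committed.
read_ts = gtts.begin()
x_seen = [tuple(v) for v in x if is_visible(v, read_ts)]
y_seen = [tuple(v) for v in y if is_visible(v, read_ts)]
```

With no other commits in between, `read_ts` equals `commit_ts`, so `x_seen` is empty and `y_seen` contains only the newly inserted version.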
Now let’s interleave the transactions’ operations:
If txn-i gets the same commit timestamp, then this time the logical read time for txn-k must be less than 12345. In this case, for the y row it reads the deleted version, <676, 12345>. It had already read the x version whose header values were <10, ∞>. If it were to re-read the row later, it would access the same version, now with interval <10, 12345>. That is also correct:
All rows [re-]read by a transaction remain consistent as of a single point in time―at transaction start. Providing consistent read sets (snapshots) is the rationale for row versioning. Now imagine that in either schedule txn-k reads both x and y before txn-i commits; for