背景
Add Generic WAL interface
This interface is designed to give an access to WAL for extensions which
could implement new access method, for example. Previously it was
impossible because restoring from custom WAL would need to access system
catalog to find a redo custom function. This patch suggests generic way
to describe changes on page with standart layout.
Bump XLOG_PAGE_MAGIC because of new record type.
Author: Alexander Korotkov with a help of Petr Jelinek, Markus Nullmeier and
minor editorization by my
Reviewers: Petr Jelinek, Alvaro Herrera, Teodor Sigaev, Jim Nasby,
Michael Paquier
Although all built-in WAL-logged modules have their own types of WAL records, there is also a generic WAL record type, which describes changes to pages in a generic way. This is useful for extensions that provide custom access methods, because they cannot register their own WAL redo routines.
The API for constructing generic WAL records is defined in access/generic_xlog.h and implemented in access/transam/generic_xlog.c.
To perform a WAL-logged data update using the generic WAL record facility, follow these steps:
- state = GenericXLogStart(relation) — start construction of a generic WAL record for the given relation.
- page = GenericXLogRegisterBuffer(state, buffer, flags) — register a buffer to be modified within the current generic WAL record. This function returns a pointer to a temporary copy of the buffer's page, where modifications should be made. (Do not modify the buffer's contents directly.) The third argument is a bitmask of flags applicable to the operation. Currently the only such flag is GENERIC_XLOG_FULL_IMAGE, which indicates that a full-page image rather than a delta update should be included in the WAL record. Typically this flag would be set if the page is new or has been rewritten completely. GenericXLogRegisterBuffer can be repeated if the WAL-logged action needs to modify multiple pages.
- Apply modifications to the page images obtained in the previous step.
- GenericXLogFinish(state) — apply the changes to the buffers and emit the generic WAL record.
WAL record construction can be canceled between any of the above steps by calling GenericXLogAbort(state). This will discard all changes to the page image copies.
Please note the following points when using the generic WAL record facility:
- No direct modifications of buffers are allowed! All modifications must be done in copies acquired from GenericXLogRegisterBuffer(). In other words, code that makes generic WAL records should never call BufferGetPage() for itself. However, it remains the caller's responsibility to pin/unpin and lock/unlock the buffers at appropriate times. Exclusive lock must be held on each target buffer from before GenericXLogRegisterBuffer() until after GenericXLogFinish().
- Registrations of buffers (step 2) and modifications of page images (step 3) can be mixed freely, i.e., both steps may be repeated in any sequence. Keep in mind that buffers should be registered in the same order in which locks are to be obtained on them during replay.
- The maximum number of buffers that can be registered for a generic WAL record is MAX_GENERIC_XLOG_PAGES. An error will be thrown if this limit is exceeded.
- Generic WAL assumes that the pages to be modified have standard layout, and in particular that there is no useful data between pd_lower and pd_upper.
- Since you are modifying copies of buffer pages, GenericXLogStart() does not start a critical section. Thus, you can safely do memory allocation, error throwing, etc. between GenericXLogStart() and GenericXLogFinish(). The only actual critical section is present inside GenericXLogFinish(). There is no need to worry about calling GenericXLogAbort() during an error exit, either.
- GenericXLogFinish() takes care of marking buffers dirty and setting their LSNs. You do not need to do this explicitly.
- For unlogged relations, everything works the same except that no actual WAL record is emitted. Thus, you typically do not need to do any explicit checks for unlogged relations.
- The generic WAL redo function will acquire exclusive locks to buffers in the same order as they were registered. After redoing all changes, the locks will be released in the same order.
- If GENERIC_XLOG_FULL_IMAGE is not specified for a registered buffer, the generic WAL record contains a delta between the old and the new page images. This delta is based on byte-by-byte comparison. This is not very compact for the case of moving data within a page, and might be improved in the future.