D7net
Home
Console
Upload
information
Create File
Create Folder
About
Tools
:
/
opt
/
alt
/
postgresql11
/
usr
/
share
/
doc
/
alt-postgresql11-9.2.24
/
html
/
Filename :
storage-toast.html
back
Copy
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <HTML ><HEAD ><TITLE >TOAST</TITLE ><META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK REV="MADE" HREF="mailto:pgsql-docs@postgresql.org"><LINK REL="HOME" TITLE="PostgreSQL 9.2.24 Documentation" HREF="index.html"><LINK REL="UP" TITLE="Database Physical Storage" HREF="storage.html"><LINK REL="PREVIOUS" TITLE="Database File Layout" HREF="storage-file-layout.html"><LINK REL="NEXT" TITLE="Free Space Map" HREF="storage-fsm.html"><LINK REL="STYLESHEET" TYPE="text/css" HREF="stylesheet.css"><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"><META NAME="creation" CONTENT="2017-11-06T22:43:11"></HEAD ><BODY CLASS="SECT1" ><DIV CLASS="NAVHEADER" ><TABLE SUMMARY="Header navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TH COLSPAN="5" ALIGN="center" VALIGN="bottom" ><A HREF="index.html" >PostgreSQL 9.2.24 Documentation</A ></TH ></TR ><TR ><TD WIDTH="10%" ALIGN="left" VALIGN="top" ><A TITLE="Database File Layout" HREF="storage-file-layout.html" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="10%" ALIGN="left" VALIGN="top" ><A HREF="storage.html" ACCESSKEY="U" >Up</A ></TD ><TD WIDTH="60%" ALIGN="center" VALIGN="bottom" >Chapter 56. Database Physical Storage</TD ><TD WIDTH="20%" ALIGN="right" VALIGN="top" ><A TITLE="Free Space Map" HREF="storage-fsm.html" ACCESSKEY="N" >Next</A ></TD ></TR ></TABLE ><HR ALIGN="LEFT" WIDTH="100%"></DIV ><DIV CLASS="SECT1" ><H1 CLASS="SECT1" ><A NAME="STORAGE-TOAST" >56.2. TOAST</A ></H1 ><P >This section provides an overview of <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > (The Oversized-Attribute Storage Technique).</P ><P ><SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately known as <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > (or <SPAN CLASS="QUOTE" >"the best thing since sliced bread"</SPAN >).</P ><P >Only certain data types support <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > — there is no need to impose the overhead on data types that cannot produce large field values. To support <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >, a data type must have a variable-length (<I CLASS="FIRSTTERM" >varlena</I >) representation, in which the first 32-bit word of any stored value contains the total length of the value in bytes (including itself). <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > does not constrain the rest of the representation. All the C-level functions supporting a <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >-able data type must be careful to handle <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >ed input values. (This is normally done by invoking <CODE CLASS="FUNCTION" >PG_DETOAST_DATUM</CODE > before doing anything with an input value, but in some cases more efficient approaches are possible.)</P ><P ><ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > usurps two bits of the varlena length word (the high-order bits on big-endian machines, the low-order bits on little-endian machines), thereby limiting the logical size of any value of a <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >-able data type to 1 GB (2<SUP >30</SUP > - 1 bytes). When both bits are zero, the value is an ordinary un-<ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >ed value of the data type, and the remaining bits of the length word give the total datum size (including length word) in bytes. When the highest-order or lowest-order bit is set, the value has only a single-byte header instead of the normal four-byte header, and the remaining bits give the total datum size (including length byte) in bytes. As a special case, if the remaining bits are all zero (which would be impossible for a self-inclusive length), the value is a pointer to out-of-line data stored in a separate TOAST table. (The size of a TOAST pointer is given in the second byte of the datum.) Values with single-byte headers aren't aligned on any particular boundary, either. Lastly, when the highest-order or lowest-order bit is clear but the adjacent bit is set, the content of the datum has been compressed and must be decompressed before use. In this case the remaining bits of the length word give the total size of the compressed datum, not the original data. Note that compression is also possible for out-of-line data but the varlena header does not tell whether it has occurred — the content of the TOAST pointer tells that, instead.</P ><P >If any of the columns of a table are <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >-able, the table will have an associated <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > table, whose OID is stored in the table's <TT CLASS="STRUCTNAME" >pg_class</TT >.<TT CLASS="STRUCTFIELD" >reltoastrelid</TT > entry. Out-of-line <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >ed values are kept in the <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > table, as described in more detail below.</P ><P >The compression technique used is a fairly simple and very fast member of the LZ family of compression techniques. See <TT CLASS="FILENAME" >src/backend/utils/adt/pg_lzcompress.c</TT > for the details.</P ><P >Out-of-line values are divided (after compression if used) into chunks of at most <TT CLASS="SYMBOL" >TOAST_MAX_CHUNK_SIZE</TT > bytes (by default this value is chosen so that four chunk rows will fit on a page, making it about 2000 bytes). Each chunk is stored as a separate row in the <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > table for the owning table. Every <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > table has the columns <TT CLASS="STRUCTFIELD" >chunk_id</TT > (an OID identifying the particular <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >ed value), <TT CLASS="STRUCTFIELD" >chunk_seq</TT > (a sequence number for the chunk within its value), and <TT CLASS="STRUCTFIELD" >chunk_data</TT > (the actual data of the chunk). A unique index on <TT CLASS="STRUCTFIELD" >chunk_id</TT > and <TT CLASS="STRUCTFIELD" >chunk_seq</TT > provides fast retrieval of the values. A pointer datum representing an out-of-line <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >ed value therefore needs to store the OID of the <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > table in which to look and the OID of the specific value (its <TT CLASS="STRUCTFIELD" >chunk_id</TT >). For convenience, pointer datums also store the logical datum size (original uncompressed data length) and actual stored size (different if compression was applied). Allowing for the varlena header bytes, the total size of a <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > pointer datum is therefore 18 bytes regardless of the actual size of the represented value.</P ><P >The <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > code is triggered only when a row value to be stored in a table is wider than <TT CLASS="SYMBOL" >TOAST_TUPLE_THRESHOLD</TT > bytes (normally 2 kB). The <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > code will compress and/or move field values out-of-line until the row value is shorter than <TT CLASS="SYMBOL" >TOAST_TUPLE_TARGET</TT > bytes (also normally 2 kB) or no more gains can be had. During an UPDATE operation, values of unchanged fields are normally preserved as-is; so an UPDATE of a row with out-of-line values incurs no <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > costs if none of the out-of-line values change.</P ><P >The <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > code recognizes four different strategies for storing <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >-able columns: <P ></P ></P><UL ><LI ><P > <TT CLASS="LITERAL" >PLAIN</TT > prevents either compression or out-of-line storage; furthermore it disables use of single-byte headers for varlena types. This is the only possible strategy for columns of non-<ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >-able data types. </P ></LI ><LI ><P > <TT CLASS="LITERAL" >EXTENDED</TT > allows both compression and out-of-line storage. This is the default for most <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >-able data types. Compression will be attempted first, then out-of-line storage if the row is still too big. </P ></LI ><LI ><P > <TT CLASS="LITERAL" >EXTERNAL</TT > allows out-of-line storage but not compression. Use of <TT CLASS="LITERAL" >EXTERNAL</TT > will make substring operations on wide <TT CLASS="TYPE" >text</TT > and <TT CLASS="TYPE" >bytea</TT > columns faster (at the penalty of increased storage space) because these operations are optimized to fetch only the required parts of the out-of-line value when it is not compressed. </P ></LI ><LI ><P > <TT CLASS="LITERAL" >MAIN</TT > allows compression but not out-of-line storage. (Actually, out-of-line storage will still be performed for such columns, but only as a last resort when there is no other way to make the row small enough to fit on a page.) </P ></LI ></UL ><P> Each <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >-able data type specifies a default strategy for columns of that data type, but the strategy for a given table column can be altered with <TT CLASS="COMMAND" >ALTER TABLE SET STORAGE</TT >.</P ><P >This scheme has a number of advantages compared to a more straightforward approach such as allowing row values to span pages. Assuming that queries are usually qualified by comparisons against relatively small key values, most of the work of the executor will be done using the main row entry. The big values of <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >ed attributes will only be pulled out (if selected at all) at the time the result set is sent to the client. Thus, the main table is much smaller and more of its rows fit in the shared buffer cache than would be the case without any out-of-line storage. Sort sets shrink also, and sorts will more often be done entirely in memory. A little test showed that a table containing typical HTML pages and their URLs was stored in about half of the raw data size including the <ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM > table, and that the main table contained only about 10% of the entire data (the URLs and some small HTML pages). There was no run time difference compared to an un-<ACRONYM CLASS="ACRONYM" >TOAST</ACRONYM >ed comparison table, in which all the HTML pages were cut down to 7 kB to fit.</P ></DIV ><DIV CLASS="NAVFOOTER" ><HR ALIGN="LEFT" WIDTH="100%"><TABLE SUMMARY="Footer navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" ><A HREF="storage-file-layout.html" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="index.html" ACCESSKEY="H" >Home</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" ><A HREF="storage-fsm.html" ACCESSKEY="N" >Next</A ></TD ></TR ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" >Database File Layout</TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="storage.html" ACCESSKEY="U" >Up</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" >Free Space Map</TD ></TR ></TABLE ></DIV ></BODY ></HTML >