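Greenplum Storage Modes
When you create a table, Greenplum Database offers a set of storage-related parameters, and choosing them well matters. The key decisions are when to use heap storage versus append-optimized (AO) storage, and when to use row-oriented versus column-oriented storage. Making the right heap/AO and row/column choices is especially important for large tables.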
CREATE TABLE
Defines a new table.
Note: Referential integrity syntax (foreign key constraints) is accepted but not enforced.
Synopsis
CREATE [[GLOBAL | LOCAL] {TEMPORARY | TEMP}] TABLE table_name (
  [ { column_name data_type [ DEFAULT default_expr ]
        [ column_constraint [ ... ]
          [ ENCODING ( storage_directive [, ...] ) ] ]
    | table_constraint
    | LIKE other_table [ {INCLUDING | EXCLUDING} {DEFAULTS | CONSTRAINTS} ] ... }
    [, ... ] ]
  )
  [ INHERITS ( parent_table [, ... ] ) ]
  [ WITH ( storage_parameter=value [, ... ] ) ]
  [ ON COMMIT {PRESERVE ROWS | DELETE ROWS | DROP} ]
  [ TABLESPACE tablespace ]
  [ DISTRIBUTED BY (column, [ ... ] ) | DISTRIBUTED RANDOMLY ]
  [ PARTITION BY partition_type (column)
      [ SUBPARTITION BY partition_type (column) ]
        [ SUBPARTITION TEMPLATE ( template_spec ) ]
      [...]
    ( partition_spec )
    | [ SUBPARTITION BY partition_type (column) ]
      [...]
    ( partition_spec
      [ ( subpartition_spec
          [(...)]
        ) ]
    ) ]
where column_constraint is:
  [CONSTRAINT constraint_name]
  NOT NULL | NULL
  | UNIQUE [USING INDEX TABLESPACE tablespace] [WITH ( FILLFACTOR = value )]
  | PRIMARY KEY [USING INDEX TABLESPACE tablespace] [WITH ( FILLFACTOR = value )]
  | CHECK ( expression )
  | REFERENCES table_name [ ( column_name [, ... ] ) ] [ key_match_type ] [ key_action ]
where storage_directive for a column is:
  COMPRESSTYPE={ZLIB | QUICKLZ | RLE_TYPE | NONE}
  [COMPRESSLEVEL={0-9}]
  [BLOCKSIZE={8192-2097152}]
where storage_parameter for the table is:
  APPENDONLY={TRUE|FALSE}
  BLOCKSIZE={8192-2097152}
  ORIENTATION={COLUMN|ROW}
  CHECKSUM={TRUE|FALSE}
  COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}
  COMPRESSLEVEL={0-9}
  FILLFACTOR={10-100}
  OIDS[=TRUE|FALSE]
and table_constraint is:
  [CONSTRAINT constraint_name]
  UNIQUE ( column_name [, ... ] ) [USING INDEX TABLESPACE tablespace] [WITH ( FILLFACTOR=value )]
  | PRIMARY KEY ( column_name [, ... ] ) [USING INDEX TABLESPACE tablespace] [WITH ( FILLFACTOR=value )]
  | CHECK ( expression )
  | FOREIGN KEY ( column_name [, ... ] )
      REFERENCES table_name [ ( column_name [, ... ] ) ]
      [ key_match_type ] [ key_action ] [ key_checking_mode ]
where key_match_type is:
  MATCH FULL | SIMPLE
where key_action is:
  ON DELETE | ON UPDATE | NO ACTION | RESTRICT | CASCADE | SET NULL | SET DEFAULT
where key_checking_mode is:
  DEFERRABLE | NOT DEFERRABLE | INITIALLY DEFERRED | INITIALLY IMMEDIATE
where partition_type is:
  LIST | RANGE
where partition_specification is:
  partition_element [, ...]
and partition_element is:
  DEFAULT PARTITION name
  | [PARTITION name] VALUES ( list_value [, ...] )
  | [PARTITION name]
      START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]
      [ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]
      [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
  | [PARTITION name]
      END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]
      [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
  [ WITH ( partition_storage_parameter=value [, ... ] ) ]
  [ TABLESPACE tablespace ]
where subpartition_spec or template_spec is:
  subpartition_element [, ...]
and subpartition_element is:
  DEFAULT SUBPARTITION name
  | [SUBPARTITION name] VALUES ( list_value [, ...] )
  | [SUBPARTITION name]
      START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]
      [ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]
      [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
  | [SUBPARTITION name]
      END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]
      [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
  [ WITH ( partition_storage_parameter=value [, ... ] ) ]
  [ TABLESPACE tablespace ]
where storage_parameter for a partition is:
  APPENDONLY={TRUE|FALSE}
  BLOCKSIZE={8192-2097152}
  ORIENTATION={COLUMN|ROW}
  CHECKSUM={TRUE|FALSE}
  COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}
  COMPRESSLEVEL={1-9}
  FILLFACTOR={10-100}
  OIDS[=TRUE|FALSE]
Description
CREATE TABLE creates an initially empty table in the current database. The user who issues the command owns the table.
If you specify a schema name, Greenplum creates the table in the specified schema. Otherwise
Greenplum creates the table in the current schema. Temporary tables exist in a special schema, so you cannot specify a schema name when creating a temporary table. Table names must be distinct from the name of any other table, external table, sequence, index, or view in the same schema.
The optional constraint clauses specify conditions that new or updated rows must satisfy for an insert or update operation to succeed. A constraint is an SQL object that helps define the set of valid values in the table in various ways. Constraints apply to tables, not to partitions. You cannot add a constraint to a partition or subpartition.
Referential integrity constraints (foreign keys) are accepted but not enforced. The information is kept in the system catalogs but is otherwise ignored.
There are two ways to define constraints: table constraints and column constraints. A column constraint is defined as part of a column definition. A table constraint definition is not tied to a particular column, and it can encompass more than one column. Every column constraint can also be written as a table constraint; a column constraint is only a notational convenience for use when the constraint only affects one column.
When creating a table, there is an additional clause to declare the Greenplum Database distribution policy. If a DISTRIBUTED BY or DISTRIBUTED RANDOMLY clause is not supplied, then Greenplum assigns a hash distribution policy to the table using either the PRIMARY KEY (if the table has one) or the first column of the table as the distribution key. Columns of geometric or user-defined data types are not eligible as Greenplum distribution key columns. If a table does not have a column of an eligible data type, the rows are distributed based on a round-robin or random distribution. To ensure an even distribution of data in your Greenplum Database system, you want to choose a distribution key that is unique for each record, or if that is not possible, then choose DISTRIBUTED RANDOMLY.
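The two distribution clauses described above can be sketched as follows. The table and column names (orders, click_log, and their columns) are hypothetical and only serve as illustration.

```sql
-- Hash distribution: order_id is assumed to be unique per row,
-- so rows should spread evenly across the segments.
CREATE TABLE orders (
    order_id    bigint,
    customer_id integer,
    amount      numeric(12,2)
) DISTRIBUTED BY (order_id);

-- Random (round-robin) distribution: used here because this
-- hypothetical log table has no naturally unique column.
CREATE TABLE click_log (
    event_time timestamp,
    payload    text
) DISTRIBUTED RANDOMLY;
```

Stating the distribution clause explicitly avoids relying on the fallback to the primary key or the first column.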
The PARTITION BY clause allows you to divide the table into multiple sub-tables (or parts) that, taken
together, make up the parent table and share its schema. Though the sub-tables exist as independent tables, the Greenplum Database restricts their use in important ways. Internally, partitioning is implemented as a special form of inheritance. Each child table partition is created with a distinct CHECK constraint which limits the data the table can contain, based on some defining criteria. The CHECK constraints are also used by the query optimizer to determine which table partitions to scan in order to satisfy a given query predicate. These partition constraints are managed automatically by the Greenplum Database.
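As a minimal sketch of the PARTITION BY clause, the statement below range-partitions a hypothetical sales table by month and adds a default partition for rows outside the range. The table name, columns, and date range are assumptions made for illustration.

```sql
CREATE TABLE sales (
    id        integer,
    sale_date date,
    amount    numeric(10,2)
)
DISTRIBUTED BY (id)
PARTITION BY RANGE (sale_date)
(
    -- One child table per month of 2016.
    START (date '2016-01-01') INCLUSIVE
    END   (date '2017-01-01') EXCLUSIVE
    EVERY (INTERVAL '1 month'),
    -- Rows that match no monthly range land here.
    DEFAULT PARTITION other
);
```

A query that filters on sale_date lets the optimizer use the generated CHECK constraints to skip child tables that cannot contain matching rows.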
Parameters
GLOBAL | LOCAL
These keywords are present for SQL standard compatibility, but have no effect in Greenplum
Database.
TEMPORARY | TEMP
If specified, the table is created as a temporary table. Temporary tables are automatically
dropped at the end of a session, or optionally at the end of the current transaction (see ON
COMMIT). Existing permanent tables with the same name are not visible to the current session
while the temporary table exists, unless they are referenced with schema-qualified names.
Any indexes created on a temporary table are automatically temporary as well.
table_name
The name (optionally schema-qualified) of the table to be created.
column_name
The name of a column to be created in the new table.
data_type
The data type of the column. This may include array specifiers.
For table columns that contain textual data, Pivotal recommends specifying the data
type VARCHAR or TEXT. Specifying the data type CHAR is not recommended. In Greenplum
Database, the data types VARCHAR and TEXT treat padding added to the data (space
characters added after the last non-space character) as significant characters; the data type
CHAR does not. See Notes.
DEFAULT default_expr
The DEFAULT clause assigns a default data value for the column whose column definition it
appears within. The value is any variable-free expression (subqueries and cross-references
to other columns in the current table are not allowed). The data type of the default expression
must match the data type of the column. The default expression will be used in any insert
operation that does not specify a value for the column. If there is no default for a column, then
the default is null.
ENCODING ( storage_directive [, ...] )
For a column, the optional ENCODING clause specifies the type of compression and block
size for the column data. See storage_options for COMPRESSTYPE, COMPRESSLEVEL, and
BLOCKSIZE values.
The clause is valid only for append-optimized, column-oriented tables.
Column compression settings are inherited from the table level to the partition level to the
subpartition level. The lowest-level settings have priority.
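A short sketch of column-level ENCODING follows. The metrics table and its columns are hypothetical; the table is declared append-optimized and column-oriented because the clause is valid only for such tables.

```sql
CREATE TABLE metrics (
    -- Timestamps loaded in order tend to suit run-length/delta encoding.
    ts    timestamp ENCODING (compresstype=rle_type),
    -- Per-column zlib settings with a larger block size.
    host  text      ENCODING (compresstype=zlib, compresslevel=5, blocksize=65536),
    -- No ENCODING clause: this column falls back to the table-level settings.
    value float8
)
WITH (appendonly=true, orientation=column)
DISTRIBUTED BY (host);
```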
INHERITS
The optional INHERITS clause specifies a list of tables from which the new table automatically
inherits all columns. Use of INHERITS creates a persistent relationship between the new
child table and its parent table(s). Schema modifications to the parent(s) normally propagate
to children as well, and by default the data of the child table is included in scans of the
parent(s).
In Greenplum Database, the INHERITS clause is not used when creating partitioned tables.
Although the concept of inheritance is used in partition hierarchies, the inheritance structure
of a partitioned table is created using the PARTITION BY clause.
If the same column name exists in more than one parent table, an error is reported unless
the data types of the columns match in each of the parent tables. If there is no conflict, then
the duplicate columns are merged to form a single column in the new table. If the column
name list of the new table contains a column name that is also inherited, the data type must
likewise match the inherited column(s), and the column definitions are merged into one.
However, inherited and new column declarations of the same name need not specify identical
constraints: all constraints provided from any declaration are merged together and all are
applied to the new table. If the new table explicitly specifies a default value for the column,
this default overrides any defaults from inherited declarations of the column. Otherwise, any
parents that specify default values for the column must all specify the same default, or an
error will be reported.
LIKE other_table [ {INCLUDING | EXCLUDING} {DEFAULTS | CONSTRAINTS} ]
The LIKE clause specifies a table from which the new table automatically copies all column
names, data types, not-null constraints, and distribution policy. Storage properties like
append-optimized or partition structure are not copied. Unlike INHERITS, the new table and
original table are completely decoupled after creation is complete.
Default expressions for the copied column definitions will only be copied if INCLUDING
DEFAULTS is specified. The default behavior is to exclude default expressions, resulting in the
copied columns in the new table having null defaults.
Not-null constraints are always copied to the new table. CHECK constraints will only be copied
if INCLUDING CONSTRAINTS is specified; other types of constraints will never be copied. Also,
no distinction is made between column constraints and table constraints — when constraints
are requested, all check constraints are copied.
Note also that unlike INHERITS, copied columns and constraints are not merged with similarly
named columns and constraints. If the same name is specified explicitly or in another LIKE
clause an error is signalled.
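The copying rules for LIKE can be illustrated with a small sketch. Both table names are hypothetical; the source table carries a NOT NULL column, a default, and a CHECK constraint so that each rule has something to act on.

```sql
CREATE TABLE orders_src (
    order_id bigint NOT NULL,
    status   text DEFAULT 'new',
    amount   numeric(12,2) CHECK (amount >= 0)
) DISTRIBUTED BY (order_id);

-- Copies column names, data types, NOT NULL, and the distribution policy.
-- INCLUDING DEFAULTS also copies the default on status;
-- INCLUDING CONSTRAINTS also copies the CHECK on amount.
CREATE TABLE orders_copy (
    LIKE orders_src INCLUDING DEFAULTS INCLUDING CONSTRAINTS
);
```

Without the two INCLUDING options, orders_copy would keep only the columns, the NOT NULL constraint, and the distribution policy.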
CONSTRAINT constraint_name
An optional name for a column or table constraint. If the constraint is violated, the constraint
name is present in error messages, so constraint names like "column must be positive" can
be used to communicate helpful constraint information to client applications. (Double-quotes
are needed to specify constraint names that contain spaces.) If a constraint name is not
specified, the system generates a name.
Note: The specified constraint_name is used for the constraint, but a system-generated unique name is used for the index name. In some prior releases, the
provided name was used for both the constraint name and the index name.
NULL | NOT NULL
Specifies if the column is or is not allowed to contain null values. NULL is the default.
UNIQUE (column constraint)
UNIQUE ( column_name [, ... ] ) (table constraint)
The UNIQUE constraint specifies that a group of one or more columns of a table may contain
only unique values. The behavior of the unique table constraint is the same as that for
column constraints, with the additional capability to span multiple columns. For the purpose of
a unique constraint, null values are not considered equal. The column(s) that are unique must
contain all the columns of the Greenplum distribution key. In addition, the UNIQUE constraint must contain
all the columns in the partition key if the table is partitioned. Note that a UNIQUE constraint in a
partitioned table is not the same as a simple UNIQUE INDEX.
PRIMARY KEY (column constraint)
PRIMARY KEY ( column_name [, ... ] ) (table constraint)
The primary key constraint specifies that a column or columns of a table may contain only
unique (non-duplicate), non-null values. Technically, PRIMARY KEY is merely a combination
of UNIQUE and NOT NULL, but identifying a set of columns as primary key also provides
metadata about the design of the schema, as a primary key implies that other tables may
rely on this set of columns as a unique identifier for rows. For a table to have a primary key, it
must be hash distributed (not randomly distributed), and the primary key column(s) must
contain all the columns of the Greenplum distribution key. In addition, the
primary key must contain all the columns in the partition key if the table is partitioned. Note that a
PRIMARY KEY constraint in a partitioned table is not the same as a simple UNIQUE INDEX.
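The rule that a key must cover the distribution key can be sketched with two hypothetical heap tables. The names are assumptions chosen for illustration.

```sql
-- The primary key equals the distribution key, so the rule is satisfied.
CREATE TABLE customers (
    customer_id integer,
    name        text,
    PRIMARY KEY (customer_id)
) DISTRIBUTED BY (customer_id);

-- A multi-column UNIQUE constraint is acceptable as long as it
-- contains every distribution key column (here, customer_id).
CREATE TABLE customer_emails (
    customer_id integer,
    email       text,
    UNIQUE (customer_id, email)
) DISTRIBUTED BY (customer_id);
```

By contrast, declaring UNIQUE (email) alone on customer_emails would be expected to fail, because that key omits the distribution column customer_id.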
CHECK ( expression )
The CHECK clause specifies an expression producing a Boolean result which new or updated
rows must satisfy for an insert or update operation to succeed. Expressions evaluating to
TRUE or UNKNOWN succeed. Should any row of an insert or update operation produce a FALSE
result an error exception is raised and the insert or update does not alter the database. A
check constraint specified as a column constraint should reference that column's value only,
while an expression appearing in a table constraint may reference multiple columns. CHECK
expressions cannot contain subqueries nor refer to variables other than columns of the
current row.
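A brief sketch of column-level and table-level CHECK constraints follows, using a hypothetical products table. The first constraint name contains spaces, so it is double-quoted.

```sql
CREATE TABLE products (
    product_id       integer,
    -- Column constraint: references only its own column.
    price            numeric(10,2)
        CONSTRAINT "price must be positive" CHECK (price > 0),
    discounted_price numeric(10,2),
    -- Table constraint: may reference several columns of the row.
    CONSTRAINT discount_below_price CHECK (discounted_price < price)
) DISTRIBUTED BY (product_id);
```

A row whose discounted_price is NULL still passes discount_below_price, because the expression evaluates to UNKNOWN rather than FALSE.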
REFERENCES table_name [ ( column_name [, ... ] ) ]
[ key_match_type ] [ key_action ]
FOREIGN KEY ( column_name [, ... ] )
REFERENCES table_name [ ( column_name [, ... ] ) ]
[ key_match_type ] [ key_action ] [ key_checking_mode ]
The REFERENCES and FOREIGN KEY clauses specify referential integrity constraints (foreign
key constraints). Greenplum accepts referential integrity constraints as specified in
PostgreSQL syntax but does not enforce them. See the PostgreSQL documentation for
information about referential integrity constraints.
WITH ( storage_option=value )
The WITH clause can be used to set storage options for the table or its indexes. Note that you
can also set storage parameters on a particular partition or subpartition by declaring the WITH
clause in the partition specification. The lowest-level settings have priority.
The defaults for some of the table storage options can be specified with the server
configuration parameter gp_default_storage_options. For information about setting
default storage options, see Notes.
The following storage options are available:
APPENDONLY - Set to TRUE to create the table as an append-optimized table. If FALSE or
not declared, the table will be created as a regular heap-storage table.
BLOCKSIZE - Set to the size, in bytes, for each block in a table. The BLOCKSIZE must be
between 8192 and 2097152 bytes, and be a multiple of 8192. The default is 32768.
ORIENTATION - Set to column for column-oriented storage, or row (the default) for row-oriented storage. This option is only valid if APPENDONLY=TRUE. Heap-storage tables can only
be row-oriented.
CHECKSUM - This option is valid only for append-optimized tables (APPENDONLY=TRUE).
The value TRUE is the default and enables CRC checksum validation for append-optimized
tables. The checksum is calculated during block creation and is stored on disk. Checksum
validation is performed during block reads. If the checksum calculated during the read does
not match the stored checksum, the transaction is aborted. If you set the value to FALSE
to disable checksum validation, checking the table data for on-disk corruption will not be
performed.
COMPRESSTYPE - Set to ZLIB (the default), RLE_TYPE, or QUICKLZ to specify the type
of compression used. The value NONE disables compression. QuickLZ uses less CPU
power and compresses data faster at a lower compression ratio than zlib. Conversely, zlib
provides more compact compression ratios at lower speeds. This option is only valid if
APPENDONLY=TRUE.
The value RLE_TYPE is supported only if ORIENTATION=column is specified. With RLE_TYPE, Greenplum
Database uses the run-length encoding (RLE) compression algorithm. RLE compresses data
better than the zlib or QuickLZ compression algorithm when the same data value occurs in
many consecutive rows.
For columns of type BIGINT, INTEGER, DATE, TIME, or TIMESTAMP, delta compression is also
applied if the COMPRESSTYPE option is set to RLE_TYPE compression. The delta compression
algorithm is based on the delta between column values in consecutive rows and is designed
to improve compression when data is loaded in sorted order or the compression is applied to
column data that is in sorted order.
For information about using table compression, see "Choosing the Table Storage Model" in
the Greenplum Database Administrator Guide.
COMPRESSLEVEL - For zlib compression of append-optimized tables, set to an
integer value from 1 (fastest compression) to 9 (highest compression ratio). QuickLZ
compression level can only be set to 1. If not declared, the default is 1. For RLE_TYPE, the
compression level can be set to an integer value from 1 (fastest compression) to 4 (highest
compression ratio).
This option is valid only if APPENDONLY=TRUE.
FILLFACTOR - See CREATE INDEX for more information about this index storage
parameter.
OIDS - Set to OIDS=FALSE (the default) so that rows do not have object identifiers assigned
to them. Greenplum strongly recommends that you do not enable OIDS when creating a
table. On large tables, such as those in a typical Greenplum Database system, using OIDs
for table rows can cause wrap-around of the 32-bit OID counter. Once the counter wraps
around, OIDs can no longer be assumed to be unique, which not only makes them useless to
user applications, but can also cause problems in the Greenplum Database system catalog
tables. In addition, excluding OIDs from a table reduces the space required to store the
table on disk by 4 bytes per row, slightly improving performance. OIDS are not allowed on
partitioned tables or append-optimized column-oriented tables.
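To tie the storage options together, the sketch below creates one table per storage model: heap, append-optimized row-oriented, and append-optimized column-oriented. All table and column names are hypothetical, and the compression settings are arbitrary picks from the ranges listed above rather than recommendations.

```sql
-- Heap storage (the default when APPENDONLY is not declared).
CREATE TABLE dim_region (
    region_id integer,
    name      text
) DISTRIBUTED BY (region_id);

-- Append-optimized, row-oriented, zlib-compressed.
CREATE TABLE fact_events_row (
    event_id   bigint,
    event_time timestamp,
    payload    text
)
WITH (appendonly=true, orientation=row,
      compresstype=zlib, compresslevel=5)
DISTRIBUTED BY (event_id);

-- Append-optimized, column-oriented, RLE-compressed with checksums.
-- RLE_TYPE requires orientation=column; its level ranges from 1 to 4.
CREATE TABLE fact_events_col (
    event_id   bigint,
    event_time timestamp,
    region_id  integer
)
WITH (appendonly=true, orientation=column,
      compresstype=rle_type, compresslevel=2, checksum=true)
DISTRIBUTED BY (event_id);
```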
ON COMMIT
The behavior of temporary tables at the end of a transaction block can be controlled using ON
COMMIT. The three options are:
PRESERVE ROWS - No special action is taken at the ends of transactions for temporary
tables. This is the default behavior.
DELETE ROWS - All rows in the temporary table will be deleted at the end of each
transaction block. Essentially, an automatic TRUNCATE is done at each commit.
DROP - The temporary table will be dropped at the end of the current transaction block.
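The ON COMMIT options can be sketched with two hypothetical temporary tables. Note that ON COMMIT comes before the distribution clause, as in the synopsis.

```sql
BEGIN;
-- Dropped automatically when this transaction block ends.
CREATE TEMPORARY TABLE tmp_batch (id integer, note text)
    ON COMMIT DROP
    DISTRIBUTED BY (id);
INSERT INTO tmp_batch VALUES (1, 'scratch row');
COMMIT;  -- tmp_batch no longer exists after this point

-- Survives for the session, but is emptied at every commit.
CREATE TEMP TABLE tmp_session (id integer)
    ON COMMIT DELETE ROWS
    DISTRIBUTED BY (id);
```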
TABLESPACE tablespace
The name of the tablespace in which the new table is to be created. If not specified, the
database's default tablespace is used.
USING INDEX TABLESPACE tablespace
This clause allows selection of the tablespace in which the index associated with a UNIQUE or
PRIMARY KEY constraint will be created. If not specified, the database's default tablespace is
used.
DISTRIBUTED BY (column, [ ... ] )
DISTRIBUTED RANDOMLY
Used to declare the Greenplum Database distribution policy for the table. DISTRIBUTED BY
uses hash distribution with one or more columns declared as the distribution key. For the
most even data distribution, the distribution key should be the primary key of the table or a
unique column (or set of columns). If that is not possible, then you may choose DISTRIBUTED
RANDOMLY, which will send the data round-robin to the segment instances.
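The two distribution policies might be declared as in the sketch below. All table and column names are hypothetical. The final query is one rough way to check how evenly rows are spread, using the gp_segment_id system column.

```sql
-- Hash distribution on a unique column (names are illustrative).
CREATE TABLE orders (
    order_id   bigint,
    customer   text,
    order_date date
)
DISTRIBUTED BY (order_id);

-- No suitable key column: let Greenplum spread the rows itself.
CREATE TABLE click_log (
    logged_at timestamp,
    payload   text
)
DISTRIBUTED RANDOMLY;

-- Rough skew check: count the rows stored on each segment.
SELECT gp_segment_id, count(*) AS row_count
FROM orders
GROUP BY gp_segment_id
ORDER BY gp_segment_id;
```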
The Greenplum Database server configuration parameter
gp_create_table_random_default_distribution controls the default table distribution
policy if the DISTRIBUTED BY clause is not specified when you create a table. Greenplum
Database follows these rules to create a table if a distribution policy is not specified.
If the value of the parameter is off (the default), Greenplum Database chooses the table
distribution key based on the command. If the LIKE or INHERITS clause is specified in the table
creation command, the created table uses the same distribution key as the source or parent
table.
If the value of the parameter is set to on, Greenplum Database follows these rules:
• If PRIMARY KEY or UNIQUE columns are not specified, the distribution of the table is
random (DISTRIBUTED RANDOMLY). Table distribution is random even if the table creation
command contains the LIKE or INHERITS clause.
• If PRIMARY KEY or UNIQUE columns are specified, a DISTRIBUTED BY clause must also
be specified. If a DISTRIBUTED BY clause is not specified as part of the table creation
command, the command fails.
For information about the parameter, see "Server Configuration Parameters."
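The rules for the on setting can be sketched as follows. This assumes the parameter can be changed at session level with SET; if your installation only allows it to be set system-wide, adjust accordingly. Table names are hypothetical.

```sql
-- Assumption: the parameter is settable for the current session.
SET gp_create_table_random_default_distribution = on;

-- No PRIMARY KEY or UNIQUE column and no DISTRIBUTED BY clause:
-- under the rules described above, this table is distributed randomly.
CREATE TABLE events_a (id int, note text);

-- A PRIMARY KEY is declared, so an explicit DISTRIBUTED BY clause
-- is required; without it the command would fail under these rules.
CREATE TABLE events_b (id int PRIMARY KEY, note text)
DISTRIBUTED BY (id);

-- Return to the configured default.
RESET gp_create_table_random_default_distribution;
```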
PARTITION BY
Declares one or more columns by which to partition the table.
When creating a partitioned table, Greenplum Database creates the root partitioned table (the
root partition) with the specified table name. Greenplum Database also creates a hierarchy
of tables, child tables, that are the subpartitions based on the partitioning options that you
specify. The Greenplum Database pg_partition* system views contain information about the
subpartition tables.
For each partition level (each hierarchy level of tables), a partitioned table can have a
maximum of 32,767 partitions.
Note: Greenplum Database stores partitioned table data in the leaf child tables,
the lowest-level tables in the hierarchy of child tables for use by the partitioned
table.
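To see the hierarchy that is created, one option is to query the pg_partitions view after creating a partitioned table. The sketch below assumes a Greenplum version that exposes pg_partitions; the measurements table is hypothetical, and the START, END, and EVERY clauses it uses are described in the partition specification that follows.

```sql
-- Hypothetical table with three monthly range partitions.
CREATE TABLE measurements (
    id       int,
    reading  numeric,
    taken_on date
)
DISTRIBUTED BY (id)
PARTITION BY RANGE (taken_on)
( START (date '2008-01-01')
  END   (date '2008-04-01')
  EVERY (INTERVAL '1 month') );

-- List the child tables created under the root table.
SELECT partitiontablename, partitionlevel, partitionrank
FROM pg_partitions
WHERE tablename = 'measurements'
ORDER BY partitionlevel, partitionrank;
```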
partition_type
Declares partition type: LIST (list of values) or RANGE (a numeric or date range).
partition_specification
Declares the individual partitions to create. Each partition can be defined individually
or, for range partitions, you can use the EVERY clause (with a START and optional END
clause) to define an increment pattern to use to create the individual partitions.
DEFAULT PARTITION name - Declares a default partition. When data does not
match an existing partition, it is inserted into the default partition. Partition designs
that do not have a default partition will reject incoming rows that do not match an
existing partition.
PARTITION name - Declares a name to use for the partition. Partitions are created
using the following naming convention: parentname_level#_prt_givenname.
VALUES - For list partitions, defines the value(s) that the partition will contain.
START - For range partitions, defines the starting range value for the partition. By
default, start values are INCLUSIVE. For example, if you declared a start date of
'2008-01-01', then the partition would contain all dates greater than or equal to
'2008-01-01'. Typically the data type of the START expression is the same type as
the partition key column. If that is not the case, then you must explicitly cast to the
intended data type.
END - For range partitions, defines the ending range value for the partition. By
default, end values are EXCLUSIVE. For example, if you declared an end date of
'2008-02-01', then the partition would contain all dates less than but not equal to
'2008-02-01'. Typically the data type of the END expression is the same type as
the partition key column. If that is not the case, then you must explicitly cast to the
intended data type.
EVERY - For range partitions, defines how to increment the values from START to END
to create individual partitions. Typically the data type of the EVERY expression is the
same type as the partition key column. If that is not the case, then you must explicitly
cast to the intended data type.
WITH - Sets the table storage options for a partition. For example, you may want
older partitions to be append-optimized tables and newer partitions to be regular
heap tables.
TABLESPACE - The name of the tablespace in which the partition is to be created.
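The partition specification elements above can be combined as in the following sketch. The sales table, its columns, the partition names, and the chosen storage options are hypothetical; the example only illustrates named range partitions, explicit INCLUSIVE and EXCLUSIVE bounds, a per-partition WITH clause, and a default partition.

```sql
CREATE TABLE sales (
    sale_id   int,
    sale_date date,
    amount    numeric(10,2)
)
DISTRIBUTED BY (sale_id)
PARTITION BY RANGE (sale_date)
(
    -- Older data: append-optimized, column-oriented, compressed.
    PARTITION y2007 START (date '2007-01-01') INCLUSIVE
                    END   (date '2008-01-01') EXCLUSIVE
        WITH (appendonly=true, orientation=column,
              compresstype=zlib, compresslevel=5),
    -- Newer data: regular heap storage (no WITH clause).
    PARTITION y2008 START (date '2008-01-01') INCLUSIVE
                    END   (date '2009-01-01') EXCLUSIVE,
    -- Rows outside both ranges land here instead of being rejected.
    DEFAULT PARTITION other
);
```

Following the naming convention given above, the child tables would be expected to have names such as sales_1_prt_y2007 and sales_1_prt_other.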
SUBPARTITION BY
Declares one or more columns by which to subpartition the first-level partitions of the table.
The format of the subpartition specification is similar to that of a partition specification
described above.
SUBPARTITION TEMPLATE
Instead of declaring each subpartition definition individually for each partition, you can
optionally declare a subpartition template to be used to create the subpartitions (lower level
child tables). This subpartition specification would then apply to all parent partitions.
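A subpartition template might look like the sketch below, where every monthly range partition receives the same set of list subpartitions. The regional_sales table, its columns, and the region values are hypothetical.

```sql
CREATE TABLE regional_sales (
    sale_id   int,
    sale_date date,
    region    text
)
DISTRIBUTED BY (sale_id)
PARTITION BY RANGE (sale_date)
SUBPARTITION BY LIST (region)
-- The template is applied to each first-level partition.
SUBPARTITION TEMPLATE (
    SUBPARTITION usa    VALUES ('usa'),
    SUBPARTITION europe VALUES ('europe'),
    DEFAULT SUBPARTITION other_regions
)
( START (date '2008-01-01') INCLUSIVE
  END   (date '2008-04-01') EXCLUSIVE
  EVERY (INTERVAL '1 month') );
```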