Amazon Redshift (1) Introduction

sillycat


Redshift supports scalar user-defined functions (UDFs) written in Python, which can be called from SQL like built-in functions.
Plain SQL
select regexp_replace(url, '(https?)://([^@]*@)?([^:/]*)([/:].*|$)', '\3') from table;

===>
Python and SQL
create function f_hostname(url varchar) returns varchar immutable as
$$
    import urlparse
    return urlparse.urlparse(url).hostname
$$ language plpythonu;

select f_hostname(url) from table;
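The two approaches above can be compared locally before registering a UDF. This is a minimal sketch, assuming Python 3, where the parser lives in `urllib.parse` (Redshift Python UDFs run Python 2.7, where the module is `urlparse`). The function names and the sample URL are illustrative and do not come from the Redshift documentation.

```python
import re
from urllib.parse import urlparse

# Host-capturing pattern in the spirit of the SQL example; group 3 is the host.
HOST_PATTERN = re.compile(r"(https?)://([^@]*@)?([^:/]*)([/:].*|$)")


def hostname_regex(url):
    """Regex approach: replace the matched URL with capture group 3."""
    return HOST_PATTERN.sub(r"\3", url)


def hostname_urlparse(url):
    """Library approach, mirroring the body of the f_hostname UDF."""
    return urlparse(url).hostname


sample = "https://user@example.com:8080/path?q=1"  # hypothetical URL
print(hostname_regex(sample))
print(hostname_urlparse(sample))
```

The library version is shorter and leaves the edge cases (user info, ports, missing paths) to the parser, which is the main argument for the UDF form.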

NumPy and SciPy: numerical and scientific computing tools
pandas: SQL-like data manipulation built on top of NumPy
dateutil and pytz: date and time zone handling

http://www.numpy.org/

http://scipy.org/about.html

http://pandas.pydata.org/

https://dateutil.readthedocs.org/en/latest/

https://pypi.python.org/pypi/pytz/

Data Warehouse System Architecture
http://docs.aws.amazon.com/zh_cn/redshift/latest/dg/c_high_level_system_architecture.html

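Client applications connect through industry-standard PostgreSQL JDBC and ODBC drivers.

Leader node: compiles code, distributes the compiled code to the compute nodes, and assigns a portion of the data to each compute node.

Compute nodes: run the compiled code on their portion of the data; storage is per node, for example a 160 GB node.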
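To make "assigns a portion of the data to each compute node" concrete, here is a toy model of key-based distribution. It is only an illustration: the CRC32 hash, the function names, and the sample rows are assumptions and do not reflect Redshift's actual internal algorithm.

```python
import zlib
from collections import defaultdict


def assign_node(key, node_count):
    """Map a distribution-key value to a node index using a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % node_count


def distribute(rows, key_name, node_count):
    """Group rows by the node that their distribution key hashes to."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[assign_node(row[key_name], node_count)].append(row)
    return dict(nodes)


# Hypothetical rows, spread over two hypothetical compute nodes.
rows = [{"user_id": i, "amount": i * 10} for i in range(8)]
for node, chunk in sorted(distribute(rows, "user_id", 2).items()):
    print(node, [r["user_id"] for r in chunk])
```

Rows with the same key always land on the same node, which is why joining on the distribution key can avoid moving data between nodes.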

Load data from S3 into Redshift
http://docs.aws.amazon.com/zh_cn/redshift/latest/dg/t_Loading-data-from-S3.html

COPY Command to Load the Data
copy <table_name> from 's3://<bucket_name>/<object_prefix>'
credentials '<aws-auth-args>';
http://docs.aws.amazon.com/zh_cn/redshift/latest/dg/t_loading-tables-from-s3.html

http://docs.aws.amazon.com/zh_cn/datapipeline/latest/DeveloperGuide/dp-copydata-redshift.html
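When the COPY statement is generated by a script, the template can be filled in as sketched below. The helper name `build_copy_statement` and the sample values are hypothetical. The sketch only builds the SQL text; running it would need a PostgreSQL-compatible client connection, which is left out here.

```python
import re

# Conservative check for an unquoted SQL identifier (an assumption of this sketch).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_copy_statement(table_name, bucket_name, object_prefix, aws_auth_args):
    """Fill in the COPY template with a table, an S3 location, and credentials."""
    if not _IDENTIFIER.match(table_name):
        raise ValueError("unexpected table name: %r" % (table_name,))
    for value in (bucket_name, object_prefix, aws_auth_args):
        if "'" in value:
            raise ValueError("single quotes are not allowed in COPY arguments")
    return (
        "copy {table} from 's3://{bucket}/{prefix}'\n"
        "credentials '{auth}';"
    ).format(
        table=table_name,
        bucket=bucket_name,
        prefix=object_prefix,
        auth=aws_auth_args,
    )


# Hypothetical table, bucket, and prefix; the credentials stay a placeholder.
print(build_copy_statement("events", "my-bucket", "logs/2016/", "<aws-auth-args>"))
```

Rejecting quotes and odd identifiers up front is a cheap guard against malformed statements when the values come from configuration.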

Work on the DB
http://docs.aws.amazon.com/zh_cn/redshift/latest/dg/t_deleting_redshift_user_cmd.html

How to Design the Table
http://docs.aws.amazon.com/zh_cn/redshift/latest/dg/c_designing-tables-best-practices.html

http://docs.aws.amazon.com/zh_cn/redshift/latest/dg/t_Creating_tables.html

How to Load Data
http://docs.aws.amazon.com/zh_cn/redshift/latest/dg/c_loading-data-best-practices.html

How to Query Data
http://docs.aws.amazon.com/zh_cn/redshift/latest/dg/c_designing-queries-best-practices.html

Database Admin Commands
http://docs.aws.amazon.com/zh_cn/redshift/latest/dg/t_querying_redshift_system_tables.html

Table Design
If recent data is queried most frequently, specify the timestamp column as the leading column of the sort key.

If you frequently apply range or equality filters on one column, specify that column as the sort key.

If you frequently join a table, specify the join column as both the sort key and the distribution key.
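The guidelines above translate into the `distkey` and `sortkey` table attributes of `create table`. The sketch below generates such a statement. The helper name, table name, and columns are hypothetical, chosen only to show where the keys go.

```python
def build_create_table(table_name, columns, sort_key=None, dist_key=None):
    """Build a CREATE TABLE statement with optional DISTKEY and SORTKEY.

    columns is a list of (name, type) pairs; sort_key is a list of column
    names with the leading column first; dist_key is a single column name.
    """
    names = [name for name, _ in columns]
    wanted = list(sort_key or []) + ([dist_key] if dist_key else [])
    for key in wanted:
        if key not in names:
            raise ValueError("unknown key column: %r" % (key,))
    column_sql = ",\n".join("    {} {}".format(n, t) for n, t in columns)
    ddl = "create table {} (\n{}\n)".format(table_name, column_sql)
    if dist_key:
        ddl += "\ndistkey({})".format(dist_key)
    if sort_key:
        ddl += "\nsortkey({})".format(", ".join(sort_key))
    return ddl + ";"


# Recent data is queried most often, so the timestamp leads the sort key;
# user_id is the assumed join column, so it becomes the distribution key.
print(build_create_table(
    "events",
    [("event_time", "timestamp"), ("user_id", "bigint")],
    sort_key=["event_time"],
    dist_key="user_id",
))
```

For a table that is mostly joined rather than filtered by time, passing the join column as both `sort_key` and `dist_key` follows the third guideline.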

References:
http://docs.aws.amazon.com/zh_cn/redshift/latest/dg/c_redshift_system_overview.html

https://aws.amazon.com/cn/documentation/redshift/

2016-04-22 06:16