
Amazon Simple Storage Service (S3)

Introduction
Amazon Simple Storage Service (S3) is a service from Amazon that lets you store files in reliable remote storage for a very competitive price, and it is becoming very popular. Companies use S3 to store their customers' photos and videos, to back up their own data, and more. S3 provides both SOAP and REST APIs; this article focuses on using the S3 REST API with the Java programming language.

S3 Basics
S3 handles objects and buckets. An object corresponds to a stored file. Each object has an identifier, an owner, and permissions. Objects are stored in a bucket. A bucket has a unique name that must comply with internet domain naming rules. Once you have an AWS (Amazon Web Services) account, you can create up to 100 buckets associated with that account. An object is addressed by a URL, such as http://s3.amazonaws.com/bucketname/objectid. The object identifier is a filename or a filename with a relative path (e.g., myalbum/august/photo21.jpg). With this naming scheme, S3 storage can appear as a regular file system with folders and subfolders. Notice that the bucket name can also be the hostname in the URL, so your object could also be addressed by http://bucketname.s3.amazonaws.com/objectid.

S3 REST Security
S3 REST resources are secure. This matters not just for your own data, but also because customers are billed depending on how their S3 buckets and objects are used. Each AWS customer is assigned an AWSSecretKey, which is identified by an AWSAccessKeyID. The secret key must be kept private and is used to digitally sign REST requests. S3's security features are:

Authentication: Requests include AWSAccessKeyID
Authorization: An Access Control List (ACL) can be applied to each resource
Integrity: Requests are digitally signed with AWSSecretKey
Confidentiality: S3 is available through both HTTP and HTTPS
Non-repudiation: Requests are timestamped (combined with integrity, this provides proof of transaction)
The signing algorithm is HMAC-SHA1 (keyed-Hash Message Authentication Code with SHA-1). Signing a String in Java can be implemented as follows:

private javax.crypto.spec.SecretKeySpec signingKey = null;
private javax.crypto.Mac mac = null;
...
// This method converts AWSSecretKey into crypto instance.
public void setKey(String AWSSecretKey) throws Exception
{
  mac = Mac.getInstance("HmacSHA1");
  byte[] keyBytes = AWSSecretKey.getBytes("UTF8");
  signingKey = new SecretKeySpec(keyBytes, "HmacSHA1");
  mac.init(signingKey);
}

// This method creates S3 signature for a given String.
public String sign(String data) throws Exception
{
  // The signature must be Base64-encoded. encodeBase64 is assumed to be
  // provided elsewhere (e.g., java.util.Base64 or Apache Commons Codec).
  byte[] signBytes = mac.doFinal(data.getBytes("UTF8"));
  String signature = encodeBase64(signBytes);
  return signature;
}
...
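As a sanity check, the two methods above can be exercised against a well-known HMAC-SHA1 test vector (key "key", data "The quick brown fox jumps over the lazy dog"). The class name S3Signer below is a hypothetical wrapper for the setKey/sign methods, using java.util.Base64 (Java 8+) as the Base64 encoder:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical, self-contained version of the setKey/sign pair shown above.
public class S3Signer {
    private Mac mac;

    public void setKey(String secretKey) throws Exception {
        mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secretKey.getBytes("UTF8"), "HmacSHA1"));
    }

    public String sign(String data) throws Exception {
        byte[] signBytes = mac.doFinal(data.getBytes("UTF8"));
        // Signed String must be Base64-encoded.
        return java.util.Base64.getEncoder().encodeToString(signBytes);
    }

    public static void main(String[] args) throws Exception {
        S3Signer signer = new S3Signer();
        signer.setKey("key");
        String sig = signer.sign("The quick brown fox jumps over the lazy dog");
        System.out.println(sig);  // prints 3nybhbi3iqa8ino29wqQcBydtNk=
    }
}
```

If the printed value matches the expected test vector, the crypto setup is correct and can be trusted with your real AWSSecretKey.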
The AWSAccessKeyID and the signature are passed in the Authorization HTTP header like this:

Authorization: AWS <AWSAccessKeyID>:<Signature>
The string to sign must include the following information:

HTTP method name (PUT, GET, DELETE, etc.)
Content-MD5, if any
Content-Type, if any (e.g., text/plain)
Metadata headers, if any (e.g., "x-amz-acl" for ACL)
GMT timestamp of the request, formatted as EEE, dd MMM yyyy HH:mm:ss GMT
URI path such as /mybucket/myobjectid
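For example, the PUT request that creates the "onjava" bucket below has no Content-MD5, no Content-Type, and no metadata headers, so the string it signs is simply (the two blank lines stand for the empty Content-MD5 and Content-Type values):

```
PUT


Sun, 05 Aug 2007 15:33:59 GMT
/onjava
```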
Here is a sample of a successful S3 REST request/response creating the "onjava" bucket:

Request:
PUT /onjava HTTP/1.1
Content-Length: 0
User-Agent: jClientUpload
Host: s3.amazonaws.com
Date: Sun, 05 Aug 2007 15:33:59 GMT
Authorization: AWS 15B4D3461F177624206A:YFhSWKDg3qDnGbV7JCnkfdz/IHY=

Response:
HTTP/1.1 200 OK
x-amz-id-2: tILPE8NBqoQ2Xn9BaddGf/YlLCSiwrKP+OQOpbi5zazMQ3pC56KQgGk
x-amz-request-id: 676918167DFF7F8C
Date: Sun, 05 Aug 2007 15:30:28 GMT
Location: /onjava
Content-Length: 0
Server: AmazonS3
Notice the difference between the request and response timestamps? The request Date is later than the response Date because the response Date comes from the Amazon S3 server's own clock. If the difference between the request timestamp and the server's clock is too large, a RequestTimeTooSkewed error is returned. This is another important S3 security feature: it isn't possible to roll your clock forward or back to make requests appear to happen when they didn't.

Note: Thanks to ACLs, an AWS user can grant anonymous read access to objects. Signing is then not required, and objects can be addressed (especially for download) with a plain browser. This means S3 can also be used as a hosting service to serve HTML pages, images, videos, and applets; S3 even allows granting time-limited access to objects.
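Time-limited access works through query string authentication: an Expires timestamp (in epoch seconds) replaces the Date header in the string to sign, and the access key ID, expiry, and signature travel as URL parameters. Below is a minimal sketch, assuming the legacy query-string scheme described in the S3 Developer Guide of the time (parameter names AWSAccessKeyId, Expires, and Signature); class and method names are illustrative:

```java
import java.net.URLEncoder;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of S3 query string authentication: builds a URL that grants
// read access to a private object until the given expiry time.
public class S3PresignedUrl {
    private final String keyId;
    private final Mac mac;

    public S3PresignedUrl(String keyId, String secretKey) throws Exception {
        this.keyId = keyId;
        mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secretKey.getBytes("UTF8"), "HmacSHA1"));
    }

    public String presign(String bucket, String objectId,
                          long expiresEpochSeconds) throws Exception {
        String resource = "/" + bucket + "/" + objectId;
        // String to sign: method, empty Content-MD5 and Content-Type,
        // Expires (epoch seconds) instead of the Date header, then resource.
        String toSign = "GET\n\n\n" + expiresEpochSeconds + "\n" + resource;
        String signature = java.util.Base64.getEncoder()
                .encodeToString(mac.doFinal(toSign.getBytes("UTF8")));
        return "http://s3.amazonaws.com" + resource
             + "?AWSAccessKeyId=" + keyId
             + "&Expires=" + expiresEpochSeconds
             + "&Signature=" + URLEncoder.encode(signature, "UTF-8");
    }
}
```

Anyone holding the resulting URL can download the object with a plain browser until the expiry time, without needing the secret key.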

Creating a Bucket
The code below details the Java implementation of "onjava" S3 bucket creation. It relies on the java.net package for HTTP, java.text for date formatting, and java.util for timestamping. All of these packages are included in J2SE; no external library is needed to talk to the S3 REST interface. The code first generates the string to sign, then instantiates the HTTP REST connection with the required headers, and finally issues the request to the s3.amazonaws.com web server.

public void createBucket() throws Exception
{
  // S3 timestamp pattern.
  String fmt = "EEE, dd MMM yyyy HH:mm:ss ";
  SimpleDateFormat df = new SimpleDateFormat(fmt, Locale.US);
  df.setTimeZone(TimeZone.getTimeZone("GMT"));

  // Data needed for signature
  String method = "PUT";
  String contentMD5 = "";
  String contentType = "";
  String date = df.format(new Date()) + "GMT";
  String bucket = "/onjava";

  // Generate signature
  StringBuffer buf = new StringBuffer();
  buf.append(method).append("\n");
  buf.append(contentMD5).append("\n");
  buf.append(contentType).append("\n");
  buf.append(date).append("\n");
  buf.append(bucket);
  String signature = sign(buf.toString());

  // Connection to s3.amazonaws.com
  HttpURLConnection httpConn = null;
  URL url = new URL("http","s3.amazonaws.com",80,bucket);
  httpConn = (HttpURLConnection) url.openConnection();
  httpConn.setDoInput(true);
  httpConn.setDoOutput(true);
  httpConn.setUseCaches(false);
  httpConn.setDefaultUseCaches(false);
  httpConn.setAllowUserInteraction(true);
  httpConn.setRequestMethod(method);
  httpConn.setRequestProperty("Date", date);
  httpConn.setRequestProperty("Content-Length", "0");
  String AWSAuth = "AWS " + keyId + ":" + signature;
  httpConn.setRequestProperty("Authorization", AWSAuth);
  // Send the HTTP PUT request.
  int statusCode = httpConn.getResponseCode();
  if ((statusCode/100) != 2)
  {
    // Deal with S3 error stream.
    InputStream in = httpConn.getErrorStream();
    String errorStr = getS3ErrorCode(in);
    ...
  }
}

Dealing with REST Errors
Basically, all HTTP 2xx response status codes indicate success, while 3xx, 4xx, and 5xx codes report some kind of error. Details of the error are available in the HTTP response body as an XML document. REST error responses are defined in the S3 Developer Guide. For instance, an attempt to create a bucket that already exists will return:

HTTP/1.1 409 Conflict
x-amz-request-id: 64202856E5A76A9D
x-amz-id-2: cUKZpqUBR/RuwDVq+3vsO9mMNvdvlh+Xt1dEaW5MJZiL
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Sun, 05 Aug 2007 15:57:11 GMT
Server: AmazonS3

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>BucketAlreadyExists</Code>
  <Message>The named bucket you tried to create already exists</Message>
  <RequestId>64202856E5A76A9D</RequestId>
  <BucketName>awsdownloads</BucketName>
  <HostId>cUKZpqUBR/RuwDVq+3vsO9mMNvdvlh+Xt1dEaW5MJZiL</HostId>
</Error>
Code is the interesting value in the XML document; generally, it can be displayed as an error message to the end user. It can be extracted by parsing the XML stream with the SAXParserFactory, SAXParser, and DefaultHandler classes from the org.xml.sax and javax.xml.parsers packages. Basically, you instantiate a SAX parser, then implement an S3ErrorHandler that filters for the Code tag when notified by the parser and finally returns the S3 error code as a String:

public String getS3ErrorCode(InputStream doc) throws Exception
{
  String code = null;
  SAXParserFactory parserfactory = SAXParserFactory.newInstance();
  parserfactory.setNamespaceAware(false);
  parserfactory.setValidating(false);
  SAXParser xmlparser = parserfactory.newSAXParser();
  S3ErrorHandler handler = new S3ErrorHandler();
  xmlparser.parse(doc, handler);
  code = handler.getErrorCode();
  return code;
}

// This inner class implements a SAX handler.
class S3ErrorHandler extends DefaultHandler
{
  private StringBuffer code = new StringBuffer();
  private boolean append = false;

  public void startElement(String uri, String ln, String qn, Attributes atts)
  {
    if (qn.equalsIgnoreCase("Code")) append = true;
  }
  public void endElement(String url, String ln, String qn)
  {
    if (qn.equalsIgnoreCase("Code")) append = false;
  }
  public void characters(char[] ch, int s, int length)
  {
    if (append) code.append(new String(ch, s, length));
  }

  public String getErrorCode()
  {
    return code.toString();
  }
}
A list of all error codes is provided in the S3 Developer Guide. You're now able to create a bucket on Amazon S3 and deal with errors. Full source code is available in the Resources section.

File Uploading
Upload and download operations require more attention: S3 storage is unlimited, but each object is limited to 5 GB. An optional Content-MD5 check is supported to make sure that the transfer has not been corrupted, although computing an MD5 hash of a 5 GB file takes some time even on fast hardware.

S3 stores the uploaded object only if the transfer completes successfully. If a network issue occurs, the file has to be uploaded again from the start; S3 supports neither resuming nor partial updates of an object's content. That's one of the limits of the first "S" (Simple) in S3, but the simplicity also makes the API much easier to deal with.

When performing a file transfer with S3, you are responsible for streaming the objects. A good implementation will always stream objects; otherwise, the whole object content builds up in the Java heap, and with S3's 5 GB object limit you could quickly see an OutOfMemoryError.
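A minimal sketch of the streaming side of an upload: before writing the body, call setFixedLengthStreamingMode on the HttpURLConnection so it streams the request body instead of buffering it, then copy the file in small chunks so heap usage stays constant regardless of object size. The helper class name is illustrative:

```java
import java.io.InputStream;
import java.io.OutputStream;

// Chunked copy loop for uploads: moves data in 8 KB buffers so the
// object is never fully held in the Java heap. For a PUT, first call
// httpConn.setFixedLengthStreamingMode(file.length()), then copy the
// FileInputStream into httpConn.getOutputStream() with this method.
public class StreamCopier {
    public static long copy(InputStream in, OutputStream out) throws Exception {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

The same loop works in the other direction for downloads, copying the connection's input stream into a FileOutputStream.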

An example of a good upload implementation is available in the resources section of this article.

Beyond This Example
Many other operations are available through the S3 APIs:

List buckets and objects
Delete buckets and objects
Upload and download objects
Add meta-data to objects
Apply permissions
Monitor traffic and get statistics (still a beta API)
Adding custom metadata to an object is an interesting feature. For example, when uploading a video file, you could add "author," "title," and "location" properties, and retrieve them later when listing the objects. Getting statistics (IP address, referrer, bytes transferred, time to process, etc.) on buckets could also be useful for monitoring traffic.
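Custom metadata travels as x-amz-meta-* request headers (set with httpConn.setRequestProperty before sending the PUT), and these headers must also be folded into the signed string. A sketch of the canonicalization step as described in the S3 Developer Guide: x-amz- header names are lowercased, sorted, and emitted as name:value lines between the Date line and the resource path. The header values used here are illustrative:

```java
import java.util.Map;
import java.util.TreeMap;

// Builds the CanonicalizedAmzHeaders part of the string to sign:
// x-amz-* header names are lowercased, sorted lexicographically,
// and emitted as "name:value\n" lines.
public class AmzHeaderCanonicalizer {
    public static String canonicalize(Map<String, String> headers) {
        TreeMap<String, String> sorted = new TreeMap<String, String>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            String name = e.getKey().toLowerCase();
            if (name.startsWith("x-amz-")) {
                sorted.put(name, e.getValue().trim());
            }
        }
        StringBuffer buf = new StringBuffer();
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            buf.append(e.getKey()).append(":").append(e.getValue()).append("\n");
        }
        return buf.toString();
    }
}
```

Non-amz headers such as Content-Type are excluded here because they already have their own dedicated lines in the string to sign.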

Conclusion
This article introduced the basics of the Amazon Simple Storage Service REST API. It detailed how to implement bucket creation in Java and how to deal with S3 security principles. It showed that HTTP and XML skills are needed when developing with the S3 REST API. Some S3 operations could be improved (especially for upload), but overall Amazon S3 rocks. To go beyond what was presented in this article, check out the Java S3 tools listed in the Resources section.

References and Resources
Source code: Source code for this article
SOAP: Simple Object Access Protocol
REST: REpresentational State Transfer
S3 APIs: Amazon S3 Developer Guide
HMAC: Keyed-Hashing for Message Authentication (RFC 2104)
S3 forum: S3 forum for developers
S3 upload applet: A Java applet to upload files and folders to S3
Java S3 toolkit: An S3 toolkit for J2SE and J2ME provided by Amazon
Jets3t: Another Java toolkit for S3
