import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("[-\\w\\.]+@[-\\w\\.]+");
// s is the text to search for email addresses
Matcher m = pattern.matcher(s);
while (m.find()) {
    String email = m.group(); // same as s.substring(m.start(), m.end())
    log.debug("email found " + email);
}
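To see what the simple pattern above actually picks up, here is a self-contained version wrapped in a small class (the class name EmailExtract is made up for illustration). Because "." is allowed on both sides of "@", a sentence-ending period ends up inside the match, which is one reason to look at the real grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailExtract {
    // The same loose pattern as above: word characters, '-' and '.' on both sides of '@'
    private static final Pattern LOOSE = Pattern.compile("[-\\w\\.]+@[-\\w\\.]+");

    // Collects every substring of text that matches the loose pattern
    public static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = LOOSE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "Contact alice@example.com or bob.smith@mail.example.org.";
        // prints [alice@example.com, bob.smith@mail.example.org.] (note the trailing period)
        System.out.println(extract(s));
    }
}
```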
The standard format of an email address is described in detail in RFC 2822, Internet Message Format.
From http://tools.ietf.org/html/rfc2822#section-3.4.1:
3.4.1. Addr-spec specification
An addr-spec is a specific Internet identifier that contains a
locally interpreted string followed by the at-sign character ("@",
ASCII value 64) followed by an Internet domain. The locally
interpreted string is either a quoted-string or a dot-atom. If the
string can be represented as a dot-atom (that is, it contains no
characters other than atext characters or "." surrounded by atext
characters), then the dot-atom form SHOULD be used and the
quoted-string form SHOULD NOT be used. Comments and folding white
space SHOULD NOT be used around the "@" in the addr-spec.
addr-spec       =       local-part "@" domain
local-part      =       dot-atom / quoted-string / obs-local-part
domain          =       dot-atom / domain-literal / obs-domain
domain-literal  =       [CFWS] "[" *([FWS] dcontent) [FWS] "]" [CFWS]
dcontent        =       dtext / quoted-pair
dtext           =       NO-WS-CTL /     ; Non white space controls
                        %d33-90 /       ; The rest of the US-ASCII
                        %d94-126        ; characters not including "[",
                                        ;  "]", or "\"
The domain portion identifies the point to which the mail is
delivered. In the dot-atom form, this is interpreted as an Internet
domain name (either a host name or a mail exchanger name) as
described in [STD3, STD13, STD14]. In the domain-literal form, the
domain is interpreted as the literal Internet address of the
particular host. In both cases, how addressing is used and how
messages are transported to a particular host is covered in the mail
transport document [RFC2821]. These mechanisms are outside of the
scope of this document.
The local-part portion is a domain dependent string. In addresses,
it is simply interpreted on the particular host as a name of a
particular mailbox.
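The dot-atom and domain-literal forms of addr-spec quoted above translate fairly directly into a regular expression. A minimal Java sketch, assuming ASCII input and deliberately leaving out quoted-string, CFWS/FWS, quoted-pair and the obs-* rules; the class name AddrSpec is hypothetical.

```java
import java.util.regex.Pattern;

public class AddrSpec {
    // atext (RFC 2822 section 3.2.4): letters, digits and the listed symbols
    static final String ATEXT = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]";
    // dot-atom-text = 1*atext *("." 1*atext); no CFWS around it in this sketch
    static final String DOT_ATOM = ATEXT + "+(\\." + ATEXT + "+)*";
    // dtext = %d33-90 / %d94-126, i.e. printable ASCII except "[", "]" and "\"
    // (NO-WS-CTL, quoted-pair and FWS are left out on purpose)
    static final String DOMAIN_LITERAL = "\\[[\\x21-\\x5A\\x5E-\\x7E]*\\]";
    // addr-spec = local-part "@" domain, restricted to dot-atom / domain-literal
    static final Pattern ADDR_SPEC = Pattern.compile(
            DOT_ATOM + "@(" + DOT_ATOM + "|" + DOMAIN_LITERAL + ")");

    public static boolean isAddrSpec(String s) {
        return ADDR_SPEC.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "john.doe@example.com",          // true
            "o'reilly+tag@mail.example.org", // true
            "admin@[192.168.0.1]",           // true (domain-literal)
            "john..doe@example.com",         // false (empty atom between dots)
            ".john@example.com",             // false (leading dot)
            "john doe@example.com"           // false (space is not atext)
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isAddrSpec(s));
        }
    }
}
```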
The well-known jQuery Validation plugin writes its email regular expression like this:
// http://docs.jquery.com/Plugins/Validation/Methods/email
email: function(value, element) {
// contributed by Scott Gonzalez: http://projects.scottsplayground.com/email_address_validation/
return this.optional(element) || /^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_
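It is complex indeed, and the allowed characters also include a large range of Unicode characters. A comment in the code points to http://projects.scottsplayground.com/email_address_validation/ , where I found a JavaScript file that validates each part of an email address in detail, following RFC 2822.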
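The Unicode part of the jQuery pattern is easy to miss. The sketch below reuses only the character ranges visible in the excerpt above (U+00A0 to U+D7FF, U+F900 to U+FDCF, U+FDF0 to U+FFEF) to check a dot-atom style local part in Java. It does not reproduce the rest of the jQuery expression, and the class name UnicodeLocalPart is hypothetical.

```java
import java.util.regex.Pattern;

public class UnicodeLocalPart {
    // One local-part character: ASCII letter, digit, atext symbol, or a BMP character
    // in one of the three Unicode ranges taken from the jQuery Validation pattern
    static final String CHAR =
            "[A-Za-z0-9!#$%&'*+/=?^_`{|}~\\-\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF]";
    // dot-atom shape: runs of CHAR separated by single dots
    static final Pattern LOCAL_PART = Pattern.compile(CHAR + "+(\\." + CHAR + "+)*");

    public static boolean isLocalPart(String s) {
        return LOCAL_PART.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLocalPart("jos\u00e9.garc\u00eda")); // accented Latin letters
        System.out.println(isLocalPart("\u5f20\u4e09"));          // CJK characters
        System.out.println(isLocalPart("a..b"));                  // rejected: empty atom
    }
}
```

Characters outside the Basic Multilingual Plane (for example most emoji) fall outside all three ranges and are rejected.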
// 以下代码来自 http://projects.scottsplayground.com/email_address_validation/lib/email.js
/*
 * Email addresses
 * Definitions from (unless otherwise noted):
 * - [RFC 2822] Internet Message Format
 */
// internationalized
// RFC 2822: text = %d1-9 / %d11 / %d12 / %d14-127 (%d14 is \x0e)
var text =
    "(" +
        "[" +
            "\\x01-\\x09" +
            "\\x0b" +
            "\\x0c" +
            "\\x0e-\\x7f" +
        "]" +
        "|" + ucschar +
    ")";
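// NO-WS-CTL = %d1-8 / %d11 / %d12 / %d14-31 / %d127 (non white space controls)
// (variable name assumed; its declaration is cut off in this excerpt)
var NO_WS_CTL =
    "[" +
        "\\x01-\\x08" +
        "\\x0b" +
        "\\x0c" +
        "\\x0e-\\x1f" +
        "\\x7f" +
    "]";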
// section 2.2.2 Header Fields
var SP = "\\x20";
var HTAB = "\\x09";
var WSP =
    "(" +
        SP +
        "|" + HTAB +
    ")";
// section 2.1 General Description
var CRLF = "(\\x0d\\x0a)";
// removed obsolete folding white space (obs-FWS)
// FWS = ([*WSP CRLF] 1*WSP)
var FWS =
    "(" +
        "(" +
            WSP + "*" +
            CRLF +
        ")?" +
        WSP + "+" +
    ")";
var DQUOTE = "(\\x22)";
// internationalized
var qtext =
    "(" +
        NO_WS_CTL +          // NO-WS-CTL (variable name assumed)
        "|\\x21" +
        "|[\\x23-\\x5b]" +
        "|[\\x5d-\\x7e]" +
        "|" + ucschar +
    ")";
// removed obsolete quoted pair (obs-qp)
var quotedPair =
    "(" +
        "\\\\" +
        text +
    ")";
var qcontent =
    "(" +
        qtext +
        "|" + quotedPair +
    ")";
// removed comments and folding white space (CFWS)
var quotedString =
    "(" +
        DQUOTE +
        "(" +
            FWS + "?" +
            qcontent +
        ")*" +
        FWS + "?" +
        DQUOTE +
    ")";
// created from symbols in atext
var atextSymbols = "[!#\\$%&'\\*\\+\\-\\/=\\?\\^_`{\\|}~]";
// internationalized
var atext =
    "(" +
        ALPHA +
        "|" + DIGIT +
        "|" + atextSymbols +
        "|" + ucschar +
    ")";
var dotAtomText =
    "(" +
        atext + "+" +
        "(" +
            "\\." +
            atext + "+" +
        ")*" +
    ")";
// removed comments and folding white space (CFWS)
var dotAtom = dotAtomText;
// removed comments and folding white space (CFWS)
var atom = atext + "+";
// ihostName from iri.http.js (http://projects.scottsplayground.com/iri)
var domain = ihostName;
// removed obsolete local part (obs-local-part)
var localPart =
    "(" +
        dotAtom +
        "|" + quotedString +
    ")";
var addrSpec = localPart + "@" + domain;
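The same composition approach (small named pieces concatenated into larger ones) ports directly to Java. This sketch keeps the structure of email.js but is ASCII only (the ucschar alternatives are dropped), uses the RFC 2822 ranges for text and NO-WS-CTL, and replaces ihostName from iri.js with a simplified hostname pattern. The class name EmailGrammar and the DOMAIN pattern are assumptions, not part of the original script.

```java
import java.util.regex.Pattern;

public class EmailGrammar {
    // Building blocks mirroring email.js, ASCII only
    static final String NO_WS_CTL = "[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x7f]";
    static final String TEXT = "[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f]";
    static final String WSP = "[\\x20\\x09]";
    static final String CRLF = "\\x0d\\x0a";
    static final String FWS = "((" + WSP + "*" + CRLF + ")?" + WSP + "+)";
    static final String DQUOTE = "\\x22";
    static final String QTEXT = "(" + NO_WS_CTL + "|[\\x21\\x23-\\x5b\\x5d-\\x7e])";
    static final String QUOTED_PAIR = "(\\\\" + TEXT + ")";
    static final String QCONTENT = "(" + QTEXT + "|" + QUOTED_PAIR + ")";
    static final String QUOTED_STRING =
            "(" + DQUOTE + "(" + FWS + "?" + QCONTENT + ")*" + FWS + "?" + DQUOTE + ")";
    static final String ATEXT = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]";
    static final String DOT_ATOM = "(" + ATEXT + "+(\\." + ATEXT + "+)*)";
    static final String LOCAL_PART = "(" + DOT_ATOM + "|" + QUOTED_STRING + ")";
    // Hypothetical stand-in for ihostName: dot-separated labels of letters, digits
    // and '-', where a label may not start or end with '-'
    static final String LABEL = "[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?";
    static final String DOMAIN = "(" + LABEL + "(\\." + LABEL + ")*)";
    static final Pattern ADDR_SPEC = Pattern.compile(LOCAL_PART + "@" + DOMAIN);

    public static boolean isValid(String s) {
        return ADDR_SPEC.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("john.doe@example.com"));   // dot-atom local part
        System.out.println(isValid("\"john doe\"@example.com")); // quoted-string local part
        System.out.println(isValid("john doe@example.com"));   // unquoted space: rejected
    }
}
```

Keeping every rule as a named string makes it easy to check each piece against the ABNF, at the cost of a final pattern that is hard to read as a whole.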
Another good explanation can be found at http://www.markussipila.info/pub/emailvalidator.php
// define a regular expression for "normal" addresses
$normal = "/^[a-z0-9_\+-]+(\.[a-z0-9_\+-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.([a-z]{2,4})$/i";
// define a regular expression for "strange looking" but syntactically valid addresses
$validButRare = "/^[a-z0-9,!#\$%&'\*\+\/=\?\^_`\{\|}~-]+(\.[a-z0-9,!#\$%&'\*\+\/=\?\^_`\{\|}~-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.([a-z]{2,})$/i";
// eregi() was removed in PHP 7.0; preg_match() with the "i" modifier is the case-insensitive replacement
if (preg_match($normal, $email)) {
    echo("The address $email is valid and looks normal.");
} else if (preg_match($validButRare, $email)) {
    echo("The address $email looks a bit strange but it is syntactically valid. You might want to check it for typos.");
} else {
    echo("The address $email is not valid.");
}