lan13217
Escape Character Utility Class

    Blog category:
  • Java
 
http://www.javapractices.com/topic/TopicAction.do?Id=96
import java.net.URLEncoder;
import java.io.UnsupportedEncodingException;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.regex.Pattern;
import java.util.regex.Matcher;


/**
 Convenience methods for escaping special characters related to HTML, XML, 
 and regular expressions.
 
 <P>To keep you safe by default, WEB4J goes to some effort to escape 
 characters in your data when appropriate, such that you <em>usually</em>
 don't need to think too much about escaping special characters. Thus, you
  shouldn't need to <em>directly</em> use the services of this class very often. 
 
 <P><span class='highlight'>For Model Objects containing free form user input, 
 it is highly recommended that you use {@link SafeText}, not <tt>String</tt></span>.
 Free form user input is open to malicious use, such as
 <a href='http://www.owasp.org/index.php/Cross_Site_Scripting'>Cross Site Scripting</a>
 attacks. 
 Using <tt>SafeText</tt> will protect you from such attacks, by always escaping 
 special characters automatically in its <tt>toString()</tt> method.   
 
 <P>The following WEB4J classes will automatically escape special characters 
 for you, when needed : 
 <ul>
 <li>the {@link SafeText} class, used as a building block class for your 
 application's Model Objects, for modeling all free form user input
 <li>the {@link Populate} tag used with forms
 <li>the {@link Report} class used for creating quick reports
 <li>the {@link Text}, {@link TextFlow}, and {@link Tooltips} custom tags used 
 for translation
 </ul> 
*/
public final class EscapeChars {

  /**
    Escape characters for text appearing in HTML markup.
    
    <P>This method exists as a defence against Cross Site Scripting (XSS) attacks.
    The idea is to neutralize characters that have special meaning in HTML markup
    or in scripts, so that injected text is displayed by the browser rather than
    executed. This is done by replacing each such character with its escaped equivalent.  
    See {@link hirondelle.web4j.security.SafeText} as well.
    
    <P>The following characters are replaced with corresponding 
    HTML character entities :
    <table border='1' cellpadding='3' cellspacing='0'>
    <tr><th> Character </th><th>Replacement</th></tr>
    <tr><td> < </td><td> &lt; </td></tr>
    <tr><td> > </td><td> &gt; </td></tr>
    <tr><td> & </td><td> &amp; </td></tr>
    <tr><td> " </td><td> &quot;</td></tr>
    <tr><td> \t </td><td> &#009;</td></tr>
    <tr><td> ! </td><td> &#033;</td></tr>
    <tr><td> # </td><td> &#035;</td></tr>
    <tr><td> $ </td><td> &#036;</td></tr>
    <tr><td> % </td><td> &#037;</td></tr>
    <tr><td> ' </td><td> &#039;</td></tr>
    <tr><td> ( </td><td> &#040;</td></tr> 
    <tr><td> ) </td><td> &#041;</td></tr>
    <tr><td> * </td><td> &#042;</td></tr>
    <tr><td> + </td><td> &#043; </td></tr>
    <tr><td> , </td><td> &#044; </td></tr>
    <tr><td> - </td><td> &#045; </td></tr>
    <tr><td> . </td><td> &#046; </td></tr>
    <tr><td> / </td><td> &#047; </td></tr>
    <tr><td> : </td><td> &#058;</td></tr>
    <tr><td> ; </td><td> &#059;</td></tr>
    <tr><td> = </td><td> &#061;</td></tr>
    <tr><td> ? </td><td> &#063;</td></tr>
    <tr><td> @ </td><td> &#064;</td></tr>
    <tr><td> [ </td><td> &#091;</td></tr>
    <tr><td> \ </td><td> &#092;</td></tr>
    <tr><td> ] </td><td> &#093;</td></tr>
    <tr><td> ^ </td><td> &#094;</td></tr>
    <tr><td> _ </td><td> &#095;</td></tr>
    <tr><td> ` </td><td> &#096;</td></tr>
    <tr><td> { </td><td> &#123;</td></tr>
    <tr><td> | </td><td> &#124;</td></tr>
    <tr><td> } </td><td> &#125;</td></tr>
    <tr><td> ~ </td><td> &#126;</td></tr>
    </table>
    
    <P>Note that JSTL's {@code <c:out>} escapes <em>only five</em> of the above 
    characters: {@code <}, {@code >}, {@code &}, {@code "} and {@code '}.
   */
   public static String forHTML(String aText){
     final StringBuilder result = new StringBuilder();
     final StringCharacterIterator iterator = new StringCharacterIterator(aText);
     char character =  iterator.current();
     while (character != CharacterIterator.DONE ){
       if (character == '<') {
         result.append("&lt;");
       }
       else if (character == '>') {
         result.append("&gt;");
       }
       else if (character == '&') {
         result.append("&amp;");
       }
       else if (character == '\"') {
         result.append("&quot;");
       }
       else if (character == '\t') {
         addCharEntity(9, result);
       }
       else if (character == '!') {
         addCharEntity(33, result);
       }
       else if (character == '#') {
         addCharEntity(35, result);
       }
       else if (character == '$') {
         addCharEntity(36, result);
       }
       else if (character == '%') {
         addCharEntity(37, result);
       }
       else if (character == '\'') {
         addCharEntity(39, result);
       }
       else if (character == '(') {
         addCharEntity(40, result);
       }
       else if (character == ')') {
         addCharEntity(41, result);
       }
       else if (character == '*') {
         addCharEntity(42, result);
       }
       else if (character == '+') {
         addCharEntity(43, result);
       }
       else if (character == ',') {
         addCharEntity(44, result);
       }
       else if (character == '-') {
         addCharEntity(45, result);
       }
       else if (character == '.') {
         addCharEntity(46, result);
       }
       else if (character == '/') {
         addCharEntity(47, result);
       }
       else if (character == ':') {
         addCharEntity(58, result);
       }
       else if (character == ';') {
         addCharEntity(59, result);
       }
       else if (character == '=') {
         addCharEntity(61, result);
       }
       else if (character == '?') {
         addCharEntity(63, result);
       }
       else if (character == '@') {
         addCharEntity(64, result);
       }
       else if (character == '[') {
         addCharEntity(91, result);
       }
       else if (character == '\\') {
         addCharEntity(92, result);
       }
       else if (character == ']') {
         addCharEntity(93, result);
       }
       else if (character == '^') {
         addCharEntity(94, result);
       }
       else if (character == '_') {
         addCharEntity(95, result);
       }
       else if (character == '`') {
         addCharEntity(96, result);
       }
       else if (character == '{') {
         addCharEntity(123, result);
       }
       else if (character == '|') {
         addCharEntity(124, result);
       }
       else if (character == '}') {
         addCharEntity(125, result);
       }
       else if (character == '~') {
         addCharEntity(126, result);
       }
       else {
         //the char is not a special one
         //add it to the result as is
         result.append(character);
       }
       character = iterator.next();
     }
     return result.toString();
  }
  

  /**
   Escape all ampersand characters in a URL. 
    
   <P>Replaces all <tt>'&'</tt> characters with <tt>'&amp;'</tt>.
   
   <P>An ampersand character may appear in the query string of a URL, 
   where it separates parameters, and it is a valid URL character.
   <em>However, URLs usually appear as an <tt>HREF</tt> attribute, and 
   such attributes have the additional constraint that ampersands 
   must be escaped.</em>
   
   <P>The JSTL <c:url> tag does perform proper URL encoding of 
   query parameters. In general, however, it does not produce text which 
   is valid as an <tt>HREF</tt> attribute, simply because it does 
   not escape the ampersand character. This is a nuisance whenever 
   a URL has more than one query parameter, since the ampersands 
   separating them then need this extra escaping step.
   
   <P>Every ampersand is replaced, so text which already contains 
   <tt>'&amp;'</tt> will be escaped a second time.
  */
  public static String forHrefAmpersand(String aURL){
    return aURL.replace("&", "&amp;");
  }
   
  /**
    Synonym for <tt>URLEncoder.encode(String, "UTF-8")</tt>.
   
    <P>Used to ensure that HTTP query strings are in proper form, by escaping
    special characters such as spaces. The output uses the 
    <tt>application/x-www-form-urlencoded</tt> format, in which a space becomes 
    <tt>'+'</tt>, so it is meant for query parameter names and values rather 
    than for path segments.
   
    <P>Note that if a query string appears in an <tt>HREF</tt> attribute, there 
    are two separate concerns: the query string must be valid HTTP (it is 
    URL-encoded), and the attribute must be valid HTML (its ampersands are escaped).
   */
   public static String forURL(String aURLFragment){
     String result = null;
     try {
       result = URLEncoder.encode(aURLFragment, "UTF-8");
     }
     catch (UnsupportedEncodingException ex){
       throw new RuntimeException("UTF-8 not supported", ex);
     }
     return result;
   }

  /**
   Escape characters for text appearing as XML data, between tags.
   
   <P>The following characters are replaced with corresponding character entities :
   <table border='1' cellpadding='3' cellspacing='0'>
   <tr><th> Character </th><th> Encoding </th></tr>
   <tr><td> < </td><td> &lt; </td></tr>
   <tr><td> > </td><td> &gt; </td></tr>
   <tr><td> & </td><td> &amp; </td></tr>
   <tr><td> " </td><td> &quot;</td></tr>
   <tr><td> ' </td><td> &#039;</td></tr>
   </table>
   
   <P>Note that JSTL's {@code <c:out>} escapes the exact same set of 
   characters as this method. <span class='highlight'>That is, {@code <c:out>}
    is good for escaping to produce valid XML, but not for producing safe 
    HTML.</span>
  */
  public static String forXML(String aText){
    final StringBuilder result = new StringBuilder();
    final StringCharacterIterator iterator = new StringCharacterIterator(aText);
    char character =  iterator.current();
    while (character != CharacterIterator.DONE ){
      if (character == '<') {
        result.append("&lt;");
      }
      else if (character == '>') {
        result.append("&gt;");
      }
      else if (character == '\"') {
        result.append("&quot;");
      }
      else if (character == '\'') {
        result.append("&#039;");
      }
      else if (character == '&') {
         result.append("&amp;");
      }
      else {
        //the char is not a special one
        //add it to the result as is
        result.append(character);
      }
      character = iterator.next();
    }
    return result.toString();
  }
  
  /**
   Escapes characters for text appearing as data in the 
   <a href='http://www.json.org/'>Javascript Object Notation</a>
   (JSON) data interchange format.
   
   <P>The following characters, including the most commonly used control characters, are escaped :
   <table border='1' cellpadding='3' cellspacing='0'>
   <tr><th> Character </th><th> Escaped As </th></tr>
   <tr><td> " </td><td> \" </td></tr>
   <tr><td> \ </td><td> \\ </td></tr>
   <tr><td> / </td><td> \/ </td></tr>
   <tr><td> back space </td><td> \b </td></tr> 
   <tr><td> form feed </td><td> \f </td></tr>
   <tr><td> line feed </td><td> \n </td></tr>
   <tr><td> carriage return </td><td> \r </td></tr>
   <tr><td> tab </td><td> \t </td></tr>
   </table>
   
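   <P>Other control characters (U+0000 through U+001F) are passed through unchanged, 
   although <a href='http://www.ietf.org/rfc/rfc4627.txt'>RFC 4627</a> requires all 
   of them to be escaped inside a string. See the RFC for more information.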
  */
  public static String forJSON(String aText){
    final StringBuilder result = new StringBuilder();
    StringCharacterIterator iterator = new StringCharacterIterator(aText);
    char character = iterator.current();
    while (character != StringCharacterIterator.DONE){
      if( character == '\"' ){
        result.append("\\\"");
      }
      else if(character == '\\'){
        result.append("\\\\");
      }
      else if(character == '/'){
        result.append("\\/");
      }
      else if(character == '\b'){
        result.append("\\b");
      }
      else if(character == '\f'){
        result.append("\\f");
      }
      else if(character == '\n'){
        result.append("\\n");
      }
      else if(character == '\r'){
        result.append("\\r");
      }
      else if(character == '\t'){
        result.append("\\t");
      }
      else {
        //the char is not a special one
        //add it to the result as is
        result.append(character);
      }
      character = iterator.next();
    }
    return result.toString();    
  }

  /**
   Return <tt>aText</tt> with all <tt>'<'</tt> and <tt>'>'</tt> characters
   replaced by their escaped equivalents.
  */
  public static String toDisableTags(String aText){
    final StringBuilder result = new StringBuilder();
    final StringCharacterIterator iterator = new StringCharacterIterator(aText);
    char character =  iterator.current();
    while (character != CharacterIterator.DONE ){
      if (character == '<') {
        result.append("&lt;");
      }
      else if (character == '>') {
        result.append("&gt;");
      }
      else {
        //the char is not a special one
        //add it to the result as is
        result.append(character);
      }
      character = iterator.next();
    }
    return result.toString();
  }
  

  /**
   Escape characters that have special meaning in regular expressions,
   by preceding each of them with a '\' character.
  
   <P>The escaped characters include :
  <ul>
  <li>.
  <li>\
  <li>?, * , and +
  <li>&
  <li>:
  <li>{ and }
  <li>[ and ]
  <li>( and )
  <li>^ and $
  </ul>
  */
  public static String forRegex(String aRegexFragment){
    final StringBuilder result = new StringBuilder();

    final StringCharacterIterator iterator = 
      new StringCharacterIterator(aRegexFragment)
    ;
    char character =  iterator.current();
    while (character != CharacterIterator.DONE ){
      /*
       In Java string literals a backslash must itself be escaped, so
       "\\." below appends the two characters '\' and '.'.
      */
      if (character == '.') {
        result.append("\\.");
      }
      else if (character == '\\') {
        result.append("\\\\");
      }
      else if (character == '?') {
        result.append("\\?");
      }
      else if (character == '*') {
        result.append("\\*");
      }
      else if (character == '+') {
        result.append("\\+");
      }
      else if (character == '&') {
        result.append("\\&");
      }
      else if (character == ':') {
        result.append("\\:");
      }
      else if (character == '{') {
        result.append("\\{");
      }
      else if (character == '}') {
        result.append("\\}");
      }
      else if (character == '[') {
        result.append("\\[");
      }
      else if (character == ']') {
        result.append("\\]");
      }
      else if (character == '(') {
        result.append("\\(");
      }
      else if (character == ')') {
        result.append("\\)");
      }
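      else if (character == '^') {
        result.append("\\^");
      }
      else if (character == '$') {
        result.append("\\$");
      }
      else if (character == '|') {
        //'|' separates alternatives, so it must be escaped as well
        result.append("\\|");
      }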
      else {
        //the char is not a special one
        //add it to the result as is
        result.append(character);
      }
      character = iterator.next();
    }
    return result.toString();
  }
  
  /**
   Escape <tt>'$'</tt> and <tt>'\'</tt> characters in replacement strings.
   
   <P>Synonym for <tt>Matcher.quoteReplacement(String)</tt>.
   
   <P>The following methods use replacement strings which treat 
   <tt>'$'</tt> and <tt>'\'</tt> as special characters:
   <ul>
   <li><tt>String.replaceAll(String, String)</tt>
   <li><tt>String.replaceFirst(String, String)</tt>
   <li><tt>Matcher.appendReplacement(StringBuffer, String)</tt>
   </ul>
   
   <P>If replacement text can contain arbitrary characters, then you 
   will usually need to escape that text, to ensure special characters 
   are interpreted literally.
  */
  public static String forReplacementString(String aInput){
    return Matcher.quoteReplacement(aInput);
  }
  
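  /**
   Disable all <tt><SCRIPT></tt> and <tt></SCRIPT></tt> tags in <tt>aText</tt>.
   
   <P>Insensitive to case. Only these exact forms are matched: an opening tag 
   with attributes or extra whitespace, such as <tt><script src='x.js'></tt>, 
   is left unchanged. This is therefore not a complete defence against 
   Cross Site Scripting; {@link #forHTML(String)} is the safer choice for 
   untrusted text.
  */  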
  public static String forScriptTagsOnly(String aText){
    String result = null;
    Matcher matcher = SCRIPT.matcher(aText);
    result = matcher.replaceAll("&lt;SCRIPT>");
    matcher = SCRIPT_END.matcher(result);
    result = matcher.replaceAll("&lt;/SCRIPT>");
    return result;
  }
  
  // PRIVATE //
  
  private EscapeChars(){
    //empty - prevent construction
  }
  
  private static final Pattern SCRIPT = Pattern.compile(
    "<SCRIPT>", Pattern.CASE_INSENSITIVE
   );
  private static final Pattern SCRIPT_END = Pattern.compile(
    "</SCRIPT>", Pattern.CASE_INSENSITIVE
  );
  
  private static void addCharEntity(int aIdx, StringBuilder aBuilder){
    //pad to three digits, so that 9 is appended as "&#009;"
    String padding = "";
    if( aIdx <= 9 ){
      padding = "00";
    }
    else if( aIdx <= 99 ){
      padding = "0";
    }
    aBuilder.append("&#").append(padding).append(aIdx).append(';');
  }
}
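The long if/else chain in forHTML can also be written in a table-driven style. The sketch below is a hypothetical rewrite, not part of WEB4J. The class name HtmlEscapeSketch is made up, and the sketch assumes you want exactly the 33 characters listed in the forHTML table, with the same zero-padded numeric entities that addCharEntity produces.

```java
/**
 * Hypothetical table-driven variant of EscapeChars.forHTML (not part of WEB4J).
 * It escapes the same characters, but keeps the escaping rules as data.
 */
final class HtmlEscapeSketch {

  // Characters written as named entities, paired by index with NAMED_ENTITIES.
  private static final String NAMED = "<>&\"";
  private static final String[] NAMED_ENTITIES = {"&lt;", "&gt;", "&amp;", "&quot;"};

  // Characters written as zero-padded decimal entities, e.g. '!' becomes "&#033;".
  private static final String NUMERIC = "\t!#$%'()*+,-./:;=?@[\\]^_`{|}~";

  private HtmlEscapeSketch() {}

  static String escape(String text) {
    StringBuilder result = new StringBuilder(text.length() + 16);
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      int named = NAMED.indexOf(c);
      if (named >= 0) {
        result.append(NAMED_ENTITIES[named]);
      } else if (NUMERIC.indexOf(c) >= 0) {
        // Pad to three digits, matching the output of addCharEntity.
        result.append("&#");
        if (c < 10) {
          result.append("00");
        } else if (c < 100) {
          result.append('0');
        }
        result.append((int) c).append(';');
      } else {
        // Not a special character: copy it unchanged.
        result.append(c);
      }
    }
    return result.toString();
  }
}
```

For example, escape("<a href='x'>") returns "&lt;a href&#061;&#039;x&#039;&gt;". Keeping the rules in two strings makes the escaped set easy to check against the table. The cost is an indexOf scan for each character.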

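The comments on forURL and forHrefAmpersand describe two separate steps for a link inside an HTML page. Below is a minimal sketch of combining them. It assumes the parameters arrive as a Map of names to values, and HrefSketch and hrefFor are hypothetical names. The sketch calls URLEncoder and String.replace directly, rather than EscapeChars, so that it compiles on its own.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

/**
 * Hypothetical helper (not part of WEB4J) that builds an HREF value in two steps:
 * URL-encode each query parameter name and value, as forURL does, then escape
 * the '&' separators for HTML, as forHrefAmpersand does.
 */
final class HrefSketch {

  private HrefSketch() {}

  static String encode(String fragment) {
    try {
      return URLEncoder.encode(fragment, "UTF-8");
    } catch (UnsupportedEncodingException ex) {
      // Every Java platform must support UTF-8, so this is not expected to happen.
      throw new IllegalStateException("UTF-8 not supported", ex);
    }
  }

  /** Assumes path is already safe to use as-is; only the parameters are encoded. */
  static String hrefFor(String path, Map<String, String> params) {
    StringBuilder url = new StringBuilder(path);
    char separator = '?';
    for (Map.Entry<String, String> param : params.entrySet()) {
      // Step 1: URL-encode, so '&' inside a value becomes %26 and a space becomes '+'.
      url.append(separator)
         .append(encode(param.getKey()))
         .append('=')
         .append(encode(param.getValue()));
      separator = '&';
    }
    // Step 2: escape the remaining '&' separators for use inside an HTML attribute.
    return url.toString().replace("&", "&amp;");
  }
}
```

Take the path /search with the parameters q = "fish & chips" and page = 2, inserted in that order into a LinkedHashMap. For that input, hrefFor returns /search?q=fish+%26+chips&amp;page=2. The ampersand inside the value has already become %26, so only the separator is turned into &amp;.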

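forJSON handles only the characters in its table, so other control characters such as U+0001 pass through unchanged. RFC 4627 (section 2.5) requires every character from U+0000 to U+001F to be escaped inside a JSON string. Below is a sketch of a variant that covers them, using a switch instead of the if/else chain. JsonEscapeSketch is a hypothetical name, and the lowercase hex digits are an arbitrary choice that JSON parsers accept.

```java
/**
 * Hypothetical variant of EscapeChars.forJSON (not part of WEB4J) that also
 * escapes the remaining control characters U+0000 through U+001F.
 */
final class JsonEscapeSketch {

  private JsonEscapeSketch() {}

  static String escape(String text) {
    StringBuilder result = new StringBuilder(text.length() + 16);
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      switch (c) {
        case '"':  result.append("\\\""); break;
        case '\\': result.append("\\\\"); break;
        case '/':  result.append("\\/");  break; // optional in JSON, kept to match forJSON
        case '\b': result.append("\\b");  break;
        case '\f': result.append("\\f");  break;
        case '\n': result.append("\\n");  break;
        case '\r': result.append("\\r");  break;
        case '\t': result.append("\\t");  break;
        default:
          if (c < 0x20) {
            // Other control characters: a backslash, 'u', then four hex digits.
            result.append(String.format("\\u%04x", (int) c));
          } else {
            result.append(c);
          }
      }
    }
    return result.toString();
  }
}
```

The eight named escapes produce the same output as forJSON. Only the previously unhandled control characters gain the four-digit form.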

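The character '|' also has special meaning in regular expressions, since it separates alternatives, but it does not appear in the list in the forRegex comment. Below is a minimal sketch of the same per-character approach with '|' in the escaped set. RegexEscapeSketch is a hypothetical name. The standard library also offers Pattern.quote(String), which quotes the whole fragment at once instead of escaping it character by character.

```java
/**
 * Hypothetical sketch of forRegex-style escaping (not part of WEB4J),
 * with '|' added to the set of escaped characters.
 */
final class RegexEscapeSketch {

  // Characters to escape; '|' is included because it separates alternatives.
  private static final String SPECIAL = ".\\?*+&:{}[]()^$|";

  private RegexEscapeSketch() {}

  static String escape(String fragment) {
    StringBuilder result = new StringBuilder(fragment.length() + 8);
    for (int i = 0; i < fragment.length(); i++) {
      char c = fragment.charAt(i);
      if (SPECIAL.indexOf(c) >= 0) {
        // java.util.regex allows a backslash before any non-alphabetic
        // character, and the pair then matches that character literally.
        result.append('\\');
      }
      result.append(c);
    }
    return result.toString();
  }
}
```

For example, escape("1+1=2?") returns 1\+1=2\?, which matches only the literal text 1+1=2?. Similarly, escape("a|b") no longer matches a lone "a".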

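forReplacementString is a thin wrapper around Matcher.quoteReplacement. The small sketch below shows why it matters, assuming a template with a hypothetical PRICE placeholder. In a replacement string, "$1" refers to capturing group 1. Passing "$1.50" straight to String.replaceAll therefore fails, because the pattern has no groups. ReplacementSketch and both of its method names are hypothetical.

```java
import java.util.regex.Matcher;

/**
 * Hypothetical example (not part of WEB4J) of escaping replacement text.
 */
final class ReplacementSketch {

  private ReplacementSketch() {}

  // Safe: quoteReplacement makes '$' and '\' literal, as forReplacementString does.
  static String fillPrice(String template, String price) {
    return template.replaceAll("PRICE", Matcher.quoteReplacement(price));
  }

  // Deliberately unsafe: "$1.50" is read as a reference to group 1, which the
  // pattern "PRICE" does not have, so replaceAll throws IndexOutOfBoundsException.
  static String fillPriceUnsafe(String template, String price) {
    return template.replaceAll("PRICE", price);
  }
}
```

With the quoted version, fillPrice("Total: PRICE", "$1.50") yields "Total: $1.50". The same applies to backslashes, as in a Windows path such as C:\temp.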

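forScriptTagsOnly matches only the exact strings <SCRIPT> and </SCRIPT>, in any case. An opening tag with attributes, such as <script src='x.js'>, therefore passes through untouched. The sketch below puts the two original patterns next to a looser, hypothetical pattern for comparison. Treat it as an illustration of the gap rather than a filter to rely on: tag blacklists are easy to bypass, and escaping all markup with forHTML remains the safer default. ScriptTagSketch, exactOnly and looser are hypothetical names.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical comparison (not part of WEB4J) of exact and looser script-tag matching.
 */
final class ScriptTagSketch {

  // Exact-match patterns, as in EscapeChars.
  private static final Pattern SCRIPT = Pattern.compile("<SCRIPT>", Pattern.CASE_INSENSITIVE);
  private static final Pattern SCRIPT_END = Pattern.compile("</SCRIPT>", Pattern.CASE_INSENSITIVE);

  // Looser pattern: '<' or "</", then the whole word "script", in any case.
  private static final Pattern ANY_SCRIPT_TAG =
      Pattern.compile("<(/?)(script\\b)", Pattern.CASE_INSENSITIVE);

  private ScriptTagSketch() {}

  // Same behaviour as forScriptTagsOnly.
  static String exactOnly(String text) {
    String result = SCRIPT.matcher(text).replaceAll("&lt;SCRIPT>");
    return SCRIPT_END.matcher(result).replaceAll("&lt;/SCRIPT>");
  }

  // Escapes only the leading '<', keeping the original slash and tag name.
  static String looser(String text) {
    return ANY_SCRIPT_TAG.matcher(text).replaceAll("&lt;$1$2");
  }
}
```

On the input <script src='x.js'></script>, exactOnly escapes only the closing tag. looser escapes both the opening and the closing tag.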