Escape special characters

For JSON, using regex find and replace in Notepad++. The characters to escape are:

   Character          Escaped as
   "                  \"
   \                  \\
   /                  \/
   back space         \b
   form feed          \f
   line feed          \n
   carriage return    \r
   tab                \t

find:     ([\b\t\n\f\r\"\'\\]{1})
replace:   \\\1
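
The find/replace pair above simply puts a backslash in front of whatever was matched. That suits quotes and backslashes, but for a control character such as a tab it appears to leave the raw character after the backslash rather than writing the letter form from the table. A minimal Java sketch of the full mapping follows. The class name `JsonEscapeSketch` and its `escape` method are illustrative assumptions, and the `\uXXXX` fallback for other control characters is an addition not found in the table.

```java
public final class JsonEscapeSketch {

  /** Escapes aText for use inside a JSON string literal (hypothetical helper). */
  public static String escape(String aText) {
    StringBuilder result = new StringBuilder(aText.length() + 16);
    for (int i = 0; i < aText.length(); i++) {
      char c = aText.charAt(i);
      switch (c) {
        case '"':  result.append("\\\""); break;
        case '\\': result.append("\\\\"); break;
        case '/':  result.append("\\/");  break;
        case '\b': result.append("\\b");  break;
        case '\f': result.append("\\f");  break;
        case '\n': result.append("\\n");  break;
        case '\r': result.append("\\r");  break;
        case '\t': result.append("\\t");  break;
        default:
          if (c < 0x20) {
            // Assumption: any other control character is written as \u00XX.
            result.append(String.format("\\u%04x", (int) c));
          }
          else {
            result.append(c);
          }
      }
    }
    return result.toString();
  }

  public static void main(String[] args) {
    // prints: say \"hi\"\tnow
    System.out.println(escape("say \"hi\"\tnow"));
  }
}
```

A `switch` over `charAt` is used here instead of a regex so that each character can map to a different replacement, which a single back-reference cannot do.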


Source: http://www.javapractices.com/topic/TopicAction.do?Id=96
import java.net.URLEncoder;
import java.io.UnsupportedEncodingException;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

import hirondelle.web4j.security.SafeText;
import hirondelle.web4j.ui.translate.Text;
import hirondelle.web4j.ui.translate.Tooltips;
import hirondelle.web4j.ui.translate.TextFlow;
import hirondelle.web4j.ui.tag.Populate;
import hirondelle.web4j.database.Report;

/**
 Convenience methods for escaping special characters related to HTML, XML, 
 and regular expressions.
 
 <P>To keep you safe by default, WEB4J goes to some effort to escape 
 characters in your data when appropriate, such that you <em>usually</em>
 don't need to think too much about escaping special characters. Thus, you
  shouldn't need to <em>directly</em> use the services of this class very often. 
 
 <P><span class='highlight'>For Model Objects containing free form user input, 
 it is highly recommended that you use {@link SafeText}, not <tt>String</tt></span>.
 Free form user input is open to malicious use, such as
 <a href='http://www.owasp.org/index.php/Cross_Site_Scripting'>Cross Site Scripting</a>
 attacks. 
 Using <tt>SafeText</tt> will protect you from such attacks, by always escaping 
 special characters automatically in its <tt>toString()</tt> method.   
 
 <P>The following WEB4J classes will automatically escape special characters 
 for you, when needed : 
 <ul>
 <li>the {@link SafeText} class, used as a building block class for your 
 application's Model Objects, for modeling all free form user input
 <li>the {@link Populate} tag used with forms
 <li>the {@link Report} class used for creating quick reports
 <li>the {@link Text}, {@link TextFlow}, and {@link Tooltips} custom tags used 
 for translation
 </ul> 
*/
public final class EscapeChars {

  /**
    Escape characters for text appearing in HTML markup.
    
    <P>This method exists as a defence against Cross Site Scripting (XSS) hacks.
    The idea is to neutralize control characters commonly used by scripts, such that
    they will not be executed by the browser. This is done by replacing the control
    characters with their escaped equivalents.  
    See {@link hirondelle.web4j.security.SafeText} as well.
    
    <P>The following characters are replaced with corresponding 
    HTML character entities :
    <table border='1' cellpadding='3' cellspacing='0'>
    <tr><th> Character </th><th>Replacement</th></tr>
    <tr><td> < </td><td> &lt; </td></tr>
    <tr><td> > </td><td> &gt; </td></tr>
    <tr><td> & </td><td> &amp; </td></tr>
    <tr><td> " </td><td> &quot;</td></tr>
    <tr><td> \t </td><td> &#009;</td></tr>
    <tr><td> ! </td><td> &#033;</td></tr>
    <tr><td> # </td><td> &#035;</td></tr>
    <tr><td> $ </td><td> &#036;</td></tr>
    <tr><td> % </td><td> &#037;</td></tr>
    <tr><td> ' </td><td> &#039;</td></tr>
    <tr><td> ( </td><td> &#040;</td></tr> 
    <tr><td> ) </td><td> &#041;</td></tr>
    <tr><td> * </td><td> &#042;</td></tr>
    <tr><td> + </td><td> &#043; </td></tr>
    <tr><td> , </td><td> &#044; </td></tr>
    <tr><td> - </td><td> &#045; </td></tr>
    <tr><td> . </td><td> &#046; </td></tr>
    <tr><td> / </td><td> &#047; </td></tr>
    <tr><td> : </td><td> &#058;</td></tr>
    <tr><td> ; </td><td> &#059;</td></tr>
    <tr><td> = </td><td> &#061;</td></tr>
    <tr><td> ? </td><td> &#063;</td></tr>
    <tr><td> @ </td><td> &#064;</td></tr>
    <tr><td> [ </td><td> &#091;</td></tr>
    <tr><td> \ </td><td> &#092;</td></tr>
    <tr><td> ] </td><td> &#093;</td></tr>
    <tr><td> ^ </td><td> &#094;</td></tr>
    <tr><td> _ </td><td> &#095;</td></tr>
    <tr><td> ` </td><td> &#096;</td></tr>
    <tr><td> { </td><td> &#123;</td></tr>
    <tr><td> | </td><td> &#124;</td></tr>
    <tr><td> } </td><td> &#125;</td></tr>
    <tr><td> ~ </td><td> &#126;</td></tr>
    </table>
    
    <P>Note that JSTL's {@code <c:out>} escapes <em>only five</em> of the 
    above characters: {@code <}, {@code >}, {@code &}, {@code "}, and {@code '}.
   */
   public static String forHTML(String aText){
     final StringBuilder result = new StringBuilder();
     final StringCharacterIterator iterator = new StringCharacterIterator(aText);
     char character =  iterator.current();
     while (character != CharacterIterator.DONE ){
       if (character == '<') {
         result.append("&lt;");
       }
       else if (character == '>') {
         result.append("&gt;");
       }
       else if (character == '&') {
         result.append("&amp;");
       }
       else if (character == '\"') {
         result.append("&quot;");
       }
       else if (character == '\t') {
         addCharEntity(9, result);
       }
       else if (character == '!') {
         addCharEntity(33, result);
       }
       else if (character == '#') {
         addCharEntity(35, result);
       }
       else if (character == '$') {
         addCharEntity(36, result);
       }
       else if (character == '%') {
         addCharEntity(37, result);
       }
       else if (character == '\'') {
         addCharEntity(39, result);
       }
       else if (character == '(') {
         addCharEntity(40, result);
       }
       else if (character == ')') {
         addCharEntity(41, result);
       }
       else if (character == '*') {
         addCharEntity(42, result);
       }
       else if (character == '+') {
         addCharEntity(43, result);
       }
       else if (character == ',') {
         addCharEntity(44, result);
       }
       else if (character == '-') {
         addCharEntity(45, result);
       }
       else if (character == '.') {
         addCharEntity(46, result);
       }
       else if (character == '/') {
         addCharEntity(47, result);
       }
       else if (character == ':') {
         addCharEntity(58, result);
       }
       else if (character == ';') {
         addCharEntity(59, result);
       }
       else if (character == '=') {
         addCharEntity(61, result);
       }
       else if (character == '?') {
         addCharEntity(63, result);
       }
       else if (character == '@') {
         addCharEntity(64, result);
       }
       else if (character == '[') {
         addCharEntity(91, result);
       }
       else if (character == '\\') {
         addCharEntity(92, result);
       }
       else if (character == ']') {
         addCharEntity(93, result);
       }
       else if (character == '^') {
         addCharEntity(94, result);
       }
       else if (character == '_') {
         addCharEntity(95, result);
       }
       else if (character == '`') {
         addCharEntity(96, result);
       }
       else if (character == '{') {
         addCharEntity(123, result);
       }
       else if (character == '|') {
         addCharEntity(124, result);
       }
       else if (character == '}') {
         addCharEntity(125, result);
       }
       else if (character == '~') {
         addCharEntity(126, result);
       }
       else {
         //the char is not a special one
         //add it to the result as is
         result.append(character);
       }
       character = iterator.next();
     }
     return result.toString();
  }
  

  /**
   Escape all ampersand characters in a URL. 
    
   <P>Replaces all <tt>'&'</tt> characters with <tt>'&amp;'</tt>.
   
  <P>An ampersand character may appear in the query string of a URL.
   The ampersand character is indeed valid in a URL.
   <em>However, URLs usually appear as an <tt>HREF</tt> attribute, and 
   such attributes have the additional constraint that ampersands 
   must be escaped.</em>
   
   <P>The JSTL {@code <c:url>} tag does indeed perform proper URL encoding of 
   query parameters. But it does not, in general, produce text which 
   is valid as an <tt>HREF</tt> attribute, simply because it does 
   not escape the ampersand character. This is a nuisance when 
   multiple query parameters appear in the URL, since it requires a little 
   extra work.
  */
  public static String forHrefAmpersand(String aURL){
    return aURL.replace("&", "&amp;");
  }
   
  /**
    Synonym for <tt>URLEncoder.encode(String, "UTF-8")</tt>.
   
    <P>Used to ensure that HTTP query strings are in proper form, by escaping
    special characters such as spaces.
   
    <P>It is important to note that if a query string appears in an <tt>HREF</tt>
    attribute, then there are two issues - ensuring the query string is valid HTTP
    (it is URL-encoded), and ensuring it is valid HTML (ensuring the 
    ampersand is escaped).
   */
   public static String forURL(String aURLFragment){
     String result = null;
     try {
       result = URLEncoder.encode(aURLFragment, "UTF-8");
     }
     catch (UnsupportedEncodingException ex){
       throw new RuntimeException("UTF-8 not supported", ex);
     }
     return result;
   }

  /**
   Escape characters for text appearing as XML data, between tags.
   
   <P>The following characters are replaced with corresponding character entities :
   <table border='1' cellpadding='3' cellspacing='0'>
   <tr><th> Character </th><th> Encoding </th></tr>
   <tr><td> < </td><td> &lt; </td></tr>
   <tr><td> > </td><td> &gt; </td></tr>
   <tr><td> & </td><td> &amp; </td></tr>
   <tr><td> " </td><td> &quot;</td></tr>
   <tr><td> ' </td><td> &#039;</td></tr>
   </table>
   
   <P>Note that JSTL's {@code <c:out>} escapes the exact same set of 
   characters as this method. <span class='highlight'>That is, {@code <c:out>}
    is good for escaping to produce valid XML, but not for producing safe 
    HTML.</span>
  */
  public static String forXML(String aText){
    final StringBuilder result = new StringBuilder();
    final StringCharacterIterator iterator = new StringCharacterIterator(aText);
    char character =  iterator.current();
    while (character != CharacterIterator.DONE ){
      if (character == '<') {
        result.append("&lt;");
      }
      else if (character == '>') {
        result.append("&gt;");
      }
      else if (character == '\"') {
        result.append("&quot;");
      }
      else if (character == '\'') {
        result.append("&#039;");
      }
      else if (character == '&') {
        result.append("&amp;");
      }
      else {
        //the char is not a special one
        //add it to the result as is
        result.append(character);
      }
      character = iterator.next();
    }
    return result.toString();
  }
  
  /**
   Escapes characters for text appearing as data in the 
   <a href='http://www.json.org/'>JavaScript Object Notation</a>
   (JSON) data interchange format.
   
   <P>The following characters are escaped :
   <table border='1' cellpadding='3' cellspacing='0'>
   <tr><th> Character </th><th> Escaped As </th></tr>
   <tr><td> " </td><td> \" </td></tr>
   <tr><td> \ </td><td> \\ </td></tr>
   <tr><td> / </td><td> \/ </td></tr>
   <tr><td> back space </td><td> \b </td></tr> 
   <tr><td> form feed </td><td> \f </td></tr>
   <tr><td> line feed </td><td> \n </td></tr>
   <tr><td> carriage return </td><td> \r </td></tr>
   <tr><td> tab </td><td> \t </td></tr>
   </table>
   
   <P>See <a href='http://www.ietf.org/rfc/rfc4627.txt'>RFC 4627</a> for more information.
  */
  public static String forJSON(String aText){
    final StringBuilder result = new StringBuilder();
    StringCharacterIterator iterator = new StringCharacterIterator(aText);
    char character = iterator.current();
    while (character != StringCharacterIterator.DONE){
      if( character == '\"' ){
        result.append("\\\"");
      }
      else if(character == '\\'){
        result.append("\\\\");
      }
      else if(character == '/'){
        result.append("\\/");
      }
      else if(character == '\b'){
        result.append("\\b");
      }
      else if(character == '\f'){
        result.append("\\f");
      }
      else if(character == '\n'){
        result.append("\\n");
      }
      else if(character == '\r'){
        result.append("\\r");
      }
      else if(character == '\t'){
        result.append("\\t");
      }
      else {
        //the char is not a special one
        //add it to the result as is
        result.append(character);
      }
      character = iterator.next();
    }
    return result.toString();    
  }

  /**
   Return <tt>aText</tt> with all <tt>'<'</tt> and <tt>'>'</tt> characters
   replaced by their escaped equivalents.
  */
  public static String toDisableTags(String aText){
    final StringBuilder result = new StringBuilder();
    final StringCharacterIterator iterator = new StringCharacterIterator(aText);
    char character =  iterator.current();
    while (character != CharacterIterator.DONE ){
      if (character == '<') {
        result.append("&lt;");
      }
      else if (character == '>') {
        result.append("&gt;");
      }
      else {
        //the char is not a special one
        //add it to the result as is
        result.append(character);
      }
      character = iterator.next();
    }
    return result.toString();
  }
  

  /**
   Replace characters having special meaning in regular expressions
   with their escaped equivalents, preceded by a '\' character.
  
   <P>The escaped characters include :
  <ul>
  <li>.
  <li>\
  <li>?, * , and +
  <li>&
  <li>:
  <li>{ and }
  <li>[ and ]
  <li>( and )
  <li>^ and $
  </ul>
  */
  public static String forRegex(String aRegexFragment){
    final StringBuilder result = new StringBuilder();

    final StringCharacterIterator iterator = 
      new StringCharacterIterator(aRegexFragment)
    ;
    char character =  iterator.current();
    while (character != CharacterIterator.DONE ){
      /*
       In a Java string literal, each backslash must itself be doubled, 
       so "\\." denotes the two characters \ and . in the result.
      */
      if (character == '.') {
        result.append("\\.");
      }
      else if (character == '\\') {
        result.append("\\\\");
      }
      else if (character == '?') {
        result.append("\\?");
      }
      else if (character == '*') {
        result.append("\\*");
      }
      else if (character == '+') {
        result.append("\\+");
      }
      else if (character == '&') {
        result.append("\\&");
      }
      else if (character == ':') {
        result.append("\\:");
      }
      else if (character == '{') {
        result.append("\\{");
      }
      else if (character == '}') {
        result.append("\\}");
      }
      else if (character == '[') {
        result.append("\\[");
      }
      else if (character == ']') {
        result.append("\\]");
      }
      else if (character == '(') {
        result.append("\\(");
      }
      else if (character == ')') {
        result.append("\\)");
      }
      else if (character == '^') {
        result.append("\\^");
      }
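      else if (character == '$') {
        result.append("\\$");
      }
      else if (character == '|') {
        //alternation is also a metacharacter, and needs escaping
        result.append("\\|");
      }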
      else {
        //the char is not a special one
        //add it to the result as is
        result.append(character);
      }
      character = iterator.next();
    }
    return result.toString();
  }
  
  /**
   Escape <tt>'$'</tt> and <tt>'\'</tt> characters in replacement strings.
   
   <P>Synonym for <tt>Matcher.quoteReplacement(String)</tt>.
   
   <P>The following methods use replacement strings which treat 
   <tt>'$'</tt> and <tt>'\'</tt> as special characters:
   <ul>
   <li><tt>String.replaceAll(String, String)</tt>
   <li><tt>String.replaceFirst(String, String)</tt>
   <li><tt>Matcher.appendReplacement(StringBuffer, String)</tt>
   </ul>
   
   <P>If replacement text can contain arbitrary characters, then you 
   will usually need to escape that text, to ensure special characters 
   are interpreted literally.
  */
  public static String forReplacementString(String aInput){
    return Matcher.quoteReplacement(aInput);
  }
  
  /**
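   Disable all <tt>&lt;SCRIPT&gt;</tt> tags in <tt>aText</tt>.
   
   <P>Insensitive to case. Only the bare tags <tt>&lt;SCRIPT&gt;</tt> and 
   <tt>&lt;/SCRIPT&gt;</tt> are matched; a tag carrying attributes is left untouched.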
  */  
  public static String forScriptTagsOnly(String aText){
    String result = null;
    Matcher matcher = SCRIPT.matcher(aText);
    result = matcher.replaceAll("&lt;SCRIPT>");
    matcher = SCRIPT_END.matcher(result);
    result = matcher.replaceAll("&lt;/SCRIPT>");
    return result;
  }
  
  // PRIVATE //
  
  private EscapeChars(){
    //empty - prevent construction
  }
  
  private static final Pattern SCRIPT = Pattern.compile(
    "<SCRIPT>", Pattern.CASE_INSENSITIVE
   );
  private static final Pattern SCRIPT_END = Pattern.compile(
    "</SCRIPT>", Pattern.CASE_INSENSITIVE
  );
  
  private static void addCharEntity(Integer aIdx, StringBuilder aBuilder){
    String padding = "";
    if( aIdx <= 9 ){
       padding = "00";
    }
    else if( aIdx <= 99 ){
      padding = "0";
    }
    else {
      //no prefix
    }
    String number = padding + aIdx.toString();
    aBuilder.append("&#" + number + ";");
  }
}
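
The Javadoc for forURL and forHrefAmpersand describes two separate steps for a query string placed in an HREF attribute: URL-encode each parameter value, then escape the ampersands that separate the parameters. The sketch below shows the order of the two steps using only the JDK. The class `HrefSketch`, the method `searchHref`, and the `search?q=...&lang=...` URL are hypothetical names chosen for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public final class HrefSketch {

  /** Step 1: make a single parameter value safe for a query string. */
  static String encode(String aValue) {
    try {
      return URLEncoder.encode(aValue, "UTF-8");
    }
    catch (UnsupportedEncodingException ex) {
      throw new IllegalStateException("UTF-8 not supported", ex);
    }
  }

  /** Builds an HREF attribute value for a hypothetical search page. */
  public static String searchHref(String aQuery, String aLang) {
    // Encode the values first, so an '&' inside a value becomes %26.
    String url = "search?q=" + encode(aQuery) + "&lang=" + encode(aLang);
    // Step 2: only the separator ampersands remain; escape them for HTML.
    return url.replace("&", "&amp;");
  }

  public static void main(String[] args) {
    // prints: search?q=tom+%26+jerry&amp;lang=en
    System.out.println(searchHref("tom & jerry", "en"));
  }
}
```

Encoding the values before joining them keeps an ampersand that belongs to the data (`%26`) distinct from one that separates parameters (`&amp;`).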