XPath Study Notes


http://www.w3.org/TR/xpath/

Online update-site addresses for the Eclipse XPath plugin

http://eclipse-xpath.sourceforge.net/update/site.xml

http://www.bastian-bergerhoff.com/eclipse/features

 

XPath Manual [from ZVON]

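The basic XPath syntax is similar to locating files in a file system. If a path starts with a slash /, it is an absolute path to an element.
/AAA
Selects the root element AAA

     <AAA>
          <BBB/>
          <CCC/>
          <BBB/>
          <BBB/>
          <DDD>
               <BBB/>
          </DDD>
          <CCC/>
     </AAA>
/AAA/CCC
Selects all CCC child elements of AAA

     <AAA>
          <BBB/>
          <CCC/>
          <BBB/>
          <BBB/>
          <DDD>
               <BBB/>
          </DDD>
          <CCC/>
     </AAA>
/AAA/DDD/BBB
Selects all BBB child elements of DDD, where DDD is a child of AAA

     <AAA>
          <BBB/>
          <CCC/>
          <BBB/>
          <BBB/>
          <DDD>
               <BBB/>
          </DDD>
          <CCC/>
     </AAA>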
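The absolute paths above can be tried out with a minimal sketch like the following. It assumes Python's standard xml.etree.ElementTree module, which implements only a subset of XPath and evaluates paths relative to an element, so /AAA/CCC is written as ./CCC with the root element AAA as the context. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

XML = """<AAA>
  <BBB/> <CCC/> <BBB/> <BBB/>
  <DDD> <BBB/> </DDD>
  <CCC/>
</AAA>"""

root = ET.fromstring(XML)          # /AAA: the root element itself

# /AAA/CCC, written relative to the root element AAA
ccc = root.findall("./CCC")

# /AAA/DDD/BBB, written relative to the root element AAA
ddd_bbb = root.findall("./DDD/BBB")

print(root.tag, len(ccc), len(ddd_bbb))   # AAA 2 1
```

Only the BBB nested in DDD matches the last path; the three BBB children of AAA do not.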
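If a path starts with a double slash //, all elements in the document that satisfy the rule following // are selected, regardless of their level in the hierarchy.
//BBB
Selects all BBB elements

     <AAA>
          <BBB/>
          <CCC/>
          <BBB/>
          <DDD>
               <BBB/>
          </DDD>
          <CCC>
               <DDD>
                    <BBB/>
                    <BBB/>
               </DDD>
          </CCC>
     </AAA>
//DDD/BBB
Selects all BBB elements whose parent is DDD

     <AAA>
          <BBB/>
          <CCC/>
          <BBB/>
          <DDD>
               <BBB/>
          </DDD>
          <CCC>
               <DDD>
                    <BBB/>
                    <BBB/>
               </DDD>
          </CCC>
     </AAA>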
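As a rough illustration of the double slash, the sketch below again assumes Python's xml.etree.ElementTree, where a document-wide search is written with a leading dot (.//BBB) because paths are relative to the element they are called on.

```python
import xml.etree.ElementTree as ET

XML = """<AAA>
  <BBB/> <CCC/> <BBB/>
  <DDD> <BBB/> </DDD>
  <CCC> <DDD> <BBB/> <BBB/> </DDD> </CCC>
</AAA>"""

root = ET.fromstring(XML)

# //BBB: every BBB element at any depth
all_bbb = root.findall(".//BBB")

# //DDD/BBB: only BBB elements whose parent is a DDD element
bbb_in_ddd = root.findall(".//DDD/BBB")

print(len(all_bbb), len(bbb_in_ddd))   # 5 3
```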
The asterisk * selects all elements located by the path that precedes it.
/AAA/CCC/DDD/*
Selects all child elements of /AAA/CCC/DDD

     <AAA>
          <XXX>
               <DDD>
                    <BBB/>
                    <BBB/>
                    <EEE/>
                    <FFF/>
               </DDD>
          </XXX>
          <CCC>
               <DDD>
                    <BBB/>
                    <BBB/>
                    <EEE/>
                    <FFF/>
               </DDD>
          </CCC>
          <CCC>
               <BBB>
                    <BBB>
                         <BBB/>
                    </BBB>
               </BBB>
          </CCC>
     </AAA>
/*/*/*/BBB
Selects all BBB elements that have exactly three ancestor elements

     <AAA>
          <XXX>
               <DDD>
                    <BBB/>
                    <BBB/>
                    <EEE/>
                    <FFF/>
               </DDD>
          </XXX>
          <CCC>
               <DDD>
                    <BBB/>
                    <BBB/>
                    <EEE/>
                    <FFF/>
               </DDD>
          </CCC>
          <CCC>
               <BBB>
                    <BBB>
                         <BBB/>
                    </BBB>
               </BBB>
          </CCC>
     </AAA>
//*
Selects all elements

     <AAA>
          <XXX>
               <DDD>
                    <BBB/>
                    <BBB/>
                    <EEE/>
                    <FFF/>
               </DDD>
          </XXX>
          <CCC>
               <DDD>
                    <BBB/>
                    <BBB/>
                    <EEE/>
                    <FFF/>
               </DDD>
          </CCC>
          <CCC>
               <BBB>
                    <BBB>
                         <BBB/>
                    </BBB>
               </BBB>
          </CCC>
     </AAA>
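The wildcard examples can be sketched as follows, assuming Python's xml.etree.ElementTree. Paths are relative to the root element AAA, so the first /* step of /*/*/*/BBB is the root itself, and //* is emulated with iter(), which also yields the root element.

```python
import xml.etree.ElementTree as ET

XML = """<AAA>
  <XXX><DDD><BBB/><BBB/><EEE/><FFF/></DDD></XXX>
  <CCC><DDD><BBB/><BBB/><EEE/><FFF/></DDD></CCC>
  <CCC><BBB><BBB><BBB/></BBB></BBB></CCC>
</AAA>"""

root = ET.fromstring(XML)

# /AAA/CCC/DDD/*: all child elements of /AAA/CCC/DDD
under_ddd = [e.tag for e in root.findall("./CCC/DDD/*")]

# /*/*/*/BBB: BBB elements with exactly three ancestor elements
deep_bbb = root.findall("./*/*/BBB")

# //*: every element in the document, in document order
everything = list(root.iter())

print(under_ddd)                       # ['BBB', 'BBB', 'EEE', 'FFF']
print(len(deep_bbb), len(everything))  # 5 17
```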
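An expression in square brackets can further narrow down the selection. A number gives the position of an element in the selected set, and the last() function selects the last element in the set.
/AAA/BBB[1]
Selects the first BBB child of AAA

     <AAA>
          <BBB/>
          <BBB/>
          <BBB/>
          <BBB/>
     </AAA>
/AAA/BBB[last()]
Selects the last BBB child of AAA

     <AAA>
          <BBB/>
          <BBB/>
          <BBB/>
          <BBB/>
     </AAA>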
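A small sketch of the positional predicates, assuming Python's xml.etree.ElementTree, which supports [1] and [last()] after a tag name. The n attributes are not part of the sample above; they are added here as a hypothetical marker so the selected elements can be told apart.

```python
import xml.etree.ElementTree as ET

# The n="..." attributes are an assumption made only for this illustration.
XML = '<AAA><BBB n="1"/><BBB n="2"/><BBB n="3"/><BBB n="4"/></AAA>'

root = ET.fromstring(XML)

# /AAA/BBB[1]: positions are 1-based
first = root.findall("./BBB[1]")

# /AAA/BBB[last()]
last = root.findall("./BBB[last()]")

print(first[0].get("n"), last[0].get("n"))   # 1 4
```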
Attributes are referenced with the @ prefix.
//@id
Selects all id attributes

     <AAA>
          <BBB id = "b1"/>
          <BBB id = "b2"/>
          <BBB name = "bbb"/>
          <BBB/>
     </AAA>
//BBB[@id]
Selects BBB elements that have an id attribute

     <AAA>
          <BBB id = "b1"/>
          <BBB id = "b2"/>
          <BBB name = "bbb"/>
          <BBB/>
     </AAA>
//BBB[@name]
Selects BBB elements that have a name attribute

     <AAA>
          <BBB id = "b1"/>
          <BBB id = "b2"/>
          <BBB name = "bbb"/>
          <BBB/>
     </AAA>
//BBB[@*]
Selects BBB elements that have any attribute

     <AAA>
          <BBB id = "b1"/>
          <BBB id = "b2"/>
          <BBB name = "bbb"/>
          <BBB/>
     </AAA>
//BBB[not(@*)]
Selects BBB elements that have no attributes

     <AAA>
          <BBB id = "b1"/>
          <BBB id = "b2"/>
          <BBB name = "bbb"/>
          <BBB/>
     </AAA>
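The attribute tests above could be explored along these lines. The sketch assumes Python's xml.etree.ElementTree, which understands [@id] and [@name] but, as far as its documented XPath subset goes, not //@id, [@*] or not(); those three are therefore emulated in plain Python rather than evaluated as XPath.

```python
import xml.etree.ElementTree as ET

XML = '<AAA><BBB id="b1"/><BBB id="b2"/><BBB name="bbb"/><BBB/></AAA>'

root = ET.fromstring(XML)

with_id = root.findall(".//BBB[@id]")       # //BBB[@id]
with_name = root.findall(".//BBB[@name]")   # //BBB[@name]

# //@id (emulated): the id attribute values of all elements
id_values = [e.get("id") for e in root.iter() if "id" in e.attrib]

# //BBB[@*] (emulated): BBB elements with at least one attribute
any_attr = [e for e in root.iter("BBB") if e.attrib]

# //BBB[not(@*)] (emulated): BBB elements without attributes
no_attr = [e for e in root.iter("BBB") if not e.attrib]

print(len(with_id), len(with_name), id_values, len(any_attr), len(no_attr))
# 2 1 ['b1', 'b2'] 3 1
```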
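Attribute values can be used as selection criteria. The normalize-space function strips leading and trailing whitespace and replaces each sequence of whitespace characters with a single space.
//BBB[@id='b1']
Selects BBB elements that have an id attribute with the value 'b1'

     <AAA>
          <BBB id = "b1"/>
          <BBB name = " bbb "/>
          <BBB name = "bbb"/>
     </AAA>
//BBB[@name='bbb']
Selects BBB elements that have a name attribute with the value 'bbb'

     <AAA>
          <BBB id = "b1"/>
          <BBB name = " bbb "/>
          <BBB name = "bbb"/>
     </AAA>
//BBB[normalize-space(@name)='bbb']
Selects BBB elements that have a name attribute whose value is 'bbb' after normalize-space has removed the leading and trailing spaces

     <AAA>
          <BBB id = "b1"/>
          <BBB name = " bbb "/>
          <BBB name = "bbb"/>
     </AAA>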
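To see the difference between an exact comparison and a normalized one, here is a sketch assuming Python's xml.etree.ElementTree. It supports [@name='bbb'] directly; normalize-space is not available there, so the helper normalize_space below is a hypothetical approximation (str.split also treats some Unicode whitespace as separators, which XPath does not).

```python
import xml.etree.ElementTree as ET

XML = '<AAA><BBB id="b1"/><BBB name=" bbb "/><BBB name="bbb"/></AAA>'

root = ET.fromstring(XML)

def normalize_space(text):
    """Approximate XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(text.split())

by_id = root.findall(".//BBB[@id='b1']")       # //BBB[@id='b1']
exact = root.findall(".//BBB[@name='bbb']")    # //BBB[@name='bbb']

# //BBB[normalize-space(@name)='bbb'] (emulated)
normalized = [e for e in root.iter("BBB")
              if "name" in e.attrib and normalize_space(e.get("name")) == "bbb"]

print(len(by_id), len(exact), len(normalized))   # 1 1 2
```

The exact comparison misses the element whose value is " bbb ", while the normalized comparison matches both.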
The count() function counts the number of selected elements.
//*[count(BBB)=2]
Selects elements that have exactly two BBB children

     <AAA>
          <CCC>
               <BBB/>
               <BBB/>
               <BBB/>
          </CCC>
          <DDD>
               <BBB/>
               <BBB/>
          </DDD>
          <EEE>
               <CCC/>
               <DDD/>
          </EEE>
     </AAA>
//*[count(*)=2]
Selects elements that have exactly two child elements

     <AAA>
          <CCC>
               <BBB/>
               <BBB/>
               <BBB/>
          </CCC>
          <DDD>
               <BBB/>
               <BBB/>
          </DDD>
          <EEE>
               <CCC/>
               <DDD/>
          </EEE>
     </AAA>
//*[count(*)=3]
Selects elements that have exactly three child elements

     <AAA>
          <CCC>
               <BBB/>
               <BBB/>
               <BBB/>
          </CCC>
          <DDD>
               <BBB/>
               <BBB/>
          </DDD>
          <EEE>
               <CCC/>
               <DDD/>
          </EEE>
     </AAA>
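The count() predicates can be mimicked as below. Python's xml.etree.ElementTree has no count() function, so this sketch emulates the predicate by counting children in Python instead of evaluating the XPath expression itself.

```python
import xml.etree.ElementTree as ET

XML = """<AAA>
  <CCC><BBB/><BBB/><BBB/></CCC>
  <DDD><BBB/><BBB/></DDD>
  <EEE><CCC/><DDD/></EEE>
</AAA>"""

root = ET.fromstring(XML)

# //*[count(BBB)=2]: elements with exactly two BBB children
two_bbb = [e.tag for e in root.iter() if len(e.findall("BBB")) == 2]

# //*[count(*)=2] and //*[count(*)=3]: len(e) is the number of child elements
two_children = [e.tag for e in root.iter() if len(e) == 2]
three_children = [e.tag for e in root.iter() if len(e) == 3]

print(two_bbb, two_children, three_children)
# ['DDD'] ['DDD', 'EEE'] ['AAA', 'CCC']
```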
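The name() function returns the name of an element. The starts-with() function returns true if its first argument string starts with its second argument string, and the contains() function returns true if its first argument string contains its second argument string.
//*[name()='BBB']
Selects all elements named BBB (equivalent to //BBB here)

     <AAA>
          <BCC>
               <BBB/>
               <BBB/>
               <BBB/>
          </BCC>
          <DDB>
               <BBB/>
               <BBB/>
          </DDB>
          <BEC>
               <CCC/>
               <DBD/>
          </BEC>
     </AAA>
//*[starts-with(name(),'B')]
Selects all elements whose name starts with "B"

     <AAA>
          <BCC>
               <BBB/>
               <BBB/>
               <BBB/>
          </BCC>
          <DDB>
               <BBB/>
               <BBB/>
          </DDB>
          <BEC>
               <CCC/>
               <DBD/>
          </BEC>
     </AAA>
//*[contains(name(),'C')]
Selects all elements whose name contains "C"

     <AAA>
          <BCC>
               <BBB/>
               <BBB/>
               <BBB/>
          </BCC>
          <DDB>
               <BBB/>
               <BBB/>
          </DDB>
          <BEC>
               <CCC/>
               <DBD/>
          </BEC>
     </AAA>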
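The name tests can be approximated as in this sketch, which assumes Python's xml.etree.ElementTree. Its XPath subset has no name(), starts-with() or contains(), so the element's tag string is compared in Python instead. This only matches name() for documents without namespaces, as is the case here.

```python
import xml.etree.ElementTree as ET

XML = """<AAA>
  <BCC><BBB/><BBB/><BBB/></BCC>
  <DDB><BBB/><BBB/></DDB>
  <BEC><CCC/><DBD/></BEC>
</AAA>"""

root = ET.fromstring(XML)

# //*[name()='BBB']
named_bbb = [e for e in root.iter() if e.tag == "BBB"]

# //*[starts-with(name(),'B')]
starts_b = [e.tag for e in root.iter() if e.tag.startswith("B")]

# //*[contains(name(),'C')]
contains_c = [e.tag for e in root.iter() if "C" in e.tag]

print(len(named_bbb))   # 5
print(starts_b)         # ['BCC', 'BBB', 'BBB', 'BBB', 'BBB', 'BBB', 'BEC']
print(contains_c)       # ['BCC', 'BEC', 'CCC']
```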
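The string-length function returns the number of characters in a string. When an expression is embedded in an XML document (for example an XSLT stylesheet), write &lt; instead of < and &gt; instead of >.
//*[string-length(name()) = 3]
Selects elements whose name is exactly three characters long

     <AAA>
          <Q/>
          <SSSS/>
          <BB/>
          <CCC/>
          <DDDDDDDD/>
          <EEEE/>
     </AAA>
//*[string-length(name()) < 3]
Selects elements whose name is shorter than three characters

     <AAA>
          <Q/>
          <SSSS/>
          <BB/>
          <CCC/>
          <DDDDDDDD/>
          <EEEE/>
     </AAA>
//*[string-length(name()) > 3]
Selects elements whose name is longer than three characters

     <AAA>
          <Q/>
          <SSSS/>
          <BB/>
          <CCC/>
          <DDDDDDDD/>
          <EEEE/>
     </AAA>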
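Both points above, the length test and the escaping of < and >, can be sketched like this. The length predicates are emulated in Python (xml.etree.ElementTree has no string-length()), and the escaping uses xml.sax.saxutils.escape from the standard library as one possible way to prepare an expression for embedding in XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XML = "<AAA><Q/><SSSS/><BB/><CCC/><DDDDDDDD/><EEEE/></AAA>"

root = ET.fromstring(XML)

# //*[string-length(name()) = 3], < 3 and > 3, emulated on the tag string
equal_3 = [e.tag for e in root.iter() if len(e.tag) == 3]
less_3 = [e.tag for e in root.iter() if len(e.tag) < 3]
more_3 = [e.tag for e in root.iter() if len(e.tag) > 3]

# Escaping an expression before placing it inside an XML document
escaped = escape("//*[string-length(name()) < 3]")

print(equal_3, less_3, more_3)
# ['AAA', 'CCC'] ['Q', 'BB'] ['SSSS', 'DDDDDDDD', 'EEEE']
print(escaped)   # //*[string-length(name()) &lt; 3]
```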
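Several paths can be combined with the | separator.
//CCC | //BBB
Selects all CCC and BBB elements

     <AAA>
          <BBB/>
          <CCC/>
          <DDD>
               <CCC/>
          </DDD>
          <EEE/>
     </AAA>
/AAA/EEE | //BBB
Selects all BBB elements and all EEE elements that are children of the root element AAA

     <AAA>
          <BBB/>
          <CCC/>
          <DDD>
               <CCC/>
          </DDD>
          <EEE/>
     </AAA>
/AAA/EEE | //DDD/CCC | /AAA | //BBB
The number of paths that can be combined is not limited

     <AAA>
          <BBB/>
          <CCC/>
          <DDD>
               <CCC/>
          </DDD>
          <EEE/>
     </AAA>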
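A union of paths yields each node once, in document order. The sketch below assumes Python's xml.etree.ElementTree, which does not parse the | operator, so a hypothetical helper named union merges the results of separate findall calls.

```python
import xml.etree.ElementTree as ET

XML = "<AAA><BBB/><CCC/><DDD><CCC/></DDD><EEE/></AAA>"

root = ET.fromstring(XML)

def union(*node_lists):
    """Merge node lists without duplicates, in document order."""
    wanted = {id(node) for nodes in node_lists for node in nodes}
    return [e for e in root.iter() if id(e) in wanted]

# //CCC | //BBB
ccc_or_bbb = union(root.findall(".//CCC"), root.findall(".//BBB"))

# /AAA/EEE | //BBB
eee_or_bbb = union(root.findall("./EEE"), root.findall(".//BBB"))

# /AAA/EEE | //DDD/CCC | /AAA | //BBB
four_way = union(root.findall("./EEE"), root.findall(".//DDD/CCC"),
                 [root], root.findall(".//BBB"))

print([e.tag for e in ccc_or_bbb])   # ['BBB', 'CCC', 'CCC']
print([e.tag for e in eee_or_bbb])   # ['BBB', 'EEE']
print([e.tag for e in four_way])     # ['AAA', 'BBB', 'CCC', 'EEE']
```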
The child axis contains the children of the context node. It is the default axis and may be omitted.
/AAA
Equivalent to /child::AAA

     <AAA>
          <BBB/>
          <CCC/>
     </AAA>
/child::AAA
Equivalent to /AAA

     <AAA>
          <BBB/>
          <CCC/>
     </AAA>
/AAA/BBB
Equivalent to /child::AAA/child::BBB

     <AAA>
          <BBB/>
          <CCC/>
     </AAA>
/child::AAA/child::BBB
Equivalent to /AAA/BBB

     <AAA>
          <BBB/>
          <CCC/>
     </AAA>
/child::AAA/BBB
Both notations can be mixed within one path

     <AAA>
          <BBB/>
          <CCC/>
     </AAA>
The descendant axis contains the descendants of the context node. A descendant is a child, or a child of a child, and so on, so the descendant axis never contains attribute or namespace nodes.
/descendant::*
Selects all descendants of the document root, and therefore all elements

     <AAA>
          <BBB>
               <DDD>
                    <CCC>
                         <DDD/>
                         <EEE/>
                    </CCC>
               </DDD>
          </BBB>
          <CCC>
               <DDD>
                    <EEE>
                         <DDD>
                              <FFF/>
                         </DDD>
                    </EEE>
               </DDD>
          </CCC>
     </AAA>
/AAA/BBB/descendant::*
Selects all descendant elements of /AAA/BBB

     <AAA>
          <BBB>
               <DDD>
                    <CCC>
                         <DDD/>
                         <EEE/>
                    </CCC>
               </DDD>
          </BBB>
          <CCC>
               <DDD>
                    <EEE>
                         <DDD>
                              <FFF/>
                         </DDD>
                    </EEE>
               </DDD>
          </CCC>
     </AAA>
//CCC/descendant::*
Selects all elements that have CCC among their ancestors

     <AAA>
          <BBB>
               <DDD>
                    <CCC>
                         <DDD/>
                         <EEE/>
                    </CCC>
               </DDD>
          </BBB>
          <CCC>
               <DDD>
                    <EEE>
                         <DDD>
                              <FFF/>
                         </DDD>
                    </EEE>
               </DDD>
          </CCC>
     </AAA>
//CCC/descendant::DDD
Selects all DDD elements that have CCC among their ancestors

     <AAA>
          <BBB>
               <DDD>
                    <CCC>
                         <DDD/>
                         <EEE/>
                    </CCC>
               </DDD>
          </BBB>
          <CCC>
               <DDD>
                    <EEE>
                         <DDD>
                              <FFF/>
                         </DDD>
                    </EEE>
               </DDD>
          </CCC>
     </AAA>
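The descendant axis can be sketched with Python's xml.etree.ElementTree, which has no descendant:: syntax but offers the equivalent .//* and .//DDD relative paths. The list comprehensions below work for this sample because no CCC is nested inside another CCC; with nested context nodes, duplicates would have to be removed to match XPath's node-set semantics.

```python
import xml.etree.ElementTree as ET

XML = """<AAA>
  <BBB><DDD><CCC><DDD/><EEE/></CCC></DDD></BBB>
  <CCC><DDD><EEE><DDD><FFF/></DDD></EEE></DDD></CCC>
</AAA>"""

root = ET.fromstring(XML)

# /descendant::*: every element, including the root element AAA
all_elements = list(root.iter())

# /AAA/BBB/descendant::*
bbb = root.find("./BBB")
bbb_desc = [e.tag for e in bbb.findall(".//*")]

# //CCC/descendant::* and //CCC/descendant::DDD
ccc_desc = [d.tag for c in root.iter("CCC") for d in c.findall(".//*")]
ccc_ddd = [d for c in root.iter("CCC") for d in c.findall(".//DDD")]

print(len(all_elements))   # 11
print(bbb_desc)            # ['DDD', 'CCC', 'DDD', 'EEE']
print(ccc_desc)            # ['DDD', 'EEE', 'DDD', 'EEE', 'DDD', 'FFF']
print(len(ccc_ddd))        # 3
```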
The parent axis contains the parent of the context node, if there is one.
//DDD/parent::*
Selects the parent element of every DDD element

     <AAA>
          <BBB>
               <DDD>
                    <CCC>
                         <DDD/>
                         <EEE/>
                    </CCC>
               </DDD>
          </BBB>
          <CCC>
               <DDD>
                    <EEE>
                         <DDD>
                              <FFF/>
                         </DDD>
                    </EEE>
               </DDD>
          </CCC>
     </AAA>
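The ancestor axis contains the ancestors of the context node: its parent, its parent's parent, and so on. The ancestor axis therefore always includes the root node, unless the context node is the root node itself.
/AAA/BBB/DDD/CCC/EEE/ancestor::*
Selects all ancestor elements of the EEE element reached by this absolute path, that is, every other element on the path

     <AAA>
          <BBB>
               <DDD>
                    <CCC>
                         <DDD/>
                         <EEE/>
                    </CCC>
               </DDD>
          </BBB>
          <CCC>
               <DDD>
                    <EEE>
                         <DDD>
                              <FFF/>
                         </DDD>
                    </EEE>
               </DDD>
          </CCC>
     </AAA>
//FFF/ancestor::*
Selects the ancestor elements of the FFF element

     <AAA>
          <BBB>
               <DDD>
                    <CCC>
                         <DDD/>
                         <EEE/>
                    </CCC>
               </DDD>
          </BBB>
          <CCC>
               <DDD>
                    <EEE>
                         <DDD>
                              <FFF/>
                         </DDD>
                    </EEE>
               </DDD>
          </CCC>
     </AAA>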
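The parent and ancestor axes can be illustrated as follows. Elements in Python's xml.etree.ElementTree do not keep a reference to their parent, so this sketch first builds a child-to-parent map and then walks it upwards. The names parent_of and ancestors are hypothetical helpers, and the ancestors are returned nearest first, whereas an XPath node-set is reported in document order.

```python
import xml.etree.ElementTree as ET

XML = """<AAA>
  <BBB><DDD><CCC><DDD/><EEE/></CCC></DDD></BBB>
  <CCC><DDD><EEE><DDD><FFF/></DDD></EEE></DDD></CCC>
</AAA>"""

root = ET.fromstring(XML)

# Map every element to its parent; the root element has no entry.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(elem):
    """Ancestor elements of elem (ancestor::*), nearest first."""
    found = []
    while elem in parent_of:
        elem = parent_of[elem]
        found.append(elem)
    return found

# //DDD/parent::* (each DDD has a different parent in this sample)
ddd_parents = [parent_of[d].tag for d in root.iter("DDD")]

# /AAA/BBB/DDD/CCC/EEE/ancestor::*
eee = root.find("./BBB/DDD/CCC/EEE")
eee_ancestors = [a.tag for a in ancestors(eee)]

# //FFF/ancestor::*
fff = root.find(".//FFF")
fff_ancestors = [a.tag for a in ancestors(fff)]

print(ddd_parents)     # ['BBB', 'CCC', 'CCC', 'EEE']
print(eee_ancestors)   # ['CCC', 'DDD', 'BBB', 'AAA']
print(fff_ancestors)   # ['DDD', 'EEE', 'DDD', 'CCC', 'AAA']
```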
The following-sibling axis contains all siblings that follow the context node.
/AAA/BBB/following-sibling::*

     <AAA>
          <BBB>
               <CCC/>
               <DDD/>
          </BBB>
          <XXX>
               <DDD>
                    <EEE/>
                    <DDD/>
                    <CCC/>
                    <FFF/>
                    <FFF>
                         <GGG/>
                    </FFF>
               </DDD>
          </XXX>
          <CCC>
               <DDD/>
          </CCC>
     </AAA>
//CCC/following-sibling::*

     <AAA>
          <BBB>
               <CCC/>
               <DDD/>
          </BBB>
          <XXX>
               <DDD>
                    <EEE/>
                    <DDD/>
                    <CCC/>
                    <FFF/>
                    <FFF>
                         <GGG/>
                    </FFF>
               </DDD>
          </XXX>
          <CCC>
               <DDD/>
          </CCC>
     </AAA>
The preceding-sibling axis contains all siblings that precede the context node.
/AAA/XXX/preceding-sibling::*

     <AAA>
          <BBB>
               <CCC/>
               <DDD/>
          </BBB>
          <XXX>
               <DDD>
                    <EEE/>
                    <DDD/>
                    <CCC/>
                    <FFF/>
                    <FFF>
                         <GGG/>
                    </FFF>
               </DDD>
          </XXX>
          <CCC>
               <DDD/>
          </CCC>
     </AAA>
//CCC/preceding-sibling::*

     <AAA>
          <BBB>
               <CCC/>
               <DDD/>
          </BBB>
          <XXX>
               <DDD>
                    <EEE/>
                    <DDD/>
                    <CCC/>
                    <FFF/>
                    <FFF>
                         <GGG/>
                    </FFF>
               </DDD>
          </XXX>
          <CCC>
               <DDD/>
          </CCC>
     </AAA>
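The two sibling axes can be emulated in a few lines. The sketch assumes Python's xml.etree.ElementTree, which has neither axis, so siblings are looked up through a child-to-parent map. The helpers siblings and node_set are hypothetical; node_set removes duplicates and restores document order, as XPath does when several context nodes contribute results.

```python
import xml.etree.ElementTree as ET

XML = """<AAA>
  <BBB><CCC/><DDD/></BBB>
  <XXX><DDD><EEE/><DDD/><CCC/><FFF/><FFF><GGG/></FFF></DDD></XXX>
  <CCC><DDD/></CCC>
</AAA>"""

root = ET.fromstring(XML)
parent_of = {child: parent for parent in root.iter() for child in parent}
position = {elem: i for i, elem in enumerate(root.iter())}

def siblings(elem, following):
    """Siblings after (following=True) or before (following=False) elem."""
    parent = parent_of.get(elem)
    if parent is None:                 # the root element has no siblings
        return []
    children = list(parent)
    i = next(k for k, c in enumerate(children) if c is elem)
    return children[i + 1:] if following else children[:i]

def node_set(nodes):
    """Unique nodes in document order."""
    return sorted(set(nodes), key=position.__getitem__)

def tags(nodes):
    return [n.tag for n in nodes]

# /AAA/BBB/following-sibling::* and /AAA/XXX/preceding-sibling::*
after_bbb = tags(siblings(root.find("./BBB"), following=True))
before_xxx = tags(siblings(root.find("./XXX"), following=False))

# //CCC/following-sibling::* and //CCC/preceding-sibling::*
after_ccc = tags(node_set(s for c in root.iter("CCC") for s in siblings(c, True)))
before_ccc = tags(node_set(s for c in root.iter("CCC") for s in siblings(c, False)))

print(after_bbb, before_xxx)   # ['XXX', 'CCC'] ['BBB']
print(after_ccc)               # ['DDD', 'FFF', 'FFF']
print(before_ccc)              # ['BBB', 'XXX', 'EEE', 'DDD']
```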
The following axis contains all nodes in the same document that come after the context node in document order, excluding descendants of the context node, attribute nodes and namespace nodes.
/AAA/XXX/following::*

     <AAA>
          <BBB>
               <CCC/>
               <ZZZ>
                    <DDD/>
                    <DDD>
                         <EEE/>
                    </DDD>
               </ZZZ>
               <FFF>
                    <GGG/>
               </FFF>
          </BBB>
          <XXX>
               <DDD>
                    <EEE/>
                    <DDD/>
                    <CCC/>
                    <FFF/>
                    <FFF>
                         <GGG/>
                    </FFF>
               </DDD>
          </XXX>
          <CCC>
               <DDD/>
          </CCC>
     </AAA>
//ZZZ/following::*

     <AAA>
          <BBB>
               <CCC/>
               <ZZZ>
                    <DDD/>
                    <DDD>
                         <EEE/>
                    </DDD>
               </ZZZ>
               <FFF>
                    <GGG/>
               </FFF>
          </BBB>
          <XXX>
               <DDD>
                    <EEE/>
                    <DDD/>
                    <CCC/>
                    <FFF/>
                    <FFF>
                         <GGG/>
                    </FFF>
               </DDD>
          </XXX>
          <CCC>
               <DDD/>
          </CCC>
     </AAA>
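The preceding axis contains all nodes in the same document that come before the context node in document order, excluding ancestors of the context node, attribute nodes and namespace nodes.
/AAA/XXX/preceding::*

     <AAA>
          <BBB>
               <CCC/>
               <ZZZ>
                    <DDD/>
               </ZZZ>
          </BBB>
          <XXX>
               <DDD>
                    <EEE/>
                    <DDD/>
                    <CCC/>
                    <FFF/>
                    <FFF>
                         <GGG/>
                    </FFF>
               </DDD>
          </XXX>
          <CCC>
               <DDD/>
          </CCC>
     </AAA>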
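The following and preceding axes can be emulated from the document order alone. This sketch assumes Python's xml.etree.ElementTree and uses the sample document shown directly above. The helpers following and preceding are hypothetical and handle a single context element: following skips the context node's descendants, and preceding skips its ancestors.

```python
import xml.etree.ElementTree as ET

XML = """<AAA>
  <BBB><CCC/><ZZZ><DDD/></ZZZ></BBB>
  <XXX><DDD><EEE/><DDD/><CCC/><FFF/><FFF><GGG/></FFF></DDD></XXX>
  <CCC><DDD/></CCC>
</AAA>"""

root = ET.fromstring(XML)
doc = list(root.iter())                          # all elements, document order
position = {elem: i for i, elem in enumerate(doc)}
parent_of = {child: parent for parent in doc for child in parent}

def following(elem):
    """Elements after elem in document order, minus its descendants."""
    skip = set(elem.iter())                      # elem and its descendants
    return [n for n in doc[position[elem] + 1:] if n not in skip]

def preceding(elem):
    """Elements before elem in document order, minus its ancestors."""
    skip = set()
    node = elem
    while node in parent_of:
        node = parent_of[node]
        skip.add(node)
    return [n for n in doc[:position[elem]] if n not in skip]

xxx = root.find("./XXX")
ggg = root.find(".//GGG")

after_xxx = [n.tag for n in following(xxx)]      # /AAA/XXX/following::*
before_xxx = [n.tag for n in preceding(xxx)]     # /AAA/XXX/preceding::*
before_ggg = [n.tag for n in preceding(ggg)]     # //GGG/preceding::*

print(after_xxx)    # ['CCC', 'DDD']
print(before_xxx)   # ['BBB', 'CCC', 'ZZZ', 'DDD']
print(before_ggg)   # ['BBB', 'CCC', 'ZZZ', 'DDD', 'EEE', 'DDD', 'CCC', 'FFF']
```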
//GGG/preceding::*

     <AAA
          <</