Importing Word documents with POI (.doc, .docx, and RTF)

rwjavagbl


poi word import

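// Imports this method relies on. Package names follow POI 3.x; in POI 4.0 and
// later, POIXMLDocument lives in org.apache.poi.ooxml instead.
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Locale;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;
import org.apache.poi.POIXMLDocument;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.TableCell;
import org.apache.poi.hwpf.usermodel.TableIterator;
import org.apache.poi.hwpf.usermodel.TableRow;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;

public void importUnitInfo() throws IOException {
   // Path of the Word file, passed in the "docWord" request parameter
   String filePath = this.getHttpServletRequest().getParameter("docWord");
   POIFSFileSystem pfs = null;
   organization = new Organization();
   String info = "";
   boolean sign = true; // stays true while the file can be opened as .doc or .docx
   FileInputStream ins = new FileInputStream(filePath); // open the document
// WordExtractor extractor = new WordExtractor(ins);
// // extract the text of a .doc file
// String text = extractor.getText();
   XWPFWordExtractor docx = null;
   // Lower-cased extension including the dot; empty when the path has no extension
   int index = filePath.lastIndexOf(".");
   String fileType = index < 0 ? "" : filePath.substring(index).toLowerCase(Locale.ROOT);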
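   try {
      if (fileType.equals(".docx")) {
         // .docx: open the OOXML package and wrap it in a text extractor
         docx = new XWPFWordExtractor(POIXMLDocument.openPackage(filePath));
      } else if (fileType.equals(".doc")) {
         // .doc: open the OLE2 container
         pfs = new POIFSFileSystem(ins);
      } else if (fileType.equals(".rtf")) {
         // .rtf: skip POI and go straight to the RTF branch
         sign = false;
      }
   } catch (Exception e) {
      // Opening failed, so the file is not a real .doc/.docx; treat it as RTF
      if (pfs == null && docx == null) {
         sign = false;
      }
   }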
   if (sign) { // the imported file is a Word document (.doc or .docx)
      try {
         if (fileType.equals(".doc")) {
            HWPFDocument hwpf = new HWPFDocument(pfs);
            Range range = hwpf.getRange(); // the readable range of the document
            TableIterator it = new TableIterator(range); // iterates over the tables in the document
            String cellString = "";
            if (it.hasNext()) { // only the first table is read
               TableRow tr = null;
               TableCell td = null;
               org.apache.poi.hwpf.usermodel.Paragraph para = null;
               org.apache.poi.hwpf.usermodel.Table tb = it.next();
               // walk the rows, starting from the first one
               for (int i = 0; i < tb.numRows(); i++) {
                  tr = tb.getRow(i);
                  for (int j = 0; j < tr.numCells(); j++) {
                     td = tr.getCell(j); // current cell
                     // collect the text of every paragraph in the cell
                     for (int k = 0; k < td.numParagraphs(); k++) {
                        para = td.getParagraph(k);
                        cellString = para.text();
                        if (cellString != null && !cellString.isEmpty()) {
                           // without trim() the text ends with a stray garbage character
                           info += cellString.trim();
                        }
                     }
                  }
               }
            }
         } else if (fileType.equals(".docx")) {
            info = docx.getText();
         }
         // strip all whitespace, including line breaks
         info = info.replaceAll("\\s+", "");
         // ....... (once the Word content is in info, processing it is straightforward and is not listed here)
         this.getHttpServletResponse().getWriter().write("success");
      } catch (Exception e) {
         this.getHttpServletResponse().getWriter().write("fail");
      }
   } else { // the imported file is in RTF format
      try (FileInputStream in = new FileInputStream(filePath)) {
         RTFEditorKit rtf = new RTFEditorKit();
         DefaultStyledDocument styledDoc = new DefaultStyledDocument();
         rtf.read(in, styledDoc, 0);
         // Extract the text. To read Chinese, recover the raw bytes through ISO8859_1
         // and decode them again, otherwise the text is garbled. GBK is assumed to be
         // the encoding of the RTF file.
         info = new String(styledDoc.getText(0, styledDoc.getLength()).getBytes("ISO8859_1"), "GBK");
         // strip all whitespace, including line breaks
         info = info.replaceAll("\\s+", "");
         // ....... (once the Word content is in info, processing it is straightforward and is not listed here)
         this.getHttpServletResponse().getWriter().write("success");
      } catch (IOException | BadLocationException e) {
         e.printStackTrace();
         this.getHttpServletResponse().getWriter().write("fail");
      }
   }
   ins.close(); // release the stream opened at the start of the method

}
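
The RTF branch of the method can be tried on its own, without a servlet or a file on disk. What follows is only a minimal sketch: the class name RtfTextSketch and both method names are made up for illustration, and the sample RTF string is an assumption, not data from the original project. The redecode helper shows the byte round trip through ISO-8859-1 that the comment in the method describes; whether it is needed, and which target charset is right, depends on how the RTF file was produced.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class; names are illustrative only.
final class RtfTextSketch {

    // Reads an RTF stream and returns its plain text with all whitespace removed.
    static String extract(InputStream in) throws IOException {
        RTFEditorKit kit = new RTFEditorKit();
        DefaultStyledDocument doc = new DefaultStyledDocument();
        try {
            kit.read(in, doc, 0);
            return doc.getText(0, doc.getLength()).replaceAll("\\s+", "");
        } catch (BadLocationException e) {
            throw new IOException("Could not read text from the RTF document", e);
        }
    }

    // Turns each char back into one byte (ISO-8859-1) and decodes the bytes
    // with the charset the text was really written in.
    static String redecode(String misread, Charset actual) {
        return new String(misread.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample input: a tiny RTF document held in memory
        String rtf = "{\\rtf1\\ansi{\\fonttbl{\\f0 Arial;}}\\f0 Hello World\\par}";
        InputStream in = new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII));
        System.out.println(extract(in));
    }
}
```

For a GBK file the second step would be called as redecode(text, Charset.forName("GBK")), assuming the JDK in use ships that charset.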

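
The extension check at the start of importUnitInfo can also be pulled out into a small helper, which makes the three cases explicit and copes with upper-case extensions and paths that have no extension. This is a sketch under assumptions: the class WordFileTypeSketch, the enum Kind, and the method detect are hypothetical names, not part of POI or of the original code. The extension is only a hint, so the fallback to RTF when POI cannot open the file is still needed.

```java
import java.util.Locale;

// Hypothetical helper; the enum and method names are illustrative only.
final class WordFileTypeSketch {

    enum Kind { DOC, DOCX, RTF, UNKNOWN }

    // Classifies a path by its extension, ignoring case.
    static Kind detect(String path) {
        int dot = path.lastIndexOf('.');
        // A dot that appears before the last separator belongs to a directory name
        int sep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        if (dot < 0 || dot < sep) {
            return Kind.UNKNOWN;
        }
        switch (path.substring(dot + 1).toLowerCase(Locale.ROOT)) {
            case "doc":  return Kind.DOC;
            case "docx": return Kind.DOCX;
            case "rtf":  return Kind.RTF;
            default:     return Kind.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(detect("C:\\upload\\Report.DOCX"));  // prints DOCX
        System.out.println(detect("/tmp/archive.v2/notes"));    // prints UNKNOWN
    }
}
```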