
Convenient File Tree Traversal

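Posted: 2008-06-16
I often need to modify files in bulk. I am not at all fluent in Windows scripting, so until now I have written a throwaway C# program each time. After coming across Ruby, I took a strong liking to the flexibility of its syntax (although I don't think too much flexibility is necessarily a good thing). It is also a scripting language, which is very convenient. With my own use cases in mind, I wanted to write a class that matches regular expressions against either the file (or folder) name or the full path. It also supports inverse filtering, that is, keeping the files or folders that did not match.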

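Here is an example. After backing up "My Documents" with a backup tool and restoring it, many hidden files become visible, for example thumbs.db and picasa.ini (because I use Google's Picasa). I want to hide picasa.ini again and delete thumbs.db. That can be written like this:
# single quotes keep the backslashes literal; FileTree converts them to "/"
ftree = FileTree.new('c:\documents and settings\username\my documents')
ftree.traverse(
  [
    /^picasa\.ini$/i,
    /^thumbs\.db$/i
  ],
  {
    :entry_type        => :file,
    :for_basename_only => true
  }
) do |file|
  if File.basename(file) =~ /picasa/i
    # hide picasa.ini again; quote the path because it contains spaces
    `attrib +h "#{file}"`
  else
    # clear the system/hidden/read-only flags so that thumbs.db can be deleted
    `attrib -s -h -r "#{file}"`
    File.delete(file)
  end
end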

I think this is fairly easy. If you only want to traverse the tree and print the matches, it is even simpler:
FileTree.new('c:\dummy').traverse(/file_name_pattern/)


The source code of FileTree is as follows:
# author: Yang Dong
# date:   2008-6-15
# 
# this class is designed to simplify the traversal of file trees. you can
# just output the whole structure, or you can specify some regular expressions
# to filter out unwanted files or directories and customize the actions
# taken on the matches, plus some additional controls.
# 
# usage examples (based on windows os):
#   1) say you want to see the whole file structure. just write:
#     FileTree.new('c:\dummy_directory').traverse
#   
#   2) say you want to hide all the picasa.ini files. write:
#     ftree = FileTree.new('c:\dummy_dir')
#     ftree.traverse(
#       /^picasa\.ini$/i,
#       {
#         :entry_type        => :file,
#         :for_basename_only => true
#       }
#     ) do |file|
#       `attrib +h "#{file}"`
#     end
require "pathname"

class FileTree
  def initialize(dir)
    # normalize backslashes and drop redundant segments such as "." or "//"
    @dir = Pathname.new(dir.chomp.gsub(/\\/, '/')).cleanpath.to_s
    pn = Pathname.new(@dir)
    raise "no such directory" unless pn.directory?
  end
  
  # traverses the given directory. use filter_patterns to specify
  # which file names you would like to match. attach a block if
  # you want to act on the matched entries instead of just printing
  # them to standard output. the block takes one argument: the
  # absolute path of the matched entry.
  # 
  # filter_patterns is a regular expression, an array of regular
  # expressions, or nil (which matches every entry).
  # 
  # the options give some additional control over filtering.
  # for details about filter_patterns and options, refer to the
  # filter method.
  #
  # caution: the patterns and actions are not applied to the given root
  # folder itself.
  def traverse(filter_patterns = nil, options = nil, &block)
    trav @dir, filter_patterns, options, &block
  end
  
  private
    def trav(dir, filter_patterns = nil, options = nil, &block)
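      # sort the children so that the traversal order does not depend on the
      # order in which the file system happens to list them
      children = Pathname.new(dir).children.sort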

      children.each do |child|
        if filter(child, filter_patterns, options)
          if block
            block.call child.realpath.to_s
          else
            puts child.realpath.to_s
          end
        end

        if child.exist? and child.directory?
          trav child.realpath.to_s, filter_patterns, options, &block
        end
      end
    end

    # filters the given entry. if entry passed the filter, returns true.
    # otherwise false.
    # 
    # filter_patterns is a regular expression, an array of regular
    # expressions, or nil (which matches every entry).
    # 
    # options is a hash which supports the following options:
    # entry_type:
    #   use this to specify to filter file or directory. if you only want
    #   to do something with files, then use { :entry_type => :file }.
    #   otherwise, use { :entry_type => :dir }. default is nil, which means
    #   either will be okay.
    # exclude_matched:
    #   specify true to indicate that the matched file entries (including
    #   directories) will not pass the filter. this can be used when you want
    #   to do something with most of the entries in your folder but with some
    #   exceptions. default is set to false.
    # for_basename_only:
    #   specify true to match the patterns against the file or directory
    #   name only. the default is false, which means the patterns are
    #   matched against the whole path (which includes the name).
    #
    def filter(entry, filter_patterns = nil, options = nil)
      # defines a series of default options.
      options = {} if options.nil?
      if options[:entry_type] == :file
        return false unless entry.file?
      elsif options[:entry_type] == :dir
        return false unless entry.directory?
      end

      # nil matches everything; a single pattern is wrapped in an array
      filter_patterns = [//] if filter_patterns.nil?
      filter_patterns = [filter_patterns] unless filter_patterns.is_a?(Array)

      filter_patterns.each do |filter_pattern|
        if options[:exclude_matched]
          if options[:for_basename_only]
            return false if entry.basename.to_s =~ filter_pattern
          else
            return false if entry.realpath.to_s =~ filter_pattern
          end
        else
          if options[:for_basename_only]
            return true if entry.basename.to_s =~ filter_pattern
          else
            return true if entry.realpath.to_s =~ filter_pattern
          end
        end
      end
      
      # no pattern matched: the entry passes only in exclude mode
      options[:exclude_matched] ? true : false
    end
end
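
To make the filter rules easier to see in isolation, here is a minimal sketch that applies the same pattern logic to plain path strings instead of real files. `passes_filter?` is a hypothetical helper, not part of FileTree; it leaves out the :entry_type check and uses keyword arguments, so it assumes Ruby 2.0 or later.

```ruby
# Hypothetical helper (not part of FileTree): mirrors the pattern logic of
# FileTree#filter on a plain path string, without touching the file system.
def passes_filter?(path, patterns, exclude_matched: false, for_basename_only: false)
  # nil means "match everything"; a single Regexp is wrapped in an array
  patterns = patterns.nil? ? [//] : Array(patterns)
  subject  = for_basename_only ? File.basename(path) : path
  matched  = patterns.any? { |pattern| subject =~ pattern }
  # in exclude mode an entry passes only when no pattern matched
  exclude_matched ? !matched : matched
end

paths = [
  "/root/test/readme.txt",
  "/root/test/src/Assert.java",
  "/root/test/src/Entry.java"
]
kept = paths.select do |path|
  passes_filter?(path, [/assert/i, /readme/i],
                 exclude_matched: true, for_basename_only: true)
end
puts kept
```

With several patterns, an entry counts as matched as soon as any one of them matches, and :exclude_matched simply inverts that result.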

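It is a bit long, but half of it is comments. If the intent of some part is unclear, refer to the test code below. The tests use a folder named "test_folder" located next to the test file. Its directory structure is as follows: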

C:/netbeans-proj/file_tree/test/test_folder/test
C:/netbeans-proj/file_tree/test/test_folder/test/readme.txt
C:/netbeans-proj/file_tree/test/test_folder/test/src
C:/netbeans-proj/file_tree/test/test_folder/test/src/Assert.java
C:/netbeans-proj/file_tree/test/test_folder/test/src/Entry.java

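To run the tests, you first have to create this file structure. Also make sure that no parent directory of "test_folder" contains the strings src, assert, entry, or readme in its name. Patterns are matched against the full path by default, so such a name would make the tests fail.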
require 'test/unit'
require "file_tree"

class FileTreeTest < Test::Unit::TestCase
  def setup
    # expand to an absolute path, because FileTree reports absolute paths
    @root = File.expand_path("test_folder", File.dirname(__FILE__))
    @file_tree = FileTree.new(@root)
  end
  
  def test_simple_traverse
    output = ""
    @file_tree.traverse do |entry|
      output += "#{entry}\n"
    end
    
    expected_output = <<TAG
#{@root}/test
#{@root}/test/readme.txt
#{@root}/test/src
#{@root}/test/src/Assert.java
#{@root}/test/src/Entry.java
TAG
    assert_equal expected_output, output
  end
  
  def test_entry_type
    output = ""
    @file_tree.traverse(nil, :entry_type => :file) do |file|
      output += "#{file}\n"
    end
    
    expected_output = <<TAG
#{@root}/test/readme.txt
#{@root}/test/src/Assert.java
#{@root}/test/src/Entry.java
TAG
    assert_equal expected_output, output
    
    ##########################################
    
    output = ""
    @file_tree.traverse(nil, :entry_type => :dir) do |dir|
      output += "#{dir}\n"
    end
    
    expected_output = <<TAG
#{@root}/test
#{@root}/test/src
TAG
    assert_equal expected_output, output
  end
  
  def test_exclude_matched
    output = ""
    @file_tree.traverse(nil, :exclude_matched => true) do |entry|
      output += "#{entry}\n"
    end
    assert_equal "", output
    
    ###############################################
    
    output = ""
    @file_tree.traverse(/src/, :exclude_matched => true) do |entry|
      output += "#{entry}\n"
    end
    
    expected_output = <<TAG
#{@root}/test
#{@root}/test/readme.txt
TAG
    assert_equal expected_output, output
  end
  
  def test_for_basename_only
    output = ""
    @file_tree.traverse(/src/, :for_basename_only => true) do |entry|
      output += "#{entry}\n"
    end
    
    expected_output = <<TAG
#{@root}/test/src
TAG
    assert_equal expected_output, output
  end
  
  def test_multiple_patterns
    output = ""
    @file_tree.traverse [ /assert/i, /readme/i ] do |entry|
      output += "#{entry}\n"
    end
    
    expected_output = <<TAG
#{@root}/test/readme.txt
#{@root}/test/src/Assert.java
TAG
    assert_equal expected_output, output
  end
  
  def test_complicated_traverse
    output = ""
    @file_tree.traverse(
      [
        /assert/i,
        /readme/i
      ],
      {
        :entry_type        => :file,
        :exclude_matched   => true,
        :for_basename_only => true
      }
    ) do |file|
      output += "#{file}\n"
    end
    
    expected_output = <<TAG
#{@root}/test/src/Entry.java
TAG
    assert_equal expected_output, output
  end
end
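
The fixture that these tests expect could be created with a sketch like the following. `build_test_fixture` is a hypothetical helper that is not in the original post, and it assumes empty files are enough, since the tests only look at paths.

```ruby
require "fileutils"

# Hypothetical helper: builds the directory layout the tests expect under
# the given root and returns the paths of the files it created.
def build_test_fixture(root)
  src_dir = File.join(root, "test", "src")
  FileUtils.mkdir_p(src_dir) # also creates root and root/test

  files = [
    File.join(root, "test", "readme.txt"),
    File.join(src_dir, "Assert.java"),
    File.join(src_dir, "Entry.java")
  ]
  FileUtils.touch(files) # creates empty files
  files
end

# Example call, run from the directory that contains the test file:
# build_test_fixture(File.expand_path("test_folder"))
```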
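Posted: 2008-06-16
Ruby ships with a find facility (the Find module) that is meant for exactly this kind of path traversal, so there is no need to write your own.
Quote: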

# find.rb: the Find module for processing all files under a given directory.
# The +Find+ module supports the top-down traversal of a set of file paths.
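
As a rough illustration of that suggestion, the picasa.ini / thumbs.db lookup from the first post might be written with the standard Find module like this. `find_by_basename` and its `files_only` parameter are hypothetical names chosen for this sketch, not part of Find.

```ruby
require "find"

# Hypothetical helper: collects the paths under `root` whose base name
# matches any of the given patterns.
def find_by_basename(root, patterns, files_only: true)
  matches = []
  Find.find(root) do |path|
    next if path == root                     # skip the root itself
    next if files_only && !File.file?(path)  # optionally ignore directories
    name = File.basename(path)
    matches << path if patterns.any? { |pattern| name =~ pattern }
  end
  matches
end

# Example call (the directory name is only a placeholder):
# find_by_basename("c:/dummy", [/\Apicasa\.ini\z/i, /\Athumbs\.db\z/i]).each { |f| puts f }
```

Find.find yields the starting directory first and then every entry below it, so the filtering that FileTree does in its filter method becomes a few `next` lines inside the block.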
Posted: 2008-06-16
Thank you very much! I'll just count this one as practice, then...