
Finding and Replacing Chinese Characters in UTF-8 Files with Ruby

Posted: 2011-01-25   Last edited: 2011-01-27
Preconditions:
1. The source Java files are saved as UTF-8.
2. The source Java files contain some Chinese words in comments, used for some test data.
3. My Ruby code file is saved as UTF-8 as well.

Requirements:
1. Find every '克' in the Java files under a folder and replace it with '可'.
2. The user runs replace.rb from the command line like this:
replace <dir name>

Code: replace.rb
# encoding: utf-8
##############################################################
dName   = ARGV[0]  # Directory to process
srcStr  = '克'     # srcStr = ARGV[1].dup.force_encoding('utf-8')
destStr = '可'     # destStr = ARGV[2].dup.force_encoding('utf-8')
##############################################################
Dir.chdir(dName)
Dir["**/*.java"].each do |file|

  # Keep the original beside the new file as <name>.org.
  # (File.basename would drop the subdirectory and move the backup to dName.)
  oldFileName = file.sub(/\.java\z/, ".org")
  File.rename(file, oldFileName)

  puts "........................... #{File.basename(file)}"
  # The block form closes both files even if an error is raised.
  File.open(oldFileName, "r") do |oldFile|
    File.open(file, "w") do |newFile|
      oldFile.each_line do |line|
        line.force_encoding('utf-8')  # relabel the bytes as UTF-8; nothing is converted
        # Plain string matching, so regexp metacharacters in srcStr are taken literally
        puts oldFile.lineno, line if line.include?(srcStr)
        newFile.puts line.gsub(srcStr, destStr)
      end
    end
  end

end


Issue:
I'm new to Ruby, so the coding style and error handling are lacking.
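
As one way to tighten the error handling, here is a minimal sketch that checks its arguments, skips files that are not valid UTF-8, and copies each original to a backup before rewriting it. The method name replace_in_java_files is hypothetical and not part of the original script; reading whole files into memory is also an assumption that suits small source files.

```ruby
# encoding: utf-8
require 'fileutils'

# Replace src with dest in every .java file under dir (recursively).
# The untouched original is kept beside each changed file as <name>.org.
# Returns the paths of the files that were rewritten.
def replace_in_java_files(dir, src, dest)
  raise ArgumentError, "not a directory: #{dir.inspect}" unless dir && File.directory?(dir)
  raise ArgumentError, 'search string must not be empty' if src.nil? || src.empty?
  raise ArgumentError, 'replacement string is missing' if dest.nil?

  changed = []
  Dir.glob(File.join(dir, '**', '*.java')).sort.each do |path|
    # Declare the encoding on open so the text is tagged UTF-8 from the start.
    text = File.open(path, 'r:UTF-8') { |f| f.read }
    unless text.valid_encoding?
      warn "skipped (not valid UTF-8): #{path}"
      next
    end
    next unless text.include?(src)

    FileUtils.cp(path, path.sub(/\.java\z/, '.org'))  # back up before rewriting
    File.open(path, 'w:UTF-8') { |f| f.write(text.gsub(src, dest)) }
    changed << path
  end
  changed
end
```

It could be called as replace_in_java_files(ARGV[0], '克', '可'). Files without a match are left alone, so no .org copy is created for them.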

With the code above, I could not pass the Chinese words as command-line arguments, like: replace <dir name> <sourceString> <replaceString>
Example: replace <dir name> "克" "可"
This doesn't work, and I don't know the cause so far.

Updated on 26/01/2011
I found the solution:
Change the source code from
srcStr = ARGV[1].dup.force_encoding('utf-8')
destStr = ARGV[2].dup.force_encoding('utf-8')

to
srcStr = ARGV[1].dup.encode('utf-8')
destStr = ARGV[2].dup.encode('utf-8')


With this change, the parameters are passed in and converted to UTF-8 successfully.
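
A possible explanation, assuming the console hands Ruby the arguments in a non-UTF-8 encoding such as GBK (the post does not say which platform or code page was in use): force_encoding only changes the label on the existing bytes, while encode converts the bytes themselves. A minimal sketch of the difference, with the GBK argument simulated by a hypothetical helper:

```ruby
# encoding: utf-8

# Simulate a command-line argument that arrives as GBK bytes.
# GBK is an assumption for illustration; the post does not name the code page.
def simulated_gbk_arg(text)
  text.encode('GBK')
end

arg = simulated_gbk_arg('克')

# force_encoding keeps the bytes and only swaps the label,
# so the result is not valid UTF-8.
relabelled = arg.dup.force_encoding('UTF-8')
puts relabelled.valid_encoding?

# encode converts the bytes from GBK to UTF-8, giving a usable string.
transcoded = arg.encode('UTF-8')
puts transcoded.valid_encoding?
puts transcoded == '克'
```

If the arguments already arrive as UTF-8, encode('utf-8') leaves them unchanged, so it should be safe in both situations.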

Good article:
http://blog.grayproductions.net/articles/ruby_19s_string

But later I tried to change the code
line.force_encoding('utf-8')

to
line.encode('utf-8')


It fails with an error again.
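
A possible explanation, assuming the file is opened without an explicit encoding on a system whose default external encoding is not UTF-8: each line then holds UTF-8 bytes under a different label (GBK is assumed here), so encode tries to convert bytes that are not valid in that source encoding. The sketch below reproduces this with a mislabelled string and shows one alternative, declaring the encoding in the open mode ('r:UTF-8') so that no per-line call is needed. The helper names try_encode_utf8 and read_utf8_lines are hypothetical.

```ruby
# encoding: utf-8
require 'tempfile'

# Try to transcode a line to UTF-8; return nil if Ruby rejects the bytes.
def try_encode_utf8(line)
  line.encode('UTF-8')
rescue EncodingError
  nil
end

# Read all lines with the external encoding declared up front.
def read_utf8_lines(path)
  File.open(path, 'r:UTF-8') { |f| f.readlines }
end

# UTF-8 bytes carrying a GBK label, as a line might when the default
# external encoding is GBK (an assumption; the post does not state it).
mislabelled = "// 克\n".dup.force_encoding('GBK')
puts try_encode_utf8(mislabelled).inspect

# Relabelling works here because the bytes already are UTF-8.
puts mislabelled.dup.force_encoding('UTF-8').valid_encoding?

# With the encoding declared on open, lines come back tagged as UTF-8.
Tempfile.create(['Sample', '.java']) do |tmp|
  tmp.binmode
  tmp.write("// 克\n")
  tmp.flush
  puts read_utf8_lines(tmp.path).first.encoding
end
```

In replace.rb this would mean opening the source with File.open(oldFileName, "r:utf-8") and dropping the line.force_encoding call.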