论坛首页 编程语言技术论坛

只想与你深发展..

浏览 5498 次
该帖已经被评为良好帖
作者 正文
   发表时间:2009-06-14  
先来个老段子:
引用

自从深发展银行推出那条知性的广告语“只想与你深发展”后,银行业内人士又自编出了更知性的姊妹篇:“光大是不行的”


正题:研究一下ruby中深拷贝(deep clone)的问题。

1.=
ruby里拷贝对象的最简单方法,下面来看一下效果:
irb(main):002:0> a = "Hooopo"
=> "Hooopo"
irb(main):003:0> b = a
=> "Hooopo"
irb(main):004:0> b.object_id
=> 23424840
irb(main):005:0> a.object_id
=> 23424840
irb(main):006:0> b.gsub!("o","-")
=> "H---p-"
irb(main):007:0> p a
"H---p-"
=> nil
irb(main):008:0> p b
"H---p-"
=> nil

b修改后,a也跟着修改了,并且a和b的object_id一样,就是同一个对象嘛~
2.dup和clone(dup和clone差不多,但是还是有一些区别的)
irb(main):001:0> a = "Hooopo"
=> "Hooopo"
irb(main):002:0> b = a.dup
=> "Hooopo"
irb(main):003:0> a.object_id
=> 23428740
irb(main):004:0> b.object_id
=> 23425010
irb(main):005:0> b.gsub!("o","-")
=> "H---p-"
irb(main):006:0> p b
"H---p-"
=> nil
irb(main):007:0> p a
"Hooopo"
=> nil
irb(main):008:0>

b和a是不同对象,并且改变b后a不改变。这是我们想要的。但是别高兴太早,看下面例子:
class Klass
    attr_accessor :str
 end
 s1 = Klass.new     
 s1.str = "Hooopo"   
 p s1
 s2 = s1.dup       
 p s2
 s2.str.gsub!("o","-")   
 p s1         
 p s2  
#results
#<Klass:0x2d2cd30 @str="Hooopo">
#<Klass:0x2d2cc7c @str="Hooopo">
#<Klass:0x2d2cd30 @str="H---p-">
#<Klass:0x2d2cc7c @str="H---p-">  

显然,问题又有了,改变s2后s1也跟着改变了。同样的陷阱也发生在Array和Hash里,看下面代码:
irb(main):008:0> arr = [1,[1,1]]
=> [1, [1, 1]]
irb(main):009:0> arr_dup = arr.dup
=> [1, [1, 1]]
irb(main):010:0> arr_dup[0] = 2
=> 2
irb(main):011:0> arr
=> [1, [1, 1]]
irb(main):012:0> arr_dup[1][0] = 2
=> 2
irb(main):013:0> arr
=> [1, [2, 1]]

在第一次给arr_dup[0]复值的时候arr没有改变,而第二次改变arr_dup[1][0]的时候arr也跟着改变了。就是说Array#dup只拷贝了一层,还不够深入呀..同样Hash也是,看下面代码:
irb(main):001:0> hash = {:key => {:key => "value"}}
=> {:key=>{:key=>"value"}}
irb(main):002:0> hash_dup = hash.dup
=> {:key=>{:key=>"value"}}
irb(main):003:0> hash_dup[:key][:key] = "value_changed"
=> "value_changed"
irb(main):004:0> hash_dup
=> {:key=>{:key=>"value_changed"}}
irb(main):005:0> hash
=> {:key=>{:key=>"value_changed"}}

3.用序列化实现深拷贝
irb(main):014:0> arr = [1,[1,1]]
=> [1, [1, 1]]
irb(main):015:0> arr_dump = Marshal.load(Marshal.dump(arr))
=> [1, [1, 1]]
irb(main):016:0> arr_dump.object_id
=> 22807940
irb(main):017:0> arr.object_id
=> 22850200
irb(main):018:0> arr_dump[1][1] = 2
=> 2
irb(main):019:0> arr_dump
=> [1, [1, 2]]
irb(main):020:0> arr
=> [1, [1, 1]]

irb(main):012:0> hash = {:key => {:key => "value"}}
=> {:key=>{:key=>"value"}}
irb(main):013:0> hash_dump = Marshal.load(Marshal.dump(hash))
=> {:key=>{:key=>"value"}}
irb(main):014:0> hash.object_id
=> 22790990
irb(main):015:0> hash_dump.object_id
=> 22755440
irb(main):016:0> hash_dump[:key][:key] = "value not changed"
=> "value not changed"
irb(main):017:0> hash
=> {:key=>{:key=>"value"}}
irb(main):018:0> hash_dump
=> {:key=>{:key=>"value not changed"}}


情况似乎好多了。但是还有一个问题,就是Marshal只能序列化一般对象,数组哈希,高级一些的对象不能序列化(IO,Proc,singleton等)

下面是两个deep clone的实现(via:http://www.artima.com/forums/flat.jsp?forum=123&thread=40913)
class Object
      def deep_clone
        Marshal::load(Marshal.dump(self))
      end
end

   class Object
      def dclone
        case self
          when Fixnum,Bignum,Float,NilClass,FalseClass,
               TrueClass,Continuation
            klone = self
          when Hash
            klone = self.clone
            self.each{|k,v| klone[k] = v.dclone}
          when Array
            klone = self.clone
            klone.clear
            self.each{|v| klone << v.dclone}
          else
            klone = self.clone
        end
        klone.instance_variables.each {|v|
          klone.instance_variable_set(v,
            klone.instance_variable_get(v).dclone)
        }
        klone
      end
    end

更复杂的(via:http://d.hatena.ne.jp/pegacorn/20070417/1176817721)...
class Object
  def deep_clone
    _deep_clone({})
  end

  protected
  def _deep_clone(cloning_map)
    return cloning_map[self] if cloning_map.key? self
    cloning_obj = clone
    cloning_map[self] = cloning_obj
    cloning_obj.instance_variables.each do |var|
      val = cloning_obj.instance_variable_get(var)
      begin
        val = val._deep_clone(cloning_map)
      rescue TypeError
        next
      end
      cloning_obj.instance_variable_set(var, val)
    end
    cloning_map.delete(self)
  end
end

class Array
  protected
  def _deep_clone(cloning_map)
    return cloning_map[self] if cloning_map.key? self
    cloning_obj = super
    cloning_map[self] = cloning_obj
    cloning_obj.map! do |val|
      begin
        val = val._deep_clone(cloning_map)
      rescue TypeError
        #
      end
      val
    end
    cloning_map.delete(self)
  end
end

class Hash
  protected
  def _deep_clone(cloning_map)
    return cloning_map[self] if cloning_map.key? self
    cloning_obj = super
    cloning_map[self] = cloning_obj
    pairs = cloning_obj.to_a
    cloning_obj.clear
    pairs.each do |pair|
      pair.map! do |val|
        begin
          val = val._deep_clone(cloning_map)
        rescue TypeError
          #
        end
        val
      end
      cloning_obj[pair[0]] = pair[1]
    end
    cloning_map.delete(self)
  end
end

Something about dup and clone
1.ruby字面量(Fixnum,true,false,nil,Symbol)不可以调用dup和clone方法,会报TypeError。
2.对对象的state(taint,frozen)的改变:dup会把frozen的对象unfrozen,clone不会。
irb(main):019:0> o = Object.new
=> #<Object:0x2b7855c>
irb(main):020:0> o.taint
=> #<Object:0x2b7855c>
irb(main):021:0> o.freeze
=> #<Object:0x2b7855c>
irb(main):024:0> [o.frozen?,o.tainted?]
=> [true, true]
irb(main):025:0> o_clone = o.clone
=> #<Object:0x2b44c0c>
irb(main):026:0> [o_clone.frozen?,o_clone.tainted?]
=> [true, true]
irb(main):027:0> o_dup = o.dup
=> #<Object:0x2b32e6c>
irb(main):028:0> [o_dup.frozen?,o_dup.tainted?]
=> [false, true]


3.对单体方法的拷贝:clone会连同单体方法一起拷贝,dup不会。
irb(main):029:0> o = Object.new
=> #<Object:0x2b256a4>
irb(main):030:0> def o.say
irb(main):031:1>  puts "Hello,Hooopo"
irb(main):032:1> end
=> nil
irb(main):033:0> o_dup = o.dup
=> #<Object:0x296acec>
irb(main):035:0> o_clone = o.clone
=> #<Object:0x2dba194>
irb(main):037:0> o_dup.say
NoMethodError: undefined method `say' for #<Object:0x296acec>
        from (irb):37
        from :0
irb(main):038:0> o_clone.say
Hello,Hooopo



   发表时间:2009-06-14  
Array、Hash、实例变量 已经足够表达数据结构了。 data 易 dump,class 不易 dump ……

现在最想要的是 Binding 和 Proc 的 Marshal.dump, 或者 dump module 也行……
0 请登录后投票
   发表时间:2009-06-14  
night_stalker 写道
Array、Hash、实例变量 已经足够表达数据结构了。 data 易 dump,class 不易 dump ……

现在最想要的是 Binding 和 Proc 的 Marshal.dump, 或者 dump module 也行……


|I can understand why Thread and the IOs can't be dumped (because there
|are underlying operating system structures associated with them), but
|can't see why Proc, Continuation and Method couldn't be dumped/loaded ...
|although I could see that it might not be possible in the special case
|where their bindings included a Thread or an IO of some form.
|
|Is it just that it's considered too hard to get right (and I'm not saying
|it would be easy!) or is there some other reason?
Procs, Bindings and Continuations contain references to C stack
information which is not portable.  Methods have references to C
function.  All of above information is not portable.
                                                        matz.

 

0 请登录后投票
   发表时间:2009-06-14  
http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/8c08df930291a2e9/3cccf36faad55d8c?hl=zh-cn&lnk=gst&q=dump+module#3cccf36faad55d8c

How about the following?  The only major disadvantage is that it will be a
little bit slower than using a real Proc:
require 'delegate' 
class DumpableProc < DelegateClass(Proc) 
  def initialize(str) 
    eval "@proc = proc { #{str} }" 
    @str = str 
    super(@proc) 
  end 
  def _dump(limit) 
    @str 
  end 
  def self._load(str) 
    self.new(str) 
  end 
end 
dp = DumpableProc.new("puts 'foo!'") 
dp.call() 
dump_str = Marshal.dump(dp) 
puts dump_str 
dp2 = Marshal.load(dump_str) 
dp2.call() 

0 请登录后投票
   发表时间:2009-06-14  
1.8 可以用:
require 'ruby2ruby'
module X
  def x
    'x'
  end
end
Ruby2Ruby.translate X
#=> "module X\n  def x\n    \"x\"\n  end\nend"


但是 1.9 不能用……
0 请登录后投票
   发表时间:2009-12-10   最后修改:2009-12-10
Hooopo分析真是透彻。看下api。
Produces a shallow copy of obj—the instance variables of obj are copied, but not the objects they reference. dup copies the tainted state of obj. See also the discussion under Object#clone. In general, clone and dup may have different semantics in descendent classes. While clone is used to duplicate an object, including its internal state, dup typically uses the class of the descendent object to create the new instance.
分析下。
1,Produces a shallow copy of obj—the instance variables of obj are copied, but not the objects they reference.
借用下Hooopo的代码。希望不要介意。
irb(main):001:0> a = "Hooopo"
=> "Hooopo"
irb(main):002:0> b = a.dup
=> "Hooopo"
irb(main):003:0> a.object_id
=> 23428740
irb(main):004:0> b.object_id
=> 23425010
irb(main):005:0> b.gsub!("o","-")
=> "H---p-"
irb(main):006:0> p b
"H---p-"
=> nil
irb(main):007:0> p a
"Hooopo"
=> nil
irb(main):008:0>

b对象,存储的"Hooopo",b改变,所以a不会变。
class Klass
    attr_accessor :str
 end
 s1 = Klass.new     
 s1.str = "Hooopo"   
 p s1
 s2 = s1.dup       
 p s2
 s2.str.gsub!("o","-")   
 p s1         
 p s2  
#results
#<Klass:0x2d2cd30 @str="Hooopo">
#<Klass:0x2d2cc7c @str="Hooopo">
#<Klass:0x2d2cd30 @str="H---p-">
#<Klass:0x2d2cc7c @str="H---p-">  

s1,s2虽然是不同的对象,但是对象内储的却是Klass一个实例的地址。所以会改变。
irb(main):008:0> arr = [1,[1,1]]
=> [1, [1, 1]]
irb(main):009:0> arr_dup = arr.dup
=> [1, [1, 1]]
irb(main):010:0> arr_dup[0] = 2
=> 2
irb(main):011:0> arr
=> [1, [1, 1]]
irb(main):012:0> arr_dup[1][0] = 2
=> 2
irb(main):013:0> arr
=> [1, [2, 1]]

attr数组内存储的是1和[2,1]的地址,所以attr[0]改变不会受影响,而attr[1]则会受影响。
2,While clone is used to duplicate an object, including its internal state, dup typically uses the class of the descendent object to create the new instance.
最后这句是关键。clone是在复制对象,包括对象的状态,而dup则是从其派生类创建一个新的实例。所以对于attr=[1, [1, 1]],attr[0]存的是integer,会创建integer类型,attr[1]是一个数组类型,dup只能简单的复制attr[1]中用来保存[1,1]的地址的类的一个新实例。
对于hash,或者更复杂的类,我想原理应该是相同的。
有什么不对的,还希望能指出来。
0 请登录后投票
论坛首页 编程语言技术版

跳转论坛:
Global site tag (gtag.js) - Google Analytics