温柔一刀

8.3. Enumerables in General


What makes a collection enumerable? Largely it is just the fact of being a collection. The Enumerable module requires only that the default iterator each be defined. Ordering as such is not an issue, since even an unordered collection such as a hash can have an iterator.

Additionally, if the methods min, max, and sort are to be used, the elements of the collection must define a comparison method (<=>). This is fairly obvious.

So an enumerable is just a collection that can be searched, traversed, and possibly sorted. As a rule of thumb, any user-defined collection that does not subclass an existing core class should probably mix in the Enumerable module.
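As a minimal sketch of that rule of thumb, here is a hypothetical collection class (Bag is an invented name, not a library class) that defines only each and mixes in Enumerable. The early return of an Enumerator when no block is given is a Ruby 1.9+ idiom and is an assumption beyond the original text.

```ruby
# Bag is a hypothetical collection class used only for illustration.
# It keeps its items in a private array and exposes them through each.
class Bag
  include Enumerable

  def initialize(*items)
    @items = items
  end

  # The one method Enumerable requires: yield each element in turn.
  # Returning an Enumerator when no block is given is a Ruby 1.9+ idiom.
  def each
    return to_enum(:each) unless block_given?
    @items.each {|item| yield item }
    self
  end
end

bag = Bag.new("pear", "apple", "fig", "banana")

p bag.map {|w| w.length }          # [4, 5, 3, 6]
p bag.select {|w| w.length < 4 }   # ["fig"]
p bag.include?("apple")            # true

# min, max, and sort rely on the elements' <=> (here, String#<=>).
p bag.min    # "apple"
p bag.max    # "pear"
p bag.sort   # ["apple", "banana", "fig", "pear"]
```

Nothing beyond each was written by hand; map, select, include?, min, max, and sort all come from the mixin.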

Bear in mind that what we say about one enumerable applies in effect to all of them. The actual data structure could be an array, a hash, or a tree, to name a few.

There are, of course, some nuances of behavior. An array is an ordered collection of individual items, whereas a hash is an unordered collection of paired key-value associations. Naturally there will be differences in their behavior.

Many of the methods we looked at for arrays and/or hashes (such as map and find) really originate here in the Enumerable module. In many cases it was difficult to determine how to cover this material. Any confusion or inaccuracy should be considered my fault.

The array is the most common and representative collection that mixes in this module. Therefore by default I will use it as an example.

8.3.1. The inject Method

The inject method comes to Ruby via Smalltalk (and was introduced in Ruby 1.8). Its behavior is interesting, if a little difficult to grasp at first sight.

This method relies on the fact that frequently we will iterate through a list and "accumulate" a result that changes as we iterate. The most common example, of course, would be finding the sum of a list of numbers. Whatever the operation, there is usually an "accumulator" of some kind (for which we supply an initial value) and a function or operation we apply (represented in Ruby as a block).

For a trivial example or two, suppose that we have this array of numbers and we want to find the sum of all of them:

nums = [3,5,7,9,11,13]
sum = nums.inject(0) {|x,n| x+n }

Note how we start with an accumulator of 0 (the "additive identity"). Then the block is passed the current accumulated value and the current item from the list, and whatever the block returns becomes the accumulator for the next call. In each case, the block takes the previous sum and adds the current item to it.

Obviously, this is equivalent to the following piece of code:

sum = 0
nums.each {|n| sum += n }

So the abstraction level is only slightly higher. If inject never fits nicely in your brain, don't use it. But if you get over the initial confusion, you might find yourself inventing new and elegant ways to use it.
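If the flow of values is still hard to picture, one way to see it is to have the block report each step. This is just the sum example above with a puts added for illustration.

```ruby
nums = [3,5,7,9,11,13]

# Same sum as before, but the block reports each step so we can
# watch the accumulator (x) carry forward from one call to the next.
sum = nums.inject(0) do |x,n|
  puts "accumulator=#{x}, item=#{n}, new value=#{x+n}"
  x + n   # this value becomes x on the next call
end

puts sum

# Output:
# accumulator=0, item=3, new value=3
# accumulator=3, item=5, new value=8
# accumulator=8, item=7, new value=15
# accumulator=15, item=9, new value=24
# accumulator=24, item=11, new value=35
# accumulator=35, item=13, new value=48
# 48
```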

The accumulator value is optional. If it is omitted, the first item is used as the initial accumulator and is then skipped in the iteration.

sum = nums.inject {|x,n| x+n }

# Means the same as:

sum = nums[0]
nums[1..-1].each {|n| sum += n }

A similar example is finding the product of the numbers. Note that the accumulator, if given, must be 1 since that is the "multiplicative identity."

prod = nums.inject(1) {|x,n| x*n }

# or:

prod = nums.inject {|x,n| x*n }
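Two details are worth a short sketch. First, in Ruby 1.8.7 and later, inject also accepts the name of a method as a symbol instead of a block. Second, whether you supply the initial accumulator matters when the collection might be empty: with no seed there is no first item to start from, so the result is nil.

```ruby
nums = [3,5,7,9,11,13]

# Ruby 1.8.7 and later accept the operator name as a symbol.
sum  = nums.inject(:+)        # 48
prod = nums.inject(1, :*)     # 135135

# The seed matters when the collection might be empty.
empty = []
p empty.inject(:+)               # nil (no seed, nothing to start from)
p empty.inject(0, :+)            # 0   (the seed is returned unchanged)
p empty.inject {|x,n| x+n }      # nil
```

If an empty list is possible, passing an explicit identity (0 for sums, 1 for products) is the safer choice.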

The following slightly more complex example takes a list of words and finds the longest word in the list:

words = %w[ alpha beta gamma delta epsilon eta theta ]
longest_word = words.inject do |best,w|
  w.length > best.length ? w : best
end
# return value is "epsilon"
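The accumulator need not be a number or an element of the list. As an illustrative sketch, here it is a hash that counts how many of the same words have each length. Note that the block must end by returning the hash; if it returned the result of the += instead, the next call would receive an integer.

```ruby
words = %w[ alpha beta gamma delta epsilon eta theta ]

# Hash.new(0) makes missing keys read as 0, so += 1 works on first sight.
counts = words.inject(Hash.new(0)) do |tally, w|
  tally[w.length] += 1
  tally            # the block's value becomes the next accumulator
end

# counts[5] is 4 (alpha, gamma, delta, theta)
# counts[4] is 1 (beta), counts[7] is 1 (epsilon), counts[3] is 1 (eta)
```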

8.3.2. Using Quantifiers

The quantifiers any? and all? were added in Ruby 1.8 to make it easier to test the nature of a collection. Each of these takes a block (which of course tests true or false).

nums = [1,3,5,8,9]

# Are any of these numbers even?
flag1 = nums.any? {|x| x % 2 == 0 }    # true

# Are all of these numbers even?
flag2 = nums.all? {|x| x % 2 == 0 }    # false

In the absence of a block, these simply test the truth value of each element. That is, a block {|x| x } is added implicitly.

list = [3, "x", nil]
flag1 = list.all?   # false (true only if list contains no falses or nils)
flag2 = list.any?   # true  (true if list contains at least one value
                    #   that is neither nil nor false)
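Ruby 1.8.7 and later round out the set with none? and one?. It is also worth knowing how the quantifiers treat an empty collection: all? is vacuously true and any? is false. A short sketch:

```ruby
nums = [1,3,5,8,9]

# none? and one? complete the set of quantifiers.
flag3 = nums.none? {|x| x > 10 }       # true: no element exceeds 10
flag4 = nums.one? {|x| x % 2 == 0 }    # true: exactly one even number (8)

# On an empty collection, all? is vacuously true and any? is false.
empty = []
p empty.all? {|x| x > 0 }   # true
p empty.any? {|x| x > 0 }   # false
```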

8.3.3. The partition Method

As the saying goes, "There are two kinds of people in the world: those who divide people into two kinds, and those who don't." The partition method doesn't deal with people (unless we can encode them as Ruby objects), but it does divide a collection into two parts.

When partition is called and passed a block, the block is evaluated for each element in the collection. The truth value of each result is then evaluated, and a pair of arrays (inside another array) is returned. All the elements resulting in true go in the first array; the others go in the second.

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]

odd_even = nums.partition {|x| x % 2 == 1 }
# [[1,3,5,7,9],[2,4,6,8]]

under5 = nums.partition {|x| x < 5 }
# [[1,2,3,4],[5,6,7,8,9]]

squares = nums.partition {|x| Math.sqrt(x).to_i**2 == x }
# [[1,4,9],[2,3,5,6,7,8]]
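Because partition always returns arrays, calling it on a hash gives two arrays of [key, value] pairs rather than two hashes. A sketch with made-up data, assuming Ruby 1.9 or later (where hashes preserve insertion order, so the output order shown is predictable); multiple assignment splits the result neatly:

```ruby
# Made-up inventory data for illustration.
stock = {"apples"=>0, "pears"=>12, "plums"=>0, "figs"=>5}

# The block receives each pair; |name, qty| unpacks it.
in_stock, sold_out = stock.partition {|name, qty| qty > 0 }
# in_stock: [["pears", 12], ["figs", 5]]
# sold_out: [["apples", 0], ["plums", 0]]

# Hash[] turns an array of pairs back into a hash if needed.
in_stock_hash = Hash[in_stock]
```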

If we wanted to partition into more than two groups, we'd have to write our own simple method for that. I will call this classify after the method in the Set class.

module Enumerable
  def classify(&block)
    hash = {}
    self.each do |x|
      result = block.call(x)
      (hash[result] ||= []) << x
    end
    hash
  end
end

nums = [1,2,3,4,5,6,7,8,9]
mod3 = nums.classify {|x| x % 3 }
# { 0=>[3,6,9], 1=>[1,4,7], 2=>[2,5,8] }

words = %w[ area arboreal brick estrous clear donor ether filial
patina ]
vowels = words.classify {|x| x.count("aeiou") }
# {1=>["brick"], 2=>["clear", "donor", "ether"],
#  3=>["area", "estrous", "filial", "patina"], 4=>["arboreal"]}

initials = words.classify {|x| x[0..0] }
# {"a"=>["area", "arboreal"], "b"=>["brick"], "c"=>["clear"],
#  "d"=>["donor"], "p"=>["patina"], "e"=>["estrous", "ether"],
#  "f"=>["filial"]}
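In Ruby 1.8.7 and later, the core method group_by builds the same kind of hash as the classify method above, so the hand-written version is mainly of interest on older Rubies. The Set class's own classify, for comparison, returns a hash whose values are sets:

```ruby
require 'set'

nums = [1,2,3,4,5,6,7,8,9]

# group_by: keys are block results, values are arrays of elements.
mod3 = nums.group_by {|x| x % 3 }
# mod3[0] == [3,6,9], mod3[1] == [1,4,7], mod3[2] == [2,5,8]

# Set#classify: same idea, but each group is a Set.
sets = Set.new(nums).classify {|x| x % 3 }
# sets[0] == Set[3,6,9]
```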

8.3.4. Iterating by Groups

In every case we've seen so far, we iterate over a list a single item at a time. However, there might be times we want to grab these in pairs or triples or some other quantity.

The iterator each_slice takes a parameter n and iterates over that many elements at a time. (To use this, we need the enumerator library.) If there are not enough items left to form a slice, that slice will be smaller in size.

require 'enumerator'

arr = [1,2,3,4,5,6,7,8,9,10]
arr.each_slice(3) do |triple|
  puts triple.join(",")
end

# Output:
# 1,2,3
# 4,5,6
# 7,8,9
# 10
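One handy use of each_slice is regrouping a flat list of alternating keys and values. This sketch assumes Ruby 1.9 or later, where each_slice called without a block returns an Enumerator that can be turned into an array directly; the color data is made up.

```ruby
# A flat list of alternating keys and values (made-up data).
flat = ["red", 1, "green", 2, "blue", 3]

# Without a block, each_slice returns an Enumerator over the slices.
pairs  = flat.each_slice(2).to_a   # [["red", 1], ["green", 2], ["blue", 3]]
colors = Hash[pairs]               # colors["green"] is 2
```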

There is also the possibility of iterating with a "sliding window" of the given size with the each_cons iterator. (If this name seems unintuitive, it is part of the heritage of Lisp.) In this case, the slices will always be the same size.

require 'enumerator'

arr = [1,2,3,4,5,6,7,8,9,10]
arr.each_cons(3) do |triple|
  puts triple.join(",")
end

# Output:
# 1,2,3
# 2,3,4
# 3,4,5
# 4,5,6
# 5,6,7
# 6,7,8
# 7,8,9
# 8,9,10
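A sliding window is exactly what a moving average needs. As an illustrative sketch with made-up temperature readings, each window of three consecutive values is averaged; to_f avoids integer division.

```ruby
temps = [20, 22, 25, 23, 21, 19]   # made-up daily readings

# Three-day moving average: one result per window of three readings.
averages = []
temps.each_cons(3) do |window|
  averages << window.inject(0) {|sum, t| sum + t } / window.size.to_f
end

# averages holds 4 values: about 22.33, 23.33, then 23.0 and 21.0
```

Six readings give four windows, which illustrates why each_cons never produces a short trailing slice.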

8.3.5. Converting to Arrays or Sets

Every enumerable can in theory be converted trivially to an array (by using to_a). For example, a hash results in a nested array of pairs:

hash = {1=>2, 3=>4, 5=>6}
arr  =  hash.to_a           #  [[5, 6], [1, 2], [3, 4]]  (order may vary)

The method entries is an alias for the to_a method.

If the set library has been required, there will also be a to_set method that works as expected. See section 9.1, "Working with Sets," for a discussion of sets.

require 'set'
hash = {1=>2, 3=>4, 5=>6}
set = hash.to_set           # #<Set: {[1, 2], [3, 4], [5, 6]}>
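Converting to a set is a quick way to discard duplicates or to get fast membership tests, and it works on any enumerable, not just hashes. A small sketch with made-up data:

```ruby
require 'set'

# Duplicates disappear when the array becomes a set.
tags   = %w[ ruby perl ruby python perl ruby ]
unique = tags.to_set       # 3 elements: "ruby", "perl", "python"

# Ranges (and the results of select, map, etc.) convert the same way.
evens = (1..10).select {|x| x % 2 == 0 }.to_set
```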

8.3.6. Using Enumerator Objects

An Enumerator object is basically a wrapper that turns an iterator method into a full-fledged Enumerable. After being wrapped in this way, it naturally has all the usual methods and features available to it.

In this contrived example, class Foo has an iterator but nothing else. In fact, the iterator itself does nothing but four yield operations. To further clarify how this works, the iterator is named every rather than each:

require 'enumerator'

class Foo
  def every
    yield 3
    yield 2
    yield 1
    yield 4
  end
end

foo = Foo.new

# Pass in the object and the iterator name...
enum = Enumerable::Enumerator.new(foo,:every)

enum.each {|x| p x }     # Print out the items
array = enum.to_a        # [3,2,1,4]
sorted = enum.sort       # [1,2,3,4]

If this conversion seems puzzling to you, it is essentially the same as this:

enum = []
foo.every {|x| enum << x }

In the previous example, enum is a real array, not just an Enumerator object. So although there are subtle differences, this is another way to convert an object to an Enumerable.

If enumerator is required, Object will have an enum_for method. So the object instantiation in the first example could also be written more compactly:

enum = foo.enum_for(:every)
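The examples in this section use the Ruby 1.8 enumerator library. In Ruby 1.9 and later, the wrapper class is the core Enumerator (no require needed), and to_enum or enum_for is the usual way to create one. A sketch of the same Foo example under that assumption:

```ruby
class Foo
  def every
    yield 3
    yield 2
    yield 1
    yield 4
  end
end

foo = Foo.new

# Ruby 1.9+: wrap the every iterator in a core Enumerator.
enum = foo.to_enum(:every)

array  = enum.to_a                    # [3, 2, 1, 4]
sorted = enum.sort                    # [1, 2, 3, 4]
evens  = enum.select {|x| x.even? }   # [2, 4]
first2 = enum.first(2)                # [3, 2]
```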

We've already seen that we can iterate over groups with each_slice and each_cons. As it turns out, there are special methods enum_slice and enum_cons that will create enumerator objects using these iterators (in effect transforming the iterator name to each). Bear in mind that Enumerable::Enumerator.new and enum_for can both take an optional list of arguments at the end. Here we use that fact to pass in the "window size" to the iterator:

array = [5,3,1,2]

discrete = array.enum_slice(2)
# Same as: Enumerable::Enumerator.new(array,:each_slice,2)

overlap  = array.enum_cons(2)
# Same as: Enumerable::Enumerator.new(array,:each_cons,2)

discrete.each {|x| puts x.join(",") }
# Output:
# 5,3
# 1,2

overlap.each {|x| puts x.join(",") }
# Output:
# 5,3
# 3,1
# 1,2
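For comparison, in Ruby 1.9 and later enum_slice and enum_cons are no longer available; calling each_slice or each_cons without a block returns an Enumerator instead, which can then be chained with other Enumerable methods. A sketch under that assumption:

```ruby
array = [5,3,1,2]

# Without a block, these return Enumerator objects.
discrete = array.each_slice(2)
overlap  = array.each_cons(2)

p discrete.to_a   # [[5, 3], [1, 2]]
p overlap.to_a    # [[5, 3], [3, 1], [1, 2]]

# Each window arrives as an array, which |a, b| unpacks.
sums = overlap.map {|a, b| a + b }   # [8, 4, 3]
```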

8.3.7. Using Generator Objects

The idea of a generator is interesting. The normal Ruby iterator is an internal iterator; the iterator drives the logic by repeatedly calling the code block.

There is also an external iterator, where the code drives the logic, and the iterator provides data items "on demand" rather than on its own precise schedule.

By analogy, think of gets as providing an external iterator onto an IO object. You call it at will, and it provides you data. Contrast that with the internal iterator each_line, which simply passes each line in succession into the code block.
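The contrast can be sketched with a StringIO standing in for a file (the text is made up). With each_line the iterator decides when the block runs; with gets our own loop decides when to ask for the next line, and it can stop and resume whenever it likes.

```ruby
require 'stringio'

text = "first\nsecond\nthird\n"

# Internal iteration: each_line drives, calling the block for every line.
internal = []
StringIO.new(text).each_line {|line| internal << line.chomp }

# External iteration: our loop drives, pulling a line only when it wants one.
# gets returns nil at end of input.
io = StringIO.new(text)
external = []
while (line = io.gets)
  external << line.chomp
  break if line.start_with?("second")   # stop early; io remembers its place
end

# internal is ["first", "second", "third"]; external is ["first", "second"],
# and a later io.gets would still return "third\n".
```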

Sometimes internal iterators are not appropriate to the problem at hand. There is always a valid solution, but it may not always be convenient. Sometimes an external iterator is more convenient.

The generator library simply enables the conversion from an internal iterator to an external one. It provides an IO-like interface with methods such as next, rewind, and end?. Here's an example:

require 'generator'

array = [7,8,9,10,11,12]

gen = Generator.new(array)

what  = gen.current    # 7
where = gen.index      # 0  (same as pos)

while gen.next? and gen.current < 11
  gen.next
end

puts gen.current       # 11
puts gen.next          # 11
puts gen.index         # 5      (index same as pos)
puts gen.next?         # true   (next? is the opposite of end?)
puts gen.next          # 12
puts gen.next?         # false

Note how we can "read" through the collection an item at a time at will, using one loop or multiple loops. The end? method detects the end of the collection; the generator literally raises an EOFError if you ignore this and call next anyway. Its counterpart next? returns the opposite value: it is true as long as items remain.

The index method (alias pos) tells us our index or position in the collection. Naturally it is indexed from zero just like an array or file offset.

The current and next methods may be a little unintuitive. Imagine an implicit "get" done at the beginning so that the current item is the same as the next item. Obviously, next advances the pointer, whereas current does not.

Because many collections can only move forward by their nature, the generator behaves the same way. There is no prev method; in theory there could be, but it would not always apply. The rewind method will reset to the beginning if needed.

The real drawback to the generator library is that it is implemented with continuations. In all current versions of Ruby, these are computationally expensive, so large numbers of repetitions might expose the slowness.
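The generator library was dropped in Ruby 1.9. In 1.9 and later, an Enumerator (for example, the one returned by calling each without a block) supports external iteration directly. In this sketch, peek plays the role of current, and running past the end raises StopIteration rather than EOFError. This is a different API from Generator, not a drop-in replacement; there is no end? method, for instance.

```ruby
array = [7,8,9,10,11,12]

gen = array.each          # an Enumerator over the array

# Advance until the next value is 11 (peek raises StopIteration at the end).
while gen.peek < 11
  gen.next
end

current = gen.peek        # 11 (like current: does not advance)
a = gen.next              # 11 (advances)
b = gen.next              # 12

# Reading past the end raises StopIteration.
done = false
begin
  gen.next
rescue StopIteration
  done = true
end

# rewind starts over; Kernel#loop rescues StopIteration, ending the loop.
gen.rewind
seen = []
loop { seen << gen.next }
```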
