A song search-and-download tool written in Ruby

Posted: 2008-04-15
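A few days ago I wrote a GUI search-and-download tool in Java, built mainly on the results of baidu mp3 search. david wrote a similar command-line download tool in Perl, and to practice Ruby I have now written a Ruby version as well.
Fetcher class:
Fetches the page at a given URL so that Parser can analyze it.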
 require "net/http" 

 class Fetcher
  
  def fetch(url)
    host = url.scan(/\/\/(.*?)\//m)[0][0]
    path = url.split(/#{host}\//)[1]
   # print "host: ",host,"\n"
   # print "path: ",path,"\n"
    h = Net::HTTP.new(host,80)
    resp = h.get("/#{path}",nil)
   
    if resp.message == "OK"
     # puts "Connection established..."
      return resp.body     
    end 
    return ""
  end

end


Parser class:
Extracts the downloadable links and pings each host to pick the fastest one, which Download then uses:
class Parser
public
  def initialize()
    @fetcher = Fetcher.new
  end

  def parse_mp3(html)
    urls = html.scan(/<a href="(.*?)"/m)
    download_hosts_urls = {}
    parse_threads = []
    for url in urls do
      if url[0] =~ /.*?\.mp3,,.*?/
        parse_threads << Thread.new(url) do |url|
          song_url = url[0].gsub(" ","%20")
          download_url = parse_download_url(song_url)
          if download_url
            host = download_url.scan(/\/\/(.*?)\//m)[0][0]
            # We only need the best download URL, so overwriting a duplicate host key is harmless.
            download_hosts_urls[host] = download_url
          end
        end
      end
    end
    parse_threads.each{|t| t.join}
    puts "Found #{download_hosts_urls.size} downloadable links..."
    exit(1) if download_hosts_urls.size == 0
    puts "Selecting the fastest link..."
    host = select_best_host(download_hosts_urls.keys)
    download_hosts_urls[host]
  end

private
  def select_best_host(hosts)
    times_hosts = {}
    threads = []
    hosts.each do |host|
      threads << Thread.new(host) do |host|
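           response = `ping -c 1 -W 30 #{host}` # on Windows use `ping -n 1 -w 30 #{host}`
           r_t = response.scan(/time=(\d+)/m) # integer part only
           # Store the time as an Integer so that min compares numerically ("100" sorts before "20" as strings).
           times_hosts[r_t[0][0].to_i] = host unless r_t.empty? # duplicate keys are harmless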
      end
    end
   
    threads.each{|t| t.join}
   
    times = times_hosts.keys
    min = times.min
    times_hosts[min]
  end

  def parse_download_url(song_url)
     html = @fetcher.fetch(song_url)
     urls = html.scan(/<a href="(.*?)"/m)
     return nil if urls.empty? || urls[0][0] =~ /.*?\.html/
     return urls[0][0]      
  end
end
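
Picking the fastest host comes down to reading the `time=` field from the ping output and comparing the values as numbers; compared as strings, "100" would rank before "20". Below is a minimal sketch of that step on its own. The helper names `parse_ping_ms` and `fastest_host` are hypothetical and not part of the original post, and the sample ping line is made up for illustration.

```ruby
# Extract the round-trip time in milliseconds from ping output.
# Returns nil when the output contains no reply line.
# (parse_ping_ms is a hypothetical helper, not from the original post.)
def parse_ping_ms(output)
  match = output.match(/time[=<]([\d.]+)/)
  match && match[1].to_f
end

# Given { host => latency_ms_or_nil }, return the host with the smallest
# latency, skipping hosts that did not answer. Returns nil if none answered.
def fastest_host(latencies)
  reachable = latencies.reject { |_host, ms| ms.nil? }
  best = reachable.min_by { |_host, ms| ms }
  best && best.first
end

sample = "64 bytes from 10.0.0.1: icmp_seq=1 ttl=54 time=23.4 ms"
puts parse_ping_ms(sample)
puts fastest_host({ "a.example" => 100.0, "b.example" => 20.0, "c.example" => nil })
```

Keying the hash by host rather than by time also avoids losing a host when two hosts happen to report the same latency.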


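Download class:
require "open-uri"
# require_relative is used because Ruby 1.9.2+ no longer searches the current directory for require "parser".
require_relative "parser"
require_relative "fetcher"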

class Download
public
  def initialize(song_name)
    @song_name = song_name
    @search_url = "http://mp3.baidu.com/m?f=ms&tn=baidump3&ct=134217728&lf=&rn=&word=#@song_name&lm=0"
    @parser = Parser.new
    @fetcher = Fetcher.new
  end
 
  def download
    puts "Connecting..."
    html = @fetcher.fetch(@search_url)
    puts "Fetching search results..."
    url = @parser.parse_mp3(html)
    puts "Fastest download link found: #{url}.\nStarting download..."
    doDownload(url)
    puts "Download complete..."
  end
private
  def doDownload(url)
    # On Ruby 3.0+ call URI.open(url) here; Kernel#open no longer accepts URLs.
    open(url) do |fin|
      size = fin.size
      download_size = 0
      puts "Size: #{size / 1024}KB"
      filename = url[url.rindex('/')+1, url.length-1]
      puts "Song name: #{filename}"
      open(File.basename("./#{filename}"),"wb") do |fout|
        while buf = fin.read(1024) do
          fout.write buf
          download_size += buf.size
          print "Downloaded: #{download_size * 100 / size}%\r"
          STDOUT.flush
        end
      end
    end
    puts
  end
end

download = Download.new(ARGV[0])
download.download

Quote

fuliang@fuliang-desktop:~/program/ruby/mp3download$ ruby download.rb pretty body
Connecting...
Fetching search results...
Found 25 downloadable links...
Selecting the fastest link...
Fastest download link found: http://www.jxggzp.com/muisc/20051122185348.mp3.
Starting download...
Size: 6570KB
Song name: 20051122185348.mp3
Downloaded: 100%
Download complete...

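It is basically usable, but two problems remain.

First, downloads often fail when the link contains Chinese characters, because the URL is not encoded. I know Ruby has Iconv.conv for converting between encodings, but I don't know how to encode Chinese text directly; I'm not sure whether there is a method along the lines of encode("gb2312","大海").

Second, the progress display is wrong. open-uri's open appears to download the whole file locally before the block runs, so open itself takes a long time, while fin.read and fout.write are then local operations and very fast. As a result the progress jumps from its first appearance to 100% in an instant.

I hope the experts here can help fix these two problems.
  • mp3.rar (1.8 KB)
  • Description: source code for Windows [the code in this post is for Ubuntu]
  • Downloads: 161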
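
On the progress problem: open-uri does read the whole response before handing the IO to the block, so the loop only ever copies local data. One way to get real progress is to stream the body with Net::HTTP and count bytes as each chunk arrives. The following is a minimal sketch under that assumption; `copy_with_progress` and `stream_download` are hypothetical names, not part of the original post, and the percentage is only reported when the server sends a Content-Length header.

```ruby
require "net/http"
require "uri"

# Write each chunk to `out` and yield the percentage completed so far.
# Returns the number of bytes written. (Hypothetical helper.)
def copy_with_progress(chunks, total, out)
  done = 0
  chunks.each do |chunk|
    out.write(chunk)
    done += chunk.bytesize
    yield(done * 100 / total) if block_given? && total && total > 0
  end
  done
end

# Stream `url` to the file at `path`, printing progress while the body
# is still arriving from the network. (Hypothetical helper.)
def stream_download(url, path)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request_get(uri.request_uri) do |resp|
      total = resp.content_length # nil when the header is missing
      File.open(path, "wb") do |fout|
        # read_body with a block yields the body chunk by chunk.
        copy_with_progress(resp.enum_for(:read_body), total, fout) do |pct|
          print "Downloaded: #{pct}%\r"
          $stdout.flush
        end
      end
    end
  end
end

# Usage (performs a network request, so it is left commented out):
# stream_download("http://www.jxggzp.com/muisc/20051122185348.mp3", "song.mp3")
```

If staying with open-uri is preferred, its `content_length_proc` and `progress_proc` options are another route to the same information.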
Posted: 2008-04-15
Iconv.new("gbk","utf-8").iconv("做人不能太CNN")
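
Iconv was removed from the standard library in Ruby 2.0; on current Ruby, `String#encode` does the same conversion. For the URL problem, the converted bytes also need to be percent-encoded. Here is a minimal sketch, assuming the server expects GBK-encoded query text (that expectation is an assumption, and `gbk_percent_encode` is a hypothetical helper name).

```ruby
# Convert `text` to GBK and percent-encode it for use in a URL.
# (gbk_percent_encode is a hypothetical helper, not from the thread.)
def gbk_percent_encode(text)
  text.encode("GBK").bytes.map do |byte|
    char = byte.chr
    # Leave RFC 3986 unreserved characters literal; escape every other byte.
    char.match?(/[A-Za-z0-9\-._~]/) ? char : format("%%%02X", byte)
  end.join
end

puts gbk_percent_encode("大海")
puts gbk_percent_encode("pretty body")
```

The result can be interpolated into the `word=` parameter of the search URL in place of the raw song name.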
Posted: 2008-04-15
Reading Ruby code is like reading natural language.
Posted: 2009-03-04
I think this post is pretty good. Why has nobody bumped it?
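Posted: 2009-03-04
Why use regular expressions to pull the host, port, and path out of the URL? Wouldn't it be better to just use URI.parse? And even with a regex, you should at least add the i option to ignore case.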
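
To illustrate the suggestion, here is a short sketch of what `URI.parse` returns for the kind of URL used in the post (the query string is shortened for the example). It also covers a URL with no path after the host, a case where the regex `/\/\/(.*?)\//` finds no match and `[0][0]` raises.

```ruby
require "uri"

uri = URI.parse("http://mp3.baidu.com/m?f=ms&tn=baidump3&word=test")
puts uri.host         # mp3.baidu.com
puts uri.port         # 80
puts uri.path         # /m
puts uri.request_uri  # /m?f=ms&tn=baidump3&word=test

# No trailing slash after the host: request_uri still yields a usable path.
bare = URI.parse("http://example.com")
puts bare.host        # example.com
puts bare.request_uri # /
```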
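Posted: 2009-03-06
orange0513 wrote:
Why use regular expressions to pull the host, port, and path out of the URL? Wouldn't it be better to just use URI.parse? And even with a regex, you should at least add the i option to ignore case.

Agreed. The OP's code has no Ruby style at all...
For example: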

 require "net/http" 

 class Fetcher
  
  def fetch(url)
    host = url.scan(/\/\/(.*?)\//m)[0][0]
    path = url.split(/#{host}\//)[1]
   # print "host: ",host,"\n"
   # print "path: ",path,"\n"
    h = Net::HTTP.new(host,80)
    resp = h.get("/#{path}",nil)
   
    if resp.message == "OK"
     # puts "Connection established..."
      return resp.body     
    end 
    return ""
  end

end

 could be written like this:

 require "net/http"

 class Fetcher

  def fetch(url)
    url = URI.parse url

    h = Net::HTTP.new(url.host, url.port)
    resp = h.get(url.request_uri)

    if resp.code == "200"
      # puts "Connection established..."
      resp.body
    end
    # Note: returns nil (not "") when the status is not 200.
  end
end

 And that for loop could simply be:

html.scan(/<a href="(.*?)"/im).flatten.each do |url|
  # do something here
end

 Also:

song_url = url[0].gsub(" ","%20")

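 could be written like this:

require 'cgi'
# CGI.escape is meant for a single component: it also escapes ":" and "/" and turns spaces into "+".
song_url = CGI::escape url[0]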
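
A short sketch of how the standard-library escapers differ from the original `gsub(" ","%20")`, using made-up inputs. `CGI.escape` produces form-style encoding, while `ERB::Util.url_encode` produces `%20` for spaces; both are intended for one path segment or query value, not for a complete URL.

```ruby
require "cgi"
require "erb"

# Form-style escaping: a space becomes "+".
puts CGI.escape("my song.mp3")            # my+song.mp3

# Applied to a whole URL, the scheme and slashes are escaped as well.
puts CGI.escape("http://example.com/a b") # http%3A%2F%2Fexample.com%2Fa+b

# Percent-style escaping: a space becomes "%20".
puts ERB::Util.url_encode("my song.mp3")  # my%20song.mp3
```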

Finally, /<a href="(.*?)"/m would be better written as:

/<a\s+href="(.*?)"/im
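
A quick check of what the revised pattern accepts, on a made-up HTML fragment: upper-case tags and extra whitespace now match, but an anchor with another attribute before `href` is still skipped, so an HTML parser is the safer choice for anything beyond a quick script.

```ruby
LINK_PATTERN = /<a\s+href="(.*?)"/im

# Made-up sample markup for illustration.
html = %(<A HREF="one.mp3">1</A>\n<a   href="two.mp3">2</a>\n<a class="x" href="three.mp3">3</a>)

links = html.scan(LINK_PATTERN).flatten
p links
```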
 

 

Posted: 2009-03-07
Damn, so many experts here~
Posted: 2009-03-07   Last edited: 2009-03-07
I had only just started learning Ruby back then and wasn't familiar with its API. Lesson learned.

