Locked old thread. Topic: A song search-and-download tool written in Ruby
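Posted: 2008-04-15
A few days ago I wrote a GUI search-and-download tool in Java, built mainly on the results of Baidu MP3 search. A classmate, david, wrote a similar command-line download tool in Perl. To practice Ruby, I have now written a Ruby version as well.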
The Fetcher class fetches the page at a given URL so that the Parser can analyze it:

    require "net/http"

    class Fetcher
      def fetch(url)
        # The host is whatever sits between "//" and the next "/"; the path is what follows it
        host = url.scan(/\/\/(.*?)\//m)[0][0]
        path = url.split(/#{host}\//)[1]
        # print "host: ",host,"\n"
        # print "path: ",path,"\n"
        h = Net::HTTP.new(host,80)
        resp = h.get("/#{path}",nil)
        if resp.message == "OK"
          # puts "Connection established..."
          return resp.body
        end
        return ""
      end
    end

The Parser class extracts the downloadable links and pings each host to pick the fastest one for Download to use:

    class Parser
      public
      def initialize()
        @fetcher = Fetcher.new
      end

      def parse_mp3(html)
        urls = html.scan(/<a href="(.*?)"/m)
        download_hosts_urls = {}
        parse_threads = []
        for url in urls do
          if url[0] =~ /.*?\.mp3,,.*?/
            # Resolve each candidate link in its own thread
            parse_threads << Thread.new(url) do |url|
              song_url = url[0].gsub(" ","%20")
              download_url = parse_download_url(song_url)
              if download_url
                host = download_url.scan(/\/\/(.*?)\//m)[0][0]
                # We only want to find the best download url, so we needn't care about duplicate keys
                download_hosts_urls[host] = download_url
              end
            end
          end
        end
        parse_threads.each{|t| t.join}
        puts "Found #{download_hosts_urls.size} downloadable links..."
        exit(1) if download_hosts_urls.size == 0
        puts "Selecting the fastest link..."
        host = select_best_host(download_hosts_urls.keys)
        download_hosts_urls[host]
      end

      private
      def select_best_host(hosts)
        times_hosts = {}
        threads = []
        hosts.each do |host|
          threads << Thread.new(host) do |host|
            response = `ping -c 1 -W 30 #{host}` # use `ping -n 1 -w 30 #{host}` on Windows
            r_t = response.scan(/time=(\d+)/m) # only get the integer part
            # Store the time as an Integer so that min compares numerically; duplicate keys are no problem
            times_hosts[r_t[0][0].to_i] = host unless r_t.empty?
          end
        end
        threads.each{|t| t.join}
        times = times_hosts.keys
        min = times.min
        times_hosts[min]
      end

      def parse_download_url(song_url)
        html = @fetcher.fetch(song_url)
        urls = html.scan(/<a href="(.*?)"/m)
        return nil if urls.empty? || urls[0][0] =~ /.*?\.html/
        return urls[0][0]
      end
    end

The Download class:

    require "open-uri"
    require "parser"
    require "fetcher"

    class Download
      public
      def initialize(song_name)
        @song_name = song_name
        @search_url = "http://mp3.baidu.com/m?f=ms&tn=baidump3&ct=134217728&lf=&rn=&word=#@song_name&lm=0"
        @parser = Parser.new
        @fetcher = Fetcher.new
      end

      def download
        puts "Connecting..."
        html = @fetcher.fetch(@search_url)
        puts "Fetching search results..."
        url = @parser.parse_mp3(html)
        puts "Got the fastest download link: #{url}.\nStarting download..."
        doDownload(url)
        puts "Download complete..."
      end

      private
      def doDownload(url)
        open(url) do |fin|
          size = fin.size
          download_size = 0
          puts "Size: #{size / 1024}KB"
          # The file name is everything after the last "/"
          filename = url[url.rindex('/')+1, url.length-1]
          puts "Song file: #{filename}"
          open(File.basename("./#{filename}"),"wb") do |fout|
            while buf = fin.read(1024) do
              fout.write buf
              download_size += buf.size
              print "Downloaded: #{download_size * 100 / size}%\r"
              STDOUT.flush
            end
          end
        end
        puts
      end
    end

    download = Download.new(ARGV[0])
    download.download

Sample run:

    fuliang@fuliang-desktop:~/program/ruby/mp3download$ ruby download.rb pretty body
    Connecting...
    Fetching search results...
    Found 25 downloadable links...
    Selecting the fastest link...
    Got the fastest download link: http://www.jxggzp.com/muisc/20051122185348.mp3.
    Starting download...
    Size: 6570KB
    Song file: 20051122185348.mp3
    Downloaded: 100%
    Download complete...

It basically works, but two problems remain. First, downloads often fail when the link contains Chinese characters, mainly because the URL is not encoded. I know Ruby has Iconv.conv for converting between encodings, but I don't know how to encode Chinese text directly, or whether there is a method along the lines of encode("gb2312","大海"). Second, the progress display is wrong: open-uri's open appears to download the whole file to local storage before it returns, so open takes a long time, while fin.read and fout.write are then local operations and very fast. As a result the progress goes from first appearing to 100% in an instant. I hope the experts here can help fix these two problems.

Notice: Copyright of ITeye articles belongs to the author and is protected by law. Reproduction without the author's written permission is prohibited.
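On the progress problem: open-uri does fetch the whole response before the block runs, but it accepts two options, :content_length_proc and :progress_proc, that are called while the transfer is still in progress. The following is a minimal sketch rather than a drop-in fix. It assumes Ruby 2.5 or later (where URI.open is available), and progress_options and download_with_progress are hypothetical helper names that do not appear in the original script.

```ruby
require "open-uri"

# Builds the option hash for open-uri. `out` is where progress is printed.
# (progress_options is a hypothetical helper, not part of the original script.)
def progress_options(out = $stdout)
  total = nil
  {
    # Called once with the Content-Length value, or nil when the server sends none.
    content_length_proc: ->(length) { total = length },
    # Called repeatedly with the number of bytes received so far.
    progress_proc: lambda do |received|
      if total && total > 0
        out.print "Downloaded: #{received * 100 / total}%\r"
      else
        out.print "Downloaded: #{received / 1024}KB\r"
      end
      out.flush
    end
  }
end

# Hypothetical wrapper: downloads url into filename while printing progress.
def download_with_progress(url, filename)
  URI.open(url, progress_options) do |fin|
    File.binwrite(filename, fin.read)
  end
end
```

With this shape the percentage is printed while the bytes arrive over the network, so the read/write loop inside the block no longer needs to report progress itself.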
Posted: 2008-04-15
Iconv.new("gbk","utf-8").iconv("做人不能太CNN")
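Iconv was removed from Ruby's standard library in 2.0; on Ruby 1.9 and later, String#encode performs the same conversion. Converting the bytes is only half of what a URL needs, since the converted bytes still have to be percent-encoded. The sketch below assumes the search endpoint expects GBK bytes in its query string, which the thread does not confirm, and gbk_percent_encode is a hypothetical helper name.

```ruby
# Converts text to GBK and percent-encodes every byte that is not an
# unreserved URL character. (Hypothetical helper; the GBK target is an assumption.)
def gbk_percent_encode(text)
  text.encode("GBK").bytes.map do |byte|
    ch = byte.chr
    if byte < 128 && ch.match?(/[A-Za-z0-9_.~-]/)
      ch                        # safe ASCII character, keep as-is
    else
      format("%%%02X", byte)    # everything else becomes %XX
    end
  end.join
end

puts gbk_percent_encode("大海")          # => %B4%F3%BA%A3
puts gbk_percent_encode("pretty body")   # => pretty%20body
```

String#encode raises Encoding::UndefinedConversionError for characters that have no GBK representation, so real input may need a rescue or a fallback.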
Posted: 2008-04-15
Reading Ruby code is just like reading natural language.
Posted: 2009-03-04
I think this post is pretty good. Why hasn't anyone bumped it?
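Posted: 2009-03-04
Why use regular expressions to pick the host, port, and path out of a URL? Wouldn't URI.parse be better? And even with a regex, you should at least add the i option to ignore case.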
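To make the suggestion concrete, here is a small sketch of what URI.parse returns for URLs like the ones in this thread. split_url is a hypothetical helper name used only for illustration.

```ruby
require "uri"

# Returns the pieces that Net::HTTP needs, taken from a parsed URI.
# (split_url is a hypothetical helper, not part of the original script.)
def split_url(url)
  uri = URI.parse(url)
  { host: uri.host, port: uri.port, request_uri: uri.request_uri }
end

p split_url("http://www.jxggzp.com/muisc/20051122185348.mp3")
p split_url("http://mp3.baidu.com/m?f=ms&tn=baidump3")
p split_url("http://example.com")
```

The last call is the case where the regex approach breaks down: a URL with no "/" after the host gives url.scan(/\/\/(.*?)\//m) nothing to match, whereas request_uri simply comes back as "/".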
Posted: 2009-03-06
orange0513 wrote:
Why use regular expressions to pick the host, port, and path out of a URL? Wouldn't URI.parse be better? And even with a regex, you should at least add the i option to ignore case.

Agreed. The OP's code has no Ruby style at all... For example:

    require "net/http"

    class Fetcher
      def fetch(url)
        host = url.scan(/\/\/(.*?)\//m)[0][0]
        path = url.split(/#{host}\//)[1]
        # print "host: ",host,"\n"
        # print "path: ",path,"\n"
        h = Net::HTTP.new(host,80)
        resp = h.get("/#{path}",nil)
        if resp.message == "OK"
          # puts "Connection established..."
          return resp.body
        end
        return ""
      end
    end

can be written like this:

    require "net/http"

    class Fetcher
      def fetch(url)
        url = URI.parse url
        h = Net::HTTP.new(url.host, url.port)
        resp = h.get(url.request_uri)
        if resp.code == "200"
          # puts "Connection established..."
          resp.body
        end
      end
    end

And that for loop could simply be:

    html.scan(/<a href="(.*?)"/im).flatten.each do |url|
      # do something here
    end

Also:

    song_url = url[0].gsub(" ","%20")

can be written like this:

    require 'cgi'
    song_url = CGI::escape url[0]

Finally, /<a href="(.*?)"/m is better written as:

    /<a\s+href="(.*?)"/im
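One thing worth checking before swapping gsub for CGI::escape: CGI.escape is meant for a single form value, so when it is applied to a whole URL it also escapes ":" and "/" and turns spaces into "+". The sketch below compares the two on a made-up URL. escape_spaces and escape_path are hypothetical helpers, the latter being one possible way to escape only the path segments.

```ruby
require "cgi"

# What the original script did: escape literal spaces only.
def escape_spaces(url)
  url.gsub(" ", "%20")
end

# Hypothetical alternative: escape each path segment, keep the "/" separators,
# and use %20 rather than "+" for spaces.
def escape_path(path)
  path.split("/", -1).map { |seg| CGI.escape(seg).gsub("+", "%20") }.join("/")
end

url = "http://www.example.com/music/pretty body.mp3" # made-up URL

puts escape_spaces(url)
# => http://www.example.com/music/pretty%20body.mp3
puts CGI.escape(url)
# => http%3A%2F%2Fwww.example.com%2Fmusic%2Fpretty+body.mp3
puts escape_path("/music/pretty body.mp3")
# => /music/pretty%20body.mp3
```

The second result is no longer usable as a URL, so CGI.escape fits individual query values or path segments better than complete links.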
Posted: 2009-03-07
Damn, there are so many experts here~
Posted: 2009-03-07
Last edited: 2009-03-07
I had only just started learning Ruby back then and wasn't familiar with its API. Lesson learned.
Hooopo wrote:
orange0513 wrote:
Why use regular expressions to pick the host, port, and path out of a URL? Wouldn't URI.parse be better? And even with a regex, you should at least add the i option to ignore case.
Agreed. The OP's code has no Ruby style at all... For example, the Fetcher class can use URI.parse together with url.host, url.port, and url.request_uri; the for loop can become html.scan(/<a href="(.*?)"/im).flatten.each do |url| ... end; song_url = url[0].gsub(" ","%20") can become song_url = CGI::escape url[0]; and finally /<a href="(.*?)"/m is better written as /<a\s+href="(.*?)"/im.
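The effect of the pattern change suggested in the quoted reply can be checked on a small sample. This is a minimal sketch: the HTML snippet is made up for illustration, and extract_links is a hypothetical helper name.

```ruby
# The original pattern and the one suggested in the reply.
ORIGINAL_PATTERN  = /<a href="(.*?)"/m
SUGGESTED_PATTERN = /<a\s+href="(.*?)"/im

# scan returns one single-element array per match; flatten turns that into a flat list of URLs.
def extract_links(html, pattern = SUGGESTED_PATTERN)
  html.scan(pattern).flatten
end

# Made-up sample with upper-case tags, extra spaces, and a line break before href.
html = <<~HTML
  <A HREF="http://example.com/a.mp3">one</A>
  <a   href="http://example.com/b.mp3">two</a>
  <a
     href="http://example.com/c.mp3">three</a>
HTML

extract_links(html).each { |url| puts url }
puts "original pattern finds #{extract_links(html, ORIGINAL_PATTERN).size} link(s)"
```

On this sample the suggested pattern picks up all three links, while the original one, which requires a lower-case tag and exactly one space, picks up none.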