(Repost) Getting Started with Python Web Scraping

永夜-极光


Blog categories: python, repost

Step 1: Install two packages

requests and BeautifulSoup (published on PyPI as beautifulsoup4 and imported as bs4)
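
Both packages can normally be installed with pip install requests beautifulsoup4. As a quick check before running the script, the sketch below reports which of the two import names cannot be found. The helper name missing_packages is hypothetical and not part of the original post; it only uses the standard library.

```python
import importlib.util


def missing_packages(import_names):
    """Return the names from import_names that cannot be located for import."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "requests" and "bs4" are the import names used by the script in this post.
    missing = missing_packages(["requests", "bs4"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("requests and bs4 are both available")
```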

Step 2: Copy in the code and run it

import requests
import csv
import random
import time
import socket
import http.client
# import urllib.request
from bs4 import BeautifulSoup

def get_content(url, data=None):
    # Request headers that imitate a desktop browser.
    # `data` is unused; it is left over from the commented-out urllib version.
    header = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Connection': 'keep-alive',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.235'
    }
    timeout = random.choice(range(80, 180))  # request timeout in seconds
    # Keep retrying until the request succeeds; each handler sleeps before the next try.
    while True:
        try:
            rep = requests.get(url, headers=header, timeout=timeout)
            rep.encoding = 'utf-8'
            # req = urllib.request.Request(url, data, header)
            # response = urllib.request.urlopen(req, timeout=timeout)
            # html1 = response.read().decode('UTF-8', errors='ignore')
            # response.close()
            break
        # except urllib.request.HTTPError as e:
        #     print('1:', e)
        #     time.sleep(random.choice(range(5, 10)))
        #
        # except urllib.request.URLError as e:
        #     print('2:', e)
        #     time.sleep(random.choice(range(5, 10)))
        except socket.timeout as e:
            print('3:', e)
            time.sleep(random.choice(range(8, 15)))

        except socket.error as e:
            print('4:', e)
            time.sleep(random.choice(range(20, 60)))

        except http.client.BadStatusLine as e:
            print('5:', e)
            time.sleep(random.choice(range(30, 80)))

        except http.client.IncompleteRead as e:
            print('6:', e)
            time.sleep(random.choice(range(5, 15)))

    return rep.text
    # return html_text


def get_data(html_text):
    final = []
    bs = BeautifulSoup(html_text, "html.parser")  # create the BeautifulSoup object
    body = bs.body  # get the body element
    data = body.find('div', {'id': '7d'})  # find the div whose id is "7d"
    ul = data.find('ul')  # get the ul inside it
    li = ul.find_all('li')  # get all li elements
    for day in li:  # walk through each li, one per day
        temp = []
        date = day.find('h1').string  # find the date
        temp.append(date)  # add it to temp
        inf = day.find_all('p')  # find all p tags in this li
        temp.append(inf[0].string)  # the first p tag holds the weather description
        if inf[1].find('span') is None:
            # The forecast may omit today's high (this happens in the evening),
            # so check for it and report only the low in that case.
            temperature_highest = None
        else:
            temperature_highest = inf[1].find('span').string  # find the high
            # At night the site changes and the high also carries a trailing ℃.
            temperature_highest = temperature_highest.replace('℃', '')
        temperature_lowest = inf[1].find('i').string  # find the low
        temperature_lowest = temperature_lowest.replace('℃', '')  # strip the trailing ℃ from the low
        temp.append(temperature_highest)  # add the high to temp
        temp.append(temperature_lowest)  # add the low to temp
        final.append(temp)  # add this day's row to final
    return final

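def write_data(data, name):
    file_name = name
    # Mode 'a' appends, so running the script again adds rows to the same file.
    # No encoding is given, so the platform default is used; errors='ignore'
    # silently drops any character that encoding cannot represent.
    with open(file_name, 'a', errors='ignore', newline='') as f:
        f_csv = csv.writer(f)
        f_csv.writerows(data)


if __name__ == '__main__':
    url = 'http://www.weather.com.cn/weather/101190401.shtml'
    html = get_content(url)
    result = get_data(html)
    write_data(result, 'weather.csv')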
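
The while True loop in get_content retries forever if the site stays unreachable. As an alternative, here is a minimal sketch of a bounded retry with a doubling delay. The function fetch_with_retries and its parameters (fetch, max_attempts, base_delay, sleep) are hypothetical names introduced for illustration, not part of the original script. It catches OSError, which covers socket.timeout and socket.error; the exceptions raised by requests derive from IOError, which is the same class as OSError in Python 3.

```python
import time


def fetch_with_retries(fetch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts attempts have failed.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between
    attempts and re-raises the last error if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except OSError as e:
            if attempt == max_attempts:
                raise  # give up: let the caller see the final error
            delay = base_delay * 2 ** (attempt - 1)
            print('attempt', attempt, 'failed:', e, '- retrying in', delay, 'seconds')
            sleep(delay)
```

With the script above, one possible use would be fetch_with_retries(lambda: requests.get(url, headers=header, timeout=10).text), where the 10-second timeout is an assumed value. Passing sleep as a parameter makes the helper easy to exercise without actually waiting.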

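Step 3: The result looks like this. The columns are date, weather, high and low (both in ℃, with the symbol stripped). In the data, 今天/明天/后天 mean today/tomorrow/the day after tomorrow, 周四 through 周日 are Thursday through Sunday, 多云 is cloudy, and 多云转晴 is cloudy turning sunny: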

23日（今天）	多云	19	12
24日（明天）	多云	20	12
25日（后天）	多云	21	14
26日（周四）	多云	21	14
27日（周五）	多云	22	14
28日（周六）	多云	21	15
29日（周日）	多云转晴	21	11
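
The listing above shows the columns separated by tabs, but csv.writer writes commas by default, so weather.csv itself is comma-separated. The sketch below reads such a file back and converts the temperatures to integers. The function name read_weather, the dictionary keys and the utf-8 encoding are assumptions made for this example; the original script writes with the platform's default encoding, so pass that encoding instead if it differs. In the demo, the first row comes from the listing above and the second row's missing high is made up to show the evening case, where csv.writer stores None as an empty field.

```python
import csv
import os
import tempfile


def read_weather(path, encoding='utf-8'):
    """Read rows of (date, weather, high, low) into a list of dicts.

    Temperatures become int; an empty field (written for None) becomes None.
    """
    rows = []
    with open(path, newline='', encoding=encoding) as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            date, weather, high, low = row
            rows.append({
                'date': date,
                'weather': weather,
                'high': int(high) if high else None,
                'low': int(low) if low else None,
            })
    return rows


if __name__ == '__main__':
    # Write a small sample file to a temporary directory, then read it back.
    sample = [
        ['23日（今天）', '多云', '19', '12'],
        ['24日（明天）', '多云', None, '12'],  # hypothetical: high missing
    ]
    path = os.path.join(tempfile.mkdtemp(), 'weather_sample.csv')
    with open(path, 'w', newline='', encoding='utf-8') as f:
        csv.writer(f).writerows(sample)
    for entry in read_weather(path):
        print(entry)
```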


2017-10-23 13:02