Python Crawler(5)Deployment on RaspberryPi
Check the Python version
>python -V
Python 2.7.13
Install pip on the Raspberry Pi
>sudo apt-get install python-pip
>pip -V
pip 9.0.1 from /usr/lib/python2.7/dist-packages (python 2.7)
It worked before, but today when I ran pip -V it got stuck.
Try this instead:
>curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
>python get-pip.py
>pip -V
pip 9.0.1 from /usr/local/lib/python2.7/dist-packages (python 2.7)
Install the Scrapy environment
>sudo pip install scrapy
Exceptions:
No package 'libffi' found
Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?
Solution:
>sudo apt-get install libxml2-dev libxslt1-dev
>sudo pip install lxml
Exceptions:
Could not import setuptools which is required to install from a source distribution.
Please install setuptools.
src/lxml/etree.c:91:20: fatal error: Python.h: No such file or directory
Running setup.py install for cffi ... error
Running setup.py install for cryptography ... error
Solution:
>sudo apt-get install python-dev
>sudo pip install -U setuptools
>sudo apt-get install python-cffi
>sudo apt-get install gcc libffi-dev libssl-dev python-dev
Compiling and installing from source does not work, so try the prebuilt packages instead:
>sudo apt-get install python-cryptography
>sudo apt-get install python-crypto
>sudo apt-get install -y python-lxml
>sudo pip install scrapy
Scrapy does not work on Raspberry Pi 1 and 2, so I just install scrapyd there.
>sudo pip install scrapyd
>scrapyd --version
twistd (the Twisted daemon) 17.5.0
Copyright (c) 2001-2016 Twisted Matrix Laboratories.
See LICENSE for details.
Even on my Raspberry Pi 1, I have issues when I run this command:
>scrapy shell 'http://quotes.toscrape.com/page/1'
Exceptions:
'module' object has no attribute 'OP_NO_TLSv1_1'
Solution:
https://github.com/scrapy/scrapy/issues/2473
>sudo pip install --upgrade scrapy
>sudo pip install --upgrade twisted
>sudo pip install --upgrade pyopenssl
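With Scrapy upgraded and working, a minimal spider against the same test site can verify the environment. This is only a sketch; the spider name, file name, and fields are placeholders, not from the original project:
import scrapy

class QuotesSpider(scrapy.Spider):
    # hypothetical spider, used only to verify the Scrapy setup
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/page/1/"]

    def parse(self, response):
        # yield one item per quote block on the page
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").extract_first(),
                "author": quote.css("small.author::text").extract_first(),
            }
Run it standalone to check the output:
>scrapy runspider quotes_spider.py -o quotes.json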
Install the client
>sudo pip install scrapyd-client
Install the deploy tool
>sudo pip install scrapyd-deploy
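To push a project to the scrapyd server, the usual flow is a [deploy] target in the project's scrapy.cfg plus the scrapyd-deploy command. A sketch, assuming a hypothetical project named myproject and a deploy target named rasp1:
[settings]
default = myproject.settings
[deploy:rasp1]
url = http://rasp1:6800/
project = myproject
Then deploy with:
>scrapyd-deploy rasp1 -p myproject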
Install Selenium support
>sudo pip install selenium
Start the Server
>scrapyd
This is a bind issue, I guess: I can access port 6800 on that server via localhost:6800, but it does not work from a remote machine.
Add a configuration file under /opt/scrapyd:
>cat scrapyd.conf
[scrapyd]
eggs_dir = eggs
logs_dir = logs
items_dir =
jobs_to_keep = 100
dbs_dir = dbs
max_proc = 0
max_proc_per_cpu = 20
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
http_port = 6800
debug = off
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
>nohup scrapyd &
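Once scrapyd is running, the JSON endpoints from the [services] section above can be exercised with curl. A quick sanity check, reusing the hypothetical myproject and quotes names from the sketches above:
>curl http://localhost:6800/daemonstatus.json
>curl http://localhost:6800/schedule.json -d project=myproject -d spider=quotes
>curl "http://localhost:6800/listjobs.json?project=myproject"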
Command to install all dependencies
>pip install -r requirements.txt
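For reference, a requirements.txt for this setup would list roughly the packages installed above; this is a guess, not the original file, and versions are left unpinned:
scrapy
scrapyd
scrapyd-client
selenium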
List of the Resin base images:
https://docs.resin.io/runtime/resin-base-images/?ref=dockerhub
I also dockerize the application.
start.sh simply starts the service:
#!/bin/sh -ex
#start the service
cd /tool/scrapyd/
scrapyd
The Makefile opens port 6800:
IMAGE=sillycat/public
TAG=raspberrypi-scrapyd
NAME=raspberrypi-scrapyd
docker-context:
build: docker-context
	docker build -t $(IMAGE):$(TAG) .
run:
	docker run -d -p 6800:6800 --name $(NAME) $(IMAGE):$(TAG)
debug:
	docker run -ti -p 6800:6800 --name $(NAME) $(IMAGE):$(TAG) /bin/bash
clean:
	docker stop ${NAME}
	docker rm ${NAME}
logs:
	docker logs ${NAME}
publish:
	docker push ${IMAGE}:${TAG}
fetch:
	docker pull ${IMAGE}:${TAG}
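Typical usage of these targets, for example:
>make build
>make run
>make logs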
The Dockerfile has all the installation steps:
#Set up Scrapyd in Docker
#Prepare the OS
FROM resin/raspberrypi3-python
MAINTAINER Carl Luo <luohuazju@gmail.com>
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get -y update
RUN apt-get -y dist-upgrade
#install the software
RUN pip install scrapyd
#copy the config
RUN mkdir -p /tool/scrapyd/
ADD conf/scrapyd.conf /tool/scrapyd/
#set up the app
EXPOSE 6800
RUN mkdir -p /app/
ADD start.sh /app/
WORKDIR /app/
CMD [ "./start.sh" ]
conf/scrapyd.conf holds the configuration:
[scrapyd]
eggs_dir = eggs
logs_dir = logs
items_dir =
jobs_to_keep = 100
dbs_dir = dbs
max_proc = 0
max_proc_per_cpu = 20
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
http_port = 6800
debug = off
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
References:
http://sillycat.iteye.com/blog/2391523
http://sillycat.iteye.com/blog/2391524
http://sillycat.iteye.com/blog/2391685
http://sillycat.iteye.com/blog/2391926
https://stackoverflow.com/questions/33785755/getting-could-not-find-function-xmlcheckversion-in-library-libxml2-is-libxml2
https://github.com/fredley/play-pi/issues/22
https://stackoverflow.com/questions/21530577/fatal-error-python-h-no-such-file-or-directory