
Sharing a Xiaohongshu scraping approach

Posted: 2021-03-15
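Most of us know Xiaohongshu, a lifestyle-sharing app that claims more than 100 million users. User notes cover food, clothing, entertainment, and shopping, spanning fashion, skincare, makeup, dining, travel, film and TV, reading, fitness, and other lifestyle areas, and a large number of new notes are posted every day. So its data is obviously valuable.
As we all know, the bigger the app, the harder its data is to get. Xiaohongshu data can only be obtained through the app's API, which you find with a packet-capture tool. Here is the Java code we use as a starting point for scraping Xiaohongshu. It sends requests through an authenticated proxy: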
import org.apache.commons.httpclient.Credentials;
import org.apache.commons.httpclient.HostConfiguration;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.commons.httpclient.methods.GetMethod;

import java.io.IOException;

public class Main {
    // Proxy server (vendor site: www.16yun.cn)
    private static final String PROXY_HOST = "t.16yun.cn";
    private static final int PROXY_PORT = 31111;

    public static void main(String[] args) {
        HttpClient client = new HttpClient();
        // httpbin.org/ip echoes the requesting IP, which confirms the proxy is in use
        HttpMethod method = new GetMethod("https://httpbin.org/ip");

        // Route requests through the proxy
        HostConfiguration config = client.getHostConfiguration();
        config.setProxy(PROXY_HOST, PROXY_PORT);

        // Send credentials preemptively rather than waiting for an authentication challenge
        client.getParams().setAuthenticationPreemptive(true);

        // Proxy account credentials
        String username = "16ABCCKJ";
        String password = "712323";
        Credentials credentials = new UsernamePasswordCredentials(username, password);
        AuthScope authScope = new AuthScope(PROXY_HOST, PROXY_PORT);
        client.getState().setProxyCredentials(authScope, credentials);

        try {
            client.executeMethod(method);
            // Only print the body on HTTP 200
            if (method.getStatusCode() == HttpStatus.SC_OK) {
                String response = method.getResponseBodyAsString();
                System.out.println("Response = " + response);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Always release the connection
            method.releaseConnection();
        }
    }
}
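For anyone wondering what the proxy username and password turn into on the wire: with HTTP Basic authentication, the client sends a `Proxy-Authorization` header whose value is `Basic` followed by the Base64 encoding of `username:password`. Below is a minimal sketch of that encoding using only the JDK. The class `ProxyAuthHeader` and its `basic` method are hypothetical names made up for illustration and are not part of Commons HttpClient. The sketch assumes the proxy uses the Basic scheme, which the original post does not state.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of Commons HttpClient.
final class ProxyAuthHeader {
    private ProxyAuthHeader() {}

    // Builds the value of a Basic Proxy-Authorization header:
    // "Basic " + base64(username + ":" + password)
    static String basic(String username, String password) {
        String pair = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials, not a real account
        System.out.println("Proxy-Authorization: " + basic("user", "pass"));
        // prints "Proxy-Authorization: Basic dXNlcjpwYXNz"
    }
}
```

Base64 is an encoding, not encryption, so anyone who can see the header can recover the password. It is probably better to load the credentials from configuration than to hard-code them in source.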