[Leadership Message Board - People's Daily Online] RSS Wanted - Scrape the People's Daily Online message board for a specified city #8188
Description
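Website URL
Taking Beijing as an example, the local leader's message board is reached via Leadership Message Board > Local Leaders > Beijing > Beijing Municipal Party Secretary Cai Qi: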
http://liuyan.people.com.cn/threads/list?fid=539
The ID (`fid`) is 539; each administrative division has its own ID.
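Because only the `fid` value changes between divisions, a feed for another city could be parameterized on that ID. A minimal sketch, assuming every division uses the same two URL patterns observed for Beijing (the board page and the `queryThreadsList` endpoint); the class `LiuyanUrls` and its methods are hypothetical names:

```java
import java.net.URI;

// Hypothetical helper: builds the two URLs observed in this issue for a given fid.
// Assumes every administrative division follows the same patterns as Beijing (fid=539).
final class LiuyanUrls {
    private static final String BASE = "http://liuyan.people.com.cn/threads/";

    private LiuyanUrls() {}

    // Human-facing board page for one division
    static URI listUrl(int fid) {
        return URI.create(BASE + "list?fid=" + fid);
    }

    // Data endpoint that the board page calls; lastItem=0 is the value seen on first load
    static URI queryUrl(int fid, long lastItem) {
        return URI.create(BASE + "queryThreadsList?fid=" + fid + "&lastItem=" + lastItem);
    }

    public static void main(String[] args) {
        System.out.println(listUrl(539));
        System.out.println(queryUrl(539, 0));
    }
}
```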
Website description
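The Leadership Message Board builds a bridge for you to communicate with leaders and does not interfere with the outcome of that communication. It actively works with all parties to resolve problems, but since it is not the responsible authority, it cannot guarantee that your message will be publicly displayed (for which messages can be displayed, see "2.3 Messages and reviews involving the following content will not be publicly displayed"), that your message will receive a reply, or that your request will be resolved.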
What content should be generated?
An RSS feed of the People's Daily Online message board posts for a specified city.
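The requested output is an RSS feed, so scraped posts eventually have to be rendered as RSS 2.0 XML. A minimal sketch, assuming each post can be reduced to a title, a link and a description; the real fields returned by `queryThreadsList` are not documented in this issue, and `BoardPost` and `RssFeed` are hypothetical names:

```java
import java.util.List;

// Hypothetical item type; the actual JSON fields of the endpoint are unknown.
final class BoardPost {
    final String title;
    final String link;
    final String description;

    BoardPost(String title, String link, String description) {
        this.title = title;
        this.link = link;
        this.description = description;
    }
}

final class RssFeed {
    private RssFeed() {}

    // Escape the five XML special characters; '&' must be replaced first
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Minimal RSS 2.0 document: required channel elements plus one <item> per post
    static String render(String title, String link, String description, List<BoardPost> posts) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<rss version=\"2.0\"><channel>\n");
        sb.append("<title>").append(escape(title)).append("</title>\n");
        sb.append("<link>").append(escape(link)).append("</link>\n");
        sb.append("<description>").append(escape(description)).append("</description>\n");
        for (BoardPost p : posts) {
            sb.append("<item><title>").append(escape(p.title)).append("</title>")
              .append("<link>").append(escape(p.link)).append("</link>")
              .append("<description>").append(escape(p.description)).append("</description></item>\n");
        }
        sb.append("</channel></rss>\n");
        return sb.toString();
    }
}
```

Escaping matters here because post titles are free user text and may contain `&` or `<`.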
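Additional description
I first tried FEED43, but the page source contains no posts. Using the browser developer tools, I found that the data comes from http://liuyan.people.com.cn/threads/queryThreadsList?fid=539&lastItem=0. Opening that URL in the browser returns HTTP 405 (Method Not Allowed), which indicates the endpoint rejects the GET request the browser sends.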
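Since 405 means the server rejected the request method, the endpoint presumably has to be called with POST, as the CSDN code quoted in this issue does. A minimal sketch using the JDK's built-in `java.net.http` client (Java 11+) rather than Apache HttpClient; it assumes an empty-body POST with the excerpt's `Accept` and `Referer` headers is sufficient, which has not been verified, and `QueryRequest` is a hypothetical name:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class QueryRequest {
    private QueryRequest() {}

    // Builds a POST (not GET) request for the query endpoint with an empty body,
    // mirroring the Accept and Referer headers from the CSDN excerpt.
    static HttpRequest build(int fid, long lastItem) {
        URI uri = URI.create("http://liuyan.people.com.cn/threads/queryThreadsList?fid="
                + fid + "&lastItem=" + lastItem);
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json, text/javascript, */*; q=0.01")
                .header("Referer", "http://liuyan.people.com.cn/threads/list?fid=" + fid)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        // ofString() decodes using the response's declared charset, defaulting to UTF-8
        HttpResponse<String> response =
                client.send(build(539, 0), HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP status: " + response.statusCode());
        System.out.println(response.body());
    }
}
```

If the server still refuses the request, the headers commented out in the excerpt (for example `X-Requested-With`) are the next thing to try.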
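I searched CSDN and found a related post with concrete code, but I don't understand it and don't know whether it works.
"Java page scraping returns a 405 error"
Need a Java scraper framework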
The code excerpt is as follows:
```java
package com.java.activiti.controller;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.http.HttpEntity;
import org.apache.http.HttpStatus;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class python {
    public static void main(String[] args) {
        String url = "http://liuyan.people.com.cn/threads/queryThreadsList?fid=539&lastItem=0";
        // The endpoint is called with POST; a browser GET returns 405
        HttpPost httpPost = new HttpPost(url);
        // Set request headers
        httpPost.addHeader("Accept", "application/json, text/javascript, */*; q=0.01");
        httpPost.addHeader("Accept-Encoding", "gzip, deflate");
        httpPost.addHeader("Accept-Language",
                "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2");
        // httpPost.addHeader("Connection", "keep-alive");
        // httpPost.addHeader("Content-Length", "19");
        // httpPost.addHeader("Content-Type",
        //         "application/x-www-form-urlencoded; charset=UTF-8");
        httpPost.addHeader("Referer", "http://liuyan.people.com.cn/threads/list?fid=539");
        // httpPost.addHeader("User-Agent",
        //         "Mozilla/5.0 (Windows NT 6.1; W…) Gecko/20100101 Firefox/64.0");
        // httpPost.addHeader("X-Requested-With", "XMLHttpRequest");

        // try-with-resources closes both the response and the client
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpPost)) {
            int statusCode = response.getStatusLine().getStatusCode();
            System.out.println("HTTP Status Code:" + statusCode);
            if (statusCode != HttpStatus.SC_OK) {
                System.out.println("HTTP request failed! Status line: " + response.getStatusLine());
            }
            HttpEntity httpEntity = response.getEntity();
            // Decode as UTF-8 if the server declares no charset; toString() also consumes the entity
            String responseContent = httpEntity == null
                    ? ""
                    : EntityUtils.toString(httpEntity, StandardCharsets.UTF_8);
            System.out.println("Response body:" + responseContent);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
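The `lastItem=0` parameter looks like a pagination cursor, though the issue does not confirm this. A minimal sketch, assuming `lastItem` is 0 for the first page and afterwards the ID of the last item already loaded, and that an empty page means there is nothing more; `Paginator` and the page-fetching function are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

final class Paginator {
    private Paginator() {}

    // fetchPage is hypothetical: given a lastItem cursor, it returns the item IDs on that page.
    // A real implementation would POST to queryThreadsList?fid=...&lastItem=<cursor>
    // and wrap any IOException in an UncheckedIOException.
    static List<Long> collect(LongFunction<List<Long>> fetchPage, int maxPages) {
        List<Long> all = new ArrayList<>();
        long cursor = 0; // assumed first-page value, as in the observed URL
        for (int page = 0; page < maxPages; page++) {
            List<Long> ids = fetchPage.apply(cursor);
            if (ids.isEmpty()) {
                break; // assumed end of the list
            }
            all.addAll(ids);
            cursor = ids.get(ids.size() - 1); // assumed: next cursor is the last ID seen
        }
        return all;
    }
}
```

The `maxPages` cap keeps a feed refresh bounded even if the cursor assumption turns out to be wrong.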