
[Leader Message Board (领导留言板) - People's Daily Online] RSS Wanted: scrape the People's Daily Online message board for a specified city #8188

@KwToPA

Description


Website URL

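Taking Beijing as an example, I want to follow the message boards of local leaders: Leader Message Board > Local Leaders > Beijing > Cai Qi, Secretary of the Beijing Municipal Party Committee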
http://liuyan.people.com.cn/threads/list?fid=539

Here the id (the fid parameter) is 539; each administrative division has its own ID.
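A minimal sketch of how the URLs follow from the region ID. It assumes fid is the only parameter that changes between regions, and that lastItem=0 asks the queryThreadsList data endpoint (the one the page calls in the background) for the first batch of threads. The class and method names (LiuyanUrls, listPage, queryEndpoint) are hypothetical.

```java
import java.net.URI;

// Hypothetical helper: derives the list-page URL and the data-endpoint URL from a region's fid.
public class LiuyanUrls {
    private static final String BASE = "http://liuyan.people.com.cn/threads/";

    // Public list page for one leader's board, e.g. fid=539 for Beijing.
    public static URI listPage(int fid) {
        return URI.create(BASE + "list?fid=" + fid);
    }

    // Background data endpoint; lastItem=0 is assumed to request the first batch.
    public static URI queryEndpoint(int fid, long lastItem) {
        return URI.create(BASE + "queryThreadsList?fid=" + fid + "&lastItem=" + lastItem);
    }

    public static void main(String[] args) {
        System.out.println(listPage(539));
        System.out.println(queryEndpoint(539, 0));
    }
}
```

Swapping 539 for another region's fid yields that region's pair of URLs.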

Website description

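The Leader Message Board (《领导留言板》) builds a bridge between you and leaders but does not interfere in the outcome of that communication. The Leader Message Board actively works with all parties to resolve problems, but because it is not a competent authority, it cannot guarantee that your message will be publicly displayed (for which messages can be displayed, see "2.3 Messages and evaluations involving the following content will not be publicly displayed"), that your message will receive a reply, or that your request will be resolved.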

What content should be generated?

The People's Daily Online message board content for a specified city.

Additional description

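I first tried FEED43, but the page source contains none of the messages. In the browser developer tools I found that the data is loaded from http://liuyan.people.com.cn/threads/queryThreadsList?fid=539&lastItem=0, but opening that URL in a browser fails with HTTP error 405 (Method Not Allowed).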
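HTTP 405 means the server recognizes the URL but rejects the request method; a browser's address bar always sends GET. That suggests, without confirming it, that the page fetches this data with POST, which is also what the CSDN Java excerpt in this issue does. The sketch below builds such a POST with the JDK's built-in java.net.http client (Java 11+) instead of Apache HttpClient. The class and method names are hypothetical, and sending X-Requested-With is an assumption borrowed from the excerpt's commented-out headers.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class QueryThreads {
    // Builds a POST to the data endpoint. Assumption: parameters stay in the query string.
    public static HttpRequest buildRequest(int fid, long lastItem) {
        URI uri = URI.create("http://liuyan.people.com.cn/threads/queryThreadsList?fid="
                + fid + "&lastItem=" + lastItem);
        return HttpRequest.newBuilder(uri)
                .version(HttpClient.Version.HTTP_1_1)        // plain HTTP/1.1, no h2c upgrade attempt
                .header("Accept", "application/json, text/javascript, */*; q=0.01")
                .header("Referer", "http://liuyan.people.com.cn/threads/list?fid=" + fid)
                .header("X-Requested-With", "XMLHttpRequest") // assumption: mimic the page's AJAX call
                .POST(HttpRequest.BodyPublishers.noBody())    // POST instead of the browser's GET
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest(539, 0);
        System.out.println(request.method() + " " + request.uri());
        // Only touch the network when explicitly asked to.
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body());
        }
    }
}
```

Without arguments the program only prints the request line; run it with `--send` to actually call the endpoint and see whether the 405 goes away.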

Searching CSDN, I found related posts with concrete code, but I don't understand it and don't know whether it works.

Java crawler gets a 405 error when fetching a page
Need a Java scraper framework

Code excerpt:

package com.java.activiti.controller;

import org.apache.http.HttpEntity;
import org.apache.http.HttpStatus;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class QueryThreadsList {
    public static void main(String[] args) {
        String url = "http://liuyan.people.com.cn/threads/queryThreadsList?fid=539&lastItem=0";
        HttpPost httpPost = new HttpPost(url); // POST: the browser's GET on this URL returns 405
        // Set request headers
        httpPost.addHeader("Accept",
                "application/json, text/javascript, */*; q=0.01"); // fixed typo "pplication/json"
        httpPost.addHeader("Accept-Encoding", "gzip, deflate");
        httpPost.addHeader("Accept-Language",
                "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2");
        //httpPost.addHeader("Connection", "keep-alive");
        //httpPost.addHeader("Content-Length", "19");
        //httpPost.addHeader("Content-Type",
        //        "application/x-www-form-urlencoded; charset=UTF-8");
        httpPost.addHeader("Referer",
                "http://liuyan.people.com.cn/threads/list?fid=539");
        //httpPost.addHeader("User-Agent",
        //        "Mozilla/5.0 (Windows NT 6.1; W…) Gecko/20100101 Firefox/64.0");
        //httpPost.addHeader("X-Requested-With", "XMLHttpRequest");
        // try-with-resources closes both the client and the response when done
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpPost)) {
            int statusCode = response.getStatusLine().getStatusCode();
            System.out.println("HTTP Status Code:" + statusCode);
            if (statusCode != HttpStatus.SC_OK) {
                System.out.println("HTTP request failed! Status line: "
                        + response.getStatusLine());
            }
            HttpEntity httpEntity = response.getEntity();
            // Fall back to UTF-8 when the server declares no charset (the library default is ISO-8859-1)
            String responseContent = EntityUtils.toString(httpEntity, "UTF-8");
            System.out.println("Response body: " + responseContent);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
