Showing posts with label HTTP. Show all posts

Saturday, March 1, 2014

The protocol-relative URL (for web links)

Seen via a Hacker News thread.

http://www.paulirish.com/2010/the-protocol-relative-url/

Friday, January 3, 2014

Use WebSockets and Python for web-based system monitoring

By Vasudev Ram

I got to know about websocketd recently, via a reply to this question I posted on Hacker News: Ask HN: What are you using Go for?

websocketd is "a small command-line tool that will wrap an existing command-line interface program, and allow it to be accessed via a WebSocket". It's written in Go, by Joe Walnes.

He describes websocketd as "Like inetd, but for WebSockets. Turn any application that uses STDIN/STDOUT into a WebSocket server."

The websocketd README goes on to say:

[ WebSocket-capable applications can now be built very easily. As long as you can write an executable program that reads STDIN and writes to STDOUT, you can build a WebSocket server. Do it in Python, Ruby, Perl, Bash, .NET, C, Go, PHP, Java, Clojure, Scala, Groovy, Expect, Awk, VBScript, Haskell, Lua, R, whatever! No networking libraries necessary. ]

Websocket topic on Wikipedia

So I wrote a small Python program to try out websocketd. It uses the psutil module to get disk space info (total, used, and free) from the system.

(I had blogged about psutil earlier, here:

psutil, Python tool to get process info and more.)

Here is the code:
# psutil_disk_usage.py
# Print total, used and free disk space for the root filesystem,
# a few times in a loop, using the psutil module. (Python 2 syntax.)

from time import sleep
import psutil

print "Disk Space (MB)".rjust(46)
print " ".rjust(25) + "Total".rjust(10) + "Used".rjust(10) + "Free".rjust(10)
for i in range(5):
    du = psutil.disk_usage('/')
    # du.total etc. are in bytes; convert to MB with integer division
    print str(i + 1).rjust(25) + str(du.total/1024/1024).rjust(10) + \
        str(du.used/1024/1024).rjust(10) + str(du.free/1024/1024).rjust(10)
    sleep(2)

When this program is run directly at the prompt, with the command:

python psutil_disk_usage.py

it gives this output:
Disk Space (MB)
                             Total      Used      Free
                       1     99899     91309      8590
                       2     99899     91309      8590
                       3     99899     91309      8590
                       4     99899     91309      8590
                       5     99899     91309      8590
Running this program under the control of websocketd, with the command:

websocketd --port=8080 python psutil_disk_usage.py

the output of the program goes to the browser client connected to port 8080 (see below *).

You have to:

set PYTHONUNBUFFERED=true

at the command line first, for it to work as a WebSocket server; it works fine as a plain command-line program, without that setting.

See this StackOverflow question.
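As an alternative to setting PYTHONUNBUFFERED (or to running Python with its -u flag), the script itself can flush stdout after each line, so each update reaches websocketd immediately. A minimal sketch of the idea (Python 3 syntax, with made-up output lines):

```python
import sys
from time import sleep

# Flushing after each print ensures websocketd sees every line as soon
# as it is produced, instead of waiting for the stdio buffer to fill.
for i in range(3):
    line = "update %d" % (i + 1)
    print(line)
    sys.stdout.flush()
    sleep(0.1)
```

On Python 3.3+ the same effect can be had with print(line, flush=True).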

(*) You also have to write a WebSocket client, i.e. an HTML page with JavaScript, that connects to the server. The JavaScript code opens the connection, then reads the data sent by the server and displays it on the web page.

In my next post, I'll show the JavaScript WebSocket client, which is a modified version of an example on the websocketd page.

- Vasudev Ram - Dancing Bison Enterprises

Thursday, December 27, 2012

Pholcidae, a Python web crawler library

bbrodriges/pholcidae · GitHub

Pholcidae (named, appropriately, after a family of spiders) (*) is a Python library that can be used to write custom web crawlers (a.k.a. spiders).

It has a handful of attributes that can be set when creating an instance of the spider, to customize how it works: valid links, excluded links, the domain to crawl, the page at which to start crawling, whether to follow links pointing out of the domain, cookies and headers to set for the page requests, etc.

Your program gets access to each page that it crawls, both the URL and the raw content, to process as you wish.

Looks like a useful tool to experiment with creating custom web crawlers, since it is not a standalone crawler program but a crawler library.

As I've noted many times in the past on this blog and elsewhere, making the bulk of your code into a library or libraries (and then writing a thinnish main wrapper over it to make it a complete runnable program) enhances its applicability manyfold.

This point applies even if you are not going to release the code as open source, because even then, you, or others on your team, can reuse those libraries to create other useful programs for the same area, at less cost and time.
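The layout can be sketched in a few lines (the names below are made up for illustration): keep the logic in a plain function, and make the runnable program a thin wrapper around it.

```python
import sys

def word_count(text):
    """The reusable 'library' part: callable from any other program."""
    return len(text.split())

def main(args):
    """The thin wrapper: argument handling and output only."""
    for path in args:
        with open(path) as f:
            print(path, word_count(f.read()))

if __name__ == "__main__":
    main(sys.argv[1:])
```

Another program (or a test) can now import word_count directly, without going through the command-line interface at all.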

Though this idea is not new or original (it's almost as old as computing, in fact), a lot of people still don't seem to know it or apply it, which is the cause of tons of wasted effort and rework (a.k.a. a waste of money) in the software industry.

(*) Appropriate because, being a library, it can be used to create a family of spiders (or many different spiders), rather than just one, as would be the case if it was a standalone program - which neatly illustrates the point I just made above.

- Vasudev Ram
www.dancingbison.com

Wednesday, November 21, 2012

Turq, mock HTTP server scriptable in Python

By Vasudev Ram

Saw turq via a post comment.

Excerpt:

[ Turq is a tool for semi-interactively testing and debugging HTTP clients. Somewhat like httpbin, but more interactive and flexible. Turq runs a small HTTP server that is scriptable in a Python-based DSL. It lets you quickly set up mock URLs that respond with the status, headers and body of your choosing.
]
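For comparison, here is roughly what a quick mock endpoint looks like using only Python's standard library (Python 3's http.server module; this is not Turq's DSL, and the route and response body are made up):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class MockHandler(BaseHTTPRequestHandler):
    # Mock URLs -> (status, body), in the spirit of Turq's quick setup
    ROUTES = {"/hello": (200, b"hello world")}

    def do_GET(self):
        status, body = self.ROUTES.get(self.path, (404, b"no such mock"))
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), MockHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/hello" % server.server_address[1]
reply = urllib.request.urlopen(url).read()
server.shutdown()
server.server_close()
```

Turq's selling point over this is that the status, headers and body can be changed interactively while the server runs.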

- Vasudev Ram - Dancing Bison Enterprises



Saturday, November 17, 2012

RequestBin: collect and inspect HTTP requests (service / CLI tool)

RequestBin — Collect and inspect HTTP requests, debug webhooks

Looks interesting.

The site says the CLI (command-line interface) tool is coming soon.

I am trying out the hosted service.

Saturday, September 15, 2012

Lightweight web servers: IBM dW article by Cameron Laird

Lightweight Web servers

Pretty interesting and useful article, IMO. The range of possible uses of such lightweight web servers is surprising.

Monday, August 13, 2012

Inferno on Disco, Python MapReduce library / daemon for structured text

By Vasudev Ram


Inferno is an open-source Python MapReduce library. It has (from the site):

[ A query language for large amounts of structured text (CSV, JSON, etc).

A continuous and scheduled MapReduce daemon with an HTTP interface that automatically launches MapReduce jobs to handle a constant stream of incoming data. ]

Overview of Inferno.

This overview page has a nice step-by-step example: starting with a small set of test data, it shows how to query for a certain result in SQL and then in AWK (both are easy one-liners), and then goes on to show how to achieve the same result using Inferno.

The interesting point is that the Inferno code is also small (a "rule" of ~10 lines, presumably stored in a config file) and a one-line command, but the difference from the SQL and AWK examples is that this runs a Disco MapReduce job to distribute the work across the nodes on a cluster. There is almost nothing in the Inferno code to indicate that this is a distributed computing MapReduce job.
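The shape of such a keyed-count query can be sketched in plain, single-machine Python (the CSV data below is made up; the point of Inferno/Disco is that the same map and reduce steps run distributed across a cluster):

```python
import csv
import io
from collections import Counter

# Made-up structured text data of the CSV kind described above
data = io.StringIO("artist,track\nAdele,Hello\nAdele,Skyfall\nMuse,Madness\n")

# "map" step: emit one (key, 1) pair per record
pairs = ((row["artist"], 1) for row in csv.DictReader(data))

# "reduce" step: sum the counts for each key
counts = Counter()
for key, n in pairs:
    counts[key] += n

print(dict(counts))  # {'Adele': 2, 'Muse': 1}
```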

Inferno uses Disco.

Disco is "a distributed computing framework based on the MapReduce paradigm. Disco is open-source; developed by Nokia Research Center to solve real problems in handling massive amounts of data."

Some users of Disco: Chango, Nokia, Zemanta. Chango staff seem to be the developers of Disco.

- Vasudev Ram - Dancing Bison Enterprises

Tuesday, July 31, 2012

Twython - a Python Twitter library

By Vasudev Ram


Twython is a Python library for Twitter, written by Ryan McGrath.

I came across it via my own recent blog post about Twitter libraries.

Excerpt from the Twython Github site:

[ An up to date, pure Python wrapper for the Twitter API. Supports Twitter's main API, Twitter's search API, and using OAuth with Twitter. ]

I tried it out a little (the search feature) and it worked fine. It returns JSON output.

The Twython installer (the usual "python setup.py install" kind) also installs two required Python libraries: simplejson, and requests, which is a more user-friendly HTTP library (billed as "HTTP for Humans") than the standard httplib. BTW, another good Python HTTP library is httplib2, which was first developed by Joe Gregorio, IIRC.

- Vasudev Ram - Dancing Bison Enterprises

Monday, July 11, 2011

WebStatusCodes, a web development utility by Brian Jones

By Vasudev Ram - dancingbison.com | @vasudevram | jugad2.blogspot.com

I think this tool can be useful to web developers:

WebStatusCodes - http://webstatuscodes.appspot.com/ - is a web development utility written by Brian Jones, using Google AppEngine. Excerpts from the main page of the site, which convey what it is about:

[
This was put here as a handy testing tool for apps that need to test how their code deals with various HTTP status codes, and as a very basic reference for those who know HTTP but need an occasional reminder of whether 302 or 301 is "permanent", or can't remember which code means "Not Modified".
...
Request a valid status code by putting the status code as the first part of the URL path. For example, requesting
http://webstatuscodes.appspot.com/403 will return a 403 forbidden error, and the body of the response will contain the standard message describing that response.
...
Currently, this site doesn't do anything other than return the error. 401 doesn't issue a challenge, for example.
...
This site is also a handy reference. Not only is the list of supported codes and their accompanying short description messages in a table below (in case you *don't* know what you're looking for), but requesting a status code in a browser includes the description in the body of the response (in case you *do* know what you're looking for).
]

Below the last paragraph above is a table of web (HTTP) request status codes and their accompanying short descriptions.
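Incidentally, Python ships its own offline version of that reference: the standard library maps status codes to their reason phrases, as http.client.responses in Python 3 (the same dict is httplib.responses in Python 2):

```python
from http.client import responses

# Quick answers to the questions mentioned above:
print(responses[301])  # Moved Permanently (the "permanent" redirect)
print(responses[302])  # Found
print(responses[304])  # Not Modified
```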

Posted via email
- Vasudev Ram - Dancing Bison Enterprises