{"@attributes":{"version":"2.0"},"channel":{"title":"matthieu.io - Debian","description":"Posts categorized as 'debian'","link":"https:\/\/matthieu.io","item":[{"title":"Debsources, python3, and funky file names","description":"<p>Rumors are running that python2 is not a thing anymore.<\/p>\n\n<p>Well, I'm certainly late to the party, but I'm happy to report that\n<a href=\"https:\/\/sources.debian.org\">sources.debian.org<\/a> is now running python3.<\/p>\n\n<!--more-->\n\n<h2 id=\"wait-it-wasnt\">Wait, it wasn't?<\/h2>\n\n<p>Back when development started, python3 was very much a real language, but it was\nhard to adopt because it was not supported by many libraries. So python2 was\nchosen, meaning <code class=\"highlighter-rouge\">print<\/code>-based debugging was used in lieu of <code class=\"highlighter-rouge\">print()<\/code>-based\ndebugging, and <code class=\"highlighter-rouge\">str<\/code> were <code class=\"highlighter-rouge\">bytes<\/code>, not <code class=\"highlighter-rouge\">unicode<\/code>.<\/p>\n\n<p>And things were working just fine. One day python2 EOL was announced, with a\ndate far in the future. Far enough to procrastinate for a long time. Combine\nthis with a codebase that is stable enough to not see many commits, and the fact\nthat Debsources is a volunteer-based project that happens at best on week-ends,\nand you end up with a dormant software and a missed deadline.<\/p>\n\n<p>But, as dormant as the codebase is, the instance hosted at\n<a href=\"https:\/\/sources.debian.org\">sources.debian.org<\/a> is very popular and gets 200k\nto 500k hits per day. 
Largely enough to be worth a proper maintenance and a\ntransition to python3.<\/p>\n\n<h2 id=\"funky-file-names\">Funky file names<\/h2>\n\n<p>While transitioning to python3 and juggling left and right with <code class=\"highlighter-rouge\">str<\/code>, <code class=\"highlighter-rouge\">bytes<\/code>\nand <code class=\"highlighter-rouge\">unicode<\/code> for internal objects, files, database entries and HTTP content, I\nstumbled upon a bug that has been there since day 1.<\/p>\n\n<p>Quick recap if you're unfamiliar with this tool: Debsources displays the content\nof the source packages in the Debian archive. In other words, it's a bit like\nGitHub, but for the Debian source code.<\/p>\n\n<p>And some pieces of software out there, that ended up in Debian packages, happen\nto contain files whose names can't be decoded to UTF-8. Interestingly enough,\nthere's no such thing as a standard for file names: with a few exceptions that\nvary by operating system, any sequence of bytes can be a legit file name. And\nsome sequences of bytes are not valid UTF-8.<\/p>\n\n<p>Of course those files are rare, and using ASCII characters to name a file is a\nmuch more common practice than using bytes in a non-UTF-8 character\nencoding. But when you deal with almost 100 million files on which you have no\ncontrol (those files come from free software projects, and make their way into\nDebian without any renaming), it happens.<\/p>\n\n<p>Now back to the bug: when trying to display such a file through the web\ninterface, it would crash because it can't convert the file name to UTF-8, which\nis needed for the HTML representation of the page.<\/p>\n\n<h2 id=\"bugfix\">Bugfix<\/h2>\n\n<p>An often valid approach when trying to represent invalid UTF-8 content is to\nignore errors, and replace them with <code class=\"highlighter-rouge\">?<\/code> or <code class=\"highlighter-rouge\">\ufffd<\/code>. 
This is what Debsources\nactually does to display non-UTF-8 <em>file content<\/em>.<\/p>\n\n<p>Unfortunately, this best-effort approach is not suitable for file names, as file\nnames are also <em>identifiers<\/em> in Debsources: among other places, they are part of\nURLs. If a URL were to use placeholder characters to replace those bytes, there\nwould be no deterministic way to match it with a file on disk anymore.<\/p>\n\n<p>Representing binary data as text is a known problem. Multiple\nlossless solutions exist, such as base64 and its variants, but URLs looking like\n<code class=\"highlighter-rouge\">https:\/\/sources.debian.org\/src\/Y293c2F5LzMuMDMtOS4yL2Nvd3NheS8=<\/code> are not\nreadable at all compared to\n<code class=\"highlighter-rouge\">https:\/\/sources.debian.org\/src\/cowsay\/3.03-9.2\/cowsay\/<\/code>. Plus, they would not be\nbackwards-compatible with all existing links.<\/p>\n\n<p>The solution I chose is to use <a href=\"\/blog\/2021\/11\/04\/binary-data-url\/\">double-percent encoding<\/a>: this allows the representation of any byte in a\nURL, while keeping allowed characters unchanged - and preventing CGI gateways\nfrom trying to decode non-UTF-8 bytes. This is the best of both worlds: regular\nfile names get to appear normally and are human-readable, and funky file names\nonly have percent signs and hex numbers where needed.<\/p>\n\n<p>Here is an example of such a URL:\n<a href=\"https:\/\/sources.debian.org\/src\/aspell-is\/0.51-0-4\/%25EDslenska.alias\/\">https:\/\/sources.debian.org\/src\/aspell-is\/0.51-0-4\/%25EDslenska.alias\/<\/a>. Notice\nthe <code class=\"highlighter-rouge\">%25ED<\/code> to represent the percent sign itself (<code class=\"highlighter-rouge\">%25<\/code>) followed by an\ninvalid UTF-8 byte (<code class=\"highlighter-rouge\">%ED<\/code>).<\/p>\n\n<p>Transitioning to this was quite a challenge, as those file names don't only\nappear in URLs, but also in web pages themselves, log files, database tables,\netc. 
And everything was done with <code class=\"highlighter-rouge\">str<\/code>: made sense in python2 when <code class=\"highlighter-rouge\">str<\/code> were\n<code class=\"highlighter-rouge\">bytes<\/code>, but not much in python3.<\/p>\n\n<h2 id=\"what-are-those-files-whats-their-network\">What are those files? What's their network?<\/h2>\n\n<p>I was wondering too. Let's list them!<\/p>\n\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">import<\/span> <span class=\"nn\">os<\/span>\n\n<span class=\"k\">with<\/span> <span class=\"nb\">open<\/span><span class=\"p\">(<\/span><span class=\"s\">'non-utf-8-paths.bin'<\/span><span class=\"p\">,<\/span> <span class=\"s\">'wb'<\/span><span class=\"p\">)<\/span> <span class=\"k\">as<\/span> <span class=\"n\">f<\/span><span class=\"p\">:<\/span>\n    <span class=\"k\">for<\/span> <span class=\"n\">root<\/span><span class=\"p\">,<\/span> <span class=\"n\">folders<\/span><span class=\"p\">,<\/span> <span class=\"n\">files<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">os<\/span><span class=\"p\">.<\/span><span class=\"n\">walk<\/span><span class=\"p\">(<\/span><span class=\"sa\">b<\/span><span class=\"s\">'\/srv\/sources.debian.org\/sources\/'<\/span><span class=\"p\">):<\/span>\n        <span class=\"k\">for<\/span> <span class=\"n\">path<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">folders<\/span> <span class=\"o\">+<\/span> <span class=\"n\">files<\/span><span class=\"p\">:<\/span>\n            <span class=\"k\">try<\/span><span class=\"p\">:<\/span>\n                <span class=\"n\">path<\/span><span class=\"p\">.<\/span><span class=\"n\">decode<\/span><span class=\"p\">(<\/span><span class=\"s\">'utf-8'<\/span><span class=\"p\">)<\/span>\n            <span class=\"k\">except<\/span> <span class=\"nb\">UnicodeDecodeError<\/span><span class=\"p\">:<\/span>\n                <span class=\"n\">f<\/span><span class=\"p\">.<\/span><span 
class=\"n\">write<\/span><span class=\"p\">(<\/span><span class=\"n\">root<\/span> <span class=\"o\">+<\/span> <span class=\"sa\">b<\/span><span class=\"s\">'\/'<\/span> <span class=\"o\">+<\/span> <span class=\"n\">path<\/span> <span class=\"o\">+<\/span> <span class=\"sa\">b<\/span><span class=\"s\">'<\/span><span class=\"se\">\\n<\/span><span class=\"s\">'<\/span><span class=\"p\">)<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Running this on the Debsources main instance, which hosts pretty much all Debian\npackages that were part of a Debian release, I could find 307 files (among a\ntotal of almost 100 million files).<\/p>\n\n<p>Without looking deep into them, they seem to fall into 2 categories:<\/p>\n\n<ul>\n  <li>File names that are not valid UTF-8, but are valid in a different charset. Not\nall software is developed in English or on UTF-8 systems.<\/li>\n  <li>File names that can't be decoded to UTF-8 on purpose, to be used as input to\ntest suites, and assert resilience of the software to non-UTF-8 data.<\/li>\n<\/ul>\n\n<p>That last point hits home, as it was clearly lacking in Debsources. A funky file\nname is now part of its test suite. ;)<\/p>\n","pubDate":"Mon, 24 Jan 2022 00:00:00 +0100","link":"https:\/\/matthieu.io\/blog\/2022\/01\/24\/debsources-python3-funky-file-names\/","guid":"https:\/\/matthieu.io\/blog\/2022\/01\/24\/debsources-python3-funky-file-names\/"},{"title":"MiniDebconf in Toulouse","description":"<p>I attended the MiniDebconf in Toulouse, which was hosted in the larger\nCapitole du Libre, a free software event with talks, presentation of\nassociations, and a keysigning party. I didn't expect the event to be\nthat big, and I was very impressed by its organization. 
Cheers to all\nthe volunteers, it has been an amazing week-end!<\/p>\n\n<p>Here's a summary of the talks I attended.<\/p>\n\n<h2 id=\"du-logiciel-libre-\u00e0-la-monnaie-libre\">Du logiciel libre \u00e0 la monnaie libre<\/h2>\n\n<p><strong>Speaker<\/strong>: \u00c9lo\u00efs<\/p>\n\n<p>The first talk I attended was, translated to English, \"from free\nsoftware to free money\".<\/p>\n\n<p>\u00c9lo\u00efs compared the 4 freedoms of free software with money, and what\nproperties money needs to exhibit in order to be considered free. He\nthen introduced \u011e1, a project of free (as in free speech!) money,\nstarted in the region around Toulouse. Contrary to some distributed\nledgers such as Bitcoin, \u011e1 isn't based on a hash-based\nproof-of-work, but rather around a web of trust of people certifying\neach other, hence limiting the energy consumption required by the\nnetwork to function.<\/p>\n\n<h2 id=\"yunohost\">YunoHost<\/h2>\n\n<p><strong>Speaker<\/strong>: Jimmy Monin<\/p>\n\n<p>I then attended a presentation of YunoHost. Being a happy user\nmyself, it was very nice to discover the future expected features, and\nalso meet two of the developers. YunoHost is a Debian-based project,\naimed at providing all the tools necessary to self-host applications,\nincluding email, website, calendar, development tools, and dozens of\nother packages.<\/p>\n\n<h2 id=\"premiers-pas-dans-lunivers-de-debian\">Premiers pas dans l'univers de Debian<\/h2>\n\n<p><strong>Speaker<\/strong>: Nicolas Dandrimont<\/p>\n\n<p>For the first talk of the MiniDebConf, Nicolas Dandrimont introduced\nDebian, its philosophy, and how it works with regard to upstreams and\ndownstreams. 
He gave many details on the teams, the infrastructure,\nand the internals of Debian.<\/p>\n\n<h2 id=\"trusting-your-computer-and-system\">Trusting your computer and system<\/h2>\n\n<p><strong>Speaker<\/strong>: Jonas Smedegaard<\/p>\n\n<p>Jonas introduced some security concepts, and how they are abused and\noften meaningless (to quote his own words, \"secure is bullshit\"). He\ndescribed a few projects which lean towards a more secure and open\nhardware, for both phones and laptops.<\/p>\n\n<h2 id=\"automatiser-la-gestion-de-configuration-de-debian-avec-ansible\">Automatiser la gestion de configuration de Debian avec Ansible<\/h2>\n\n<p><strong>Speaker<\/strong>: J\u00e9r\u00e9my Lecour<\/p>\n\n<p>J\u00e9r\u00e9my, from Evolix, introduced Ansible, and how they use it to manage\nhundreds of Debian servers. Ansible is a very powerful tool, and a\nhuge ecosystem, in many ways similar to Puppet or Chef, except it is\nagent-less, using only ssh connections to communicate with remote\nmachines. Very nice to compare their use of Ansible with mine, since\nthat's the software I use at work for deploying experiments.<\/p>\n\n<h2 id=\"making-debian-for-everybody\">Making Debian for everybody<\/h2>\n\n<p><strong>Speaker<\/strong>: Samuel Thibault<\/p>\n\n<p>Samuel gave a talk about accessibility, and the general availability\nof the tools in today's operating systems, including Debian. The\nlesson to take home is that we often don't do enough in this domain,\nparticularly when considering some issues people might have that we\ndon't always think about. 
Accessibility on computers (and elsewhere)\nshould be the default, and never require complex setups.<\/p>\n\n<h2 id=\"retour-dexp\u00e9rience--mise-\u00e0-jour-de-milliers-de-terminaux-debian\">Retour d'exp\u00e9rience : mise \u00e0 jour de milliers de terminaux Debian<\/h2>\n\n<p><strong>Speaker<\/strong>: Cyril Brulebois<\/p>\n\n<p>Cyril described a problem he was hired for, an update of thousands of\nDebian servers from wheezy to jessie, which he discovered afterwards\nwas worse than initially thought, since the machines were running the\nout-of-date squeeze. Since they were not always administered with the\nbest sysadmin practices, they were all exhibiting different\nconfigurations and different packages lists, which raised many issues\nand gave him interesting challenges. They were solved using Ansible,\nwhich also had the effect of standardizing their system administration\npractices.<\/p>\n\n<h2 id=\"retour-dexp\u00e9rience--utilisation-de-debian-chez-evolix\">Retour d'exp\u00e9rience : utilisation de Debian chez Evolix<\/h2>\n\n<p><strong>Speaker<\/strong>: Gr\u00e9gory Colpart<\/p>\n\n<p>Gr\u00e9gory described Evolix, a company which manages servers for their\nclients, and how they were inspired by Debian, for both their internal\ntools and their practices. It is very interesting to see that some of\nthe Debian values can be easily exported for a more open and\ncollaborative business.<\/p>\n\n<h2 id=\"lightning-talks\">Lightning talks<\/h2>\n\n<p>To close the conference, two lightning talks were presented,\ndescribing the switch from Windows XP to Debian in an ecologic\nassociation near Toulouse; and how snapshot.debian.org can be used\nwith bisections to find the source of some regressions.<\/p>\n\n<h2 id=\"conclusion\">Conclusion<\/h2>\n\n<p>A big thank you to all the organizers and the associations who\ncontributed to make this event a success. 
Cheers!<\/p>\n","pubDate":"Sun, 19 Nov 2017 00:00:00 +0100","link":"https:\/\/matthieu.io\/blog\/2017\/11\/19\/minidebconf-toulouse\/","guid":"https:\/\/matthieu.io\/blog\/2017\/11\/19\/minidebconf-toulouse\/"},{"title":"Debugging 101","description":"<p>While teaching this semester a class on concurrent programming, I\nrealized during the labs that most of the students couldn't properly\n<em>debug<\/em> their code. They are at the end of a 2-year cursus, know many\ndifferent programming languages and frameworks, but when it comes to\ntracking down a bug in their own code, they often lacked the\nbasics. Instead of debugging for them I tried to give them general\ndirections that they could apply for the next bugs. I will try here to\nsummarize the very first basic things to know about\ndebugging. Because, remember, writing software is 90% debugging, and\n10% introducing new bugs (that is not from me, but I could not find\nthe original quote).<\/p>\n\n<p>So here is my take at <em>Debugging 101<\/em>.<\/p>\n\n<h2 id=\"use-the-right-tools\">Use the right tools<\/h2>\n\n<p>Many good tools exist to assist you in writing correct software, and\nit would put you behind in terms of productivity not to use\nthem. Editors which catch syntax errors while you write them, for\nexample, will help you a lot. And there are many features out there in\neditors, compilers, debuggers, which will prevent you from introducing\ntrivial bugs. Your editor should be your friend; explore its features\nand customization options, and find an efficient workflow with them,\nthat you like and can improve over time. The best way to fix bugs is\nnot to have them in the first place, obviously.<\/p>\n\n<h2 id=\"test-early-test-often\">Test early, test often<\/h2>\n\n<p>I've seen students writing code for one hour before running <code class=\"highlighter-rouge\">make<\/code>,\nthat would fail so hard that hundreds of lines of errors and warnings\nwere outputted. 
There are two main reasons doing this is a bad idea:<\/p>\n\n<ul>\n  <li>You have to debug all the errors at once, and the complexity of\nsolving many bugs, some dependent on others, is way higher than the\ncomplexity of solving a single bug. Moreover, it's discouraging.<\/li>\n  <li>Wrong assumptions you made at the beginning will make the following\nlines of code wrong. For example if you chose the wrong data\nstructure for storing some information, you will have to fix all the\ncode using that structure. It's less painful to realize earlier it\nwas the wrong one to choose, and you are more likely to notice\nthat if you compile and execute often.<\/li>\n<\/ul>\n\n<p>I recommend testing your code (compilation <em>and<\/em> execution) every few\nlines of code you write. When something breaks, chances are it will\ncome from the last line(s) you wrote. Compiler errors will be shorter,\nand will point you to the same place in the code. Once you get more\nconfident using a particular language or framework, you can write\nmore lines at once without testing. That's a slow process, but it's\nok. If you set up the right keybinding for compiling and executing\nfrom within your editor, it shouldn't be painful to test early and\noften.<\/p>\n\n<h2 id=\"read-the-logs\">Read the logs<\/h2>\n\n<p>Spot the places where your program\/compiler\/debugger writes text, and\nread it carefully. It can be your terminal (quite often), a file in\nyour current directory, a file in <code class=\"highlighter-rouge\">\/var\/log\/<\/code>, a web page on a local\nserver, anything. Learn where different programs write their logs on your\nsystem, and integrate reading them into your workflow. Often, it will be\nyour only information about the bug. Often, it will tell you where the\nbug lies. Sometimes, it will even give you hints on how to fix it.<\/p>\n\n<p>You may have to filter out a lot of garbage to find relevant\ninformation about your bug. 
Learn to spot some keywords like <code class=\"highlighter-rouge\">error<\/code>\nor <code class=\"highlighter-rouge\">warning<\/code>. In long stacktraces, spot the lines concerning your\nfiles; because more often than not, your code is to blame, rather than\ndeeper library code. <code class=\"highlighter-rouge\">grep<\/code> the logs with relevant keywords. If you\nhave the option, colorize the output. Use <code class=\"highlighter-rouge\">tail -f<\/code> to follow a file\ngetting updated. There are so many ways to grasp logs, so find what\nworks best for you and never forget to use it!<\/p>\n\n<h2 id=\"print-foobar\">Print foobar<\/h2>\n\n<p>That one doesn't concern compilation errors (unless it's a <code class=\"highlighter-rouge\">Makefile<\/code>\nerror, in that case this file is your code anyway).<\/p>\n\n<p>When the program logs and output fail to tell you where an error\noccurred (oh hi <code class=\"highlighter-rouge\">Segmentation fault<\/code>!), and before having to dive into\na memory debugger or system trace tool, spot the portion of your\nprogram that causes the bug and add in there some <code class=\"highlighter-rouge\">print<\/code>\nstatements. You can either <code class=\"highlighter-rouge\">print(\"foo\")<\/code> and <code class=\"highlighter-rouge\">print(\"bar\")<\/code>, just to\nknow whether your program reaches a certain place in your code, or\n<code class=\"highlighter-rouge\">print(some_faulty_var)<\/code> to get more insights on your program\nstate. 
It will give you precious information.<\/p>\n\n<div class=\"language-c++ highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">cerr<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"s\">\"foo\"<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">endl<\/span><span class=\"p\">;<\/span>\n<span class=\"n\">my_db<\/span><span class=\"p\">.<\/span><span class=\"n\">connect<\/span><span class=\"p\">();<\/span> <span class=\"c1\">\/\/ is this broken?<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">cerr<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"s\">\"bar\"<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">endl<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>In the example above, you can be sure it is the connection to the\ndatabase <code class=\"highlighter-rouge\">my_db<\/code> that is broken if you get <code class=\"highlighter-rouge\">foo<\/code> and not <code class=\"highlighter-rouge\">bar<\/code> on your\nstandard error.<\/p>\n\n<p>(That is a hypothetical example. If you know something can break,\nsuch as a database connection, then you should always enclose it in a\n<code class=\"highlighter-rouge\">try<\/code>\/<code class=\"highlighter-rouge\">catch<\/code> structure).<\/p>\n\n<h2 id=\"isolate-and-reproduce-the-bug\">Isolate and reproduce the bug<\/h2>\n\n<p>This point is linked to the previous one. You may or may not have\nisolated the line(s) causing the bug, but maybe the issue is not\nalways raised. It can depend on many other things: the program or\nfunction parameters, the network status, the amount of memory\navailable, the decisions of the OS scheduler, the user rights on the\nsystem or on some files, etc. More generally, any assumption you made\non any external dependency can turn out to be wrong (even if it's right\n99% of the time). Depending on the context, try to isolate the set of\nconditions that trigger the bug. 
It can be as simple as \"when there is\nno internet connection\", or as complicated as \"when the CPU load of\nsome external machine is too high, it's a leap year, and the input\ncontains illegal utf-8 characters\" (ok, that one is a lot, but it\nsurely happens!). But you need to reliably be able to reproduce the\nbug, in order to be sure later that you indeed fixed it.<\/p>\n\n<p>Of course when the bug is triggered at every run, it can be\nfrustrating that your program never works but it will in general be\neasier to fix.<\/p>\n\n<h2 id=\"rtfm\">RTFM<\/h2>\n\n<p>Always read the documentation before reaching out for help. Be it\n<code class=\"highlighter-rouge\">man<\/code>, a book, a website or a wiki, you will find precious information\nthere to assist you in using a language or a specific library. It can\nbe quite intimidating at first, but it's often organized the same\nway. You're likely to find a search tool, an API reference, a\ntutorial, and many examples. Compare your code against them. Check in\nthe FAQ, maybe your bug and its solution are already referenced there.<\/p>\n\n<p>You'll rapidly find yourself getting used to the way documentation is\norganized, and you'll be more and more efficient at finding instantly\nwhat you need. Always keep the doc window open!<\/p>\n\n<h2 id=\"google-and-stack-overflow-are-your-friends\">Google and Stack Overflow are your friends<\/h2>\n\n<p>Let's be honest: many of the bugs you'll encounter have been\nencountered before. Learn to write efficient queries on search\nengines, and use the knowledge you can find on questions&amp;answers\nforums like Stack Overflow. Read the answers and comments. Be wise\nthough, and never <a href=\"https:\/\/twitter.com\/ThePracticalDev\/status\/705825638851149824\">blindly copy and\npaste<\/a>\ncode from there. It can be as bad as introducing malicious security\nissues into your code, and you won't learn anything. Oh, and don't\ncopy and paste anyway. 
You have to be sure you understand every single\nline, so better write them by hand; it's also better for memorizing\nthe issue.<\/p>\n\n<h2 id=\"take-notes\">Take notes<\/h2>\n\n<p>Once you have identified and solved a particular bug, I advise you to\nwrite about it. No need for shiny interfaces: keep a list of your bugs\nalong with their solutions in one or many text files, organized by\nlanguage or framework, that you can easily <code class=\"highlighter-rouge\">grep<\/code>.<\/p>\n\n<p>It can seem slightly cumbersome to do so, but it proved (at least to\nme) to be very valuable. I can often recall I have encountered some\nbuggy situation in the past, but don't always remember the\nsolution. Instead of losing all the debugging time again, I search in\nmy bug\/solution list first, and when it's a hit I'm more than happy I\nkept it.<\/p>\n\n<h2 id=\"reach-out-for-help-from-your-duck\">Reach out for help (from your duck)<\/h2>\n\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Rubber_duck_debugging\">Rubber duck debugging<\/a>\nencourages you to explain your problem out loud, for example to a rubber\nduck sitting on your desk. Using plain words to describe a problem\nwill force you to think about it clearly, and you'll be amazed how\nfast it might lead you to identifying the root cause of your bug,\nbecause you'll realize there's a problem somewhere you couldn't think\nabout if you didn't start from the beginning.<\/p>\n\n<h2 id=\"further-debugging\">Further debugging<\/h2>\n\n<p>Remember this was only <em>Debugging 101<\/em>, that is, the very first steps\non how to debug code on your own, instead of getting frustrated and\nhelplessly staring at your screen without knowing where to begin. As\nyou write more software, you'll get used to more efficient\nworkflows, and you'll discover tools that are here to assist you in\nwriting bug-free code and spotting complex bugs efficiently. 
Listed\nbelow are <em>some<\/em> of the tools or general ideas used to debug more\ncomplex software. They belong more to a software engineering course\nthan a Debugging 101 blog post. But it's good to know as soon as\npossible these exist, and if you read the manuals there's no reason\nyou can't rock with them!<\/p>\n\n<ul>\n  <li>\n    <p><em>Loggers<\/em>. To make the \"foobar\" debugging more efficient, some\nlibraries are especially designed for the task of logging out\ninformation about a running program. They often have way more\nfeatures than a simple <code class=\"highlighter-rouge\">print<\/code> statement (at the price of being\nover-engineered for simple programs): severity levels (<code class=\"highlighter-rouge\">info<\/code>,\n<code class=\"highlighter-rouge\">warning<\/code>, <code class=\"highlighter-rouge\">error<\/code>, <code class=\"highlighter-rouge\">fatal<\/code>, etc), output in rotating files, and\nmany more.<\/p>\n  <\/li>\n  <li>\n    <p><em>Version control<\/em>. Following the evolution of a program in time,\nover multiple versions, contributors and forks, is a hard\ntask. That's where version control plays: it allows you to keep the\nentire history of your program, and switch to any previous\nversion. This way you can identify more easily when a bug was\nintroduced (and by whom), along with the patch (a set of changes to\na code base) that introduced it. Then you know where to apply your\nfix. Famous version control tools include Git, Subversion, and\nMercurial.<\/p>\n  <\/li>\n  <li>\n    <p><em>Debuggers<\/em>. Last but not least, it wouldn't make sense to talk\nabout debugging without mentioning debuggers. They are tools to\ninspect the state of a program (for example the type and value of\nvariables) while it is running. You can pause the program, and\nexecute it line by line, while watching the state evolve. Sometimes\nyou can also manually change the value of variables to see what\nhappens. 
Even though some of them are hard to use, they are very\nvaluable tools, totally worth diving into!<\/p>\n  <\/li>\n<\/ul>\n\n<p>Don't hesitate to comment on this, and provide your debugging 101\ntips! I'll be happy to update the article with valuable feedback.<\/p>\n\n<p>Happy debugging!<\/p>\n","pubDate":"Sun, 23 Oct 2016 00:00:00 +0200","link":"https:\/\/matthieu.io\/blog\/2016\/10\/23\/debugging-101\/","guid":"https:\/\/matthieu.io\/blog\/2016\/10\/23\/debugging-101\/"},{"title":"A one-liner to catch'em all!","description":"<p>I wrote a Bash one-liner to open the source code (in\n<a href=\"http:\/\/sources.debian.net\">Debsources<\/a>) of any file on your system\n(if it belongs to a Debian package).<\/p>\n\n<p>It will simply retrieve the associated package and point your default\nbrowser to its source code.<\/p>\n\n<p>Add this somewhere in your $PATH, and name this file <code class=\"highlighter-rouge\">debsrc<\/code>:<\/p>\n\n<div class=\"language-bash highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c\">#!\/bin\/bash<\/span>\n\n<span class=\"k\">function <\/span>debsrc <span class=\"o\">{<\/span>\n    <span class=\"nb\">readlink<\/span> <span class=\"nt\">-f<\/span> <span class=\"nv\">$1<\/span> | xargs dpkg-query <span class=\"nt\">--search<\/span> | <span class=\"nb\">cut<\/span> <span class=\"nt\">-d<\/span>: <span class=\"nt\">-f1<\/span> | xargs apt-cache showsrc | <span class=\"nb\">head<\/span> <span class=\"nt\">-n<\/span> 1 | grep-dctrl <span class=\"nt\">-s<\/span> <span class=\"s1\">'Package'<\/span> <span class=\"nt\">-n<\/span> <span class=\"s1\">''<\/span> | <span class=\"nb\">awk<\/span> <span class=\"nt\">-F<\/span> <span class=\"s2\">\" \"<\/span> <span class=\"s1\">'{print \"http:\/\/sources.debian.net\/src\/\"$1\"\/latest\/\"}'<\/span> | xargs x-www-browser\n<span class=\"o\">}<\/span>\n\n<span class=\"nv\">CMD<\/span><span class=\"o\">=<\/span><span class=\"s2\">\"<\/span><span 
class=\"nv\">$1<\/span><span class=\"s2\">\"<\/span>\ndebsrc <span class=\"k\">${<\/span><span class=\"nv\">CMD<\/span><span class=\"k\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>And try something like <code class=\"highlighter-rouge\">debsrc \/usr\/share\/doc\/acpi\/AUTHORS<\/code>. Enjoy!<\/p>\n\n<p>Update: improved the one-liner thanks to josch's advice.<\/p>\n","pubDate":"Sun, 16 Aug 2015 00:00:00 +0200","link":"https:\/\/matthieu.io\/blog\/2015\/08\/16\/one-liner-to-catch-em-all\/","guid":"https:\/\/matthieu.io\/blog\/2015\/08\/16\/one-liner-to-catch-em-all\/"},{"title":"Debsources got swag and continuous integration","description":"<p>Debsources (<a href=\"http:\/\/sources.debian.net\">http:\/\/sources.debian.net<\/a>) is still under\nactive development. We recently had a Gnome Outreachy intern, Jingjie Jiang, and we're\nabout to work with 2 GSoC students, Cl\u00e9ment Schreiner and Orestis Ioannou.<\/p>\n\n<p>I will present here the GitHub mirror we've set up, in order to allow external pull\nrequests to be submitted, and to use the continuous integration service provided by\nTravis-CI.<\/p>\n\n<h2 id=\"github-and-travis-ci\">GitHub and Travis-CI<\/h2>\n\n<p>Debsources' source code is hosted on\n<a href=\"https:\/\/anonscm.debian.org\/cgit\/qa\/debsources.git\">Debian's git servers<\/a>, and from\nthere is mirrored to <a href=\"https:\/\/github.com\/Debian\/debsources\">GitHub<\/a>. Every time a commit\nis pushed (to <em>master<\/em> or other branches) or a pull request is opened, the test suite will\nbe automatically run on Travis-CI, and the result (tests pass or don't) is displayed on\nGitHub. This allows us to quickly filter external contributions (when they are submitted\non GitHub), and be sure everything works with our setup, before reviewing work.<\/p>\n\n<p>Travis-CI runs the tests on OpenVZ containers. 
The complete infrastructure was a bit\nchallenging to set up, but as we now have a Docker recipe to quickly begin to hack on\nDebsources, most of the work could be done using the <em>Dockerfile<\/em> instructions.<\/p>\n\n<p>On average, a run on Travis-CI (which includes git cloning the code and test data, setting up\nthe server, and running the test suite) takes 7 minutes, which is an ok amount of time to\nwait for before submitting a pull request, in my opinion.<\/p>\n\n<h2 id=\"bugs-discovered-in-the-process\">Bugs discovered in the process<\/h2>\n\n<p>Setting up this continuous integration infrastructure made me discover a few bugs.<\/p>\n\n<h3 id=\"python-magic-does-black-magic\">Python magic does black magic<\/h3>\n\n<p>Debsources runs fine on Debian (not surprisingly), but I got tricked by black magic when\nI tried to run it on Ubuntu (which is the OS run in Travis-CI's containers).<\/p>\n\n<p>We use the <em>magic<\/em> library to guess the type of files we're dealing with, for instance\nwhen we need to decide between rendering a file (for text files) or downloading it (for\nbinary files).<\/p>\n\n<p>Here comes the tricky part: the Python bindings for libmagic are not the same in Debian\nand Pypi. 
Debsources uses the Debian package <em>python-magic<\/em>, which is not in Ubuntu 12.04.\nMoreover, there's no Python egg for it on PyPI, which does however have another package\n(called <em>magic<\/em>) that provides a different API.<\/p>\n\n<p>I solved this with a dirty hack, using the fact that python-magic lies in a single file:<\/p>\n\n<div class=\"language-bash highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">mkdir<\/span> \/tmp\/python-magic <span class=\"o\">&amp;&amp;<\/span> wget https:\/\/raw.githubusercontent.com\/file\/file\/master\/python\/magic.py <span class=\"nt\">-O<\/span> \/tmp\/python-magic\/magic.py <span class=\"o\">&amp;&amp;<\/span> <span class=\"nb\">export <\/span><span class=\"nv\">PYTHONPATH<\/span><span class=\"o\">=<\/span>\/tmp\/python-magic\/:<span class=\"nv\">$PYTHONPATH<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>It simply downloads the library, saves it in a temporary folder and includes it in the\nPython path. Let's see for how long it works before everything breaks!<\/p>\n\n<p>Edit: Thanks to Cl\u00e9ment Schreiner, python-magic is now installed by cloning its GitHub\nrepository and <code class=\"highlighter-rouge\">pip install<\/code>'ing it.<\/p>\n\n<h3 id=\"size-of-a-directory\">Size of a directory<\/h3>\n\n<p>One test in the suite was ensuring the information returned by <code class=\"highlighter-rouge\">ls -l<\/code> on a directory\nand stored in the DB was the right information. Inode metadata was tested, such as name,\npermissions, type, or size.<\/p>\n\n<p>Interestingly enough, the size of a directory was tested, and expected to be 4096 bytes.\nThe size of a directory actually depends on the filesystem in use, and on the number of\nfiles this directory contains. 
We often see 4096 because it's the size of a not-too-big\ndirectory on ext4.<\/p>\n\n<p>Travis-CI doesn't use ext4:<\/p>\n\n<div class=\"language-bash highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ <\/span><span class=\"nb\">df<\/span> <span class=\"nt\">-T<\/span>\nFilesystem            Type     1K-blocks      Used Available Use% Mounted on\n\/vz\/private\/209140041 simfs    125829120 103460612  22368508  83% \/\nnone                  devtmpfs   1572864         8   1572856   1% \/dev\nnone                  tmpfs       314576        56    314520   1% \/run\nnone                  tmpfs         5120         4      5116   1% \/run\/lock\nnone                  tmpfs      1572864         0   1572864   0% \/run\/shm\n\/dev\/null             tmpfs       786432    171584    614848  22% \/var\/ramfs\n<\/code><\/pre><\/div><\/div>\n\n<p>Simfs is a container filesystem for OpenVZ, on which directories have different sizes\nthan on ext4:<\/p>\n\n<div class=\"language-bash highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ <\/span><span class=\"nb\">ls<\/span> <span class=\"nt\">-al<\/span> \/\ntotal 0\ndrwxr-xr-x 23 root     root      480 Feb  4 18:08 <span class=\"nb\">.<\/span>\ndrwxr-xr-x 23 root     root      480 Feb  4 18:08 ..\ndrwxr-xr-x  2 root     root     2480 Feb  4 18:20 bin\ndrwxr-xr-x  2 root     root       40 Apr 19  2012 boot\ndrwxr-xr-x  5 root     root      660 Apr 30 13:56 dev\ndrwxr-xr-x 99 root     root     3560 Apr 30 13:56 etc\n<span class=\"nt\">-rw-r--r--<\/span>  1 root     root        0 Feb  4 17:56 fastboot\ndrwxr-xr-x  3 root     root       80 Feb  4 17:57 home\n<span class=\"o\">[<\/span>...]\n<\/code><\/pre><\/div><\/div>\n\n<p>Directory sizes are not even powers of 2.<\/p>\n\n<p>Hence I changed the test to not check directory sizes. 
Hopefully this will help to make\nDebsources work on more filesystems!<\/p>\n\n<h3 id=\"an-empty-file-is-hiding\">An empty file is hiding<\/h3>\n\n<p>Last but not least, because this\n<a href=\"https:\/\/bugs.debian.org\/cgi-bin\/bugreport.cgi?bug=783832\">bug<\/a> is still open in the\nwild. A file, which appears to be empty, is not taken into account by Debsources'\nupdater. This file is <code class=\"highlighter-rouge\">sources\/non-free\/m\/make-doc-non-dfsg\/4.0-2\/.pc\/applied-patches<\/code>.\nIt is present in the filesystem in the container, and is not the only empty file there,\nbut it still doesn't appear in the database, which makes the test that counts files fail.<\/p>\n\n<p>The test has been commented out (booooooh), so that we can still use Travis-CI's\nplatform for our GSoC students until it's fixed.<\/p>\n\n<h2 id=\"conclusion\">Conclusion<\/h2>\n\n<p>Making Debsources run automatically on a different platform than the one we usually use\nallowed us to spot bugs, write dirty hacks, and expand the filesystems it's supposed\nto run on.<\/p>\n\n<p>Now, let's hope the continuous integration will help our GSoC students, and let's wish\nthem good luck!<\/p>\n","pubDate":"Thu, 07 May 2015 00:00:00 +0200","link":"https:\/\/matthieu.io\/blog\/2015\/05\/07\/debsources-got-swag-and-continuous-integration\/","guid":"https:\/\/matthieu.io\/blog\/2015\/05\/07\/debsources-got-swag-and-continuous-integration\/"}]}}