{"@attributes":{"version":"2.0"},"channel":{"title":"DEV Community: Evan Wilde","description":"The latest articles on DEV Community by Evan Wilde (@etcwilde).","link":"https:\/\/dev.to\/etcwilde","image":{"url":"https:\/\/media2.dev.to\/dynamic\/image\/width=90,height=90,fit=cover,gravity=auto,format=auto\/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F16817%2F51472c0d-cda2-4f6a-a076-e15a08952965.jpg","title":"DEV Community: Evan Wilde","link":"https:\/\/dev.to\/etcwilde"},"language":"en","item":[{"title":"Perf - Perfect Profiling of C\/C++ on Linux","pubDate":"Sat, 18 Nov 2017 06:22:29 +0000","link":"https:\/\/dev.to\/etcwilde\/perf---perfect-profiling-of-cc-on-linux-of","guid":"https:\/\/dev.to\/etcwilde\/perf---perfect-profiling-of-cc-on-linux-of","description":"<p>I've seen power-Visual Studio users do amazing things with the profiler and debugger. It looks amazing, but there is one small problem. I'm on Linux. No visual studio for me. How do I profile now? There are some profilers out there for Linux too, each with varying degrees of usability.<\/p>\n\n<p>Perf is a neat little tool that I just found for profiling programs. Perf uses statistical profiling, where it polls the program and sees what function is working. This is less accurate, but has less of a performance hit than something like Callgrind, which tracks every call. The results are still reasonable accurate, and even with fewer samples, it will show which functions are taking a lot of time, even if it misses functions that are very fast (which are probably not the ones you are looking for while profiling anyway).<\/p>\n\n<p>With a rather contrived example, we implement the infamous Fibonacci sequence in the most pedagogical way possible. 
Basic Recursion.<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight c\"><code><span class=\"c1\">\/\/ fib.c<\/span>\n<span class=\"cp\">#include<\/span> <span class=\"cpf\">&lt;stdio.h&gt;<\/span><span class=\"cp\">\n#include<\/span> <span class=\"cpf\">&lt;stdlib.h&gt;<\/span><span class=\"cp\">\n<\/span>\n<span class=\"kt\">int<\/span> <span class=\"nf\">fib<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">x<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n  <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">x<\/span> <span class=\"o\">==<\/span> <span class=\"mi\">0<\/span><span class=\"p\">)<\/span> <span class=\"k\">return<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n  <span class=\"k\">else<\/span> <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">x<\/span> <span class=\"o\">==<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"k\">return<\/span> <span class=\"mi\">1<\/span><span class=\"p\">;<\/span>\n  <span class=\"k\">return<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">2<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"nf\">main<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">argc<\/span><span class=\"p\">,<\/span> <span class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">argv<\/span><span class=\"p\">[])<\/span> <span class=\"p\">{<\/span>\n\n  <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"kt\">size_t<\/span> <span class=\"n\">i<\/span> <span 
class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;<\/span> <span class=\"mi\">45<\/span><span class=\"p\">;<\/span> <span class=\"o\">++<\/span><span class=\"n\">i<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"n\">printf<\/span><span class=\"p\">(<\/span><span class=\"s\">\"%d<\/span><span class=\"se\">\\n<\/span><span class=\"s\">\"<\/span><span class=\"p\">,<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">i<\/span><span class=\"p\">));<\/span>\n  <span class=\"p\">}<\/span>\n  <span class=\"k\">return<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Running the program with time gives us:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight shell\"><code><span class=\"nv\">$ <\/span><span class=\"nb\">time<\/span> .\/fib\n\n...\n\n.\/fib  15.05s user 0.00s system 99% cpu 15.059 total\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>It seems that the program stops outputting fast enough shortly after 40 numbers. My machine is able to run 40 numbers in ~ 2 seconds, but if you bump that up to 45, it takes a whopping 15 seconds. But being completely naive, I can't spot the issue. I need to profile it.<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight shell\"><code><span class=\"nv\">$ <\/span>perf record .\/fib\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Running the program again using this command generates the <code>perf.data<\/code> file, which contains all of the time information of our program. Other than creating this new file, nothing exciting shows up. 
It says how many times perf woke up, the number of samples, and the size of the perf.data file.<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight shell\"><code><span class=\"o\">[<\/span> perf record: Woken up 10 <span class=\"nb\">times <\/span>to write data <span class=\"o\">]<\/span>\n<span class=\"o\">[<\/span> perf record: Captured and wrote 2.336 MB perf.data <span class=\"o\">(<\/span>60690 samples<span class=\"o\">)<\/span> <span class=\"o\">]<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Things get interesting when we run:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight shell\"><code>perf report\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>which gives us the following view:<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvwg0qka4jqdxoeqhv9i7.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvwg0qka4jqdxoeqhv9i7.png\" alt=\"perf default output\" width=\"770\" height=\"193\"><\/a><\/p>\n\n<p>Okay, great. So basically, it doesn't tell us too much. It does say that we're spending all of our time in the <code>fib<\/code> function, but not much else. If we hit enter with the <code>fib<\/code> function highlighted, we get a few options. One is to annotate the function. This shows us a disassembly of the instructions in the function, with the percentage of time taken by each one. So... we can optimize at the assembly level, I guess? Seems a bit extreme. 
That's a little too low-level for most of us mortals.<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpo8vhzfvwazdwsbjhvwh.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpo8vhzfvwazdwsbjhvwh.png\" alt=\"perf disassembly\" width=\"463\" height=\"708\"><\/a><\/p>\n\n<p>There is actually some helpful information in this view if we're clever, though. We'll come back to that later.<\/p>\n\n<p>By default, perf only collects time information. We probably want to see a callgraph since we're making calls. Maybe the base case is the culprit (returning a single number is always suspicious after all (laugh with me here, that's a joke)).<\/p>\n\n<p>To get the call graph, we pass the <code>-g<\/code> option to <code>perf record<\/code>.<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight shell\"><code><span class=\"nv\">$ <\/span>perf record <span class=\"nt\">-g<\/span> .\/fib\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>And wait for it to run again... Then run <code>perf report<\/code> again to see the new view.<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanbcrpha8vjht5opzu0k.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanbcrpha8vjht5opzu0k.png\" alt=\"perf graph\" width=\"778\" height=\"297\"><\/a><\/p>\n\n<p>Not super different at first glance. 
Okay, so there are some differences between the first run and the second. Namely, instead of having an <code>overhead<\/code> column, it now has a <code>children<\/code> and a <code>self<\/code> column, as well as some <code>+<\/code> signs next to some function calls.<\/p>\n\n<p>Children refers to the overhead of the calls a function makes, while self refers to the cost of the function itself. A high children overhead with a low self overhead indicates that the function makes expensive calls, while a high self overhead indicates that the function itself is expensive.<\/p>\n\n<p>The <code>+<\/code> indicates that the function makes calls that we can follow. Not all functions are follow-able, likely due to the optimizer stripping the function names (or <code>strip -d<\/code>, which strips debugging symbols) to shrink the binary.<\/p>\n\n<p>Highlighting a row and hitting enter will expand it, showing the functions called. Note that some older versions of <code>perf<\/code> show the callers by default, not the functions being called from the function we're looking at. If you are using an older version, it might be necessary to change this behaviour by calling record with <code>perf record -g 'graph,0.5,caller'<\/code>.<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcd4vvwolfutcs86wsyk.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcd4vvwolfutcs86wsyk.png\" alt=\"perf graph expanded\" width=\"800\" height=\"831\"><\/a><\/p>\n\n<p>In my case, I expanded the call to the <code>main<\/code> function, which doesn't spend much time inside of itself. Inside, there are a bunch of calls to the <code>fib<\/code> function. 
The bottom one has a <code>+<\/code> symbol next to it, indicating that it can be expanded. Expanding it shows another call to <code>fib<\/code>, with another <code>+<\/code> symbol.<\/p>\n\n<p>So... eventually, we get this big long chain of calls to fib from fib. Recursion strikes again!<br>\nGoing back to the disassembly, most of the time is spent on <code>mov %rsp,%rbp<\/code>; these are the registers responsible for the stack and frame pointers in the x86 architecture. Since we see a chain of calls to the fib function, and this particular instruction is quite hot, that's a good indication that something about the recursion is to blame. Now we know what to fix. Fixing it is up to you. Some solutions might include memoization, or emulating the recursion with a stack and a while loop, or even a combination of both, but that is beyond the scope of this post.<\/p>\n\n<p>While this is a contrived example, it gives an introduction to profiling using the perf tool. There are many additional features that are not covered here, but this should be enough to get you started with profiling using perf. The results from perf are not perfect, but they are usable. There are other tools out there, but I rather enjoy perf's interface, and it doesn't hurt runtime performance very much. What do you think? What are your favourite profilers and tools for profiling?<\/p>\n\n","category":["profiling","linux","cpp","c"]},{"title":"Git and the interactive patch add","pubDate":"Fri, 04 Aug 2017 13:34:25 +0000","link":"https:\/\/dev.to\/etcwilde\/git-and-the-interactive-patch-add","guid":"https:\/\/dev.to\/etcwilde\/git-and-the-interactive-patch-add","description":"<p>Sometimes, while working, we see a fix that should be made to a part of a file that is near our work area, but is not directly related to what we are working on. 
Do we make the fix now and just throw it in with the wash when we add the file, or do we wait and hope to remember to fix it later so that it can be committed separately?<\/p>\n\n<p>The answer is: neither. We make the fix now, but instead of adding the entire file, we select the \"hunks\" of code that we want in our commit using <code>git add --interactive file.txt<\/code> or <code>git add --patch file.txt<\/code>. Both open an interactive console that shows the hunks (groups of changed lines), which you can choose to stage or leave out.<\/p>\n\n<h2>\n  \n  \n  Git add patch\n<\/h2>\n\n<p>What does it do? It lets you interactively select chunks of code from a file that you want staged for commit, while leaving out the other changes in the file.<\/p>\n\n<p>Upon invoking <code>git add --patch<\/code>, it will show the changes in the file and give you a prompt. You can enter one of 9 different commands at the first prompt, and more become available as you go; here is a list of all of the commands you can give to <code>git add --patch<\/code>:<\/p>\n\n<ul>\n<li>y - stage this hunk<\/li>\n<li>n - do not stage this hunk<\/li>\n<li>q - quit; do not stage this hunk or any other hunks<\/li>\n<li>a - stage this hunk, and all hunks after this one<\/li>\n<li>d - do not stage this hunk, or any later hunks in the file<\/li>\n<li>g - select a hunk to go to<\/li>\n<li>\/ - search for a hunk using a regex<\/li>\n<li>j - leave this hunk undecided, see next undecided hunk<\/li>\n<li>J - leave this hunk undecided, see next hunk<\/li>\n<li>k - leave this hunk undecided, see previous undecided hunk<\/li>\n<li>K - leave this hunk undecided, see previous hunk<\/li>\n<li>s - split the current hunk into smaller hunks<\/li>\n<li>e - manually edit the current hunk<\/li>\n<li>? - print the list of commands<\/li>\n<\/ul>\n\n<h2>\n  \n  \n  Enough talk, examples please\n<\/h2>\n\n<p>Okay, let's imagine a very simple setup. 
Let's say I'm adding documentation to functions in the following C file (call it <code>file.c<\/code>).<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight c\"><code><span class=\"cp\">#include &lt;stdio.h&gt;\n<\/span>\n<span class=\"kt\">int<\/span> <span class=\"nf\">printHelloWorld<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span>\n  <span class=\"n\">print<\/span><span class=\"p\">(<\/span><span class=\"s\">\"hello world\"<\/span><span class=\"p\">);<\/span>\n  <span class=\"k\">return<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"nf\">main<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">argc<\/span><span class=\"p\">,<\/span> <span class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">argv<\/span><span class=\"p\">[])<\/span> <span class=\"p\">{<\/span>\n  <span class=\"n\">printHelloWorld<\/span><span class=\"p\">();<\/span>\n  <span class=\"k\">return<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>I get to the <code>printHelloWorld<\/code> function and add the documentation.<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight c\"><code><span class=\"cm\">\/**\n * printHelloWorld\n * writes \"Hello world\" to the console\n *\n * params: none\n * return: 0 if the function executed correctly\n *\/<\/span>\n<span class=\"kt\">int<\/span> <span class=\"nf\">printHelloWorld<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span>\n  <span class=\"n\">print<\/span><span class=\"p\">(<\/span><span class=\"s\">\"hello world\"<\/span><span class=\"p\">);<\/span>\n  <span class=\"k\">return<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>... 
but I also notice that someone wrote <code>print<\/code> instead of <code>printf<\/code>. That's a bug (which admittedly, if they had the right pre-commit hooks or tests in place, would be caught long before it ever reached a public repository, but hang with me). <\/p>\n\n<p>I fix it and end up with<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight c\"><code><span class=\"cm\">\/**\n * printHelloWorld\n * writes \"Hello world\" to the console\n *\n * params: none\n * return: 0 if the function executed correctly\n *\/<\/span>\n<span class=\"kt\">int<\/span> <span class=\"nf\">printHelloWorld<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span>\n  <span class=\"n\">printf<\/span><span class=\"p\">(<\/span><span class=\"s\">\"hello world\"<\/span><span class=\"p\">);<\/span>\n  <span class=\"k\">return<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>I continue adding the rest of the documentation as I should and end up with the final file<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight c\"><code><span class=\"cp\">#include &lt;stdio.h&gt;\n<\/span>\n<span class=\"cm\">\/**\n * printHelloWorld\n * writes \"Hello world\" to the console\n *\n * params: none\n * return: 0 if the function executed correctly\n *\/<\/span>\n<span class=\"kt\">int<\/span> <span class=\"nf\">printHelloWorld<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span>\n  <span class=\"n\">printf<\/span><span class=\"p\">(<\/span><span class=\"s\">\"hello world\"<\/span><span class=\"p\">);<\/span>\n  <span class=\"k\">return<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"cm\">\/**\n * main\n * runs the program\n *\n * params: none\n * return: 0 if program executed without error\n *\/<\/span>\n<span class=\"kt\">int<\/span> <span class=\"nf\">main<\/span><span 
class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">argc<\/span><span class=\"p\">,<\/span> <span class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">argv<\/span><span class=\"p\">[])<\/span> <span class=\"p\">{<\/span>\n  <span class=\"n\">printHelloWorld<\/span><span class=\"p\">();<\/span>\n  <span class=\"k\">return<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Documentation is done, but I made that one bug fix as well, and it should not be included in the changes made.<\/p>\n\n<p>We run <code>git add --patch file.c<\/code><br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight plaintext\"><code>diff --git a\/file.c b\/file.c\nindex 7afa2a7..4293372 100644\n-------- a\/test.c\n+++ b\/test.c\n@@ -1,10 +1,24 @@\n #include &lt;stdio.h&gt;\n\n+\/**\n+ * printHelloWorld\n+ * writes \"Hello world\" to the console\n+ *\n+ * params: none\n+ * return: 0 if the function executed correctly\n+ *\/\n int printHelloWorld() {\n-  print(\"hello world\");\n+  printf(\"hello world\");\n   return 0;\n }\n\n+\/**\n+ * main\n+ * runs the program\n+ *\n+ * params: none\n+ * return: 0 if program executed without error\n+ *\/\n int main(int argc, char *argv[]) {\n   printHelloWorld();\n   return 0;\nStage this hunk [y,n,q,a,d,\/,s,e,?]? \n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>We choose to break the hunk into smaller pieces, choosing <code>s<\/code>.<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight diff\"><code><span class=\"p\">Split into 3 hunks.\n@@ -1,3 +1,10 @@<\/span>\n #include &lt;stdio.h&gt;\n\n+\/**\n<span class=\"gi\">+ * printHelloWorld\n+ * writes \"Hello world\" to the console\n+ *\n+ * params: none\n+ * return: 0 if the function executed correctly\n+ *\/\n<\/span> int printHelloWorld() {\n<span class=\"p\">Stage this hunk [y,n,q,a,d,\/,j,J,g,e,?]? 
\n<\/span><\/code><\/pre>\n\n<\/div>\n\n\n\n<p>If we want to commit the documentation first, we would press <code>y<\/code>, staging this hunk and going to the next hunk. If we want to commit the bugfix first, we would press <code>n<\/code>, and not stage this hunk, but still go to the next one. I choose to do the bugfix first (note that we can always re-order commits with a simple rebase after the matter). We choose <code>n<\/code>.<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight diff\"><code><span class=\"p\">@@ -3,5 +10,5 @@<\/span>\n int printHelloWorld() {\n<span class=\"gd\">-  print(\"hello world\");\n<\/span><span class=\"gi\">+  printf(\"hello world\");\n<\/span>   return 0;\n }\n\nStage this hunk [y,n,q,a,d,\/,K,j,J,g,e,?]? \n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>This is the only one we want, so we choose <code>y<\/code>. which will take us to the last hunk, which again is documentation, so we just choose <code>q<\/code> to quit.<\/p>\n\n<p>If we run <code>git status<\/code> now, we'll see that file.c is both staged and not staged. Running <code>git diff --cached<\/code>, shows us that our bugfix is ready for staging, and <code>git diff<\/code> shows us that our documentation has not been staged. Just run a quick <code>git commit<\/code> to commit the bugfix. Then, since the rest is just documentation, we can do a normal <code>git add file.c<\/code> followed by the standard <code>git commit<\/code>.<\/p>\n\n<p>We'll end up with a repo with the following commits;<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight plaintext\"><code>* aa7b7ec (HEAD -&gt; master) adding documentation to functions\n* ecc0aa7 bugfix: fixing print to printf\n* e106991 Writing the program\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<h2>\n  \n  \n  Closing up\n<\/h2>\n\n<p>With <code>git add --patch<\/code> we can interactively select only the changes from a file that we want. 
So we can make fixes as we spot them without folding them into an unrelated commit, which keeps our changes logically separated while still letting us make bug fixes and typo corrections as we work.<\/p>\n\n","category":"git"},{"title":"What I learned from running a user study","pubDate":"Fri, 21 Jul 2017 12:21:44 +0000","link":"https:\/\/dev.to\/etcwilde\/what-i-learned-from-running-a-user-study","guid":"https:\/\/dev.to\/etcwilde\/what-i-learned-from-running-a-user-study","description":"<p>User studies. A necessary evil. They are time-consuming, tricky to get right, and sometimes obnoxious, and worst of all, we have to deal with humans. But... we still have to do them. We can have a piece of software with all the testing and documentation in the world, but if it doesn't verifiably help people, what is the point? Simply having a large user base does not indicate that people <em>like<\/em> your program; they may simply need to use it. User testing verifies that people (that really fancy script that clicks on links isn't a person) can actually use your program and understand it. <\/p>\n\n<p>These are some ideas that I collected after running my first comparison study on a <a href=\"http:\/\/li.turingmachine.org\">tool<\/a> that tests a <a href=\"https:\/\/dev.to\/etcwilde\/merge-trees-visualizing-git-repositories\">model<\/a> I built for visualizing git repositories.<\/p>\n\n<h2>\n  \n  \n  The Questions\n<\/h2>\n\n<p>Coming up with questions that meet your goal is actually incredibly difficult. There are two main issues that I ran into: testing what I want, and making it clear to the participants what they are trying to do.<\/p>\n\n<p>How do we remedy this? With a user study of the user study. Basically, it's a good idea to run a pilot study. You'll want to run 3 to 5 users in this pilot study. 
These participants should be from the same demographic as the intended users of the program, and obviously, the same demographic as the people who will actually be in your study. You don't want the pilot to have too many people, or you will be wasting time, and more importantly, you'll be wasting participants. You can't use the participants from the pilot study in your actual user study.<\/p>\n\n<p>Don't just stop at testing the questions using the pilot study. Actually collect the results, then use this data to figure out what kind of analysis you will want to run.<\/p>\n\n<p>There are two main types of questions: quantitative and qualitative. While academic software engineering papers are starting to mix the two, reviewers seem to have a strong preference for quantitative (I don't have a citation, so this is just anecdotal). Quantitative studies are easier for at least two reasons: first, the numbers are concrete and cannot be argued with (assuming you didn't cheat while conducting the study); second, the answers are numbers, which are very easy to compare once you get to analysis.<\/p>\n\n<p>Numbers only tell half the story. They tell the \"what\", not so much the \"why\". It's very hard to concretely describe the \"why\", though. Your participants will likely have many reasons behind why they did something, even if the resulting \"what\" is the same, so you'll probably need more participants. The other issue is that qualitative results take more room in your paper.<\/p>\n\n<p>Now, you have your questions figured out, the tasks they are going to do, and everything ready to go. There is a bias that stems from doing things in a certain order. You'll need to randomize things to make sure that this doesn't happen. I used a short python script to handle this for me. I would run the script for every participant, and it would generate a script for me to read to the participant. It randomized the order that the tools were used, and the order of the tasks. 
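<\/p>\n\n<p>A minimal sketch of what such a randomization script might look like (the tool and task names here are placeholders, not the ones from my study):<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code>import random\n\ndef make_session_script(tools, tasks, seed=None):\n    # Returns the randomized tool order, with a fresh task order per tool\n    rng = random.Random(seed)\n    tool_order = list(tools)\n    rng.shuffle(tool_order)\n    plan = []\n    for tool in tool_order:\n        task_order = list(tasks)\n        rng.shuffle(task_order)\n        plan.append((tool, task_order))\n    return plan\n\n# Print the script to read to this participant\nfor tool, task_order in make_session_script([\"tool A\", \"tool B\"],\n                                            [\"task 1\", \"task 2\", \"task 3\"]):\n    print(\"Using \" + tool + \", do: \" + \", \".join(task_order))\n<\/code><\/pre>\n\n<\/div>\n\n<p>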
So long as I read the script correctly, the study would be conducted correctly.<\/p>\n\n<h2>\n  \n  \n  Collection\n<\/h2>\n\n<p>I used screen + audio capture. This turned out to be really good for a lot of reasons. It had one drawback: I had to sit there, watch every one of the recordings, and extract the key pieces of information. The rest of it was very nice.<\/p>\n\n<p>First, it makes the study feel less like a study and more like a conversation. Your environment will likely feel unusually clean, and that will already put your participant ill at ease. It won't help if you are sitting there taking notes and playing with a timer while watching their every move. The video recording was invisible to them (though they were informed via the consent form), so it felt less like being a lab rat under examination and a little more like being an actual user of a tool.<\/p>\n\n<p>Once the videos were collected, I sat down, watched them, and extracted the information from there. If I missed something the first time, I could go back and rewatch that part. I could get more context from the video too, whereas hand-written notes can only record what you wrote down. Furthermore, I didn't have to fuss with a timer; the duration is right in the video.<\/p>\n\n<h2>\n  \n  \n  Storage\n<\/h2>\n\n<p>At least from the people I've talked to, CSV files are a popular way to store data. I think they are crazy. CSV files are easy to script with and to do data entry in. That is about where their usefulness ends. I ended up using a sqlite database. Some might argue that a full relational database is a little overkill for storing something as small as a user study. They are right, but they are also wrong. Yes, I don't need ACID or transactions, or any of that; for that part they are completely correct. So what does the relational database give me that a simple CSV file doesn't? Flexibility. 
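<\/p>\n\n<p>A tiny sketch of that flexibility, using Python's built-in <code>sqlite3<\/code> (the <code>results<\/code> columns mirror the query in the analysis section below; the rows are made up):<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code>import sqlite3\n\ndb = sqlite3.connect(\":memory:\")\ndb.execute(\"CREATE TABLE results \"\n           \"(question INTEGER, tool INTEGER, time REAL, correct INTEGER)\")\ndb.executemany(\"INSERT INTO results VALUES (?, ?, ?, ?)\",\n               [(1, 1, 1.2, 1), (1, 2, 4.0, 0), (2, 1, 0.3, 1)])\n\n# A new question about the data is a new query, not a new parsing script\nfor row in db.execute(\"SELECT tool, avg(time) FROM results GROUP BY tool\"):\n    print(row)\n<\/code><\/pre>\n\n<\/div>\n\n<p>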
If I want to look at different metrics in the data, I don't need to write another script to analyze it; I just make a query against the database. When you are creating the CSV files, you don't know exactly what you'll be looking for in the data. CSV files are terrible for experimenting. You will likely need to implement most of the features of a relational database to operate on the CSV files, so why not just use a database and save yourself the trouble?<\/p>\n\n<p>Some laughed while I sat there doing hundreds of INSERT statements to add the data to the database. They stopped laughing when they asked about various aspects of the data, and a single query manipulated the data to show exactly what I wanted in a matter of seconds.<\/p>\n\n<p>You don't need a full database server, which is why sqlite is a great option. It's simple, fast enough, and it puts the data into a file instead of some seemingly hidden location on the hard drive. That last point is quite useful, since the participant consent form will likely need to state how the data will be securely destroyed once the study is done.<\/p>\n\n<h2>\n  \n  \n  Analysis\n<\/h2>\n\n<p>Before I go too deeply into this: I'm a vim user by default, but the org-mode feature in emacs so completely destroys what vim can do that emacs is really the best option here.<\/p>\n\n<p>Org-mode. It's a great tool for research. It's like Markdown on steroids. Vim tries to have an org-mode, but it falls so far short of the actual org-mode that it can barely call itself the same thing. Most of my emacs configuration lives in an org file.<\/p>\n\n<p>So if it's like Markdown, why is it good for analysis? Because you can run code directly from within the document, and have the results placed directly below the code. This makes your research entirely replicable. If anyone has any questions about the method used for analysis, you point them to the org-mode file. 
It has the queries you made, and the results of those queries. It's great for completely reproducible and verifiable research.<\/p>\n\n<p>A quick example:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight plaintext\"><code>#+name: results\n#+begin_src sqlite :db .\/data\/data.db :colnames yes\nSELECT question, tool, avg(time), count(*) total, count(CASE WHEN correct = 1 THEN 1 END) correct\nFROM results\nGROUP BY question, tool;\n#+end_src\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<div class=\"table-wrapper-paragraph\"><table>\n<thead>\n<tr>\n<th>question<\/th>\n<th>tool<\/th>\n<th>avg(time)<\/th>\n<th>total<\/th>\n<th>correct<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>1<\/td>\n<td>1<\/td>\n<td>1.25<\/td>\n<td>10<\/td>\n<td>7<\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>2<\/td>\n<td>4.5<\/td>\n<td>10<\/td>\n<td>2<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>1<\/td>\n<td>0.2<\/td>\n<td>10<\/td>\n<td>8<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>2<\/td>\n<td>9.1<\/td>\n<td>10<\/td>\n<td>1<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/div>\n\n<p>Hitting C-c C-c inside of the block would create the table directly in the document. Github will even render this nicely, syntax highlighting the query and making the table look pretty.<\/p>\n\n<p>Furthermore, R scripts, which are really great for analysis of numerical data, can be run in a similar manner, and since emacs can render PDFs, the generated plots can be embedded directly in your document. It makes for a very fast and effortless workflow once you get used to emacs.<\/p>\n\n<p>R is amazing for performing statistical analysis and generating plots. Use it. You can program in R like a normal programming language, with conditional ifs and loops. That part isn't special; the extension packages are what make it special. It has most statistical tests; you apply the ones that you want. The results will give you both the test statistic and the P-value. 
With the RSQLite backend, R can connect directly to the database. There are no CSV files to manipulate externally; you query for only the data you are interested in and perform the analysis on that. Very clean, very easy, and completely reproducible.<\/p>\n\n<h2>\n  \n  \n  Recap\n<\/h2>\n\n<p>These are the things I learned from running my first user study: some the hard way, some seemed clear from the start and worked well (confirmation bias, yes, but it worked). <\/p>\n\n<p>A quick recap:<\/p>\n\n<ul>\n<li>It's good to (completely) test the test to make sure it works before actually using it.<\/li>\n<li>Use a script to script the study.<\/li>\n<li>Screen capture + audio is a great way to collect the results.<\/li>\n<li>Keep the results in an actual DBMS; it helps with analysis.<\/li>\n<li>emacs + org-mode + R + a database makes the final analysis easy.<\/li>\n<\/ul>\n\n<p>Let me know what you have learned, and I'll start a little list of ideas for making user studies as painless as possible.<\/p>\n\n","category":["testing","evaluation"]},{"title":"Merge-Trees: Visualizing the integration of commits in Git Repositories","pubDate":"Sun, 16 Jul 2017 19:15:58 +0000","link":"https:\/\/dev.to\/etcwilde\/merge-trees-visualizing-git-repositories","guid":"https:\/\/dev.to\/etcwilde\/merge-trees-visualizing-git-repositories","description":"<p>Merge-trees are a model that my supervisor and I came up with to show how a commit is merged into the master branch of the repository.<\/p>\n\n<h1>\n  \n  \n  The Directed Acyclic Graph\n<\/h1>\n\n<p>The directed acyclic graph (DAG) is the \"beautiful\", graph-theory-based model used internally by git to store commits and merges in the repository, and it's actually really good at this job. It's somewhat reasonable for visualizing repositories. 
We're used to seeing a visualization of the DAG on Github under the network tab, in the big middle pane with all the pretty lines on GitKraken, or, if you are terminally inclined, via <code>git log --graph<\/code>.<br>\n<a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F48th7txsv5drmh9iixwy.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F48th7txsv5drmh9iixwy.png\" alt=\"Github DAG visualization from vmware\/haret\" width=\"800\" height=\"305\"><\/a><\/p>\n\n<p>Since all of these tools use the DAG for visualization, it must be great, right? Nope. Once the repository reaches a certain size and level of complexity, these tools start to break down. My main focus was on the Linux repository. Before the shenanigans that Microsoft went through to switch to git, Linux was probably the most complex git repository in existence (that I know of, anyway), making use of nearly every feature in git at some point or another. With the complexity and size of the Linux repository, our friends over at Github and GitKraken can no longer provide us with our pretty pictures. Github doesn't even try, just showing a message saying that the repository is too big. GitKraken just sits there, trying to load it for hours. 
It might eventually get somewhere, given enough memory and patience, but it would still take hours to perform any operation once it's loaded.<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febr42juy6asgk6bmdjzd.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febr42juy6asgk6bmdjzd.png\" alt=\"github failure message\" width=\"553\" height=\"202\"><\/a><\/p>\n\n<p>But... we can still have our pretty pictures. Both Gitk, a GUI application shipped with git, and <code>git log --graph<\/code> on the terminal are able to produce a visualization.<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favsyyz8tr500qz9isq8a.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favsyyz8tr500qz9isq8a.png\" alt=\"gitk interface showing a subsection of the Linux Kernel\" width=\"800\" height=\"803\"><\/a><\/p>\n\n<p>Hmm, which one of those lines is the master branch? What's actually going on here? Who knows? It turns out that the line that goes from red, to bright green, to red again is the master branch. How do I know? I've looked at this repository more than could be considered healthy, and at this commit in particular.<\/p>\n\n<p>So why do people use the DAG for visualizations? Because it's an exact representation of the repository. Proponents might argue that it shows exactly how the repository looks. 
Realists might argue that \"it's easy\". A simple pass through the repository will render the DAG, while using a different model (I don't actually know of anyone else trying to design one) is quite difficult and requires some pre-processing steps. Further complicating matters, it turns out that git has no internal notion of a \"master\" branch. The idea of a main branch is carried over from the versioning systems of yore (SVN, CVS). There is a somewhat agreed-upon convention, though. Every commit and merge has a parent list. Commits have only one parent, while merges may have many (two if you are a normal person, as many as you want if you are of the octopus-merge persuasion). The first parent in the list is the previous commit or merge on the same branch as the commit being made; following first parents leads back toward the initial commit. The second through nth parents are the tips of the branches being merged, in the order they were referenced in the merge.<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyn82carwo9mqo5qb7n3l.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyn82carwo9mqo5qb7n3l.png\" alt=\"before-merging-example\" width=\"184\" height=\"108\"><\/a><\/p>\n\n<p>So if I'm on the master branch and I do <code>git merge --no-ff branch1 branch2<\/code><\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1cy3go79zumlrln74h2n.png\" class=\"article-body-image-wrapper\"><img 
src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1cy3go79zumlrln74h2n.png\" alt=\"after-merging-example\" width=\"248\" height=\"148\"><\/a><\/p>\n\n<p>The merge commit will have a parent list containing three elements (in this order):<\/p>\n\n<ul>\n<li>a reference to the previous commit on the master branch<\/li>\n<li>a reference to the last commit on branch1 at the time of the merge<\/li>\n<li>a reference to the last commit on branch2 at the time of the merge<\/li>\n<\/ul>\n\n<p>If something happens, say someone rebases incorrectly, merges something funny, or does a <code>git pull<\/code> incorrectly (this is why you get people saying to use <code>git fetch<\/code> followed by <code>git rebase<\/code> instead of <code>git pull<\/code>), the order of those parents can be switched, which makes the master branch disappear. This is called a <a href=\"https:\/\/bit-booster.blogspot.ca\/2016\/02\/no-foxtrots-allowed.html\" rel=\"noopener noreferrer\">foxtrot<\/a>.<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhl3fnlx81nn9mfcowpm6.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhl3fnlx81nn9mfcowpm6.png\" alt=\"foxtrot-example\" width=\"214\" height=\"123\"><\/a><\/p>\n\n<p>In this case, we swapped branch2 and master, which makes it look like E and D are both on the master branch, even though we know that the master branch goes directly from commit B to merge A.<\/p>\n\n<p>Another issue is fast-forward merging. 
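<\/p>\n\n<p>The effect of parent order can be sketched concretely. Assuming a toy DAG stored as a commit-to-parent-list mapping, with hypothetical single-letter hashes matching the figures above, walking first parents from a head recovers the branch that head sits on, and a foxtrot changes what that walk reports:<\/p>\n\n

```python
def first_parent_chain(parents, head):
    # Walk first parents from head back to the root commit; by the
    # convention described above, this chain is the head's own branch.
    chain = [head]
    while parents[head]:
        head = parents[head][0]
        chain.append(head)
    return chain

# Merge A merges E (the tip of branch2) into master; B is the previous
# master commit, C the initial commit, D and E commits on branch2.
good = {'A': ['B', 'E'], 'B': ['C'], 'E': ['D'], 'D': ['C'], 'C': []}
print(first_parent_chain(good, 'A'))     # ['A', 'B', 'C']

# A foxtrot swaps A's parents, so the first-parent walk now runs
# through E and D, and B silently falls off the master branch.
foxtrot = {'A': ['E', 'B'], 'B': ['C'], 'E': ['D'], 'D': ['C'], 'C': []}
print(first_parent_chain(foxtrot, 'A'))  # ['A', 'E', 'D', 'C']
```

\n\n<p>Any tool that relies on first parents marking the same branch will quietly report the wrong master branch after a foxtrot. 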
Many people are afraid of merge commits; without a good visualization, merge commits just complicate things, so they do fast-forward merges. Yeah, that gets rid of the merge commits, but since branches usually hold separate ideas, merge commits are like the separation between paragraphs. You could write a book without paragraphs, but it would suck. Don't do that to your repositories, or they will suck too.<\/p>\n\n<h1>\n  \n  \n  Merge-Tree model\n<\/h1>\n\n<p>Okay, now that we've thoroughly discussed the DAG, let's talk about the Merge-Tree. A Merge-Tree is a tree-based structure showing how groups of commits are merged into the master branch. The root is the merge into the master branch, the leaves are the individual commits, and the inner nodes are any merges that the commits must pass through to reach the master branch.<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsrund50a5fd151j9msos.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsrund50a5fd151j9msos.png\" alt=\"merge-tree visualization\" width=\"350\" height=\"300\"><\/a><\/p>\n\n<p>This is a merge-tree visualization for the same commit that is highlighted above in the Gitk picture of the DAG. The bright orange node is the commit. The model removes commits that are not related to the commit we are interested in, which makes the visualization a lot less confusing. Furthermore, because we know which commits belong to a merge, we are able to provide a full summarization of the commits and merges within the root. 
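<\/p>\n\n<p>Construction can be sketched in a few lines. This is a hypothetical sketch, not the actual implementation: the DAG is a commit-to-parent-list mapping, master is assumed to be a known set of commits, and the tree is grown from a root merge by following merged-in branches until they touch master:<\/p>\n\n

```python
def merge_tree(parents, root, master):
    # root: a merge commit on the master branch.
    # master: set of commits known to be on the master branch.
    # Returns the merge-tree as a {node: [children]} mapping.
    children = {root: []}

    def absorb(node, tip):
        # Everything reachable from tip that is not already on master
        # (or already placed) entered master through `node`.
        cur = tip
        while cur is not None and cur not in master and cur not in children:
            children[node].append(cur)
            children[cur] = []
            for extra in parents[cur][1:]:   # inner merges recurse
                absorb(cur, extra)
            cur = parents[cur][0] if parents[cur] else None

    for tip in parents[root][1:]:            # branches merged at the root
        absorb(root, tip)
    return children

# Toy DAG from the merging figures: merge A brings E and D into master.
dag = {'A': ['B', 'E'], 'B': ['C'], 'E': ['D'], 'D': ['C'], 'C': []}
tree = merge_tree(dag, 'A', {'A', 'B', 'C'})
print(tree)  # {'A': ['E', 'D'], 'E': [], 'D': []}
```

\n\n<p>The leaves are the plain commits, the inner nodes are merges they passed through, and everything beneath the root can then be summarized in aggregate. 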
This summarization can include the files modified and the authors.<\/p>\n\n<p>Constructing merge-trees relies on knowledge of the master branch to determine where the roots should be.<\/p>\n\n<h1>\n  \n  \n  Verification\n<\/h1>\n\n<p>We built a small tool called <a href=\"http:\/\/li.turingmachine.org\" rel=\"noopener noreferrer\">Linvis<\/a> to test the concept of using merge-trees. A user can search for a commit by hash, author, filename, and keywords from the log message. The search engine uses full-text search to find commits and merges that match the query.<\/p>\n\n<p>Searching for 'net-next', for instance, will return a whole bunch of trees with commits and merges related to the networking module of Linux. The results are grouped by the merge-tree they came from, with the link to the root at the top and the links to the individual commits in the table.<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1oxtsm9flmkl4qhh5ax.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1oxtsm9flmkl4qhh5ax.png\" alt=\"Search results for net-next\" width=\"800\" height=\"358\"><\/a><\/p>\n\n<p>The trees are ordered by the mean of the search ranking, so more relevant trees appear at the front and the others float to the back. 
We provide various summaries, like the files modified<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fkhqe21n2h2mvrl0soryl.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fkhqe21n2h2mvrl0soryl.png\" alt=\"Files modified in a merge tree\" width=\"800\" height=\"243\"><\/a><\/p>\n\n<p>and the authors<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbzq6bg97s479gut86qy.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbzq6bg97s479gut86qy.png\" alt=\"authors making changes in a merge tree\" width=\"800\" height=\"259\"><\/a><\/p>\n\n<p>Furthermore, we have three different visualizations of the trees, each offering different benefits. The Reingold-Tilford tree is the classic tree layout, while the pack tree was designed for file systems, which, like repositories, generally have a very wide but shallow structure.<\/p>\n\n<p>The merge-tree is an interesting concept with lots of potential. The utility of merge-trees depends on the structure of the repository; if the repository uses fast-forward merges or has foxtrots, merge-trees will be either unhelpful or incorrect, as they rely on knowledge of the master branch. The <a href=\"https:\/\/github.com\/apache\/camel\" rel=\"noopener noreferrer\">Camel<\/a> repository, for instance, uses a CVS-style repository structure. All changes are made on the master branch of the repository. 
Branching is only used for separating releases. Modifications to a release go in the release branch, but the release branches are never merged back into the master branch. Merge-trees would not help at all in this situation; every commit would be a separate merge-tree.<\/p>\n\n<p>Some git workflows are centered around merging into the master branch at each release. These would likely benefit the most from merge-trees, as managers can easily summarize who wrote what, and anyone can see exactly how every commit made its way into the project.<\/p>\n\n<p>The Linux repository uses neither of these. Linus Torvalds holds all the power, being the only one who can push to the master branch. He simply merges large subsystems of the kernel at a time, then leaves tags at release points. This structure benefits from merge-trees too.<\/p>\n\n<p>This has been a long post, so I'll end it here. Let me know what you think.<\/p>\n\n","category":["git","visualizations"]},{"title":"Hi, I'm Evan","pubDate":"Sat, 24 Jun 2017 18:33:27 +0000","link":"https:\/\/dev.to\/etcwilde\/hi-im-evan","guid":"https:\/\/dev.to\/etcwilde\/hi-im-evan","description":"<p>I have been coding for who knows how long. Probably around six years now, actually.<\/p>\n\n<p>You can find me on Twitter as <a href=\"https:\/\/twitter.com\/etcwilde\" rel=\"noopener noreferrer\">@etcwilde<\/a><\/p>\n\n<p>I'm currently a computer science master's student at the University of Victoria, working in the area of software engineering practices. I tend to do a lot of work with git, both as a tool and as a research topic.<\/p>\n\n<p>I mostly use C++, C, and Python, but know JavaScript and Java as well. I dabble with Haskell, Rust, Go, and many other languages, but I do not know them well enough to use them extensively. I am familiar with F#, C#, SML\/NJ, and Ruby.<\/p>\n\n<p>I am bilingual with my text editors, sitting on both sides of the emacs\/vim religious war. 
I started out with Vim, but my supervisor has made me learn emacs as well.<\/p>\n\n<p>My environment is Linux with the i3 window manager, though that is usually just for having Google next to my terminal. I like the termite terminal emulator, and I have tmux running there to manage different simultaneous projects. I generally have a pane open for emacs next to a pane for vim, which looks kind of weird and garners disdain from both communities.<\/p>\n\n<p>Evan<\/p>\n\n","category":"introduction"}]}}