{"@attributes":{"version":"2.0"},"channel":{"title":"DEV Community: Nested Software","description":"The latest articles on DEV Community by Nested Software (@nestedsoftware).","link":"https:\/\/dev.to\/nestedsoftware","image":{"url":"https:\/\/media2.dev.to\/dynamic\/image\/width=90,height=90,fit=cover,gravity=auto,format=auto\/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F59384%2Faeea81a5-33f5-4ed8-89d9-6ec48ad4bd8e.png","title":"DEV Community: Nested Software","link":"https:\/\/dev.to\/nestedsoftware"},"language":"en","item":[{"title":"Generics and Variance with Java","pubDate":"Sat, 20 Sep 2025 23:11:45 +0000","link":"https:\/\/dev.to\/nestedsoftware\/generics-and-variance-with-java-27a2","guid":"https:\/\/dev.to\/nestedsoftware\/generics-and-variance-with-java-27a2","description":"<p>In this article, we\u2019ll learn about generics in Java, with an emphasis on the concept of variance.<\/p>\n\n<h1>\n  \n  \n  Substitution of values\n<\/h1>\n\n<p>Let's start by introducing types and subtypes. Java supports assigning a subclass value to a variable of a base type. This is known as a widening reference conversion. We can therefore say that a Float is a subtype of a Number:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">Float<\/span> <span class=\"n\">myFloat<\/span> <span class=\"o\">=<\/span> <span class=\"nc\">Float<\/span><span class=\"o\">.<\/span><span class=\"na\">valueOf<\/span><span class=\"o\">(<\/span><span class=\"mf\">3.14f<\/span><span class=\"o\">);<\/span>\n<span class=\"nc\">Number<\/span> <span class=\"n\">number<\/span> <span class=\"o\">=<\/span> <span class=\"n\">myFloat<\/span><span class=\"o\">;<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<h1>\n  \n  \n  Arrays are covariant and reified\n<\/h1>\n\n<p>Variance tells us what happens to this subtyping relationship when the original types are placed in the context of another type. 
<\/p>\n\n<p>Let's take arrays, for example. Since Float is a subtype of Number, what can we say about an array of Floats relative to an array of Numbers?<\/p>\n\n<p>It turns out that in Java, arrays are covariant. That is, we can also assign an array of Floats to a variable whose type is an array of Numbers:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">Float<\/span><span class=\"o\">[]<\/span> <span class=\"n\">floats<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">Float<\/span><span class=\"o\">[<\/span><span class=\"mi\">10<\/span><span class=\"o\">];<\/span>\n<span class=\"nc\">Number<\/span><span class=\"o\">[]<\/span> <span class=\"n\">numbers<\/span> <span class=\"o\">=<\/span> <span class=\"n\">floats<\/span><span class=\"o\">;<\/span> <span class=\"c1\">\/\/compiles due to covariance <\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>When we use the term covariant here, we mean that if a type like Float is a subtype of Number, then an array of Floats is also a subtype of an array of Numbers.<\/p>\n\n<p>The subtyping relationship for arrays goes in the same direction as the underlying types and is therefore covariant.<\/p>\n\n<p>Notice that Float is a subtype of Number because it inherits from Number. <code>Float[]<\/code>, on the other hand, does not inherit from <code>Number[]<\/code>, yet it is still a subtype of it. Subtyping is therefore a more general concept than inheritance.<\/p>\n\n<p>The way covariance is implemented for arrays introduces a potential flaw into Java applications. 
A developer can write code to insert an object of the wrong type into the <code>floats<\/code> array below, yet that code will successfully compile:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">Float<\/span><span class=\"o\">[]<\/span> <span class=\"n\">floats<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">Float<\/span><span class=\"o\">[<\/span><span class=\"mi\">10<\/span><span class=\"o\">];<\/span> \n<span class=\"nc\">Number<\/span><span class=\"o\">[]<\/span> <span class=\"n\">numbers<\/span> <span class=\"o\">=<\/span> <span class=\"n\">floats<\/span><span class=\"o\">;<\/span> \n<span class=\"nc\">Integer<\/span> <span class=\"n\">integer<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">3<\/span><span class=\"o\">;<\/span> <span class=\"c1\">\/\/auto-boxing<\/span>\n<span class=\"n\">numbers<\/span><span class=\"o\">[<\/span><span class=\"mi\">0<\/span><span class=\"o\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">integer<\/span><span class=\"o\">;<\/span> <span class=\"c1\">\/\/compiles but not safe<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Type information for arrays is compiled into the bytecode and is available at runtime. 
In fact, we can use <code>instanceof<\/code> as follows for arrays:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">Integer<\/span><span class=\"o\">[]<\/span> <span class=\"n\">integers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">Integer<\/span><span class=\"o\">[<\/span><span class=\"mi\">10<\/span><span class=\"o\">];<\/span>\n<span class=\"nc\">Object<\/span> <span class=\"n\">o<\/span> <span class=\"o\">=<\/span> <span class=\"n\">integers<\/span><span class=\"o\">;<\/span>\n<span class=\"k\">if<\/span> <span class=\"o\">(<\/span><span class=\"n\">o<\/span> <span class=\"k\">instanceof<\/span> <span class=\"nc\">Float<\/span><span class=\"o\">[])<\/span> <span class=\"o\">{<\/span> <span class=\"c1\">\/\/ false at runtime<\/span>\n    <span class=\"c1\">\/\/ etc...<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Another way to put it is that arrays in Java are reified. Since arrays are reified, the Java runtime knows that we are trying to put an Integer into an array of Floats. When we try to actually run the line of code <code>numbers[0] = integer<\/code>, the application will throw an <code>ArrayStoreException<\/code>. <\/p>\n\n<p>The way covariance has been implemented for arrays in Java has a defect, but at least a buggy piece of code that tries to put the wrong type into an array will fail fast at runtime.<\/p>\n\n<h1>\n  \n  \n  Generics\n<\/h1>\n\n<h2>\n  \n  \n  Collections before Java 5\n<\/h2>\n\n<p>Originally, Java did not have support for generics. These were added in Java 5. 
Prior to the introduction of generics, developers would have to cast to the desired type, and they would have to manually ensure that this cast was safe at runtime.<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">List<\/span> <span class=\"n\">strings<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">();<\/span>\n<span class=\"n\">strings<\/span><span class=\"o\">.<\/span><span class=\"na\">add<\/span><span class=\"o\">(<\/span><span class=\"mi\">3<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ compiles<\/span>\n<span class=\"nc\">String<\/span> <span class=\"n\">string<\/span> <span class=\"o\">=<\/span> <span class=\"o\">(<\/span><span class=\"nc\">String<\/span><span class=\"o\">)<\/span> <span class=\"n\">strings<\/span><span class=\"o\">.<\/span><span class=\"na\">get<\/span><span class=\"o\">(<\/span><span class=\"mi\">0<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ ClassCastException at runtime<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>In Java 5, generics were introduced, and support for generics was added to the collections library as well. 
With generics, we can make collections typesafe:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">String<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">strings<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;&gt;();<\/span>\n<span class=\"n\">strings<\/span><span class=\"o\">.<\/span><span class=\"na\">add<\/span><span class=\"o\">(<\/span><span class=\"mi\">3<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ does not compile<\/span>\n<span class=\"n\">strings<\/span><span class=\"o\">.<\/span><span class=\"na\">add<\/span><span class=\"o\">(<\/span><span class=\"s\">\"hello world\"<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/compiles<\/span>\n<span class=\"nc\">String<\/span> <span class=\"n\">string<\/span> <span class=\"o\">=<\/span> <span class=\"n\">strings<\/span><span class=\"o\">.<\/span><span class=\"na\">get<\/span><span class=\"o\">(<\/span><span class=\"mi\">0<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/type safe, no explicit cast needed<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<h2>\n  \n  \n  Basics of generics\n<\/h2>\n\n<p>A class can declare one or more generic type parameters. 
For example, the following <code>Pair<\/code> class supports creating a tuple of two arbitrary items:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">class<\/span> <span class=\"nc\">Pair<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">K<\/span><span class=\"o\">,<\/span> <span class=\"no\">V<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">{<\/span>\n    <span class=\"kd\">private<\/span> <span class=\"kd\">final<\/span> <span class=\"no\">K<\/span> <span class=\"n\">first<\/span><span class=\"o\">;<\/span>\n    <span class=\"kd\">private<\/span> <span class=\"kd\">final<\/span> <span class=\"no\">V<\/span> <span class=\"n\">second<\/span><span class=\"o\">;<\/span>\n\n    <span class=\"kd\">public<\/span> <span class=\"nf\">Pair<\/span><span class=\"o\">(<\/span><span class=\"no\">K<\/span> <span class=\"n\">first<\/span><span class=\"o\">,<\/span> <span class=\"no\">V<\/span> <span class=\"n\">second<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n        <span class=\"k\">this<\/span><span class=\"o\">.<\/span><span class=\"na\">first<\/span> <span class=\"o\">=<\/span> <span class=\"n\">first<\/span><span class=\"o\">;<\/span>\n        <span class=\"k\">this<\/span><span class=\"o\">.<\/span><span class=\"na\">second<\/span> <span class=\"o\">=<\/span> <span class=\"n\">second<\/span><span class=\"o\">;<\/span>\n    <span class=\"o\">}<\/span>\n<span class=\"o\">}<\/span>\n\n<span class=\"nc\">Pair<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">String<\/span><span class=\"o\">,<\/span> <span class=\"nc\">Integer<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">p<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">Pair<\/span><span class=\"o\">&lt;&gt;(<\/span><span class=\"s\">\"age\"<\/span><span class=\"o\">,<\/span> <span class=\"mi\">30<\/span><span 
class=\"o\">);<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>We can also constrain a type parameter with an upper bound, which can be a class or an interface. Additional interfaces can be added to the bound using <code>&amp;<\/code>; if one of the bounds is a class, it must come first. For example, the following <code>Repository<\/code> class is parameterized with a type that must be an <code>Entity<\/code>, and that must also be serializable and comparable to other instances of the same type:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">class<\/span> <span class=\"nc\">Repository<\/span> <span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span> <span class=\"kd\">extends<\/span> <span class=\"nc\">Entity<\/span> <span class=\"o\">&amp;<\/span> <span class=\"nc\">Serializable<\/span> <span class=\"o\">&amp;<\/span> <span class=\"nc\">Comparable<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;&gt;<\/span> <span class=\"o\">{<\/span>\n    <span class=\"c1\">\/\/ etc...<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<h2>\n  \n  \n  Recursive type bounds\n<\/h2>\n\n<p>As we can see in the previous example, type parameters are occasionally defined recursively: the type parameter <code>T<\/code> appears in its own bound, <code>Comparable&lt;T&gt;<\/code>.<\/p>\n\n<p>In the following example, we create our own interface, similar to Java\u2019s <code>Comparable<\/code>. 
Rather than allowing the <code>compareTo<\/code> method to apply to any arbitrary type, here we arrange for <code>MyComparable<\/code> to apply to the specific class or interface that implements our <code>MyComparable<\/code> interface:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">interface<\/span> <span class=\"nc\">MyComparable<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span> <span class=\"kd\">extends<\/span> <span class=\"nc\">MyComparable<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;&gt;<\/span> <span class=\"o\">{<\/span>\n    <span class=\"kt\">int<\/span> <span class=\"nf\">compareTo<\/span><span class=\"o\">(<\/span><span class=\"no\">T<\/span> <span class=\"n\">other<\/span><span class=\"o\">);<\/span>\n<span class=\"o\">}<\/span>\n\n<span class=\"kd\">public<\/span> <span class=\"kd\">class<\/span> <span class=\"nc\">MyInteger<\/span> <span class=\"kd\">implements<\/span> <span class=\"nc\">MyComparable<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">MyInteger<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">{<\/span>\n    <span class=\"kd\">public<\/span> <span class=\"kt\">int<\/span> <span class=\"nf\">compareTo<\/span><span class=\"o\">(<\/span><span class=\"nc\">MyInteger<\/span> <span class=\"n\">otherInteger<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n        <span class=\"c1\">\/\/ ...<\/span>\n    <span class=\"o\">}<\/span>\n<span class=\"o\">}<\/span>\n\n<span class=\"kd\">public<\/span> <span class=\"kd\">class<\/span> <span class=\"nc\">MyFloat<\/span> <span class=\"kd\">implements<\/span> <span class=\"nc\">MyComparable<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">MyFloat<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">{<\/span>\n    <span class=\"kd\">public<\/span> <span class=\"kt\">int<\/span> <span 
class=\"nf\">compareTo<\/span><span class=\"o\">(<\/span><span class=\"nc\">MyFloat<\/span> <span class=\"n\">otherFloat<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n        <span class=\"c1\">\/\/ ...<\/span>\n    <span class=\"o\">}<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>This pattern is sometimes used for builders, to support a fluent interface for a subclassed builder, such that it can return itself to continue chaining calls. It's also used behind the scenes for Java enums:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"c1\">\/\/from openjdk source code<\/span>\n<span class=\"kd\">public<\/span> <span class=\"kd\">abstract<\/span> <span class=\"kd\">class<\/span> <span class=\"nc\">Enum<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">E<\/span> <span class=\"kd\">extends<\/span> <span class=\"nc\">Enum<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">E<\/span><span class=\"o\">&gt;&gt;<\/span> <span class=\"kd\">implements<\/span> <span class=\"nc\">Constable<\/span><span class=\"o\">,<\/span> <span class=\"nc\">Comparable<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">E<\/span><span class=\"o\">&gt;,<\/span> <span class=\"nc\">Serializable<\/span> <span class=\"o\">{<\/span>\n    <span class=\"c1\">\/\/ ...<\/span>\n<span class=\"o\">}<\/span>\n\n<span class=\"kd\">public<\/span> <span class=\"kd\">enum<\/span> <span class=\"nc\">MyEnum<\/span> <span class=\"o\">{<\/span>\n    <span class=\"no\">VAL1<\/span><span class=\"o\">,<\/span>\n    <span class=\"no\">VAL2<\/span>\n<span class=\"o\">}<\/span>\n\n<span class=\"c1\">\/\/ it's not legal to write code like this, but MyEnum does extend Enum&lt;MyEnum&gt;<\/span>\n<span class=\"kd\">public<\/span> <span class=\"kd\">class<\/span> <span class=\"nc\">MyEnum<\/span> <span class=\"kd\">extends<\/span> <span class=\"nc\">Enum<\/span><span class=\"o\">&lt;<\/span><span 
class=\"nc\">MyEnum<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">{<\/span> \n    <span class=\"c1\">\/\/ ...<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<h2>\n  \n  \n  Generic type parameters for methods\n<\/h2>\n\n<p>Generic type parameters can also be applied directly to a method declaration:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">static<\/span> <span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"no\">T<\/span> <span class=\"nf\">firstElement<\/span><span class=\"o\">(<\/span><span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">items<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">items<\/span><span class=\"o\">.<\/span><span class=\"na\">get<\/span><span class=\"o\">(<\/span><span class=\"mi\">0<\/span><span class=\"o\">);<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>As with classes, we can define upper bounds for the type parameters for methods:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">static<\/span> <span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span> <span class=\"kd\">extends<\/span> <span class=\"nc\">MyService<\/span> <span class=\"o\">&amp;<\/span> <span class=\"nc\">Closeable<\/span><span class=\"o\">&gt;<\/span> <span class=\"nc\">MyResourceWrapper<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"nf\">of<\/span><span class=\"o\">(<\/span><span class=\"no\">T<\/span> <span class=\"n\">input<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n    <span class=\"c1\">\/\/ etc...<\/span>\n<span 
class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<h2>\n  \n  \n  Generic type parameters are invariant and not reified\n<\/h2>\n\n<p>Unlike arrays, generics are not covariant - they are invariant. For example, the following will not compile:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Number<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">numbers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Integer<\/span><span class=\"o\">&gt;();<\/span> <span class=\"c1\">\/\/ does not compile<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>This means that while an Integer is a subtype of Number, a list of Integers is not a subtype of a list of Numbers. If this were allowed in the same way as it is with arrays, we could introduce the wrong type of object, such as a Float, into the list, and the code would still compile.<\/p>\n\n<p>However, with generics the problem would be worse. Unlike arrays, the type information supplied via generics is not available at runtime. This is called type erasure. 
In general, we cannot write a check like the following. If <code>someObject<\/code> is declared as an <code>Object<\/code>, for example, the compiler rejects it, because a <code>List&lt;String&gt;<\/code> cannot be distinguished from any other <code>List<\/code> at runtime:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"k\">if<\/span> <span class=\"o\">(<\/span><span class=\"n\">someObject<\/span> <span class=\"k\">instanceof<\/span> <span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">String<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">strings<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n    <span class=\"c1\">\/\/ etc...<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>The best we can do in that case is to check against the wildcard type <code>List&lt;?&gt;<\/code>:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"k\">if<\/span> <span class=\"o\">(<\/span><span class=\"n\">someObject<\/span> <span class=\"k\">instanceof<\/span> <span class=\"nc\">List<\/span><span class=\"o\">&lt;?&gt;<\/span> <span class=\"n\">someList<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n    <span class=\"c1\">\/\/ etc...<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>When generics were introduced in Java 5, the designers chose to erase type arguments at compile time rather than keep them with objects at runtime, in order to maintain backward compatibility with code and libraries written for older versions of Java.<\/p>\n\n<p>Therefore, unlike arrays, generics are not reified. Behind the scenes, the compiler still inserts casts to the desired type into the bytecode. It's just that these casts are considered safe, since the compiler has already checked the types (provided the code compiles without unchecked warnings). This remains the case in modern Java.<\/p>\n\n<p>Because generics are not reified, the runtime doesn't associate an instance of a collection with any particular element type. If collections were covariant, we could successfully insert the wrong type of object into a collection at runtime, and we would only get an error at some later point, when we tried to use that object. 
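<\/p>\n\n<p>Collections are not covariant, so we can't do this directly, but a raw type lets us recreate the same situation. The following is a minimal sketch (the <code>HeapPollutionDemo<\/code> class and its method are hypothetical, written only for illustration) that slips an <code>Integer<\/code> into a <code>List&lt;String&gt;<\/code>. The insertion succeeds, and the error only surfaces later, at the cast the compiler inserts when the element is read back:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code>import java.util.ArrayList;\nimport java.util.List;\n\npublic class HeapPollutionDemo {\n    \/\/ Returns true if reading the smuggled element back fails with a ClassCastException\n    @SuppressWarnings({\"unchecked\", \"rawtypes\"})\n    static boolean readFailsAfterRawInsert() {\n        List&lt;String&gt; strings = new ArrayList&lt;&gt;();\n        List raw = strings; \/\/ raw type: the compiler no longer checks element types\n        raw.add(42);        \/\/ succeeds: the erased list simply stores an Object\n        try {\n            String s = strings.get(0); \/\/ the compiler-inserted cast to String fails here\n            return false;\n        } catch (ClassCastException e) {\n            return true;\n        }\n    }\n\n    public static void main(String[] args) {\n        System.out.println(readFailsAfterRawInsert()); \/\/ prints true\n    }\n}\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>The resulting stack trace points at the line that reads the element, not the line that inserted it, which is what makes this kind of bug hard to track down.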
<\/p>\n\n<p>When an object of the wrong type ends up in a generic collection like this, it's called heap pollution. Heap pollution can still occur in Java in a number of ways: for example, when mixing generics with arrays and varargs, through unchecked casts or raw types, or via reflection. However, for the most part, generics help us make our Java code typesafe.<\/p>\n\n<h2>\n  \n  \n  Variance and PECS\n<\/h2>\n\n<p>While generic types are invariant, Java supports variance for generics through wildcard type arguments. The acronym PECS, which stands for \"producer extends, consumer super\", is a well-known mnemonic for deciding which kind of wildcard to use. We will explain variance and this acronym in more detail below.<\/p>\n\n<h3>\n  \n  \n  Covariance\n<\/h3>\n\n<p>Wildcards cannot appear in the declaration of a generic type's own type parameters. Instead, in Java, variance for generics is expressed at the use site: in the types of variables, fields, method parameters, and return values.<\/p>\n\n<p>The following is an example of a wildcard type argument being used in a variable declaration:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">List<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">extends<\/span> <span class=\"nc\">Number<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">numbers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Float<\/span><span class=\"o\">&gt;();<\/span>\n<span class=\"n\">numbers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Integer<\/span><span class=\"o\">&gt;();<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>In the above code, both <code>ArrayList&lt;Float&gt;<\/code> and <code>ArrayList&lt;Integer&gt;<\/code> are subtypes of <code>List&lt;? extends Number&gt;<\/code>. 
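<\/p>\n\n<p>Reading from such a list is safe, since every element is guaranteed to be at least a <code>Number<\/code>. As a minimal sketch (the <code>CovarianceDemo<\/code> class and its <code>sum<\/code> method are hypothetical, not part of the original examples), a single method with a <code>List&lt;? extends Number&gt;<\/code> parameter can accept lists of different <code>Number<\/code> subtypes:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code>import java.util.List;\n\npublic class CovarianceDemo {\n    \/\/ Accepts List&lt;Integer&gt;, List&lt;Float&gt;, List&lt;Number&gt;, and so on\n    static double sum(List&lt;? extends Number&gt; numbers) {\n        double total = 0;\n        for (Number n : numbers) { \/\/ every element is at least a Number\n            total += n.doubleValue();\n        }\n        return total;\n    }\n\n    public static void main(String[] args) {\n        List&lt;Integer&gt; integers = List.of(1, 2, 3);\n        List&lt;Float&gt; floats = List.of(1.5f, 2.5f);\n        System.out.println(sum(integers)); \/\/ prints 6.0\n        System.out.println(sum(floats));   \/\/ prints 4.0\n    }\n}\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Inside <code>sum<\/code>, each element can only be treated as a <code>Number<\/code>, which is why the sketch calls <code>doubleValue()<\/code> rather than any method specific to <code>Integer<\/code> or <code>Float<\/code>.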
<\/p>\n\n<p>Why is this useful? Let's say we've written a class <code>MyStack<\/code> which offers standard stack operations like <code>push<\/code> and <code>pop<\/code>. Now we wish to add a <code>pushAll<\/code> method which allows us to push multiple items at a time onto our stack. We could try something like this:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">class<\/span> <span class=\"nc\">MyStack<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">{<\/span>\n    <span class=\"c1\">\/\/ other methods like push, pop, etc. not shown<\/span>\n\n    <span class=\"kd\">public<\/span> <span class=\"kt\">void<\/span> <span class=\"nf\">pushAll<\/span><span class=\"o\">(<\/span><span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">items<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n        <span class=\"k\">for<\/span> <span class=\"o\">(<\/span><span class=\"no\">T<\/span> <span class=\"n\">item<\/span> <span class=\"o\">:<\/span> <span class=\"n\">items<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n            <span class=\"n\">push<\/span><span class=\"o\">(<\/span><span class=\"n\">item<\/span><span class=\"o\">);<\/span>\n        <span class=\"o\">}<\/span>\n    <span class=\"o\">}<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>However, this means that if we have a stack of Numbers, we cannot push the items from a list of Integers onto it, since generics are invariant:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Integer<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">integers<\/span> <span class=\"o\">=<\/span> 
<span class=\"nc\">List<\/span><span class=\"o\">.<\/span><span class=\"na\">of<\/span><span class=\"o\">(<\/span><span class=\"mi\">1<\/span><span class=\"o\">,<\/span> <span class=\"mi\">2<\/span><span class=\"o\">,<\/span> <span class=\"mi\">3<\/span><span class=\"o\">);<\/span>\n<span class=\"nc\">MyStack<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Number<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">numbers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">MyStack<\/span><span class=\"o\">&lt;&gt;();<\/span>\n<span class=\"n\">numbers<\/span><span class=\"o\">.<\/span><span class=\"na\">pushAll<\/span><span class=\"o\">(<\/span><span class=\"n\">integers<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ does not compile<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>In principle, there is no harm in having integers on our stack, but the compiler cannot accept a <code>List&lt;Integer&gt;<\/code> where a <code>List&lt;Number&gt;<\/code> is expected. A method that receives a <code>List&lt;Number&gt;<\/code> is allowed to add any Number to it, such as a Float, which would cause heap pollution, as mentioned earlier.<\/p>\n\n<p>We can solve this problem with covariance, by using a bounded wildcard:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">class<\/span> <span class=\"nc\">MyStack<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">{<\/span>\n    <span class=\"c1\">\/\/ other methods like push, pop, etc. 
not shown<\/span>\n\n    <span class=\"kd\">public<\/span> <span class=\"kt\">void<\/span> <span class=\"nf\">pushAll<\/span><span class=\"o\">(<\/span><span class=\"nc\">List<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">extends<\/span> <span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">items<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n        <span class=\"k\">for<\/span> <span class=\"o\">(<\/span><span class=\"no\">T<\/span> <span class=\"n\">item<\/span> <span class=\"o\">:<\/span> <span class=\"n\">items<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n            <span class=\"n\">push<\/span><span class=\"o\">(<\/span><span class=\"n\">item<\/span><span class=\"o\">);<\/span>\n        <span class=\"o\">}<\/span>\n    <span class=\"o\">}<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Covariance with wildcards also lets us write code along the following lines:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">static<\/span> <span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"nf\">combine<\/span><span class=\"o\">(<\/span><span class=\"nc\">List<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">extends<\/span> <span class=\"nc\">List<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">extends<\/span> <span class=\"no\">T<\/span><span class=\"o\">&gt;&gt;<\/span> <span class=\"n\">listOfLists<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n    <span class=\"c1\">\/\/ etc...<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>In the above code, we can combine the supplied lists into a single <code>List&lt;T&gt;<\/code>, even when the individual lists have different element types, as long as each of them is a subtype of <code>T<\/code>.<\/p>\n\n<p>To prevent the issues that covariant arrays have, the compiler imposes a restriction on how wildcards can be used. In the case of the <code>pushAll<\/code> method, the compiler knows that every item in <code>items<\/code> must be a <code>T<\/code> (a Number, in our example), so pushing onto our stack is typesafe.<\/p>\n\n<p>However, inside such a method we don't know what is actually passed in: it could be a <code>List&lt;Number&gt;<\/code>, a <code>List&lt;Integer&gt;<\/code>, a <code>List&lt;Float&gt;<\/code>, etc. Because of this, the following code doesn't compile:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">static<\/span> <span class=\"kt\">double<\/span> <span class=\"nf\">averageOrDefault<\/span><span class=\"o\">(<\/span><span class=\"nc\">List<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">extends<\/span> <span class=\"nc\">Number<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">numbers<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"o\">(<\/span><span class=\"n\">numbers<\/span><span class=\"o\">.<\/span><span class=\"na\">isEmpty<\/span><span class=\"o\">())<\/span> <span class=\"o\">{<\/span>\n        <span class=\"n\">numbers<\/span><span class=\"o\">.<\/span><span class=\"na\">add<\/span><span class=\"o\">(<\/span><span class=\"mi\">0<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ does not compile<\/span>\n    <span class=\"o\">}<\/span>\n    <span class=\"k\">return<\/span> <span class=\"nf\">average<\/span><span class=\"o\">(<\/span><span class=\"n\">numbers<\/span><span class=\"o\">);<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>The reason is that all of the following calls compile, so <code>numbers<\/code> might be a <code>List&lt;Float&gt;<\/code>, and adding the Integer <code>0<\/code> to it would cause heap pollution:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre
 class=\"highlight java\"><code><span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Integer<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">integers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;&gt;();<\/span>\n<span class=\"n\">averageOrDefault<\/span><span class=\"o\">(<\/span><span class=\"n\">integers<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ compiles<\/span>\n\n<span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Float<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">floats<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;&gt;();<\/span>\n<span class=\"n\">averageOrDefault<\/span><span class=\"o\">(<\/span><span class=\"n\">floats<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ compiles<\/span>\n\n<span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Number<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">numbers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;&gt;();<\/span>\n<span class=\"n\">averageOrDefault<\/span><span class=\"o\">(<\/span><span class=\"n\">numbers<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ also compiles<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>With a covariant wildcard type, <code>null<\/code> is the only value we can pass to a method like <code>add<\/code> that takes the element type as a parameter, since <code>null<\/code> is a valid value of every reference type.<\/p>\n\n<p>That's the reason for the \"producer extends\" part of PECS. When we use covariance, we know that any item we obtain will be some subtype of the upper bound, but the compiler can't know the exact type. 
We know that the producer can supply us with an instance that is a subtype of the upper bound on the type parameter, but we don't know which one, so the most specific we can get is to assign values to variables typed to the upper bound. Covariance is therefore used when we want to, in some sense, get items out. Hence we think of covariant generics as producers.<\/p>\n\n<h3>\n  \n  \n  Contravariance\n<\/h3>\n\n<p>Now we want to implement a <code>popAll<\/code> method for <code>MyStack<\/code>, which pops all items from our stack and adds them to the supplied list:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">class<\/span> <span class=\"nc\">MyStack<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">{<\/span>\n    <span class=\"c1\">\/\/ other methods like push, pop, etc. not shown<\/span>\n\n    <span class=\"kd\">public<\/span> <span class=\"kt\">void<\/span> <span class=\"nf\">popAll<\/span><span class=\"o\">(<\/span><span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">items<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n        <span class=\"k\">while<\/span> <span class=\"o\">(!<\/span><span class=\"n\">isEmpty<\/span><span class=\"o\">())<\/span> <span class=\"o\">{<\/span>\n            <span class=\"no\">T<\/span> <span class=\"n\">popped<\/span> <span class=\"o\">=<\/span> <span class=\"n\">pop<\/span><span class=\"o\">();<\/span>\n            <span class=\"n\">items<\/span><span class=\"o\">.<\/span><span class=\"na\">add<\/span><span class=\"o\">(<\/span><span class=\"n\">popped<\/span><span class=\"o\">);<\/span>\n        <span class=\"o\">}<\/span>\n    <span class=\"o\">}<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>The following won't 
compile:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Object<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">anything<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;&gt;();<\/span>\n<span class=\"nc\">MyStack<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Integer<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">integers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">MyStack<\/span><span class=\"o\">&lt;&gt;();<\/span>\n<span class=\"n\">integers<\/span><span class=\"o\">.<\/span><span class=\"na\">push<\/span><span class=\"o\">(<\/span><span class=\"mi\">1<\/span><span class=\"o\">);<\/span>\n<span class=\"n\">integers<\/span><span class=\"o\">.<\/span><span class=\"na\">push<\/span><span class=\"o\">(<\/span><span class=\"mi\">2<\/span><span class=\"o\">);<\/span>\n<span class=\"n\">integers<\/span><span class=\"o\">.<\/span><span class=\"na\">push<\/span><span class=\"o\">(<\/span><span class=\"mi\">3<\/span><span class=\"o\">);<\/span>\n<span class=\"n\">integers<\/span><span class=\"o\">.<\/span><span class=\"na\">popAll<\/span><span class=\"o\">(<\/span><span class=\"n\">anything<\/span><span class=\"o\">);<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Even though we can see that it's safe to add integers to a list of objects, the compiler won't allow this code to compile because generics are invariant. 
However, we can fix this by making the argument contravariant:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">class<\/span> <span class=\"nc\">MyStack<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">{<\/span>\n    <span class=\"c1\">\/\/ other methods like push, pop, etc. not shown<\/span>\n\n    <span class=\"kd\">public<\/span> <span class=\"kt\">void<\/span> <span class=\"nf\">popAll<\/span><span class=\"o\">(<\/span><span class=\"nc\">List<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">super<\/span> <span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">items<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n        <span class=\"k\">while<\/span> <span class=\"o\">(!<\/span><span class=\"n\">isEmpty<\/span><span class=\"o\">())<\/span> <span class=\"o\">{<\/span>\n            <span class=\"no\">T<\/span> <span class=\"n\">popped<\/span> <span class=\"o\">=<\/span> <span class=\"n\">pop<\/span><span class=\"o\">();<\/span>\n            <span class=\"n\">items<\/span><span class=\"o\">.<\/span><span class=\"na\">add<\/span><span class=\"o\">(<\/span><span class=\"n\">popped<\/span><span class=\"o\">);<\/span>\n        <span class=\"o\">}<\/span>\n    <span class=\"o\">}<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Now our code below will compile:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Object<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">anything<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;&gt;();<\/span>\n\n<span class=\"nc\">MyStack<\/span><span class=\"o\">&lt;<\/span><span 
class=\"nc\">Integer<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">integers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">MyStack<\/span><span class=\"o\">&lt;&gt;();<\/span>\n<span class=\"n\">integers<\/span><span class=\"o\">.<\/span><span class=\"na\">push<\/span><span class=\"o\">(<\/span><span class=\"mi\">1<\/span><span class=\"o\">);<\/span>\n<span class=\"n\">integers<\/span><span class=\"o\">.<\/span><span class=\"na\">push<\/span><span class=\"o\">(<\/span><span class=\"mi\">2<\/span><span class=\"o\">);<\/span>\n<span class=\"n\">integers<\/span><span class=\"o\">.<\/span><span class=\"na\">push<\/span><span class=\"o\">(<\/span><span class=\"mi\">3<\/span><span class=\"o\">);<\/span>\n\n<span class=\"n\">integers<\/span><span class=\"o\">.<\/span><span class=\"na\">popAll<\/span><span class=\"o\">(<\/span><span class=\"n\">anything<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ compiles<\/span>\n\n<span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Number<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">numbers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;&gt;();<\/span>\n<span class=\"n\">integers<\/span><span class=\"o\">.<\/span><span class=\"na\">popAll<\/span><span class=\"o\">(<\/span><span class=\"n\">numbers<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ also compiles<\/span>\n\n<span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Integer<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">moreIntegers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;&gt;();<\/span>\n<span class=\"n\">integers<\/span><span class=\"o\">.<\/span><span class=\"na\">popAll<\/span><span class=\"o\">(<\/span><span class=\"n\">moreIntegers<\/span><span class=\"o\">);<\/span> 
<span class=\"c1\">\/\/ also compiles: super is inclusive<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>However, the following won't compile:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Float<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">floats<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;&gt;();<\/span>\n<span class=\"n\">integers<\/span><span class=\"o\">.<\/span><span class=\"na\">popAll<\/span><span class=\"o\">(<\/span><span class=\"n\">floats<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ does not compile: Float is not a supertype of Integer<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Here we can see that contravariance allows us to safely feed items into the list, as long as each item's type is the lower bound specified by the method's parameter (here, <code>T<\/code>) or a subtype of it.
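<\/p>\n\n<p>The implementation of <code>MyStack<\/code> isn't shown, so here is a minimal, self-contained sketch of the same <code>popAll<\/code> idea. It assumes <code>java.util.ArrayDeque<\/code> as a stand-in for the stack, and the <code>PopAllDemo<\/code> class is hypothetical:<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code>import java.util.ArrayDeque;\nimport java.util.ArrayList;\nimport java.util.Deque;\nimport java.util.List;\n\npublic class PopAllDemo {\n    \/\/ Same idea as MyStack.popAll, written as a static method over a Deque.\n    static &lt;T&gt; void popAll(Deque&lt;T&gt; stack, List&lt;? super T&gt; target) {\n        while (!stack.isEmpty()) {\n            \/\/ A T can always be added to a List&lt;? super T&gt;.\n            target.add(stack.pop());\n        }\n    }\n\n    public static void main(String[] args) {\n        Deque&lt;Integer&gt; integers = new ArrayDeque&lt;&gt;();\n        integers.push(1);\n        integers.push(2);\n        integers.push(3);\n\n        List&lt;Object&gt; anything = new ArrayList&lt;&gt;();\n        popAll(integers, anything); \/\/ T = Integer; Object is a supertype of Integer\n        System.out.println(anything); \/\/ prints [3, 2, 1]\n    }\n}\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Passing a <code>List&lt;Number&gt;<\/code> or a <code>List&lt;Integer&gt;<\/code> as the target would also compile, while a <code>List&lt;Float&gt;<\/code> would not. Because <code>ArrayDeque<\/code> pushes and pops at the same end, the items come out in reverse order.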
<\/p>\n\n<p>However, since we don't know precisely what type of list was passed in, any item we read from that list can only be assigned to a variable of type <code>Object<\/code>:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Object<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">items<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;&gt;();<\/span>\n<span class=\"n\">items<\/span><span class=\"o\">.<\/span><span class=\"na\">add<\/span><span class=\"o\">(<\/span><span class=\"s\">\"hello\"<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ the underlying list may hold non-Numbers<\/span>\n\n<span class=\"nc\">List<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">super<\/span> <span class=\"nc\">Number<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">contravariantNumbers<\/span> <span class=\"o\">=<\/span> <span class=\"n\">items<\/span><span class=\"o\">;<\/span>\n<span class=\"n\">contravariantNumbers<\/span><span class=\"o\">.<\/span><span class=\"na\">add<\/span><span class=\"o\">(<\/span><span class=\"mf\">3.14<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ can add a Number or any subtype of Number<\/span>\n<span class=\"k\">for<\/span> <span class=\"o\">(<\/span><span class=\"nc\">Object<\/span> <span class=\"n\">o<\/span> <span class=\"o\">:<\/span> <span class=\"n\">contravariantNumbers<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span> <span class=\"c1\">\/\/ can only read elements as Object, though<\/span>\n    <span class=\"nc\">System<\/span><span class=\"o\">.<\/span><span class=\"na\">out<\/span><span class=\"o\">.<\/span><span class=\"na\">println<\/span><span class=\"o\">(<\/span><span class=\"s\">\"o = \"<\/span> <span class=\"o\">+<\/span> <span class=\"n\">o<\/span><span class=\"o\">);<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Here we can say
that a list of Objects is a subtype of a contravariant list of Numbers, so the subtyping goes in the opposite direction from Number being a subtype of Object, hence the \"contra\" in contravariance. That's why the following assignment makes sense:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">List<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">super<\/span> <span class=\"nc\">Number<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">contravariantNumbers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Object<\/span><span class=\"o\">&gt;();<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>We can see that this is a mirror image of the situation with covariance. That's why contravariant types are thought of as consumers, i.e. the \"consumer super\" in PECS: with a contravariant type, we supply items to it, so it consumes them. <\/p>\n\n<h3>\n  \n  \n  Invariance with unbounded wildcards\n<\/h3>\n\n<p>We can also specify an unbounded wildcard:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">List<\/span><span class=\"o\">&lt;?&gt;<\/span> <span class=\"n\">arbitraryList<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Integer<\/span><span class=\"o\">&gt;();<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>This is useful when we don't care about the element type. With an unbounded wildcard, as with covariance, we can't pass any argument other than <code>null<\/code> to methods whose parameters use the type parameter, such as <code>add<\/code>. And, as with contravariance, anything a method returns as that type can only be assigned to <code>Object<\/code>.
In this scenario, there is still a single specific element type, so the code remains typesafe; we just don\u2019t care what that type is.<\/p>\n\n<p>Below are some examples where such wildcards make sense:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kt\">boolean<\/span> <span class=\"nf\">containsAll<\/span><span class=\"o\">(<\/span><span class=\"nc\">Collection<\/span><span class=\"o\">&lt;?&gt;<\/span> <span class=\"n\">c<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n    <span class=\"c1\">\/\/ etc...<\/span>\n<span class=\"o\">}<\/span>\n\n<span class=\"kd\">public<\/span> <span class=\"kd\">static<\/span> <span class=\"kt\">int<\/span> <span class=\"nf\">size<\/span><span class=\"o\">(<\/span><span class=\"nc\">Iterable<\/span><span class=\"o\">&lt;?&gt;<\/span> <span class=\"n\">iterable<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n    <span class=\"c1\">\/\/ etc...<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<h3>\n  \n  \n  Combining covariance and contravariance together\n<\/h3>\n\n<p>The following example combines covariance and contravariance in a single method signature.
We copy all of the items from <code>source<\/code> into <code>destination<\/code>.<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">static<\/span> <span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"kt\">void<\/span> <span class=\"nf\">copy<\/span><span class=\"o\">(<\/span><span class=\"nc\">List<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">super<\/span> <span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">destination<\/span><span class=\"o\">,<\/span> <span class=\"nc\">List<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">extends<\/span> <span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">source<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n    <span class=\"k\">for<\/span> <span class=\"o\">(<\/span><span class=\"no\">T<\/span> <span class=\"n\">item<\/span> <span class=\"o\">:<\/span> <span class=\"n\">source<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n        <span class=\"n\">destination<\/span><span class=\"o\">.<\/span><span class=\"na\">add<\/span><span class=\"o\">(<\/span><span class=\"n\">item<\/span><span class=\"o\">);<\/span>\n    <span class=\"o\">}<\/span>\n<span class=\"o\">}<\/span>\n\n<span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Integer<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">integers<\/span> <span class=\"o\">=<\/span> <span class=\"nc\">List<\/span><span class=\"o\">.<\/span><span class=\"na\">of<\/span><span class=\"o\">(<\/span><span class=\"mi\">1<\/span><span class=\"o\">,<\/span> <span class=\"mi\">2<\/span><span class=\"o\">,<\/span> <span class=\"mi\">3<\/span><span class=\"o\">);<\/span>\n<span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Number<\/span><span class=\"o\">&gt;<\/span> <span 
class=\"n\">numbers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;&gt;();<\/span>\n<span class=\"n\">copy<\/span><span class=\"o\">(<\/span><span class=\"n\">numbers<\/span><span class=\"o\">,<\/span> <span class=\"n\">integers<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ variance allows the types of the two arguments to be different<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Let's consider how the compiler treats type <code>T<\/code> in the above example. We pass in <code>integers<\/code> as the source, so for this particular call <code>Integer<\/code> must be <code>T<\/code> or a subtype of <code>T<\/code>. We pass in <code>numbers<\/code> as the destination, so <code>Number<\/code> must be <code>T<\/code> or a supertype of <code>T<\/code>. Therefore, in this scenario, <code>T<\/code> can be either <code>Integer<\/code> or <code>Number<\/code>. Both choices satisfy the bounds, and the call is typesafe either way. <\/p>\n\n<p>If we passed in a <code>List&lt;Float&gt;<\/code> as the destination, the code would not compile, since <code>T<\/code> would have to be a supertype of <code>Integer<\/code> and, at the same time, a subtype of <code>Float<\/code>, and no such type exists. Likewise, if we kept <code>numbers<\/code> as the destination but passed in a <code>List&lt;Object&gt;<\/code> as the source, the code would not compile, since <code>T<\/code> would have to be a supertype of <code>Object<\/code> (that is, <code>Object<\/code> itself) while also being a subtype of <code>Number<\/code>.
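<\/p>\n\n<p>One way to check which types the compiler accepts for <code>T<\/code> is to supply an explicit type argument, also called a type witness. This minimal sketch repeats the <code>copy<\/code> method inside a hypothetical <code>CopyDemo<\/code> class so that it can run on its own:<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code>import java.util.ArrayList;\nimport java.util.List;\n\npublic class CopyDemo {\n    public static &lt;T&gt; void copy(List&lt;? super T&gt; destination, List&lt;? extends T&gt; source) {\n        for (T item : source) {\n            destination.add(item);\n        }\n    }\n\n    public static void main(String[] args) {\n        List&lt;Integer&gt; integers = List.of(1, 2, 3);\n        List&lt;Number&gt; numbers = new ArrayList&lt;&gt;();\n\n        CopyDemo.&lt;Integer&gt;copy(numbers, integers); \/\/ T = Integer fits both bounds\n        CopyDemo.&lt;Number&gt;copy(numbers, integers);  \/\/ T = Number also fits both bounds\n        \/\/ CopyDemo.&lt;Float&gt;copy(numbers, integers); \/\/ would not compile\n\n        System.out.println(numbers); \/\/ prints [1, 2, 3, 1, 2, 3]\n    }\n}\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Both witnesses compile because <code>Integer<\/code> and <code>Number<\/code> each lie between the two bounds. Uncommenting the <code>Float<\/code> line should produce a compile error, since a <code>List&lt;Integer&gt;<\/code> is not a <code>List&lt;? extends Float&gt;<\/code>.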
<\/p>\n\n<p>In this case, if the source is <code>List&lt;Integer&gt;<\/code>, the destination can be any list whose element type is <code>Integer<\/code> or one of its supertypes, such as <code>List&lt;Object&gt;<\/code>, <code>List&lt;Number&gt;<\/code>, or <code>List&lt;Integer&gt;<\/code>:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Object<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">objects<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;&gt;();<\/span>\n<span class=\"n\">copy<\/span><span class=\"o\">(<\/span><span class=\"n\">objects<\/span><span class=\"o\">,<\/span> <span class=\"n\">integers<\/span><span class=\"o\">);<\/span>\n\n<span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Number<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">numbers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;&gt;(<\/span><span class=\"nc\">List<\/span><span class=\"o\">.<\/span><span class=\"na\">of<\/span><span class=\"o\">(<\/span><span class=\"mi\">1<\/span><span class=\"o\">,<\/span> <span class=\"mf\">3.14<\/span><span class=\"o\">,<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">BigDecimal<\/span><span class=\"o\">(<\/span><span class=\"s\">\"50.500\"<\/span><span class=\"o\">)));<\/span>\n<span class=\"n\">copy<\/span><span class=\"o\">(<\/span><span class=\"n\">numbers<\/span><span class=\"o\">,<\/span> <span class=\"n\">integers<\/span><span class=\"o\">);<\/span>\n\n<span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Integer<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">moreIntegers<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">ArrayList<\/span><span class=\"o\">&lt;&gt;(<\/span><span class=\"nc\">List<\/span><span
class=\"o\">.<\/span><span class=\"na\">of<\/span><span class=\"o\">(<\/span><span class=\"mi\">1<\/span><span class=\"o\">,<\/span> <span class=\"mi\">2<\/span><span class=\"o\">,<\/span> <span class=\"mi\">3<\/span><span class=\"o\">));<\/span>\n<span class=\"n\">copy<\/span><span class=\"o\">(<\/span><span class=\"n\">moreIntegers<\/span><span class=\"o\">,<\/span> <span class=\"n\">integers<\/span><span class=\"o\">);<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>We could also pass in variables with wildcards:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">List<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">extends<\/span> <span class=\"nc\">Number<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">covariantNumbers<\/span> <span class=\"o\">=<\/span> <span class=\"n\">integers<\/span><span class=\"o\">;<\/span>\n<span class=\"nc\">List<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">super<\/span> <span class=\"nc\">Number<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">contravariantNumbers<\/span> <span class=\"o\">=<\/span> <span class=\"n\">objects<\/span><span class=\"o\">;<\/span>\n<span class=\"n\">copy<\/span><span class=\"o\">(<\/span><span class=\"n\">contravariantNumbers<\/span><span class=\"o\">,<\/span> <span class=\"n\">covariantNumbers<\/span><span class=\"o\">);<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<h3>\n  \n  \n  Declaration vs. use-site variance\n<\/h3>\n\n<p>As we have seen, in Java, variance for generics can only be expressed at the use site via wildcards (e.g., <code>List&lt;? extends T&gt;<\/code>, <code>List&lt;? super T&gt;<\/code>). We cannot make a generic type parameter for a class or method covariant or contravariant. <\/p>\n\n<p>In another JVM language, Scala, we can actually specify variance at the declaration site, i.e. the declaration of the type parameter itself. 
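<\/p>\n\n<p>Before turning to Scala, it may help to see the Java side for comparison. In this minimal sketch, with an illustrative <code>UseSiteDemo<\/code> class and a nested <code>Box<\/code> class (both hypothetical), <code>Box<\/code> itself stays invariant and covariance is chosen with a wildcard at the point of use:<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code>public class UseSiteDemo {\n    \/\/ A plain Java generic class: Box&lt;T&gt; is invariant in T.\n    static class Box&lt;T&gt; {\n        private T value;\n\n        Box(T value) { this.value = value; }\n\n        T get() { return value; }\n\n        void set(T newValue) { value = newValue; }\n    }\n\n    public static void main(String[] args) {\n        Box&lt;Integer&gt; intBox = new Box&lt;&gt;(123);\n        \/\/ Box&lt;Number&gt; invalid = intBox;          \/\/ does not compile: Box is invariant\n        Box&lt;? extends Number&gt; numberBox = intBox; \/\/ compiles: covariance chosen at the use site\n        Number n = numberBox.get();                 \/\/ reading is allowed\n        \/\/ numberBox.set(3.14);                     \/\/ does not compile: the exact type is unknown\n        System.out.println(n); \/\/ prints 123\n    }\n}\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Every variable or parameter that wants this flexibility has to repeat the wildcard, whereas Scala lets the class declare its variance once.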
<\/p>\n\n<p>In the code below, <code>Box<\/code> is covariant in T, i.e., <code>+T<\/code>. The compiler enforces type safety for all of its methods, without wildcards. Returning <code>T<\/code> is allowed, but accepting a <code>T<\/code> as a parameter is not:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight scala\"><code><span class=\"k\">class<\/span> <span class=\"nc\">Box<\/span><span class=\"o\">[<\/span><span class=\"kt\">+T<\/span><span class=\"o\">](<\/span><span class=\"k\">private<\/span> <span class=\"k\">var<\/span> <span class=\"n\">value<\/span><span class=\"k\">:<\/span> <span class=\"kt\">T<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n    <span class=\"k\">def<\/span> <span class=\"nf\">get<\/span><span class=\"k\">:<\/span> <span class=\"kt\">T<\/span> <span class=\"o\">=<\/span> <span class=\"n\">value<\/span> <span class=\"c1\">\/\/ compiles, T in covariant (return) position<\/span>\n    <span class=\"k\">def<\/span> <span class=\"nf\">set<\/span><span class=\"o\">(<\/span><span class=\"n\">newValue<\/span><span class=\"k\">:<\/span> <span class=\"kt\">T<\/span><span class=\"o\">)<\/span><span class=\"k\">:<\/span> <span class=\"kt\">Unit<\/span> <span class=\"o\">=<\/span> <span class=\"o\">{<\/span>\n      <span class=\"n\">value<\/span> <span class=\"k\">=<\/span> <span class=\"n\">newValue<\/span> <span class=\"c1\">\/\/ does not compile, covariant type T appears in contravariant position<\/span>\n    <span class=\"o\">}<\/span>\n    <span class=\"k\">override<\/span> <span class=\"k\">def<\/span> <span class=\"nf\">toString<\/span><span class=\"k\">:<\/span> <span class=\"kt\">String<\/span> <span class=\"o\">=<\/span> <span class=\"n\">s<\/span><span class=\"s\">\"Box($value)\"<\/span>\n<span class=\"o\">}<\/span>\n\n<span class=\"k\">object<\/span> <span class=\"nc\">VarianceDemo<\/span> <span class=\"k\">extends<\/span> <span class=\"nc\">App<\/span> <span class=\"o\">{<\/span>\n    
<span class=\"k\">val<\/span> <span class=\"nv\">intBox<\/span><span class=\"k\">:<\/span> <span class=\"kt\">Box<\/span><span class=\"o\">[<\/span><span class=\"kt\">Integer<\/span><span class=\"o\">]<\/span> <span class=\"k\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">Box<\/span><span class=\"o\">[<\/span><span class=\"kt\">Integer<\/span><span class=\"o\">](<\/span><span class=\"nv\">Integer<\/span><span class=\"o\">.<\/span><span class=\"py\">valueOf<\/span><span class=\"o\">(<\/span><span class=\"mi\">123<\/span><span class=\"o\">))<\/span>\n    <span class=\"k\">val<\/span> <span class=\"nv\">numberBox<\/span><span class=\"k\">:<\/span> <span class=\"kt\">Box<\/span><span class=\"o\">[<\/span><span class=\"kt\">Number<\/span><span class=\"o\">]<\/span> <span class=\"k\">=<\/span> <span class=\"n\">intBox<\/span> <span class=\"c1\">\/\/ compiles: Integer is a subtype of Number, and Box is covariant<\/span>\n\n    <span class=\"nf\">println<\/span><span class=\"o\">(<\/span><span class=\"nv\">intBox<\/span><span class=\"o\">.<\/span><span class=\"py\">get<\/span><span class=\"o\">)<\/span> <span class=\"c1\">\/\/ 123<\/span>\n    <span class=\"nf\">println<\/span><span class=\"o\">(<\/span><span class=\"nv\">numberBox<\/span><span class=\"o\">.<\/span><span class=\"py\">get<\/span><span class=\"o\">)<\/span> <span class=\"c1\">\/\/ 123 as a Number<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">printNumber<\/span><span class=\"o\">(<\/span><span class=\"n\">box<\/span><span class=\"k\">:<\/span> <span class=\"kt\">Box<\/span><span class=\"o\">[<\/span><span class=\"kt\">Number<\/span><span class=\"o\">])<\/span><span class=\"k\">:<\/span> <span class=\"kt\">Unit<\/span> <span class=\"o\">=<\/span>\n        <span class=\"nf\">println<\/span><span class=\"o\">(<\/span><span class=\"n\">s<\/span><span class=\"s\">\"Number inside: ${box.get}\"<\/span><span class=\"o\">)<\/span>\n\n    <span class=\"nf\">printNumber<\/span><span
class=\"o\">(<\/span><span class=\"n\">intBox<\/span><span class=\"o\">)<\/span> <span class=\"c1\">\/\/ compiles due to covariance<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<h3>\n  \n  \n  Type capture\n<\/h3>\n\n<p>Consider the following <code>swap<\/code> method. We don't really need to specify the type parameter for this method, since we are re-organizing items that are already in the collection. Therefore it makes sense to use an unbounded wildcard:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">static<\/span> <span class=\"kt\">void<\/span> <span class=\"nf\">swap<\/span><span class=\"o\">(<\/span><span class=\"nc\">List<\/span><span class=\"o\">&lt;?&gt;<\/span> <span class=\"n\">list<\/span><span class=\"o\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">i<\/span><span class=\"o\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">j<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n    <span class=\"n\">list<\/span><span class=\"o\">.<\/span><span class=\"na\">set<\/span><span class=\"o\">(<\/span><span class=\"n\">i<\/span><span class=\"o\">,<\/span> <span class=\"n\">list<\/span><span class=\"o\">.<\/span><span class=\"na\">set<\/span><span class=\"o\">(<\/span><span class=\"n\">j<\/span><span class=\"o\">,<\/span> <span class=\"n\">list<\/span><span class=\"o\">.<\/span><span class=\"na\">get<\/span><span class=\"o\">(<\/span><span class=\"n\">i<\/span><span class=\"o\">)));<\/span> <span class=\"c1\">\/\/ does not compile<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>This does pose a problem though. We can't get an item out of this list except as an Object, and that means we can't safely put that item back into the list at a different location. 
Whenever we use a wildcard like this, the compiler replaces it behind the scenes with a fresh synthetic type variable, which shows up in error messages with a name like <code>CAP#1<\/code>. This process is called capture conversion. We can't name the captured type ourselves, but we can write a generic helper method whose type parameter <code>T<\/code> is inferred as that captured type:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">static<\/span> <span class=\"kt\">void<\/span> <span class=\"nf\">swap<\/span><span class=\"o\">(<\/span><span class=\"nc\">List<\/span><span class=\"o\">&lt;?&gt;<\/span> <span class=\"n\">list<\/span><span class=\"o\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">i<\/span><span class=\"o\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">j<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n    <span class=\"n\">swapCapture<\/span><span class=\"o\">(<\/span><span class=\"n\">list<\/span><span class=\"o\">,<\/span> <span class=\"n\">i<\/span><span class=\"o\">,<\/span> <span class=\"n\">j<\/span><span class=\"o\">);<\/span> <span class=\"c1\">\/\/ T is inferred as the captured type<\/span>\n<span class=\"o\">}<\/span>\n\n<span class=\"kd\">private<\/span> <span class=\"kd\">static<\/span> <span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"kt\">void<\/span> <span class=\"nf\">swapCapture<\/span><span class=\"o\">(<\/span><span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">list<\/span><span class=\"o\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">i<\/span><span class=\"o\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">j<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n    <span class=\"no\">T<\/span> <span class=\"n\">temp<\/span> <span class=\"o\">=<\/span> <span class=\"n\">list<\/span><span class=\"o\">.<\/span><span class=\"na\">get<\/span><span class=\"o\">(<\/span><span
class=\"n\">i<\/span><span class=\"o\">);<\/span>\n    <span class=\"n\">list<\/span><span class=\"o\">.<\/span><span class=\"na\">set<\/span><span class=\"o\">(<\/span><span class=\"n\">i<\/span><span class=\"o\">,<\/span> <span class=\"n\">list<\/span><span class=\"o\">.<\/span><span class=\"na\">get<\/span><span class=\"o\">(<\/span><span class=\"n\">j<\/span><span class=\"o\">));<\/span>\n    <span class=\"n\">list<\/span><span class=\"o\">.<\/span><span class=\"na\">set<\/span><span class=\"o\">(<\/span><span class=\"n\">j<\/span><span class=\"o\">,<\/span> <span class=\"n\">temp<\/span><span class=\"o\">);<\/span>\n\n    <span class=\"c1\">\/\/ we could also implement this more concisely as follows<\/span>\n    <span class=\"c1\">\/\/ list.set(i, list.set(j, list.get(i)));<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>As we can see, the above code allows us to use a well-defined <code>temp<\/code> variable so that we can swap the items in a typesafe manner.<\/p>\n\n<p>We could declare the <code>swap<\/code> method with the type parameter <code>&lt;T&gt;<\/code> in the first place, avoiding the wildcards entirely. However, from an API design point of view, it is cleaner to use the wildcard here. If we are not referring to the same type in multiple places, it is a good practice to use wildcards.<\/p>\n\n<h3>\n  \n  \n  Do not return variables with wildcards\n<\/h3>\n\n<p>While using wildcards for method parameters is appropriate, it is not generally a good practice for the return type to use a wildcard.<\/p>\n\n<p>This limits what the client can do, and makes it harder to chain methods that don't expect wildcards. 
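<\/p>\n\n<p>Here is a minimal sketch of the problem. The <code>ReturnTypeDemo<\/code> class and its two methods are hypothetical and exist only for illustration:<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code>import java.util.ArrayList;\nimport java.util.List;\n\npublic class ReturnTypeDemo {\n    \/\/ Hypothetical API that returns a wildcard type: callers lose the element type.\n    static List&lt;? extends Number&gt; wildcardResult() {\n        return new ArrayList&lt;&gt;(List.of(1, 2, 3));\n    }\n\n    \/\/ Hypothetical API that returns a concrete type: callers keep full access.\n    static List&lt;Integer&gt; concreteResult() {\n        return new ArrayList&lt;&gt;(List.of(1, 2, 3));\n    }\n\n    public static void main(String[] args) {\n        List&lt;? extends Number&gt; a = wildcardResult();\n        \/\/ a.add(4);            \/\/ does not compile\n        \/\/ List&lt;Integer&gt; b = a; \/\/ does not compile without an unchecked cast\n\n        List&lt;Integer&gt; c = concreteResult();\n        c.add(4);               \/\/ fine\n        System.out.println(a.size() + \" \" + c); \/\/ prints 3 [1, 2, 3, 4]\n    }\n}\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>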
Variance should be something that adds flexibility to an API while maintaining type safety, but it should not be an unnecessary burden on the developer using that API.<\/p>\n\n<h3>\n  \n  \n  Additional examples\n<\/h3>\n\n<p>In the following example, the <code>max<\/code> method returns the largest value from the supplied collection:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">static<\/span> <span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span> <span class=\"kd\">extends<\/span> <span class=\"nc\">Object<\/span> <span class=\"o\">&amp;<\/span> <span class=\"nc\">Comparable<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">super<\/span> <span class=\"no\">T<\/span><span class=\"o\">&gt;&gt;<\/span> <span class=\"no\">T<\/span> <span class=\"nf\">max<\/span><span class=\"o\">(<\/span><span class=\"nc\">Collection<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">extends<\/span> <span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">coll<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n <span class=\"c1\">\/\/ ...<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>It makes sense to use a contravariant type for <code>Comparable&lt;? super T&gt;<\/code>, since we don't want to force <code>&lt;T&gt;<\/code> itself to implement the Comparable interface directly. One of its base classes or interfaces could do that instead.<\/p>\n\n<p>Similarly, the <code>sort<\/code> method below also takes a <code>Comparator&lt;? super T&gt;<\/code>. We could pass in a list of integers as <code>list<\/code> and a <code>Comparator&lt;Number&gt;<\/code> or <code>Comparator&lt;Object&gt;<\/code>. 
Both of these comparators can safely consume an Integer.<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">static<\/span> <span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"kt\">void<\/span> <span class=\"nf\">sort<\/span><span class=\"o\">(<\/span><span class=\"nc\">List<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">list<\/span><span class=\"o\">,<\/span> <span class=\"nc\">Comparator<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">super<\/span> <span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">c<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n <span class=\"c1\">\/\/ ...<\/span>\n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>In the following example from the <code>Optional<\/code> class, the <code>map<\/code> method takes a function that will be applied to the object already stored in the Optional. 
The <code>mapper<\/code> function is declared contravariant in its argument type (<code>? super T<\/code>) and covariant in its return type (<code>? extends U<\/code>):<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"kd\">public<\/span> <span class=\"kd\">final<\/span> <span class=\"kd\">class<\/span> <span class=\"nc\">Optional<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">{<\/span> \n    <span class=\"kd\">public<\/span> <span class=\"o\">&lt;<\/span><span class=\"no\">U<\/span><span class=\"o\">&gt;<\/span> <span class=\"nc\">Optional<\/span><span class=\"o\">&lt;<\/span><span class=\"no\">U<\/span><span class=\"o\">&gt;<\/span> <span class=\"nf\">map<\/span><span class=\"o\">(<\/span><span class=\"nc\">Function<\/span><span class=\"o\">&lt;?<\/span> <span class=\"kd\">super<\/span> <span class=\"no\">T<\/span><span class=\"o\">,<\/span> <span class=\"o\">?<\/span> <span class=\"kd\">extends<\/span> <span class=\"no\">U<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">mapper<\/span><span class=\"o\">)<\/span> <span class=\"o\">{<\/span>\n        <span class=\"nc\">Objects<\/span><span class=\"o\">.<\/span><span class=\"na\">requireNonNull<\/span><span class=\"o\">(<\/span><span class=\"n\">mapper<\/span><span class=\"o\">);<\/span>\n        <span class=\"k\">if<\/span> <span class=\"o\">(!<\/span><span class=\"n\">isPresent<\/span><span class=\"o\">())<\/span> <span class=\"o\">{<\/span>\n            <span class=\"k\">return<\/span> <span class=\"nf\">empty<\/span><span class=\"o\">();<\/span>\n        <span class=\"o\">}<\/span> <span class=\"k\">else<\/span> <span class=\"o\">{<\/span>\n            <span class=\"k\">return<\/span> <span class=\"nc\">Optional<\/span><span class=\"o\">.<\/span><span class=\"na\">ofNullable<\/span><span class=\"o\">(<\/span><span class=\"n\">mapper<\/span><span class=\"o\">.<\/span><span class=\"na\">apply<\/span><span class=\"o\">(<\/span><span 
class=\"n\">value<\/span><span class=\"o\">));<\/span>\n        <span class=\"o\">}<\/span>\n    <span class=\"o\">}<\/span>  \n<span class=\"o\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>The following makes sense for this use-case:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight java\"><code><span class=\"nc\">Function<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Number<\/span><span class=\"o\">,<\/span> <span class=\"nc\">String<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">mapper<\/span> <span class=\"o\">=<\/span> <span class=\"o\">(<\/span><span class=\"n\">num<\/span><span class=\"o\">)<\/span> <span class=\"o\">-&gt;<\/span> <span class=\"n\">num<\/span><span class=\"o\">.<\/span><span class=\"na\">toString<\/span><span class=\"o\">();<\/span>\n<span class=\"nc\">Optional<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">Integer<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">optionalInt<\/span> <span class=\"o\">=<\/span> <span class=\"nc\">Optional<\/span><span class=\"o\">.<\/span><span class=\"na\">of<\/span><span class=\"o\">(<\/span><span class=\"mi\">3<\/span><span class=\"o\">);<\/span>\n<span class=\"nc\">Optional<\/span><span class=\"o\">&lt;<\/span><span class=\"nc\">CharSequence<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">optionalChars<\/span> <span class=\"o\">=<\/span> <span class=\"n\">optionalInt<\/span><span class=\"o\">.<\/span><span class=\"na\">map<\/span><span class=\"o\">(<\/span><span class=\"n\">mapper<\/span><span class=\"o\">);<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>In the above code, we can pass in any Number to our <code>mapper<\/code> function, so passing in an Integer is fine. 
Likewise, the mapper returns a String, which is a subtype of CharSequence, so <code>U<\/code> can be inferred as CharSequence and the result can be assigned to an <code>Optional&lt;CharSequence&gt;<\/code>.<\/p>\n\n<h1>\n  \n  \n  References\n<\/h1>\n\n<ul>\n<li><a href=\"https:\/\/www.oreilly.com\/library\/view\/effective-java-3rd\/9780134686097\/\" rel=\"noopener noreferrer\">Effective Java, by Joshua Bloch (Chapter 5, Generics)<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/openjdk\/\" rel=\"noopener noreferrer\">OpenJDK source code<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/google\/guava\" rel=\"noopener noreferrer\">Guava source code<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/palantir\/delegate-processors\" rel=\"noopener noreferrer\">Palantir delegate processors source code<\/a><\/li>\n<li><a href=\"https:\/\/docs.scala-lang.org\/tour\/variances.html\" rel=\"noopener noreferrer\">Variance in Scala<\/a><\/li>\n<\/ul>\n\n","category":["java","generics","oop","variance"]},{"title":"Performance and Scalability for Database-Backed Applications","pubDate":"Mon, 01 Jul 2024 05:26:27 +0000","link":"https:\/\/dev.to\/nestedsoftware\/performance-and-scalability-for-database-backed-applications-pca","guid":"https:\/\/dev.to\/nestedsoftware\/performance-and-scalability-for-database-backed-applications-pca","description":"<p>The following are some techniques that I've found to be both useful and practical over the years for scaling data-intensive applications.<\/p>\n\n<h1>\n  \n  \n  Split Large Jobs into Smaller Parallel Jobs\n<\/h1>\n\n<p>When processing large amounts of data, a very useful technique to improve performance is to break up a given job into smaller jobs that can run in parallel. Once all of the jobs have completed, the partial results can be combined. <\/p>\n\n<p>Keep in mind that when the parallel jobs are hitting the same database, locking can become a bottleneck. 
Also, this approach requires that you be wary of over-taxing the database, since all of these sessions will be running concurrently.<\/p>\n\n<p>For very high scalability, this type of idea can be implemented with tools like <a href=\"https:\/\/www.talend.com\/resources\/what-is-mapreduce\" rel=\"noopener noreferrer\">MapReduce<\/a>.<\/p>\n\n<h1>\n  \n  \n  Pre-fetch Data Before Processing\n<\/h1>\n\n<p>I\/O latency is a common cause of performance problems. Replacing multiple calls to the database with a single call is often helpful. <\/p>\n\n<p>Here you would pre-load data from the database and cache it in memory. That way the data can be used and reused without requiring separate round trips to the database.<\/p>\n\n<p>It's important to keep in mind that the underlying data may be updated in the database during processing, leaving the cached copy stale. Whether that matters depends on the given use case.<\/p>\n\n<p>Storing large amounts of data in RAM also increases the resource usage of the application, so it's important to consider the tradeoffs between performance and memory usage.<\/p>\n\n<h1>\n  \n  \n  Batch Multiple SQL Executions into a Single Call\n<\/h1>\n\n<p>Consider a batch data import job. The job may repeatedly execute SQL statements to persist data in a loop. You can instead collect a certain amount of data within the application, and issue a single SQL call to the database. This again reduces the number of round trips to the database. <\/p>\n\n<p>One issue with this approach is that a single failure will cause the entire transaction to roll back. 
When a batch fails, you can re-run the items in that batch one at a time so that the rest of the data can still be persisted, and only the failing records will produce an error.<\/p>\n\n<blockquote>\n<p>Note: If you're sending individual SQL statements in a loop, you can also set the database commit frequency so that commits happen in batches rather than after each individual row.<\/p>\n<\/blockquote>\n\n<h1>\n  \n  \n  Optimize SQL Queries\n<\/h1>\n\n<p>Working with relational databases can be somewhat of an art form. When queries perform poorly, it can be helpful to study in depth the execution plan chosen by the database engine and to improve the SQL based on that information. <\/p>\n\n<p>Rewriting inefficient SQL queries, as well as reviewing the indexes on the tables used in the query, can help to improve performance. In Oracle, one can add optimizer hints to help improve queries, though personally I prefer to avoid that as much as possible.<\/p>\n\n<h1>\n  \n  \n  Use Separate Databases\/Schemas\n<\/h1>\n\n<p>Having a single large database can be convenient, but it also can introduce performance problems when there are huge numbers of rows in important tables. For example, let's say a B2B enterprise application is used by many different companies. Having a separate database or schema for each company can significantly improve performance. <\/p>\n\n<p>Such partitioning also makes it easier to maintain security so that a company's data won't be accidentally accessed by the wrong users.<\/p>\n\n<blockquote>\n<p>When data is broken up across multiple schemas, it may make sense to aggregate it into a single database that can be used for management and analytics - in the example above, this database would have information about all of the companies in the system.<\/p>\n<\/blockquote>\n\n<h1>\n  \n  \n  Refactor Database Structure\n<\/h1>\n\n<p>In some cases, the structure of the database tables can reduce performance significantly. 
<\/p>\n\n<p>Sometimes breaking up a single table into multiple tables can help (this is known as normalizing the tables), as the original table structure may have a large number of nullable columns. <\/p>\n\n<p>In other cases, it may be helpful to go the other way and denormalize tables (combine data from multiple tables into a single table). This allows data to be retrieved all at once, without requiring joins. Instead of fully denormalizing the data, it may be preferable to use a materialized view.<\/p>\n\n<p>Working with the indexes available on database tables can also be helpful. In general, indexes help less when a query reads a large fraction of a table; in that case, a full table scan can be cheaper than many individual index lookups. We also want to keep in mind that indexes increase the cost of inserts and updates even as they improve reads. If we read data only occasionally but update it frequently, improving the performance of the former at the expense of the latter may be a bad idea. <\/p>\n\n<h1>\n  \n  \n  Organize Transactions into Sagas\n<\/h1>\n\n<p>Database transactions can have a significant impact on performance, since a long-running transaction holds its locks until it completes, so keeping transactions small is a good idea. <\/p>\n\n<p>It may be possible to break up a long-running transaction into a sequence of smaller transactions. Such a sequence is known as a saga. <\/p>\n\n<p>For example, let\u2019s say you\u2019re building an application that handles purchases. You can save an order in an unapproved state, and then move the order through to completion in multiple steps where each step is a separate transaction. <\/p>\n\n<p>With sagas, it's important to understand that the database will now contain data that may later be deemed invalid - e.g. a pending order may end up never being finalized. In some cases, data that has already been persisted may need to be undone at the application level, rather than by relying on a transaction rollback - this is known as backward recovery. 
Alternatively, it may be possible to fix the problems that caused the initial failure and to keep the saga going - this is called forward recovery (see <a href=\"https:\/\/docs.aws.amazon.com\/prescriptive-guidance\/latest\/cloud-design-patterns\/saga.html\" rel=\"noopener noreferrer\">Saga Patterns<\/a>).<\/p>\n\n<h1>\n  \n  \n  Separate Transactional Processing from Reporting and Analytics\n<\/h1>\n\n<p>There is a fundamental tradeoff in database optimization between handling many small transactions and running large reports (see <a href=\"https:\/\/aws.amazon.com\/compare\/the-difference-between-olap-and-oltp\/\" rel=\"noopener noreferrer\">OLTP vs. OLAP<\/a>). <\/p>\n\n<p>When running large and complex reports, it can be helpful to maintain a reporting database that is used just for executing reports (this can be generalized to a data warehouse). Meanwhile, the main application logic can continue to use a separate transactional database. <\/p>\n\n<p>A variation on this idea is to implement <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/patterns\/cqrs\" rel=\"noopener noreferrer\">CQRS<\/a>, a pattern where we use one model for write operations and another one for read operations. Often, there are separate databases for reads and writes. <\/p>\n\n<p>In both cases, the distributed nature of the databases means that changes committed on the write side won't be visible on the read side immediately, but only after some delay - this is known as eventual consistency (see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Eventual_consistency\" rel=\"noopener noreferrer\">Eventual Consistency<\/a>).<\/p>\n\n<h1>\n  \n  \n  Split Monolith into (Micro)services\n<\/h1>\n\n<p>We can take the previously mentioned idea of partitioning the database further by breaking up an application into multiple applications, each with its own database. 
In this case, each application will communicate with the others via something like <a href=\"https:\/\/blog.postman.com\/rest-api-examples\" rel=\"noopener noreferrer\">REST<\/a>, RPC (e.g. <a href=\"https:\/\/grpc.io\" rel=\"noopener noreferrer\">gRPC<\/a>), or a message broker (e.g. <a href=\"https:\/\/redis.io\" rel=\"noopener noreferrer\">Redis<\/a>, <a href=\"https:\/\/kafka.apache.org\/intro\" rel=\"noopener noreferrer\">Kafka<\/a>, or <a href=\"https:\/\/www.rabbitmq.com\" rel=\"noopener noreferrer\">RabbitMQ<\/a>).<\/p>\n\n<p>This approach offers advantages, such as more flexible development and deployment (you can develop and deploy each microservice separately). It also offers scaling benefits, since services can be orchestrated to run in different geographies, and instances of running services can be added and removed dynamically based on usage (e.g. using orchestration tools like <a href=\"https:\/\/docs.docker.com\/engine\/swarm\/key-concepts\/\" rel=\"noopener noreferrer\">Docker Swarm<\/a> and <a href=\"https:\/\/kubernetes.io\/\" rel=\"noopener noreferrer\">Kubernetes<\/a>). <\/p>\n\n<p>The data for a given service can also be managed more efficiently, both in terms of the amount of data and the way it is structured, since it is specific to that service.<\/p>\n\n<p>Of course, services also present many challenges. Modifying a service may cause bugs in other services that depend on it. It can also be difficult to understand the overall behaviour of the system when a workflow crosses many service boundaries. Even something that sounds as simple as local testing can become more complex, as a given workflow may require deploying a variety of different services. <\/p>\n\n<p>There can be surprising bottlenecks as well. 
I find that this video about Netflix's migration to microservices is still very relevant: <\/p>\n\n<p><iframe width=\"710\" height=\"399\" src=\"https:\/\/www.youtube.com\/embed\/CZ3wIuvmHeM\">\n<\/iframe>\n<\/p>\n\n<p>With separate databases for each service, we can no longer get the same consistency guarantees that a single transaction against a relational database provides.<\/p>\n\n<p>All in all, my advice is to be aware of the difficulties that services present and to take a realistic and clear-eyed view of the various tradeoffs involved. <\/p>\n\n<p>If you'd like to learn more about microservices and service-oriented architecture, I recommend reading <a href=\"https:\/\/www.oreilly.com\/library\/view\/monolith-to-microservices\/9781492047834\/\" rel=\"noopener noreferrer\">Monolith to Microservices<\/a>, by Sam Newman.<\/p>\n\n<h1>\n  \n  \n  References\n<\/h1>\n\n<ul>\n<li><a href=\"https:\/\/www.talend.com\/resources\/what-is-mapreduce\" rel=\"noopener noreferrer\">MapReduce<\/a><\/li>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/prescriptive-guidance\/latest\/cloud-design-patterns\/saga.html\" rel=\"noopener noreferrer\">Saga Patterns<\/a><\/li>\n<li><a href=\"https:\/\/aws.amazon.com\/compare\/the-difference-between-olap-and-oltp\" rel=\"noopener noreferrer\">OLTP vs. 
OLAP<\/a><\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/patterns\/cqrs\" rel=\"noopener noreferrer\">CQRS<\/a><\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Eventual_consistency\" rel=\"noopener noreferrer\">Eventual Consistency<\/a><\/li>\n<li><a href=\"https:\/\/blog.postman.com\/rest-api-examples\" rel=\"noopener noreferrer\">REST<\/a><\/li>\n<li><a href=\"https:\/\/grpc.io\" rel=\"noopener noreferrer\">gRPC<\/a><\/li>\n<li><a href=\"https:\/\/redis.io\" rel=\"noopener noreferrer\">Redis<\/a><\/li>\n<li><a href=\"https:\/\/kafka.apache.org\/intro\" rel=\"noopener noreferrer\">Kafka<\/a><\/li>\n<li><a href=\"https:\/\/www.rabbitmq.com\" rel=\"noopener noreferrer\">RabbitMQ<\/a><\/li>\n<li><a href=\"https:\/\/docs.docker.com\/engine\/swarm\/key-concepts\" rel=\"noopener noreferrer\">Docker Swarm<\/a><\/li>\n<li><a href=\"https:\/\/kubernetes.io\" rel=\"noopener noreferrer\">Kubernetes<\/a><\/li>\n<li><a href=\"https:\/\/www.youtube.com\/watch?v=CZ3wIuvmHeM\" rel=\"noopener noreferrer\">Mastering Chaos - A Netflix Guide to Microservices<\/a><\/li>\n<li><a href=\"https:\/\/www.oreilly.com\/library\/view\/monolith-to-microservices\/9781492047834\" rel=\"noopener noreferrer\">Monolith to Microservices<\/a><\/li>\n<\/ul>\n\n","category":["database","sql","performance","scalability"]},{"title":"Software Dev Interviews: What I look for in a Candidate","pubDate":"Tue, 02 Jan 2024 03:59:54 +0000","link":"https:\/\/dev.to\/nestedsoftware\/software-dev-interviews-what-i-look-for-in-a-candidate-25h6","guid":"https:\/\/dev.to\/nestedsoftware\/software-dev-interviews-what-i-look-for-in-a-candidate-25h6","description":"<p>I've been interviewing candidates for software developer positions for the past several years. 
It has been an interesting experience, and I thought I'd put down some thoughts about what I look for in a candidate, as well as how I perceive the current hiring\/interviewing landscape.<\/p>\n\n<p>I think the present interview processes used by many companies tend to lean too heavily on so-called DSA problems, i.e. data structures and algorithms. While I don't think improving one's skills at such problems is a bad thing, I do believe that emphasizing these problems doesn't actually lead to finding the most promising developers. <\/p>\n\n<p>These types of questions are attractive because the results are easy to measure, but just because assigning a score is easy doesn't mean that the score corresponds well to the day-to-day work of a software developer. To me it's a bit like evaluating marathon runners by having them run a 100 metre sprint. <\/p>\n\n<p>When I interview people, I look for certain qualities. Below, I'll summarize what I try to identify in a candidate.<\/p>\n\n<h1>\n  \n  \n  Intrinsic Motivation\n<\/h1>\n\n<p>Being self-motivated means that a person's behaviour is driven more by their internal desire to perform well than by a desire to get ahead. That's not to say that caring about compensation or career progression is bad, but if that is always the primary motivator, I consider that to be a negative.<\/p>\n\n<p>Someone with intrinsic motivation will have an interest in software that goes beyond their job, and their portfolio will include interesting self-initiated projects. While this may be a somewhat controversial opinion, I don't think it's ideal if someone looks at software development purely as a career path, with no interest in it beyond the workplace. 
<\/p>\n\n<p>That doesn't mean work-life balance isn't important: It's worthwhile to spend time with family and friends, be physically active, and so on. But if there is no innate interest in coding, I believe that will significantly limit a person's growth in the long term. <\/p>\n\n<p>Unfortunately, I find that it's not uncommon for recent grads to pursue goals in an arbitrary way. In school, they focus on grades for their own sake, and after graduation, that same pattern appears to continue.<\/p>\n\n<p>I think of intrinsic motivation as a composite of the following constituents:<\/p>\n\n<h2>\n  \n  \n  Curiosity\n<\/h2>\n\n<p>Curiosity is the desire to learn, both about technology and about the domain. This includes doing one's own research to acquire a greater depth of knowledge and understanding, with a focus on fundamentals.<\/p>\n\n<p>This means not waiting to be spoon-fed an answer and not blindly trusting received information, but exploring on one's own, bringing new understanding both for oneself and for the benefit of the rest of the team.<\/p>\n\n<h2>\n  \n  \n  Initiative\n<\/h2>\n\n<p>Initiative is the willingness to take things on without having to be told to do so, and to tackle problems more independently.<\/p>\n\n<p>I like to see an overall attitude of being willing to do some legwork and not expecting to always be given a series of steps to follow by rote. <\/p>\n\n<p>One should not simply stop when a challenge presents itself. This doesn't mean one should keep spinning one's wheels for long periods of time: It's okay to put in a reasonable effort, identify some specific areas that are causing trouble, and then ask for help, but this should be done in a focused way.<\/p>\n\n<p>In the workplace, an individual with initiative will identify pain points in the system and will be willing to make improvements themselves rather than waiting for someone else to do it. 
<\/p>\n\n<h2>\n  \n  \n  Enthusiasm\n<\/h2>\n\n<p>I like to see a sense of excitement about the possibilities of building something great. This quality is related to many of the points above. <\/p>\n\n<p>To be a good software developer, just being smart is, in my opinion, not good enough. Wanting to develop software, and to achieve a high quality in the software being built, is extremely important. <\/p>\n\n<h1>\n  \n  \n  Communication\n<\/h1>\n\n<p>In addition to the above qualities, being able to communicate clearly (in code and otherwise) is both important and, in my experience, rare. <\/p>\n\n<p>I often find that understanding a piece of code is laborious not because the code is inherently complex, but because it's written in a way that doesn't do a good job of expressing its intent. <\/p>\n\n<p>In design, there should be enough abstraction to separate logic into a modular structure, but without unnecessary complications and overly convoluted logic. <\/p>\n\n<p>In interviews, I examine the candidate's ability to explain their thoughts as they work on a problem, as well as their ability to write code that fulfills its objective as directly as possible.<\/p>\n\n<h1>\n  \n  \n  Problem Solving\n<\/h1>\n\n<p>At the end of the day, solving problems is still important, and DSA-style problems attempt to target this aspect of software development. <\/p>\n\n<p>In this regard, I want to see the candidate solve problems that are appropriate for the domain and demonstrate an ability to research appropriate techniques. <\/p>\n\n<p>I don't think finding the answer is as critical as thinking through the possibilities systematically - and from my point of view, being able to search for a suitable solution online is also valuable. 
Being able to evaluate different solutions or explanations for a given problem, and understanding how to adapt a given technique or piece of code to the task at hand, is often more efficient and more reliable in the real world than re-inventing the wheel. <\/p>\n\n<p>I do think that emphasizing DSA problems in interviews poses some challenges. It may give the candidate the wrong idea about the type of work they'll be doing. If someone is especially motivated by small, tricky problems, they may not enjoy the reality of everyday software development - and someone who is not motivated will not do a good job, regardless of how capable they may be in principle. <\/p>\n\n<p>Getting too accustomed to such problems can also lead to a passive posture of accepting constraints and powering through them, whereas in real life one can often simplify a problem by relaxing or modifying some of the constraints. What really matters is whether the software meets its objective from the point of view of the key stakeholders and users.<\/p>\n\n<p>When people spend huge amounts of time working on DSA problems, it also incurs a significant opportunity cost: People could be using that time to come up with, and work on, their own projects. Personally, I think that\u2019s generally more valuable.<\/p>\n\n<p>The fact that these types of DSA assessments are so ubiquitous also leads to a kind of arms race where the problems need to get harder over time, since candidates are actively preparing for them. At a certain point, the results no longer measure someone's talent or potential. Rather, they are an indication of how much time a candidate has spent on such problems. 
<\/p>\n\n<h1>\n  \n  \n  Alternatives to DSA\n<\/h1>\n\n<p>My preference for interviews includes things like the following:<\/p>\n\n<ul>\n<li>General programming: Evaluating the candidate on a broader task that's relevant to the domain (possibly pair programming), including working out tactical aspects of the problem like efficiency, but also how to structure and organize the code, as well as some higher-level strategic aspects of the design. If a problem requires the interviewee to find information they are not already familiar with, that\u2019s a nice bonus.<\/li>\n<li>Take-home assignments, where there is less time pressure. <\/li>\n<li>Lastly, going over some of the candidate's own projects with them during the interview appeals to me.<\/li>\n<\/ul>\n\n<p>That said, none of these approaches is a panacea. One great advantage of DSA problems (and similar puzzle-oriented questions) is that an organization can collect a database of thousands of such problems, neatly categorized both by difficulty and topic area. While preparing for such problems helps, if a company plays its cards right, it's unlikely that a candidate will know the exact solution to a given question ahead of time, even if they've looked at a lot of problems. <\/p>\n\n<p>It's a lot harder to set up a large bank of more general programming or take-home exercises that can be deemed comparable to one another in terms of difficulty. For take-home assignments, the company is also imposing a significant burden on the candidate to work on something offline, which may not always be fair - and such assignments are also easier to game. As for personal projects, if evaluating them becomes common, it will also encourage candidates to game the system.<\/p>\n\n<p>Unfortunately, there is an inherent tension between making the process objective and systematic on the one hand, and difficult to game on the other. 
A relatively small company may be able to get away with an interview that's a bit more subjective, and not have to worry so much about candidates finding out what the questions will be ahead of time. However, for higher-profile companies, candidates are constantly trying to find out what they can do to pass the interviews, so scaling the evaluation process will always remain a challenge. <\/p>\n\n<p>I do think that a more multifaceted approach is best: one that doesn't over-emphasize DSA, with several different types of questions given equal weight. It's also important to find ways to measure how effective newly hired employees are a year or two later, and to feed that information back into the interview process - although performance evaluations pose their own challenges. <\/p>\n\n<h1>\n  \n  \n  Conclusion\n<\/h1>\n\n<p>I find the following video interesting. It shows a competition between Magnus Mitb\u00f8, a world-class climber, and the \"Norwegian Hulk\", a strongman competitor, and demonstrates that even something that may seem simple, like physical strength, is multi-dimensional, and not amenable to a simple scale going from \"weaker\" to \"stronger\". 
This is certainly true for software development, or any complex task for that matter.<\/p>\n\n<p><iframe width=\"710\" height=\"399\" src=\"https:\/\/www.youtube.com\/embed\/m60zLmpboqc\">\n<\/iframe>\n<\/p>\n\n<p>It's important to recognize that a monoculture in hiring, where everyone has the same strengths and weaknesses, will leave vulnerabilities and gaps within an organization, so it's very valuable to appreciate the variety of talents different people have and to hire with that in mind.<\/p>\n\n","category":["interview","hiring","career"]},{"title":"Thoughts on 10x Developers","pubDate":"Mon, 11 Oct 2021 00:34:18 +0000","link":"https:\/\/dev.to\/nestedsoftware\/thoughts-on-10x-developers-1iom","guid":"https:\/\/dev.to\/nestedsoftware\/thoughts-on-10x-developers-1iom","description":"<p>The concept of the 10x developer was popularized by <a href=\"https:\/\/en.wikipedia.org\/wiki\/Steve_McConnell\" rel=\"noopener noreferrer\">Steve McConnell<\/a> in his book, Rapid Development, published in the 1990s. Steve highlighted some research into software engineering that suggested top developers could offer an order-of-magnitude improvement in productivity over average performers. <\/p>\n\n<p>It is not easy to draw firm conclusions from this type of research, since it is difficult to generalize the results beyond the particular tasks used in a specific study. Nonetheless, I would say these results are roughly consistent with my own anecdotal experience. In any field, there are outliers who can make things look easy that would be very hard for the average practitioner. This comes both from great effort put in over years and, for some people, from the good fortune of being extremely talented. <\/p>\n\n<p>Software development is no different from any other field. I've encountered great programmers who are several times as productive as I am for many kinds of tasks. 
Also, such people can solve problems that may be entirely beyond my capabilities. Reasonable people can disagree about the 10x number, but brilliant people do exist. <\/p>\n\n<p>I think the trouble started when \"10x developer\" increasingly became a buzzword in the tech industry in the 2000s. It became associated with the image of a kind of arrogant male tech-bro. I believe it also came to be used as justification to dismiss the need for diversity in software engineering. I completely reject this kind of idea. Human potential comes in many forms, and we can nurture and cultivate talent from many different backgrounds. <\/p>\n\n<p>However, I also sense an undercurrent in the zeitgeist suggesting that 10x developers simply don't exist - or that any apparent increase in productivity comes only from cutting corners. That's also not true, and I believe it does a disservice to the reality of individual differences.<\/p>\n\n<p>The truth is that there are two axes to consider. On one axis, we have talent level. On the other, there are positive vs. negative personality traits. The two axes are somewhat independent. Talented (and untalented) jerks exist. But many talented people are also modest and kind. I believe the mythology of 10x developers who act like jerks came from some high-profile examples - people like John Carmack and Bill Gates in their youth do fit the profile. However, these stereotypes need not be representative. We tend to notice these cases because people who are both exceptional and jerks tend to stand out, but that doesn't mean such behaviour is something to celebrate or emulate. 
<\/p>\n\n","category":["10xdeveloper","discuss","programming"]},{"title":"Recursion with the Y Combinator","pubDate":"Wed, 04 Aug 2021 15:27:49 +0000","link":"https:\/\/dev.to\/nestedsoftware\/recursion-with-the-y-combinator-ai4","guid":"https:\/\/dev.to\/nestedsoftware\/recursion-with-the-y-combinator-ai4","description":"<p>In this article, we'll introduce a higher-order function called the Y combinator. It's immediately recognizable thanks to the famous <a href=\"https:\/\/www.ycombinator.com\/\" rel=\"noopener noreferrer\">startup incubator<\/a> of the same name, but what is this strange sounding term all about? <\/p>\n\n<p>In most languages, recursion is supported directly for named functions. For example, the following <code>factorial<\/code> function written in JavaScript calls itself recursively:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight javascript\"><code><span class=\"kd\">const<\/span> <span class=\"nx\">factorial<\/span> <span class=\"o\">=<\/span> <span class=\"nx\">n<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">n<\/span> <span class=\"o\">&gt;<\/span> <span class=\"mi\">1<\/span> <span class=\"p\">?<\/span> <span class=\"nx\">n<\/span> <span class=\"o\">*<\/span> <span class=\"nf\">factorial<\/span><span class=\"p\">(<\/span><span class=\"nx\">n<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"p\">:<\/span> <span class=\"mi\">1<\/span>\n<span class=\"nf\">factorial<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span class=\"p\">)<\/span> <span class=\"c1\">\/\/ 120<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Lambdas, i.e. anonymous functions, generally don't have built-in support for recursion, but since they should be used when the logic is simple (and extracted to a named function otherwise), it's unlikely one would want to make a recursive call in a lambda. 
<\/p>\n\n<p>Therefore, making recursive calls as above is the way to go. However, let's pretend we can't use recursion directly. As long as our language has support for functions as first-class citizens (they can be assigned to variables, passed in as arguments, and returned like any other object), we can still implement recursion ourselves. One nice way to do so is with the Y combinator. The name sounds intimidating, but it's just a higher-order function: a function that wraps around another function.<\/p>\n\n<p>Instead of making a recursive call directly as we did earlier, we will modify our <code>factorial<\/code> function so that it calls a callback function. This callback function will be responsible for calling back into the <code>factorial<\/code> function to complete a recursive call. Our <code>factorial<\/code> function will therefore now have an additional parameter, <code>recurse<\/code>:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight javascript\"><code><span class=\"kd\">const<\/span> <span class=\"nx\">factorial<\/span> <span class=\"o\">=<\/span> <span class=\"nx\">recurse<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">n<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">n<\/span> <span class=\"o\">&gt;<\/span> <span class=\"mi\">1<\/span> <span class=\"p\">?<\/span> <span class=\"nx\">n<\/span> <span class=\"o\">*<\/span> <span class=\"nf\">recurse<\/span><span class=\"p\">(<\/span><span class=\"nx\">n<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"p\">:<\/span> <span class=\"mi\">1<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>In the above function, instead of calling <code>factorial<\/code> directly, we call the <code>recurse<\/code> callback.<\/p>\n\n<p>What should this callback look like? 
We can consider a <code>callRecursively<\/code> function that looks something like the following:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight javascript\"><code><span class=\"kd\">const<\/span> <span class=\"nx\">callRecursively<\/span> <span class=\"o\">=<\/span> <span class=\"nx\">target<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">args<\/span> <span class=\"o\">=&gt;<\/span>\n                            <span class=\"nf\">target<\/span><span class=\"p\">(<\/span><span class=\"nx\">args2<\/span> <span class=\"o\">=&gt;<\/span>\n                                <span class=\"nf\">target<\/span><span class=\"p\">(<\/span><span class=\"nx\">args3<\/span> <span class=\"o\">=&gt;<\/span> \n                                    <span class=\"nf\">target<\/span><span class=\"p\">(...)(<\/span><span class=\"nx\">args3<\/span><span class=\"p\">))(<\/span><span class=\"nx\">args2<\/span><span class=\"p\">))(<\/span><span class=\"nx\">args<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>When we call our target (the <code>factorial<\/code> function in our case), we need to pass it a callback that accepts the next argument the target will be called with. However, we run into a problem of infinite regress: for each call, we have to keep supplying a new callback. <\/p>\n\n<p>It turns out there is a clever trick that helps us get around this limitation. We can create a function and then call that function with itself as its own argument! In JavaScript, we use an <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Glossary\/IIFE\" rel=\"noopener noreferrer\">IIFE<\/a> to do so. 
Below is an example of the mechanism we'll use:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight javascript\"><code><span class=\"p\">(<\/span><span class=\"nx\">f<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nf\">f<\/span><span class=\"p\">(<\/span><span class=\"nx\">f<\/span><span class=\"p\">))(<\/span><span class=\"nb\">self<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">console<\/span><span class=\"p\">.<\/span><span class=\"nf\">log<\/span><span class=\"p\">(<\/span><span class=\"nb\">self<\/span><span class=\"p\">));<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>We supply the lambda <code>self =&gt; console.log(self)<\/code> as an argument to the self-executing lambda <code>(f =&gt; f(f))<\/code>. When we run this code (e.g. in the browser console), we see that the variable <code>self<\/code> refers to the very function it is being passed into as a parameter:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight javascript\"><code><span class=\"o\">&gt;<\/span> <span class=\"p\">(<\/span><span class=\"nx\">f<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nf\">f<\/span><span class=\"p\">(<\/span><span class=\"nx\">f<\/span><span class=\"p\">))(<\/span><span class=\"nb\">self<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">console<\/span><span class=\"p\">.<\/span><span class=\"nf\">log<\/span><span class=\"p\">(<\/span><span class=\"nb\">self<\/span><span class=\"p\">));<\/span>\n<span class=\"nb\">self<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">console<\/span><span class=\"p\">.<\/span><span class=\"nf\">log<\/span><span class=\"p\">(<\/span><span class=\"nb\">self<\/span><span class=\"p\">)<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>We will use this idea to solve our problem of infinite regress. We define a function we'll call Y (for Y combinator) that takes a target function (e.g. 
<code>factorial<\/code>) and the parameters for that target function as arguments. Our Y combinator function will then call the target function, supplying a callback for the target function to invoke when it wants to make a recursive call. The complete code is below:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight javascript\"><code><span class=\"kd\">const<\/span> <span class=\"nx\">Y<\/span> <span class=\"o\">=<\/span> <span class=\"nx\">target<\/span> <span class=\"o\">=&gt;<\/span> \n              <span class=\"nx\">args<\/span> <span class=\"o\">=&gt;<\/span> \n                  <span class=\"p\">(<\/span><span class=\"nx\">f<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nf\">f<\/span><span class=\"p\">(<\/span><span class=\"nx\">f<\/span><span class=\"p\">))(<\/span><span class=\"nb\">self<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nf\">target<\/span><span class=\"p\">(<\/span><span class=\"nx\">a<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nf\">self<\/span><span class=\"p\">(<\/span><span class=\"nb\">self<\/span><span class=\"p\">)(<\/span><span class=\"nx\">a<\/span><span class=\"p\">)))(<\/span><span class=\"nx\">args<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kd\">const<\/span> <span class=\"nx\">factorial<\/span> <span class=\"o\">=<\/span> <span class=\"nx\">recurse<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">n<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">n<\/span> <span class=\"o\">&gt;<\/span> <span class=\"mi\">1<\/span> <span class=\"p\">?<\/span> <span class=\"nx\">n<\/span> <span class=\"o\">*<\/span> <span class=\"nf\">recurse<\/span><span class=\"p\">(<\/span><span class=\"nx\">n<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"p\">:<\/span> <span class=\"mi\">1<\/span><span class=\"p\">;<\/span>\n\n<span class=\"nc\">Y<\/span><span class=\"p\">(<\/span><span 
class=\"nx\">factorial<\/span><span class=\"p\">)(<\/span><span class=\"mi\">5<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/120<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>In the above code, when the target, e.g. <code>factorial<\/code>, and its argument are passed into the Y combinator function, the Y combinator will execute <code>self =&gt; target(a =&gt; self (self)(a))<\/code>. When the target is executed, the callback <code>a =&gt; self(self)(a)<\/code> is passed to the <code>target<\/code> so that it can initiate the next recursive call. Keep in mind that <code>self<\/code> is a reference to the function <code>self =&gt; target(a =&gt; self(self)(a))<\/code>. <\/p>\n\n<p>When our <code>factorial<\/code> function receives the argument <code>5<\/code> (note that our target is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Currying\" rel=\"noopener noreferrer\">curried<\/a> in this example), it will execute the callback, passing in <code>4<\/code> for the parameter <code>a<\/code>. This will trigger a recursive call back into the target, and so on, until the terminating condition for the target function is reached. When our callback code executes, we need to pass a reference to to the handler as the first argument, hence the <code>self(self)<\/code> fragment in the above code. <\/p>\n\n<p>The Y combinator function is not something we expect to see being used in modern programming languages, since they have built-in support for recursion (at least for named functions). However, higher-order functions are an important part of the functional programming paradigm, so working out the details of how such a function behaves can still be a useful exercise. The general idea of composing functions along these lines is commonly applied in functional programming across a wide range of use-cases. 
<\/p>\n\n<p>We also gain insight into <a href=\"https:\/\/en.wikipedia.org\/wiki\/Lambda_calculus\" rel=\"noopener noreferrer\">lambda calculus<\/a>, a powerful mathematical framework for understanding computation. For example, We can completely inline the code we've written to show there are no free variables. While the code is not exactly readable when inlined this way, this gets us very close to the pure lambda calculus form for this logic:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight javascript\"><code><span class=\"p\">(<\/span><span class=\"nx\">target<\/span> <span class=\"o\">=&gt;<\/span>  <span class=\"nx\">args<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"p\">(<\/span><span class=\"nx\">f<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nf\">f<\/span><span class=\"p\">(<\/span><span class=\"nx\">f<\/span><span class=\"p\">))(<\/span><span class=\"nb\">self<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nf\">target<\/span><span class=\"p\">(<\/span><span class=\"nx\">a<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nf\">self<\/span><span class=\"p\">(<\/span><span class=\"nb\">self<\/span><span class=\"p\">)(<\/span><span class=\"nx\">a<\/span><span class=\"p\">)))(<\/span><span class=\"nx\">args<\/span><span class=\"p\">))(<\/span><span class=\"nx\">recurse<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">n<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">n<\/span> <span class=\"o\">&gt;<\/span> <span class=\"mi\">1<\/span> <span class=\"p\">?<\/span> <span class=\"nx\">n<\/span> <span class=\"o\">*<\/span> <span class=\"nf\">recurse<\/span><span class=\"p\">(<\/span><span class=\"nx\">n<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"p\">:<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)(<\/span><span class=\"mi\">5<\/span><span class=\"p\">);<\/span> <span 
class=\"c1\">\/\/120<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<h2>\n  \n  \n  References\n<\/h2>\n\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Fixed-point_combinator#Y_combinator\" rel=\"noopener noreferrer\">Y combinator<\/a><\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Currying\" rel=\"noopener noreferrer\">Currying<\/a><\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Lambda_calculus\" rel=\"noopener noreferrer\">Lambda calculus<\/a><\/li>\n<li><a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Glossary\/IIFE\" rel=\"noopener noreferrer\">IIFE<\/a><\/li>\n<\/ul>\n\n","category":["ycombinator","functional","beginners","javascript"]},{"title":"CSRF and Cross-Origin Requests by Example","pubDate":"Sun, 29 Nov 2020 06:38:41 +0000","link":"https:\/\/dev.to\/nestedsoftware\/csrf-and-cross-origin-requests-by-example-25nb","guid":"https:\/\/dev.to\/nestedsoftware\/csrf-and-cross-origin-requests-by-example-25nb","description":"<p>In this article, we will go over how a basic CSRF (cross-site request forgery) attack works and how a <a href=\"https:\/\/owasp.org\/www-community\/attacks\/csrf\" rel=\"noopener noreferrer\">CSRF token<\/a> prevents this type of attack. <\/p>\n\n<p>We will also show how the browser's <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/Security\/Same-origin_policy\" rel=\"noopener noreferrer\">same-origin policy<\/a> can prevent undesired cross-origin access to resources such as the CSRF token. 
<\/p>\n\n<p>The code for these examples is available on GitHub: <\/p>\n\n\n<div class=\"ltag-github-readme-tag\">\n  <div class=\"readme-overview\">\n    <h2>\n      <img src=\"https:\/\/assets.dev.to\/assets\/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg\" alt=\"GitHub logo\">\n      <a href=\"https:\/\/github.com\/nestedsoftware\" rel=\"noopener noreferrer\">\n        nestedsoftware\n      <\/a> \/ <a href=\"https:\/\/github.com\/nestedsoftware\/csrf\" rel=\"noopener noreferrer\">\n        csrf\n      <\/a>\n    <\/h2>\n    <h3>\n      csrf\/cors examples\n    <\/h3>\n  <\/div>\n  <div class=\"ltag-github-body\">\n    \n<div id=\"readme\" class=\"md\">\n<div class=\"markdown-heading\">\n<h1 class=\"heading-element\">How Cross-Origin Requests and CSRF Tokens Work<\/h1>\n<\/div>\n<p>The examples below show how the browser's <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/Security\/Same-origin_policy\" title=\"same-origin policy\" rel=\"nofollow noopener noreferrer\">same-origin policy<\/a> can prevent undesired cross-origin access to resources. 
It's important to understand that the browser enforces this policy on browser \"reads\", that is, on the responses sent back from the server to the browser (although the new <a href=\"https:\/\/blog.chromium.org\/2020\/02\/samesite-cookie-changes-in-february.html\" title=\"samesite cookie\" rel=\"nofollow noopener noreferrer\">samesite cookie<\/a> behaviour recently implemented in Chrome, described further down, appears to be a welcome exception that greatly improves security).<\/p>\n<p>These examples also show how an unguessable <a href=\"https:\/\/owasp.org\/www-community\/attacks\/csrf\" title=\"csrf\" rel=\"nofollow noopener noreferrer\">csrf token<\/a> bound to the user's session can prevent cross-origin form submissions from succeeding (note: be sure to refresh the csrf token <a href=\"https:\/\/security.stackexchange.com\/a\/22936\" title=\"issue new csrf token on principal-change inside a session\" rel=\"nofollow noopener noreferrer\">at login<\/a>). In such cases, the form is actually submitted, along with the relevant authorization cookies, but there should be no way for a third-party to access the secret csrf token or to programmatically tamper with the user's form fields (also see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Clickjacking#:~:text=Clickjacking%20(classified%20as%20a%20User,control%20of%20their%20computer%20while\" title=\"clickjacking\" rel=\"nofollow noopener noreferrer\">clickjacking<\/a>).<\/p>\n<p>In addition the what\u2026<\/p>\n<\/div>\n  <\/div>\n  <div class=\"gh-btn-container\"><a class=\"gh-btn\" href=\"https:\/\/github.com\/nestedsoftware\/csrf\" rel=\"noopener noreferrer\">View on GitHub<\/a><\/div>\n<\/div>\n\n\n<h2>\n  \n  \n  Set Up\n<\/h2>\n\n<p>These examples use a simple <a href=\"https:\/\/expressjs.com\/\" rel=\"noopener noreferrer\">Express<\/a> application running in a <a href=\"https:\/\/www.docker.com\/\" rel=\"noopener noreferrer\">docker<\/a> container. To get started, we need to run two web servers. 
We will consider the \"same-origin\" server to run on port <em>3000<\/em>. The \"cross-origin\" server will run on port <em>8000<\/em>. The idea here is that the cross-origin server serves code to the browser and this code then tries to access resources on the same-origin server - thus making a \"cross-origin\" request.<\/p>\n\n<blockquote>\n<p>A <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/Security\/Same-origin_policy\" rel=\"noopener noreferrer\">\"scheme\/host\/port tuple\"<\/a> is used to determine whether the destination for a request matches its origin. <\/p>\n<\/blockquote>\n\n<p>To get started, let's run our two servers:<\/p>\n\n<ul>\n<li>Run the same-origin container: <code>$ .\/run.sh<\/code>\n<\/li>\n<li>View logs for same-origin server: <code>$ docker logs --follow console-logging-server<\/code>\n<\/li>\n<li>Run the cross-origin container: <code>$ .\/run.sh console-logging-server-xorigin 8000<\/code>\n<\/li>\n<li>View logs for cross-origin server: <code>$ docker logs --follow console-logging-server-xorigin<\/code>\n<\/li>\n<\/ul>\n\n<h2>\n  \n  \n  A Basic CSRF Attack\n<\/h2>\n\n<p>The idea here is that we induce a user to open a malicious web site. This web site will either get the user to submit a form to a site they have already logged in to, or may even trigger the submission automatically. Traditionally, the browser would send along any cookies, including ones used for authentication, as part of that submission. As long as the user was already logged into the site, this would allow the malicious web site to trigger actions on behalf of the user without their awareness. CSRF tokens have been the standard method to prevent so-called CSRF attacks.<\/p>\n\n<p>As of this writing (November, 2020), a basic CSRF attack, even without CSRF token protection, <a href=\"https:\/\/blog.chromium.org\/2020\/02\/samesite-cookie-changes-in-february.html\" rel=\"noopener noreferrer\">will no longer work by default in the Chrome browser<\/a>. 
The screenshot below shows what happens when we try:<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Faqe30obf1dtxfoccaobd.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Faqe30obf1dtxfoccaobd.png\" title=\"CSRF Attack Fails in Chrome\" alt=\"CSRF Attack Fails in Chrome\" width=\"705\" height=\"542\"><\/a><\/p>\n\n<p>For quite some time, the default behaviour has been to submit cookies automatically when a request against a given server is made, even if that request comes from code loaded from a different origin. However, the Chrome browser will no longer submit cookies via a cross-origin request by default. To support cross-origin cookie submission, the cookies must be marked with <code>SameSite=None<\/code> and <code>Secure<\/code> attributes. <\/p>\n\n<p>The basic demonstration of a CSRF attack below does currently work in Firefox (version 82.0.3 used for this example), although Firefox is also apparently looking into implementing such a restriction in the future. 
<\/p>\n\n<p>We will load a form from our cross-origin server on port <em>8000<\/em> and use JavaScript to submit that form to our server on port <em>3000<\/em>:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight html\"><code><span class=\"cp\">&lt;!DOCTYPE html&gt;<\/span>\n<span class=\"nt\">&lt;html&gt;<\/span>\n  <span class=\"nt\">&lt;head&gt;<\/span>\n    <span class=\"nt\">&lt;title&gt;<\/span>Submit form with JS (no csrf protection)<span class=\"nt\">&lt;\/title&gt;<\/span>\n    <span class=\"nt\">&lt;script&gt;<\/span>\n      <span class=\"nb\">document<\/span><span class=\"p\">.<\/span><span class=\"nf\">addEventListener<\/span><span class=\"p\">(<\/span><span class=\"dl\">\"<\/span><span class=\"s2\">DOMContentLoaded<\/span><span class=\"dl\">\"<\/span><span class=\"p\">,<\/span> <span class=\"kd\">function<\/span><span class=\"p\">(<\/span><span class=\"nx\">event<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n        <span class=\"nb\">document<\/span><span class=\"p\">.<\/span><span class=\"nf\">getElementById<\/span><span class=\"p\">(<\/span><span class=\"dl\">'<\/span><span class=\"s1\">hackedForm<\/span><span class=\"dl\">'<\/span><span class=\"p\">).<\/span><span class=\"nf\">submit<\/span><span class=\"p\">();<\/span>\n      <span class=\"p\">});<\/span>\n    <span class=\"nt\">&lt;\/script&gt;<\/span>\n  <span class=\"nt\">&lt;\/head&gt;<\/span>\n  <span class=\"nt\">&lt;body&gt;<\/span>\n    <span class=\"nt\">&lt;form<\/span> <span class=\"na\">id=<\/span><span class=\"s\">\"hackedForm\"<\/span> <span class=\"na\">action=<\/span><span class=\"s\">\"http:\/\/localhost:3000\/save_no_csrf_protection\"<\/span> <span class=\"na\">method=<\/span><span class=\"s\">\"post\"<\/span><span class=\"nt\">&gt;<\/span>\n    <span class=\"nt\">&lt;label<\/span> <span class=\"na\">for=<\/span><span class=\"s\">\"name\"<\/span><span class=\"nt\">&gt;<\/span>\n    <span class=\"nt\">&lt;input<\/span> <span 
class=\"na\">type=<\/span><span class=\"s\">\"text\"<\/span> <span class=\"na\">id=<\/span><span class=\"s\">\"name\"<\/span> <span class=\"na\">name=<\/span><span class=\"s\">\"name\"<\/span> <span class=\"na\">value=<\/span><span class=\"s\">\"Hacked\"<\/span><span class=\"nt\">&gt;<\/span>\n    <span class=\"nt\">&lt;input<\/span> <span class=\"na\">type=<\/span><span class=\"s\">\"submit\"<\/span> <span class=\"na\">value=<\/span><span class=\"s\">\"Save\"<\/span><span class=\"nt\">&gt;<\/span>\n  <span class=\"nt\">&lt;\/body&gt;<\/span>\n<span class=\"nt\">&lt;\/html&gt;<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<ul>\n<li>To show that a normal form submission works (and to create the session cookie the malicious site will attempt to hijack): submit the form at <code>http:\/\/localhost:3000\/form<\/code>\n<\/li>\n<li>Next, to show that an unprotected cross-origin submission works, go to <code>http:\/\/127.0.0.1:8000\/submit_form_xorigin_no_csrf_protection.html<\/code> (note: cookies don't distinguish different ports on the same domain, so this trick prevents clobbering the original cookie produced by the legitimate interaction with localhost)<\/li>\n<li>Now, to show that a CSRF token will prevent the above attack, go to <code>http:\/\/127.0.0.1:8000\/submit_form_xorigin_with_csrf_protection.html<\/code>\n<\/li>\n<\/ul>\n\n<p>Below is a screenshot showing the results from the 3 scenarios above (note that the 2 cross-origin requests that are forced when the user accesses the malicious web site on port 8000 cause the user's session cookie to be automatically submitted):<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fw7fa0skz6a9y8kwlbwko.png\" class=\"article-body-image-wrapper\"><img 
src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fw7fa0skz6a9y8kwlbwko.png\" title=\"CSRF Attack Scenarios in Firefox\" alt=\"CSRF Attack Scenarios in Firefox\" width=\"800\" height=\"1032\"><\/a><\/p>\n\n<p>We can see that in the 3rd case, even though the session cookie gets submitted by the attacker, they don't have access to the CSRF token, so the form submission is rejected.<\/p>\n\n<h2>\n  \n  \n  Cross-Origin Access Protections\n<\/h2>\n\n<p>Next, let's take a look at some of the protections in place to prevent cross-origin access. After all, if we are to rely on a CSRF token to prevent CSRF attacks, we need to make sure the attacker can't just get the token and proceed with the attack anyway.<\/p>\n\n<p>To demonstrate that same-origin access works, enter the following into the browser's address field (check the browser console to make sure there are no errors):<\/p>\n\n<ul>\n<li><code>http:\/\/localhost:3000\/load_and_submit_form_with_fetch.html<\/code><\/li>\n<li><code>http:\/\/localhost:3000\/load_form_into_iframe.html<\/code><\/li>\n<li><code>http:\/\/localhost:3000\/load_form_into_iframe_no_embedding.html<\/code><\/li>\n<li>\n<code>http:\/\/localhost:3000\/jquery_run_and_try_to_load_source.html<\/code>\n<\/li>\n<\/ul>\n\n<h3>\n  \n  \n  Cross-Origin Form Load\/Submission\n<\/h3>\n\n<p>The following URL shows that loading and automatically submitting a form cross-origin doesn't work: <code>http:\/\/localhost:8000\/load_and_submit_form_with_fetch.html<\/code><\/p>\n\n<p>The code uses javascript to load the form from port <em>3000<\/em> into the dom, then updates a form field and submits the form:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight html\"><code><span class=\"cp\">&lt;!DOCTYPE html&gt;<\/span>\n<span class=\"nt\">&lt;html&gt;<\/span>\n  <span class=\"nt\">&lt;head&gt;<\/span>\n    <span 
class=\"nt\">&lt;title&gt;<\/span>Fetch and submit form with JS (try to get csrf token)<span class=\"nt\">&lt;\/title&gt;<\/span>\n    <span class=\"nt\">&lt;script&gt;<\/span>\n      <span class=\"nf\">fetch<\/span><span class=\"p\">(<\/span><span class=\"dl\">\"<\/span><span class=\"s2\">http:\/\/localhost:3000\/form<\/span><span class=\"dl\">\"<\/span><span class=\"p\">)<\/span>\n      <span class=\"p\">.<\/span><span class=\"nf\">then<\/span><span class=\"p\">(<\/span><span class=\"nx\">r<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">r<\/span><span class=\"p\">.<\/span><span class=\"nf\">text<\/span><span class=\"p\">())<\/span>\n      <span class=\"p\">.<\/span><span class=\"nf\">then<\/span><span class=\"p\">(<\/span><span class=\"nx\">d<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"p\">{<\/span>\n        <span class=\"kd\">const<\/span> <span class=\"nx\">action<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">DOMParser<\/span><span class=\"p\">()<\/span>\n          <span class=\"p\">.<\/span><span class=\"nf\">parseFromString<\/span><span class=\"p\">(<\/span><span class=\"nx\">d<\/span><span class=\"p\">,<\/span> <span class=\"dl\">'<\/span><span class=\"s1\">text\/html<\/span><span class=\"dl\">'<\/span><span class=\"p\">)<\/span>\n          <span class=\"p\">.<\/span><span class=\"nx\">forms<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span>\n          <span class=\"p\">.<\/span><span class=\"nf\">getAttribute<\/span><span class=\"p\">(<\/span><span class=\"dl\">'<\/span><span class=\"s1\">action<\/span><span class=\"dl\">'<\/span><span class=\"p\">);<\/span>\n        <span class=\"kd\">const<\/span> <span class=\"nx\">csrfToken<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">DOMParser<\/span><span class=\"p\">()<\/span>\n          <span class=\"p\">.<\/span><span class=\"nf\">parseFromString<\/span><span 
class=\"p\">(<\/span><span class=\"nx\">d<\/span><span class=\"p\">,<\/span> <span class=\"dl\">'<\/span><span class=\"s1\">text\/html<\/span><span class=\"dl\">'<\/span><span class=\"p\">)<\/span>\n          <span class=\"p\">.<\/span><span class=\"nx\">forms<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span>\n          <span class=\"p\">.<\/span><span class=\"nx\">elements<\/span><span class=\"p\">[<\/span><span class=\"dl\">'<\/span><span class=\"s1\">csrfToken<\/span><span class=\"dl\">'<\/span><span class=\"p\">]<\/span>\n          <span class=\"p\">.<\/span><span class=\"nx\">value<\/span><span class=\"p\">;<\/span>\n\n        <span class=\"kd\">const<\/span> <span class=\"nx\">data<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nc\">URLSearchParams<\/span><span class=\"p\">();<\/span>\n        <span class=\"nx\">data<\/span><span class=\"p\">.<\/span><span class=\"nf\">append<\/span><span class=\"p\">(<\/span><span class=\"dl\">\"<\/span><span class=\"s2\">name<\/span><span class=\"dl\">\"<\/span><span class=\"p\">,<\/span> <span class=\"dl\">\"<\/span><span class=\"s2\">injected name<\/span><span class=\"dl\">\"<\/span><span class=\"p\">);<\/span>\n        <span class=\"nx\">data<\/span><span class=\"p\">.<\/span><span class=\"nf\">append<\/span><span class=\"p\">(<\/span><span class=\"dl\">\"<\/span><span class=\"s2\">csrfToken<\/span><span class=\"dl\">\"<\/span><span class=\"p\">,<\/span> <span class=\"nx\">csrfToken<\/span><span class=\"p\">);<\/span>\n\n        <span class=\"nf\">fetch<\/span><span class=\"p\">(<\/span><span class=\"dl\">'<\/span><span class=\"s1\">http:\/\/localhost:3000<\/span><span class=\"dl\">'<\/span> <span class=\"o\">+<\/span> <span class=\"nx\">action<\/span><span class=\"p\">,<\/span> <span class=\"p\">{<\/span>\n          <span class=\"na\">method<\/span><span class=\"p\">:<\/span> <span class=\"dl\">'<\/span><span class=\"s1\">POST<\/span><span 
class=\"dl\">'<\/span><span class=\"p\">,<\/span>\n          <span class=\"na\">body<\/span><span class=\"p\">:<\/span> <span class=\"nx\">data<\/span>\n        <span class=\"p\">})<\/span>\n        <span class=\"p\">.<\/span><span class=\"nf\">then<\/span><span class=\"p\">(<\/span><span class=\"nx\">r<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">console<\/span><span class=\"p\">.<\/span><span class=\"nf\">log<\/span><span class=\"p\">(<\/span><span class=\"dl\">\"<\/span><span class=\"s2\">status: <\/span><span class=\"dl\">\"<\/span><span class=\"p\">,<\/span> <span class=\"nx\">r<\/span><span class=\"p\">.<\/span><span class=\"nx\">status<\/span><span class=\"p\">));<\/span>\n      <span class=\"p\">})<\/span>\n      <span class=\"p\">.<\/span><span class=\"k\">catch<\/span><span class=\"p\">(<\/span><span class=\"nx\">e<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">console<\/span><span class=\"p\">.<\/span><span class=\"nf\">log<\/span><span class=\"p\">(<\/span><span class=\"nx\">e<\/span><span class=\"p\">));<\/span>\n    <span class=\"nt\">&lt;\/script&gt;<\/span>\n  <span class=\"nt\">&lt;\/head&gt;<\/span>\n  <span class=\"nt\">&lt;body&gt;<\/span>\n  <span class=\"nt\">&lt;\/body&gt;<\/span>\n<span class=\"nt\">&lt;\/html&gt;<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Here is what happens:<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fe79rf0grr4d1dk3yzgk9.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fe79rf0grr4d1dk3yzgk9.png\" alt=\"Browser blocks cross-origin request\" width=\"800\" height=\"306\"><\/a><\/p>\n\n<p>As we can see, the browser prevents the javascript from loading the form because it is a cross-origin request (we 
log an exception in the <code>fetch<\/code> call to the browser's console: <code>load_and_submit_form_with_fetch.html:30 TypeError: Failed to fetch<\/code>).<\/p>\n\n<p>It's important to understand that the browser does issue the <code>fetch<\/code> request to load the form and the server does send the form back to the browser, including any CSRF token (note: the <code>404<\/code> response is just because the \"favicon.ico\" file is missing).<\/p>\n\n<p>The Wireshark trace for the <code>fetch<\/code> request is shown below:<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fixq0e6efiz0mzx6jnepn.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fixq0e6efiz0mzx6jnepn.png\" alt=\"wireshark trace of fetch request being sent\" width=\"800\" height=\"666\"><\/a><\/p>\n\n<p>The Wireshark trace for the response from the server is shown below:<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgdcp6uv3hpxgr3o83rtf.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgdcp6uv3hpxgr3o83rtf.png\" alt=\"wireshark trace of response to fetch request\" width=\"800\" height=\"665\"><\/a><\/p>\n\n<p>However, the same-origin policy prevents this information from reaching the code that tries to access it.<\/p>\n\n<h3>\n  \n  \n  Cross-Origin IFrame\n<\/h3>\n\n<p>Let's see if cross-origin loading of a form into an iframe works: <code>http:\/\/localhost:8000\/load_form_into_iframe.html<\/code>.<\/p>\n\n<p>The HTML 
file loaded from the cross-origin server (<em>port 8000<\/em>) attempts to load the contents of the form at port <em>3000<\/em> into an iframe and to populate the contents of the form:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight html\"><code><span class=\"cp\">&lt;!DOCTYPE html&gt;<\/span>\n<span class=\"nt\">&lt;html&gt;<\/span>\n  <span class=\"nt\">&lt;head&gt;<\/span>\n    <span class=\"nt\">&lt;title&gt;<\/span>IFrame Form Loader<span class=\"nt\">&lt;\/title&gt;<\/span>\n    <span class=\"nt\">&lt;script&gt;<\/span>\n      <span class=\"nb\">document<\/span><span class=\"p\">.<\/span><span class=\"nf\">addEventListener<\/span><span class=\"p\">(<\/span><span class=\"dl\">\"<\/span><span class=\"s2\">DOMContentLoaded<\/span><span class=\"dl\">\"<\/span><span class=\"p\">,<\/span> <span class=\"kd\">function<\/span><span class=\"p\">(<\/span><span class=\"nx\">event<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> \n        <span class=\"kd\">const<\/span> <span class=\"nx\">iframe<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">document<\/span><span class=\"p\">.<\/span><span class=\"nf\">getElementById<\/span><span class=\"p\">(<\/span><span class=\"dl\">\"<\/span><span class=\"s2\">iframe<\/span><span class=\"dl\">\"<\/span><span class=\"p\">);<\/span>\n        <span class=\"nx\">iframe<\/span><span class=\"p\">.<\/span><span class=\"nf\">addEventListener<\/span><span class=\"p\">(<\/span><span class=\"dl\">\"<\/span><span class=\"s2\">load<\/span><span class=\"dl\">\"<\/span><span class=\"p\">,<\/span> <span class=\"kd\">function<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span>\n          <span class=\"k\">try<\/span> <span class=\"p\">{<\/span>\n            <span class=\"kd\">const<\/span> <span class=\"nx\">formField<\/span> <span class=\"o\">=<\/span> <span class=\"nx\">iframe<\/span><span class=\"p\">.<\/span><span class=\"nx\">contentWindow<\/span><span class=\"p\">.<\/span><span 
class=\"nb\">document<\/span><span class=\"p\">.<\/span><span class=\"nf\">getElementById<\/span><span class=\"p\">(<\/span><span class=\"dl\">\"<\/span><span class=\"s2\">name<\/span><span class=\"dl\">\"<\/span><span class=\"p\">);<\/span>  \n            <span class=\"k\">if <\/span><span class=\"p\">(<\/span><span class=\"nx\">formField<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n              <span class=\"nx\">formField<\/span><span class=\"p\">.<\/span><span class=\"nx\">value<\/span> <span class=\"o\">=<\/span> <span class=\"dl\">\"<\/span><span class=\"s2\">filled by JS code<\/span><span class=\"dl\">\"<\/span><span class=\"p\">;<\/span>\n            <span class=\"p\">}<\/span>\n          <span class=\"p\">}<\/span> <span class=\"k\">catch <\/span><span class=\"p\">(<\/span><span class=\"nx\">e<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n            <span class=\"nx\">console<\/span><span class=\"p\">.<\/span><span class=\"nf\">error<\/span><span class=\"p\">(<\/span><span class=\"nx\">e<\/span><span class=\"p\">);<\/span>\n          <span class=\"p\">}<\/span>\n          <span class=\"k\">try<\/span> <span class=\"p\">{<\/span>\n            <span class=\"kd\">const<\/span> <span class=\"nx\">csrfToken<\/span> <span class=\"o\">=<\/span> <span class=\"nx\">iframe<\/span><span class=\"p\">.<\/span><span class=\"nx\">contentWindow<\/span><span class=\"p\">.<\/span><span class=\"nb\">document<\/span><span class=\"p\">.<\/span><span class=\"nf\">getElementById<\/span><span class=\"p\">(<\/span><span class=\"dl\">\"<\/span><span class=\"s2\">csrfToken<\/span><span class=\"dl\">\"<\/span><span class=\"p\">);<\/span>\n            <span class=\"k\">if <\/span><span class=\"p\">(<\/span><span class=\"nx\">csrfToken<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n              <span class=\"nx\">console<\/span><span class=\"p\">.<\/span><span class=\"nf\">log<\/span><span class=\"p\">(<\/span><span 
class=\"dl\">\"<\/span><span class=\"s2\">csrfToken<\/span><span class=\"dl\">\"<\/span><span class=\"p\">,<\/span> <span class=\"nx\">csrfToken<\/span><span class=\"p\">.<\/span><span class=\"nx\">value<\/span><span class=\"p\">);<\/span>\n            <span class=\"p\">}<\/span>\n          <span class=\"p\">}<\/span> <span class=\"k\">catch <\/span><span class=\"p\">(<\/span><span class=\"nx\">e<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n            <span class=\"nx\">console<\/span><span class=\"p\">.<\/span><span class=\"nf\">error<\/span><span class=\"p\">(<\/span><span class=\"nx\">e<\/span><span class=\"p\">)<\/span>\n          <span class=\"p\">}<\/span>\n        <span class=\"p\">});<\/span>\n      <span class=\"p\">});<\/span>\n    <span class=\"nt\">&lt;\/script&gt;<\/span>\n  <span class=\"nt\">&lt;\/head&gt;<\/span>\n  <span class=\"nt\">&lt;body&gt;<\/span>\n    <span class=\"nt\">&lt;iframe<\/span> <span class=\"na\">id=<\/span><span class=\"s\">\"iframe\"<\/span> <span class=\"na\">src=<\/span><span class=\"s\">\"http:\/\/localhost:3000\/form\"<\/span> <span class=\"na\">title=<\/span><span class=\"s\">\"iframe tries to load form - hardcoded to port 3000\"<\/span><span class=\"nt\">&gt;<\/span>\n  <span class=\"nt\">&lt;\/body&gt;<\/span>\n<span class=\"nt\">&lt;\/html&gt;<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>The following wireshark trace shows that the request for the form is sent successfully:<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvtdo05yggaqzos42lins.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvtdo05yggaqzos42lins.png\" alt=\"load form into iframe cross-origin request is sent\" width=\"800\" 
height=\"665\"><\/a><\/p>\n\n<p>The browser also receives the form successfully from the server:<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjq50pdz316pspo37p807.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjq50pdz316pspo37p807.png\" alt=\"Load form into iframe cross-origin browser received response\" width=\"800\" height=\"664\"><\/a><\/p>\n\n<p>It's interesting to note that the cross-origin script is able to successfully load the form into an iframe. However, the same-origin policy prevents the script from reading the CSRF token or populating the form with data:<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fd9lm72oqdbepgdveb7md.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fd9lm72oqdbepgdveb7md.png\" alt=\"Load form into iframe reading\/writing not allowed\" width=\"800\" height=\"325\"><\/a><\/p>\n\n<p>If the user fills out this form and submits it manually, though, the submission will still work, even when the form was loaded cross-origin. <\/p>\n\n<p>This feels dangerous to me. 
We can add some headers to prevent the browser from allowing the form to be embedded by a cross-origin request in the first place:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight javascript\"><code><span class=\"nx\">app<\/span><span class=\"p\">.<\/span><span class=\"nf\">get<\/span><span class=\"p\">(<\/span><span class=\"dl\">'<\/span><span class=\"s1\">\/form_no_embedding<\/span><span class=\"dl\">'<\/span><span class=\"p\">,<\/span> <span class=\"p\">(<\/span><span class=\"nx\">req<\/span><span class=\"p\">,<\/span> <span class=\"nx\">res<\/span><span class=\"p\">)<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"p\">{<\/span>\n  <span class=\"nx\">console<\/span><span class=\"p\">.<\/span><span class=\"nf\">log<\/span><span class=\"p\">({<\/span> <span class=\"na\">url<\/span><span class=\"p\">:<\/span> <span class=\"nx\">req<\/span><span class=\"p\">.<\/span><span class=\"nx\">url<\/span><span class=\"p\">,<\/span> <span class=\"na\">method<\/span><span class=\"p\">:<\/span> <span class=\"nx\">req<\/span><span class=\"p\">.<\/span><span class=\"nx\">method<\/span><span class=\"p\">,<\/span> <span class=\"na\">headers<\/span><span class=\"p\">:<\/span> <span class=\"nx\">req<\/span><span class=\"p\">.<\/span><span class=\"nx\">headers<\/span> <span class=\"p\">});<\/span>\n  <span class=\"nx\">res<\/span><span class=\"p\">.<\/span><span class=\"nf\">header<\/span><span class=\"p\">(<\/span><span class=\"dl\">'<\/span><span class=\"s1\">X-Frame-Options<\/span><span class=\"dl\">'<\/span><span class=\"p\">,<\/span> <span class=\"dl\">'<\/span><span class=\"s1\">SAMEORIGIN<\/span><span class=\"dl\">'<\/span><span class=\"p\">);<\/span>\n  <span class=\"nx\">res<\/span><span class=\"p\">.<\/span><span class=\"nf\">header<\/span><span class=\"p\">(<\/span><span class=\"dl\">'<\/span><span class=\"s1\">Content-Security-Policy<\/span><span class=\"dl\">'<\/span><span class=\"p\">,<\/span> <span class=\"dl\">\"<\/span><span 
class=\"s2\">frame-ancestors 'self'<\/span><span class=\"dl\">\"<\/span><span class=\"p\">);<\/span>\n  <span class=\"nx\">res<\/span><span class=\"p\">.<\/span><span class=\"nf\">render<\/span><span class=\"p\">(<\/span><span class=\"dl\">'<\/span><span class=\"s1\">simple_form<\/span><span class=\"dl\">'<\/span><span class=\"p\">,<\/span> <span class=\"p\">{<\/span><span class=\"na\">csrfToken<\/span><span class=\"p\">:<\/span> <span class=\"nx\">req<\/span><span class=\"p\">.<\/span><span class=\"nx\">session<\/span><span class=\"p\">.<\/span><span class=\"nx\">csrfToken<\/span><span class=\"p\">});<\/span>\n<span class=\"p\">});<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>If we try the same technique on a form that has been protected by such headers, we see that the browser will not load the form into the iframe anymore. <code>http:\/\/localhost:8000\/load_form_into_iframe_no_embedding.html<\/code>: <\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Faxxey86kr3ei7cwv92rr.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Faxxey86kr3ei7cwv92rr.png\" alt=\"headers prevent cross-origin loading into iframe\" width=\"800\" height=\"312\"><\/a><\/p>\n\n<h3>\n  \n  \n  Script Tags\n<\/h3>\n\n<p>Script tags are interesting, in that the browser won't place restrictions on script execution. A script can include JavaScript code from another site, and that code will successfully execute. However, the page won't be able to access the source code of that script. 
The following code successfully executes a bit of <a href=\"https:\/\/jquery.com\/\" rel=\"noopener noreferrer\">jQuery<\/a> code loaded from the same-origin site:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight html\"><code><span class=\"cp\">&lt;!DOCTYPE html&gt;<\/span>\n<span class=\"nt\">&lt;html&gt;<\/span>\n  <span class=\"nt\">&lt;head&gt;<\/span>\n    <span class=\"nt\">&lt;title&gt;<\/span>jQuery: running always works x-origin, but not accessing source<span class=\"nt\">&lt;\/title&gt;<\/span>\n    <span class=\"nt\">&lt;script <\/span><span class=\"na\">id=<\/span><span class=\"s\">\"jq\"<\/span> <span class=\"na\">type=<\/span><span class=\"s\">\"text\/javascript\"<\/span> <span class=\"na\">src=<\/span><span class=\"s\">\"http:\/\/localhost:3000\/js\/jquery-3.5.1.js\"<\/span><span class=\"nt\">&gt;&lt;\/script&gt;<\/span>\n  <span class=\"nt\">&lt;\/head&gt;<\/span>\n  <span class=\"nt\">&lt;body&gt;<\/span>\n    <span class=\"nt\">&lt;div<\/span> <span class=\"na\">id=<\/span><span class=\"s\">\"execute_jquery\"<\/span><span class=\"nt\">&gt;&lt;\/div&gt;<\/span>\n    <span class=\"nt\">&lt;div<\/span> <span class=\"na\">id=<\/span><span class=\"s\">\"jquery_source_code\"<\/span><span class=\"nt\">&gt;&lt;\/div&gt;<\/span>\n    <span class=\"nt\">&lt;script&gt;<\/span>\n      <span class=\"nf\">$<\/span><span class=\"p\">(<\/span><span class=\"dl\">\"<\/span><span class=\"s2\">#execute_jquery<\/span><span class=\"dl\">\"<\/span><span class=\"p\">).<\/span><span class=\"nx\">html<\/span><span class=\"p\">(<\/span><span class=\"dl\">\"<\/span><span class=\"s2\">&lt;b&gt;I work with same origin and cross origin!&lt;\/b&gt;<\/span><span class=\"dl\">\"<\/span><span class=\"p\">);<\/span>\n    <span class=\"nt\">&lt;\/script&gt;<\/span>\n    <span class=\"nt\">&lt;script&gt;<\/span>\n      <span class=\"kd\">const<\/span> <span class=\"nx\">script<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">document<\/span><span 
class=\"p\">.<\/span><span class=\"nf\">getElementById<\/span><span class=\"p\">(<\/span><span class=\"dl\">\"<\/span><span class=\"s2\">jq<\/span><span class=\"dl\">\"<\/span><span class=\"p\">);<\/span>\n      <span class=\"kd\">const<\/span> <span class=\"nx\">url<\/span> <span class=\"o\">=<\/span> <span class=\"nx\">script<\/span><span class=\"p\">.<\/span><span class=\"nx\">src<\/span><span class=\"p\">;<\/span>\n      <span class=\"nf\">fetch<\/span><span class=\"p\">(<\/span><span class=\"nx\">url<\/span><span class=\"p\">)<\/span>\n      <span class=\"p\">.<\/span><span class=\"nf\">then<\/span><span class=\"p\">(<\/span><span class=\"nx\">r<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">r<\/span><span class=\"p\">.<\/span><span class=\"nf\">text<\/span><span class=\"p\">())<\/span>\n      <span class=\"p\">.<\/span><span class=\"nf\">then<\/span><span class=\"p\">(<\/span><span class=\"nx\">d<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nb\">document<\/span><span class=\"p\">.<\/span><span class=\"nf\">getElementById<\/span><span class=\"p\">(<\/span><span class=\"dl\">\"<\/span><span class=\"s2\">jquery_source_code<\/span><span class=\"dl\">\"<\/span><span class=\"p\">).<\/span><span class=\"nx\">innerHTML<\/span> <span class=\"o\">=<\/span> <span class=\"nx\">d<\/span><span class=\"p\">)<\/span>\n      <span class=\"p\">.<\/span><span class=\"k\">catch<\/span><span class=\"p\">(<\/span><span class=\"nx\">error<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"nx\">console<\/span><span class=\"p\">.<\/span><span class=\"nf\">log<\/span><span class=\"p\">(<\/span><span class=\"nx\">error<\/span><span class=\"p\">));<\/span>\n    <span class=\"nt\">&lt;\/script&gt;<\/span>\n\n  <span class=\"nt\">&lt;\/body&gt;<\/span>\n<span class=\"nt\">&lt;\/html&gt;<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>However, the cross-origin request, <code>http:\/\/localhost:8000\/jquery_run_and_try_to_load_source.html<\/code>, cannot access the 
jQuery source code:<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ffabqhzot3csm6h501r9n.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ffabqhzot3csm6h501r9n.png\" alt=\"source code of script tag cannot be accessed cross-origin\" width=\"800\" height=\"287\"><\/a><\/p>\n\n<p>When this same page is loaded from the same-origin server on port <em>3000<\/em>, the entire source code of jQuery is displayed on the page:<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ffpb9ac2yomrzl65rua5m.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ffpb9ac2yomrzl65rua5m.png\" alt=\"source code of script tag can be accessed same-origin\" width=\"800\" height=\"436\"><\/a><\/p>\n\n<p>When the request is cross-origin, though, the browser does not allow it.<\/p>\n\n<h2>\n  \n  \n  Conclusion\n<\/h2>\n\n<p>Hopefully this article has been helpful in clarifying how the browser's same-origin policy works together with CSRF tokens to prevent CSRF attacks. It's important to understand that the browser enforces this policy on browser \"reads\", that is, on the responses sent back from the server to the browser.<\/p>\n\n<p>Frankly, this approach of leaving it until the last moment to prevent malicious code from working strikes me as rather brittle. 
I welcome Chrome's new <a href=\"https:\/\/blog.chromium.org\/2020\/02\/samesite-cookie-changes-in-february.html\" rel=\"noopener noreferrer\">samesite cookie<\/a> behaviour mentioned earlier in the article. It seems much more secure. If all browsers implement this, perhaps in the future we can start getting away from needing such elaborate and error-prone protection measures. <\/p>\n\n<p>As an example of the kind of complexity we have to deal with when working with CSRF tokens, should we <a href=\"https:\/\/cheatsheetseries.owasp.org\/cheatsheets\/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html#synchronizer-token-pattern\" rel=\"noopener noreferrer\">refresh our CSRF tokens for each request<\/a>, as recommended by OWASP, despite various problems this creates with the browser's \"back\" button or with using multiple tabs? Or is it sufficient to set up the CSRF token at the session level? For the latter, be sure to refresh the CSRF token <a href=\"https:\/\/security.stackexchange.com\/a\/22936\" rel=\"noopener noreferrer\">at login<\/a>. <\/p>\n\n<p>Separately from the discussion of CSRF in this article, when possible, it is a good idea to make cookies <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/HTTP\/Cookies#Creating_cookies\" rel=\"noopener noreferrer\">secure and httponly<\/a> as well as <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/HTTP\/Headers\/Set-Cookie\/SameSite\" rel=\"noopener noreferrer\">SameSite=strict<\/a>. Although it is unrelated to this article, please also always remember to <a href=\"https:\/\/kevinsmith.io\/sanitize-your-inputs\" rel=\"noopener noreferrer\">sanitize web inputs<\/a> to ward off <a href=\"https:\/\/owasp.org\/www-community\/attacks\/xss\/\" rel=\"noopener noreferrer\">XSS attacks<\/a>.<\/p>\n\n<blockquote>\n<p>The examples in this article are meant to illustrate the basic concept of how CSRF tokens work. Please don't use the code in production. 
Instead, leverage a well-established library appropriate to the particular Web technology you are using.<\/p>\n<\/blockquote>\n\n","category":["security","csrf","html","javascript"]},{"title":"Tic-Tac-Toe with a Neural Network","pubDate":"Fri, 27 Dec 2019 05:15:18 +0000","link":"https:\/\/dev.to\/nestedsoftware\/tic-tac-toe-with-a-neural-network-1fjn","guid":"https:\/\/dev.to\/nestedsoftware\/tic-tac-toe-with-a-neural-network-1fjn","description":"<p>In <a href=\"https:\/\/dev.to\/nestedsoftware\/tic-tac-toe-with-tabular-q-learning-1kdn\">Tic-Tac-Toe with Tabular Q-learning<\/a>, we developed a tic-tac-toe agent using reinforcement learning. We used a table to assign a Q-value to each move from a given position. Training games were used to gradually nudge these Q-values in a direction that produced better results: Good results pulled the Q-values for the actions that led to those results higher, while poor results pushed them lower. In this article, instead of using tables, we'll apply the same idea of reinforcement learning to neural networks.<\/p>\n\n<h2>\n  \n  \n  Neural Network as a Function\n<\/h2>\n\n<p>We can think of the Q-table as a multivariable function: The input is a given tic-tac-toe position, and the output is a list of Q-values corresponding to each move from that position. We will endeavour to teach a neural network to approximate this function.<\/p>\n\n<p>For the input into our network, we'll flatten out the board position into an array of <em>9<\/em> values: <em>1<\/em> represents an <em>X<\/em>, <em>-1<\/em> represents an <em>O<\/em>, and <em>0<\/em> is an empty cell. The output layer will be an array of <em>9<\/em> values representing the Q-value for each possible move: A low value closer to <em>0<\/em> is bad, and a higher value closer to <em>1<\/em> is good. After training, the network will choose the move corresponding to the highest output value from this model. 
<\/p>\n\n<p>The diagram below shows the input and output for the given position after training (initially all of the values hover around <em>0.5<\/em>): <\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fs0zbi7n98kuthuole8h1.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fs0zbi7n98kuthuole8h1.png\" alt=\"neural network emulates the q-value function\"><\/a><\/p>\n\n<p>As we can see, the winning move for <em>X<\/em>, <em>A2<\/em>, has the highest Q-value, <em>0.998<\/em>, and the illegal moves have very low Q-values. The Q-values for the other legal moves are greater than the illegal ones, but less than the winning move. That's what we want.<\/p>\n\n<h2>\n  \n  \n  Model\n<\/h2>\n\n<p>The network (using PyTorch) has the following structure:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"k\">class<\/span> <span class=\"nc\">TicTacNet<\/span><span class=\"p\">(<\/span><span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"n\">Module<\/span><span class=\"p\">):<\/span>\n    <span class=\"k\">def<\/span> <span class=\"nf\">__init__<\/span><span class=\"p\">(<\/span><span class=\"n\">self<\/span><span class=\"p\">):<\/span>\n        <span class=\"nf\">super<\/span><span class=\"p\">().<\/span><span class=\"nf\">__init__<\/span><span class=\"p\">()<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">dl1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Linear<\/span><span class=\"p\">(<\/span><span class=\"mi\">9<\/span><span class=\"p\">,<\/span> <span class=\"mi\">36<\/span><span class=\"p\">)<\/span>\n   
     <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">dl2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Linear<\/span><span class=\"p\">(<\/span><span class=\"mi\">36<\/span><span class=\"p\">,<\/span> <span class=\"mi\">36<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">output_layer<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Linear<\/span><span class=\"p\">(<\/span><span class=\"mi\">36<\/span><span class=\"p\">,<\/span> <span class=\"mi\">9<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">forward<\/span><span class=\"p\">(<\/span><span class=\"n\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">x<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">dl1<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">relu<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">dl2<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">relu<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span 
class=\"nf\">output_layer<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">sigmoid<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">x<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n<p>The <em>9<\/em> input values that represent the current board position are passed through two dense hidden layers of <em>36<\/em> neurons each, then to the output layer, which consists of <em>9<\/em> values, each corresponding to the Q-value for a given move.<\/p>\n<h2>\n  \n  \n  Training\n<\/h2>\n\n<p>Most of the training logic for this agent is the same as for the Q-table implementation discussed earlier in this series. However, in that implementation, we prevented illegal moves. For the neural network, I decided to <em>teach<\/em> it not to make illegal moves, so as to have a more realistic set of output values for any given position. 
<\/p>\n\n<p>The code below, from <a href=\"https:\/\/github.com\/nestedsoftware\/tictac\/blob\/master\/tictac\/qneural.py\" rel=\"noopener noreferrer\">qneural.py<\/a>, shows how the parameters of the network are updated for a single training game:<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"k\">def<\/span> <span class=\"nf\">update_training_gameover<\/span><span class=\"p\">(<\/span><span class=\"n\">net_context<\/span><span class=\"p\">,<\/span> <span class=\"n\">move_history<\/span><span class=\"p\">,<\/span> <span class=\"n\">q_learning_player<\/span><span class=\"p\">,<\/span>\n                             <span class=\"n\">final_board<\/span><span class=\"p\">,<\/span> <span class=\"n\">discount_factor<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">game_result_reward<\/span> <span class=\"o\">=<\/span> <span class=\"nf\">get_game_result_value<\/span><span class=\"p\">(<\/span><span class=\"n\">q_learning_player<\/span><span class=\"p\">,<\/span> <span class=\"n\">final_board<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"c1\"># move history is in reverse-chronological order - last to first\n<\/span>    <span class=\"n\">next_position<\/span><span class=\"p\">,<\/span> <span class=\"n\">move_index<\/span> <span class=\"o\">=<\/span> <span class=\"n\">move_history<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span>\n\n    <span class=\"nf\">backpropagate<\/span><span class=\"p\">(<\/span><span class=\"n\">net_context<\/span><span class=\"p\">,<\/span> <span class=\"n\">next_position<\/span><span class=\"p\">,<\/span> <span class=\"n\">move_index<\/span><span class=\"p\">,<\/span> <span class=\"n\">game_result_reward<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"nf\">for <\/span><span class=\"p\">(<\/span><span class=\"n\">position<\/span><span class=\"p\">,<\/span> <span class=\"n\">move_index<\/span><span class=\"p\">)<\/span> 
<span class=\"ow\">in<\/span> <span class=\"nf\">list<\/span><span class=\"p\">(<\/span><span class=\"n\">move_history<\/span><span class=\"p\">)[<\/span><span class=\"mi\">1<\/span><span class=\"p\">:]:<\/span>\n        <span class=\"n\">next_q_values<\/span> <span class=\"o\">=<\/span> <span class=\"nf\">get_q_values<\/span><span class=\"p\">(<\/span><span class=\"n\">next_position<\/span><span class=\"p\">,<\/span> <span class=\"n\">net_context<\/span><span class=\"p\">.<\/span><span class=\"n\">target_net<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">qv<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">max<\/span><span class=\"p\">(<\/span><span class=\"n\">next_q_values<\/span><span class=\"p\">).<\/span><span class=\"nf\">item<\/span><span class=\"p\">()<\/span>\n\n        <span class=\"nf\">backpropagate<\/span><span class=\"p\">(<\/span><span class=\"n\">net_context<\/span><span class=\"p\">,<\/span> <span class=\"n\">position<\/span><span class=\"p\">,<\/span> <span class=\"n\">move_index<\/span><span class=\"p\">,<\/span> <span class=\"n\">discount_factor<\/span> <span class=\"o\">*<\/span> <span class=\"n\">qv<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">next_position<\/span> <span class=\"o\">=<\/span> <span class=\"n\">position<\/span>\n\n    <span class=\"n\">net_context<\/span><span class=\"p\">.<\/span><span class=\"n\">target_net<\/span><span class=\"p\">.<\/span><span class=\"nf\">load_state_dict<\/span><span class=\"p\">(<\/span><span class=\"n\">net_context<\/span><span class=\"p\">.<\/span><span class=\"n\">policy_net<\/span><span class=\"p\">.<\/span><span class=\"nf\">state_dict<\/span><span class=\"p\">())<\/span>\n\n\n<span class=\"k\">def<\/span> <span class=\"nf\">backpropagate<\/span><span class=\"p\">(<\/span><span class=\"n\">net_context<\/span><span class=\"p\">,<\/span> <span class=\"n\">position<\/span><span class=\"p\">,<\/span> <span 
class=\"n\">move_index<\/span><span class=\"p\">,<\/span> <span class=\"n\">target_value<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">net_context<\/span><span class=\"p\">.<\/span><span class=\"n\">optimizer<\/span><span class=\"p\">.<\/span><span class=\"nf\">zero_grad<\/span><span class=\"p\">()<\/span>\n    <span class=\"n\">output<\/span> <span class=\"o\">=<\/span> <span class=\"n\">net_context<\/span><span class=\"p\">.<\/span><span class=\"nf\">policy_net<\/span><span class=\"p\">(<\/span><span class=\"nf\">convert_to_tensor<\/span><span class=\"p\">(<\/span><span class=\"n\">position<\/span><span class=\"p\">))<\/span>\n\n    <span class=\"n\">target<\/span> <span class=\"o\">=<\/span> <span class=\"n\">output<\/span><span class=\"p\">.<\/span><span class=\"nf\">clone<\/span><span class=\"p\">().<\/span><span class=\"nf\">detach<\/span><span class=\"p\">()<\/span>\n    <span class=\"n\">target<\/span><span class=\"p\">[<\/span><span class=\"n\">move_index<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">target_value<\/span>\n    <span class=\"n\">illegal_move_indexes<\/span> <span class=\"o\">=<\/span> <span class=\"n\">position<\/span><span class=\"p\">.<\/span><span class=\"nf\">get_illegal_move_indexes<\/span><span class=\"p\">()<\/span>\n    <span class=\"k\">for<\/span> <span class=\"n\">mi<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">illegal_move_indexes<\/span><span class=\"p\">:<\/span>\n        <span class=\"n\">target<\/span><span class=\"p\">[<\/span><span class=\"n\">mi<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">LOSS_VALUE<\/span>\n\n    <span class=\"n\">loss<\/span> <span class=\"o\">=<\/span> <span class=\"n\">net_context<\/span><span class=\"p\">.<\/span><span class=\"nf\">loss_function<\/span><span class=\"p\">(<\/span><span class=\"n\">output<\/span><span class=\"p\">,<\/span> <span class=\"n\">target<\/span><span class=\"p\">)<\/span>\n    <span 
class=\"n\">loss<\/span><span class=\"p\">.<\/span><span class=\"nf\">backward<\/span><span class=\"p\">()<\/span>\n    <span class=\"n\">net_context<\/span><span class=\"p\">.<\/span><span class=\"n\">optimizer<\/span><span class=\"p\">.<\/span><span class=\"nf\">step<\/span><span class=\"p\">()<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n<p>We maintain two networks, the policy network (<code>policy_net<\/code>) and the target network (<code>target_net<\/code>). We perform backpropagation on the policy network, but we obtain the maximum Q-value for the next state from the target network. That way, the Q-values obtained from the target network don't change during the course of training for a single game. Once we complete training for a game, we update the target network with the parameters of the policy network (<code>load_state_dict<\/code>).<\/p>\n\n<p><code>move_history<\/code> contains the Q-learning agent's moves for a single training game at a time. For the last move played by the Q-learning agent, we update the Q-value of its chosen move with the reward value for that game - <em>0<\/em> for a loss, and <em>1<\/em> for a win or a draw. Then we go through the remaining moves in the game history in reverse-chronological order. For each of these moves, we tug the Q-value of the move that was played toward the maximum Q-value from the next state, multiplied by <code>discount_factor<\/code> (the next state is the state that results from the action taken in the current state).<\/p>\n\n<p>This is analogous to the exponential moving average used in the tabular Q-learning approach: In both cases, we are pulling the current value in the direction of the maximum Q-value available from the next state. For any illegal move from a given game position, we also provide negative feedback for that move as part of the backpropagation. That way, our network will hopefully learn not to make illegal moves.<\/p>\n<h2>\n  \n  \n  Results\n<\/h2>\n\n<p>The results are comparable to those of the tabular Q-learning agent. 
The following table (based on <em>1,000<\/em> games in each case) is representative of the results obtained after a typical training run:  <\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F0hy3jltlywsigep7zazh.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F0hy3jltlywsigep7zazh.png\" alt=\"qneural results\"><\/a><\/p>\n\n<p>These results were obtained from a model that learned from <em>2 million<\/em> training games for each of <em>X<\/em> and <em>O<\/em> (against an agent making random moves). It takes over an hour to train this model on my PC. That's a huge increase over the number of games needed to train the tabular agent. <\/p>\n\n<p>I think this shows how essential large amounts of high-quality data are for deep learning, especially when we go from a toy example like this one to real-world problems. Of course, the advantage of the neural network is that it can generalize - that is, it can handle inputs it has not seen during training (at least to some extent).<\/p>\n\n<p>With the tabular approach, there is no interpolation: The best we can do if we encounter a position we haven't seen before is to apply a heuristic. In games like Go and chess, the number of positions is so huge that we can't even begin to store them all. We need an approach that can generalize, and that's where neural networks can really shine compared to prior techniques.<\/p>\n\n<p>Our network offers the same reward for a win as for a draw. I tried giving a smaller reward for a draw than for a win, but even lowering the value for a draw to something like <em>0.95<\/em> seems to reduce the stability of the network. 
In particular, playing as <em>X<\/em>, the network can end up losing a significant number of games against the randomized minimax agent. Making the reward for a win and a draw the same seems to resolve this problem. <\/p>\n\n<p>Even though we give the same reward for a win and a draw, the agent seems to do a good job of winning games. I believe this is because winning a game usually ends it early, before all <em>9<\/em> cells on the board have been filled. This means the reward is discounted fewer times as it propagates back through each move of the game history (the same idea applies to losses and illegal moves). On the other hand, a draw requires (by definition) all <em>9<\/em> moves to be played, which means that the rewards for the moves in a game leading to a draw are discounted more heavily as we go from one move to the previous one played by the Q-learning agent. Therefore, if a given move consistently leads to a win sooner, it will still have an advantage over a move that eventually leads to a draw. <\/p>\n<h2>\n  \n  \n  Network Topology and Hyperparameters\n<\/h2>\n\n<p>As mentioned earlier, this model has two hidden dense layers of <em>36<\/em> neurons each. <code>MSELoss<\/code> is used as the loss function and the learning rate is <em>0.1<\/em>. <code>relu<\/code> is used as the activation function for the hidden layers. <code>sigmoid<\/code> is used as the activation for the output layer, to squeeze the results into a range between <em>0<\/em> and <em>1<\/em>. <\/p>\n\n<p>Given the simplicity of the network, this design may seem self-evident. However, even for this simple case study, tuning this network was rather time-consuming. At first, I tried using <code>tanh<\/code> (hyperbolic tangent) for the output layer - it made sense to me to set <em>-1<\/em> as the value for a loss and <em>1<\/em> as the value for a win. However, I was unable to get stable results with this activation function. 
Eventually, after trying several other ideas, I replaced it with <code>sigmoid<\/code>, which produced much better results. Similarly, replacing <code>relu<\/code> with something else in the hidden layers made the results worse. <\/p>\n\n<p>I also tried several different network topologies, with one, two, or three hidden layers, and with <em>9<\/em>, <em>18<\/em>, <em>27<\/em>, or <em>36<\/em> neurons per hidden layer. Lastly, I experimented with the number of training games, starting at <em>100,000<\/em> and gradually increasing that number to <em>2,000,000<\/em>, which seemed to produce the most stable results.<\/p>\n<h2>\n  \n  \n  DQN\n<\/h2>\n\n<p>This implementation is inspired by DeepMind's DQN architecture (see <a href=\"https:\/\/storage.googleapis.com\/deepmind-media\/dqn\/DQNNaturePaper.pdf\" rel=\"noopener noreferrer\">Human-level control through deep reinforcement learning<\/a>), but it's not exactly the same. DeepMind used a convolutional network that took raw screen images as input. Here, I felt that the goal was to teach the network the core logic of tic-tac-toe, so I decided that simplifying the representation made sense. Removing the need to process the input as an image also meant fewer layers were needed (no layers to identify the visual features of the board), which sped up training.<\/p>\n\n<p>DeepMind's implementation also used <em>experience replay<\/em>, which stores past transitions in a replay memory and trains the network on randomly sampled batches of them. My feeling was that generating fresh random games was simpler in this case.<\/p>\n\n<p>Can we call this tic-tac-toe implementation \"deep\" learning? I think this term is usually reserved for networks with at least three hidden layers, so probably not. 
I believe that increasing the number of layers tends to be more valuable with convolutional networks, where each layer can more clearly be understood as further abstracting the features identified by the previous layer, and where weight sharing keeps the number of parameters lower than in dense layers. In any case, we should only add layers if doing so produces better results.<\/p>\n<h2>\n  \n  \n  Code\n<\/h2>\n\n<p>The full code is available on GitHub (<a href=\"https:\/\/github.com\/nestedsoftware\/tictac\/blob\/master\/tictac\/qneural.py\" rel=\"noopener noreferrer\">qneural.py<\/a> and <a href=\"https:\/\/github.com\/nestedsoftware\/tictac\/blob\/master\/tictac\/main_qneural.py\" rel=\"noopener noreferrer\">main_qneural.py<\/a>):<\/p>\n\n\n<div class=\"ltag-github-readme-tag\">\n  <div class=\"readme-overview\">\n    <h2>\n      <img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg\" alt=\"GitHub logo\">\n      <a href=\"https:\/\/github.com\/nestedsoftware\" rel=\"noopener noreferrer\">\n        nestedsoftware\n      <\/a> \/ <a href=\"https:\/\/github.com\/nestedsoftware\/tictac\" rel=\"noopener noreferrer\">\n        tictac\n      <\/a>\n    <\/h2>\n    <h3>\n      Experimenting with different techniques for playing tic-tac-toe\n    <\/h3>\n  <\/div>\n  <div class=\"ltag-github-body\">\n    \n<div id=\"readme\" class=\"md\">\n<p>Demo project for different approaches for playing tic-tac-toe.<\/p>\n<p>Code requires python 3, numpy, and pytest. 
For the neural network\/dqn implementation (qneural.py), pytorch is required.<\/p>\n<p>Create virtual environment using pipenv:<\/p>\n<ul>\n<li><code>pipenv --site-packages<\/code><\/li>\n<\/ul>\n<p>Install using pipenv:<\/p>\n<ul>\n<li><code>pipenv shell<\/code><\/li>\n<li><code>pipenv install --dev<\/code><\/li>\n<\/ul>\n<p>Set <code>PYTHONPATH<\/code> to main project directory:<\/p>\n<ul>\n<li>In windows, run <code>path.bat<\/code>\n<\/li>\n<li>In bash run <code>source path.sh<\/code>\n<\/li>\n<\/ul>\n<p>Run tests and demo:<\/p>\n<ul>\n<li>Run tests: <code>pytest<\/code>\n<\/li>\n<li>Run demo: <code>python -m tictac.main<\/code>\n<\/li>\n<li>Run neural net demo: <code>python -m tictac.main_qneural<\/code>\n<\/li>\n<\/ul>\n<p>Latest results:<\/p>\n<div class=\"snippet-clipboard-content notranslate position-relative overflow-auto\">\n<pre class=\"notranslate\"><code>C:\\Dev\\python\\tictac&gt;python -m tictac.main\nPlaying random vs random\n-------------------------\nx wins: 60.10%\no wins: 28.90%\ndraw  : 11.00%\n\nPlaying minimax not random vs minimax random:\n---------------------------------------------\nx wins: 0.00%\no wins: 0.00%\ndraw  : 100.00%\n\nPlaying minimax random vs minimax not random:\n---------------------------------------------\nx wins: 0.00%\no wins: 0.00%\ndraw  : 100.00%\n\nPlaying minimax not random vs minimax not random:\n-------------------------------------------------\nx wins: 0.00%\no wins: 0.00%\ndraw  : 100.00%\n\nPlaying minimax random vs minimax random:<\/code><\/pre>\u2026<\/div>\n<\/div>\n  <\/div>\n  <div class=\"gh-btn-container\"><a class=\"gh-btn\" href=\"https:\/\/github.com\/nestedsoftware\/tictac\" rel=\"noopener noreferrer\">View on GitHub<\/a><\/div>\n<\/div>\n\n\n\n<h2>\n  \n  \n  Related\n<\/h2>\n\n<ul>\n<li><a href=\"https:\/\/dev.to\/nestedsoftware\/tic-tac-toe-with-tabular-q-learning-1kdn\">Tic-Tac-Toe with Tabular Q-Learning<\/a><\/li>\n<li><a 
href=\"https:\/\/dev.to\/nestedsoftware\/neural-networks-primer-374i\">Neural Networks Primer<\/a><\/li>\n<li><a href=\"https:\/\/dev.to\/nestedsoftware\/pytorch-image-recognition-dense-network-3nbd\">PyTorch Image Recognition with Dense Network<\/a><\/li>\n<\/ul>\n\n<h2>\n  \n  \n  References\n<\/h2>\n\n<ul>\n<li><a href=\"https:\/\/storage.googleapis.com\/deepmind-media\/dqn\/DQNNaturePaper.pdf\" rel=\"noopener noreferrer\">Human-level control through deep reinforcement learning<\/a><\/li>\n<\/ul>\n\n","category":["tictactoe","neuralnetworks","pytorch","python"]},{"title":"Card with expand-on-hover effect","pubDate":"Tue, 05 Nov 2019 18:48:36 +0000","link":"https:\/\/dev.to\/nestedsoftware\/card-with-expand-on-hover-effect-2ccm","guid":"https:\/\/dev.to\/nestedsoftware\/card-with-expand-on-hover-effect-2ccm","description":"<p>I recently came across a cool effect on a blog (I believe the original design came from the <a href=\"https:\/\/ghost.org\/blog\/\" rel=\"noopener noreferrer\">Ghost<\/a> platform). When you hover over a card that links to an article, there's a transition that expands the card slightly; it returns to its original size when you move the mouse away from it. <\/p>\n\n<p>I tend to appreciate simple, minimalist designs that don't overwhelm the user. I avoid in-your-face effects, transitions, and animations. Here, however, the effect is subtle, yet I find that it adds a nice touch of sophistication to the design. <\/p>\n\n<p>In addition to the hover effect, I liked this card design, so I reverse-engineered it from scratch, using flexbox for layout. 
<\/p>\n\n<p>Below is the result of my efforts in codepen:<\/p>\n\n<p><iframe height=\"600\" src=\"https:\/\/codepen.io\/nestedsoftware\/embed\/eYYVbNB?height=600&amp;default-tab=result&amp;embed-version=2\">\n<\/iframe>\n<\/p>\n\n<h2>\n  \n  \n  Hover Effect\n<\/h2>\n\n<p>The hover effect is achieved with the following CSS:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight css\"><code><span class=\"nc\">.fancy_card<\/span><span class=\"nd\">:hover<\/span> <span class=\"p\">{<\/span>\n  <span class=\"nl\">transform<\/span><span class=\"p\">:<\/span> <span class=\"n\">translate3D<\/span><span class=\"p\">(<\/span><span class=\"m\">0<\/span><span class=\"p\">,<\/span><span class=\"m\">-1px<\/span><span class=\"p\">,<\/span><span class=\"m\">0<\/span><span class=\"p\">)<\/span> <span class=\"n\">scale<\/span><span class=\"p\">(<\/span><span class=\"m\">1.03<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>I got this CSS from the original site. I think it's quite clever: Not only do we expand the card slightly, but we also slide it upward a little bit at the same time. 
<\/p>\n\n<blockquote>\n<p>This effect works smoothly in current versions of Chrome and Firefox, but it looks choppy in Edge.<\/p>\n<\/blockquote>\n\n<h2>\n  \n  \n  Box Shadow\n<\/h2>\n\n<p>I also got the following parameters from the original site:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight css\"><code><span class=\"nc\">.fancy_card<\/span> <span class=\"p\">{<\/span>\n  <span class=\"nl\">box-shadow<\/span><span class=\"p\">:<\/span> <span class=\"m\">8px<\/span> <span class=\"m\">14px<\/span> <span class=\"m\">38px<\/span> <span class=\"n\">rgba<\/span><span class=\"p\">(<\/span><span class=\"m\">39<\/span><span class=\"p\">,<\/span><span class=\"m\">44<\/span><span class=\"p\">,<\/span><span class=\"m\">49<\/span><span class=\"p\">,<\/span><span class=\"m\">.06<\/span><span class=\"p\">),<\/span> <span class=\"m\">1px<\/span> <span class=\"m\">3px<\/span> <span class=\"m\">8px<\/span> <span class=\"n\">rgba<\/span><span class=\"p\">(<\/span><span class=\"m\">39<\/span><span class=\"p\">,<\/span><span class=\"m\">44<\/span><span class=\"p\">,<\/span><span class=\"m\">49<\/span><span class=\"p\">,<\/span><span class=\"m\">.03<\/span><span class=\"p\">);<\/span>\n  <span class=\"nl\">transition<\/span><span class=\"p\">:<\/span> <span class=\"n\">all<\/span> <span class=\"m\">.5s<\/span> <span class=\"n\">ease<\/span><span class=\"p\">;<\/span> <span class=\"c\">\/* back to normal *\/<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"nc\">.fancy_card<\/span><span class=\"nd\">:hover<\/span> <span class=\"p\">{<\/span>\n  <span class=\"nl\">box-shadow<\/span><span class=\"p\">:<\/span> <span class=\"m\">8px<\/span> <span class=\"m\">28px<\/span> <span class=\"m\">50px<\/span> <span class=\"n\">rgba<\/span><span class=\"p\">(<\/span><span class=\"m\">39<\/span><span class=\"p\">,<\/span><span class=\"m\">44<\/span><span class=\"p\">,<\/span><span class=\"m\">49<\/span><span class=\"p\">,<\/span><span 
class=\"m\">.07<\/span><span class=\"p\">),<\/span> <span class=\"m\">1px<\/span> <span class=\"m\">6px<\/span> <span class=\"m\">12px<\/span> <span class=\"n\">rgba<\/span><span class=\"p\">(<\/span><span class=\"m\">39<\/span><span class=\"p\">,<\/span><span class=\"m\">44<\/span><span class=\"p\">,<\/span><span class=\"m\">49<\/span><span class=\"p\">,<\/span><span class=\"m\">.04<\/span><span class=\"p\">);<\/span>\n  <span class=\"nl\">transition<\/span><span class=\"p\">:<\/span> <span class=\"n\">all<\/span> <span class=\"m\">.4s<\/span> <span class=\"n\">ease<\/span><span class=\"p\">;<\/span> <span class=\"c\">\/* zoom in *\/<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>I like the application of two box shadows (separated by commas), and how the box shadow expands when hovering over a card. Note also the slightly different timing for the forward and back transitions. I think these kinds of subtle cues aren't noticeable at a conscious level, but they contribute to an overall sense of quality when using a well-designed site.<\/p>\n\n<h2>\n  \n  \n  Centering\n<\/h2>\n\n<p>Below are a few more notes on the CSS design. I like how flexbox makes centering simple, both horizontally and vertically. 
The CSS below centers the card in the window:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight css\"><code><span class=\"nc\">.container<\/span> <span class=\"p\">{<\/span>\n  <span class=\"nl\">display<\/span><span class=\"p\">:<\/span> <span class=\"n\">flex<\/span><span class=\"p\">;<\/span>\n  <span class=\"nl\">min-height<\/span><span class=\"p\">:<\/span> <span class=\"m\">100vh<\/span><span class=\"p\">;<\/span> <span class=\"c\">\/* expand height to center contents *\/<\/span>\n  <span class=\"nl\">height<\/span><span class=\"p\">:<\/span> <span class=\"m\">100vh<\/span><span class=\"p\">;<\/span>\n  <span class=\"nl\">justify-content<\/span><span class=\"p\">:<\/span> <span class=\"nb\">center<\/span><span class=\"p\">;<\/span> <span class=\"c\">\/* center horizontally *\/<\/span>\n  <span class=\"nl\">align-items<\/span><span class=\"p\">:<\/span> <span class=\"nb\">center<\/span><span class=\"p\">;<\/span> <span class=\"c\">\/* center vertically*\/<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>The following CSS vertically aligns the user's profile image and the reading duration text in the footer of the card:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight css\"><code><span class=\"nc\">.card_footer<\/span> <span class=\"p\">{<\/span>\n  <span class=\"nl\">display<\/span><span class=\"p\">:<\/span> <span class=\"n\">flex<\/span><span class=\"p\">;<\/span>\n  <span class=\"nl\">flex-direction<\/span><span class=\"p\">:<\/span> <span class=\"n\">row<\/span><span class=\"p\">;<\/span>\n  <span class=\"nl\">flex-wrap<\/span><span class=\"p\">:<\/span> <span class=\"n\">wrap<\/span><span class=\"p\">;<\/span>\n  <span class=\"nl\">align-items<\/span><span class=\"p\">:<\/span> <span class=\"nb\">center<\/span><span class=\"p\">;<\/span> <span class=\"c\">\/* vertically align content *\/<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<h2>\n  
\n  \n  Header Image\n<\/h2>\n\n<p>I found that my header image was expanding beyond the boundaries of its container and hiding the rounded corners. This can be fixed by applying <code>overflow: hidden<\/code> to its parent:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight css\"><code><span class=\"nc\">.fancy_card<\/span> <span class=\"p\">{<\/span>\n  <span class=\"nl\">overflow<\/span><span class=\"p\">:<\/span> <span class=\"nb\">hidden<\/span><span class=\"p\">;<\/span> <span class=\"c\">\/* otherwise header image won't respect rounded corners *\/<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>I also discovered that the header image got stretched out vertically and did not respect its aspect ratio. With a bit of searching, I found a solution that seems to work:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight css\"><code><span class=\"nc\">.card_image<\/span> <span class=\"p\">{<\/span>\n  <span class=\"nl\">width<\/span><span class=\"p\">:<\/span> <span class=\"m\">100%<\/span><span class=\"p\">;<\/span> <span class=\"c\">\/* forces image to maintain its aspect ratio; otherwise image stretches vertically *\/<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n\n<p>Surprisingly, this change alone seems to solve the problem (at least for modern browsers). 
<\/p>\n\n<p>The complete HTML\/CSS is available on <a href=\"https:\/\/codepen.io\/nestedsoftware\/pen\/eYYVbNB\" rel=\"noopener noreferrer\">CodePen<\/a>, so feel free to take a look if you're interested.<\/p>\n\n","category":["css","html","showdev"]},{"title":"I can has web site?","pubDate":"Mon, 04 Nov 2019 01:31:02 +0000","link":"https:\/\/dev.to\/nestedsoftware\/i-can-has-web-site-4loc","guid":"https:\/\/dev.to\/nestedsoftware\/i-can-has-web-site-4loc","description":"<p>After thinking about doing it for a while, I finally went ahead and created a <a href=\"https:\/\/nestedsoftware.com\" rel=\"noopener noreferrer\">landing page<\/a> for myself, along with a self-hosted copy of my DEV.to <a href=\"https:\/\/dev.to\/nestedsoftware\">blog<\/a>. <\/p>\n\n<p>There are a number of options for doing this. Recently <a href=\"https:\/\/www.stackbit.com\/\" rel=\"noopener noreferrer\">Stackbit<\/a> has created a service that allows DEV.to users to automatically generate a copy of their blog: <\/p>\n\n\n<div class=\"ltag__link\">\n  <a href=\"\/devteam\" class=\"ltag__link__link\">\n    <div class=\"ltag__link__org__pic\">\n      <img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F1%2Fd908a186-5651-4a5a-9f76-15200bc6801f.jpg\" alt=\"The DEV Team\" width=\"800\" height=\"800\">\n      <div class=\"ltag__link__user__pic\">\n        <img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1%2Fbabb96d0-9cd2-49bc-a412-2dc4caf94c2a.png\" alt=\"\" width=\"800\" height=\"800\">\n      <\/div>\n    <\/div>\n  <\/a>\n  <a href=\"https:\/\/dev.to\/devteam\/you-can-now-generate-self-hostable-static-blogs-right-from-your-dev-content-via-stackbit-7a5\" class=\"ltag__link__link\">\n    <div 
class=\"ltag__link__content\">\n      <h2>You can now generate self-hostable static blogs right from your DEV content via Stackbit<\/h2>\n      <h3>Ben Halpern for The DEV Team \u30fb Sep 26 '19<\/h3>\n      <div class=\"ltag__link__taglist\">\n        <span class=\"ltag__link__tag\">#meta<\/span>\n        <span class=\"ltag__link__tag\">#projectbenatar<\/span>\n        <span class=\"ltag__link__tag\">#webdev<\/span>\n        <span class=\"ltag__link__tag\">#changelog<\/span>\n      <\/div>\n    <\/div>\n  <\/a>\n<\/div>\n\n\n<p>I tried this service out, and it's pretty cool! You can generate a blog hosted on <a href=\"https:\/\/www.netlify.com\/\" rel=\"noopener noreferrer\">Netlify<\/a> with just a few clicks. <\/p>\n\n<p>While I did model my list of articles on one of Stackbit's themes, ultimately I decided to set up my own site. With generated sites, the CSS they use for layout can get a bit messy, which makes changing it more difficult. I ended up writing the CSS for my site from scratch. Also, the idea of creating something that didn't depend on another service appealed to me. <\/p>\n\n<h2>\n  \n  \n  Python Scripts\n<\/h2>\n\n<p>The first step was to download the contents of my DEV.to blog. To this end, I wrote a few Python scripts: <code>download_articles.py<\/code> uses DEV.to's <a href=\"https:\/\/docs.dev.to\/api\/\">REST API<\/a> to download the markdown for my published articles; <code>download_images.py<\/code> then downloads all of the images used in these articles; <code>copy_and_transform.py<\/code> creates a copy of the original content, using regular expressions to apply some transformations to the markdown. There's a master script, <code>main.py<\/code>, which runs all of the above scripts. 
If you're interested in taking a look, you can find a copy of this code on GitHub:<\/p>\n\n\n<div class=\"ltag-github-readme-tag\">\n  <div class=\"readme-overview\">\n    <h2>\n      <img src=\"https:\/\/assets.dev.to\/assets\/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg\" alt=\"GitHub logo\">\n      <a href=\"https:\/\/github.com\/nestedsoftware\" rel=\"noopener noreferrer\">\n        nestedsoftware\n      <\/a> \/ <a href=\"https:\/\/github.com\/nestedsoftware\/markdown_manager\" rel=\"noopener noreferrer\">\n        markdown_manager\n      <\/a>\n    <\/h2>\n    <h3>\n      scripts to manage dev.to blog articles\n    <\/h3>\n  <\/div>\n  <div class=\"ltag-github-body\">\n    \n<div id=\"readme\" class=\"md\">\n<p>Scripts that download and transform dev.to blog articles.<\/p>\n<p>The <code>main.py<\/code> script runs three subordinate scripts, <code>download_articles.py<\/code>, <code>download_images.py<\/code>, and <code>copy_and_transform.py<\/code>. 
The following will run the main script, which downloads articles, images, and then makes copies which transform the original markdown content:<\/p>\n<ul>\n<li><code>python main.py &lt;username&gt; --root &lt;root dir&gt; --download_dir &lt;download dir&gt; --transformed_dir &lt;transform dir&gt; --article &lt;article name&gt;<\/code><\/li>\n<\/ul>\n<p>Options:<\/p>\n<ul>\n<li>\n<code>username<\/code> refers to the dev.to user whose articles will be downloaded<\/li>\n<li>\n<code>root dir<\/code> determines which base path the files are downloaded to - defaults to the current directory.<\/li>\n<li>\n<code>download dir<\/code> is the directory to which the markdown files and image files will be downloaded - defaults to <code>downloaded_files<\/code>\n<\/li>\n<li>\n<code>transform dir<\/code> will contain a copy of the contents of <code>download dir<\/code> with changes applied to the markdown (localizes links and fixes some markdown to work with jekyll) - defaults to <code>transformed_files<\/code>\n<\/li>\n<li>Once this script has been run for all articles, the <code>--article<\/code>\u2026<\/li>\n<\/ul>\n<\/div>\n  <\/div>\n  <div class=\"gh-btn-container\"><a class=\"gh-btn\" href=\"https:\/\/github.com\/nestedsoftware\/markdown_manager\" rel=\"noopener noreferrer\">View on GitHub<\/a><\/div>\n<\/div>\n\n\n<p>I wrote this code for my own purposes, so I can't guarantee that it will work for everyone else. I did run the scripts against <a class=\"mentioned-user\" href=\"https:\/\/dev.to\/ben\">@ben<\/a>'s posts and confirmed that they don't crash.<\/p>\n\n<h2>\n  \n  \n  Jekyll Static Site Generator\n<\/h2>\n\n<p>Next, I set up <a href=\"https:\/\/jekyllrb.com\/\" rel=\"noopener noreferrer\">Jekyll<\/a> to generate the HTML from these markdown files. 
I considered other generators like <a href=\"https:\/\/gohugo.io\/\" rel=\"noopener noreferrer\">Hugo<\/a> or <a href=\"https:\/\/www.gatsbyjs.org\/\" rel=\"noopener noreferrer\">Gatsby<\/a>, but since DEV.to and Jekyll both use liquid tags and have similar formatting for front matter, Jekyll seemed like the most natural choice. <\/p>\n\n<p>I set up the Python scripts to produce a Jekyll-compatible directory structure, with articles going into the <code>_posts<\/code> folder, and images going into <code>assets\/images<\/code>. The output from the scripts is then copied to the corresponding folders in the Jekyll project:<\/p>\n\n\n<div class=\"ltag-github-readme-tag\">\n  <div class=\"readme-overview\">\n    <h2>\n      <img src=\"https:\/\/assets.dev.to\/assets\/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg\" alt=\"GitHub logo\">\n      <a href=\"https:\/\/github.com\/nestedsoftware\" rel=\"noopener noreferrer\">\n        nestedsoftware\n      <\/a> \/ <a href=\"https:\/\/github.com\/nestedsoftware\/nestedsoftware_jekyll_blog\" rel=\"noopener noreferrer\">\n        nestedsoftware_jekyll_blog\n      <\/a>\n    <\/h2>\n    <h3>\n      \n    <\/h3>\n  <\/div>\n  <div class=\"ltag-github-body\">\n    \n<div id=\"readme\" class=\"md\">\n<p>Jekyll project for my professional landing page and blog.<\/p>\n<p>To install (Ruby must already be installed):<\/p>\n<ul>\n<li>bundle install<\/li>\n<\/ul>\n<p>To build:<\/p>\n<ul>\n<li>bundle exec jekyll build<\/li>\n<\/ul>\n<p>The generated site will be in the <code>_site<\/code> folder.<\/p>\n<p>To serve a local copy:<\/p>\n<ul>\n<li>bundle exec jekyll serve<\/li>\n<\/ul>\n<\/div>\n\n\n\n<\/div>\n<br>\n  <div class=\"gh-btn-container\"><a class=\"gh-btn\" href=\"https:\/\/github.com\/nestedsoftware\/nestedsoftware_jekyll_blog\" rel=\"noopener noreferrer\">View on GitHub<\/a><\/div>\n<br>\n<\/div>\n<br>\n\n\n<p>Jekyll has been quite helpful for several things: I'm using DEV.to's support for a 
series of articles in a few places, and I found a bit of Jekyll <a href=\"https:\/\/github.com\/realjenius\/site-samples\/blob\/master\/2012-11-03-jekyll-series-list\/series.html\" rel=\"noopener noreferrer\">template code<\/a> to handle this. <\/p>\n\n<p>I am also using the following plugins:<\/p>\n\n<ul>\n<li>\n<a href=\"https:\/\/github.com\/jekyll\/jekyll-gist\" rel=\"noopener noreferrer\">jekyll-gist<\/a> <\/li>\n<li><a href=\"https:\/\/github.com\/rmcfadzean\/jekyll-codepen\" rel=\"noopener noreferrer\">jekyll-codepen<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/jekyll\/jekyll-sitemap\" rel=\"noopener noreferrer\">jekyll-sitemap<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/jekyll\/jekyll-seo-tag\" rel=\"noopener noreferrer\">jekyll-seo-tag<\/a><\/li>\n<\/ul>\n\n<p>I've written a custom plugin as well, <a href=\"https:\/\/github.com\/nestedsoftware\/nestedsoftware_jekyll_blog\/blob\/master\/_plugins\/github_readme_tag.rb\" rel=\"noopener noreferrer\">github_readme_tag.rb<\/a> to support embedding a preview of GitHub projects.<\/p>\n\n<p>Syntax highlighting has been a very useful aspect of using Jekyll. I simply downloaded the appropriate <a href=\"https:\/\/github.com\/jwarby\/jekyll-pygments-themes\" rel=\"noopener noreferrer\">CSS theme<\/a> for monokai, which tends to be my go-to theme, and voil\u00e0, I had a nice-looking display for code examples. <\/p>\n\n<h2>\n  \n  \n  Design\n<\/h2>\n\n<p>One thing I found to be important was to design the appearance and layout separately from processing the content. I saved a couple of articles to a separate folder, and used that to create the CSS and HTML for the main components of my site: The landing page, the list of articles, and each individual article. This allowed me to focus on getting things to look the way I wanted them to. Once I felt this part was ready, I incorporated it into the Jekyll project.<\/p>\n\n<h2>\n  \n  \n  Comments\n<\/h2>\n\n<p>Supporting comments seems to be a rather hairy area. 
I found a neat little project called <a href=\"https:\/\/utteranc.es\/\" rel=\"noopener noreferrer\">utterances<\/a>, which I've incorporated into my article template. Reader comments are posted as issues to a dedicated GitHub project (<a href=\"https:\/\/github.com\/nestedsoftware\/blog_comments\" rel=\"noopener noreferrer\">blog_comments<\/a> in my case). It does require commenters to have a <a href=\"https:\/\/github.com\/\" rel=\"noopener noreferrer\">GitHub<\/a> account, but I like the simplicity and transparency of this solution. <\/p>\n\n<h2>\n  \n  \n  Results\n<\/h2>\n\n<p>For the time being, I've deployed my small site to GitHub Pages:<\/p>\n\n<ul>\n<li>\n<a href=\"https:\/\/nestedsoftware.com\" rel=\"noopener noreferrer\">https:\/\/nestedsoftware.com<\/a> <\/li>\n<\/ul>\n\n<p>I used <a href=\"https:\/\/domains.google\" rel=\"noopener noreferrer\">Google Domains<\/a> for the custom domain registration. Even though I'm not a designer, I'm pretty happy with the results, and it feels good to have a self-hosted version of my blog, as well as a central spot for my online presence. <\/p>\n\n<p>Writing some of these articles has been a lot of work, and it's been gnawing at the back of my mind that if something catastrophic were to happen to DEV.to, I would lose a huge amount of work! I know, realistically, it's not going to happen, but it still gives me some peace of mind knowing that I've got a backup. <\/p>\n\n<p>As a matter of principle, it's probably also worthwhile to host one's own blog. If you decide to do this, don't forget to update the <code>canonical_url<\/code> in the DEV.to front matter to point to your version of the same article. 
This also applies to any other places where you may host copies of the same articles (see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Canonical_link_element\" rel=\"noopener noreferrer\">canonical link element<\/a> for reference).<\/p>\n\n","category":["python","jekyll","showdev"]},{"title":"Dev.to API questions","pubDate":"Tue, 01 Oct 2019 15:50:37 +0000","link":"https:\/\/dev.to\/nestedsoftware\/dev-to-api-questions-1jf5","guid":"https:\/\/dev.to\/nestedsoftware\/dev-to-api-questions-1jf5","description":"<p>I'm writing some scripts to help me manage my dev.to articles. I was wondering how to do the following things using an API (if possible):<\/p>\n\n<ul>\n<li>Post a new article (by sending a markdown file)<\/li>\n<li>Update an existing article<\/li>\n<li>Submit an image and get back a URL to that image<\/li>\n<\/ul>\n\n","category":"help"},{"title":"Incremental Average and Standard Deviation with Sliding Window","pubDate":"Thu, 26 Sep 2019 15:58:04 +0000","link":"https:\/\/dev.to\/nestedsoftware\/incremental-average-and-standard-deviation-with-sliding-window-470k","guid":"https:\/\/dev.to\/nestedsoftware\/incremental-average-and-standard-deviation-with-sliding-window-470k","description":"<p>I was pleasantly surprised recently to get a question from a reader about a couple of my articles, <a href=\"https:\/\/dev.to\/nestedsoftware\/calculating-a-moving-average-on-streaming-data-5a7k\">Calculating a Moving Average on Streaming Data<\/a> and <a href=\"https:\/\/dev.to\/nestedsoftware\/calculating-standard-deviation-on-streaming-data-253l\">Calculating Standard Deviation on Streaming Data<\/a>. The question was, <em>instead of updating the statistics cumulatively, would it be possible to consider only a window of fixed size?<\/em><\/p>\n\n<p>In other words, say we set the window size to <em>20<\/em> items. 
Once the window is full, each time a new value comes along, we include it as part of the updated average and standard deviation, but the oldest value is also removed from consideration. Only the most recent <em>20<\/em> items are used (or whatever the window size happens to be).<\/p>\n\n<p>I thought this was an interesting question, so I decided to try to figure it out. It turns out that we only have to make some small changes to the logic from the earlier articles to make this work. I'll briefly summarize the derivation and show example code in JavaScript as well.<\/p>\n\n<p>The diagram below shows the basic idea. We initially have values from <em>x0<\/em> to <em>x5<\/em> in our window, which has room for 6 items in this case. When we receive a new value, <em>x6<\/em>, it means we have to remove <em>x0<\/em> from the window, since it's currently the oldest value. As new values come in, we keep sliding the window forward:<\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fx990ut3oilhljzi8eo22.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fx990ut3oilhljzi8eo22.png\" alt=\"sliding window of values\"><\/a><\/p>\n\n<h2>\n  \n  \n  Sliding Average\n<\/h2>\n\n<p>Let\u2019s start by deriving the moving average within our window, where <em>N<\/em> corresponds to the window size. 
The average for values from <em>x1<\/em> to <em>xn<\/em> is as follows:<\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F5vcf5n26a7pghghwzrzy.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F5vcf5n26a7pghghwzrzy.png\" alt=\"average from x_1 to x_n\"><\/a><\/p>\n\n<p>It's basically unchanged from the first article in this series, <a href=\"https:\/\/dev.to\/nestedsoftware\/calculating-a-moving-average-on-streaming-data-5a7k\">Calculating a Moving Average on Streaming Data<\/a>. However, since the size of our window is now fixed, the average up to the previous value, <em>xn-1<\/em>, also covers exactly <em>N<\/em> values, from <em>x0<\/em> to <em>xn-1<\/em>:<\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fxgveudg9s7d84ihs2hi9.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fxgveudg9s7d84ihs2hi9.png\" alt=\"average from x_0 to x_n-1\"><\/a><\/p>\n\n<p>Subtracting these two averages, we get the following expression:<\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fqrrcnic8nyd33pphbask.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fqrrcnic8nyd33pphbask.png\" alt=\"x\u0304_n - x\u0304_n-1\"><\/a><\/p>\n\n<p>The first average consists of a sum of values from 
<em>x1<\/em> to <em>xn<\/em>. From this, we subtract a sum of values from <em>x0<\/em> to <em>xn-1<\/em>. The only values that don't cancel each other out are <em>xn<\/em> and <em>x0<\/em>. Our final recurrence relation for the incremental average with a sliding window of size <em>N<\/em> is therefore:<\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Ftqd4wec1gu2g15nw3o3u.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Ftqd4wec1gu2g15nw3o3u.png\" alt=\"incremental average recurrence relation\"><\/a><\/p>\n\n<p>That's all we need to compute the average incrementally with a fixed window size.  The corresponding code snippet is below:<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight javascript\"><code><span class=\"kd\">const<\/span> <span class=\"nx\">meanIncrement<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"nx\">newValue<\/span> <span class=\"o\">-<\/span> <span class=\"nx\">poppedValue<\/span><span class=\"p\">)<\/span> <span class=\"o\">\/<\/span> <span class=\"k\">this<\/span><span class=\"p\">.<\/span><span class=\"nx\">count<\/span>\n<span class=\"kd\">const<\/span> <span class=\"nx\">newMean<\/span> <span class=\"o\">=<\/span> <span class=\"k\">this<\/span><span class=\"p\">.<\/span><span class=\"nx\">_mean<\/span> <span class=\"o\">+<\/span> <span class=\"nx\">meanIncrement<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n<h2>\n  \n  \n  Sliding Variance and Standard Deviation\n<\/h2>\n\n<p>Next, let's derive the relation for <em>d<sup>2<\/sup>n<\/em>.<\/p>\n\n<blockquote>\n<p>What is <em>d<sup>2<\/sup><\/em>? 
It's a term I made up for the variance * (n-1), or variance * n, depending on whether we're talking about <a href=\"https:\/\/en.wikipedia.org\/wiki\/Variance#Population_variance_and_sample_variance\" rel=\"noopener noreferrer\">sample variance or population variance<\/a>. For more background on the naming, see my article <a href=\"https:\/\/dev.to\/nestedsoftware\/the-geometry-of-standard-deviation--3m3o\">The Geometry of Standard Deviation<\/a>.<\/p>\n<\/blockquote>\n\n<p>From <a href=\"https:\/\/dev.to\/nestedsoftware\/calculating-standard-deviation-on-streaming-data-253l\">Calculating Standard Deviation on Streaming Data<\/a>, we've already derived the following:<\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fz0qgsmyqtk51mhu7sgkv.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fz0qgsmyqtk51mhu7sgkv.png\" alt=\"d^2 for n\"><\/a><\/p>\n\n<p>Again, since our window size remains constant, the equation for <em>d<sup>2<\/sup>n-1<\/em> has the same form, with the only difference being that it applies to the range of values from <em>x0<\/em> to <em>xn-1<\/em>:<\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Flmsgywfd2qotovar1k2h.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Flmsgywfd2qotovar1k2h.png\" alt=\"d^2 for n-1\"><\/a><\/p>\n\n<p>When we subtract these two equations, we get:<\/p>\n\n<p><a 
href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F81wo3c3ck7ki0jhmag3v.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F81wo3c3ck7ki0jhmag3v.png\" alt=\"d^2_n - d^2_n-1\"><\/a><\/p>\n\n<p>Since the two summations overlap everywhere except at <em>xn<\/em> and <em>x0<\/em>, we can simplify this as follows:<\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F277ncxfdha50l4gw5x2o.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F277ncxfdha50l4gw5x2o.png\" alt=\"d^2_n - d^2_n-1 simplify summations\"><\/a><\/p>\n\n<p>We can now factor this expression into the following form:<\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F1dtryozc0oj5lzuv6nxs.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F1dtryozc0oj5lzuv6nxs.png\" alt=\"d^2_n - d^2_n-1 factor\"><\/a><\/p>\n\n<p>We can also factor the difference of squares on the right:<\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fqkkwy8hzsxf2u10pxk3b.png\" class=\"article-body-image-wrapper\"><img 
src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fqkkwy8hzsxf2u10pxk3b.png\" alt=\"d^2_n - d^2_n-1 \"><\/a><\/p>\n\n<p>Next, we notice that the difference between the current average and the previous average, <em>x\u0304n - x\u0304n-1<\/em>, is (<em>xn - x0)\/N<\/em>, as derived earlier:<\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F5gsr9p3rdockbtf1i9ne.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F5gsr9p3rdockbtf1i9ne.png\" alt=\"d^2_n - d^2_n-1 simplify difference between current and previous average\"><\/a><\/p>\n\n<p>We can cancel the <em>N<\/em>'s to get the following nicely simplified form:<\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F3yhd7bsxt2g7lbnhbzya.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F3yhd7bsxt2g7lbnhbzya.png\" alt=\"d^2_n - d^2_n-1 cancel out the n's\"><\/a><\/p>\n\n<p>To reduce the number of multiplications, we can factor out <em>xn - x0<\/em>:<\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fzytuml4su0ynio6qnxnl.png\" class=\"article-body-image-wrapper\"><img 
src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fzytuml4su0ynio6qnxnl.png\" alt=\"d^2_n - d^2_n-1 factor out x_n-x_0\"><\/a><\/p>\n\n<p>Lastly, to get our final recurrence relation, we add <em>d<sup>2<\/sup>n-1<\/em> to both sides. This gives us the new value of <em>d<sup>2<\/sup><\/em> in terms of the previous value and an increment:<\/p>\n\n<p><a href=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F0rk8thrj30lf1r8llx9x.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F0rk8thrj30lf1r8llx9x.png\" alt=\"d^2_n final recurrence relation\"><\/a><\/p>\n\n<p>The corresponding code is:<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight javascript\"><code><span class=\"kd\">const<\/span> <span class=\"nx\">dSquaredIncrement<\/span> <span class=\"o\">=<\/span> <span class=\"p\">((<\/span><span class=\"nx\">newValue<\/span> <span class=\"o\">-<\/span> <span class=\"nx\">poppedValue<\/span><span class=\"p\">)<\/span>\n                <span class=\"o\">*<\/span> <span class=\"p\">(<\/span><span class=\"nx\">newValue<\/span> <span class=\"o\">-<\/span> <span class=\"nx\">newMean<\/span> <span class=\"o\">+<\/span> <span class=\"nx\">poppedValue<\/span> <span class=\"o\">-<\/span> <span class=\"k\">this<\/span><span class=\"p\">.<\/span><span class=\"nx\">_mean<\/span><span class=\"p\">))<\/span>\n<span class=\"kd\">const<\/span> <span class=\"nx\">newDSquared<\/span> <span class=\"o\">=<\/span> <span class=\"k\">this<\/span><span class=\"p\">.<\/span><span class=\"nx\">_dSquared<\/span> <span class=\"o\">+<\/span> <span 
class=\"nx\">dSquaredIncrement<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n<h2>\n  \n  \n  Discussion\n<\/h2>\n\n<p>We now have a nice way to incrementally calculate the mean, variance, and standard deviation on a sliding window of values. With a cumulative average, which was described in the first article in this series, we have to express the mean in terms of the total number of values received so far - from the very beginning.<\/p>\n\n<p>That means we will get smaller and smaller fractions as time goes on, which will eventually lead to floating point precision problems. Even more importantly, after a large number of values has come along, a new value will just no longer represent a significant change, regardless of the precision. Here that issue doesn't come up: Our window size is always the same, and we only need to make adjustments based on the oldest value that is leaving the window, and the new value coming in.<\/p>\n\n<p>This approach also requires less computation than re-calculating everything in the current window from scratch each time. However, for many real-world applications, I suspect this may not make a huge difference. 
It should become more useful if the window size is large and the data is streaming in rapidly.<\/p>\n<h2>\n  \n  \n  Code\n<\/h2>\n\n<p>A demo with full source code for calculating the mean, variance, and standard deviation using a sliding window is available on GitHub:<\/p>\n\n\n<div class=\"ltag-github-readme-tag\">\n  <div class=\"readme-overview\">\n    <h2>\n      <img src=\"https:\/\/media.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg\" alt=\"GitHub logo\">\n      <a href=\"https:\/\/github.com\/nestedsoftware\" rel=\"noopener noreferrer\">\n        nestedsoftware\n      <\/a> \/ <a href=\"https:\/\/github.com\/nestedsoftware\/iterative_stats\" rel=\"noopener noreferrer\">\n        iterative_stats\n      <\/a>\n    <\/h2>\n    <h3>\n      Demo of adjustment to Welford method for calculating mean\/variance\/stdev that incorporates a sliding window of fixed size over the incoming data\n    <\/h3>\n  <\/div>\n  <div class=\"ltag-github-body\">\n    \n<div id=\"readme\" class=\"md\">\n<p>Simple demo that compares two ways to calculate mean\/variance\/standard deviation over incoming data within a fixed window size. In the first case, we re-calculate the statistics from scratch using all of the values currently within the window. 
In the second case, we use an adjusted version of Welford's method such that we only need to consider the value entering the window and the oldest value it is replacing.<\/p>\n<p>To run: <code>node IterativeStatsWithWindow.js<\/code><\/p>\n<\/div>\n\n  <\/div>\n  <div class=\"gh-btn-container\"><a class=\"gh-btn\" href=\"https:\/\/github.com\/nestedsoftware\/iterative_stats\" rel=\"noopener noreferrer\">View on GitHub<\/a><\/div>\n<\/div>\n\n\n\n<h2>\n  \n  \n  Related\n<\/h2>\n\n<ul>\n<li><a href=\"https:\/\/dev.to\/nestedsoftware\/the-geometry-of-standard-deviation--3m3o\">The Geometry of Standard Deviation<\/a><\/li>\n<\/ul>\n\n","category":["javascript","math","statistics","slidingaverage"]},{"title":"PyTorch Image Recognition with Convolutional Networks","pubDate":"Mon, 09 Sep 2019 03:19:26 +0000","link":"https:\/\/dev.to\/nestedsoftware\/pytorch-image-recognition-with-convolutional-networks-4k17","guid":"https:\/\/dev.to\/nestedsoftware\/pytorch-image-recognition-with-convolutional-networks-4k17","description":"<p>In the last article, we implemented a simple dense network to recognize MNIST images with PyTorch. In this article, we'll stay with the MNIST recognition task, but this time we'll use convolutional networks, as described in <a href=\"http:\/\/neuralnetworksanddeeplearning.com\/chap6.html\" rel=\"noopener noreferrer\">chapter 6<\/a> of Michael Nielsen's book, <a href=\"http:\/\/neuralnetworksanddeeplearning.com\" rel=\"noopener noreferrer\">Neural Networks and Deep Learning<\/a>. 
For some additional background about convolutional networks, you can also check out my article <a href=\"https:\/\/dev.to\/nestedsoftware\/convolutional-neural-networks-an-intuitive-primer-k1k\">Convolutional Neural Networks: An Intuitive Primer<\/a>.<\/p>\n\n<p>We'll compare our PyTorch implementations to Michael's results using <a href=\"https:\/\/github.com\/mnielsen\/neural-networks-and-deep-learning\/blob\/master\/src\/conv.py\" rel=\"noopener noreferrer\">code<\/a> written with the (now defunct) <a href=\"https:\/\/github.com\/Theano\/Theano\" rel=\"noopener noreferrer\">Theano<\/a> library. You can also take a look at the underlying <a href=\"https:\/\/github.com\/mnielsen\/neural-networks-and-deep-learning\/blob\/master\/src\/network3.py\" rel=\"noopener noreferrer\">framework code<\/a> he developed on top of Theano. PyTorch seems to be more of a \"batteries included\" solution compared to Theano, which makes implementing these networks much simpler. The dense network from the previous article had an accuracy close to <em>98%<\/em>. Our ultimate goal is for our convolutional network to match the <em>99.6%<\/em> accuracy that Michael achieves.<\/p>\n\n<p>The <a href=\"https:\/\/github.com\/nestedsoftware\/pytorch\" rel=\"noopener noreferrer\">code<\/a> for this project is available on GitHub, mainly in <a href=\"https:\/\/github.com\/nestedsoftware\/pytorch\/blob\/master\/pytorch_mnist_convnet.py\" rel=\"noopener noreferrer\">pytorch_mnist_convnet.py<\/a>.<\/p>\n\n<h2>\n  \n  \n  Simple Convolutional Network\n<\/h2>\n\n<p>The first convolutional network design that Michael presents is <a href=\"https:\/\/github.com\/mnielsen\/neural-networks-and-deep-learning\/blob\/master\/src\/conv.py\" rel=\"noopener noreferrer\"><code>basic_conv<\/code><\/a>. 
Our PyTorch implementation is shown below (<a href=\"https:\/\/github.com\/nestedsoftware\/pytorch\/blob\/master\/pytorch_mnist_convnet.py\" rel=\"noopener noreferrer\">pytorch_mnist_convnet.py<\/a>):<br>\n<\/p>\n\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"k\">class<\/span> <span class=\"nc\">ConvNetSimple<\/span><span class=\"p\">(<\/span><span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"n\">Module<\/span><span class=\"p\">):<\/span>\n    <span class=\"k\">def<\/span> <span class=\"nf\">__init__<\/span><span class=\"p\">(<\/span><span class=\"n\">self<\/span><span class=\"p\">):<\/span>\n        <span class=\"nf\">super<\/span><span class=\"p\">().<\/span><span class=\"nf\">__init__<\/span><span class=\"p\">()<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">conv1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Conv2d<\/span><span class=\"p\">(<\/span><span class=\"n\">in_channels<\/span><span class=\"o\">=<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">out_channels<\/span><span class=\"o\">=<\/span><span class=\"mi\">20<\/span><span class=\"p\">,<\/span> <span class=\"n\">kernel_size<\/span><span class=\"o\">=<\/span><span class=\"mi\">5<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">fc1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Linear<\/span><span class=\"p\">(<\/span><span class=\"mi\">12<\/span><span class=\"o\">*<\/span><span class=\"mi\">12<\/span><span class=\"o\">*<\/span><span class=\"mi\">20<\/span><span class=\"p\">,<\/span> <span class=\"mi\">100<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span> <span class=\"o\">=<\/span> 
<span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Linear<\/span><span class=\"p\">(<\/span><span class=\"mi\">100<\/span><span class=\"p\">,<\/span> <span class=\"n\">OUTPUT_SIZE<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">forward<\/span><span class=\"p\">(<\/span><span class=\"n\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">x<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">conv1<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">sigmoid<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">max_pool2d<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"n\">kernel_size<\/span><span class=\"o\">=<\/span><span class=\"mi\">2<\/span><span class=\"p\">,<\/span> <span class=\"n\">stride<\/span><span class=\"o\">=<\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">x<\/span><span class=\"p\">.<\/span><span class=\"nf\">view<\/span><span class=\"p\">(<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">12<\/span><span class=\"o\">*<\/span><span class=\"mi\">12<\/span><span class=\"o\">*<\/span><span class=\"mi\">20<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">fc1<\/span><span 
class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">sigmoid<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">out<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">x<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n<p>In this network, we have 3 layers (not counting the input layer). The image data is sent to a convolutional layer with a <em>5 \u00d7 5<\/em> kernel, <em>1<\/em> input channel, and <em>20<\/em> output channels. The output from this convolutional layer is fed into a dense (aka fully connected) layer of <em>100<\/em> neurons. This dense layer, in turn, feeds into the output layer, which is another dense layer consisting of <em>10<\/em> neurons, each corresponding to one of our possible digits from <em>0<\/em> to <em>9<\/em>.<\/p>\n\n<p>The <code>forward<\/code> method is called when we run input through the network. We use sigmoid activation functions for each of our layers, except for the output layer (we'll look at this in more detail in the next few sections). We also compress the output from our convolutional layer in half by applying <em>2 \u00d7 2<\/em> max pooling to it, with a stride length of <em>2<\/em>. 
The diagram below shows the structure of this network:<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpc1009mnil380qmx9j1a.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpc1009mnil380qmx9j1a.png\" alt=\"convolutional network\" width=\"800\" height=\"243\"><\/a><\/p>\n\n<p>In the previous article, we saw that the data returned by the loader has dimensions <code>torch.Size([10, 1, 28, 28])<\/code>. This means there are <em>10<\/em> images per batch, and each image is represented as a <em>1 \u00d7 28 \u00d7 28<\/em> grid. The <em>1<\/em> means there is a single input channel (the data is in greyscale). The diagram below shows in more detail how the input is processed through the convolutional layer:<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7nl3jva6l69xrvjqy4f2.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7nl3jva6l69xrvjqy4f2.png\" alt=\"convolutional layer with max pooling\" width=\"800\" height=\"341\"><\/a><\/p>\n\n<p>In SciPy, <a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.signal.convolve2d.html\" rel=\"noopener noreferrer\"><code>convolve2d<\/code><\/a> does just what it says: it convolves two 2-d matrices together. The behaviour of <code>torch.nn.Conv2d<\/code> is more complicated. 
The line of code that creates the convolutional layer, <code>self.conv1 = nn.Conv2d(in_channels=1, out_channels=20, kernel_size=5)<\/code>, has a number of parts to it:<\/p>\n\n<ul>\n<li>\n<code>kernel_size<\/code> tells us the 2-d structure of the filter to apply to the input. We can supply this as a tuple if we want it to be a rectangle, but if we specify it as a scalar, as we do here, then that value is used for both the height and width, a <em>5 \u00d7 5<\/em> square in this case.<\/li>\n<li>\n<code>in_channels<\/code> extends the kernel into the 3rd dimension, depth-wise. These three parameters, the height and width of the kernel, and the depth as specified by the number of input channels, define a 3-d matrix. We can convolve the 3-d input with this 3-d filter. With the default stride of <em>1<\/em> and no padding, the result is a <em>24 \u00d7 24<\/em> 2-d matrix, since <em>28 - 5 + 1 = 24<\/em>. This 2-d matrix is a feature map. Each neuron in this feature map detects the same <em>5 \u00d7 5<\/em> feature somewhere in the receptive field of the input.<\/li>\n<li>\n<code>out_channels<\/code> tells us how many filters to use - in other words, how many feature maps we want for the convolutional layer. The 2-d outputs from the convolution of the input with each filter are stacked on top of one another.<\/li>\n<\/ul>\n\n<p>Even though I think of this as a 3-d operation (especially when there is more than one input channel), I guess it's called <code>Conv2d<\/code> in PyTorch to indicate that each channel has a 2-dimensional shape (<code>Conv3d<\/code> is used when each channel has <em>3<\/em> dimensions). I go into more detail about forward and back propagation through convolutional layers in <a href=\"https:\/\/dev.to\/nestedsoftware\/convolutional-neural-networks-an-intuitive-primer-k1k\">Convolutional Neural Networks: An Intuitive Primer<\/a>.<\/p>\n\n<p>Conceptually, each filter produces a feature map, which represents a feature that we're looking for in the receptive field of the input data. 
In this case, that means the network learns <em>20<\/em> distinct <em>5 \u00d7 5<\/em> features. During forward propagation, <code>max_pool2d<\/code> downsamples each feature map. It's applied to each channel independently, turning each <em>24 \u00d7 24<\/em> feature map into a <em>12 \u00d7 12<\/em> matrix. The result is a 3-d matrix with the same depth (<em>20<\/em> channels in this case).<\/p>\n\n<p>Note, as shown below, that <code>Conv2d<\/code> technically performs a cross-correlation rather than a true convolution operation (<code>Conv2d<\/code> calls <code>conv2d<\/code> internally):<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">torch<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">torch<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">nn<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">scipy<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">signal<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">values<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">tensor<\/span><span class=\"p\">([[[[<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span><span class=\"mi\">2<\/span><span class=\"p\">,<\/span><span class=\"mi\">3<\/span><span class=\"p\">],[<\/span><span class=\"mi\">4<\/span><span class=\"p\">,<\/span><span class=\"mi\">5<\/span><span class=\"p\">,<\/span><span class=\"mi\">6<\/span><span class=\"p\">],[<\/span><span class=\"mi\">7<\/span><span class=\"p\">,<\/span><span class=\"mi\">8<\/span><span class=\"p\">,<\/span><span class=\"mi\">9<\/span><span class=\"p\">]]]])<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span 
class=\"p\">.<\/span><span class=\"nf\">tensor<\/span><span class=\"p\">([[[[<\/span><span class=\"mi\">10<\/span><span class=\"p\">,<\/span><span class=\"mi\">20<\/span><span class=\"p\">],[<\/span><span class=\"mi\">30<\/span><span class=\"p\">,<\/span><span class=\"mi\">40<\/span><span class=\"p\">]]]])<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"n\">functional<\/span><span class=\"p\">.<\/span><span class=\"nf\">conv2d<\/span><span class=\"p\">(<\/span><span class=\"n\">values<\/span><span class=\"p\">,<\/span> <span class=\"n\">f<\/span><span class=\"p\">)<\/span>\n<span class=\"nf\">tensor<\/span><span class=\"p\">([[[[<\/span><span class=\"mi\">370<\/span><span class=\"p\">,<\/span> <span class=\"mi\">470<\/span><span class=\"p\">],<\/span>\n          <span class=\"p\">[<\/span><span class=\"mi\">670<\/span><span class=\"p\">,<\/span> <span class=\"mi\">770<\/span><span class=\"p\">]]]])<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">signal<\/span><span class=\"p\">.<\/span><span class=\"nf\">correlate2d<\/span><span class=\"p\">(<\/span><span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span><span class=\"mi\">0<\/span><span class=\"p\">],<\/span> <span class=\"n\">f<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span><span class=\"mi\">0<\/span><span class=\"p\">],<\/span> <span class=\"n\">mode<\/span><span class=\"o\">=<\/span><span class=\"sh\">\"<\/span><span class=\"s\">valid<\/span><span class=\"sh\">\"<\/span><span class=\"p\">)<\/span>\n<span class=\"nf\">array<\/span><span class=\"p\">([[<\/span><span class=\"mi\">370<\/span><span class=\"p\">,<\/span> <span class=\"mi\">470<\/span><span class=\"p\">],<\/span>\n       <span class=\"p\">[<\/span><span class=\"mi\">670<\/span><span class=\"p\">,<\/span> <span class=\"mi\">770<\/span><span class=\"p\">]],<\/span> 
<span class=\"n\">dtype<\/span><span class=\"o\">=<\/span><span class=\"n\">int64<\/span><span class=\"p\">)<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n<h2>\n  \n  \n  Softmax\n<\/h2>\n\n<p>We want the output to indicate which digit the image corresponds to. In other words, we want the output for the correct prediction to be as close to <em>1<\/em> as possible, and for the rest of the outputs to be as close to <em>0<\/em> as possible.<\/p>\n\n<p>First, we will normalize our outputs so that they add up to <em>1<\/em>, thus turning our output into a probability distribution. The simplest way to normalize our outputs would be to divide each output by the sum of all of the outputs (<em>N<\/em> is the number of outputs):<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zof13kfjrm4g8qlubwc.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zof13kfjrm4g8qlubwc.png\" alt=\"normalize output\" width=\"301\" height=\"162\"><\/a><\/p>\n\n<p>We will use a function called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Softmax_function\" rel=\"noopener noreferrer\"><em>softmax<\/em><\/a> instead. 
With softmax, we adjust the above formula by applying the exponential function to each output:<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5yzyevqjdl735c2tyzd.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5yzyevqjdl735c2tyzd.png\" alt=\"softmax output\" width=\"288\" height=\"163\"><\/a><\/p>\n\n<p>Why should we do this? I don't think Michael compares softmax with the simple linear normalization shown earlier. One benefit is that, with softmax, the highest output value will get an exponentially greater proportion of the total. This encourages our network to more sharply favour the highest output over all of the others. This approach also has the advantage that any negative outputs will be automatically converted to positive values - since the exponential function returns a positive value for any input (it approaches <em>0<\/em> as <em>x<\/em> goes to negative infinity).<\/p>\n\n<p>You may also want to see what Michael has to say about <a href=\"http:\/\/neuralnetworksanddeeplearning.com\/chap3.html#softmax\" rel=\"noopener noreferrer\">softmax<\/a> in <a href=\"http:\/\/neuralnetworksanddeeplearning.com\" rel=\"noopener noreferrer\">Neural Networks and Deep Learning<\/a>, as he goes into some interesting additional discussion of its properties.<\/p>\n<h2>\n  \n  \n  Negative Log Likelihood Loss\n<\/h2>\n\n<p>Once we have the output transformed with softmax, we need to compute the loss. For this, we'll use the <em>negative log likelihood<\/em> loss function. 
For the target value, where we want the probability to be close to <em>1<\/em>, the loss is <em>f(x) = -ln(x)<\/em>, where <em>x<\/em> is the network's output (after softmax) for the desired prediction. Why should we use the negative log instead of our old friend, the quadratic cost function? I found it helpful to compare negative log against the quadratic cost, <em>f(x) = (x - 1)<sup>2<\/sup><\/em>:<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk7i19ugvw50ez7nndzyc.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk7i19ugvw50ez7nndzyc.png\" alt=\"loss function comparison\" width=\"457\" height=\"612\"><\/a><\/p>\n\n<p>We can see that the cost goes down to <em>0<\/em> for both functions as the output approaches <em>1<\/em>, which is what we want. The advantage of the log likelihood over quadratic cost is that the cost for log likelihood rises much faster as the output moves away from <em>1<\/em> and toward <em>0<\/em>. That means the gradient gets much larger the farther the output is from the target: the derivative of <em>-ln(x)<\/em> is <em>-1\/x<\/em>, which grows without bound as <em>x<\/em> approaches <em>0<\/em>, whereas the derivative of the quadratic cost, <em>2(x - 1)<\/em>, never exceeds <em>2<\/em> in magnitude for outputs between <em>0<\/em> and <em>1<\/em>. This should help our network learn faster, especially when its predictions are badly wrong.<\/p>\n<h2>\n  \n  \n  Cross Entropy Loss\n<\/h2>\n\n<p>There are several ways that we could compute the negative log likelihood loss. We could run our output through softmax ourselves, then compute the loss with a custom loss function that applies the negative log to the output. This is what Michael Nielsen's Theano code does. However, the simplest way to do it in PyTorch is just to use <a href=\"https:\/\/pytorch.org\/docs\/stable\/nn.html#crossentropyloss\" rel=\"noopener noreferrer\"><code>CrossEntropyLoss<\/code><\/a>.  
<code>CrossEntropyLoss<\/code> does everything for us, including applying softmax to the output; that's why we don't do it ourselves, as mentioned earlier.<\/p>\n\n<p><code>CrossEntropyLoss()<\/code> produces a loss function that takes two parameters: the raw outputs from the network, and the index of the correct prediction for each image in the batch. In our case, we can use the target digit as this index: if the image corresponds to the number <em>3<\/em>, then the output from the network that we want to increase is <code>output[3]<\/code>.<\/p>\n\n<blockquote>\n<p>During backpropagation, the loss itself only depends on the probability assigned to the correct prediction, so the gradient of the loss with respect to the other (log) probabilities is zero. However, softmax couples all of the outputs together, so the gradient still reaches every raw output: for each output, it works out to that output's softmax probability, minus <em>1<\/em> for the correct prediction. Gradient descent therefore pushes the correct output up and all of the other outputs down. Any increase in the probability of the correct output is balanced by a decrease in the others, since the probabilities must still add up to <em>1<\/em>.<\/p>\n<\/blockquote>\n\n<p>To demonstrate how <code>CrossEntropyLoss<\/code> relates to softmax and negative log likelihood, let's say we've got an output of <code>[0.2, 0.4, 0.9]<\/code> for some network. We want the 3rd output, currently <em>0.9<\/em>, to be the correct one, i.e. we want its softmax probability to increase toward <em>1<\/em>. 
The REPL session below shows several loss calculations that produce the same result: We apply softmax followed by negative log; we take the negative value of <code>log_softmax<\/code>; we compute <code>NLLLoss<\/code> after <code>log_softmax<\/code>; we use <code>CrossEntropyLoss<\/code> with the raw output:<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">torch<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">torch<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">nn<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">output<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">tensor<\/span><span class=\"p\">([[<\/span><span class=\"mf\">0.2<\/span><span class=\"p\">,<\/span> <span class=\"mf\">0.4<\/span><span class=\"p\">,<\/span> <span class=\"mf\">0.9<\/span><span class=\"p\">]])<\/span> <span class=\"c1\"># raw output doesn't add up to 1\n<\/span><span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">output_after_softmax<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">softmax<\/span><span class=\"p\">(<\/span><span class=\"n\">output<\/span><span class=\"p\">,<\/span> <span class=\"n\">dim<\/span><span class=\"o\">=<\/span><span class=\"mi\">1<\/span><span class=\"p\">)<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">output_after_softmax<\/span>\n<span class=\"nf\">tensor<\/span><span class=\"p\">([[<\/span><span class=\"mf\">0.2361<\/span><span class=\"p\">,<\/span> <span class=\"mf\">0.2884<\/span><span class=\"p\">,<\/span> <span class=\"mf\">0.4755<\/span><span class=\"p\">]])<\/span> <span class=\"c1\"># output adds up to 1 after softmax\n<\/span><span class=\"o\">&gt;&gt;&gt;<\/span> <span 
class=\"n\">negative_log_likelihood<\/span> <span class=\"o\">=<\/span> <span class=\"o\">-<\/span><span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">log<\/span><span class=\"p\">(<\/span><span class=\"n\">output_after_softmax<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span><span class=\"mi\">2<\/span><span class=\"p\">])<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">negative_log_likelihood<\/span>\n<span class=\"nf\">tensor<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.7434<\/span><span class=\"p\">)<\/span> <span class=\"c1\"># loss for target\n<\/span><span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">output_after_log_softmax<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">log_softmax<\/span><span class=\"p\">(<\/span><span class=\"n\">output<\/span><span class=\"p\">,<\/span> <span class=\"n\">dim<\/span><span class=\"o\">=<\/span><span class=\"mi\">1<\/span><span class=\"p\">)<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">output_after_log_softmax_3rd_item<\/span> <span class=\"o\">=<\/span> <span class=\"n\">output_after_log_softmax<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span><span class=\"mi\">2<\/span><span class=\"p\">]<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">output_after_log_softmax_3rd_item<\/span> <span class=\"o\">*<\/span> <span class=\"o\">-<\/span><span class=\"mi\">1<\/span>\n<span class=\"nf\">tensor<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.7434<\/span><span class=\"p\">)<\/span> <span class=\"c1\"># loss for target is same as above\n<\/span><span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">negative_log_likelihood_loss<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">NLLLoss<\/span><span 
class=\"p\">()<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"nf\">negative_log_likelihood_loss<\/span><span class=\"p\">(<\/span><span class=\"n\">output_after_log_softmax<\/span><span class=\"p\">,<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">tensor<\/span><span class=\"p\">([<\/span><span class=\"mi\">2<\/span><span class=\"p\">]))<\/span>\n<span class=\"nf\">tensor<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.7434<\/span><span class=\"p\">)<\/span> <span class=\"c1\"># loss for target is same as above\n<\/span><span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">cross_entropy_loss<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">CrossEntropyLoss<\/span><span class=\"p\">()<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"nf\">cross_entropy_loss<\/span><span class=\"p\">(<\/span><span class=\"n\">output<\/span><span class=\"p\">,<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">tensor<\/span><span class=\"p\">([<\/span><span class=\"mi\">2<\/span><span class=\"p\">]))<\/span>\n<span class=\"nf\">tensor<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.7434<\/span><span class=\"p\">)<\/span> <span class=\"c1\"># loss for target is same as above\n<\/span><\/code><\/pre>\n\n<\/div>\n\n\n<p>We can see that all of the above calculations produce the same loss value for our desired output. <code>CrossEntropyLoss<\/code> uses <code>torch.log_softmax<\/code> behind the scenes. The advantage of using <code>log_softmax<\/code> is that it is more numerically stable (i.e. 
deals with floating point precision better) than calculating <code>softmax<\/code> first, then applying <code>log<\/code> to the result as a separate step.<\/p>\n<h2>\n  \n  \n  Results for Simple Convolutional Network\n<\/h2>\n\n<p>The code below performs a training run for our network (<a href=\"https:\/\/github.com\/nestedsoftware\/pytorch\/blob\/master\/pytorch_mnist_convnet.py\" rel=\"noopener noreferrer\">pytorch_mnist_convnet.py<\/a>):<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"k\">def<\/span> <span class=\"nf\">train_and_test_network<\/span><span class=\"p\">(<\/span><span class=\"n\">net<\/span><span class=\"p\">,<\/span> <span class=\"n\">num_epochs<\/span><span class=\"o\">=<\/span><span class=\"mi\">60<\/span><span class=\"p\">,<\/span> <span class=\"n\">lr<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.1<\/span><span class=\"p\">,<\/span> <span class=\"n\">wd<\/span><span class=\"o\">=<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span>\n                           <span class=\"n\">loss_function<\/span><span class=\"o\">=<\/span><span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">CrossEntropyLoss<\/span><span class=\"p\">(),<\/span>\n                           <span class=\"n\">train_loader<\/span><span class=\"o\">=<\/span><span class=\"nf\">get_train_loader<\/span><span class=\"p\">(),<\/span>\n                           <span class=\"n\">test_loader<\/span><span class=\"o\">=<\/span><span class=\"nf\">get_test_loader<\/span><span class=\"p\">()):<\/span>\n    <span class=\"n\">sgd<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"n\">optim<\/span><span class=\"p\">.<\/span><span class=\"nc\">SGD<\/span><span class=\"p\">(<\/span><span class=\"n\">net<\/span><span class=\"p\">.<\/span><span class=\"nf\">parameters<\/span><span class=\"p\">(),<\/span> <span class=\"n\">lr<\/span><span 
class=\"o\">=<\/span><span class=\"n\">lr<\/span><span class=\"p\">,<\/span> <span class=\"n\">weight_decay<\/span><span class=\"o\">=<\/span><span class=\"n\">wd<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"nf\">train_network<\/span><span class=\"p\">(<\/span><span class=\"n\">net<\/span><span class=\"p\">,<\/span> <span class=\"n\">train_loader<\/span><span class=\"p\">,<\/span> <span class=\"n\">num_epochs<\/span><span class=\"p\">,<\/span> <span class=\"n\">loss_function<\/span><span class=\"p\">,<\/span> <span class=\"n\">sgd<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"nf\">print<\/span><span class=\"p\">(<\/span><span class=\"sh\">\"\"<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"nf\">test_network<\/span><span class=\"p\">(<\/span><span class=\"n\">net<\/span><span class=\"p\">,<\/span> <span class=\"n\">test_loader<\/span><span class=\"p\">)<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n<p>We can see that we use <code>CrossEntropyLoss<\/code> by default to compute the loss. 
Let's train our simple network on the MNIST dataset:<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">pytorch_mnist_convnet<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">train_and_test_network<\/span><span class=\"p\">,<\/span> <span class=\"n\">ConvNetSimple<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">net<\/span> <span class=\"o\">=<\/span> <span class=\"nc\">ConvNetSimple<\/span><span class=\"p\">()<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"nf\">train_and_test_network<\/span><span class=\"p\">(<\/span><span class=\"n\">net<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">Test<\/span> <span class=\"n\">data<\/span> <span class=\"n\">results<\/span><span class=\"p\">:<\/span> <span class=\"mf\">0.9897<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n<p>After 60 epochs, with a learning rate of <em>0.1<\/em>, we get an accuracy of <em>98.97%<\/em>. Michael Nielsen reports <em>98.78%<\/em>, so our network seems to be in the right ballpark.<\/p>\n<h2>\n  \n  \n  Add a Second Convolutional Layer\n<\/h2>\n\n<p>The next convolutional network Michael presents, <a href=\"https:\/\/github.com\/mnielsen\/neural-networks-and-deep-learning\/blob\/master\/src\/conv.py\" rel=\"noopener noreferrer\"><code>dbl_conv<\/code><\/a>, adds a second convolutional layer. 
The code below shows the structure of this network in PyTorch (<a href=\"https:\/\/github.com\/nestedsoftware\/pytorch\/blob\/master\/pytorch_mnist_convnet.py\" rel=\"noopener noreferrer\">pytorch_mnist_convnet.py<\/a>):<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"k\">class<\/span> <span class=\"nc\">ConvNetTwoConvLayers<\/span><span class=\"p\">(<\/span><span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"n\">Module<\/span><span class=\"p\">):<\/span>\n    <span class=\"k\">def<\/span> <span class=\"nf\">__init__<\/span><span class=\"p\">(<\/span><span class=\"n\">self<\/span><span class=\"p\">):<\/span>\n        <span class=\"nf\">super<\/span><span class=\"p\">().<\/span><span class=\"nf\">__init__<\/span><span class=\"p\">()<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">conv1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Conv2d<\/span><span class=\"p\">(<\/span><span class=\"n\">in_channels<\/span><span class=\"o\">=<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">out_channels<\/span><span class=\"o\">=<\/span><span class=\"mi\">20<\/span><span class=\"p\">,<\/span> <span class=\"n\">kernel_size<\/span><span class=\"o\">=<\/span><span class=\"mi\">5<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">conv2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Conv2d<\/span><span class=\"p\">(<\/span><span class=\"n\">in_channels<\/span><span class=\"o\">=<\/span><span class=\"mi\">20<\/span><span class=\"p\">,<\/span> <span class=\"n\">out_channels<\/span><span class=\"o\">=<\/span><span class=\"mi\">40<\/span><span class=\"p\">,<\/span> <span class=\"n\">kernel_size<\/span><span class=\"o\">=<\/span><span 
class=\"mi\">5<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">fc1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Linear<\/span><span class=\"p\">(<\/span><span class=\"mi\">4<\/span><span class=\"o\">*<\/span><span class=\"mi\">4<\/span><span class=\"o\">*<\/span><span class=\"mi\">40<\/span><span class=\"p\">,<\/span> <span class=\"mi\">100<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Linear<\/span><span class=\"p\">(<\/span><span class=\"mi\">100<\/span><span class=\"p\">,<\/span> <span class=\"n\">OUTPUT_SIZE<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">forward<\/span><span class=\"p\">(<\/span><span class=\"n\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">x<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">conv1<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">sigmoid<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">max_pool2d<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"n\">kernel_size<\/span><span class=\"o\">=<\/span><span class=\"mi\">2<\/span><span class=\"p\">,<\/span> <span class=\"n\">stride<\/span><span class=\"o\">=<\/span><span 
class=\"mi\">2<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">conv2<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">sigmoid<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">max_pool2d<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"n\">kernel_size<\/span><span class=\"o\">=<\/span><span class=\"mi\">2<\/span><span class=\"p\">,<\/span> <span class=\"n\">stride<\/span><span class=\"o\">=<\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">x<\/span><span class=\"p\">.<\/span><span class=\"nf\">view<\/span><span class=\"p\">(<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">4<\/span><span class=\"o\">*<\/span><span class=\"mi\">4<\/span><span class=\"o\">*<\/span><span class=\"mi\">40<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">fc1<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">sigmoid<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span 
class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">out<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">x<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n<p>The diagram below shows how the output from the first convolutional layer is fed into the second one.<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1dgdmf1108s2z8jbkqd.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1dgdmf1108s2z8jbkqd.png\" alt=\"second convolutional layer\" width=\"800\" height=\"341\"><\/a><\/p>\n\n<p>The previous convolutional layer learns <em>20<\/em> distinct features from the input image. We now take these <em>20<\/em> feature maps and send them as input to the second convolutional layer. For each filter in the second convolutional layer, this does two things:<\/p>\n\n<ul>\n<li>For each incoming channel, we combine adjacent features that fall within the filter's <em>5 \u00d7 5<\/em> window.<\/li>\n<li>We then sum these per-channel results across all <em>20<\/em> channels (and add a bias). The result is a 2-dimensional feature map.<\/li>\n<\/ul>\n\n<p>Each feature map corresponds to a different combination of features from the previous layer, based on the weights for its specific filter. After max pooling, we end up with a <em>4 \u00d7 4<\/em> grid of feature neurons. Each neuron here represents a complicated aggregation of a <em>16 \u00d7 16<\/em> region of pixels from the original image, and the regions of neighbouring neurons are offset from one another by <em>4<\/em> pixels. 
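<\/p>\n\n<p>One way to double-check these numbers is the short sketch below. It is not part of the original <code>pytorch_mnist_convnet.py<\/code>. It assumes a single <em>28 \u00d7 28<\/em> grayscale MNIST image and builds two stand-alone layers with the same settings as <code>conv1<\/code> and <code>conv2<\/code> above, then prints the shape after each convolution and pooling step. The helper <code>receptive_field<\/code> is hypothetical and written only for illustration. It treats each layer as a (kernel size, stride) pair and works out how many input pixels one final neuron sees, and how far apart neighbouring neurons are in the input:<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code>import torch\nfrom torch import nn\n\n# Same layer settings as ConvNetTwoConvLayers (the weights here are random).\nconv1 = nn.Conv2d(in_channels=1, out_channels=20, kernel_size=5)\nconv2 = nn.Conv2d(in_channels=20, out_channels=40, kernel_size=5)\n\nx = torch.zeros(1, 1, 28, 28)  # a batch containing one blank 28x28 image\nx = torch.max_pool2d(conv1(x), kernel_size=2, stride=2)\nprint(x.shape)  # torch.Size([1, 20, 12, 12])\nx = torch.max_pool2d(conv2(x), kernel_size=2, stride=2)\nprint(x.shape)  # torch.Size([1, 40, 4, 4])\n\n\ndef receptive_field(layers):\n    # layers: list of (kernel_size, stride) pairs, ordered from input to output.\n    # size: width in input pixels seen by one output neuron.\n    # jump: distance in input pixels between neighbouring output neurons.\n    size, jump = 1, 1\n    for kernel_size, stride in layers:\n        size += (kernel_size - 1) * jump\n        jump *= stride\n    return size, jump\n\n\n# conv1 (5, 1), max pool (2, 2), conv2 (5, 1), max pool (2, 2)\nprint(receptive_field([(5, 1), (2, 2), (5, 1), (2, 2)]))  # (16, 4)\n<\/code><\/pre>\n\n<\/div>\n\n\n<p>The sigmoid activations are left out of this sketch, since they don't change the shapes.<\/p>\n\n<p>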
Since we've got <em>40<\/em> filters (the number of outgoing channels), we end up with <em>40<\/em> such feature maps as the output from the second convolutional layer.<\/p>\n<h2>\n  \n  \n  Results for Two Convolutional Layers\n<\/h2>\n\n<p>The only difference between this network and the previous one is the additional convolutional layer. Let's train this network on the MNIST dataset:<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">pytorch_mnist_convnet<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">ConvNetTwoConvLayers<\/span><span class=\"p\">,<\/span> <span class=\"n\">train_and_test_network<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">net<\/span> <span class=\"o\">=<\/span> <span class=\"nc\">ConvNetTwoConvLayers<\/span><span class=\"p\">()<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"nf\">train_and_test_network<\/span><span class=\"p\">(<\/span><span class=\"n\">net<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">Test<\/span> <span class=\"n\">data<\/span> <span class=\"n\">results<\/span><span class=\"p\">:<\/span> <span class=\"mf\">0.9905<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n<p>After 60 epochs, with a learning rate of <em>0.1<\/em>, we get an accuracy of <em>99.05%<\/em>. Michael Nielsen reports <em>99.06%<\/em>, so this time the results are really close.<\/p>\n<h2>\n  \n  \n  Replace Sigmoid with ReLU\n<\/h2>\n\n<p>The next network, <a href=\"https:\/\/github.com\/mnielsen\/neural-networks-and-deep-learning\/blob\/master\/src\/conv.py\" rel=\"noopener noreferrer\"><code>dbl_conv_relu<\/code><\/a>,  replaces the sigmoid activations with rectified linear units, or <a href=\"https:\/\/en.wikipedia.org\/wiki\/Rectifier_(neural_networks)\" rel=\"noopener noreferrer\"><em>ReLU<\/em><\/a>. 
Our PyTorch version is shown below (<a href=\"https:\/\/github.com\/nestedsoftware\/pytorch\/blob\/master\/pytorch_mnist_convnet.py\" rel=\"noopener noreferrer\">pytorch_mnist_convnet.py<\/a>):<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"k\">class<\/span> <span class=\"nc\">ConvNetTwoConvLayersReLU<\/span><span class=\"p\">(<\/span><span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"n\">Module<\/span><span class=\"p\">):<\/span>\n    <span class=\"k\">def<\/span> <span class=\"nf\">__init__<\/span><span class=\"p\">(<\/span><span class=\"n\">self<\/span><span class=\"p\">):<\/span>\n        <span class=\"nf\">super<\/span><span class=\"p\">().<\/span><span class=\"nf\">__init__<\/span><span class=\"p\">()<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">conv1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Conv2d<\/span><span class=\"p\">(<\/span><span class=\"n\">in_channels<\/span><span class=\"o\">=<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">out_channels<\/span><span class=\"o\">=<\/span><span class=\"mi\">20<\/span><span class=\"p\">,<\/span> <span class=\"n\">kernel_size<\/span><span class=\"o\">=<\/span><span class=\"mi\">5<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">conv2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Conv2d<\/span><span class=\"p\">(<\/span><span class=\"n\">in_channels<\/span><span class=\"o\">=<\/span><span class=\"mi\">20<\/span><span class=\"p\">,<\/span> <span class=\"n\">out_channels<\/span><span class=\"o\">=<\/span><span class=\"mi\">40<\/span><span class=\"p\">,<\/span> <span class=\"n\">kernel_size<\/span><span class=\"o\">=<\/span><span class=\"mi\">5<\/span><span 
class=\"p\">)<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">fc1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Linear<\/span><span class=\"p\">(<\/span><span class=\"mi\">4<\/span><span class=\"o\">*<\/span><span class=\"mi\">4<\/span><span class=\"o\">*<\/span><span class=\"mi\">40<\/span><span class=\"p\">,<\/span> <span class=\"mi\">100<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Linear<\/span><span class=\"p\">(<\/span><span class=\"mi\">100<\/span><span class=\"p\">,<\/span> <span class=\"n\">OUTPUT_SIZE<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">forward<\/span><span class=\"p\">(<\/span><span class=\"n\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">x<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">conv1<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">relu<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">max_pool2d<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"n\">kernel_size<\/span><span class=\"o\">=<\/span><span class=\"mi\">2<\/span><span class=\"p\">,<\/span> <span class=\"n\">stride<\/span><span class=\"o\">=<\/span><span class=\"mi\">2<\/span><span 
class=\"p\">)<\/span>\n\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">conv2<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">relu<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">max_pool2d<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"n\">kernel_size<\/span><span class=\"o\">=<\/span><span class=\"mi\">2<\/span><span class=\"p\">,<\/span> <span class=\"n\">stride<\/span><span class=\"o\">=<\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">x<\/span><span class=\"p\">.<\/span><span class=\"nf\">view<\/span><span class=\"p\">(<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">4<\/span><span class=\"o\">*<\/span><span class=\"mi\">4<\/span><span class=\"o\">*<\/span><span class=\"mi\">40<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">fc1<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">relu<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span 
class=\"p\">.<\/span><span class=\"nf\">out<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">x<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n<p>ReLU is discussed near the end of <a href=\"http:\/\/neuralnetworksanddeeplearning.com\/chap3.html\" rel=\"noopener noreferrer\">chapter 3<\/a> of <a href=\"http:\/\/neuralnetworksanddeeplearning.com\" rel=\"noopener noreferrer\">Neural Networks and Deep Learning<\/a>. The main advantage of ReLU seems to be that, unlike sigmoid, it doesn't saturate for large positive inputs, so it doesn't squash the gradient to a value near <em>0<\/em>. This can make it easier to increase the depth, i.e., the number of layers, of our networks. With sigmoid, multiplying many of these small gradients together during backpropagation via the chain rule can lead to the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Vanishing_gradient_problem\" rel=\"noopener noreferrer\">vanishing gradient problem<\/a>.<\/p>\n<h2>\n  \n  \n  Results for ReLU with L2 Regularization\n<\/h2>\n\n<p>Michael reports a classification accuracy of <em>99.23%<\/em> using a learning rate of <em>0.03<\/em> and an <a href=\"http:\/\/neuralnetworksanddeeplearning.com\/chap3.html#regularization\" rel=\"noopener noreferrer\"><em>L2<\/em> regularization<\/a> parameter of <em>0.1<\/em>. I tried to replicate these results. 
However, with <em>0.1<\/em> as the weight decay value, my results were significantly worse, hovering at around <em>85%<\/em>:<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">pytorch_mnist_convnet<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">train_and_test_network<\/span><span class=\"p\">,<\/span> <span class=\"n\">ConvNetTwoConvLayersReLU<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">net<\/span> <span class=\"o\">=<\/span> <span class=\"nc\">ConvNetTwoConvLayersReLU<\/span><span class=\"p\">()<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"nf\">train_and_test_network<\/span><span class=\"p\">(<\/span><span class=\"n\">net<\/span><span class=\"p\">,<\/span> <span class=\"n\">lr<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.03<\/span><span class=\"p\">,<\/span> <span class=\"n\">wd<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.1<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">Test<\/span> <span class=\"n\">data<\/span> <span class=\"n\">results<\/span><span class=\"p\">:<\/span> <span class=\"mf\">0.8531<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n<p>After playing around a bit, I got much better results with weight decay set to <em>0.00005<\/em>:<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">pytorch_mnist_convnet<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">train_and_test_network<\/span><span class=\"p\">,<\/span> <span class=\"n\">ConvNetTwoConvLayersReLU<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">net<\/span> <span class=\"o\">=<\/span> <span class=\"nc\">ConvNetTwoConvLayersReLU<\/span><span class=\"p\">()<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span 
class=\"nf\">train_and_test_network<\/span><span class=\"p\">(<\/span><span class=\"n\">net<\/span><span class=\"p\">,<\/span> <span class=\"n\">lr<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.03<\/span><span class=\"p\">,<\/span> <span class=\"n\">wd<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.00005<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">Test<\/span> <span class=\"n\">data<\/span> <span class=\"n\">results<\/span><span class=\"p\">:<\/span> <span class=\"mf\">0.9943<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n<p>Here we get <em>99.43%<\/em>, comparable to, and actually a bit better than, Michael's reported value of <em>99.23%<\/em>.<\/p>\n<h2>\n  \n  \n  Expand the Training Data\n<\/h2>\n\n<p>Michael next brings up another technique that can improve training: expanding the training data. He uses a very simple approach, shifting each image in the training set by a single pixel. Each original image thus yields 4 additional images, shifted one pixel to the right, left, up, and down. 
The code below generates the expanded dataset (<a href=\"https:\/\/github.com\/nestedsoftware\/pytorch\/blob\/master\/common.py\" rel=\"noopener noreferrer\">common.py<\/a>):<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"k\">def<\/span> <span class=\"nf\">identity<\/span><span class=\"p\">(<\/span><span class=\"n\">tensor<\/span><span class=\"p\">):<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">tensor<\/span>\n\n\n<span class=\"k\">def<\/span> <span class=\"nf\">shift_right<\/span><span class=\"p\">(<\/span><span class=\"n\">tensor<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">shifted<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">roll<\/span><span class=\"p\">(<\/span><span class=\"n\">tensor<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">shifted<\/span><span class=\"p\">[:,<\/span> <span class=\"mi\">0<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.0<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">shifted<\/span>\n\n\n<span class=\"k\">def<\/span> <span class=\"nf\">shift_left<\/span><span class=\"p\">(<\/span><span class=\"n\">tensor<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">shifted<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">roll<\/span><span class=\"p\">(<\/span><span class=\"n\">tensor<\/span><span class=\"p\">,<\/span> <span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">shifted<\/span><span class=\"p\">[:,<\/span> <span class=\"n\">IMAGE_WIDTH<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">]<\/span> 
<span class=\"o\">=<\/span> <span class=\"mf\">0.0<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">shifted<\/span>\n\n\n<span class=\"k\">def<\/span> <span class=\"nf\">shift_up<\/span><span class=\"p\">(<\/span><span class=\"n\">tensor<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">shifted<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">roll<\/span><span class=\"p\">(<\/span><span class=\"n\">tensor<\/span><span class=\"p\">,<\/span> <span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">0<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">shifted<\/span><span class=\"p\">[<\/span><span class=\"n\">IMAGE_WIDTH<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"p\">:]<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.0<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">shifted<\/span>\n\n\n<span class=\"k\">def<\/span> <span class=\"nf\">shift_down<\/span><span class=\"p\">(<\/span><span class=\"n\">tensor<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">shifted<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">roll<\/span><span class=\"p\">(<\/span><span class=\"n\">tensor<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">0<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">shifted<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span> <span class=\"p\">:]<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.0<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">shifted<\/span>\n\n\n<span class=\"k\">def<\/span> <span class=\"nf\">get_extended_dataset<\/span><span class=\"p\">(<\/span><span class=\"n\">root<\/span><span class=\"o\">=<\/span><span 
class=\"sh\">\"<\/span><span class=\"s\">.\/data<\/span><span class=\"sh\">\"<\/span><span class=\"p\">,<\/span> <span class=\"n\">train<\/span><span class=\"o\">=<\/span><span class=\"bp\">True<\/span><span class=\"p\">,<\/span> <span class=\"n\">transform<\/span><span class=\"o\">=<\/span><span class=\"n\">transformations<\/span><span class=\"p\">,<\/span>\n                         <span class=\"n\">download<\/span><span class=\"o\">=<\/span><span class=\"bp\">True<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">training_dataset<\/span> <span class=\"o\">=<\/span> <span class=\"n\">datasets<\/span><span class=\"p\">.<\/span><span class=\"nc\">MNIST<\/span><span class=\"p\">(<\/span><span class=\"n\">root<\/span><span class=\"o\">=<\/span><span class=\"n\">root<\/span><span class=\"p\">,<\/span> <span class=\"n\">train<\/span><span class=\"o\">=<\/span><span class=\"n\">train<\/span><span class=\"p\">,<\/span>\n                                      <span class=\"n\">transform<\/span><span class=\"o\">=<\/span><span class=\"n\">transform<\/span><span class=\"p\">,<\/span> <span class=\"n\">download<\/span><span class=\"o\">=<\/span><span class=\"n\">download<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">shift_operations<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"n\">identity<\/span><span class=\"p\">,<\/span> <span class=\"n\">shift_right<\/span><span class=\"p\">,<\/span> <span class=\"n\">shift_left<\/span><span class=\"p\">,<\/span> <span class=\"n\">shift_up<\/span><span class=\"p\">,<\/span> <span class=\"n\">shift_down<\/span><span class=\"p\">]<\/span>\n    <span class=\"n\">extended_dataset<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[]<\/span>\n    <span class=\"k\">for<\/span> <span class=\"n\">image<\/span><span class=\"p\">,<\/span> <span class=\"n\">expected_value<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">training_dataset<\/span><span class=\"p\">:<\/span>\n        <span 
class=\"k\">for<\/span> <span class=\"n\">shift<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">shift_operations<\/span><span class=\"p\">:<\/span>\n            <span class=\"n\">shifted_image<\/span> <span class=\"o\">=<\/span> <span class=\"nf\">shift<\/span><span class=\"p\">(<\/span><span class=\"n\">image<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]).<\/span><span class=\"nf\">unsqueeze<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">)<\/span>\n            <span class=\"n\">extended_dataset<\/span><span class=\"p\">.<\/span><span class=\"nf\">append<\/span><span class=\"p\">((<\/span><span class=\"n\">shifted_image<\/span><span class=\"p\">,<\/span> <span class=\"n\">expected_value<\/span><span class=\"p\">))<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">extended_dataset<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n<h2>\n  \n  \n  Results with Expanded Training Data\n<\/h2>\n\n<p>Continuing with the same network, Michael reports <em>99.37%<\/em> accuracy using the extended data. 
Let's try it:<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">pytorch_mnist_convnet<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">train_and_test_network<\/span><span class=\"p\">,<\/span> <span class=\"n\">ConvNetTwoConvLayersReLU<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">common<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">get_extended_train_loader<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">train_loader<\/span> <span class=\"o\">=<\/span> <span class=\"nf\">get_extended_train_loader<\/span><span class=\"p\">()<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">net<\/span> <span class=\"o\">=<\/span> <span class=\"nc\">ConvNetTwoConvLayersReLU<\/span><span class=\"p\">()<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"nf\">train_and_test_network<\/span><span class=\"p\">(<\/span><span class=\"n\">net<\/span><span class=\"p\">,<\/span> <span class=\"n\">lr<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.03<\/span><span class=\"p\">,<\/span> <span class=\"n\">wd<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.00005<\/span><span class=\"p\">,<\/span> <span class=\"n\">train_loader<\/span><span class=\"o\">=<\/span><span class=\"n\">train_loader<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">Test<\/span> <span class=\"n\">data<\/span> <span class=\"n\">results<\/span><span class=\"p\">:<\/span> <span class=\"mf\">0.9951<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n<p>We get <em>99.51%<\/em>, a modest improvement on the <em>99.43%<\/em> accuracy we obtained without extending the data.<\/p>\n\n<blockquote>\n<p>Per Michael's book, a more sophisticated approach for algorithmically extending the training data is described in <a href=\"https:\/\/ieeexplore.ieee.org\/document\/1227801\" 
rel=\"noopener noreferrer\">Best practices for convolutional neural networks applied to visual document analysis<\/a>.<\/p>\n<\/blockquote>\n<h2>\n  \n  \n  Add Fully Connected Layer and Dropout\n<\/h2>\n\n<p>The last network we'll look at is <a href=\"https:\/\/github.com\/mnielsen\/neural-networks-and-deep-learning\/blob\/master\/src\/conv.py\" rel=\"noopener noreferrer\"><code>double_fc_dropout<\/code><\/a>. We replace the single dense layer of <em>100<\/em> neurons with two dense layers of <em>1,000<\/em> neurons each. To reduce overfitting, we also add <a href=\"http:\/\/neuralnetworksanddeeplearning.com\/chap3.html#other_techniques_for_regularization\" rel=\"noopener noreferrer\">dropout<\/a>. During training, dropout temporarily excludes a random subset of the neurons in a layer from both forward and backward propagation. In our case, each neuron in a layer with dropout has a <em>50%<\/em> chance of being excluded.<\/p>\n\n<p>Our PyTorch version is shown below (<a href=\"https:\/\/github.com\/nestedsoftware\/pytorch\/blob\/master\/pytorch_mnist_convnet.py\" rel=\"noopener noreferrer\">pytorch_mnist_convnet.py<\/a>):<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"k\">class<\/span> <span class=\"nc\">ConvNetTwoConvTwoDenseLayersWithDropout<\/span><span class=\"p\">(<\/span><span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"n\">Module<\/span><span class=\"p\">):<\/span>\n    <span class=\"k\">def<\/span> <span class=\"nf\">__init__<\/span><span class=\"p\">(<\/span><span class=\"n\">self<\/span><span class=\"p\">):<\/span>\n        <span class=\"nf\">super<\/span><span class=\"p\">().<\/span><span class=\"nf\">__init__<\/span><span class=\"p\">()<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">conv1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Conv2d<\/span><span 
class=\"p\">(<\/span><span class=\"n\">in_channels<\/span><span class=\"o\">=<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">out_channels<\/span><span class=\"o\">=<\/span><span class=\"mi\">20<\/span><span class=\"p\">,<\/span> <span class=\"n\">kernel_size<\/span><span class=\"o\">=<\/span><span class=\"mi\">5<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">conv2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Conv2d<\/span><span class=\"p\">(<\/span><span class=\"n\">in_channels<\/span><span class=\"o\">=<\/span><span class=\"mi\">20<\/span><span class=\"p\">,<\/span> <span class=\"n\">out_channels<\/span><span class=\"o\">=<\/span><span class=\"mi\">40<\/span><span class=\"p\">,<\/span> <span class=\"n\">kernel_size<\/span><span class=\"o\">=<\/span><span class=\"mi\">5<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">dropout1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Dropout<\/span><span class=\"p\">(<\/span><span class=\"n\">p<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.5<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">fc1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Linear<\/span><span class=\"p\">(<\/span><span class=\"mi\">4<\/span><span class=\"o\">*<\/span><span class=\"mi\">4<\/span><span class=\"o\">*<\/span><span class=\"mi\">40<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1000<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">dropout2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span 
class=\"p\">.<\/span><span class=\"nc\">Dropout<\/span><span class=\"p\">(<\/span><span class=\"n\">p<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.5<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">fc2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Linear<\/span><span class=\"p\">(<\/span><span class=\"mi\">1000<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1000<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">dropout3<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Dropout<\/span><span class=\"p\">(<\/span><span class=\"n\">p<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.5<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"nc\">Linear<\/span><span class=\"p\">(<\/span><span class=\"mi\">1000<\/span><span class=\"p\">,<\/span> <span class=\"n\">OUTPUT_SIZE<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">forward<\/span><span class=\"p\">(<\/span><span class=\"n\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">x<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">conv1<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">relu<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span 
class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">max_pool2d<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"n\">kernel_size<\/span><span class=\"o\">=<\/span><span class=\"mi\">2<\/span><span class=\"p\">,<\/span> <span class=\"n\">stride<\/span><span class=\"o\">=<\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">conv2<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">relu<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">max_pool2d<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"n\">kernel_size<\/span><span class=\"o\">=<\/span><span class=\"mi\">2<\/span><span class=\"p\">,<\/span> <span class=\"n\">stride<\/span><span class=\"o\">=<\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">x<\/span><span class=\"p\">.<\/span><span class=\"nf\">view<\/span><span class=\"p\">(<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">4<\/span><span class=\"o\">*<\/span><span class=\"mi\">4<\/span><span class=\"o\">*<\/span><span class=\"mi\">40<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">dropout1<\/span><span 
class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">fc1<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">relu<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">dropout2<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">fc2<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">relu<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">dropout3<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">self<\/span><span class=\"p\">.<\/span><span class=\"nf\">out<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">x<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n<h2>\n  \n  \n  Final Results\n<\/h2>\n\n<p>Michael reports an improvement to <em>99.6%<\/em> after <em>40<\/em> epochs. 
Let's try it ourselves:<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">pytorch_mnist_convnet<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">train_and_test_network<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">pytorch_mnist_convnet<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">ConvNetTwoConvTwoDenseLayersWithDropout<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">common<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">get_extended_train_loader<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">train_loader<\/span> <span class=\"o\">=<\/span> <span class=\"nf\">get_extended_train_loader<\/span><span class=\"p\">()<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">net<\/span> <span class=\"o\">=<\/span> <span class=\"nc\">ConvNetTwoConvTwoDenseLayersWithDropout<\/span><span class=\"p\">()<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"nf\">train_and_test_network<\/span><span class=\"p\">(<\/span><span class=\"n\">net<\/span><span class=\"p\">,<\/span> <span class=\"n\">num_epochs<\/span><span class=\"o\">=<\/span><span class=\"mi\">40<\/span><span class=\"p\">,<\/span> <span class=\"n\">lr<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.03<\/span><span class=\"p\">,<\/span> <span class=\"n\">train_loader<\/span><span class=\"o\">=<\/span><span class=\"n\">train_loader<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">Test<\/span> <span class=\"n\">data<\/span> <span class=\"n\">results<\/span><span class=\"p\">:<\/span> <span class=\"mf\">0.9964<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n<p>On a first try, I also obtained an improved result of <em>99.64%<\/em> (compared to <em>99.51%<\/em> previously). This result looks pretty good. 
However, I noticed that it wasn't very stable. After the first few epochs, the accuracy on the test data fluctuated erratically from one epoch to the next. I ran the training several times, and while the best result I got was <em>99.64%<\/em>, most of the time the final result was around <em>99.5%<\/em>.<\/p>\n\n<p>The back-and-forth fluctuations in the results made me wonder if the learning rate was a bit too high. A learning rate of <em>0.005<\/em> does seem to produce more stable and reliable results:<br>\n<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code><span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">pytorch_mnist_convnet<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">train_and_test_network<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">pytorch_mnist_convnet<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">ConvNetTwoConvTwoDenseLayersWithDropout<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"kn\">from<\/span> <span class=\"n\">common<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">get_extended_train_loader<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">train_loader<\/span> <span class=\"o\">=<\/span> <span class=\"nf\">get_extended_train_loader<\/span><span class=\"p\">()<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"n\">net<\/span> <span class=\"o\">=<\/span> <span class=\"nc\">ConvNetTwoConvTwoDenseLayersWithDropout<\/span><span class=\"p\">()<\/span>\n<span class=\"o\">&gt;&gt;&gt;<\/span> <span class=\"nf\">train_and_test_network<\/span><span class=\"p\">(<\/span><span class=\"n\">net<\/span><span class=\"p\">,<\/span> <span class=\"n\">num_epochs<\/span><span class=\"o\">=<\/span><span class=\"mi\">40<\/span><span class=\"p\">,<\/span> <span class=\"n\">lr<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.005<\/span><span 
class=\"p\">,<\/span> <span class=\"n\">train_loader<\/span><span class=\"o\">=<\/span><span class=\"n\">train_loader<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">Test<\/span> <span class=\"n\">data<\/span> <span class=\"n\">results<\/span><span class=\"p\">:<\/span> <span class=\"mf\">0.9963<\/span>\n<\/code><\/pre>\n\n<\/div>\n\n\n<p>In the graph below, we can see in detail the improvement of this network for the training run shown above (after each training epoch, we switch the model to <code>eval<\/code> mode and try it against the test data):<\/p>\n\n<p><a href=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbriujyr4dhjg4u7imgv.png\" class=\"article-body-image-wrapper\"><img src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbriujyr4dhjg4u7imgv.png\" alt=\"final network training progress\" width=\"800\" height=\"287\"><\/a><\/p>\n<h2>\n  \n  \n  Code\n<\/h2>\n\n<p>The code for this article is available in full on github:<\/p>\n\n\n<div class=\"ltag-github-readme-tag\">\n  <div class=\"readme-overview\">\n    <h2>\n      <img src=\"https:\/\/assets.dev.to\/assets\/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg\" alt=\"GitHub logo\">\n      <a href=\"https:\/\/github.com\/nestedsoftware\" rel=\"noopener noreferrer\">\n        nestedsoftware\n      <\/a> \/ <a href=\"https:\/\/github.com\/nestedsoftware\/pytorch\" rel=\"noopener noreferrer\">\n        pytorch\n      <\/a>\n    <\/h2>\n    <h3>\n      Demonstrations of basic PyTorch usage. 
Includes MNIST recognition using dense as well as convolutional networks.\n    <\/h3>\n  <\/div>\n  <div class=\"ltag-github-body\">\n    \n<div id=\"readme\" class=\"md\">\n<p>This project contains scripts to demonstrate basic PyTorch usage.  The code requires python 3, numpy, and pytorch.<\/p>\n<div class=\"markdown-heading\">\n<h2 class=\"heading-element\">Manual vs. PyTorch Backprop Calculation<\/h2>\n<\/div>\n\n<p>To compare a manual backprop calculation with the equivalent PyTorch version, run:<\/p>\n\n<div class=\"snippet-clipboard-content notranslate position-relative overflow-auto\"><pre class=\"notranslate\"><code>python backprop_manual_calculation.py\nw_l1 = 1.58\nb_l1 = -0.14\nw_l2 = 2.45\nb_l2 = -0.11\na_l2 = 0.8506\nupdated_w_l1 = 1.5814\nupdated_b_l1 = -0.1383\nupdated_w_l2 = 2.4529\nupdated_b_l2 = -0.1062\nupdated_a_l2 = 0.8515\n<\/code><\/pre><\/div>\n<p>and<\/p>\n<div class=\"snippet-clipboard-content notranslate position-relative overflow-auto\"><pre class=\"notranslate\"><code>python backprop_pytorch.py\nnetwork topology: Net(\n  (hidden_layer): Linear(in_features=1, out_features=1, bias=True)\n  (output_layer): Linear(in_features=1, out_features=1, bias=True)\n)\nw_l1 = 1.58\nb_l1 = -0.14\nw_l2 = 2.45\nb_l2 = -0.11\na_l2 = 0.8506\nupdated_w_l1 = 1.5814\nupdated_b_l1 = -0.1383\nupdated_w_l2 = 2.4529\nupdated_b_l2 = -0.1062\nupdated_a_l2 = 0.8515\n<\/code><\/pre><\/div>\n<p>Blog post: <a href=\"https:\/\/dev.to\/nestedsoftware\/pytorch-hello-world-37mo\" rel=\"nofollow\">PyTorch Hello World<\/a><\/p>\n<div class=\"markdown-heading\">\n<h2 class=\"heading-element\">MNIST Recognition<\/h2>\n<\/div>\n\n<p>The next examples recognize MNIST digits using a dense network at first, and then several convolutional network designs (examples are adapted from Michael Nielsen's book, Neural Networks and Deep Learning).<\/p>\n\n<p>I've added\u2026<\/p>\n<\/div>\n\n\n<\/div>\n<br>\n  <div class=\"gh-btn-container\"><a class=\"gh-btn\" 
href=\"https:\/\/github.com\/nestedsoftware\/pytorch\" rel=\"noopener noreferrer\">View on GitHub<\/a><\/div>\n<br>\n<\/div>\n<br>\n\n\n<h2>\n  \n  \n  References\n<\/h2>\n\n<ul>\n<li>\n<a href=\"http:\/\/neuralnetworksanddeeplearning.com\/chap3.html\" rel=\"noopener noreferrer\">Chapter 3<\/a> of Neural Networks and Deep Learning, by Michael Nielsen<\/li>\n<li>\n<a href=\"http:\/\/neuralnetworksanddeeplearning.com\/chap6.html\" rel=\"noopener noreferrer\">Chapter 6<\/a> of Neural Networks and Deep Learning, by Michael Nielsen<\/li>\n<li>Michael Nielsen's Theano <a href=\"https:\/\/github.com\/mnielsen\/neural-networks-and-deep-learning\/blob\/master\/src\/conv.py\" rel=\"noopener noreferrer\">network topologies<\/a> and <a href=\"https:\/\/github.com\/mnielsen\/neural-networks-and-deep-learning\/blob\/master\/src\/network3.py\" rel=\"noopener noreferrer\">framework code<\/a>\n<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Softmax_function\" rel=\"noopener noreferrer\">Softmax<\/a><\/li>\n<li>\n<a href=\"https:\/\/pytorch.org\/docs\/stable\/nn.html#crossentropyloss\" rel=\"noopener noreferrer\">CrossEntropyLoss<\/a> in PyTorch<\/li>\n<li><a href=\"http:\/\/neuralnetworksanddeeplearning.com\/chap3.html#regularization\" rel=\"noopener noreferrer\">L2 Regularization<\/a><\/li>\n<li>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Rectifier_(neural_networks)\" rel=\"noopener noreferrer\">ReLU<\/a> activation<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Vanishing_gradient_problem\" rel=\"noopener noreferrer\">Vanishing gradient problem<\/a><\/li>\n<li><a href=\"https:\/\/ieeexplore.ieee.org\/document\/1227801\" rel=\"noopener noreferrer\">Best practices for convolutional neural networks applied to visual document analysis<\/a><\/li>\n<\/ul>\n\n<h2>\n  \n  \n  Related\n<\/h2>\n\n<ul>\n<li><a href=\"https:\/\/dev.to\/nestedsoftware\/convolutional-neural-networks-an-intuitive-primer-k1k\">Convolutional Neural Networks: An Intuitive 
Primer<\/a><\/li>\n<\/ul>\n\n","category":["python","pytorch","cnn","mnist"]}]}}