{"@attributes":{"version":"2.0"},"channel":{"title":"Kevin Scott","link":"https:\/\/thekevinscott.com\/","description":"Articles by on Kevin Scott","generator":"Hugo -- gohugo.io","language":"en-us","lastBuildDate":"Thu, 22 Jan 2026 00:00:00 +0000","item":[{"title":"Two Months with Claude Code","link":"https:\/\/thekevinscott.com\/two-months-with-claude-code\/","pubDate":"Tue, 20 Jan 2026 00:00:00 +0000","guid":"https:\/\/thekevinscott.com\/two-months-with-claude-code\/","description":"<p>In November a friend demoed something he&rsquo;d built with Claude Code, 100% AI-generated. Despite using LLMs daily, I was caught off guard by the quality. It looked great and implemented something that would have been challenging to build myself, in a fraction of the time. I realized I&rsquo;d been sleeping on this wave of agentic tools. I&rsquo;ve been using Claude Code since, and wanted to catalogue what I&rsquo;ve learned.<\/p>\n<p>This is a snapshot in time, not gospel.<\/p>\n<p>I&rsquo;ve tried to approach the way a new developer might. A career&rsquo;s worth of habits can be a liability when tools change this fast.<\/p>\n<h2 id=\"caging-the-machine\">Caging the Machine<\/h2>\n<blockquote>\n<p>&ldquo;Claude Code running in this mode genuinely feels like a completely different product from regular, default Claude Code.&rdquo; \u2014 <a href=\"https:\/\/simonwillison.net\/2025\/Oct\/22\/living-dangerously-with-claude\/\">Simon Willison on YOLO mode<\/a><\/p>\n<\/blockquote>\n<p>He&rsquo;s right. But <a href=\"https:\/\/mashable.com\/article\/google-gemini-deletes-users-code\">these tools can also destroy things<\/a>. So the first thing I did was look into containerizing it.<\/p>\n<p><a href=\"https:\/\/github.com\/anthropics\/claude-code\/blob\/main\/.devcontainer\/Dockerfile\">Anthropic includes a <code>Dockerfile<\/code><\/a> for Claude Code. 
I <a href=\"https:\/\/github.com\/thekevinbot\/robotcage\">forked it into a script called <code>robotcage<\/code><\/a> that spins up Claude Code in YOLO with only the current folder mounted and network access locked down. It can only destroy what&rsquo;s in that folder, and it can&rsquo;t exfiltrate anything.<\/p>\n<p>In practice, this approach has too much friction:<\/p>\n<ul>\n<li>Claude often needs network access for legitimate purposes. I added a dynamic whitelist, but since Claude can run it, it defeats the purpose.<\/li>\n<li>The mounted filesystem differs from my native one. Claude would misinterpret pasted commands, or node modules and uv environments would drift out of sync.<\/li>\n<li>I&rsquo;d often need access to folders outside the Docker root, which means running from a shared ancestor, which, again, defeats the purpose.<\/li>\n<\/ul>\n<p>These frictions added up. I&rsquo;ve moved away from containerization and now run Claude Code natively on a remote machine with everything version controlled; not bulletproof, but the best mitigation I&rsquo;ve found.<\/p>\n<p>One thing I want to keep from this experiment: a dedicated Claude profile with its own SSH keys and Github account. Git blame tells me instantly whether I wrote something or Claude did. When I&rsquo;m reviewing old code and wondering why something is the way it is, I instantly know: if it&rsquo;s Claude, I scrutinize differently. PRs show authorship the same way. It also separates credentials. Claude&rsquo;s SSH keys can be revoked without touching mine. If something goes wrong, the blast radius is contained.<\/p>\n<h2 id=\"embracing-undo\">Embracing Undo<\/h2>\n<p>The common thread across the next few sections: I stopped trying to prevent damage and started embracing recovery. 
Assume some damage will happen and make it cheap to undo.<\/p>\n<h3 id=\"notes-as-artifacts\">Notes as Artifacts<\/h3>\n<p>I use Claude Code to capture notes, plans, and decisions as markdown files in a <code>notes\/<\/code> folder.<\/p>\n<p>I don&rsquo;t want to commit these files because a lot of the text is transient, throwaway, or slop. But they&rsquo;re still important to track. I had my own version of catastrophic data loss in December when I asked Claude to &ldquo;clean up&rdquo; a folder and it deleted a month&rsquo;s worth of notes. I hadn&rsquo;t realized how valuable they were until they were gone.<\/p>\n<p>I wanted something like infinite undo, or a git repo that auto-committed on every change. Claude helped me set up a <code>btrfs<\/code> file system with once-a-minute snapshots. Lightweight, cheap to store, easy to restore.<\/p>\n<p>Notes stay colocated with projects, backed up but not committed. What&rsquo;s missing: semantic understanding of what a change is.<\/p>\n<h2 id=\"chief-of-staff\">Chief of Staff<\/h2>\n<p>I have a &ldquo;Chief of Staff&rdquo; agent: a folder with <code>claude.md<\/code> and some skills to help organize my projects.<\/p>\n<p>Most Claude Code users are probably familiar with plan vs execute mode, but this serves as a sort of &ldquo;meta&rdquo; planning mode <em>across<\/em> projects.<\/p>\n<p>The value here is cross-project synthesis. It surfaces patterns across projects that I&rsquo;ve missed; it also provides a single interface to pick off tasks and plan work across projects.<\/p>\n<h2 id=\"coding-style\">Coding Style<\/h2>\n<p>I started out as an incredibly pedantic reviewer with an incredibly pedantic workflow: an issue for each feature, a PR for each issue, detailed comments on every minor thing. I would <em>never<\/em> be this pedantic with a human colleague. 
I was also strict about which technologies Claude could use.<\/p>\n<p>Over time I&rsquo;ve loosened the reins in two ways.<\/p>\n<h3 id=\"technology-choice\">Technology Choice<\/h3>\n<p>I&rsquo;m letting Claude gravitate towards technologies it knows best. React instead of web components, for example, even though I&rsquo;d prefer the latter.<\/p>\n<p>My hypothesis: LLMs perform best where training data is richest. I don&rsquo;t have evidence. It feels true. When Claude struggles with a technology I prefer, I ask whether the battle is worth fighting. Usually no.<\/p>\n<h3 id=\"review-cycle\">Review Cycle<\/h3>\n<p>The pedantic cycle costs a lot of time and tokens. Reviewing a PR might take hours of back and forth. Claude isn&rsquo;t good at extrapolating: if I ask it to separate 3 functions into their own files, it won&rsquo;t apply that to similar code elsewhere in the same PR.<\/p>\n<p>Is it necessary? I want my code a certain way, but with a robust test suite, who cares what&rsquo;s on the inside?<\/p>\n<p>This points to preference learning, or evals. More on that later.<\/p>\n<h2 id=\"not-just-code\">Not Just Code<\/h2>\n<p>I run <a href=\"https:\/\/www.librechat.ai\">LibreChat<\/a> locally: Anthropic, Google, and OpenAI in a single chat interface. I&rsquo;m pretty comfortable throwing a lot of my personal life at an LLM. But I&rsquo;m surprised by how much <em>more<\/em> I&rsquo;m throwing at it. I was wary of uploading personal information, but the benefits are so immense I couldn&rsquo;t help myself.<\/p>\n<p>The trend here is clear: LLMs are surprisingly capable outside of code. I&rsquo;ve used them for tax strategy, legal fine print, financial planning. Research that would take hours and cost hundreds in professional fees now takes minutes.<\/p>\n<p>And Claude Code specifically is a step up from how I was using LLMs before. 
More powerful models combined with tool use and file system access fit into my workflow more seamlessly than a web UI ever could.<\/p>\n<h2 id=\"a-loosening-of-the-cages\">A Loosening of the Cages<\/h2>\n<p>This is meant to be a snapshot in time. I&rsquo;m far from the promised land of knowing how to work effectively with agents. But I can draw some conclusions:<\/p>\n<p>One, don&rsquo;t swim upstream. I started out exerting a high degree of control that loosened over time. Instead of running Claude in a container, I embraced undo. Instead of being a pedantic reviewer, I let it do its thing. Give it rope, but protect yourself with good rollback.<\/p>\n<p>Two, my personal ambition has increased significantly. The range of projects I am considering has ballooned beyond what would have been reasonable last year, and it&rsquo;s because projects that would&rsquo;ve otherwise taken 2 weeks now take a day.<\/p>\n<p>Three, interact to learn. By using Claude Code heavily I&rsquo;m learning how to work with agents, but also finding that a tighter interactive loop helps when designing software. More akin to play. More akin to a REPL.<\/p>\n<p>Four, the capability threshold has shifted. If you still believe LLMs are incapable of doing X, where X is some large component of your job, take the latest generation out for a spin. 
My priors from six months ago are already stale.<\/p>\n<h2 id=\"further-reading\">Further Reading<\/h2>\n<p>Others are reaching similar conclusions:<\/p>\n<p><strong><a href=\"https:\/\/www.spakhm.com\/claude-code\">Thoughts on Claude Code<\/a><\/strong> (Slava Akhmechet) \u2014 Built a programming language in two weeks; the most detailed practitioner account I&rsquo;ve found.<\/p>\n<p><strong><a href=\"https:\/\/newsletter.pragmaticengineer.com\/p\/when-ai-writes-almost-all-code-what\">When AI Writes Almost All Code<\/a><\/strong> (Gergely Orosz) \u2014 Frames recent models as crossing a capability threshold.<\/p>\n<p><strong><a href=\"https:\/\/news.ycombinator.com\/item?id=46515696\">Opus 4.5 is not the normal AI agent experience<\/a><\/strong> (Hacker News) \u2014 Practitioner thread on mature Claude setups: custom skills, code review agents, automated workflows.<\/p>\n<p>I don&rsquo;t know what being a developer means anymore. I&rsquo;m figuring it out in real time.<\/p>"},{"title":"LLMs running in the browser","link":"https:\/\/thekevinscott.com\/llms-in-the-browser\/","pubDate":"Fri, 01 Mar 2024 10:00:00 +0000","guid":"https:\/\/thekevinscott.com\/llms-in-the-browser\/","description":"<p>This post explores various ways to run Language Models (LLMs) in the browser. 
I will:<\/p>\n<ol>\n<li>Review the available frameworks today, and choose the most compelling options<\/li>\n<li>Implement a working example for each framework<\/li>\n<li>Evaluate them both quantitatively (specifically, speed) and qualitatively (documentation, ease-of-use, etc.)<\/li>\n<\/ol>\n<p>This was written in the spring of 2024, and things move quickly; consider this post a snapshot in time.<\/p>\n<h2 id=\"why\">Why<\/h2>\n<p>Before we jump in, you might wonder <em>why<\/em> someone would want to run an LLM in the browser.<\/p>\n<p>For me, there are three reasons:<\/p>\n<ul>\n<li><strong>Data privacy<\/strong> - keep your data local<\/li>\n<li><strong>Latency<\/strong> - avoid round trips to the server (also can support offline mode)<\/li>\n<li><strong>Cost<\/strong> - run models locally, save money. This applies both to the user (avoid paying for an API) and operators (run models on the users&rsquo; devices and avoid server inference costs)<\/li>\n<\/ul>\n<p>One very large drawback is that LLMs tend to be huge, measuring from hundreds of MBs to GBs. If your users are on, say, a mobile data plan, downloading an LLM won&rsquo;t be feasible. However, for desktop users with robust data connections, including Electron apps and Node apps, browser-based LLMs remain a viable approach.<\/p>\n<h2 id=\"1-available-options\">1. 
Available options<\/h2>\n<p>Available options for running LLMs in the browser I&rsquo;ve found include:<\/p>\n<ul>\n<li><a href=\"https:\/\/huggingface.co\/docs\/transformers.js\">Transformers.js<\/a> - a Hugging Face library built on top of ONNX runtime<\/li>\n<li><a href=\"https:\/\/github.com\/mlc-ai\/web-llm\">web-llm<\/a> - put out by the MLC team, and running on top of <a href=\"https:\/\/tvm.apache.org\/2021\/12\/15\/tvm-unity\">Apache TVM Unity<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/huggingface\/candle\">Candle<\/a> - another Hugging Face library, this one in Rust and converting to WASM<\/li>\n<li><a href=\"https:\/\/github.com\/tracel-ai\/burn\">Burn<\/a> - another Rust library compiling to WGPU (and Candle)<\/li>\n<li><a href=\"https:\/\/github.com\/tensorflow\/tfjs\">Tensorflow.js<\/a> - a Google library for doing machine learning in Javascript, with CPU, WebGL, WASM, and WebGPU backends<\/li>\n<li><a href=\"https:\/\/github.com\/microsoft\/onnxruntime\">ONNX<\/a> - Microsoft library, supports WebGL, WebGPU, and WASM<\/li>\n<\/ul>\n<p>I chose to evaluate Transformers.js, web-llm, and Candle, and discarded the rest for the following reasons:<\/p>\n<ul>\n<li>Burn leverages the Candle backend so an evaluation of Candle should cover both (and Candle offers specific LLM examples).<\/li>\n<li>ONNX is leveraged by Transformers.js, so an evaluation of Transformers.js should have implications for ONNX (and Transformers.js offers specific LLM examples)<\/li>\n<li>Tensorflow.js doesn&rsquo;t seem to offer much in the way of cutting edge LLM support, though it&rsquo;s been around for a long time and boasts great support. I&rsquo;m keeping my eyes on TFJS in the hopes that cutting-edge LLMs get support soon.<\/li>\n<\/ul>\n<h2 id=\"2-implementations\">2. 
Implementations<\/h2>\n<h3 id=\"transformersjs\">Transformers.js<\/h3>\n<p><a href=\"https:\/\/huggingface.co\/docs\/transformers.js\">Transformers.js<\/a> is a Javascript library offered by Hugging Face, and is &ldquo;designed to be functionally equivalent to Hugging Face&rsquo;s <a href=\"https:\/\/github.com\/huggingface\/transformers\">transformers<\/a> python library&rdquo;. <code>transformers<\/code> is a key part of the Python ecosystem and often used for running LLMs, so feature parity is a huge plus. In addition to text, <code>Transformers.js<\/code> supports vision, audio, and multimodal tasks, so it&rsquo;s a great choice for supporting a wide variety of modalities.<\/p>\n<p><a href=\"https:\/\/huggingface.co\/docs\/transformers.js\/installation\">Transformers.js<\/a> offers an NPM package at <code>@xenova\/transformers<\/code>. Models can be loaded and used <a href=\"https:\/\/huggingface.co\/docs\/transformers.js\/api\/pipelines#pipelinestextgenerationpipeline\">with something like<\/a>:<\/p>\n<pre><code class=\"language-javascript\">import { pipeline } from '@xenova\/transformers';\n\nconst generator = await pipeline('text-generation', 'Xenova\/distilgpt2');\nconst text = 'I enjoy walking with my cute dog,';\nconst output = await generator(text);\n<\/code><\/pre>\n<p><a href=\"https:\/\/thekevinscott.github.io\/llm-browser-demos\/transformers.js\">I set up a demo here<\/a>.<\/p>\n<h3 id=\"web-llm\">Web-LLM<\/h3>\n<p><a href=\"https:\/\/github.com\/mlc-ai\/web-llm\">web-llm<\/a> runs on top of TVM. An NPM package is available at <code>@mlc-ai\/web-llm<\/code>. 
Models can be loaded with:<\/p>\n<pre><code class=\"language-typescript\">import * as webllm from &quot;@mlc-ai\/web-llm&quot;;\n\nconst chat = new webllm.ChatModule();\nawait chat.reload('Phi1.5-q0f16', undefined, {\n  &quot;model_list&quot;: [\n    \/\/ Phi-1.5\n    {\n      &quot;model_url&quot;: &quot;https:\/\/huggingface.co\/mlc-ai\/phi-1_5-q0f16-MLC\/resolve\/main\/&quot;,\n      &quot;local_id&quot;: &quot;Phi1.5-q0f16&quot;,\n      &quot;model_lib_url&quot;: &quot;https:\/\/raw.githubusercontent.com\/mlc-ai\/binary-mlc-llm-libs\/main\/phi-1_5\/phi-1_5-q0f16-ctx2k-webgpu.wasm&quot;,\n      &quot;vram_required_MB&quot;: 5818.09,\n      &quot;low_resource_required&quot;: false,\n      &quot;required_features&quot;: [&quot;shader-f16&quot;],\n    },\n  ],\n});\nconst reply = await chat.generate(prompt);\n<\/code><\/pre>\n<p><a href=\"https:\/\/thekevinscott.github.io\/llm-browser-demos\/web-llm\">I set up a demo here<\/a>.<\/p>\n<h3 id=\"candle\">Candle<\/h3>\n<p><a href=\"https:\/\/github.com\/huggingface\/candle\">Candle<\/a> is a Rust-based framework that compiles models into WASM. Like Transformers.js, this is a Hugging Face library, and boasts a larger mandate than just LLMs (namely, supporting training, any model, and a number of compilation targets).<\/p>\n<p><a href=\"https:\/\/huggingface.github.io\/candle\/guide\/installation.html\">Candle requires having Rust installed<\/a> and does not offer an NPM package. Once you&rsquo;ve got the repository cloned and installed, you can run:<\/p>\n<pre><code class=\"language-bash\">sh build-lib.sh\n<\/code><\/pre>\n<p>to generate the WASM build outputs. 
Then a model can be loaded and run with:<\/p>\n<pre><code class=\"language-javascript\">import init, { Model } from &quot;.\/build\/m.js&quot;;\nawait init();\n\n\/\/ prompt, temp, top_p, repeatPenalty, seed, and maxSeqLen come from the demo UI\nconst modelConfig = {\n  base_url:\n    &quot;https:\/\/huggingface.co\/lmz\/candle-quantized-phi\/resolve\/main\/&quot;,\n  model: &quot;model-q4k.gguf&quot;,\n  tokenizer: &quot;tokenizer.json&quot;,\n  config: &quot;phi-1_5.json&quot;,\n  quantized: true,\n  seq_len: 2048,\n  size: &quot;800 MB&quot;,\n};\nconst weightsURL = Array.isArray(modelConfig.model)\n  ? modelConfig.model.map((m) =&gt; modelConfig.base_url + m)\n  : modelConfig.base_url + modelConfig.model;\nconst tokenizerURL = modelConfig.base_url + modelConfig.tokenizer;\nconst configURL = modelConfig.base_url + modelConfig.config;\nconst [weightsArrayU8, tokenizerArrayU8, configArrayU8] =\n  await Promise.all([\n    \/\/ see demo for these implementations\n    concatenateArrayBuffers([].concat(weightsURL)),\n    fetchArrayBuffer(tokenizerURL),\n    fetchArrayBuffer(configURL),\n  ]);\nconst model = new Model(\n  weightsArrayU8,\n  tokenizerArrayU8,\n  configArrayU8,\n  modelConfig.quantized\n);\nconst firstToken = model.init_with_prompt(\n  prompt,\n  temp,\n  top_p,\n  repeatPenalty,\n  64,\n  BigInt(seed)\n);\nlet sentence = firstToken;\nconst maxTokens = maxSeqLen ? maxSeqLen : modelConfig.seq_len - prompt.length - 1;\nlet tokensCount = 0;\nwhile (tokensCount &lt; maxTokens) {\n  const token = await model.next_token();\n  if (token === &quot;&lt;|endoftext|&gt;&quot;) {\n    break;\n  }\n  sentence += token;\n  tokensCount++;\n}\n<\/code><\/pre>\n<p><a href=\"https:\/\/thekevinscott.github.io\/llm-browser-demos\/candle\">I set up a demo here<\/a>.<\/p>\n<h2 id=\"3-evaluations\">3. Evaluations<\/h2>\n<h3 id=\"qualitative-evaluations\">Qualitative Evaluations<\/h3>\n<h4 id=\"transformersjs-1\">Transformers.js<\/h4>\n<p>Transformers.js is really fantastic. It&rsquo;s simple to get up and running and the code is concise and readable. 
I&rsquo;m a big fan of how easily models can be loaded directly from Hugging Face (though models do need to be converted beforehand for the library; Xenova maintains a number of LLMs).<\/p>\n<p>On the cons side, I found some of the TypeScript definitions to be out of date or missing, and the documentation (particularly around callbacks) to be lacking. The other big con is that a large number of models are broken in the browser (including Phi-2) due to an upstream issue with ONNX. <a href=\"https:\/\/github.com\/xenova\/transformers.js\/issues\/499\">The author has a PR fixing it here<\/a> so hopefully this will be fixed soon.<\/p>\n<p><a href=\"https:\/\/onnxruntime.ai\/docs\/tutorials\/web\/#onnx-runtime-web-application-development-flow\">ONNX, the backend on top of which Transformers.js runs, supports both CPU and GPU processing<\/a>:<\/p>\n<blockquote>\n<p>With onnxruntime-web, you have the option to use webgl or webgpu for GPU processing, and WebAssembly (wasm, alias to cpu) for CPU processing. All ONNX operators are supported by WASM but only a subset are currently supported by WebGL and WebGPU.<\/p>\n<\/blockquote>\n<h4 id=\"web-llm-1\">Web-LLM<\/h4>\n<p>The outputs of web-llm are rock solid and its TypeScript definitions are great.<\/p>\n<p>However, I found the code to be overly prescriptive, almost as if specifically designed to support the demo. For example, the parameters for specifying a model configuration are strangely constructed. It was difficult to strip away the demo code to a minimal running example.<\/p>\n<p>I <em>believe<\/em> web-llm only supports WebGPU; <a href=\"https:\/\/github.com\/mlc-ai\/mlc-llm\/issues\/1106#issuecomment-1777194250\">it&rsquo;s not clear to me whether this project supports CPU-only<\/a>.<\/p>\n<h4 id=\"candle-1\">Candle<\/h4>\n<p>Candle is written in Rust. This may or may not be a dealbreaker. 
You&rsquo;ll need a Rust-compatible build step, a way to check in WASM artifacts, and more.<\/p>\n<p>What I find compelling about this compilation-based approach is the resulting size of the models; <a href=\"https:\/\/github.com\/tracel-ai\/burn\/tree\/42db540db3bde8e53cc9b8f1f1cf9f38b8909221\/examples\/mnist-inference-web#comparison\">Burn (another Rust-written framework) writes<\/a>:<\/p>\n<blockquote>\n<p>The main differentiating factor of this example&rsquo;s approach (compiling rust model into wasm) and other popular tools, such as TensorFlow.js, ONNX Runtime JS and TVM Web is the absence of runtime code. The rust compiler optimizes and includes only used burn routines. 1,509,747 bytes out of Wasm&rsquo;s 1,866,491 byte file is the model&rsquo;s parameters. The rest of 356,744 bytes contain all the code (including burn&rsquo;s nn components, the data deserialization library, and math operations).<\/p>\n<\/blockquote>\n<p>The code for loading a WASM model is considerably more verbose than the other two solutions on offer, and you must host and load the WASM files yourself.<\/p>\n<p><a href=\"https:\/\/github.com\/huggingface\/candle\/issues\/344\">Candle does not yet support WebGPU<\/a> but it seems to be in active development. It <em>also<\/em> sounds like the <a href=\"https:\/\/github.com\/huggingface\/candle\/issues\/344#issuecomment-1669927122\">Transformers.js maintainer<\/a> will be adding support for Candle as a backend as soon as that lands, so it seems like these two projects will share compatibility, which makes sense since they&rsquo;re both Hugging Face projects.<\/p>\n<h3 id=\"quantitative-evaluations\">Quantitative Evaluations<\/h3>\n<p>For each set of measurements, I tried to target the same Phi 1.5 model, for a max of 128 tokens. I ran each evaluation three times and averaged the results. I ran on Chrome on a Macbook M3. 
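<\/p>
<p>A minimal sketch of that timing methodology (the harness below is my own illustration; <code>loadModel<\/code> is a stand-in for each framework&rsquo;s actual load call):<\/p>

```javascript
// Hypothetical harness: time an async operation several times and average.
// In the real measurements, `fn` would be each framework's model-loading call.
const timeIt = async (fn, runs = 3) => {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fn();
    times.push(performance.now() - start);
  }
  return times.reduce((sum, t) => sum + t, 0) / times.length;
};

// Stand-in loader so the sketch is self-contained (~10 ms of pretend work).
const loadModel = () => new Promise((resolve) => setTimeout(resolve, 10));
const avgMs = await timeIt(loadModel);
```
<p>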
I&rsquo;m measuring model loading from cache (<em>not<\/em> over the network).<\/p>\n<h4 id=\"transformersjs-2\">Transformers.js<\/h4>\n<p>I would love to evaluate the model <code>Xenova\/phi-1_5_dev<\/code>; however, <a href=\"https:\/\/github.com\/xenova\/transformers.js\/issues\/499#issuecomment-1875942192\">due to a bug<\/a>, that model is broken in the browser, so I won&rsquo;t include generation time measurements here.<\/p>\n<table>\n<tr><td>Average loading time for a model from cache:<\/td><td><code>10.8s<\/code><\/td><\/tr>\n<\/table>\n<p>Transformers.js boasts ~20k weekly downloads and 5.7k stars on GitHub.<\/p>\n<h4 id=\"web-llm-2\">Web-LLM<\/h4>\n<p>I&rsquo;m evaluating the model <code>Phi1.5-q0f16<\/code>.<\/p>\n<table>\n<tr><td>Average loading time for a model from cache:<\/td><td><code>4.5s<\/code><\/td><\/tr>\n<tr><td>Average generation time for a 128-token completion:<\/td><td><code>11.7s<\/code><\/td><\/tr>\n<\/table>\n<p>Web-LLM has 207 weekly downloads and 8.5k stars on GitHub.<\/p>\n<h4 id=\"candle-2\">Candle<\/h4>\n<p>I&rsquo;m evaluating the model <code>Phi 1.5 q4k<\/code>.<\/p>\n<table>\n<tr><td>Average loading time for a model from cache:<\/td><td><code>1.5s<\/code><\/td><\/tr>\n<tr><td>Average generation time for a 128-token completion:<\/td><td><code>17.2s<\/code><\/td><\/tr>\n<\/table>\n<p>Candle has 11.9k stars on GitHub (no NPM package is available, as it&rsquo;s Rust-based).<\/p>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>There&rsquo;s no clear winner, which makes sense given how fast this space is moving.<\/p>\n<p>Candle&rsquo;s smaller model size is compelling, as reflected in how quickly it loads, but the lack of GPU support (for now) is a drawback. web-llm seems to offer a nice balance of speed and performance, but the lack of explicit CPU support is a drag. 
Transformers.js seems to have the most robust ecosystem of models and documentation, but has a number of open bugs that make it hard to fully evaluate; though the author&rsquo;s intent to incorporate Candle may bring it the best of both worlds.<\/p>"},{"title":"Image Upscaling with Javascript","link":"https:\/\/thekevinscott.com\/super-resolution-with-js\/","pubDate":"Mon, 28 Sep 2020 10:00:00 +0000","guid":"https:\/\/thekevinscott.com\/super-resolution-with-js\/","description":"<p><em>I recently released a tool, <a href=\"https:\/\/github.com\/thekevinscott\/UpscalerJS\">UpscalerJS<\/a> for doing image upscaling in your browser with Javascript and reducing your image sizes by up to 1\/16th. It&rsquo;s designed to be model-agnostic - you can plug and play any trained model that can be converted to Tensorflow.js.<\/em><\/p>\n<p><em>In this article I want to lay out what I consider a killer use case for neural networks in the browser, along with how I went about discovering the research, converting it to Javascript, and ways to improve it in the future.<\/em><\/p>\n<p><img src=\".\/images\/demo.gif\" alt=\"UpscalerJS\"><\/p>\n<hr>\n<p>Let&rsquo;s say you&rsquo;re working on an e-commerce platform. Your users upload photos of products to sell.<\/p>\n<p>You&rsquo;ve designed a great looking site, built to highlight the beautiful and wondrous products your users have crafted by hand. There&rsquo;s only one problem - once you launch, you find your users are uploading small, pixelated images, and all of a sudden your beautiful site doesn&rsquo;t look quite so beautiful.<\/p>\n<p>(I&rsquo;m drawing from experience here - this has happened to me more than once.)<\/p>\n<p>You can go back and nag your users for better images - and sometimes this can work. But often, the images they&rsquo;ve provided are all they&rsquo;ve got. Maybe the images are screenshotted from PDFs, or maybe the images are older and users don&rsquo;t have better ones. 
And even if they <em>do<\/em> have better images, it&rsquo;s a labor-intensive ask to have them go back and fix their images for you, even if it is for their benefit.<\/p>\n<p>Is there a technical solution we can explore? There certainly is and it&rsquo;s called <strong>Super Resolution<\/strong>.<\/p>\n<h2 id=\"what-is-this-super-resolution-you-speak-of\">What is this &ldquo;Super Resolution&rdquo; you speak of?<\/h2>\n<p>Let\u2019s say somebody&rsquo;s uploaded a 150px photo to our e-commerce site:<\/p>\n<p><a target=\"_blank\" href=\".\/images\/dog-150.jpg\"><img src=\".\/images\/dog-150.jpg\" alt=\"A cute photo of a dog\" \/><\/a>\n<capt><a href=\"https:\/\/www.flickr.com\/photos\/30048871@N00\/2437627725\" target=\"_blank\" rel=\"noopener noreferrer\">A happy dog by merec0<\/a><\/capt><\/p>\n<p>We want to feature that image on our home page because it&rsquo;s a beautiful dog, but our design demands images at 300px. What can we do? If we double each pixel we get a larger image that looks pixelated:<\/p>\n<div style=\"text-align: center;\">\n<a target=\"_blank\" href=\".\/images\/dog-300-pixelated.png\" style=\"border-bottom: none\"><img src=\".\/images\/dog-300-pixelated.png\" alt=\"The cute dog, upscaled to 300px\" width=\"300\"><\/a>\n<\/div>\n<capt>Image upscaled to 300px.<\/capt>\n<p>You have to <a href=\"https:\/\/css-tricks.com\/keep-pixelated-images-pixelated-as-they-scale\/\">go out of your way to achieve the pixelated look<\/a> in the browser; by default most browsers will apply some sort of scaling algorithm to the image, usually <a href=\"https:\/\/en.wikipedia.org\/wiki\/Bicubic_interpolation\">bicubic interpolation<\/a>, which looks like:<\/p>\n<div style=\"text-align: center;\">\n<a target=\"_blank\" href=\".\/images\/dog-300-bicubic.png\" style=\"border-bottom: none\"><img src=\".\/images\/dog-300-bicubic.png\" alt=\"The cute dog, upscaled to 300px using Bicubic Interpolation\" width=\"300\"><\/a>\n<\/div>\n<capt>Image upscaled to 
300px using Bicubic Interpolation.<\/capt>\n<p>An image upscaled using bicubic interpolation certainly looks less pixelated than the first, and I&rsquo;d wager most folks find it more aesthetically appealing, but it\u2019s blurry, and no one will mistake it for a high resolution image.<\/p>\n<p><strong>Super Resolution is a Machine Learning technique for reconstructing a higher resolution image from a lower one.<\/strong> You can think of the process as painting new pixels into the image, achieving a higher fidelity than is possible with an algorithm like bicubic interpolation.<\/p>\n<div style=\"text-align: center;\">\n<a target=\"_blank\" href=\".\/images\/dog-300-gan.png\" style=\"border-bottom: none\"><img src=\".\/images\/dog-300-gan.png\" alt=\"The cute dog, upscaled to 300px using a GAN\" width=\"300\"><\/a>\n<\/div>\n<capt>Image upscaled to 300px using a GAN.<\/capt>\n<p>There\u2019s <a href=\"https:\/\/arxiv.org\/abs\/1902.06068\">many different approaches<\/a> you can use to implement super resolution and some great blog posts describing the underlying theory are available <a href=\"https:\/\/towardsdatascience.com\/an-evolution-in-single-image-super-resolution-using-deep-learning-66f0adfb2d6b\">here<\/a> and <a href=\"https:\/\/medium.com\/beyondminds\/an-introduction-to-super-resolution-using-deep-learning-f60aff9a499d\">here<\/a>.<\/p>\n<iframe width=\"560\" height=\"315\" src=\"https:\/\/www.youtube.com\/embed\/LhF_56SxrGk?start=15\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe>\n<p>Super Resolution <a href=\"https:\/\/letsenhance.io\/\">is usually implemented on the backend using Python<\/a>. There&rsquo;s good arguments for building it this way. Running it on the backend gives you access to beefy hardware, and that hardware lets you use the latest, most accurate models. 
If getting the highest resolution images is important, backend deployments are a good choice. Additionally, many use cases are &ldquo;scale once, display often&rdquo; - if it takes a little longer to upscale an image, well, no big deal.<\/p>\n<p>On the other hand, there are drawbacks to implementing this on the backend. One is immediate feedback - you need to upload the image to a server, process it, and send it back down, which can take a while, depending on your user&rsquo;s connection and the size of your model. Another is deployment - this can be non-trivial, especially because so many implementations are at the bleeding edge, with unstable dependencies and changing requirements. And, if your deployment needs GPUs, that could end up being an expensive proposition, and hard to scale.<\/p>\n<p>It turns out that we can run this in the browser, and doing so has some clear benefits.<\/p>\n<p>First, that issue of deployment? Totally gone. Running a neural network in Javascript means nothing to install, no GPUs to provision - Tensorflow.js takes care of all of that. The models run directly in your users&rsquo; browsers.<\/p>\n<p>Second, your users will see more immediate feedback. Particularly for connections that might have slower bandwidth - like a phone - performing inference directly on the device can cut out a costly round trip step.<\/p>\n<p>Third, and in my mind, the most compelling argument - you can serve smaller images. Sometimes, <em>much<\/em> smaller images.<\/p>\n<p>For instance, those images above? The 300px version is 724kb. The 150px version? It&rsquo;s <em>9kb<\/em>.<\/p>\n<p>That&rsquo;s an image that&rsquo;s <em>6 percent<\/em> of the original file size. That is a massive reduction!<\/p>\n<p>There are, of course, some clear drawbacks to running in the browser. The biggest is that you&rsquo;re constrained by the hardware of your users. And this manifests in two ways. One is, if you want to deploy the latest and greatest models, you may be out of luck. 
Particularly if they are GPU hungry, they just might not be capable of running in the browser. In recent years, hardware manufacturers including Apple and Google <a href=\"https:\/\/heartbeat.fritz.ai\/hardware-acceleration-for-machine-learning-on-apple-and-android-f3e6ca85bda6\">have invested huge sums of money in improving the performance<\/a> of their on-device chips, with a particular focus on improving the ability to run neural networks on devices. The good news is that, year after year, the performance of this technology will get better; the bad news is that, for users on older devices, the disparity between performance will become that much more significant. If you want a consistent experience across platforms, server-side solutions may be a better bet.<\/p>\n<p>Ultimately, though the precise tradeoffs will depend on the use case, Javascript is absolutely a worthy contender for considering an application of this technology. Let&rsquo;s see how we can evaluate what&rsquo;s out there, and see what would work for our purposes.<\/p>\n<h2 id=\"hearing-it-through-the-grapevine\">Hearing it through the grapevine<\/h2>\n<p>If you come from the world of Javascript, a question on your mind is - how do you even hear about this research in the first place?<\/p>\n<p>Most cutting edge Machine Learning research is posted to <a href=\"https:\/\/ARXIV.org\">arxiv.org<\/a>, where it is freely searchable and downloadable in PDF form. This is academic research, papers that can tend to be theory and math heavy and hard to penetrate. This can scare off a lot of people - it certainly scared me off at first.<\/p>\n<p>I don\u2019t want to minimize the importance of fully understanding the research - deeply understanding theory often can lead to novel insights and development that is relevant to your field - but you don\u2019t necessarily <em>need<\/em> a deep understanding of the technology to use it. 
Particularly if you\u2019re focused on inference, like we are in this case, you can rely on others to evaluate the research, as well as implement the training code, and in some cases, provide the trained models.<\/p>\n<p>There&rsquo;s a website that does just this called <a href=\"https:\/\/paperswithcode.com\/\">Papers With Code<\/a>:<\/p>\n<p><img src=\".\/images\/paperswithcode.png\" alt=\"Papers with Code\"><\/p>\n<p>Research is categorized by subject area, and ranked by performance against recognized metrics. <a href=\"https:\/\/paperswithcode.com\/task\/image-super-resolution\">There&rsquo;s even a specific category dedicated to this domain<\/a>.<\/p>\n<p><img src=\".\/images\/top-ranked-methods.png\" alt=\"Top Ranked Methods for Super Resolution as of August 2020\" title=\"Top Ranked Methods for Super Resolution as of August 2020\"><\/p>\n<p>You can see the performance of each implementation against a standard dataset, and see how they&rsquo;re measured on different metrics. PSNR and SSIM are two common ways of measuring performance for Super Resolution tasks; <a href=\"https:\/\/en.wikipedia.org\/wiki\/Peak_signal-to-noise_ratio\">PSNR<\/a> measures reconstruction noise, and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Structural_similarity\">SSIM<\/a> measures the structural similarity between two images.<\/p>\n<p><img src=\".\/images\/psnr-ssim.jpeg\" alt=\"A visual representation of differing scores for PSNR &amp; SSIM\">\n<capt><a href=\"https:\/\/medium.com\/@datamonsters\/a-quick-overview-of-methods-to-measure-the-similarity-between-images-f907166694ee\" target=\"_blank\" rel=\"noopener noreferrer\">From &ldquo;A Quick Overview of Methods to Measure the Similarity Between Images&rdquo;<\/a><\/capt><\/p>\n<p>Metrics can be a bit tricky. 
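<\/p>\n<p>To make PSNR concrete: it&rsquo;s derived from the mean squared error between two images. Here&rsquo;s a minimal sketch in plain Javascript (an illustrative helper of my own, not part of any library discussed here):<\/p>\n

```javascript
// Peak signal-to-noise ratio between two same-sized images,
// given as flat arrays of 8-bit channel values (0-255).
// Higher is better; a pixel-perfect reconstruction scores Infinity.
function psnr(a, b, max = 255) {
  if (a.length !== b.length) throw new Error("images must be the same size");
  let sumSquaredError = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sumSquaredError += diff * diff;
  }
  const mse = sumSquaredError / a.length;
  if (mse === 0) return Infinity;
  return 10 * Math.log10((max * max) / mse);
}

console.log(psnr([255, 0, 0, 0], [0, 0, 0, 0])); // ≈ 6.02
```
\n<p>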
You can see in the above image that identical PSNR scores can have radically different SSIM scores, with correspondingly different visual quality.<\/p>\n<p>Both PSNR and SSIM are measurements of how different two images are from one another, but neither is a replacement for human evaluation. As humans, we perceive images differently than a computer does. A set of pixels that are, say, saturated differently while also being sharper, may lead to a lower metric score, but a more aesthetically pleasing result to a human.<\/p>\n<blockquote>\n<p>SR algorithms are typically evaluated by several widely used distortion measures, e.g., PSNR and SSIM. However, these metrics fundamentally disagree with the subjective evaluation of human observers. Non-reference measures are used for perceptual quality evaluation, including Ma\u2019s score and NIQE, both of which are used to calculate the perceptual index in the PIRM-SR Challenge. In a recent study, Blau et al. find that the distortion and perceptual quality are at odds with each other. \u2014 <a href=\"https:\/\/arxiv.org\/pdf\/1809.00219v2.pdf\">Wang et al.<\/a><\/p>\n<\/blockquote>\n<p>In addition to the subjectivity involved in judging a model&rsquo;s accuracy, there are other reasons that accuracy is not the paramount concern for us. Remember, our final goal is a model that runs in Javascript. It&rsquo;s also important to consider:<\/p>\n<ul>\n<li><strong>A good paper<\/strong>. We want an architecture that is sound. We\u2019ll probably need to develop some familiarity with the underlying theory, so it\u2019s important that a paper be clear and digestible, as well as rigorous; how often a paper\u2019s been cited can also be a good indicator of its overall quality.<\/li>\n<li><strong>Good performance<\/strong>. Speed is as important as accuracy. A model that takes a minute to run is not going to be suitable for the browser.<\/li>\n<li><strong>Saveable and convertible<\/strong>. 
The implementation\u2019s models must be compatible with Javascript. We&rsquo;ll touch on the specifics shortly, but the big one is to insist on a Tensorflow implementation, as Tensorflow.js is the main way to do Machine Learning in the browser; Pytorch implementations are out.<\/li>\n<\/ul>\n<p>I ended up choosing <a href=\"https:\/\/paperswithcode.com\/paper\/esrgan-enhanced-super-resolution-generative\">ESRGAN<\/a>.<\/p>\n<p><a href=\"https:\/\/paperswithcode.com\/sota\/image-super-resolution-on-set5-4x-upscaling\">I started by looking at papers ranked by their score<\/a>. A few implementations that scored well either had zero linked code implementations, or the code implementations were exclusively in Pytorch. (Not all code implementations will be shown on paperswithcode.com, so it&rsquo;s a good idea to do some Googling of your own.)<\/p>\n<p>ESRGAN ranked highly on the metrics, and boasted more than a few implementations in Tensorflow. The paper itself was reasonably clear and approachable. ESRGAN is based on a prior architecture, <a href=\"https:\/\/arxiv.org\/pdf\/1609.04802.pdf\">SRGAN<\/a>, which is itself a robust architecture, but ESRGAN makes a number of improvements, including an improved building block for the generator, an improved discriminator for predicting how realistic an image appears, and a more effective perceptual loss.<\/p>\n<p>Of the implementations I could find, three satisfied my criteria, with decent code quality and good documentation:<\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/idealo\/image-super-resolution\">idealo\/image-super-resolution<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/krasserm\/super-resolution\">krasserm\/super-resolution<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/peteryuX\/esrgan-tf2\">peteryuX\/esrgan-tf2<\/a><\/li>\n<\/ul>\n<p>It\u2019s difficult to ascertain whether an implementation will be a good fit without downloading the code and running it. 
If you\u2019re used to installing an <code>npm<\/code> library and jumping right in, prepare yourself: working with Machine Learning code can often be an exercise in frustration. Working through dependency challenges, environmental issues, and memory bottlenecks can turn an evaluation into a multi-day affair.<\/p>\n<p>For that reason, repos that include <code>Dockerfile<\/code>s or Google Colab links are generally a very good sign. It\u2019s also a good sign when the author includes pretrained weights, along with documentation on how those weights were trained. If you\u2019re able to skip right to inference, it helps you assess a model much faster; likewise, information on how those weights were trained lets you test your own implementation against a solid benchmark. Neglecting to include these in a repo is not a dealbreaker, but it will make your life harder.<\/p>\n<p>I usually make a point of setting up my <em>own<\/em> Dockerfile, regardless of whether the author provides one, because as I explore a repo I&rsquo;ll install my own dependencies and write exploratory code that I&rsquo;ll want to be able to run in a reproducible way. Almost every time I play with some machine learning code and leave it alone for a few weeks, I return to some esoteric error caused by a package that has gone out of date or been upgraded along the way. Pin your versions, and get a reproducible environment from the start!<\/p>\n<p>I ultimately settled on the implementation by <a href=\"https:\/\/github.com\/idealo\/image-super-resolution\">idealo<\/a>. The code is easy to read, pretrained models are provided, and the author <a href=\"https:\/\/medium.com\/idealo-tech-blog\/a-deep-learning-based-magnifying-glass-dae1f565c359\">provides a wonderful write-up of their journey exploring the space<\/a>. However, the real clincher was that I could convert the RDN models to Javascript with only a few modifications. 
Converting the RRDN models was a bit trickier - more on that in a bit.<\/p>\n<h2 id=\"conversion-to-javascript\">Conversion to Javascript<\/h2>\n<p>Tensorflow.js provides a handy command line tool for converting models to Javascript called the TFJS converter. You can convert a model with something like:<\/p>\n<pre><code>tensorflowjs_converter --input_format=keras --output_format=tfjs_layers_model .\/rdn-model.h5 rdn-tfjs\n<\/code><\/pre>\n<p><a href=\"https:\/\/colab.research.google.com\/drive\/1WmTHfcNiEWVta5B5AJ5V0dnrQg-JXH06#scrollTo=oMMODAFu05Rc\">I&rsquo;ve put together a Google Colab that demonstrates this.<\/a><\/p>\n<p>To convert a model cleanly to Javascript, there are some things to be aware of:<\/p>\n<ol>\n<li>The model must be saved in Keras, or another format compatible with the Tensorflow converter. Also, ensure that you are converting the <em>model<\/em> into Javascript, and not just the weights. If the latter, you&rsquo;ll probably receive a cryptic error with little guidance as to what&rsquo;s going on.<\/li>\n<li>Variables must not refer to <code>self<\/code> - this tripped me up with idealo&rsquo;s implementation. (<a href=\"https:\/\/github.com\/idealo\/image-super-resolution\/issues\/114#issuecomment-605067405\">Refer to this Github issue<\/a>, <a href=\"https:\/\/github.com\/idealo\/image-super-resolution\/pull\/137\">or my PR<\/a>, for a solution.)<\/li>\n<li>All of the tensor ops must be implemented in Javascript. I don\u2019t know of a good way of inspecting this other than trial and error (aka, converting the model and seeing if it runs).<\/li>\n<li>If custom layers are implemented, those will have to be re-implemented in Javascript. For instance, <a href=\"https:\/\/github.com\/idealo\/image-super-resolution\/pull\/137\/files#diff-33e903d44b1c48dec8eabcca53955976R191\">the RRDN model&rsquo;s custom layers had to be re-implemented in order to save cleanly<\/a>. 
Later in this article I&rsquo;ll discuss how to deal with custom layers.<\/li>\n<li>In the output of the Tensorflow.js Converter, I had to manually change the model&rsquo;s <code>class_name<\/code> from <code>Functional<\/code> to <code>Model<\/code>. (<a href=\"https:\/\/colab.research.google.com\/drive\/1WmTHfcNiEWVta5B5AJ5V0dnrQg-JXH06#scrollTo=oMMODAFu05Rc&amp;line=4&amp;uniqifier=1\">In the Google Colab, this is the cell that implements this<\/a>.) I don&rsquo;t know why this is the case, or whether it&rsquo;s a bug or not - comments are welcome!<\/li>\n<li>Any pre- and post- processing of images needs to be reproduced in Javascript.<\/li>\n<\/ol>\n<h2 id=\"pull-your-weight-fine-neurons\">Pull your weight, fine neurons<\/h2>\n<p>Performance is key for browser-based applications. Skinnier models perform better.<\/p>\n<p>There are two ways we can improve performance in Javascript.<\/p>\n<p>First, we can <strong>quantize<\/strong> our model. <a href=\"https:\/\/github.com\/tensorflow\/tfjs-examples\/tree\/master\/quantization\">Quantizing a model means reducing the precision of the model&rsquo;s weights<\/a>. This can potentially lead to lower accuracy, but can reduce model size significantly (and has the side benefit of making the model more compressible over gzip).<\/p>\n<p>We can quantize directly in the Tensorflow.js converter:<\/p>\n<pre><code>tensorflowjs_converter \\\n--input_format tfjs_layers_model \\\n--output_format tfjs_layers_model \\\n--quantize_uint8 \\\noriginal_model\/model.json \\\nquantized_model\/\n<\/code><\/pre>\n<p>In this specific case, the maximum amount of quantization - <code>uint8<\/code> - had no significant effect on the final model&rsquo;s performance.<\/p>\n<p>Second, we can <strong>prune<\/strong> our model. Pruning is a process whereby we cull under-performing weights during training. 
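<\/p>\n<p>Neither technique is magic - both trade a little fidelity for a smaller or sparser model. To make the ideas concrete, here are toy sketches of each in plain Javascript (illustrative helpers of my own, <em>not<\/em> what the converter or the Tensorflow Model Optimization toolkit actually run):<\/p>\n

```javascript
// Toy uint8 quantization: map each float weight onto one of 256
// evenly spaced values between the smallest and largest weight.
function quantizeUint8(weights) {
  const min = Math.min(...weights);
  const max = Math.max(...weights);
  const scale = (max - min) / 255 || 1; // guard against all-equal weights
  const quantized = weights.map((w) => Math.round((w - min) / scale));
  return { quantized, min, scale };
}

// Reverse the mapping; each weight comes back within scale / 2 of its
// original value - the precision we gave up for a 4x size win.
function dequantize({ quantized, min, scale }) {
  return quantized.map((q) => q * scale + min);
}

// Toy magnitude pruning: zero out all but the largest weights (by
// absolute value), keeping only the given fraction of them.
function pruneWeights(weights, keepFraction) {
  const keep = Math.round(weights.length * keepFraction);
  const sorted = weights.map(Math.abs).sort((a, b) => b - a);
  const threshold = sorted[keep - 1] ?? Infinity;
  return weights.map((w) => (Math.abs(w) >= threshold ? w : 0));
}

console.log(quantizeUint8([-1, 0, 1]).quantized); // [0, 128, 255]
console.log(pruneWeights([0.9, -0.02, 0.4, 0.01], 0.5)); // [0.9, 0, 0.4, 0]
```
\n<p>In a real model, the runs of zeroed-out weights are also dramatically more compressible over gzip, which is where much of the size win comes from.<\/p>\n<p>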
I haven&rsquo;t personally explored this avenue, but if you&rsquo;re interested <a href=\"https:\/\/www.tensorflow.org\/model_optimization\/guide\/pruning\">you can read more about it here<\/a>. It certainly seems like a promising strategy for squeezing out additional performance on the front end.<\/p>\n<h2 id=\"inference-in-the-browser---show-me-the-code\">Inference in the Browser - show me the code!<\/h2>\n<p>We\u2019ve got our RDN model converted and quantized. Success! Now, how do we make it run in the browser? We can load our model with the following:<\/p>\n<pre><code class=\"language-javascript\">import * as tf from &quot;@tensorflow\/tfjs&quot;;\n\n\/\/ loadLayersModel returns a promise that resolves to the model\nconst model = await tf.loadLayersModel(&quot;.\/rdn-tfjs\/model.json&quot;);\n<\/code><\/pre>\n<p>Ensure you load the <code>model.json<\/code>, <em>not<\/em> the <code>bin<\/code> files.<\/p>\n<p>Then, we can get our image as a <em>tensor<\/em>:<\/p>\n<pre><code class=\"language-javascript\">const img = new Image();\nimg.crossOrigin = &quot;anonymous&quot;;\nimg.src = &quot;your-image&quot;;\nimg.onload = () =&gt; {\n  const tensor = tf.browser.fromPixels(img).expandDims(0);\n};\n<\/code><\/pre>\n<p>Tensors are a numeric data structure used in almost all neural nets, and <a href=\"https:\/\/thekevinscott.com\/tensors-in-javascript\/\">you can read more about them here<\/a>.<\/p>\n<p>Two things to note in the code above:<\/p>\n<ol>\n<li>If you&rsquo;re dealing with images from other domains you may run into CORS issues. Setting <code>crossOrigin<\/code> to <code>anonymous<\/code> can help.<\/li>\n<li>You&rsquo;ll need to call <code>expandDims<\/code> on the tensor you grab from your image. 
You&rsquo;ll need to pass a four-dimensional tensor to your model; to learn more on why, <a href=\"https:\/\/thekevinscott.com\/image-classification-with-javascript\/#4-translate-the-tensor\">you can check out my article on image classification here<\/a>.<\/li>\n<\/ol>\n<p>Now that we have a tensor, we can run it through our model:<\/p>\n<pre><code class=\"language-javascript\">const prediction = model.predict(tensor).squeeze();\n<\/code><\/pre>\n<p>And voila! You have an upscaled tensor, ready to display in your browser!<\/p>\n<p><a href=\"https:\/\/codesandbox.io\/s\/upscaling-1-0xmbx?file=\/src\/index.js\">Here&rsquo;s an implementation of all that code on CodeSandbox<\/a>:<\/p>\n<iframe src=\"https:\/\/codesandbox.io\/embed\/upscaling-1-0xmbx?fontsize=14&hidenavigation=1&theme=dark&view=preview\"\nstyle=\"width:100%; height:320px; border:0; border-radius: 4px; overflow:hidden;\"\ntitle=\"Upscaling-1\"\nallow=\"accelerometer; ambient-light-sensor; camera; encrypted-media; geolocation; gyroscope; hid; microphone; midi; payment; usb; vr; xr-spatial-tracking\"\nsandbox=\"allow-forms allow-modals allow-popups allow-presentation allow-same-origin allow-scripts\"\n><\/iframe>\n<p>It still takes a while - in my case, around 2.5 seconds - which is unacceptable for production. Also, it has the nasty side effect of freezing the UI while it works. Let\u2019s look at a number of different strategies for improving performance.<\/p>\n<h3 id=\"warm-ups\">Warm ups<\/h3>\n<p>The initial invocation of a neural network in Tensorflow.js will take a long time, but <em>subsequent<\/em> invocations will be much faster.<\/p>\n<blockquote>\n<p>TensorFlow.js executes operations on the GPU by running WebGL shader programs. These shaders are assembled and compiled lazily when the user asks to execute an operation. The compilation of a shader happens on the CPU on the main thread and can be slow. 
TensorFlow.js will cache the compiled shaders automatically, making the second call to the same operation with input and output tensors of the same shape much faster.\n\u2014 <a href=\"https:\/\/www.tensorflow.org\/js\/guide\/platform_environment#shader_compilation_texture_uploads\">Tensorflow.js Documentation<\/a><\/p>\n<\/blockquote>\n<p>We can make use of this and &ldquo;warm up&rdquo; our model by passing a dummy tensor through. Here\u2019s some code you can use for that (<a href=\"https:\/\/codesandbox.io\/s\/upscaling-2-warm-up-vjllx?file=\/src\/index.js\">view the code on CodeSandbox<\/a>):<\/p>\n<pre><code class=\"language-javascript\">const dummyTensor = tf.zeros([1, img.height, img.width, 3]);\nmodel.predict(dummyTensor);\n<\/code><\/pre>\n<p>In this case, inference time drops to 150ms for me. Much better! However, this only works if the tensor sizes match exactly. We obviously can&rsquo;t depend on that - users could upload photos of any size and scale. Also, there&rsquo;s still a noticeable lag on the UI while the model runs its predictions.<\/p>\n<p>Let&rsquo;s try and tackle the second issue first. What if we move the calculation off the main thread - to a web worker?<\/p>\n<h3 id=\"web-workers\">Web Workers<\/h3>\n<p><a href=\"https:\/\/codesandbox.io\/s\/github\/thekevinscott\/upscalerjs\/tree\/master\/examples\/webworker\">Here&rsquo;s a CodeSandbox link that demonstrates the use of Web Workers<\/a>. 
(This example uses UpscalerJS instead of writing the TFJS code out by hand, but the concept is the same.)<\/p>\n<iframe src=\"https:\/\/codesandbox.io\/embed\/github\/thekevinscott\/upscalerjs\/tree\/master\/examples\/webworker?fontsize=14&hidenavigation=1&theme=dark&view=preview\"\nstyle=\"width:100%; height:500px; border:0; border-radius: 4px; overflow:hidden;\"\ntitle=\"thekevinscott\/upscalerjs: webworker\"\nallow=\"accelerometer; ambient-light-sensor; camera; encrypted-media; geolocation; gyroscope; hid; microphone; midi; payment; usb; vr; xr-spatial-tracking\"\nsandbox=\"allow-forms allow-modals allow-popups allow-presentation allow-same-origin allow-scripts\"\n><\/iframe>\n<p>Moving the code to a web worker lets us move the processing off the main thread, which lets us run animations at a much smoother rate. However, it&rsquo;s not a panacea; there&rsquo;s still some choppiness in the animation. I <em>believe<\/em> this choppiness is coming from the GPU itself locking the thread, which manifests worse on older devices than newer ones. 
Web workers absolutely help, but they don&rsquo;t solve the problem entirely.<\/p>\n<h3 id=\"splitting-the-image-into-chunks\">Splitting the image into chunks<\/h3>\n<p>What if, instead of processing the full image in one go, we subdivided the image into pieces for processing individually?<\/p>\n<p><img src=\".\/images\/splitting-image.gif\" alt=\"Gif demonstrating splitting an image into chunks\"><\/p>\n<p>If we subdivide our image into sections, we can take one long task and break it into 4 tasks, and after each one we can release the UI thread:<\/p>\n<pre><code class=\"language-javascript\">const tensor = tf.browser.fromPixels(img);\nconst [height, width] = tensor.shape;\nfor (let i = 0; i &lt; 2; i++) {\n  for (let j = 0; j &lt; 2; j++) {\n    const slicedTensor = tensor.slice(\n      [(i * height) \/ 2, (j * width) \/ 2],\n      [height \/ 2, width \/ 2]\n    );\n    const prediction = model.predict(slicedTensor.expandDims(0)).squeeze();\n    \/\/ release the UI thread before processing the next chunk\n    await tf.nextFrame();\n  }\n}\n<\/code><\/pre>\n<p><a href=\"https:\/\/codesandbox.io\/s\/upscaling-patch-size-demonstration-pdki6?file=\/src\/index.js\">Here&rsquo;s a CodeSandbox link demonstrating this<\/a>.<\/p>\n<p>This significantly improves the responsiveness of our code, but now there&rsquo;s a new problem:<\/p>\n<p><img src=\".\/images\/artifacting.gif\" alt=\"Demonstrating the artifacting effect of an upscaled image\"><\/p>\n<p>These upscaled images tend to have artifacting around the edges. This is a fairly common issue inherent in a lot of upscaling algorithms, but it&rsquo;s generally not a problem unless you&rsquo;re staring closely at the edges of your upscaled image. 
However, in this case - since we&rsquo;re stitching a number of images together into one - the problem is much more apparent.<\/p>\n<p>The fix is to add <em>padding<\/em> to each of our image slices - something like this:<\/p>\n<p><img src=\".\/images\/padding.gif\" alt=\"Demonstration of an image with padding\"><\/p>\n<p>We can then slice off the extra pixels, and put together an image without any artifacts. <a href=\"https:\/\/codesandbox.io\/s\/upscaling-3-patch-sizes-8h6pt?file=\/src\/utils.ts\">Here&rsquo;s a CodeSandbox that demonstrates that end-to-end<\/a>.<\/p>\n<p>The best part is that, so long as you set the patch size small enough - smaller than the smallest image you expect to receive - you&rsquo;ll wind up with consistently sized inputs. And remember how, in the warm ups section, we mentioned requiring consistently sized images in order to get the benefit of the speed up? This solution gives us a responsive UI <em>and<\/em> warm-up-friendly, consistently sized inputs.<\/p>\n<h2 id=\"rrdn-and-the-hunt-for-custom-layers\">RRDN and the hunt for Custom Layers<\/h2>\n<p>So far, we\u2019ve been dealing with the RDN model. The RRDN model is the more powerful version, and it relies on custom layers, which will need to be reimplemented in Javascript.<\/p>\n<p>I didn&rsquo;t find a ton of documentation out there on custom layers in Tensorflow.js. 
<a href=\"https:\/\/www.tensorflow.org\/js\/guide\/models_and_layers#custom_layers\">There&rsquo;s the official documentation<\/a>, as well as <a href=\"https:\/\/gist.github.com\/caisq\/33ed021e0c7b9d0e728cb1dce399527d\">this gist by Shanqing Cai<\/a>, and that&rsquo;s most of what I could find.<\/p>\n<p><a href=\"https:\/\/github.com\/idealo\/image-super-resolution\/blob\/master\/ISR\/models\/rrdn.py#L191\">The two custom layers in Python are defined as<\/a>:<\/p>\n<pre><code class=\"language-python\">class PixelShuffle(tf.keras.layers.Layer):\ndef __init__(self, scale, *args, **kwargs):\nsuper(PixelShuffle, self).__init__(*args, **kwargs)\nself.scale = scale\ndef call(self, x):\nreturn tf.nn.depth_to_space(x, block_size=self.scale, data_format='NHWC')\ndef get_config(self):\nconfig = super().get_config().copy()\nconfig.update({\n'scale': self.scale,\n})\nreturn config\nclass MultiplyBeta(tf.keras.layers.Layer):\ndef __init__(self, beta, *args, **kwargs):\nsuper(MultiplyBeta, self).__init__(*args, **kwargs)\nself.beta = beta\ndef call(self, x, **kwargs):\nreturn x * self.beta\ndef get_config(self):\nconfig = super().get_config().copy()\nconfig.update({\n'beta': self.beta,\n})\nreturn config\n<\/code><\/pre>\n<p>In Javascript, those look like:<\/p>\n<pre><code class=\"language-javascript\">class MultiplyBeta extends tf.layers.Layer {\nbeta: number;\nconstructor() {\nsuper({});\nthis.beta = BETA;\n}\ncall(inputs: Inputs) {\nreturn tf.mul(getInput(inputs), this.beta);\n}\nstatic className = 'MultiplyBeta';\n}\nclass PixelShuffle extends tf.layers.Layer {\nscale: number;\nconstructor() {\nsuper({});\nthis.scale = SCALE;\n}\ncomputeOutputShape(inputShape: number[]) {\nreturn [inputShape[0], inputShape[1], inputShape[2], 3];\n}\ncall(inputs: Inputs) {\nreturn tf.depthToSpace(getInput(inputs), this.scale, 'NHWC');\n}\nstatic className = 'PixelShuffle';\n}\n<\/code><\/pre>\n<p>You also need to explicitly register each custom layer:<\/p>\n<pre><code 
class=\"language-javascript\">tf.serialization.registerClass(MultiplyBeta);\ntf.serialization.registerClass(PixelShuffle);\n<\/code><\/pre>\n<p>Some things to call out here:<\/p>\n<ul>\n<li>Make sure you define a static <code>className<\/code> on the layer, that matches the name of the layer exactly<\/li>\n<li><code>call<\/code> is where you do the bulk of your computation.<\/li>\n<li><code>computeOutputShape<\/code> I <em>believe<\/em> you only need to define if it&rsquo;s different - this function is called to tell TFJS the shape of your output tensor<\/li>\n<li>You may need to translate function calls from Python to Javascript; for instance <code>tf.nn.depth_to_space<\/code> in Python becomes <code>tf.depthToSpace<\/code> in Javascript<\/li>\n<\/ul>\n<h2 id=\"training-your-model\">Training Your Model<\/h2>\n<p>One challenge with super resolution techniques is that their scale is fixed.<\/p>\n<p>What that means is that a model trained to scale an image up to 2x, will be unable to go to 3x, or 4x. It will only ever be able to upscale an image to 2x.<\/p>\n<p>To change the scale, you need to train a model from scratch. As you can imagine, to support different scales can drastically increase the amount of training you have to do.<\/p>\n<p>In addition, there\u2019s some sign that further training on specific datasets yields specific benefits related to your domain.<\/p>\n<blockquote>\n<p>First we show that larger datasets lead to better performance for PSNR-oriented methods. We use a large model, where 23 Residual-in-Residual Blocks (RRDB) are placed before the upsampling layer followed by two convolution layers for reconstruction &hellip; A widely used training dataset is DIV2K that contains 800 images. We also explore other datasets with more diverse scenes \u2013 Flickr2K dataset consisting of 2650 2K high-resolution images collected on the Flickr website. 
It is observed that the merged dataset with DIV2K and Flickr2K, namely DF2K dataset, increases the PSNR performance.\n<img src=\".\/images\/influence-of-different-datasets.png\" alt=\"Influence of Different Datasets\"><\/p>\n<\/blockquote>\n<p>It&rsquo;s likely that training on a dataset specific to your domain will yield increased accuracy.<\/p>\n<p>Last year I spent some time working with <a href=\"https:\/\/ai.googleblog.com\/2016\/11\/enhance-raisr-sharp-images-with-machine.html\">RAISR<\/a>. One of the key insights in the paper was that compressing the low resolution images led to a more resilient model better able to handle artifacting, while sharpening the high resolution images led to more aesthetically pleasing upscaled images (at the expense of worse performance against the metrics). I suspect - though I don&rsquo;t know for sure - that similar techniques might yield similar benefits in training here, and I&rsquo;m currently experimenting to find out.<\/p>\n<h2 id=\"upscalerjs\">Upscaler.JS<\/h2>\n<p>I&rsquo;ve packaged this all up into an npm module called <a href=\"https:\/\/github.com\/thekevinscott\/UpscalerJS\">Upscaler.js<\/a>.<\/p>\n<p>It&rsquo;s agnostic to the upscaling model being used, which means that in the future, I\u2019ll be able to improve models, and potentially introduce models tuned to various use cases (faces, illustrations). I&rsquo;m currently serving models via JS CDNs and look forward to adding additional models in the future.<\/p>\n<p><img src=\".\/images\/demo.gif\" alt=\"UpscalerJS\"><\/p>\n<p>I think there are lots of opportunities for improvements, particularly around performance, but I&rsquo;m frankly thrilled that this is possible.<\/p>\n<p>Imagine being able to apply this to a video stream. Imagine if you could serve a video at <em>6%<\/em> of the file size of a normal video stream? 
We&rsquo;re not there yet - we&rsquo;d have to get upscaling working 10x faster to handle realtime video - but it&rsquo;s <em>not far off<\/em>. And that&rsquo;s really exciting to think about!<\/p>"},{"title":"What is Deep Learning","link":"https:\/\/thekevinscott.com\/what-is-deep-learning\/","pubDate":"Sun, 28 Jun 2020 14:00:00 +0000","guid":"https:\/\/thekevinscott.com\/what-is-deep-learning\/","description":"<p>When I was growing up, I read a story about a man from Serbia named <a href=\"http:\/\/elevate.airserbia.com\/Elevate_Specijal_jul2017\/index.html#28\/z\">Kalfa Manojlo<\/a>. He was a blacksmith&rsquo;s apprentice who dreamt of becoming the first man in human history to fly.<\/p>\n<p>He wasn&rsquo;t the first to try. People had been flapping their arms like birds for centuries with no apparent success, but Manojlo, with the confidence of somebody too young to know what they don&rsquo;t know, believed he could build a pair of wings better than any that had come before. On a chilly November day in 1841 Manojlo took his winged contraption, scaled the town Customs Office, and launched himself into the air above a crowd of bemused onlookers.<\/p>\n<p>You&rsquo;ve probably never heard of this guy unless you&rsquo;re Serbian, so you can probably guess what happened: Manojlo landed head first in a nearby snowbank to much amusement from the gathered crowd. (Pretty good entertainment in the 1840s.)<\/p>\n<p>Many early attempts at flight were like this. People thought that to fly like a bird, you had to imitate a bird. It&rsquo;s not an unreasonable assumption; birds have been flying pretty darn well for a pretty long time. But to conquer flight on a human scale requires a fundamentally different approach.<\/p>\n<p>Neural Networks, the engines that power Deep Learning, are inspired by the human brain, in particular its capacity to learn over time. 
Similar to biological brains, they are composed of cells connected together that change in response to exposure to stimuli. However, Neural Networks are really closer to statistics on steroids than they are to a human brain, and the strategies we&rsquo;ll use to build and train them diverge pretty quickly from anything related to the animal kingdom.<\/p>\n<p>Neural Networks have been around since at least the &rsquo;50s, and from the beginning people have asked when we might expect machines to achieve consciousness. Depending on the speaker, the term Artificial Intelligence can mean anything from Logistic Regression to Skynet taking over the world. For this reason, we&rsquo;ll instead refer to the technology we&rsquo;re interested in as <strong>Machine Learning<\/strong> and <strong>Deep Learning<\/strong>.<\/p>\n<p>Machine Learning is the act of making predictions. That&rsquo;s it. You put data in, you get data out. This includes a large range of methods not under the Deep Learning umbrella, including many traditional statistical methods. And Deep Learning covers the specific technology we&rsquo;ll study in this book, Neural Networks.<\/p>\n<h2 id=\"the-proliferation-of-deep-learning\">The Proliferation of Deep Learning<\/h2>\n<p>Though the concepts behind Neural Networks have been around since at least the &rsquo;50s, it&rsquo;s only within the past decade that they&rsquo;ve exploded in industry, largely because of advances in hardware, in the volume of available data, and in cutting-edge research.<\/p>\n<p>It turns out that asking a machine to make predictions, and giving it the tools to improve those predictions, is applicable to a startlingly diverse set of applications.<\/p>\n<p>You&rsquo;ve probably encountered Neural Networks in use through one of the popular virtual assistants, like Siri, Alexa, or Google Home, on the market today. When you use your face or fingerprint to unlock your phone, that&rsquo;s a Neural Network. 
There&rsquo;s a Neural Network running on your phone&rsquo;s keyboard, predicting which words you&rsquo;re likely to type next, or autosuggesting likely phrases in your email application. Perhaps you&rsquo;ve been prompted to tag your friends in uploaded photos, with your friends&rsquo; faces highlighted and their names autosuggested.<\/p>\n<p>Deep Learning can recognize and classify images to a <a href=\"https:\/\/arxiv.org\/pdf\/1706.06969.pdf\">degree of accuracy that exceeds humans<\/a>. Deep Learning monitors your inbox for spam and monitors your purchases for fraud. An autonomous agent uses Deep Learning to decide which move to make in a game of Go. An autonomous car uses Deep Learning to decide whether to speed up, slow down, or turn right or left. Hedge funds use Deep Learning to predict what stocks to buy, and stores use it to forecast demand. Magazines and newspapers use Deep Learning to automatically generate summaries of sporting events, and doctors are using Deep Learning to identify cancerous cells, perform surgery, and sequence genomes. Researchers recently trained a Neural Network to <a href=\"https:\/\/www.ucsf.edu\/news\/2018\/12\/412946\/artificial-intelligence-can-detect-alzheimers-disease-brain-scans-six-years\">diagnose early-stage Alzheimer&rsquo;s disease long before doctors could<\/a>.<\/p>\n<p>There&rsquo;s almost no industry that won&rsquo;t realize huge changes from Deep Learning, and these changes are coming not in decades, but <em>today<\/em> and over the next few years.<\/p>\n<p>Andrew Ng, co-founder of Google Brain, often refers to this technology as <a href=\"https:\/\/www.theguardian.com\/future-focused-it\/2018\/nov\/12\/is-ai-the-new-electricity\">&ldquo;the new electricity&rdquo;<\/a>: a technology that will become so ubiquitous as to be embedded in every device, everywhere around us. This oncoming sea change has huge implications for how we build applications and craft software. 
Andrej Karpathy, Director of AI at Tesla, calls it &ldquo;Software 2.0&rdquo;:<\/p>\n<blockquote>\n<p>It turns out that a large portion of real-world problems have the property that it is significantly easier to collect the data (or more generally, identify a desirable behavior) than to explicitly write the program. In these cases, the programmers will split into two teams. The 2.0 programmers manually curate, maintain, massage, clean and label datasets; each labeled example literally programs the final system because the dataset gets compiled into Software 2.0 code via the optimization. Meanwhile, the 1.0 programmers maintain the surrounding tools, analytics, visualizations, labeling interfaces, infrastructure, and the training code. \u2014 <a href=\"https:\/\/medium.com\/@karpathy\/software-2-0-a64152b37c35\">Andrej Karpathy<\/a><\/p>\n<\/blockquote>\n<h2 id=\"intelligent-devices\">Intelligent Devices<\/h2>\n<p>Traditionally, Neural Networks have been run exclusively on servers, massive computers with the computing horsepower to support them. That&rsquo;s beginning to change.<\/p>\n<p>Companies are investing huge sums of money in improving consumer-grade hardware to run Neural Networks. Apple&rsquo;s recently released NPU - a dedicated neural processing unit embedded in every new iPhone - saw an astounding <em>9x<\/em> performance increase over the <a href=\"https:\/\/www.apple.com\/in\/iphone-xs\/a12-bionic\/\">previous year&rsquo;s model<\/a>. As Moore&rsquo;s Law slows down for traditional CPUs, manufacturers are finding that they can eke out big gains by focusing on more specialized domains, like Neural Networks. This means that, increasingly, consumer-level hardware will feature powerful specialized hardware tuned to efficiently run Neural Networks.<\/p>\n<p>And just in time, because there are compelling reasons to run Neural Networks directly on the device.<\/p>\n<p>A big one is privacy. 
Consumers and governments are increasingly asking questions about how data is being collected, used, and stored. If you run your Neural Network on-device, the data never needs to leave the device, and all processing can happen locally. While this is a compelling argument to users tired of hearing about yet another breach of their data in the news, it also opens up new use cases that demand increased respect for privacy, like smart devices expanding into the more intimate spaces of our homes.<\/p>\n<p>Another reason is latency. If you&rsquo;re in a self-driving car, you can&rsquo;t rely on a cloud connection to detect whether pedestrians are in front of you. Even with a good connection, it&rsquo;s hard to do real-time analysis on a 60 FPS video or audio stream if you&rsquo;re processing it on the server; that goes double for cutting-edge AR and VR applications. Processing directly on the device avoids the round trip.<\/p>\n<p>Consumer devices have direct access to a wide array of sensors - proximity sensors, motion sensors, ambient light sensors, moisture sensors - that Neural Networks can hook into, and the devices in turn provide a rich surface for enabling direct interactivity with Neural Networks in ways that servers can&rsquo;t match.<\/p>\n<blockquote>\n<p>I believe we&rsquo;re currently at an inflection point, and mobile devices are soon going to dominate inference. The growth of the entire AI ecosystem is going to be fueled by mobile inference capabilities. Thanks to the scale of the mobile ecosystem, the world will become filled with amazing experiences enabled by mainstream AI technology &hellip; Tasks that were once only practical in the cloud are now being pushed to the edge as mobile devices become more powerful. \u2014 Dan Abdinoor, CTO of Fritz.ai<\/p>\n<\/blockquote>\n<p>As more companies face the decision of whether to deploy their Neural Network on a server or directly on the device, increasingly that answer will be the device. 
It just makes sense.<\/p>\n<h2 id=\"javascript-and-neural-networks\">Javascript and Neural Networks<\/h2>\n<p>While there&rsquo;s no shortage of domain-specific languages for on-device Neural Networks, I think Javascript is well positioned to garner market share for <em>on-device<\/em> Neural Networks going forward.<\/p>\n<p>Javascript boasts a chameleon-like ability to exist on every platform, everywhere. It starts out with an enormous installed base of every browser on every phone and desktop, but it&rsquo;s also a popular environment for native mobile development (with React Native), native desktop development (Electron), and server-side development (Node.js).<\/p>\n<p>In most AI frameworks (for instance, Tensorflow) the framework serves as a translation layer between high level abstractions and the actual mathematical operations farmed out to the GPU. Since the underlying math layer is C++, software developers have the freedom to build abstractions in whatever language is most comfortable.<\/p>\n<p>Javascript is ideal for our purposes in this book, because you already have it installed (through your web browser) and it excels as a language in handling rich interactive experiences. Though all the Networks we&rsquo;ll write in this book are designed to run in a browser, you may wish to tackle larger datasets requiring more computation in the future; if so, all examples can be ported to run in Node.js and can take advantage of whatever server-side GPUs you have at your disposal.<\/p>\n<blockquote>\n<p>Any application that can be written in JavaScript, will eventually be written in JavaScript. \u2014 Jeff Atwood, co-founder of StackOverflow<\/p>\n<\/blockquote>\n<p>The biggest drawback I see for using Javascript for Deep Learning is the nascent ecosystem. 
<code>npm<\/code> still lags Python&rsquo;s tools in the breadth and depth of packages supporting Deep Learning, and a huge number of resources, tutorials, and books demonstrate AI concepts in Python or R. To me, this presents an opportunity as a community to step up and contribute the next generation of tools. Javascript is a wily language and I have no doubt developers will fill in the gaps soon.<\/p>\n<h2 id=\"neural-networks\">Neural Networks<\/h2>\n<p>We&rsquo;ve talked about why Neural Networks are important, but what exactly <em>is<\/em> a Neural Network?<\/p>\n<p>Earlier we said Machine Learning is the act of making predictions. More precisely, you put numbers in, those numbers get put through some number of mathematical functions, and new numbers come out.<\/p>\n<p>The fundamental building block of a Neural Network is a <strong>Neuron<\/strong>. In code, Neurons are commonly referred to as <strong>units<\/strong>.<\/p>\n<p><img src=\"images\/neural-net\/neuron_450x450.png\" alt=\"A lone Neuron\"><\/p>\n<p>The Neuron takes in a single number and transforms it. You&rsquo;ll specify the nature of this transformation when you architect your Network.<\/p>\n<p>A collection of Neurons comprises a <strong>Layer<\/strong>.<\/p>\n<p><img src=\"images\/neural-net\/layer_1050x450.png\" alt=\"A Layer of Neurons\"><\/p>\n<p>Since a layer is a collection of neurons acting in concert, it is capable of much richer computation than a single Neuron.<\/p>\n<p>Neurons are connected to other Neurons via <strong>Weights<\/strong>.<\/p>\n<p><img src=\"images\/neural-net\/weights_1050x650.png\" alt=\"Neurons connected by Weights\"><\/p>\n<p>Weights describe the strength of the connection between a given pair of Neurons. 
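None of this requires a framework to see concretely. A unit boils down to a weighted sum passed through a transformation, and a layer is just several units reading the same inputs. Here is a minimal sketch in plain Javascript; the names are illustrative, not a real library API, and it assumes a sigmoid as the transformation:

```javascript
// A framework-free sketch of a single unit: each input is scaled by its
// weight, the results are summed, and the sum is passed through a
// transformation (here a sigmoid, which squashes any number into 0..1).
const sigmoid = x => 1 / (1 + Math.exp(-x));

function unit(weights, inputs) {
  const weightedSum = weights.reduce(
    (sum, weight, i) => sum + weight * inputs[i],
    0,
  );
  return sigmoid(weightedSum);
}

// A layer is a collection of units, each with its own set of weights.
function layer(weightsPerUnit, inputs) {
  return weightsPerUnit.map(weights => unit(weights, inputs));
}

// A heavily weighted input dominates the output; a zero weight silences it.
console.log(unit([1, 0], [5, 100])); // the second input is ignored entirely
console.log(layer([[1, 0], [0, 1]], [5, 100])); // one output per unit
```

A real framework does exactly this, only on tensors and at scale, with the weights learned during training rather than written by hand.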
A bigger weight implies a stronger connection between one Neuron and another, increasing the influence that Neuron will have on the final prediction.<\/p>\n<p>A Neural Network is made up of some number of layers.<\/p>\n<p><img src=\"images\/neural-net\/layers_1050x750.png\" alt=\"Layers in a Neural Network\"><\/p>\n<p>This diagram shows a three-layer Neural Network. The first layer is the <strong>input layer<\/strong>. This is where data enters the Network. The last layer is the <strong>output layer<\/strong>, responsible for emitting the transformed values from the Network.<\/p>\n<p>The layer in the middle is called a <strong>hidden layer<\/strong>. Hidden layers make up the bulk of Neural Networks, and it&rsquo;s Networks with a lot of hidden layers that give rise to the phrase &ldquo;Deep Learning&rdquo;: those Networks are &ldquo;Deep&rdquo;. There&rsquo;s no limit to the number of hidden layers you can use.<\/p>\n<p><img src=\"images\/neural-net\/complicated_1400x1000.png\" alt=\"A complicated Neural Network\"><\/p>\n<p>A Neural Network exists in one of two phases: <strong>Inference<\/strong> or <strong>Training<\/strong>.<\/p>\n<p><strong>Inference<\/strong> describes the flow of data moving forward in one direction through the Network, from the input layer to the output layer. You can think of this as the network <em>predicting<\/em> the expected value, based on the given input values. For instance, you might feed it a picture of an animal, and the Network might answer &ldquo;dog&rdquo;.<\/p>\n<p>When you do this, specific Neurons fire based on the presence or absence of certain characteristics: Does it have fur? Does it have floppy ears? Is its tongue hanging out at an odd angle? 
Based on the answers to these questions, the Network might answer &ldquo;Yes, I think this is a dog!&rdquo;<\/p>\n<img src=\"images\/what\/dog_987x648.png\" alt=\"I believe this is probably a dog and I am 99.9% sure\" \/>\n<p>Inference is usually, though not always, how your users will interact with your Neural Network.<\/p>\n<p><strong>Training<\/strong> describes the flow of data forward <em>and<\/em> backward through the network. Based on the accuracy of the Network&rsquo;s predictions, changes ripple backwards, adjusting weights so that the network can improve and produce more accurate predictions in the future. You might feed the network a hundred photos of dogs, allowing the network to figure out - on its own - that all dogs tend to have fur and oddly angled tongues.<\/p>\n<p><em>(You may see the terms <strong>forward propagation<\/strong> and <strong>backpropagation<\/strong>. These are formal terms for describing Inference and an element of Training, respectively.)<\/em><\/p>\n<!--\nOften these two phases - Inference and Training - are approached separately, and in this book that's how we'll tackle them. We'll start by looking at **Inference**, including how to interact with pretrained Neural Networks, how to pass them data, and how to interpret predictions. After that, we'll discuss **Training**, including how to build Neural Networks from scratch, and how to train them to return accurate predictions.\nWe'll start by learning how to load a Neural Network in our browser and use it.\n-->\n<hr>\n<p>This is chapter 1 of my book about Neural Nets in Javascript. Want to get a few more sample chapters? 
Subscribe below!<\/p>"},{"title":"Image Classification with Javascript","link":"https:\/\/thekevinscott.com\/image-classification-with-javascript\/","pubDate":"Thu, 16 Aug 2018 11:30:00 +0000","guid":"https:\/\/thekevinscott.com\/image-classification-with-javascript\/","description":"<p>Machine Learning has a reputation for demanding lots of data and powerful GPU computations. This leads many people to believe that building custom machine learning models for their specific dataset is impractical without a large investment of time and resources. In fact, you can leverage Transfer Learning on the web to train an accurate image classifier in less than a minute with just a few labeled images.<\/p>\n<h2 id=\"whats-image-classification-used-for\">What&rsquo;s Image Classification Used For?<\/h2>\n<p>Teaching a machine to classify images has a wide range of practical applications. You may have seen image classification at work in your photos app, automatically suggesting friends or locations for tagging. Image Classification can be used to <a href=\"https:\/\/www.kaggle.com\/c\/data-science-bowl-2017\">recognize cancer cells<\/a>, to <a href=\"https:\/\/www.kaggle.com\/c\/airbus-ship-detection\">recognize ships in satellite imagery<\/a>, or to <a href=\"https:\/\/www.kaggle.com\/c\/yelp-restaurant-photo-classification\">automatically classify images on Yelp<\/a>. It can even be used beyond the realm of images, analyzing heat maps of user activity for potential fraud, or Fourier transforms of audio waves.<\/p>\n<p>I recently <a href=\"https:\/\/github.com\/thekevinscott\/ml-classifier\">released an open source tool<\/a> to quickly train image classification models in your browser. Here&rsquo;s how it works:<\/p>\n<p><img src=\"images\/ml-classifier-example.gif\" alt=\"A demo of classifying images\"><\/p>\n<p>Embedded below is a live demo of the tool you can use. 
<a href=\"https:\/\/github.com\/thekevinscott\/dataset-tutorial-for-image-classification\/tree\/master\/data\">I&rsquo;ve put together a dataset for testing here<\/a> (or feel free to build your own). The dataset has 10 images I downloaded from each of the three most popular searches on <a href=\"https:\/\/pexels.com\">pexels.com<\/a>: &ldquo;Mobile&rdquo;, &ldquo;Wood&rdquo;, and &ldquo;Notebook&rdquo;.<\/p>\n<p>Drag the <strong>train<\/strong> folder into the drop zone, and once the model is trained, upload the <strong>validation<\/strong> folder to see how well your model can classify novel images.<\/p>\n<p><embed border=\"1\" width=\"340\" height=\"660\" src=\"https:\/\/thekevinscott.github.io\/ml-classifier-ui\/?SHOW_HELP=0&SHOW_DOWNLOAD=0\"><\/embed><\/p>\n<h2 id=\"how-does-this-work\">How does this work?<\/h2>\n<p>Transfer Learning is the special sauce that makes it possible to train extremely accurate models in your browser in a fraction of the time. Models are trained on large corpuses of data, and saved as pretrained models. Those pretrained models&rsquo; final layers can then be tuned to your specific use case.<\/p>\n<p>This works particularly well in the realm of computer vision, because so many features of images are generalizable. Rob Fergus and Matthew Zeiler <a href=\"https:\/\/arxiv.org\/abs\/1311.2901\">demonstrate in their paper<\/a> the features learned in the early layers of their model:<\/p>\n<p><img src=\"images\/layer-1.png\" alt=\"Low Level Features\" title=\"Low Level Features\"><\/p>\n<p>The model is beginning to recognize generic features, including lines, circles, and shapes, that are applicable to any set of images. After a few more layers, it&rsquo;s able to recognize more complex shapes like edges and words:<\/p>\n<p><img src=\"images\/layer-4.png\" alt=\"Higher Level Features\" title=\"Higher Level Features\"><\/p>\n<p>The vast majority of images share general features such as lines and circles. 
Many share higher level features, things like an &ldquo;eye&rdquo; or a &ldquo;nose&rdquo;. This allows you to reuse the existing training that&rsquo;s already been done, and tune just the last few layers on your specific dataset, which is faster and requires less data than training from scratch.<\/p>\n<p>How much less data? <strong>It depends<\/strong>. How different your data is from the data your pre-trained model was trained on, how complex or variable your data is, and other factors all play into your accuracy. With the example above I got to 100% accuracy with 30 images. <a href=\"https:\/\/medium.com\/@bingobee01\/how-much-data-to-you-need-ba834d074f3a\">Adrian G has put together a more rigorous analysis on his blog<\/a>.<\/p>\n<p>So, it depends on your dataset, but it&rsquo;s probably less than you think.<\/p>\n<p><embed border=\"0\" height=\"315\" src=\"https:\/\/www.youtube.com\/embed\/AgkfIQ4IGaM\" frameborder=\"0\" allow=\"autoplay; encrypted-media\" allowfullscreen=\"1\"><\/embed><\/p>\n<h1 id=\"show-me-the-code\">Show me the Code!<\/h1>\n<p>Next, we&rsquo;ll look at how to import and tune a pretrained model in Javascript. We&rsquo;ll tune <a href=\"https:\/\/github.com\/tensorflow\/models\/blob\/master\/research\/slim\/nets\/mobilenet_v1.md\">MobileNet<\/a>, a pretrained model produced by Google.<\/p>\n<blockquote>\n<p>MobileNets are a class of convolutional neural network designed by researchers at Google. They are coined \u201cmobile-first\u201d in that they\u2019re architected from the ground up to be resource-friendly and run quickly, right on your phone. 
\u2014 <a href=\"https:\/\/hackernoon.com\/creating-insanely-fast-image-classifiers-with-mobilenet-in-tensorflow-f030ce0a2991\">Matt Harvey<\/a><\/p>\n<\/blockquote>\n<p>MobileNet is trained on a huge corpus of images called <a href=\"http:\/\/www.image-net.org\/\">ImageNet<\/a>, containing over 14 million labeled images belonging to 1000 different categories. If you download <code>mobilenet_v1_0.25_224<\/code>, you&rsquo;ll see a structure of files like:<\/p>\n<pre><code>mobilenet_v1_0.25_224.ckpt.data-00000-of-00001\nmobilenet_v1_0.25_224.ckpt.index\nmobilenet_v1_0.25_224.ckpt.meta\nmobilenet_v1_0.25_224.tflite\nmobilenet_v1_0.25_224_eval.pbtxt\nmobilenet_v1_0.25_224_frozen.pb\nmobilenet_v1_0.25_224_info.txt\n<\/code><\/pre>\n<p>Within <code>mobilenet_v1_0.25_224_eval.pbtxt<\/code>, note the <code>shape<\/code> attribute:<\/p>\n<pre><code>attr {\n  key: &quot;shape&quot;\n  value {\n    shape {\n      dim {\n        size: -1\n      }\n      dim {\n        size: 224\n      }\n      dim {\n        size: 224\n      }\n      dim {\n        size: 3\n      }\n    }\n  }\n}\n<\/code><\/pre>\n<p>This tells us that the first layer of this MobileNet expects to receive a Tensor of Rank 4 with dimensions <code>[any, 224, 224, 3]<\/code>. (If you&rsquo;re wondering what a Tensor is, <a href=\"https:\/\/thekevinscott.com\/tensors-in-javascript\/\">check out this article first<\/a>.)<\/p>\n<h2 id=\"importing-and-setup\">Importing and Setup<\/h2>\n<p><a href=\"https:\/\/github.com\/thekevinscott\/dataset-tutorial-for-image-classification\">I&rsquo;ve set up a repo with the necessary packages<\/a> to get you going. Clone it and follow the readme instructions to install the packages and run it. In <code>index.js<\/code>, import Tensorflow.js with:<\/p>\n<pre><code class=\"language-javascript\">import * as tf from '@tensorflow\/tfjs';\n<\/code><\/pre>\n<p>Tensorflow.js provides a function to load a pretrained model asynchronously. 
We&rsquo;ll use this to load MobileNet:<\/p>\n<pre><code class=\"language-javascript\">function loadMobilenet() {\n  return tf.loadModel('https:\/\/storage.googleapis.com\/tfjs-models\/tfjs\/mobilenet_v1_0.25_224\/model.json');\n}\n<\/code><\/pre>\n<h2 id=\"data-pipelines\">Data Pipelines<\/h2>\n<p>At the heart of your machine learning model is data. Building a solid pipeline for processing your data is crucial for success; often, a <a href=\"https:\/\/thekevinscott.com\/dealing-with-mnist-image-data-in-tensorflowjs\/\">majority of your time will be spent working with your data pipeline<\/a>.<\/p>\n<blockquote>\n<p>It may be surprising to the academic community to know that only a tiny fraction of the code in many machine learning systems is actually doing \u201cmachine learning\u201d. When we recognize that a mature system might end up being (at most) 5% machine learning code and (at least) 95% glue code, reimplementation rather than reuse of a clumsy API looks like a much better strategy. \u2014 <a href=\"https:\/\/ai.google\/research\/pubs\/pub43146\">D. Sculley et al.<\/a><\/p>\n<\/blockquote>\n<p>There are a few common ways you&rsquo;ll see image data structured:<\/p>\n<ol>\n<li>A list of folders containing images, where the folder name is the label<\/li>\n<li>Images in a single folder, with images named by label (<code>dog-1<\/code>, <code>dog-2<\/code>)<\/li>\n<li>Images in a single folder, and a csv or other file with a mapping of label to file<\/li>\n<\/ol>\n<p>There&rsquo;s no right way to organize your images. Choose whatever format makes sense for you and your team. 
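Whichever layout you choose, the pipeline ultimately needs two parallel lists: image sources and their labels. With the folder-per-label layout (option 1), the label can be read straight off each path. A quick plain-Javascript sketch, using made-up file paths for illustration:

```javascript
// Layout #1: the folder name is the label, so the label of each file
// is simply its parent directory. These paths are illustrative.
const files = [
  'train/dog/dog-1.png',
  'train/dog/dog-2.png',
  'train/cat/cat-1.png',
];

// The parent folder is the second-to-last path segment.
const labelFor = path => path.split('/').slice(-2, -1)[0];

const labels = files.map(labelFor);
console.log(labels); // ['dog', 'dog', 'cat']
```

The other two layouts need a little more glue (parsing filenames, or reading a CSV), but they all end at the same place: files and labels kept in matching order.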
This dataset is organized by folder.<\/p>\n<p><img src=\"images\/goldberg.gif\" alt=\"Rube Goldberg Machine\"><\/p>\n<p>Our data processing pipeline will consist of four parts:<\/p>\n<ol>\n<li>Load the image (and turn it into a tensor)<\/li>\n<li>Crop the image<\/li>\n<li>Resize the image<\/li>\n<li>Translate the Tensor into an appropriate input format<\/li>\n<\/ol>\n<h3 id=\"1-loading-the-image\">1. Loading the Image<\/h3>\n<p>Since our machine learning model expects <a href=\"https:\/\/thekevinscott.com\/tensors-in-javascript\/\">Tensors<\/a>, the first step is to load the image and translate its pixel data into a Tensor. Browsers provide many convenient tools to load images and read pixels, and Tensorflow.js provides a function to convert an <code>Image<\/code> object into a Tensor. (If you&rsquo;re in Node, you&rsquo;ll have to handle this yourself.) This function takes a <code>src<\/code> URL of the image, loads the image, and returns a promise resolving with a 3D Tensor of shape <code>[height, width, color_channels]<\/code>:<\/p>\n<pre><code class=\"language-javascript\">function loadImage(src) {\n  return new Promise((resolve, reject) =&gt; {\n    const img = new Image();\n    img.src = src;\n    img.onload = () =&gt; resolve(tf.fromPixels(img));\n    img.onerror = (err) =&gt; reject(err);\n  });\n}\n<\/code><\/pre>\n<h3 id=\"2-cropping-the-image\">2. Cropping the Image<\/h3>\n<p>Many classifiers expect square images. This is not a strict requirement; if you build your own model you can specify any size resolution you want. However, standard CNN architectures expect that images be of a <strong>fixed size<\/strong>. Given this necessity, many pretrained models accept squares, in order to support the widest variety of image ratios. 
(Squares also provide flexibility for handling a variety of <a href=\"https:\/\/medium.com\/ymedialabs-innovation\/data-augmentation-techniques-in-cnn-using-tensorflow-371ae43d5be9\">data augmentation techniques<\/a>).<\/p>\n<p>We determined above that MobileNet expects 224x224 square images, so we&rsquo;ll need to first crop our images. We do that by chopping off the edges of the longer side:<\/p>\n<pre><code class=\"language-javascript\">function cropImage(img) {\n  \/\/ tf.fromPixels produces a tensor of shape [height, width, channels]\n  const height = img.shape[0];\n  const width = img.shape[1];\n  \/\/ use the shorter side as the size to which we will crop\n  const shorterSide = Math.min(height, width);\n  \/\/ calculate the top-left corner of the crop, rounding down for odd differences\n  const startingHeight = Math.floor((height - shorterSide) \/ 2);\n  const startingWidth = Math.floor((width - shorterSide) \/ 2);\n  \/\/ slice takes a starting point and a *size*, not an ending point\n  return img.slice([startingHeight, startingWidth, 0], [shorterSide, shorterSide, 3]);\n}\n<\/code><\/pre>\n<h3 id=\"3-resizing-the-image\">3. Resizing the image<\/h3>\n<p>Now that our image is square, we can resize it to 224x224. This part is easy: Tensorflow.js provides a resize method out of the box:<\/p>\n<pre><code class=\"language-javascript\">function resizeImage(image) {\n  return tf.image.resizeBilinear(image, [224, 224]);\n}\n<\/code><\/pre>\n<h3 id=\"4-translate-the-tensor\">4. Translate the Tensor<\/h3>\n<p>Recall that our model expects an input object of the shape <code>[any, 224, 224, 3]<\/code>. This is known as a Tensor of Rank 4. The first dimension refers to the number of training examples. If you have 10 training examples, that would be <code>[10, 224, 224, 3]<\/code>.<\/p>\n<p>We also want our pixel data as a floating point number between -1 and 1, instead of integer data between 0 and 255, a process called normalization. 
While <a href=\"https:\/\/stackoverflow.com\/questions\/4674623\/why-do-we-have-to-normalize-the-input-for-an-artificial-neural-network\">neural networks are generally agnostic to the size<\/a> of the numbers coming in, using smaller numbers can help the network train faster.<\/p>\n<p>We can build a function that expands our Tensor and translates the integers into floats with:<\/p>\n<pre><code class=\"language-javascript\">function batchImage(image) {\n  \/\/ Expand our tensor to have an additional dimension, whose size is 1\n  const batchedImage = image.expandDims(0);\n  \/\/ Turn pixel data into a float between -1 and 1.\n  return batchedImage.toFloat().div(tf.scalar(127)).sub(tf.scalar(1));\n}\n<\/code><\/pre>\n<h3 id=\"the-final-pipeline\">The Final Pipeline<\/h3>\n<p>Putting all the above functions together into a single function, we get:<\/p>\n<pre><code class=\"language-javascript\">function loadAndProcessImage(image) {\n  const croppedImage = cropImage(image);\n  const resizedImage = resizeImage(croppedImage);\n  const batchedImage = batchImage(resizedImage);\n  return batchedImage;\n}\n<\/code><\/pre>\n<p>We can now use this function to test that our data pipeline is set up correctly. 
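Before testing with a real image, it's cheap to sanity-check the normalization arithmetic from `batchImage` with plain Javascript. (One thing worth noticing: dividing by 127 leaves 255 landing slightly above 1; dividing by 127.5 would map 0..255 exactly onto -1..1. In practice the difference is negligible.)

```javascript
// The per-pixel normalization used in batchImage: pixel / 127 - 1.
const normalize = pixel => pixel / 127 - 1;

console.log(normalize(0));   // -1 (black)
console.log(normalize(127)); // 0 (mid-gray)
// 255 (white) lands just above 1 -- close enough in practice.
console.log(normalize(255));
```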
We&rsquo;ll import an image whose label is known (a drum) and see if the prediction matches the expected label:<\/p>\n<p><img src=\"images\/drum.jpg\" alt=\"Image of a drum\" title=\"An image that is known\"><\/p>\n<pre><code class=\"language-javascript\">import drum from '.\/data\/pretrained-model-data\/drum.jpg';\n\nloadMobilenet().then(pretrainedModel =&gt; {\n  loadImage(drum).then(img =&gt; {\n    const processedImage = loadAndProcessImage(img);\n    const prediction = pretrainedModel.predict(processedImage);\n    \/\/ Because of the way Tensorflow.js works, you must call print on a Tensor instead of console.log.\n    prediction.print();\n  });\n});\n<\/code><\/pre>\n<p>You should see something like:<\/p>\n<pre><code class=\"language-javascript\">Tensor\n[[0.0000273, 5e-7, 4e-7, ..., 0.0001365, 0.0001604, 0.0003134],]\n<\/code><\/pre>\n<p>If we inspect the shape of this Tensor, we&rsquo;ll see it to be <code>[1, 1000]<\/code>. MobileNet returns a Tensor containing a prediction for every category, and since MobileNet has learned 1000 classes, we receive 1000 predictions, each representing the probability that the given image belongs to a given class.<\/p>\n<p>In order to get an actual prediction, we need to determine the most likely prediction. We flatten the tensor to 1 dimension and take the index of the max value, which corresponds to our most confident prediction:<\/p>\n<pre><code class=\"language-javascript\">prediction.as1D().argMax().print();\n<\/code><\/pre>\n<p>This should produce:<\/p>\n<pre><code class=\"language-javascript\">Tensor\n541\n<\/code><\/pre>\n<p>In the repo you&rsquo;ll find <a href=\"https:\/\/github.com\/thekevinscott\/dataset-tutorial-for-image-classification\/blob\/master\/imagenet_labels.json\">a copy of the ImageNet class definitions in JSON format<\/a>. 
You can import that JSON file to translate the numeric prediction into an actual string:<\/p>\n<pre><code class=\"language-javascript\">import labels from '.\/imagenet_labels.json';\nloadMobilenet().then(pretrainedModel =&gt; {\n...\nconst labelPrediction = prediction.as1D().argMax().dataSync()[0];\nconsole.log(`\nNumeric prediction is ${labelPrediction}\nThe predicted label is ${labels[labelPrediction]}\nThe actual label is drum, membranophone, tympan\n`);\n});\n<\/code><\/pre>\n<p>You should see that <code>541<\/code> corresponds to <code>drum, membranophone, tympan<\/code>, which is the category our image comes from. At this point you have a working pipeline and the ability to leverage MobileNet to predict ImageNet images.<\/p>\n<p>Now let&rsquo;s look at how to tune MobileNet on your specific dataset.<\/p>\n<h2 id=\"training-the-model\">Training The Model<\/h2>\n<p><img src=\"images\/monorail.gif\" alt=\"You get a monorail!\" title=\"You get a monorail\"><\/p>\n<p>We want to build a model that successfully predicts <strong>novel data<\/strong>; data it hasn&rsquo;t seen before.<\/p>\n<p>To do this, you first train the model on labeled data - data that has already been identified - and you validate the model&rsquo;s performance on other labeled data <em>it hasn&rsquo;t seen before<\/em>.<\/p>\n<blockquote>\n<p>Supervised learning reverses this process, solving for m and b, given a set of x\u2019s and y\u2019s. In supervised learning, you start with many particulars \u2014 the data \u2014 and infer the general equation. And the learning part means you can update the equation as you see more x\u2019s and y\u2019s, changing the slope of the line to better fit the data. The equation almost never identifies the relationship between each x and y with 100% accuracy, but the generalization is powerful because later on you can use it to do algebra on new data. 
\u2014 <a href=\"https:\/\/hbr.org\/2017\/10\/how-to-spot-a-machine-learning-opportunity-even-if-you-arent-a-data-scientist\">Kathryn Hume<\/a><\/p>\n<\/blockquote>\n<p>When you trained the model above by dragging the <code>training<\/code> folder in, the model produced a training score. This indicates how many images the classifier was able to learn to successfully predict out of the training set. The second number it produced indicates how many images it could predict that it <em>hadn&rsquo;t seen before<\/em>. This second score is the one you want to optimize for (well, you want to optimize for both, but the latter number is more applicable to novel data).<\/p>\n<p>We&rsquo;re going to train on the <strong>colors<\/strong> dataset. In the repo, you&rsquo;ll find a folder <code>data\/colors<\/code> that contains:<\/p>\n<pre><code>validation\/\n  blue\/\n    blue-3.png\n  red\/\n    red-3.png\ntraining\/\n  blue\/\n    blue-1.png\n    blue-2.png\n  red\/\n    red-1.png\n    red-2.png\n<\/code><\/pre>\n<p>Building machine learning models, I&rsquo;ve found that <em>code-related<\/em> errors - a missing variable, an inability to compile - are fairly straightforward to fix, whereas <em>training<\/em> errors - the labels were in an incorrect order, or the images were being cropped incorrectly - are devilish to debug. Testing exhaustively and setting up sanity test cases can help save you a few gray hairs.<\/p>\n<p>The <code>data\/colors<\/code> folder provides a list of solid red and blue colors that are guaranteed to be easy to train with. 
We&rsquo;ll use these to train our model and ensure that our machine learning code learns correctly, before attempting with a more complicated dataset.<\/p>\n<pre><code class=\"language-javascript\">import blue1 from '..\/data\/colors\/training\/blue\/blue-1.png';\nimport blue2 from '..\/data\/colors\/training\/blue\/blue-2.png';\nimport blue3 from '..\/data\/colors\/validation\/blue\/blue-3.png';\nimport red1 from '..\/data\/colors\/training\/red\/red-1.png';\nimport red2 from '..\/data\/colors\/training\/red\/red-2.png';\nimport red3 from '..\/data\/colors\/validation\/red\/red-3.png';\n\nconst training = [\n  blue1,\n  blue2,\n  red1,\n  red2,\n];\n\n\/\/ labels should match the positions of their associated images\nconst labels = [\n  'blue',\n  'blue',\n  'red',\n  'red',\n];\n<\/code><\/pre>\n<p>When we previously loaded MobileNet, we used the model without any modifications. When training, we want to use a subset of its layers - specifically, we want to ignore the final layers that produce the one-of-1000 classification. 
You can inspect the structure of a pretrained model with <code>.summary()<\/code>:<\/p>\n<pre><code class=\"language-javascript\">loadMobilenet().then(mobilenet =&gt; {\nmobilenet.summary();\n});\n<\/code><\/pre>\n<p>In your console should be the model output, and near the end you should see something like:<\/p>\n<pre><code>conv_dw_13_bn (BatchNormaliz [null,7,7,256] 1024\n_________________________________________________________________\nconv_dw_13_relu (Activation) [null,7,7,256] 0\n_________________________________________________________________\nconv_pw_13 (Conv2D) [null,7,7,256] 65536\n_________________________________________________________________\nconv_pw_13_bn (BatchNormaliz [null,7,7,256] 1024\n_________________________________________________________________\nconv_pw_13_relu (Activation) [null,7,7,256] 0\n_________________________________________________________________\nglobal_average_pooling2d_1 ( [null,256] 0\n_________________________________________________________________\nreshape_1 (Reshape) [null,1,1,256] 0\n_________________________________________________________________\ndropout (Dropout) [null,1,1,256] 0\n_________________________________________________________________\nconv_preds (Conv2D) [null,1,1,1000] 257000\n_________________________________________________________________\nact_softmax (Activation) [null,1,1,1000] 0\n_________________________________________________________________\nreshape_2 (Reshape) [null,1000] 0\n=================================================================\nTotal params: 475544\nTrainable params: 470072\nNon-trainable params: 5472\n_________________________________________________________________\n<\/code><\/pre>\n<p>What we&rsquo;re looking for is the final <code>Activation<\/code> layer that is not <code>softmax<\/code> (<a href=\"https:\/\/en.wikipedia.org\/wiki\/Softmax_function\"><code>softmax<\/code> is the activation<\/a> used to boil the predictions down to one of a thousand categories). 
That layer is <code>conv_pw_13_relu<\/code>. We return a pretrained model that includes everything up to that activation layer:<\/p>\n<pre><code class=\"language-javascript\">function buildPretrainedModel() {\n  return loadMobilenet().then(mobilenet =&gt; {\n    const layer = mobilenet.getLayer('conv_pw_13_relu');\n    return tf.model({\n      inputs: mobilenet.inputs,\n      outputs: layer.output,\n    });\n  });\n}\n<\/code><\/pre>\n<p>Let&rsquo;s write a function to loop through an array of images and return a Promise that resolves when they load.<\/p>\n<pre><code class=\"language-javascript\">function loadImages(images, pretrainedModel) {\n  let promise = Promise.resolve();\n  for (let i = 0; i &lt; images.length; i++) {\n    const image = images[i];\n    promise = promise.then(data =&gt; {\n      return loadImage(image).then(loadedImage =&gt; {\n        \/\/ Note the use of `tf.tidy` and `.dispose()`. These are two memory management\n        \/\/ functions that Tensorflow.js exposes.\n        \/\/ https:\/\/js.tensorflow.org\/tutorials\/core-concepts.html\n        \/\/\n        \/\/ Handling memory management is crucial for building a performant machine learning\n        \/\/ model in a browser.\n        return tf.tidy(() =&gt; {\n          const processedImage = loadAndProcessImage(loadedImage);\n          const prediction = pretrainedModel.predict(processedImage);\n          if (data) {\n            const newData = data.concat(prediction);\n            data.dispose();\n            return newData;\n          }\n          return tf.keep(prediction);\n        });\n      });\n    });\n  }\n  return promise;\n}\n<\/code><\/pre>\n<p>We build a sequential promise that iterates over each image and processes it. 
Alternatively, you can use <code>Promise.all<\/code> to load images in parallel, but be aware of UI performance if you do that.<\/p>\n<p>Putting those functions together, we get:<\/p>\n<pre><code class=\"language-javascript\">buildPretrainedModel().then(pretrainedModel =&gt; {\nloadImages(training, pretrainedModel).then(xs =&gt; {\nxs.print();\n})\n});\n<\/code><\/pre>\n<p>Calling your data &ldquo;x&rdquo; and &ldquo;y&rdquo; is <a href=\"https:\/\/datascience.stackexchange.com\/questions\/17598\/why-are-variables-of-train-and-test-data-defined-using-the-capital-letter-in-py\">a convention in the machine learning world<\/a>, carrying over from its mathematical origins. You can call your variables whatever you want, but I find it useful to stick to the conventions where I can.<\/p>\n<h3 id=\"labels\">Labels<\/h3>\n<p>Next, you&rsquo;ll need to convert your labels into numeric form. However, it&rsquo;s not as simple as assigning a number to each category. To demonstrate, let&rsquo;s say you&rsquo;re classifying three categories of fruit:<\/p>\n<pre><code>raspberry - 0\nblueberry - 1\nstrawberry - 2\n<\/code><\/pre>\n<p>Denoting numbers like this can imply a relationship where one does not exist, since these numbers are considered <em>ordinal<\/em> values; they imply some order in the data. 
Real world consequences of this might be that the network decides that a blueberry is something that is halfway between a raspberry and a strawberry, or that a strawberry is the &ldquo;best&rdquo; of the berries.<\/p>\n<p><img src=\"images\/strawberry.gif\" alt=\"Strawberry Inception\"><\/p>\n<p>To prevent these incorrect assumptions we use a process called &ldquo;one hot encoding&rdquo;, resulting in data that looks like:<\/p>\n<pre><code>raspberry - [1, 0, 0]\nblueberry - [0, 1, 0]\nstrawberry - [0, 0, 1]\n<\/code><\/pre>\n<p>(Two great articles that go into more depth on one hot encoding are <a href=\"https:\/\/hackernoon.com\/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f\">here<\/a> and <a href=\"https:\/\/machinelearningmastery.com\/why-one-hot-encode-data-in-machine-learning\/\">here<\/a>.) We can leverage Tensorflow.js&rsquo;s built in <code>oneHot<\/code> functions to translate our labels:<\/p>\n<pre><code class=\"language-javascript\">function oneHot(labelIndex, classLength) {\nreturn tf.tidy(() =&gt; tf.oneHot(tf.tensor1d([labelIndex]).toInt(), classLength));\n};\n<\/code><\/pre>\n<p>This function takes a particular number (<code>labelIndex<\/code>, a number that corresponds to a label) and translates it to a one hot encoding, given some number of classes (<code>classLength<\/code>). 
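To make the encoding concrete, here is what a one hot row looks like in plain JavaScript, without Tensorflow.js. This is an illustrative helper, not part of the original code:

```javascript
// Plain-JavaScript one hot encoding: a row of zeros with a single 1
// at the position of the label index.
function oneHotRow(labelIndex, classLength) {
  const row = new Array(classLength).fill(0);
  row[labelIndex] = 1;
  return row;
}

oneHotRow(0, 3); // raspberry  -> [1, 0, 0]
oneHotRow(1, 3); // blueberry  -> [0, 1, 0]
oneHotRow(2, 3); // strawberry -> [0, 0, 1]
```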
We can use the function with the following bit of code, that first builds a mapping of numbers-to-labels off the incoming array of labels, and then builds a Tensor containing those one-hot encoded labels:<\/p>\n<pre><code class=\"language-javascript\">function getLabelsAsObject(labels) {\nlet labelObject = {};\nfor (let i = 0; i &lt; labels.length; i++) {\nconst label = labels[i];\nif (labelObject[label] === undefined) {\n\/\/ only assign it if we haven't seen it before\nlabelObject[label] = Object.keys(labelObject).length;\n}\n}\nreturn labelObject;\n}\nfunction addLabels(labels) {\nreturn tf.tidy(() =&gt; {\nconst classes = getLabelsAsObject(labels);\nconst classLength = Object.keys(classes).length;\nlet ys;\nfor (let i = 0; i &lt; labels.length; i++) {\nconst label = labels[i];\nconst labelIndex = classes[label];\nconst y = oneHot(labelIndex, classLength);\nif (i === 0) {\nys = y;\n} else {\nys = ys.concat(y, 0);\n}\n}\nreturn ys;\n});\n};\n<\/code><\/pre>\n<p>Now that we have our data, we can build our model. You are welcome to innovate at this stage, but I find that building on others&rsquo; conventions tends to produce a good enough model in most cases. We&rsquo;ll look to the <a href=\"https:\/\/github.com\/tensorflow\/tfjs-examples\/tree\/master\/webcam-transfer-learning\">Webcam Tensorflow.js example<\/a> for a well structured transfer learning model we&rsquo;ll reuse largely verbatim.<\/p>\n<p>Things worth highlighting are that the first layer matches the output shape of our pretrained model, and the final <code>softmax<\/code> layer corresponds to the number of labels, defined as <code>numberOfClasses<\/code>. 
100 units on the second layer is arbitrary, and you can absolutely experiment with changing this number for your particular use case.<\/p>\n<pre><code class=\"language-javascript\">function getModel(numberOfClasses) {\nconst model = tf.sequential({\nlayers: [\ntf.layers.flatten({inputShape: [7, 7, 256]}),\ntf.layers.dense({\nunits: 100,\nactivation: 'relu',\nkernelInitializer: 'varianceScaling',\nuseBias: true\n}),\ntf.layers.dense({\nunits: numberOfClasses,\nkernelInitializer: 'varianceScaling',\nuseBias: false,\nactivation: 'softmax'\n})\n],\n});\nmodel.compile({\noptimizer: tf.train.adam(0.0001),\nloss: 'categoricalCrossentropy',\nmetrics: ['accuracy'],\n});\nreturn model;\n}\n<\/code><\/pre>\n<p>Here are various links if you want to go into a little more depth on the neural network&rsquo;s internal parts:<\/p>\n<ul>\n<li><a href=\"https:\/\/js.tensorflow.org\/api\/0.12.0\/#sequential\"><code>tf.sequential<\/code><\/a><\/li>\n<li><a href=\"https:\/\/js.tensorflow.org\/api\/0.12.0\/#layers.flatten\"><code>tf.layers.flatten<\/code><\/a><\/li>\n<li><a href=\"https:\/\/js.tensorflow.org\/api\/0.12.0\/#layers.dense\"><code>tf.layers.dense<\/code><\/a><\/li>\n<li><a href=\"https:\/\/www.kaggle.com\/dansbecker\/rectified-linear-units-relu-in-deep-learning\">the activation <code>relu<\/code><\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/adam-optimization-algorithm-for-deep-learning\/\"><code>adam<\/code> optimizer<\/a><\/li>\n<li><a href=\"https:\/\/keras.io\/losses\/\"><code>categoricalCrossentropy<\/code> loss<\/a><\/li>\n<\/ul>\n<p>The final step is to actually train the model, which we do by calling <code>.fit()<\/code> on the model. We shuffle our training images so the model doesn&rsquo;t learn to rely on the order of the incoming training data, and we train for 20 epochs. 
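As a sanity check on the architecture above, you can count the dense layers' parameters by hand. A back-of-the-envelope sketch, assuming two classes as in the berry example:

```javascript
// A dense layer has (inputs * units) weights, plus `units` bias terms
// when useBias is true.
function denseParams(inputUnits, units, useBias) {
  return inputUnits * units + (useBias ? units : 0);
}

const flattened = 7 * 7 * 256;                    // 12544 inputs after flatten
const hidden = denseParams(flattened, 100, true); // 1254500
const output = denseParams(100, 2, false);        // 200
const total = hidden + output;                    // 1254700 trainable params
```

Nearly all of the new model's capacity sits in the first dense layer, so changing the 100-unit choice changes the model's size far more than anything else does.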
(An epoch denotes one cycle through your entire training set.)<\/p>\n<pre><code class=\"language-javascript\">\/\/ The trained model is passed in explicitly so it is in scope here.\nfunction makePrediction(pretrainedModel, model, image, expectedLabel) {\nloadImage(image).then(loadedImage =&gt; {\nreturn loadAndProcessImage(loadedImage);\n}).then(loadedImage =&gt; {\nconst activatedImage = pretrainedModel.predict(loadedImage);\nloadedImage.dispose();\nreturn activatedImage;\n}).then(activatedImage =&gt; {\nconst prediction = model.predict(activatedImage);\nconst predictionLabel = prediction.as1D().argMax().dataSync()[0];\nconsole.log('Expected Label', expectedLabel);\nconsole.log('Predicted Label', predictionLabel);\nprediction.dispose();\nactivatedImage.dispose();\n});\n}\nbuildPretrainedModel().then(pretrainedModel =&gt; {\nloadImages(training, pretrainedModel).then(xs =&gt; {\nconst ys = addLabels(labels);\nconst model = getModel(2);\nmodel.fit(xs, ys, {\nepochs: 20,\nshuffle: true,\n}).then(history =&gt; {\n\/\/ make predictions\nmakePrediction(pretrainedModel, model, blue3, &quot;0&quot;);\nmakePrediction(pretrainedModel, model, red3, &quot;1&quot;);\n});\n});\n});\n<\/code><\/pre>\n<p>How many epochs should you run for?<\/p>\n<blockquote>\n<p>Unfortunately, there is no right answer to this question. The answer is different for different datasets but you can say that the numbers of epochs is related to how diverse your data is \u2014 <a href=\"https:\/\/towardsdatascience.com\/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9\">Sagar Sharma<\/a><\/p>\n<\/blockquote>\n<p>Basically, you can run it until it&rsquo;s good, or it&rsquo;s clear it&rsquo;s not working, or you run out of time.<\/p>\n<p>You should see 100% accuracy in the training above. Try modifying the code to work on the <a href=\"https:\/\/github.com\/thekevinscott\/dataset-tutorial-for-image-classification\/tree\/master\/data\/pexel-images\">Pexels dataset<\/a>. 
I found in my testing that my accuracy numbers fall a little bit with this more complex dataset.<\/p>\n<hr>\n<p>In summary, it&rsquo;s cheap and fast to build on top of a pretrained model and get a classifier that is pretty darn accurate.<\/p>\n<p>When coding machine learning, be careful to test your code at each section of the process and validate with data you know works. It pays to set up a stable and reusable data pipeline early in your process, since so much of your time is spent working with your data.<\/p>\n<p>Finally, if you&rsquo;re interested in learning more about training CNNs from scratch, a great place to start is <a href=\"https:\/\/fastai.com\">Fast.ai<\/a>&rsquo;s tutorials for hackers. It&rsquo;s built in Python but you can translate the ideas in Node.js if you want to stay in Javascript.<\/p>"},{"title":"Tensors in JavaScript","link":"https:\/\/thekevinscott.com\/tensors-in-javascript\/","pubDate":"Tue, 07 Aug 2018 14:00:00 +0000","guid":"https:\/\/thekevinscott.com\/tensors-in-javascript\/","description":"<p>At the heart of most Machine Learning models are numbers. The special data type that undergirds all of the mathematical transformations you perform is called a <strong>Tensor<\/strong>.<\/p>\n<p>Tensors are a concept imported from mathematics and physics, and they are <a href=\"https:\/\/www.quora.com\/What-is-a-tensor\">considerably more complicated in theory<\/a> than this article will get into. If you&rsquo;re a hacker looking to get started with a Machine Learning project on the web in Javascript, you can assume that:<\/p>\n<ol>\n<li>A Tensor has Data<\/li>\n<li>A Tensor has a Dimension<\/li>\n<li>A Tensor has a Shape<\/li>\n<li>A Tensor has a Type<\/li>\n<li>A Tensor Describes Valid Transformations<\/li>\n<\/ol>\n<p>Let&rsquo;s go through these one by one.<\/p>\n<p><img src=\"rumble.gif\" alt=\"Let&amp;rsquo;s get ready to rumble\" title=\"Let's get ready to rumble\"><\/p>\n<h2 id=\"1-a-tensor-has-data\">1. 
A Tensor has Data<\/h2>\n<p>A Tensor is a repository for some set of data, usually numeric. In this way, it&rsquo;s similar to the flat or multidimensional arrays you write in Javascript.<\/p>\n<p>We can build a Tensor using Tensorflow.js, and get back a representation of its data by calling <code>.print()<\/code>:<\/p>\n<pre><code>&gt; tf.tensor([1, 2, 3, 4]).print();\nTensor\n[1, 2, 3, 4]\n<\/code><\/pre>\n<h2 id=\"2-a-tensor-has-a-dimension\">2. A Tensor has a Dimension<\/h2>\n<p>The array from the previous example was a flat sequence of numbers. Another way of thinking about that array is that it <strong>has a dimension of 1<\/strong>.<\/p>\n<p>Something more complex, like an Excel spreadsheet which contains rows and columns, would <strong>have a dimension of 2<\/strong>.<\/p>\n<p>Tensors define an easy way to encode dimensionality into the data structure. (Dimensionality is commonly referred to as &ldquo;Rank&rdquo;, as in &ldquo;this tensor has a Rank of 2&rdquo;.)<\/p>\n<p>Let&rsquo;s see an example of a 2-dimensional Tensor:<\/p>\n<pre><code>&gt; tf.tensor([[1, 2], [3, 4]]).print();\nTensor\n[[1, 2],\n[3, 4]]\n<\/code><\/pre>\n<p>Higher rank tensors are used for a wide variety of machine learning problems, as <a href=\"https:\/\/hackernoon.com\/learning-ai-if-you-suck-at-math-p4-tensors-illustrated-with-cats-27f0002c9b32\">Daniel Jeffries lists in his tutorial<\/a>:<\/p>\n<blockquote>\n<ul>\n<li>3D = Time series<\/li>\n<li>4D = Images<\/li>\n<li>5D = Videos<\/li>\n<\/ul>\n<\/blockquote>\n<h2 id=\"3-a-tensor-has-a-shape\">3. A Tensor has a Shape<\/h2>\n<p>Closely correlated with the Tensor&rsquo;s Dimension (or Rank) is Shape.<\/p>\n<p>A Tensor&rsquo;s shape describes the underlying length of the Tensor&rsquo;s dimensions. Here&rsquo;s an example:<\/p>\n<pre><code>&gt; tf.tensor([[1, 2, 3], [3, 4, 5]]).shape\n(2) [2, 3]\n<\/code><\/pre>\n<h2 id=\"4-a-tensor-has-a-type\">4. 
A Tensor has a Type<\/h2>\n<p>A Tensor&rsquo;s data has a fixed type that describes what the data is. Valid types in Tensorflow.js can be floating point numbers (decimals), integers, or booleans.<\/p>\n<p>We can set the data type upon creation of the Tensor:<\/p>\n<pre><code>&gt; tf.tensor1d([1, 2], 'float32').dtype\n&quot;float32&quot;\n<\/code><\/pre>\n<h2 id=\"5-a-tensor-describes-valid-transformations\">5. A Tensor Describes Valid Transformations<\/h2>\n<p>A Tensor encodes some knowledge of what are valid mathematical operations in relation to other Tensors. For this reason, it can be useful to think of Tensors not as data structures but as objects or classes. This is exactly how <a href=\"https:\/\/js.tensorflow.org\/api\/latest\/#class:Tensor\">Tensorflow.js represents a Tensor<\/a>.<\/p>\n<p>Let&rsquo;s say we wanted to compute the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Dot_product\">dot product<\/a>:<\/p>\n<pre><code>&gt; tf.tensor1d([1, 2]).dot(tf.tensor2d([[1,2], [2, 3]])).print()\nTensor\n[5, 8]\n<\/code><\/pre>\n<p>However, if we try to perform an invalid calculation:<\/p>\n<pre><code>&gt; tf.tensor2d([[1, 2, 3], [4, 5, 6]]).dot(tf.tensor2d([[1,2], [2, 3]])).print()\nUncaught Error: Error in dot: inner dimensions of inputs must match, but got 3 and 2.\n<\/code><\/pre>\n<p>Tensors prevent us from performing invalid calculations. If you&rsquo;re coming from a non-mathematical background (like I am) you&rsquo;ll be very grateful for these error messages.<\/p>\n<hr>\n<p><img src=\"scientist.gif\" alt=\"A Mad Scientist\" title=\"A mad scientist creating tensors\"><\/p>\n<p>We&rsquo;ve seen examples so far of building Tensors with Tensorflow.js using plain arrays as input. 
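Several of these properties (dimension, shape, and the dot product's inner-dimension check) can be mirrored in plain JavaScript on nested arrays. This is an illustrative sketch of the concepts, not how Tensorflow.js is implemented:

```javascript
// Infer the shape of a nested array: [[1, 2, 3], [3, 4, 5]] -> [2, 3].
// The rank (dimension) is just shape.length.
function shapeOf(arr) {
  return Array.isArray(arr) ? [arr.length, ...shapeOf(arr[0])] : [];
}

// 1D . 2D dot product with the same inner-dimension check tf.js performs.
function dot1d2d(vec, mat) {
  if (vec.length !== mat.length) {
    throw new Error(
      'inner dimensions of inputs must match, but got ' +
      vec.length + ' and ' + mat.length
    );
  }
  // result[j] = sum over i of vec[i] * mat[i][j]
  return mat[0].map((_, j) =>
    vec.reduce((sum, v, i) => sum + v * mat[i][j], 0)
  );
}

dot1d2d([1, 2], [[1, 2], [2, 3]]); // [5, 8], matching the tf.js example
```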
Another way of constructing a tensor is with a <code>TypedArray<\/code>.<\/p>\n<h1 id=\"typed-arrays\">Typed Arrays<\/h1>\n<p>A Typed Array is defined by an underlying data buffer, an <code>ArrayBuffer<\/code>, and an object for working with that buffer&rsquo;s data, a <code>DataView<\/code>.<\/p>\n<blockquote>\n<p>Typed Arrays are a relatively recent addition to browsers, born out of the need to have an efficient way to handle binary data in WebGL. A Typed Array is a slab of memory with a typed view into it, much like how arrays work in C. \u2014 <a href=\"https:\/\/www.html5rocks.com\/en\/tutorials\/webgl\/typed_arrays\">Ilmari Heikkinen<\/a><\/p>\n<\/blockquote>\n<p>You can create a view directly, creating a buffer behind the scenes with:<\/p>\n<pre><code>const typedArray = new Int8Array(5);\ntypedArray[0] = 1;\ntypedArray[1] = 2;\n\/\/ Int8Array(5) [1, 2, 0, 0, 0]\n<\/code><\/pre>\n<p>Alternatively, you can explicitly declare your buffer separately from your view:<\/p>\n<pre><code>const buffer = new ArrayBuffer(8); \/\/ 8-byte ArrayBuffer.\nconst typedArray = new Int8Array(buffer);\ntypedArray[0] = 1\n\/\/ Int8Array(8) [1, 0, 0, 0, 0, 0, 0, 0]\n<\/code><\/pre>\n<p>If you do explicitly create your buffer, you must be aware of the underlying representation of the bytes:<\/p>\n<pre><code>const buffer = new ArrayBuffer(8); \/\/ 8-byte ArrayBuffer.\nconst typedArray = new Int16Array(buffer);\ntypedArray[0] = 1\n\/\/ Int16Array(4) [1, 0, 0, 0]\n<\/code><\/pre>\n<p>You can have multiple views pointing to the same underlying buffer. This approach is used, for instance, <a href=\"https:\/\/thekevinscott.com\/dealing-with-mnist-image-data-in-tensorflowjs\/\">to iteratively build MNIST image examples into an underlying data buffer<\/a>. 
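The different view lengths in the snippets above follow directly from each type's element size, which every Typed Array exposes as `BYTES_PER_ELEMENT`. A quick plain-JavaScript illustration:

```javascript
// The same 8-byte buffer seen through views of different element sizes.
const sharedBuffer = new ArrayBuffer(8);

const int8View = new Int8Array(sharedBuffer);    // 8 / 1 = 8 elements
const int16View = new Int16Array(sharedBuffer);  // 8 / 2 = 4 elements
const f32View = new Float32Array(sharedBuffer);  // 8 / 4 = 2 elements

Int8Array.BYTES_PER_ELEMENT;    // 1
Int16Array.BYTES_PER_ELEMENT;   // 2
Float32Array.BYTES_PER_ELEMENT; // 4
```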
Here&rsquo;s a simple example:<\/p>\n<pre><code>const buffer = new ArrayBuffer(8); \/\/ 8-byte ArrayBuffer.\nconst firstHalfView = new Int8Array(buffer, 0, 4);\nconst secondHalfView = new Int8Array(buffer, 4, 4);\nfirstHalfView[0] = 1\nsecondHalfView[0] = 2\nconsole.log(buffer);\n\/\/ [[Int8Array]]: Int8Array(8) [1, 0, 0, 0, 2, 0, 0, 0]\n<\/code><\/pre>\n<p>There are a number of eponymous Typed Arrays you can use; <a href=\"https:\/\/blog.codingbox.io\/exploring-javascript-typed-arrays-c8fd4f8bd24f\">a great rundown of each with their byte sizes is here<\/a>.<\/p>\n<h2 id=\"why-use-a-typed-array\">Why Use a Typed Array?<\/h2>\n<p>The answer: <strong>performance<\/strong>.<\/p>\n<p>Typed Arrays were originally introduced to handle things like WebGL and other graphical layers that required blazing fast performance. Machine Learning benefits from a similar level of performance, which is why many large machine learning models are trained on servers, parallelized across powerful GPUs.<\/p>\n<blockquote>\n<p>Because a Typed Array is backed by raw memory, the JavaScript engine can pass the memory directly to native libraries without having to painstakingly convert the data to a native representation. As a result, typed arrays perform a lot better than JavaScript arrays for passing data to WebGL and other APIs dealing with binary data. 
\u2014 <a href=\"https:\/\/www.html5rocks.com\/en\/tutorials\/webgl\/typed_arrays\">Ilmari Heikkinen<\/a><\/p>\n<\/blockquote>\n<p>It&rsquo;s a good habit to get comfortable with using Typed Arrays to ensure you&rsquo;re writing performant code.<\/p>\n<h1 id=\"conclusion\">Conclusion<\/h1>\n<p>In practice you can get by with Tensors keeping in mind that a tensor has:<\/p>\n<ol>\n<li>Data<\/li>\n<li>Dimension<\/li>\n<li>Shape<\/li>\n<li>Type<\/li>\n<li>Description of Valid Transformations<\/li>\n<\/ol>"},{"title":"MNIST image data in Tensorflow.js","link":"https:\/\/thekevinscott.com\/dealing-with-mnist-image-data-in-tensorflowjs\/","pubDate":"Tue, 29 May 2018 10:00:00 +0000","guid":"https:\/\/thekevinscott.com\/dealing-with-mnist-image-data-in-tensorflowjs\/","description":"<blockquote>\n<p>There&rsquo;s the joke that 80 percent of data science is cleaning the data and 20 percent is complaining about cleaning the data &hellip; data cleaning is a much higher proportion of data science than an outsider would expect. Actually training models is typically a relatively small proportion (less than 10 percent) of what a machine learner or data scientist does.\n- <a href=\"https:\/\/www.theverge.com\/2017\/11\/1\/16589246\/machine-learning-data-science-dirty-data-kaggle-survey-2017\">Anthony Goldbloom, CEO of Kaggle<\/a><\/p>\n<\/blockquote>\n<p>Manipulating data is a crucial step step for any machine learning problem. 
This article will take the <a href=\"https:\/\/github.com\/tensorflow\/tfjs-examples\/blob\/master\/mnist\/data.js\">MNIST example for Tensorflow.js (0.11.1)<\/a> and walk through the code that handles the data loading line-by-line.<\/p>\n<h1 id=\"mnist-example\">MNIST example<\/h1>\n<pre><code class=\"language-javascript\">18 import * as tf from '@tensorflow\/tfjs';\n19\n20 const IMAGE_SIZE = 784;\n21 const NUM_CLASSES = 10;\n22 const NUM_DATASET_ELEMENTS = 65000;\n23\n24 const NUM_TRAIN_ELEMENTS = 55000;\n25 const NUM_TEST_ELEMENTS = NUM_DATASET_ELEMENTS - NUM_TRAIN_ELEMENTS;\n26\n27 const MNIST_IMAGES_SPRITE_PATH =\n28 'https:\/\/storage.googleapis.com\/learnjs-data\/model-builder\/mnist_images.png';\n29 const MNIST_LABELS_PATH =\n30 'https:\/\/storage.googleapis.com\/learnjs-data\/model-builder\/mnist_labels_uint8';\n<\/code><\/pre>\n<p>First, the code imports Tensorflow.js <a href=\"https:\/\/thekevinscott.com\/tensorflowjs-hello-world\/\">(make sure you&rsquo;re transpiling your code!)<\/a>, and establishes some constants, including:<\/p>\n<ul>\n<li><code>IMAGE_SIZE<\/code> \u2013 the size of an image (width and height of 28x28 = 784)<\/li>\n<li><code>NUM_CLASSES<\/code> \u2013 number of label categories (a number can be 0-9, so there are 10 classes)<\/li>\n<li><code>NUM_DATASET_ELEMENTS<\/code> \u2013 number of images total (65,000)<\/li>\n<li><code>NUM_TRAIN_ELEMENTS<\/code> \u2013 number of training images (55,000)<\/li>\n<li><code>NUM_TEST_ELEMENTS<\/code> \u2013 number of test images (10,000, aka the remainder)<\/li>\n<li><code>MNIST_IMAGES_SPRITE_PATH<\/code> &amp; <code>MNIST_LABELS_PATH<\/code> \u2013 paths to the images and the labels<\/li>\n<\/ul>\n<p>The images are concatenated into one huge image which looks like:<\/p>\n<p><img src=\"mnist.png\" alt=\"MNIST Data sprited\" title=\"MNIST data as sprites\"><\/p>\n<h3 id=\"mnistdata\"><code>MnistData<\/code><\/h3>\n<p>Next up is <code>MnistData<\/code>, a class that exposes the following 
functions:<\/p>\n<ul>\n<li><code>load<\/code> \u2013 responsible for asynchronously loading the image and label data<\/li>\n<li><code>nextTrainBatch<\/code> \u2013 load the next training batch<\/li>\n<li><code>nextTestBatch<\/code> \u2013 load the next test batch<\/li>\n<li><code>nextBatch<\/code> \u2013 a generic function to return the next batch, depending on whether it is in the training set or test set<\/li>\n<\/ul>\n<p>For the purposes of getting started, this article will only step through the <code>load<\/code> function.<\/p>\n<h3 id=\"load\"><code>load<\/code><\/h3>\n<pre><code class=\"language-javascript\">44 async load() {\n45 \/\/ Make a request for the MNIST sprited image.\n46 const img = new Image();\n47 const canvas = document.createElement('canvas');\n48 const ctx = canvas.getContext('2d');\n<\/code><\/pre>\n<p><code>async<\/code> <a href=\"https:\/\/thekevinscott.com\/tensorflowjs-hello-world\/#async-and-await\">is a relatively new language feature in Javascript<\/a> for which you will need a transpiler.<\/p>\n<p>The <code>Image<\/code> object is a native DOM function that represents an image in memory, and it provides callbacks for when the image is loaded along with access to the image attributes. <code>canvas<\/code> is another DOM element that provides easy access to pixel arrays and processing by way of <code>context<\/code>.<\/p>\n<p>Since both of these are DOM elements, if you&rsquo;re working in Node.js (or a Web Worker) you won&rsquo;t have access to these elements. For an alternative approach see below.<\/p>\n<h3 id=\"imgrequest\"><code>imgRequest<\/code><\/h3>\n<pre><code class=\"language-javascript\">49 const imgRequest = new Promise((resolve, reject) =&gt; {\n50 img.crossOrigin = '';\n51 img.onload = () =&gt; {\n52 img.width = img.naturalWidth;\n53 img.height = img.naturalHeight;\n<\/code><\/pre>\n<p>The code initializes a new promise that will be resolved once the image is loaded successfully. 
<em>(This example does not explicitly handle the error state.)<\/em><\/p>\n<p><code>crossOrigin<\/code> is an <code>img<\/code> attribute that allows for the loading of images across domains, and gets around CORS (cross-origin resource sharing) issues when interacting with the DOM. <code>naturalWidth<\/code> and <code>naturalHeight<\/code> refer to the original dimensions of the loaded image, and serve to enforce that the image&rsquo;s size is correct when performing calculations.<\/p>\n<pre><code class=\"language-javascript\">55 const datasetBytesBuffer =\n56 new ArrayBuffer(NUM_DATASET_ELEMENTS * IMAGE_SIZE * 4);\n57\n58 const chunkSize = 5000;\n59 canvas.width = img.width;\n60 canvas.height = chunkSize;\n<\/code><\/pre>\n<p>The code initializes a new buffer to contain every pixel of every image. It multiplies the total number of images by the size of each image by the number of channels (4).<\/p>\n<p>I <em>believe<\/em> that <code>chunkSize<\/code> is used to prevent the UI from loading too much data into memory at once, though I&rsquo;m not 100% sure.<\/p>\n<pre><code class=\"language-javascript\">62 for (let i = 0; i &lt; NUM_DATASET_ELEMENTS \/ chunkSize; i++) {\n63 const datasetBytesView = new Float32Array(\n64 datasetBytesBuffer, i * IMAGE_SIZE * chunkSize * 4,\n65 IMAGE_SIZE * chunkSize);\n66 ctx.drawImage(\n67 img, 0, i * chunkSize, img.width, chunkSize, 0, 0, img.width,\n68 chunkSize);\n69\n70 const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);\n<\/code><\/pre>\n<p>This code loops through every image in the sprite and <a href=\"https:\/\/thekevinscott.com\/tensors-in-javascript#typed-arrays\">initializes a new <code>TypedArray<\/code> for that iteration<\/a>; then, the context image gets a chunk of the image drawn. 
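The buffer-and-view mechanics in this loop can be shown in isolation. This sketch uses tiny made-up sizes instead of the real MNIST constants, and a hard-coded array of fake RGBA bytes in place of the canvas data:

```javascript
// Two "images" of 2 pixels each, stored as 4-byte floats in one buffer.
const IMAGE_SIZE = 2;
const NUM_IMAGES = 2;
const buffer = new ArrayBuffer(NUM_IMAGES * IMAGE_SIZE * 4);

for (let i = 0; i < NUM_IMAGES; i++) {
  // A view over just this image's slice of the shared buffer,
  // like datasetBytesView in the code above.
  const view = new Float32Array(buffer, i * IMAGE_SIZE * 4, IMAGE_SIZE);
  // Fake RGBA data standing in for ctx.getImageData(...).data.
  const rgba = i === 0 ? [255, 255, 255, 255, 0, 0, 0, 255]
                       : [51, 51, 51, 255, 102, 102, 102, 255];
  for (let j = 0; j < rgba.length / 4; j++) {
    view[j] = rgba[j * 4] / 255; // red channel only, scaled to 0..1
  }
}

// Writes made through the per-chunk views landed in the shared buffer.
const all = new Float32Array(buffer); // approximately [1, 0, 0.2, 0.4]
```

Nothing is ever copied out of the views; they are windows onto the one buffer, which is exactly why the real code can recast `datasetBytesBuffer` at the end and find all the pixel data in place.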
Finally, that drawn image is turned into image data using context&rsquo;s <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/API\/CanvasRenderingContext2D\/getImageData\"><code>getImageData<\/code><\/a> function, which returns an object representing pixels.<\/p>\n<pre><code class=\"language-javascript\">72 for (let j = 0; j &lt; imageData.data.length \/ 4; j++) {\n73 \/\/ All channels hold an equal value since the image is grayscale, so\n74 \/\/ just read the red channel.\n75 datasetBytesView[j] = imageData.data[j * 4] \/ 255;\n76 }\n77 }\n<\/code><\/pre>\n<p>We loop through the pixels, and divide by 255 (the maximum possible value of a pixel) to scale the values to between 0 and 1. Only the red channel is necessary, since it&rsquo;s a grayscale image.<\/p>\n<pre><code class=\"language-javascript\">78 this.datasetImages = new Float32Array(datasetBytesBuffer);\n79\n80 resolve();\n81 };\n82 img.src = MNIST_IMAGES_SPRITE_PATH;\n83 });\n<\/code><\/pre>\n<p>This line takes the buffer, and recasts it into a new <code>TypedArray<\/code> that holds our pixel data, and then resolves the Promise. The last line (setting the <code>src<\/code>) actually begins loading the image, which starts the function.<\/p>\n<p>One thing that confused me at first <a href=\"https:\/\/thekevinscott.com\/tensors-in-javascript#typed-arrays\">was the behavior of <code>TypedArray<\/code> in relation to its underlying data buffer<\/a>. You might notice that <code>datasetBytesView<\/code> is set within the loop, but is never returned. Under the hood, <code>datasetBytesView<\/code> is referencing the buffer <code>datasetBytesBuffer<\/code> (with which it is initialized); when the code updates the pixel data, it is indirectly editing the values of the buffer itself, which in turn is recast into a new <code>Float32Array<\/code> on line 78.<\/p>\n<h2 id=\"fetching-image-data-outside-of-the-dom\">Fetching image data outside of the DOM<\/h2>\n<p>If you&rsquo;re in the DOM, you should use the DOM. 
The browser (through <code>canvas<\/code>) takes care of figuring out the format of images and translating buffer data into pixels. But if you&rsquo;re working outside the DOM (say, in Node.js, or a Web Worker), you&rsquo;ll need an alternative approach.<\/p>\n<p><code>fetch<\/code> provides a mechanism, <code>response.arrayBuffer<\/code>, which gives you access to a file&rsquo;s underlying buffer. We can use this to read the bytes manually, avoiding the DOM entirely. Here&rsquo;s an alternative approach to writing the above code (this code requires <code>fetch<\/code>, which can be polyfilled in Node with something like <a href=\"https:\/\/github.com\/matthew-andrews\/isomorphic-fetch\"><code>isomorphic-fetch<\/code><\/a>):<\/p>\n<pre><code class=\"language-javascript\">const imgRequest = fetch(MNIST_IMAGES_SPRITE_PATH).then(resp =&gt; resp.arrayBuffer()).then(buffer =&gt; {\nreturn new Promise(resolve =&gt; {\nconst reader = new PNGReader(buffer);\nreturn reader.parse((err, png) =&gt; {\nconst pixels = Float32Array.from(png.pixels).map(pixel =&gt; {\nreturn pixel \/ 255;\n});\nthis.datasetImages = pixels;\nresolve();\n});\n});\n});\n<\/code><\/pre>\n<p>This returns an array buffer for the particular image. When writing this, I first attempted to parse the incoming buffer myself, which I wouldn&rsquo;t recommend. (If you <em>are<\/em> interested in doing that, <a href=\"http:\/\/www.libpng.org\/pub\/png\/spec\/1.2\/PNG-Structure.html\">here&rsquo;s some information on how to read an array buffer for a png<\/a>.) Instead, I elected to <a href=\"https:\/\/github.com\/arian\/pngjs\">use <code>pngjs<\/code><\/a>, which handles the <code>png<\/code> parsing for you. When dealing with other image formats, you&rsquo;ll have to figure out the parsing functions yourself.<\/p>\n<h1 id=\"just-scratching-the-surface\">Just scratching the surface<\/h1>\n<p>Understanding data manipulation is a crucial component of machine learning in Javascript. 
By understanding our use cases and requirements, we can use a few key functions to elegantly format our data correctly for our needs.<\/p>\n<p>The Tensorflow.js team is continuously changing the underlying data API in Tensorflow.js. This can help accommodate more of our needs as the API evolves. This also means that it&rsquo;s worth staying abreast of <a href=\"https:\/\/github.com\/tensorflow\/tfjs\">developments to the API<\/a> as Tensorflow.js continues to grow and be improved.<\/p>"},{"title":"Hello World with Tensorflow.js","link":"https:\/\/thekevinscott.com\/tensorflowjs-hello-world\/","pubDate":"Thu, 17 May 2018 10:00:00 +0000","guid":"https:\/\/thekevinscott.com\/tensorflowjs-hello-world\/","description":"<p>Up until fairly recently, just getting started writing your first line of machine learning code required a hefty upfront investment in time and money. For example, just last year <a href=\"http:\/\/thekevinscott.com\/deep-learning-cryptocurrency-pc-1-hardware\/\">I built my own PC specifically for machine learning<\/a>. I researched the parts and assembled it myself. Just doing that cost me around $1600 and 30 hours of setup time, and I&rsquo;m still trying to wrangle the computer&rsquo;s configuration, libraries, and make it work with various frameworks.<\/p>\n<p>The good news is that getting started with machine learning today has never been easier. In fact, if you&rsquo;re reading this it means you already have the tools you need to dive right in. <strong>You can now learn the machine learning framework Tensorflow right in your browser, using Javascript.<\/strong><\/p>\n<p><img src=\"googleio.png\" alt=\"Google I\/O 2018\"><\/p>\n<p><a href=\"https:\/\/www.youtube.com\/watch?v=OmofOvMApTU\">Tensorflow.js was released<\/a> at Google&rsquo;s I\/O 2018. 
<a href=\"http:\/\/thekevinscott.com\/reasons-for-machine-learning-in-the-browser\/\">Running machine learning in the browser opens up a world of use cases<\/a>, and is a great opportunity to use Javascript to learn about machine learning concepts and frameworks.<\/p>\n<p>If you&rsquo;re new to Javascript or it&rsquo;s been a while since you&rsquo;ve written any frontend code, some of the recent languages changes might throw you for a loop. I&rsquo;ll walk through the basics of modern Javascript you&rsquo;ll need to get the basic Tensorflow.js examples running.<\/p>\n<h1 id=\"setup-tutorial\">Setup Tutorial<\/h1>\n<p><strong>All you need to run Tensorflow.js is your web browser<\/strong>. It&rsquo;s easy to lose sight amongst all the talk of transpilers, bundlers, and packagers, but all you need is a web browser to run Tensorflow.js. The code you develop locally is the same code you&rsquo;ll be able to ship to your users to run on their browsers.<\/p>\n<p>Let&rsquo;s see three quick ways to get the Hello World example working without installing anything. I&rsquo;ll be using the <a href=\"https:\/\/js.tensorflow.org\/#getting-started\">Getting Started code<\/a> from the Tensorflow.js documentation.<\/p>\n<h2 id=\"getting-started-with-your-browser-console\"><code>Getting Started<\/code> with your Browser Console<\/h2>\n<p>Every modern web browser ships with some sort of interactive Javascript Console built in. I use Chrome, which includes a Javascript Console you can open with &ldquo;View &gt; Developer &gt; Javascript Console&rdquo;.<\/p>\n<p><img src=\"chrome-console.gif\" alt=\"A GIF demonstrating how to open the Javascript Console in Chrome\"><\/p>\n<p>This Javascript Console lets you write Javascript and execute it immediately. We&rsquo;ll use this to run the Getting Started example from the <a href=\"https:\/\/js.tensorflow.org\/#getting-started\">Tensorflow.js docs<\/a>.<\/p>\n<p>First, you&rsquo;ll need to include the Tensorflow.js Javascript file. 
A hosted version of the file is available via the <a href=\"https:\/\/www.webopedia.com\/TERM\/C\/CDN.html\">Content Delivery Network (CDN)<\/a> below. A quick way to include an external <code>.js<\/code> file via the console is:<\/p>\n<pre><code>var script = document.createElement('script');\nscript.src = &quot;https:\/\/cdn.jsdelivr.net\/npm\/@tensorflow\/tfjs@0.10.0&quot;;\ndocument.getElementsByTagName('head')[0].appendChild(script);\n<\/code><\/pre>\n<p>Copy and paste this into your Javascript Console and you&rsquo;ll have a copy of Tensorflow saved as the variable <code>tf<\/code>. (If you type <code>tf<\/code> in your console, you&rsquo;ll see a reference to it.) You can then copy and paste the rest of the Getting Started example (the Javascript between the second <code>&lt;script&gt;<\/code> tag) by pasting it directly into your console.<\/p>\n<h2 id=\"getting-started-with-a-javascript-hosting-platform\">Getting Started with a Javascript hosting platform<\/h2>\n<p>An alternative approach is to use an online Javascript hosting platform. Three popular ones are <a href=\"https:\/\/codepen.io\/\">CodePen<\/a>, <a href=\"https:\/\/jsfiddle.net\/\">JSFiddle<\/a>, and <a href=\"https:\/\/jsbin.com\/\">JSBin<\/a>. These platforms can automatically include scripts for you and take care of transpiling your code in the browser, which makes getting started a cinch.<\/p>\n<p>You can view <a href=\"https:\/\/codepen.io\/thekevinscott\/pen\/aGapZL\">the following example on Codepen<\/a> to see an implementation working. Make sure to open your browser console, as explained above, to see the output.<\/p>\n<p><img src=\".\/codepen.jpg\" alt=\"Codepen implementation of Getting Started from Tensorflow.js\"><\/p>\n<h2 id=\"getting-started-locally\">Getting Started locally<\/h2>\n<p>Finally, a third option for getting Tensorflow.js working involves saving the code as an <code>.html<\/code> file and opening it locally on your computer. 
And you don&rsquo;t need a web server to do this!<\/p>\n<p>Copy the html code into a file, and open it in your web browser. For instance, if you save the file onto your desktop and you&rsquo;re on a Mac, you might open it in your browser with the following URL:<\/p>\n<p><code>file:\/\/\/Users\/YOURNAME\/Desktop\/sample.html<\/code><\/p>\n<p>It is important to note that viewing <code>html<\/code> files this way introduces limitations, including issues with referencing relative links, handling ajax calls and security, among other things. But it&rsquo;s a quick and easy way to get something running in your browser.<\/p>\n<h1 id=\"the-modern-javascript-development-workflow\">The Modern Javascript Development Workflow<\/h1>\n<p>Hopefully by this point, you can see how easy it is to get something basic to show up in your browser. If you begin looking at the Tensorflow.js examples, you might be thinking<\/p>\n<ul>\n<li>how do I organize my files?<\/li>\n<li>how do I manage third party libraries in my code?<\/li>\n<li>what&rsquo;s with these syntax errors?<\/li>\n<\/ul>\n<p>As soon as you move beyond the basic Hello World example above and into some of the other examples, you&rsquo;ll begin to run into syntax issues and organization issues, and that&rsquo;s where a strong Javascript pipeline will be your best friend.<\/p>\n<h2 id=\"a-little-bit-of-javascript-history\">A little bit of Javascript history<\/h2>\n<p>As our expectations for web apps have grown over the past decade, the front-end ecosystem has exploded in complexity. Javascript in particular has matured a lot as a programming language, adopting a number of forward-thinking changes while continuing to support one of the largest userbases of any programming language.<\/p>\n<p>New changes to the language spec are referenced with acronyms like <code>ES5<\/code>, <code>ES6<\/code>, <code>ES2015<\/code>, <code>ES2016<\/code>. 
<code>ES<\/code> stands for <code>ECMAScript<\/code> and <a href=\"https:\/\/benmccormick.org\/2015\/09\/14\/es5-es6-es2016-es-next-whats-going-on-with-Javascript-versioning\/\">Javascript is based on this standard<\/a>. <code>5<\/code> and <code>6<\/code> were traditionally used to refer to versions of the standard, but nowadays years are used for additional clarity.<\/p>\n<p><a href=\"http:\/\/kangax.github.io\/compat-table\/es6\/\">Modern browser support for ES6 is spotty<\/a>. Some cutting-edge or proposed features are not yet supported, and older browsers (in particular IE) will never support the latest spec. Because of this instability, if you want to reach the widest audience possible you&rsquo;ll want to use something called a <a href=\"https:\/\/dev.to\/kayis\/4-Javascript-bundlers-2g4b\">bundler or transpiler<\/a>, which is a piece of software that converts your Javascript code written with modern conveniences into a version with widespread adoption (ES5 is widely supported and is generally a good target).<\/p>\n<p>Many of the Tensorflow.js examples make use of new syntax that is not yet widely supported in browsers and requires transpiling. I&rsquo;ll explain the syntax first and then explain how to get the examples working.<\/p>\n<h3 id=\"import-and-export\"><code>import<\/code> and <code>export<\/code><\/h3>\n<p><code>import<\/code> and <code>export<\/code> are two bits of syntax recently introduced into Javascript for importing and exporting modules. The saga of Javascript modules is <a href=\"https:\/\/ponyfoo.com\/articles\/brief-history-of-modularity\">long and winding<\/a>, but the <a href=\"https:\/\/insights.untapt.com\/webpack-import-require-and-you-3fd7f5ea93c0\">community has largely settled on <code>import<\/code> over <code>require<\/code><\/a>. 
Unfortunately, as of May 2018 <code>import<\/code> is not supported by any browsers, so to use it you need to use a transpiler.<\/p>\n<p>In the Getting Started docs, you&rsquo;ll see an example of <code>import<\/code> upfront:<\/p>\n<pre><code>import * as tf from '@tensorflow\/tfjs';\n<\/code><\/pre>\n<p>This is basically the same as:<\/p>\n<pre><code>var tf = require('@tensorflow\/tfjs');\n<\/code><\/pre>\n<p>You also might see something like:<\/p>\n<pre><code>import { util, tensor2d } from '@tensorflow\/tfjs';\n<\/code><\/pre>\n<p>The equivalent using <code>require<\/code> is:<\/p>\n<pre><code>var tf = require(&quot;@tensorflow\/tfjs&quot;);\nvar util = tf.util;\nvar tensor2d = tf.tensor2d;\n<\/code><\/pre>\n<h3 id=\"async-and-await\"><code>async<\/code> and <code>await<\/code><\/h3>\n<p>Javascript has traditionally been used heavily with UIs, which perform a lot of asynchronous actions. There have been three broad patterns for handling asynchronous code over the years: <a href=\"https:\/\/medium.com\/@stevekonves\/three-Javascript-async-patterns-1d2e7094860a\">callbacks, promises, and async\/await<\/a>.<\/p>\n<p><code>async<\/code>\/<code>await<\/code> provides a way of defining asynchronous functions in a synchronous way. 
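To see the difference in isolation, here's a tiny self-contained comparison you can paste into any console; no Tensorflow.js required, and `fetchValue` is a made-up stand-in for any asynchronous API call:

```javascript
// Stand-in for an asynchronous API call (hypothetical, for illustration only).
function fetchValue() {
  return Promise.resolve(42);
}

// The same function written twice: with async/await...
async function withAwait() {
  const value = await fetchValue();
  return value + 1;
}

// ...and with plain promises.
function withPromises() {
  return fetchValue().then(function (value) {
    return value + 1;
  });
}

withAwait().then(function (v) { console.log(v); });    // → 43
withPromises().then(function (v) { console.log(v); }); // → 43
```

Both functions return a promise that resolves to the same value; `await` just lets you write the happy path as if it were synchronous.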
<a href=\"https:\/\/js.tensorflow.org\/tutorials\/webcam-transfer-learning.html\">Many of the Tensorflow.js examples<\/a> make use of this <code>async<\/code>\/<code>await<\/code> syntax.<\/p>\n<p>Here are two versions of the same code, the first written with <code>async<\/code>\/<code>await<\/code>, the second using promises:<\/p>\n<pre><code>\/\/ With async\/await\nasync function loadMobilenet() {\n  const mobilenet = await tf.loadModel(\n    'https:\/\/storage.googleapis.com\/tfjs-models\/tfjs\/mobilenet_v1_0.25_224\/model.json');\n  \/\/ Return a model that outputs an internal activation.\n  const layer = mobilenet.getLayer('conv_pw_13_relu');\n  return tf.model({inputs: mobilenet.inputs, outputs: layer.output});\n}\n<\/code><\/pre>\n<pre><code>\/\/ With promises\nfunction loadMobilenet() {\n  return tf.loadModel('https:\/\/storage.googleapis.com\/tfjs-models\/tfjs\/mobilenet_v1_0.25_224\/model.json').then(function (mobilenet) {\n    \/\/ Return a model that outputs an internal activation.\n    const layer = mobilenet.getLayer('conv_pw_13_relu');\n    return tf.model({inputs: mobilenet.inputs, outputs: layer.output});\n  });\n}\n<\/code><\/pre>\n<hr>\n<p>Both of these language features - <code>import<\/code>\/<code>export<\/code> and <code>async<\/code>\/<code>await<\/code> - make writing Javascript more pleasant. Let&rsquo;s next see the tools we need to use them in our own code.<\/p>\n<h2 id=\"javascript-tooling\">Javascript Tooling<\/h2>\n<p>On the Getting Started docs, you&rsquo;ll see:<\/p>\n<blockquote>\n<p><strong>Note<\/strong>: Because we use ES2017 syntax (such as <code>import<\/code>), this workflow assumes you are using a bundler\/transpiler to convert your code to something the browser understands. See our examples to see how we use Parcel to build our code. 
However you are free to use any build tool that you prefer.<\/p>\n<\/blockquote>\n<p>Let&rsquo;s talk about build tools.<\/p>\n<h3 id=\"bundlers\">Bundlers<\/h3>\n<p><img src=\"conductor.jpg\" alt=\"Conductor\" title=\"Conductor by Rob Swystun https:\/\/www.flickr.com\/photos\/rob_swystun\/8098008837\/in\/photolist-dkApU2-KcT4m-4FRtTt-bs1ie-4FaQwJ-n4ZLz-5H5h5h-9QyqcV-HMKLpZ-bRcaTr-8AJzKR-o1hz5g-mUja5-4hde2s-ojw5ER-o1hzfM-7QTcn-baxtwT-o1gyBW-PZwwc-9Lqwso-o1gwTq-q6JLU3-4tpd7s-6utd7E-afAcD1-eQ5nNq-7k6Kmu-TZwnt4-hzhqsc-QW7UrX-6Sgmk9-di55YZ-c5g9mh-4sJY58-66uZkH-nuSDiR-tiR5Un-62C3pm-6GkQ63-5mXNoS-9rBtDY-8eJvZq-26reTMP-6o1GgZ-7nJCtp-kqpcEr-7r1AZJ-RAtTeU-8nX15C\"><\/p>\n<p>Bundlers have taken on the role of conductor of the orchestra of growing front-end codebases. A bundler is a program that takes your Javascript code and &ldquo;bundles&rdquo; it up into a file compatible with the browser. Bundlers will also transpile code (convert ES2018 code to ES5, and compile dialects like React&rsquo;s JSX or Typescript, using something like <a href=\"https:\/\/babeljs.io\/\"><code>babel<\/code><\/a>), set up &ldquo;hot reloading&rdquo; to refresh the browser with code changes without reloading the page, and many other things to make front-end development better.<\/p>\n<p><a href=\"https:\/\/gruntjs.com\/\">Grunt<\/a> and <a href=\"http:\/\/gulpjs.com\/\">Gulp<\/a> used to be popular build tools but have recently fallen out of favor, losing ground to <a href=\"https:\/\/webpack.js.org\/\"><code>webpack<\/code><\/a>. Other bundlers include <a href=\"https:\/\/parceljs.org\/\"><code>parcel<\/code><\/a> and <a href=\"https:\/\/rollupjs.org\/guide\/en\"><code>rollup<\/code><\/a>. 
The Tensorflow.js examples use <code>parcel<\/code>.<\/p>\n<h3 id=\"package-managers\">Package managers<\/h3>\n<p>Often, when encountering a Javascript library, you&rsquo;ll see installation instructions like <code>yarn add @tensorflow\/tfjs<\/code> or <code>npm install @tensorflow\/tfjs<\/code>.<\/p>\n<p><a href=\"https:\/\/yarnpkg.com\/en\/\"><code>yarn<\/code><\/a> and <a href=\"https:\/\/www.npmjs.com\/\"><code>npm<\/code><\/a> are both package managers. They&rsquo;re command line tools used to install and keep track of your third party Javascript dependencies.<\/p>\n<p><code>yarn<\/code> and <code>npm<\/code> are pretty similar and the decision of which one to use is largely up to personal preference (though you&rsquo;ll find plenty of holy wars online if you&rsquo;re into that sort of thing).<\/p>\n<p>Either one will save your dependencies into a <code>package.json<\/code> file which should be checked into your git repository. This file enables other developers to install all the necessary dependencies for your project and get things running quickly.<\/p>\n<hr>\n<p>To get all these goodies, the first step is to install <code>Node.js<\/code>, which ships with <code>npm<\/code> (<code>yarn<\/code> is a separate install). Once those are in place, you can follow the instructions on any of the Tensorflow.js examples and they should work out of the box. Usually, getting set up with a new front-end project using these tools is a one-step process.<\/p>\n<p>Again, you don&rsquo;t need any of these tools to work with these examples, but using them makes things so much easier. 
If you intend to do any sort of serious Javascript development, I would encourage you to play with these tools, along with other popular Javascript tools like <a href=\"https:\/\/reactjs.org\/\">React<\/a> and <a href=\"https:\/\/www.typescriptlang.org\/\">Typescript<\/a>, which make handling larger codebases much easier.<\/p>"},{"title":"Use cases for Tensorflow.js","link":"https:\/\/thekevinscott.com\/reasons-for-machine-learning-in-the-browser\/","pubDate":"Wed, 16 May 2018 08:00:00 +0000","guid":"https:\/\/thekevinscott.com\/reasons-for-machine-learning-in-the-browser\/","description":"<p>Say you\u2019re on your daily commute from Brooklyn to Manhattan to work at that new machine learning startup you just joined a few months ago. Your train is stopped between two stations, the heat is making everyone sticky, and the other riders are buzzing like a beehive.<\/p>\n<p>No worries\u2014your earbuds are in, and you\u2019re ready for some new jams. You pull out your phone and try to load the newest recommended music, but <em>crap<\/em>\u2014recommendations won\u2019t load because you have no network service. You <em>know<\/em> you downloaded some new music this morning; in fact there it is in your <em>New Music<\/em> playlist, waiting for you. But recommendations? No dice.<\/p>\n<p><a href=\"https:\/\/medium.com\/tensorflow\/introducing-tensorflow-js-machine-learning-in-javascript-bf3eab376db\">Recently, Google introduced TensorFlow.js to the world<\/a>. TensorFlow.js is just one example of a machine learning framework which can employ client-side machine learning. <strong>This means that machine learning algorithms can be run directly on your user\u2019s device, without needing to talk to any server. 
This would mean that those music recommendations could be suggested to you\u2014no matter if you\u2019re at home, in the office, or on that internet-free, sweaty train.<\/strong><\/p>\n<p>This post will explore unique examples of client-side machine learning, to understand how a JavaScript machine learning framework like TensorFlow.js unlocks huge value for users, businesses, and software developers.<\/p>\n<h1 id=\"training-vs-inference\">Training vs. inference<\/h1>\n<p>Before diving in, it\u2019s worth defining two phases of machine learning: training, and inference. <strong>Training<\/strong> is the process of a machine learning a new capability from an existing set of data. Just like going to school to learn a new skill or trade, we teach machines new skills through the training process, by giving them enough concrete examples of something to recognize. These examples are called the training dataset.<\/p>\n<p><strong>Inference<\/strong>, on the other hand, is the machine performing the skill once it has run through enough examples from the training session. This is like the computer\u2019s piano recital or a pop-quiz, where the computer applies its skill to data it has never seen before.<\/p>\n<h1 id=\"privacy\">Privacy<\/h1>\n<p>As <a href=\"https:\/\/www.quora.com\/What-are-the-advantages-of-running-a-Machine-Learning-algorithm-using-a-Javascript-ML-library-like-Tensorflow-js-Isnt-better-to-train-a-model-on-the-server-side\">Vinay Muttineni suggests on Quora<\/a>, one of the core benefits of client-side machine learning over a server-side approach is that both the data used to train the model and the usage of the model can stay entirely on the user\u2019s device. This means no data is fed to or stored on a server.<\/p>\n<p><img src=\"23692103834_acc8a0882a_o.jpg\" alt=\"Amazon Alexa. 
Credit: Rob Albright via Flickr\"><\/p>\n<p>An example of this is <a href=\"https:\/\/medium.com\/@tomasreimers\/compiling-tensorflow-for-the-browser-f3387b8e1e1c\">Tomas Reimers\u2019s suggestion<\/a> of a voice assistant like Google Assistant or Alexa. These systems employ a wake-word (\u201cOK, Google\u201d or \u201cAlexa\u201d), using client-side machine learning inference to know when to start listening for commands. The wake-word is local, which ensures that the data sent to the cloud is data the user consents to send up. Data only gets sent to the server once the user explicitly tells the system to start listening. You can think of this usage of client-side machine learning almost as a digital nervous-system reflex\u2014the system starts paying attention using a client-side inference from the wake word (the reflex) before sending up the more robust request from the user to a server (data being sent to the brain).<\/p>\n<h1 id=\"wider-access-and-distribution\">Wider access and distribution<\/h1>\n<p>Love it or hate it, JavaScript has one of the widest install bases of any language or framework. Virtually any modern personal computing device has a web browser installed, and almost any modern web browser can run JavaScript on it.<\/p>\n<p>This simple fact alone means that machine learning is opened up to a multitude of new devices. A smartphone which may have had its heyday a few years ago can suddenly run machine learning frameworks directly on the device.<\/p>\n<p><img src=\"20197650761_aec2b0b88e_o.jpg\" alt=\"A fitting room. Credit: Antonio Rubio via Flickr\"><\/p>\n<p>For example, your favorite streetwear brand may want to develop a virtual fitting room so you can see how you look rocking their latest threads before purchasing. 
According to Business Insider, <a href=\"http:\/\/www.businessinsider.com\/mobile-apps-most-popular-e-commerce-channel-q4-2017-2018-2\">23% of eCommerce sales are being made through mobile web<\/a>, so the streetwear brand wants to make sure they are capturing that market share. Using client-side machine learning in the browser, the brand could create a virtual fitting room for their customers to use on their phones or computers. With no app to download or anything to install, client-side machine learning in the browser helps lower the barrier to entry for customers to engage with the virtual fitting room, leading to higher conversion and happier customers.<\/p>\n<h1 id=\"distributed-computing\">Distributed computing<\/h1>\n<p>Many of the previous examples have been focused on inference\u2014that is, leveraging a pre-trained machine learning model and data set. There is, additionally, a good use case for distributed computing using client-side machine learning. For example, each time a user engages with a system, they could run the machine learning algorithm on their own device, use the data they are engaging with to train the model locally, and then push the new data points to a server to help improve the model. In this way, the user leverages their own device both to run the algorithm and to further train the model with the results, which helps future users of the algorithm. This could cut down on the computing costs of continuously training a model.<\/p>\n<hr>\n<p>The world of client-side machine learning is growing and developing quickly, despite the perceived limitations in machine learning with Javascript. 
Investment from companies like Google hand-porting TensorFlow to JavaScript, however, leads to an exciting new realm of possibility for machine learning to become accessible to users leveraging nothing more than the web browser they already use every day.<\/p>"},{"title":"Common Patterns for Analyzing Data","link":"https:\/\/thekevinscott.com\/common-patterns-for-analyzing-data\/","pubDate":"Mon, 12 Mar 2018 07:00:00 +0000","guid":"https:\/\/thekevinscott.com\/common-patterns-for-analyzing-data\/","description":"<p>Data is often messy, and a key step to building an accurate model is a thorough understanding of the data you&rsquo;re working with.<\/p>\n<p>Before I started teaching myself machine learning a few months ago, I hadn&rsquo;t thought much about how to understand data. I&rsquo;d assumed data came in a nice organized package with a bow on top, or at least a clear set of steps to follow.<\/p>\n<p>Looking through others&rsquo; code, I&rsquo;ve been struck by the amount of variation in how people understand, visualize, and analyze identical datasets. I decided to read through several different data analyses in an attempt to find similarities and differences, and see if I could <strong>distill a set of best practices or strategies for understanding datasets to best leverage them for analysis<\/strong>.<\/p>\n<p><img src=\"images\/example_eda.png\" alt=\"Example of an EDA\">\n<capt><a href=\"https:\/\/www.kaggle.com\/tentotheminus9\/r-eda\/notebook\">An example EDA in the wild<\/a><\/capt><\/p>\n<blockquote>\n<p>Data Scientists spend [the] vast majority of their time by [doing] data preparation, not model optimization. 
- <a href=\"https:\/\/www.kaggle.com\/lorinc\/feature-extraction-from-images\">lorinc<\/a><\/p>\n<\/blockquote>\n<p>In this article, I chose a number of <a href=\"https:\/\/www.kaggle.com\/general\/12796\"><strong>Exploratory Data Analyses<\/strong> (or EDAs)<\/a> that were made publicly available on <a href=\"https:\/\/www.kaggle.com\/\">Kaggle<\/a>, a website for data science. These analyses mix interactive code snippets with prose, and can help offer a bird&rsquo;s-eye view of the data or tease out patterns within it.<\/p>\n<p>I simultaneously looked at <a href=\"https:\/\/www.quora.com\/Does-deep-learning-reduce-the-importance-of-feature-engineering\">feature engineering<\/a>, a technique for taking existing data and transforming it in such a way as to impart additional meaning (for example, taking a timestamp and pulling out a <code>DAY_OF_WEEK<\/code> column, which might come in handy for predicting sales in a store).<\/p>\n<p>I wanted to look at a variety of different kinds of datasets, so I chose:<\/p>\n<ul>\n<li><a href=\"#structured-data\">Structured Data<\/a><\/li>\n<li><a href=\"#nlp\">NLP (Natural Language)<\/a><\/li>\n<li><a href=\"#images\">Image<\/a><\/li>\n<\/ul>\n<p>Feel free to <a href=\"#conclusions\">jump ahead to the conclusions below<\/a>, or read on to dive into the datasets.<\/p>\n<h3 id=\"criteria\">Criteria<\/h3>\n<p>For each category I chose two competitions where the submission date had passed, and sorted (roughly) by how many teams had submitted.<\/p>\n<p>For each competition I searched for EDA tags, and chose three kernels that were highly rated or well commented. Final scores did not factor in (some EDAs didn&rsquo;t even submit a score).<\/p>\n<h1 id=\"structured-data\">Structured Data<\/h1>\n<p>A structured dataset is characterized by spreadsheets containing training and test data. 
The spreadsheets may contain categorical variables (colors, like <code>green<\/code>, <code>red<\/code>, and <code>blue<\/code>), continuous variables (ages, like <code>4<\/code>, <code>15<\/code>, and <code>67<\/code>) and ordinal variables (educational level, like <code>elementary<\/code>, <code>high school<\/code>, <code>college<\/code>).<\/p>\n<h3 id=\"terms\">Terms<\/h3>\n<ul>\n<li>Imputation \u2014 Filling in missing values in the data<\/li>\n<li>Binning \u2014 Combining continuous data into buckets, a form of feature engineering<\/li>\n<\/ul>\n<p>The training spreadsheet has a target column that you&rsquo;re trying to solve for, which will be missing in the test data. The majority of the EDAs I examined focused on teasing out potential correlations between the target variable and the other columns.<\/p>\n<p>Because you&rsquo;re mostly looking for correlations between different variables, there are only so many ways you can slice and dice the data. For visualizations, there are more options, but even so, <a href=\"https:\/\/towardsdatascience.com\/5-quick-and-easy-data-visualizations-in-python-with-code-a2284bae952f\">some techniques seem better suited for the task at hand than others<\/a>, resulting in a lot of similar-looking notebooks.<\/p>\n<p>Where you can really let your imagination run wild is with feature engineering. 
Each of the authors I looked at had different approaches to feature engineering, whether it was choosing how to bin a feature or combining categorical features into new ones.<\/p>\n<p>Let&rsquo;s take a deeper look at two competitions, the <a href=\"https:\/\/www.kaggle.com\/c\/titanic\">Titanic competition<\/a>, followed by the <a href=\"https:\/\/www.kaggle.com\/c\/house-prices-advanced-regression-techniques\">House Prices competition<\/a>.<\/p>\n<h2 id=\"titanichttpswwwkagglecomctitanic\"><a href=\"https:\/\/www.kaggle.com\/c\/titanic\">Titanic<\/a><\/h2>\n<p><img src=\"images\/titanic.jpg\" alt=\"Titanic\">\n<capt>by <a href=\"https:\/\/www.flickr.com\/photos\/viaggioroutard\/32746842734\/in\/photolist-RTJ8sN-7reGoc-7rdfgb-7reqrP-7rhfiJ-b4aUUF-bv64XJ-91NeZE-q2mfUz-eFvcpv-VMircS-pzVRNe-dF1MGZ-WCozhj-95TEWr-gkyMjV-75JPMM-7r8VAM-7r8K54-7ricVq-7rcJaC-7r8WZP-7rcUuc-7rgRJC-7rgFnC-oktnFk-7rdZK1-7rhNjL-adsXVC-7rcKPj-4YLEGK-7rhHQs-7r8TaB-7r8SoZ-e5wPAJ-8xv5oh-bvPFMY-7r8V3n-4YTM15-axQxWs-d1iAyQ-918Vc6-2gmvHf-8RCNJR-4YLEBM-b4aUXr-usDiD-c8Yp5o-22nLofY-okatX\">Viaggio Routard<\/a><\/p>\n<p>The Titanic competition is a popular beginners&rsquo; competition, and lots of folks on Kaggle cycle through it. As a result the EDAs tend to be well written and thoroughly documented, and were amongst the clearest I saw. 
The dataset includes a training spreadsheet with a column <code>Survived<\/code> indicating whether a passenger survived or not, along with other supplementary data like their age, gender, ticket fare price, and more.<\/p>\n<p>The EDAs I chose for analysis were <a href=\"https:\/\/www.kaggle.com\/ash316\/eda-to-prediction-dietanic\">EDA to Prediction Dietanic<\/a> by I, Coder, <a href=\"https:\/\/www.kaggle.com\/dejavu23\/titanic-survival-for-beginners-eda-to-ml\">Titanic Survival for Beginners EDA to ML<\/a> by deja vu, and <a href=\"https:\/\/www.kaggle.com\/jkokatjuhha\/in-depth-visualisations-simple-methods\">In Depth Visualisations Simple Methods<\/a> by Jekaterina Kokatjuhha.<\/p>\n<p>All three of the EDAs start with raw metrics, viewing a few sample rows and printing descriptive information about the CSV file like column types, means, and medians.<\/p>\n<p><img src=\"images\/ash316_describe.png\" alt=\"I, Coder describes the dataset\">\n<capt>I, Coder describes the dataset<\/capt><\/p>\n<p>Handling null or missing values is a crucial step in data preparation. One EDA handles this right upfront, while the other two tackle missing values during the feature engineering stages.<\/p>\n<p>I, Coder argues against assigning a random number to fill in missing ages:<\/p>\n<blockquote>\n<p>As we had seen earlier, the Age feature has 177 null values. To replace these NaN values, we can assign them the mean age of the dataset. But the problem is, there were many people with many different ages. We just cant assign a 4 year kid with the mean age that is 29 years. Is there any way to find out what age-band does the passenger lie?? Bingo!!!!, we can check the Name feature. Looking upon the feature, we can see that the names have a salutation like Mr or Mrs. 
Thus we can assign the mean values of Mr and Mrs to the respective groups.<\/p>\n<\/blockquote>\n<p><img src=\"images\/ash316_age_imputation.png\" alt=\"I, Coder imputing ages\">\n<capt>I, Coder imputing ages<\/capt><\/p>\n<p>Whereas I, Coder combines feature engineering as part of the pure data analysis, the other two authors consider it as a discrete step.<\/p>\n<p>All three kernel authors rely heavily on charts and visualizations to get a high level understanding of the data and find potential correlations. Charts used include factorplots, crosstabs, bar and pie charts, violin plots, and more.<\/p>\n<p><img src=\"images\/dejavu_survival_by_gender.png\" alt=\"deja vu plots survival by gender\">\n<capt>deja vu plots survival by gender<\/capt><\/p>\n<p>You&rsquo;re probably familiar with the phrase &ldquo;women and children first&rdquo; in regards to the Titanic disaster, and for each author, age and gender feature heavily in their initial data analyses. Income background (as indicated by the price of the ticket) also comes in for some detailed inspection.<\/p>\n<blockquote>\n<p>The number of men on the ship is lot more than the number of women. Still the number of women saved is almost twice the number of males saved. The survival rates for a women on the ship is around 75% while that for men in around 18-19%. - I, Coder<\/p>\n<\/blockquote>\n<p>Both Jekaterina and I, Coder draw conclusions based on visual inspection of the charts and data, with Jekaterina writing:<\/p>\n<blockquote>\n<ul>\n<li>Sex: Survival chances of women are higher.<\/li>\n<li>Pclass: Having a first class ticket is beneficial for the survival.<\/li>\n<li>SibSp and Parch: middle size families had higher survival rate than the people who travelled alone or big families. The reasoning might be that alone people would want to sacrifice themselves to help others. 
Regarding the big families I would explain that it is hard to manage the whole family and therefore people would search for the family members insetad of getting on the boat.<\/li>\n<li>Embarked C has a higher survival rate. It would be interesting to see if, for instance, the majority of Pclass 1 went on board in embarked C.<\/li>\n<\/ul>\n<\/blockquote>\n<p><img src=\"images\/jkok_stacked.png\" alt=\"Jekaterina builds a stacked chart illustrating Pclass and Embarked\"><capt>Jekaterina builds a stacked chart illustrating Pclass and Embarked<\/capt><\/p>\n<p>Deja Vu&rsquo;s EDA records an accuracy number at each step of his analysis, providing a nice bit of feedback as to how important each feature is to the final prediction.<\/p>\n<h3 id=\"feature-engineering\">Feature Engineering<\/h3>\n<p><img src=\"images\/jkok_cabin_feature.png\" alt=\"At the beginning of her EDA, Jekaterina engineers a feature to pull out cabin letter.\"><\/p>\n<p><capt>Jekaterina pulls out cabin letter.<\/capt><\/p>\n<p>When it comes to feature engineering, there&rsquo;s more variability amongst the three kernel authors.<\/p>\n<p>Each author chooses different numbers of buckets for continuous variables like age and fare. Meanwhile, each approaches family relationships differently, with I, Coder building a <code>SibSp<\/code> - whether an individual is alone or with family (either spouse or siblings) - along with <code>family_size<\/code> and <code>alone<\/code>, while Jekaterina pulls out a cabin bin and suggests a feature for <code>child<\/code> or <code>adult<\/code>. 
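To make the binning idea concrete, here's a minimal language-agnostic sketch in Javascript; the rows and thresholds are made up for illustration and aren't taken from any of the kernels:

```javascript
// Toy feature engineering: bin a continuous Age column into buckets,
// and derive a child/adult feature. Thresholds are invented for this sketch.
var passengers = [
  { name: 'A', age: 4,  fare: 16.7 },
  { name: 'B', age: 29, fare: 7.25 },
  { name: 'C', age: 67, fare: 71.3 },
];

function ageBand(age) {
  if (age < 13) return 'child';
  if (age < 60) return 'adult';
  return 'senior';
}

// Add the engineered columns without mutating the original rows.
var engineered = passengers.map(function (p) {
  return Object.assign({}, p, {
    age_band: ageBand(p.age),
    is_child: p.age < 13,
  });
});

console.log(engineered[0].age_band); // → 'child'
console.log(engineered[2].age_band); // → 'senior'
```

The choice of cut points is exactly where the kernel authors diverge; there is no single right answer, which is why each notebook ends up with different buckets.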
I, Coder in particular is aggressive in his culling of irrelevant columns:<\/p>\n<blockquote>\n<p>Name&ndash;&gt; We don&rsquo;t need name feature as it cannot be converted into any categorical value.<\/p>\n<p>Age&ndash;&gt; We have the Age_band feature, so no need of this.<\/p>\n<p>Ticket&ndash;&gt; It is any random string that cannot be categorised.<\/p>\n<p>Fare&ndash;&gt; We have the Fare_cat feature, so unneeded<\/p>\n<p>Cabin&ndash;&gt; A lot of NaN values and also many passengers have multiple cabins. So this is a useless feature.<\/p>\n<p>Fare_Range&ndash;&gt; We have the fare_cat feature.<\/p>\n<p>PassengerId&ndash;&gt; Cannot be categorised.<\/p>\n<\/blockquote>\n<p>For the imputation step, Jekaterina writes:<\/p>\n<blockquote>\n<ul>\n<li>Embarked: fill embarked with a major class<\/li>\n<li>Pclass: because there is only one missing value in Fare we will fill it with a median of the corresponding Pclass<\/li>\n<li>Age: There are several imputing techniques, we will use the random number from the range mean +- std<\/li>\n<\/ul>\n<\/blockquote>\n<p>She concludes her kernel by ensuring the new imputed data did not disrupt the mean:<\/p>\n<p><img src=\"images\/jkok_disrupting_the_mean.png\" alt=\"Jekaterina checking if the imputation disrupted the mean\">\n<capt>Jekaterina checking if the imputation disrupted the mean<\/capt><\/p>\n<h3 id=\"takeaways\">Takeaways<\/h3>\n<p>All three kernel authors spend time up front examining the data and describing the overall shape.<\/p>\n<p>I, Coder looks at the total null values, whereas Jekaterina does that near the end.<\/p>\n<p>Everyone starts with looking at the breakdown of survivors, and then the breakdown of survivors by gender. Cross tabs, factor plots, and violin plots are all popular graphs. Jekaterina also plots some really fascinating graphs.<\/p>\n<p>The authors diverge a bit more when it comes to feature engineering. 
The authors differ on when to engineer new features, with some treating it as a discrete step and others tackling it during their initial analysis of the data. Choices around binning differ, with age, title and fare all receiving different numbers of buckets, and only Jekaterina engineering a discrete <code>child<\/code> \/ <code>adult<\/code> feature.<\/p>\n<p>Approaches to imputation differ as well. I, Coder recommends looking at existing data to predict imputation values, whereas Jekaterina ensures her imputed data did not impact the mean.<\/p>\n<p>There are some clear similarities in how the authors think about and approach the data, with the main divergences concerning visualizations and feature engineering.<\/p>\n<h2 id=\"house-priceshttpswwwkagglecomchouse-prices-advanced-regression-techniques\"><a href=\"https:\/\/www.kaggle.com\/c\/house-prices-advanced-regression-techniques\">House Prices<\/a><\/h2>\n<p><img src=\"images\/house_price.jpg\" alt=\"House Prices\">\n<capt>by <a href=\"https:\/\/www.flickr.com\/photos\/120360673@N04\/13855784355\/in\/photolist-n7ovXH-gjrMhS-eDwNQx-fFyccW-eDCzpL-fQDNaP-cA4RYd-cA4MtL-cA4HuL-fKnTsf-cA4LzU-ssvhf2-fKnAV9-daEeEz-gtpvp8-cA4R5o-cA4XQ7-cA4NSA-g2hXow-cA4SQw-eDBSFb-9eW1ng-g2j9Z5-cA4xwN-fFyJkx-9EzH9a-UD524Z-gttD2c-v9HAST-R7GoBF-v9KGVk-irUqRZ-koMrNT-fKv1e1-cA4UCE-ggDSAS-cA4C4A-gi21pE-cA4wdd-qmiDzR-rSUbew-gnDV6V-gjucTQ-fK7FS6-fK7bD6-duD885-fKbUqP-ggrui7-DUB1dh-dsvoVH\">American Advisors Group<\/a><\/capt><\/p>\n<p><a href=\"https:\/\/www.kaggle.com\/c\/house-prices-advanced-regression-techniques\">House Prices<\/a> is another structured data competition. 
This one boasts many more variables than the Titanic competition, and includes categorical, ordinal and continuous features.<\/p>\n<p>The EDAs I chose for analysis were <a href=\"https:\/\/www.kaggle.com\/pmarcelino\/comprehensive-data-exploration-with-python\">Comprehensive Data Exploration with Python<\/a> by Pedro Marcelino, <a href=\"https:\/\/www.kaggle.com\/xchmiao\/detailed-data-exploration-in-python\">Detailed Data Exploration in Python<\/a> by Angela, and <a href=\"https:\/\/www.kaggle.com\/caicell\/fun-python-eda-step-by-step\">Fun Python EDA Step by Step<\/a> by Sang-eon Park.<\/p>\n<p>While similar in kind to Titanic, it&rsquo;s considerably more complicated.<\/p>\n<blockquote>\n<p>With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.<\/p>\n<\/blockquote>\n<p><img src=\"images\/pmarcelino_saleprice.png\" alt=\"Pedro plots the sale price\">\n<capt>Pedro plots the sale price<\/capt><\/p>\n<p>Angela and Pedro spend some time upfront investigating the initial data like we saw in Titanic. Angela plots the sale price in a histogram and builds a heatmap of the features, while Pedro plots the sale price and draws the following conclusions about the sale price:<\/p>\n<blockquote>\n<ul>\n<li>Deviate from the normal distribution.<\/li>\n<li>Have appreciable positive skewness.<\/li>\n<li>Show peakedness.<\/li>\n<\/ul>\n<\/blockquote>\n<p>Pedro then puts himself in the shoes of a buyer and speculates which features would matter to him, examining the correlations between his picks and the sale price. 
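"Correlation" throughout these kernels generally means Pearson's r. For readers who haven't seen it computed directly, here's a minimal sketch in Javascript with toy numbers of my own invention (not the actual Ames columns):

```javascript
// Pearson correlation coefficient between two equal-length numeric arrays.
function pearson(xs, ys) {
  var n = xs.length;
  var meanX = xs.reduce(function (a, b) { return a + b; }, 0) / n;
  var meanY = ys.reduce(function (a, b) { return a + b; }, 0) / n;
  var num = 0, dx2 = 0, dy2 = 0;
  for (var i = 0; i < n; i++) {
    var dx = xs[i] - meanX;
    var dy = ys[i] - meanY;
    num += dx * dy;   // covariance term
    dx2 += dx * dx;   // variance of x
    dy2 += dy * dy;   // variance of y
  }
  return num / Math.sqrt(dx2 * dy2);
}

// Perfectly linear toy data: living area vs. sale price.
var area  = [1000, 1500, 2000, 2500];
var price = [100,  150,  200,  250];
console.log(pearson(area, price)); // → 1
```

A value near 1 or -1 suggests a strong linear relationship with the target; values near 0 are what the authors prune away.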
Later he builds a heatmap to glean a more objective view of feature relationships before zooming in on a couple of promising candidates.<\/p>\n<p><img src=\"images\/pmarcelino_features.png\" alt=\"Plotting features against sale price\">\n<capt>Plotting features against sale price<\/capt><\/p>\n<p>By contrast, Angela starts with a more objective approach, listing numerical features by their correlation with <code>SalePrice<\/code>. She also plots features against the sale price, looking for patterns in the data.<\/p>\n<p>Sang-eon starts his kernel with a bang, aggressively culling missing values and outliers (with the exception of <code>LotFrontage<\/code> which he imputes using linear regression). Only then does he begin plotting various features against the sale price.<\/p>\n<p>Pedro waits until he is looking for correlations in the data before examining the problem of missing data. He asks:<\/p>\n<blockquote>\n<ul>\n<li>How prevalent is the missing data?<\/li>\n<li>Is missing data random or does it have a pattern?<\/li>\n<\/ul>\n<p>The answer to these questions is important for practical reasons because missing data can imply a reduction of the sample size. This can prevent us from proceeding with the analysis. Moreover, from a substantive perspective, we need to ensure that the missing data process is not biased and hidding an inconvenient truth.<\/p>\n<\/blockquote>\n<p>To address these, Pedro plots the totals and percents of missing cells, and chooses to delete columns where 15% or more cells contain missing data. He again relies on subjective choices to determine which features to remove:<\/p>\n<blockquote>\n<p>&hellip;will we miss this data? I don&rsquo;t think so. None of these variables seem to be very important, since most of them are not aspects in which we think about when buying a house (maybe that&rsquo;s the reason why data is missing?). 
Moreover, looking closer at the variables, we could say that variables like &lsquo;PoolQC&rsquo;, &lsquo;MiscFeature&rsquo; and &lsquo;FireplaceQu&rsquo; are strong candidates for outliers, so we&rsquo;ll be happy to delete them.<\/p>\n<\/blockquote>\n<p>Pedro&rsquo;s approach to the missing data is to either remove columns (features) entirely if they feature a large number of missing values, or remove rows where there are only a few missing. He does not impute any variables. He also establishes a heuristic for tackling outliers:<\/p>\n<blockquote>\n<p>The primary concern here is to establish a threshold that defines an observation as an outlier. To do so, we&rsquo;ll standardize the data. In this context, data standardization means converting data values to have mean of 0 and a standard deviation of 1.<\/p>\n<\/blockquote>\n<p>He concludes that there&rsquo;s nothing to worry about from a statistical standpoint, but after returning to visual inspections of the data, deletes a few single data points he finds questionable.<\/p>\n<h3 id=\"feature-engineering-1\">Feature Engineering<\/h3>\n<p>Sang-eon examines the skewness and kurtosis of the data, and performs a Wilcoxon rank-sum test. He concludes his kernel with a very nice looking plot:<\/p>\n<p><img src=\"images\/caicell_3d_plot.png\" alt=\"Sang-eon with a 3d plot of features\">\n<capt>Sang-eon with a 3d plot of features<\/capt><\/p>\n<p>Meanwhile, Pedro discusses Normality, Homoscedasticity, Linearity, and Absence of correlated errors; he normalizes the data and discovers that the other three are resolved as well. Success!<\/p>\n<h3 id=\"takeaways-1\">Takeaways<\/h3>\n<p>None of the three kernel authors does much feature engineering, possibly because there are so many features already present in the dataset.<\/p>\n<p>There&rsquo;s a wide range of strategies for determining how to approach the data, with some authors adopting a subjective strategy and others jumping straight to more objective measurements. 
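<\/p>\n<p>Pedro&rsquo;s standardization heuristic quoted above amounts to z-scoring the target and flagging extreme values. A minimal sketch; the prices and the 2-standard-deviation cutoff here are assumptions for illustration, not his exact values:<\/p>

```python
import numpy as np

# Toy sale prices; the kernel standardizes the competition's SalePrice
prices = np.array([208500, 181500, 223500, 140000, 250000, 755000], dtype=float)

# Standardize: mean of 0, standard deviation of 1
z = (prices - prices.mean()) / prices.std()

# Flag observations beyond a chosen threshold (the cutoff is a choice,
# not a rule; Pedro eyeballs the extreme tails of the standardized data)
outliers = prices[np.abs(z) > 2]
print(outliers)
```

<p>The threshold itself is the subjective part, which is why Pedro still follows up with visual inspection.<\/p>\n<p>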
There&rsquo;s also no clear consensus on when and how to cull missing values or outliers.<\/p>\n<p>There&rsquo;s more of a focus on statistical methods and integrity overall than in the Titanic competition, possibly because there are so many more features to handle; negative statistical effects might have a larger overall impact than in the previous competition.<\/p>\n<h1 id=\"natural-language\">Natural Language<\/h1>\n<p>Natural language datasets, the domain of natural language processing (NLP), contain words or sentences. While the core data type is the same as in structured data competitions - text - the tools available for analyzing natural language are specialized, resulting in different strategies for analysis.<\/p>\n<p>In its original form, language is not easily decipherable by machine learning models. To get it into an appropriate format for a neural net requires transformation. One popular technique is the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Bag-of-words_model\">Bag of Words<\/a>, whereby a sentence is effectively transformed into a collection of 0s and 1s indicating whether a particular word is present or not.<\/p>\n<p>Because of this need to transform the data, the first few steps of most notebooks tend to be transforming the text into something machine readable, and that step tends to be similar across notebooks. 
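<\/p>\n<p>The bag-of-words transform just described can be sketched in a few lines of plain Python (scikit-learn&rsquo;s <code>CountVectorizer<\/code> with <code>binary=True<\/code> does the same thing at scale); the sentences here are made up for illustration:<\/p>

```python
sentences = [
    'the cat sat on the mat',
    'the dog sat on the log',
]

# Build a vocabulary from every word that appears in any sentence
vocab = sorted({word for s in sentences for word in s.split()})

# Encode each sentence as 0/1 flags: is each vocabulary word present?
bow = [[1 if word in s.split() else 0 for word in vocab] for s in sentences]

print(vocab)   # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']
print(bow[0])  # [1, 0, 0, 1, 1, 1, 1]
```

<p>Real notebooks add lowercasing, punctuation stripping and stop word removal before this step, but the core encoding is just presence or absence per word.<\/p>\n<p>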
Once that&rsquo;s done, coders diverge considerably in their approaches and employ a variety of different visualizations and techniques for feature engineering.<\/p>\n<h2 id=\"toxic-comment-classificationhttpswwwkagglecomcjigsaw-toxic-comment-classification-challenge\"><a href=\"https:\/\/www.kaggle.com\/c\/jigsaw-toxic-comment-classification-challenge\">Toxic Comment Classification<\/a><\/h2>\n<p><em>Warning: some of these comments might burn your eyeballs.<\/em><\/p>\n<p><img src=\"images\/toxic.png\" alt=\"Toxic\">\n<capt>by <a href=\"https:\/\/www.flickr.com\/photos\/navaneethkn\/7975953800\/in\/photolist-d9NRbQ-dEZp2L-dQinfV-8ZqMDd-GyaoHJ-oGKC67-5Kj4pp-8YybhA-8Yva5t-7Xh81B-oEZ6w8-4G19MZ-cm3zDf-3c7z32-GXuYz-oyayD-96qUSC-6UYVbr-bjWoro-duyWt-7jD4Nc-6KNazu-op1rhC-DY1c6F-bYNV7U-byHgQ3-cmFxgG-cm3zEC-8m74XJ-oZEcpA-9Kd3gM-7t1H1q-m9ZbHK-9r7F3j-r3kU-8ZPcAU-8RLfw5-TsHmuW-98S9zG-8Mzx2B-c6ZoZ9-7Bpck-8bnj49-4AJUnS-vag3VE-7Bp97-jeiiRG-bHbYQa-dJQMhT-N7SSSy\">navaneethkn<\/a><\/capt><\/p>\n<p>The first NLP competition I looked at was the <a href=\"https:\/\/www.kaggle.com\/c\/jigsaw-toxic-comment-classification-challenge\">Toxic Comment Classification Competition<\/a>, which included a dataset featuring a large number of comments from Wikipedia talk page edits that had been scored on a toxicity scale, indicating whether they were an insult, obscene, toxic, and more. 
The challenge was to predict a given comment&rsquo;s toxicity labels.<\/p>\n<p>The EDAs I chose for analysis were <a href=\"https:\/\/www.kaggle.com\/jagangupta\/stop-the-s-toxic-comments-eda\">Stop the S@#$ - Toxic Comments EDA<\/a> by Jagan, <a href=\"https:\/\/www.kaggle.com\/rhodiumbeng\/classifying-multi-label-comments-0-9741-lb\">Classifying Multi-label Comments<\/a> by Rhodium Beng, and <a href=\"https:\/\/www.kaggle.com\/fcostartistican\/don-t-mess-with-my-mothjer\">Don&rsquo;t Mess With My Mothjer<\/a> by Francisco Mendez.<\/p>\n<p>All three authors begin by describing the dataset and pulling a few comments at random. While there are no missing values, there is a lot of noise in the comments, but it&rsquo;s unclear whether this noise will be useful in the final data analysis.<\/p>\n<p><img src=\"images\/jagan_category_distribution.png\" alt=\"Jagan plots the distribution of comments per toxic category\">\n<capt>Jagan plots the distribution of comments per toxic category<\/capt><\/p>\n<blockquote>\n<p>The toxicity is not evenly spread out across classes. Hence we might face class imbalance problems \u2014 Jagan<\/p>\n<\/blockquote>\n<p>Francisco immediately throws away words &ldquo;lacking meaning&rdquo; (e.g., &ldquo;and&rdquo; or &ldquo;the&rdquo;). 
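<\/p>\n<p>Discarding words &ldquo;lacking meaning&rdquo; is what NLP libraries call stop word removal. A toy sketch; the stop word list below is a tiny hand-rolled stand-in for the fuller lists shipped with NLTK or scikit-learn:<\/p>

```python
# A minimal stop word set; real kernels pull a few hundred words
# from a library rather than listing them by hand.
STOP_WORDS = {'a', 'an', 'and', 'the', 'is', 'of', 'to', 'in'}

def remove_stop_words(comment):
    # Lowercase, split on whitespace, and drop the stop words
    return [w for w in comment.lower().split() if w not in STOP_WORDS]

print(remove_stop_words('The cat and the dog in a hat'))
# → ['cat', 'dog', 'hat']
```
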
Using a biplot, he plots out in which category a particular word is most likely to fit.<\/p>\n<blockquote>\n<p>From the biplot most of the words are organized as expected, with some exceptions, fat is associated to identity hate, which is surprissing because is the only non-race word on the bottom of the chart, there are some generic offensive words in the middle of the chart, meaning that they can be used for any awful purposes, other ones as die are exclusively associated to threat which make total sense some others as a$$ (sorry I feel uncomfortable writing it as it appear on the data) is associated with threat, on the middle left of the chart there are some unrecognizable words, which are shown using the code \u2014 Francisco Mendez<\/p>\n<\/blockquote>\n<p>Francisco then asks whether there&rsquo;s a correlation between typos and toxicity.<\/p>\n<blockquote>\n<p>Apparently there is, and surprisingly, mother when is misspelled is never related to hate or threat, but when it is properly spelled there are some hate and threat comments that have the word mother in it &hellip; Is it that people tend to write more carefully when they are threating somebody or when they hate it?<\/p>\n<\/blockquote>\n<p>As Francisco digs further, he finds that in many cases, toxic comments would contain copy-pasted phrases, over and over again. After removing duplicate words and rerunning his analysis, he discovers a new set of correlations.<\/p>\n<blockquote>\n<p>Here there are some new words the ones that can be highlited are gay used mainly on threat comments and hate. Some general mild words as mother, hell, piece, stupid, idiot and shut are used for any toxic general purpose, meantime any derivative of the f-word is used in toxic and obscene comments. 
Also from the biplot is possible to realize that toxic and insult are similar and the least aggressive ones, while hate and threat are the most serious ones.<\/p>\n<\/blockquote>\n<p>All three authors utilize visualizations of the data to great effect. (Given the subject matter I <a href=\"https:\/\/www.kaggleusercontent.com\/kf\/1999919\/eyJhbGciOiJkaXIiLCJlbmMiOiJBMTI4Q0JDLUhTMjU2In0..VNoIJq1HAQhUsgxohTIQbw.PJ_2btsXr_QE8M47LbU_hnDC0EjxjMc2MU78sA3FgRImN8RkF7EHKV3qblwpWNMFxd1euS-fJoKYaH08WmK_WrOThoricdI7DSnR_bPBpbcYd38Ee2nI79hiLFLmfOYqvouAokDBWFJttLZ1hNABug.HeaSlJMn0s2ZxykWFP8wPg\/__results__.html#Counting-words-different\">won&rsquo;t embed<\/a> the images <a href=\"https:\/\/www.kaggleusercontent.com\/kf\/2666420\/eyJhbGciOiJkaXIiLCJlbmMiOiJBMTI4Q0JDLUhTMjU2In0..-nfZk6vneRueL7x2gvhaTQ.DCk0SonxzFc_r5z4QAPJrjRqu-d_6KgEKOvKmlf4z1zlqQBrZ2qsd1T-r3L_djSWn-W58trDbpNTHFuf8tKiL7Akq-vZwKMkLXv4_1dqiMCIUpa7lhdXW_Ss25eREE8U.HG-fudsk_AA4Iefzv5-n5A\/__results__.html#Wordclouds---Frequent-words:\">but you can find them on each author&rsquo;s kernel<\/a>.)<\/p>\n<p>Rhodium builds a histogram of character length as well as a heatmap between categories, finding that some labels are highly correlated; for instance, an insult is 74% likely to also be obscene.<\/p>\n<p>Jagan plots some word clouds, a heatmap, and a crosstab, observing:<\/p>\n<blockquote>\n<p>A Severe toxic comment is always toxic<br \/>\nOther classes seem to be a subset of toxic barring a few exceptions<\/p>\n<\/blockquote>\n<h3 id=\"feature-engineering-2\">Feature Engineering<\/h3>\n<p>Rhodium lowercases the text, manually expands contractions, and manually cleans punctuation.<\/p>\n<p>Jagan plots various features against toxicity looking for correlations. 
Among the discoveries: spammers tend to be more toxic.<\/p>\n<p><img src=\"images\/jagan_discussing_feature_engineering.png\" alt=\"Jagan discussing feature engineering\">\n<capt>Jagan discussing feature engineering<\/capt><\/p>\n<p>For single words and pairs of words, Jagan and Rhodium both plot the top words using TF-IDF, described as:<\/p>\n<blockquote>\n<p>TF stands for term frequency; essentially how often a word appears in the text &hellip; You can understand it as a normalisation of the relativ text frequency by the overall document frequency. This will lead to words standing out that are characteristic for a specific author, which is pretty much what we want to achieve in order build a prediction model. \u2014 <a href=\"https:\/\/www.kaggle.com\/headsortails\/treemap-house-of-horror-spooky-eda-lda-features\">Heads or Tails<\/a><\/p>\n<\/blockquote>\n<h3 id=\"takeaways-2\">Takeaways<\/h3>\n<p>There seem to be a few best practices all the authors follow, including lowercasing text, handling contractions, and cleaning up punctuation. 
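<\/p>\n<p>The TF-IDF weighting several of these kernels rely on reduces to a small formula. A toy pure-Python sketch of the plain (unsmoothed) variant; libraries like scikit-learn use a smoothed version, and the documents below are invented for illustration:<\/p>

```python
import math

docs = [
    'the cat sat on the mat',
    'the dog sat on the log',
    'cats and dogs',
]

def tf_idf(term, doc, docs):
    words = doc.split()
    # Term frequency: how often the term appears in this document
    tf = words.count(term) / len(words)
    # Document frequency: how many documents contain the term at all
    df = sum(term in d.split() for d in docs)
    # Rare-across-documents terms get a higher weight
    idf = math.log(len(docs) / df)
    return tf * idf

# 'cat' outscores 'the' in the first document: 'the' appears everywhere,
# so its idf drags it down even though its raw count is higher
print(tf_idf('cat', docs[0], docs))
print(tf_idf('the', docs[0], docs))
```
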
However, some authors also considered that these could be potential features and not just noise (for instance, Francisco discovering a correlation between typos and toxicity).<\/p>\n<h2 id=\"spooky-author-identificationhttpswwwkagglecomcspooky-author-identification\"><a href=\"https:\/\/www.kaggle.com\/c\/spooky-author-identification\">Spooky Author Identification<\/a><\/h2>\n<p><img src=\"images\/halloween.png\" alt=\"A very scary image\">\n<capt>by <a href=\"https:\/\/www.flickr.com\/photos\/gaelvaroquaux\/29632530995\/in\/photolist-M9wt5P-97P7tg-4vzLRF-61r11U-Zt2GHV-cY8aNJ-cY7ZgL-UXxYV9-b4qibP-4tm3wK-7haukg-2JiX6D-cVsp9-cY7XLU-4eeRFT-8PsYcb-cY7X8j-5jUhKv-jVRzRb-97Sb5A-7aBbJH-dZNRw2-smkRf-gxqQt3-aqqb74-gxs9eF-62dAE-FnZJs-62dXh-ZWp8CL-DpiJqc-WuwzSK-FnXvG-Ef1yLk-7omXUv-r5iPPD-pDGN7f-61hnvE-FnZKU-FnXv1-n28gM8-quLHEs-iAsBz-WBEwee-5z3uaW-pEQPCo-efPVZU-YbEZgy-dVfyAo-nHKteU\">Gael Varoquaux<\/a><\/capt><\/p>\n<p>The <a href=\"https:\/\/www.kaggle.com\/c\/spooky-author-identification\">Spooky Author Identification<\/a> competition provided snippets of text from three horror-themed authors - Edgar Allan Poe, HP Lovecraft, and Mary Wollstonecraft Shelley - and asked participants to build a model capable of predicting which writer authored a particular bit of text.<\/p>\n<p>The EDAs I chose for analysis were <a href=\"https:\/\/www.kaggle.com\/arthurtok\/spooky-nlp-and-topic-modelling-tutorial\">Spooky NLP and Topic Modelling Tutorial<\/a> by Anisotropic, <a href=\"https:\/\/www.kaggle.com\/ambarish\/tutorial-detailed-spooky-fun-eda-and-modelling\">Tutorial Detailed Spooky Fun EDA and Modelling<\/a> by Bukun, and <a href=\"https:\/\/www.kaggle.com\/headsortails\/treemap-house-of-horror-spooky-eda-lda-features\">Treemap House of Horror Spooky EDA LDA Features<\/a> by Heads or Tails.<\/p>\n<p>What&rsquo;s interesting about this dataset is its simplicity; there&rsquo;s very little structured data accompanying the text, other than the author. 
As a result, all the EDAs focused solely on different approaches to parsing and analyzing language.<\/p>\n<p>Each author begins by examining the dataset, picking out a few rows, and plotting the number of stories per author. Bukun also looks at word lengths per author, while Anisotropic plots a bar graph of overall word counts:<\/p>\n<p><img src=\"images\/arthurtok_word_freq.png\" alt=\"Anisotropic plots a graph of overall word frequency\"><\/p>\n<blockquote>\n<p>Notice anything odd about the words that appear in this word frequency plot? Do these words actually tell us much about the themes and concepts that Mary Shelley wants to portray to the reader in her stories? These words are all so commonly occuring words which you could find just anywhere else. Not just in spooky stories and novels by our three authors but also in newspapers, kid book, religious texts - really almost every other english text. Therefore we must find some way to preprocess our dataset first to strip out all these commonly occurring words which do not bring much to the table. - Anisotropic<\/p>\n<\/blockquote>\n<p>Each author builds word clouds, in which the most frequent words appear largest:<\/p>\n<p><img src=\"images\/headsortails_wordcloud.png\" alt=\"Heads or Tails builds a word cloud of the 50 most common words\">\n<capt>Heads or Tails builds a word cloud of the 50 most common words<\/capt><\/p>\n<p>Heads or Tails also plots overall sentence count, sentence length, and word length per author, and discovers subtle but measurable differences between the authors.<\/p>\n<p>Anisotropic and Bukun discuss tokenization, and removing stop words:<\/p>\n<blockquote>\n<p>The work at this stage attempts to reduce as many different variations of similar words into a single term ( different branches all reduced to single word stem). Therefore if we have &ldquo;running&rdquo;, &ldquo;runs&rdquo; and &ldquo;run&rdquo;, you would really want these three distinct words to collapse into just the word &ldquo;run&rdquo;. 
(However of course you lose granularity of the past, present or future tense). \u2014 Anisotropic<\/p>\n<\/blockquote>\n<p>After the tokenization, stop word removal and lemmatization, Anisotropic rebuilds the graph of top 50 words:<\/p>\n<p><img src=\"images\/arthurtok_word_freq2.png\" alt=\"Anisotropic replots top 50 words after stop word removal\"><\/p>\n<p>Bukun plots his top 10 words overall and by author, finding a different set:<\/p>\n<p><img src=\"images\/bukun_top_ten_words.png\" alt=\"Bukun&amp;rsquo;s top ten words\"><\/p>\n<p>Heads or Tails does this as well, additionally looking at top words by author, after tokenization and stemming.<\/p>\n<p>Bukun and Heads or Tails both then use TF-IDF to find the most &ldquo;important&rdquo; words for a particular author.<\/p>\n<p><img src=\"images\/headsortails_tfidf.png\" alt=\"Heads or Tails plots the most significant words by author in a bit of a different chart\">\n<capt>Heads or Tails plots the most significant words by author in a bit of a different chart<\/capt><\/p>\n<p>Bukun looks at top bigrams and trigrams (collections of two and three words, respectively).<\/p>\n<p><img src=\"images\/headsortails_wordrelationship.png\" alt=\"Heads or Tails plots the word relationships for bigrams\">\n<capt>Heads or Tails plots the word relationships for bigrams<\/capt><\/p>\n<p>Both Bukun and Heads or Tails perform a sentiment analysis, and look at overall negativity per author.<\/p>\n<p>Bukun uses something called the &ldquo;NRC Sentiment lexicon&rdquo; to examine the amount of &ldquo;Fear&rdquo;, &ldquo;Surprise&rdquo;, and &ldquo;Joy&rdquo; in each snippet of text, and visualizes the sentiment of various authors using word clouds, tables, and bar charts.<\/p>\n<p><img src=\"images\/bukun_wordcloud_joy.png\" alt=\"Bukun plots a word cloud for words matching Joy\">\n<capt>Bukun plots a word cloud for words matching Joy<\/capt><\/p>\n<h3 id=\"feature-engineering-3\">Feature engineering<\/h3>\n<p>Bukun suggests a number of 
possible features to add, including the number of commas, semicolons, colons, blanks, and words in capitals or beginning with capitals, and he graphs each one. There do appear to be some correlations for some authors against some of these features.<\/p>\n<p>Heads or Tails notes that:<\/p>\n<blockquote>\n<p>We have already noticed that our three authors can be identified by the names of their most prominent characters; with Mary Shelley writing about \u201cRaymond\u201d or Lovecraft about \u201cHerbert West\u201d. But what about names in general? Are some authors more likely to use names under certain circumstances? After sentence or character length this is one of our first feature-engineering ideas on our quest for knowledge<\/p>\n<\/blockquote>\n<p>From this insight, Heads or Tails relies on the <code>babynames<\/code> package, which provides a list of the most popular names for a given year, to add an additional feature to the data.<\/p>\n<p>Bukun and Heads or Tails both look at the gender pronoun breakdown between authors, and Heads or Tails also looks at sentence topics, first and last words per author, number of unique words, fraction of distinct words per sentence, dialogue markers, and alliteration (which is a cool idea!).<\/p>\n<p><img src=\"images\/headsortails_alliteration.png\" alt=\"heads or tails plots various measurements of alliteration by author\">\n<capt>heads or tails plots various measurements of alliteration by author<\/capt><\/p>\n<p>Heads or Tails ends his kernel with an alluvial plot showcasing feature interaction:<\/p>\n<p><img src=\"images\/headsortails_alluvian.png\" alt=\"Heads or Tails&amp;rsquo; alluvial plot showcasing feature interaction\">\n<capt>Heads or Tails&rsquo; alluvial plot showcasing feature interaction<\/capt><\/p>\n<h3 id=\"takeaways-3\">Takeaways<\/h3>\n<p>This is a fascinating competition to study since the text snippets are longer and there&rsquo;s no structured data to rely on.<\/p>\n<p>Kernels tended to leverage NLP 
best practices, like lowercasing words, stemming, and tokenization. Kernels also tended to use more advanced techniques than were seen in the Toxic kernels, like sentiment analysis and bi- and trigram analysis.<\/p>\n<p>In both competitions, kernel authors used <a href=\"https:\/\/www.kaggle.com\/headsortails\/treemap-house-of-horror-spooky-eda-lda-features\">TF-IDF<\/a>.<\/p>\n<p>For feature engineering, authors engineered a variety of new features including average words per sentence, punctuation choices, and whether words were duplicated.<\/p>\n<h1 id=\"images\">Images<\/h1>\n<p>So far, the datasets have all been purely text-based (either language, strings or numbers). The last two datasets I chose to look at were image-based.<\/p>\n<p>The two competitions I examined (<a href=\"https:\/\/www.kaggle.com\/c\/data-science-bowl-2017\/\">lung cancer<\/a> and <a href=\"https:\/\/www.kaggle.com\/c\/leaf-classification\/\">leaf classification<\/a>) were both far more domain-specific than the other ones I looked at. As a result, the analyses tended to assume an advanced audience, and authors skipped over rudimentary analysis in favor of exploring different techniques for image analysis.<\/p>\n<p>I saw a great variety in terms of the visualization techniques used, along with features that were engineered. In particular, some authors in the lung cancer competition drew upon existing medical knowledge in order to engineer extremely domain-specific features. I can&rsquo;t speak to how effective those features were, but I can say that the visualizations they produced were stunning.<\/p>\n<h2 id=\"leaf-classificationhttpswwwkagglecomcleaf-classification\"><a href=\"https:\/\/www.kaggle.com\/c\/leaf-classification\/\">Leaf Classification<\/a><\/h2>\n<p>The Leaf Classification competition includes 1,584 masked images of leaves, organized by species. 
Participants were instructed to build a model capable of classifying new images into one of the categories.<\/p>\n<p>The EDAs I chose for analysis were <a href=\"https:\/\/www.kaggle.com\/lorinc\/feature-extraction-from-images\">Feature Extraction From Images<\/a> by lorinc, <a href=\"https:\/\/www.kaggle.com\/selfishgene\/visualizing-pca-with-leaf-dataset\">Visualizing PCA with Leaf Dataset<\/a> by selfishgene, and <a href=\"https:\/\/www.kaggle.com\/josealberto\/fast-image-exploration\">Fast Image Exploration<\/a> by Jose Alberto.<\/p>\n<p>A good first step is to look at the images of the leaves, which is how two of the EDAs start.<\/p>\n<p><img src=\"images\/selfish_overview.png\" alt=\"selfishgene examines the leaf specimens\">\n<capt>selfishgene examines the leaf specimens<\/capt><\/p>\n<p>Jose plots the various species, and notes that there are 10 images per species. He also looks at how similar leaves within a category are to each other:<\/p>\n<p><img src=\"images\/jose_leaf_morphing.gif\" alt=\"josealberto creates a gif of all the leaves from a category\">\n<capt>Jose compares leaves within a category<\/capt><\/p>\n<p>Meanwhile, lorinc jumps straight into analysis, locating the center of each leaf and applying edge detection. lorinc also converts the outline of the leaf into polar coordinates, in order to more effectively measure the center of the leaf:<\/p>\n<blockquote>\n<p>Later we might want to switch to another measure of centrality, based on how efficient this center is, when we generate a time-series from the shape, using the distance between the edge and the center. 
One way to do that is just measure the (Euclidean) distance between the center and the edge&hellip; but there is a better way - we project the Cartesian coordinates into Polar coordinates.<\/p>\n<\/blockquote>\n<p>selfishgene chooses to look at the variance direction of the images, writing:<\/p>\n<blockquote>\n<p>Each image can be though of as a different &ldquo;direction&rdquo; in the high dimensional image space<\/p>\n<\/blockquote>\n<p><img src=\"images\/selfish_variance.png\" alt=\"selfishgene looks at the variance of a leaf image\">\n<capt>selfishgene looks at the variance of a leaf image<\/capt><\/p>\n<p>selfishgene also spends some time looking into image reconstruction, model variations around the mean image, and eigenvectors; he explains:<\/p>\n<blockquote>\n<p>&ldquo;The upper most row contains the data distributions of each eigenvector (i.e. the histogram along that &ldquo;direction&rdquo;) The second row contains what we already saw in a previous plot, what we called the variance directions. The forth row contains the median image of leafs. notice that this row is identical for all eigenvectors The third row holds the 2nd percentile images of each eigenvector. it&rsquo;s easier to think of this as the median image minus the eigenvector image multiplied by some constant.<\/p>\n<\/blockquote>\n<p><img src=\"images\/selfish_model_variation.png\" alt=\"selfishgene looks at model variations\">\n<capt>selfishgene looks at model variations<\/capt><\/p>\n<h3 id=\"feature-detection\">Feature detection<\/h3>\n<p>lorinc suggests splitting each sample in half and treating them as two samples (though he doesn&rsquo;t pursue this approach). lorinc finds local maxima and minima from the time series (e.g., the leaf graphed in polar coordinates) and notes:<\/p>\n<blockquote>\n<p>Ok, I surprised myself. This worked out pretty well. I think, I can build an extremely efficient feature from this. 
But this method is NOT robust yet.<\/p>\n<ul>\n<li>It is not finding the tips, but the points with the greatest distance from center. (look at leaf#19)<\/li>\n<li>It will miserably fail on a more complex, or unfortunately rotated leaf. (look at leaf#78)<\/li>\n<\/ul>\n<\/blockquote>\n<p><img src=\"images\/lorinc_minima_maxima.png\" alt=\"lorinc measures the minima and maxima of a leaf plotted in polar coordinates\">\n<capt>lorinc measures the minima and maxima of a leaf plotted in polar coordinates<\/capt><\/p>\n<p>From there, lorinc talks about mathematical morphology, before discovering the presence of noise around each leaf. He spends some time figuring out how to remove noise from the image and concludes with a lovely image showing a distance map superimposed on the leaf:<\/p>\n<p><img src=\"images\/lorinc_distance.png\" alt=\"lorinc measures the distance from the center of a leaf\">\n<capt>lorinc measures the distance from the center of a leaf<\/capt><\/p>\n<h2 id=\"lung-cancerhttpswwwkagglecomcdata-science-bowl-2017\"><a href=\"https:\/\/www.kaggle.com\/c\/data-science-bowl-2017\/\">Lung Cancer<\/a><\/h2>\n<p>The EDAs I chose for analysis were <a href=\"https:\/\/www.kaggle.com\/gzuidhof\/full-preprocessing-tutorial\">Full Preprocessing Tutorial<\/a> by Guido Zuidhof, <a href=\"https:\/\/www.kaggle.com\/anokas\/exploratory-data-analysis-4\">Exploratory Data Analysis<\/a> by Mikel Bober-Irizar, and <a href=\"https:\/\/www.kaggle.com\/apapiu\/exploratory-analysis-visualization\">Exploratory Analysis Visualization<\/a> by Alexandru Papiu.<\/p>\n<p><img src=\"images\/dicom_info.png\" alt=\"DICOM meta info\">\n<capt>anokas examines the metadata for a single image. 
Patient date has been anonymized (1\/1\/1900)<\/capt><\/p>\n<p>The final image competition I looked at was the <a href=\"https:\/\/www.kaggle.com\/c\/data-science-bowl-2017\/\">2017 Data Science Bowl<\/a>, which asked participants to examine a list of images and predict whether the patients had cancer or not. While this competition did feature structured data (meta information embedded in the images themselves), some of this data was anonymized, meaning that features that could have otherwise had predictive value (like the age of the patient) were removed. This meant that all the kernels focused exclusively on image analysis.<\/p>\n<p>Of the three kernel authors, Guido is the only one to discuss his background working with medical images, and it shows in his domain-specific analysis of the dataset:<\/p>\n<blockquote>\n<p>Dicom is the de-facto file standard in medical imaging. &hellip; These files contain a lot of metadata (such as the pixel size, so how long one pixel is in every dimension in the real world). This pixel size\/coarseness of the scan differs from scan to scan (e.g. the distance between slices may differ), which can hurt performance of CNN approaches. 
We can deal with this by isomorphic resampling<\/p>\n<\/blockquote>\n<p>The other two authors start their EDAs with more general explorations of the dataset and images themselves.<\/p>\n<p>Alexandru begins by examining the shape of the images, while anokas starts by looking at the number of scans per patient, total number of scans, and a histogram of DICOM files per patient, along with a quick sanity check to see if there&rsquo;s any relationship between row ID and whether a patient has cancer (none is found, implying that the dataset is well sorted).<\/p>\n<p>Alexandru plots the distribution of pixel values:<\/p>\n<p><img src=\"images\/anapie_pixel_graph.png\" alt=\"anapie plots a distribution of pixels\"><\/p>\n<blockquote>\n<p>Interesting - the distribution seems to be roughly bimodal with a bunch of pixels set at - 2000 - probably for missing values.<\/p>\n<\/blockquote>\n<p>Guido sheds some more light on why this is in his EDA: it comes down to what HU units represent (air, tissue and bone):<\/p>\n<p><img src=\"images\/gzuidhof_hu_units.png\" alt=\"gzuidhof examines HU units distribution\"><\/p>\n<h3 id=\"images-1\">Images<\/h3>\n<p>Each author continues by examining the images themselves:<\/p>\n<p><img src=\"images\/anokas_images.png\" alt=\"anokas looks across a number of patient images\">\n<capt>anokas looks at a set of patient images side by side<\/capt><\/p>\n<p><img src=\"images\/anapie_x_sweep.png\" alt=\"anapie performs a sweep across the X angle\">\n<capt>Alexandru looks at images from the X angle<\/capt><\/p>\n<p><img src=\"images\/anokas_sweep.gif\" alt=\"anokas sweeps across a set of patient images\">\n<capt>anokas builds a gif that moves through a set of patient images<\/capt><\/p>\n<p>Alexandru spends some time exploring whether edge detection could enhance the images.<\/p>\n<p><img src=\"images\/anapie_edges.png\" alt=\"anapie performs edge detection across a variety of DICOM images\">\n<capt>After increasing the threshold, Alexandru was 
able to render some visually striking images<\/capt><\/p>\n<p>Alexandru concludes that:<\/p>\n<blockquote>\n<p>Interesting results, however the issue here is that the filter will also detect the blood vessels in the lung. So some sort of 3-D surface detection that differentiates between spheres and tubes would be more suitable for this situation.<\/p>\n<\/blockquote>\n<p>Meanwhile, Guido discusses resampling, focusing on the fundamental nature of the DICOM image:<\/p>\n<blockquote>\n<p>A scan may have a pixel spacing of [2.5, 0.5, 0.5], which means that the distance between slices is 2.5 millimeters. For a different scan this may be [1.5, 0.725, 0.725], this can be problematic for automatic analysis (e.g. using ConvNets)! A common method of dealing with this is resampling the full dataset to a certain isotropic resolution. If we choose to resample everything to 1mm x 1mm x 1mm pixels we can use 3D convnets without worrying about learning zoom\/slice thickness invariance.<\/p>\n<\/blockquote>\n<p>Later in his EDA, Guido is able to do a 3D plot of the inner cavity by combining multiple DICOM images:<\/p>\n<p><img src=\"images\/gzuidhof_3d.png\" alt=\"3D Segment\">\n<capt>3D plot<\/capt><\/p>\n<p>And another version, after removing the surrounding air to reduce memory:<\/p>\n<p><img src=\"images\/gzuidhof_no_air_3d.png\" alt=\"3D Segment minus air\">\n<capt>3D plot without air<\/capt><\/p>\n<h3 id=\"takeaways-4\">Takeaways<\/h3>\n<p>This competition featured the most differences between kernels of any I saw. Guido, given his familiarity with medical image formats, was able to leverage that background to draw significantly more nuanced conclusions. 
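<\/p>\n<p>The isotropic resampling Guido quotes above can be sketched in plain NumPy. The nearest-neighbour lookup below is an assumption for illustration (his kernel interpolates with <code>scipy.ndimage<\/code>); the spacing values come from his example:<\/p>

```python
import numpy as np

def resample_isotropic(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    # Compute the output shape so each voxel covers new_spacing millimeters
    spacing = np.array(spacing, dtype=float)
    new_spacing = np.array(new_spacing, dtype=float)
    new_shape = np.round(np.array(volume.shape) * spacing / new_spacing).astype(int)

    # Nearest-neighbour index lookup along each axis; a hypothetical
    # stand-in for the interpolated resampling the kernel performs
    idx = []
    for axis, n in enumerate(new_shape):
        src = (np.arange(n) * volume.shape[axis] / n).astype(int)
        idx.append(np.minimum(src, volume.shape[axis] - 1))
    return volume[np.ix_(idx[0], idx[1], idx[2])]

# 10 slices spaced 2.5mm apart, with 0.5mm x 0.5mm pixels in-plane
scan = np.zeros((10, 64, 64))
resampled = resample_isotropic(scan, spacing=(2.5, 0.5, 0.5))
print(resampled.shape)  # (25, 32, 32): 25mm of slices, 32mm x 32mm in-plane
```

<p>After this step, every voxel covers the same physical volume, so a 3D convnet no longer has to learn slice-thickness invariance.<\/p>\n<p>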
That said, the other two authors&rsquo; lack of medical familiarity did not prevent them from drawing equally fascinating conclusions.<\/p>\n<h1 id=\"conclusions\">Conclusions<\/h1>\n<p>It turns out that there are some strong patterns that guide approaches to different types of data.<\/p>\n<p>For <strong>Structured Data<\/strong> competitions, data analyses tend to look for correlations between the target variable and other columns, and spend significant amounts of time visualizing or ranking those correlations. For smaller datasets there are only so many columns you can examine; analyses in the <a href=\"https:\/\/www.kaggle.com\/c\/titanic\">Titanic competition<\/a> tended to be identical both in which columns to examine and in what order. However, different coders used very different visualization methods, and it seems that there&rsquo;s more creativity in choosing which features to engineer.<\/p>\n<p><strong>Natural Language<\/strong> datasets share similarities across EDAs in how the authors process and manipulate the text, but there&rsquo;s more variability in the features the authors choose to engineer, as well as differing conclusions drawn from those analyses.<\/p>\n<p>Finally, <strong>Image<\/strong> competitions showed the most diversity in terms of analysis and feature engineering. The image competitions I saw were mostly aimed at advanced audiences, and were in fairly domain-specific areas, which may explain the greater diversity.<\/p>\n<p>It makes sense that as datasets become more specialized or esoteric, the amount of introductory analysis and explanation decreases, while the amount of deep or specialized analysis increases, and indeed this is what I saw. While there are clear trends across different types of data, domain knowledge plays an important role. In the lung cancer and leaf competitions, bringing domain knowledge to bear resulted in deeper analyses. 
(Anecdotally, I&rsquo;ve seen this in my own studies; Jeremy Howard, in his <a href=\"https:\/\/fast.ai\">fast.ai<\/a> course, discusses the Rossman dataset, and how the most successful models integrated third party datasets like temperature, store locations, and more, to make more accurate sales predictions.)<\/p>\n<p>There was no consistent process for when authors tackled feature engineering, with some choosing to dive right in as they were beginning their analyses, and others keeping it a discrete step after their initial analyses were complete.<\/p>\n<p>Finally, every notebook I saw was written with a clear audience in mind (beginner or advanced) and this affected the analysis and the writing. More popular competitions, or ones aimed at a more general audience, had EDAs that were exhaustive in their analyses. In these EDAs, I also saw a trend of interweaving supplementary prose or the use of narrative devices alongside the analysis, as tools to help beginners better understand the techniques. By comparison, notebooks aimed at domain experts tended to do away with superfluous framings, and many also skipped over rudimentary data analyses, instead diving straight into domain-specific techniques.<\/p>\n<hr \/>\n<p><em>Special thanks to <a href=\"http:\/\/michellelew.com\/\">Michelle Lew<\/a>, <a href=\"http:\/\/ari.zilnik.com\/\">Ari Zilnik<\/a>, <a href=\"http:\/\/seanmatthe.ws\/\">Sean Matthews<\/a>, and <a href=\"http:\/\/visiongrowth.org\">Bethany Basile<\/a> for reviewing drafts of this article.<\/em><\/p>"},{"title":"Standing on the shoulders of giants","link":"https:\/\/thekevinscott.com\/shoulders-of-giants\/","pubDate":"Tue, 23 Jan 2018 12:00:00 +0000","guid":"https:\/\/thekevinscott.com\/shoulders-of-giants\/","description":"<p>Before I started learning about AI, I thought that training a neural network meant training from scratch. Want to train a network to recognize dogs? Feed it 10,000 dogs and watch it go. 
Want to recognize images of malignant tumors in a CT scan? Collect a bunch of medical images and train away. I didn&rsquo;t realize you could take the models trained on one set of images and apply them to another, entirely unrelated set. Turns out this is exactly what you can do!<\/p>\n<p>I&rsquo;ve been working through the <a href=\"http:\/\/fast.ai\">fast.ai<\/a> courses, and one of the very first assignments begins by leveraging a pre-trained model (specifically <a href=\"https:\/\/www.kaggle.com\/keras\/vgg16\">VGG16<\/a>, trained to identify 1000 categories of images based on the <a href=\"http:\/\/www.image-net.org\">corpus of data from ImageNet<\/a>). You consume the outputs of the VGG16 model, and use that to inform your custom model to predict whether something is a cat or a dog. So you&rsquo;re basically taking the predictions from an existing model, and layering another layer of predictions on top of that.<\/p>\n<p>It makes perfect sense that you&rsquo;d build models like this; why train from scratch if someone has already done the training work beforehand? At its lower levels, <a href=\"https:\/\/youtu.be\/6kwQEBMandw?t=9m25s\">VGG16 is already capable of recognizing patterns like lines, circles, gradients<\/a>, as well as hundreds of other features. These patterns appear in all images, and so retraining from scratch is redundant.<\/p>\n<p>I could see adapting this for speech recognition. Imagine you&rsquo;re training a model to teach native English speakers to speak Spanish. Instead of training your model from scratch, you could leverage a pre-trained model that already recognizes human voices. Maybe you also leverage a model that understands English and Spanish. 
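Mechanically, this layering amounts to function composition: the pretrained models become frozen feature extractors, and only the thin model on top gets trained. A toy, framework-free sketch of that shape (every function and number below is a hypothetical stand-in, not a real network):

```python
# Toy illustration of layering a thin trainable model on top of
# frozen, pretrained models. Every "model" here is a hypothetical
# stand-in function, not an actual neural network.

def pretrained_voice_model(audio):
    # Frozen model #1: turns raw input into a feature vector.
    return [len(audio), audio.count("a")]

def pretrained_language_model(audio):
    # Frozen model #2: a different feature view of the same input.
    return [audio.count("o")]

def thin_top_layer(features, weights):
    # The only trainable part: a simple weighted sum of features.
    return sum(f * w for f, w in zip(features, weights))

def combined_model(audio, weights):
    # Compose: concatenate the frozen models' outputs, then apply
    # the small trainable layer on top.
    features = pretrained_voice_model(audio) + pretrained_language_model(audio)
    return thin_top_layer(features, weights)

score = combined_model("hola amigo", [0.1, 0.5, 0.3])
```

The real thing would use trained networks and learned weights, but the shape of the computation is the same: frozen features feeding a small trainable head.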
Your final model might be a thin layer on top of both of these pre-existing models, that specifically learns how to interpret English accents mangling Spanish words.<\/p>\n<p>I wonder how this will affect design.<\/p>\n<p>There&rsquo;s been some work around <a href=\"https:\/\/blog.floydhub.com\/turning-design-mockups-into-code-with-deep-learning\/\">turning design mockups into code with deep learning<\/a>. Imagine starting with a hand-drawn sketch as your base. You click around the sketch, nudging, encouraging and discouraging the AI: &ldquo;No no, that button should open a new page&rdquo;, or &ldquo;that link should create a new widget&rdquo;. You could use pre-trained models to help you where appropriate. Building a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Create,_read,_update_and_delete\">CRUD app<\/a>? Grab a pre-built model off the shelf that knows how to create, delete, and update. How about Uber for dogs? Grab a pre-built Uber model, and train a new layer on top to recognize dogs instead of humans.<\/p>\n<blockquote>\n<p>This is the cool thing about neural networks: you don\u2019t have to tell them what to find. They decide what they want to find in order to solve your problem. \u2014 <a href=\"https:\/\/www.youtube.com\/watch?v=6kwQEBMandw&amp;feature=youtu.be&amp;t=12m22s\">Jeremy Howard<\/a><\/p>\n<\/blockquote>\n<p>This layered approach to making neural nets smarter - by building on the shoulders of other, pre-trained nets - it blows my mind. It feels like programming from the future. 
In a future language we could use increasingly abstract language to describe the parameters of what the computer should achieve, and then let the computer figure it out, similar to how nowadays we write instructions that are then interpreted into 0&rsquo;s and 1&rsquo;s.<\/p>"},{"title":"Building a Deep Learning \/ Cryptocurrency PC (#4): AI","link":"https:\/\/thekevinscott.com\/deep-learning-cryptocurrency-pc-4-ai\/","pubDate":"Mon, 01 Jan 2018 09:00:00 +0000","guid":"https:\/\/thekevinscott.com\/deep-learning-cryptocurrency-pc-4-ai\/","description":"<p>Among the buzzwords of this past year, two tower above the rest: deep learning and cryptocurrencies. It seems that everyone I know (in tech) wants to learn these things. And guess what\u200a\u2014\u200aso do I! So much so that I&rsquo;m building my own computer in order to facilitate that learning.<\/p>\n<p>What follows are my notes-to-self as I build a computer to learn about deep learning and cryptocurrency mining. In the previous installments we discussed assembling the hardware, installing the OS, and setting up a mining operation. In this installment I&rsquo;ll talk about installing machine learning software and making sure it&rsquo;s working.<\/p>\n<iframe style=\"height:400px;width:100%;max-width:800px;margin:30px auto;\" src=\"https:\/\/upscri.be\/96fcab\/?as_embed\"><\/iframe>\n<hr \/>\n<p>To recap, in case you&rsquo;re just getting started with this series: my goal in purchasing and building my own PC was to have hardware on hand to run machine learning algorithms on, and bring myself up to speed on the exciting advances happening in AI. With the mining operation up and running, it&rsquo;s time for some AI!<\/p>\n<p>The two packages we&rsquo;re going to install are <a href=\"https:\/\/www.tensorflow.org\/\">TensorFlow<\/a> and <a href=\"https:\/\/github.com\/cdw\/pytorch\">PyTorch<\/a>. I&rsquo;ve heard lots of buzz about these two frameworks and there&rsquo;s tons of good resources for each. 
This article will walk through basic installation details on Ubuntu for both, plus instructions on setting up Jupyter notebooks. If you want to just pick one, <a href=\"https:\/\/awni.github.io\/pytorch-tensorflow\/\">Awni Hannun provides a great overview of the differences<\/a>.<\/p>\n<h1 id=\"docker\">Docker<\/h1>\n<p><strong>For both TensorFlow and PyTorch, we&rsquo;re going to install using Docker<\/strong>.<\/p>\n<p>Docker provides a virtualized environment that lets you isolate packages and libraries (or in our case, download pre-configured environments).<\/p>\n<h2 id=\"why-docker\">Why Docker?<\/h2>\n<p>The TensorFlow <a href=\"https:\/\/www.tensorflow.org\/install\/install_linux\">docs offer four options for installing TensorFlow<\/a>. My first instinct was to install directly using native <code>pip<\/code>.<\/p>\n<p>I installed CUDA 9, the latest version, which as of this writing Tensorflow doesn&rsquo;t support, and got lost in dependency hell trying to downgrade \/ uninstall \/ reinstall CUDA. A friend recommended I leave all that nonsense alone and just install Docker.<\/p>\n<p>So that&rsquo;s my recommendation: <strong>install docker and avoid dependency hell<\/strong>.<\/p>\n<h2 id=\"step-by-step-docker-installation-instructions\">Step-by-step Docker installation instructions<\/h2>\n<ol>\n<li>Install <code>docker<\/code>. <a href=\"https:\/\/www.digitalocean.com\/community\/tutorials\/how-to-install-and-use-docker-on-ubuntu-16-04\">Instructions for installing Docker on Ubuntu are here<\/a>.<\/li>\n<li>Make sure to follow the optional Step 2 instructions on adding yourself to the correct group so as to avoid needing <code>sudo<\/code>.<\/li>\n<li>Install <code>nvidia-docker<\/code>. 
By default, <code>docker<\/code> doesn&rsquo;t support leveraging the NVIDIA GPUs effectively, so to take advantage of that hardware <a href=\"https:\/\/github.com\/NVIDIA\/nvidia-docker#xenial-x86_64\">you&rsquo;ll need to install <code>nvidia-docker<\/code><\/a>.<\/li>\n<\/ol>\n<h1 id=\"tensorflow\">TensorFlow<\/h1>\n<p><a href=\"https:\/\/www.tensorflow.org\/\">TensorFlow<\/a> is software for machine learning released by the <a href=\"https:\/\/research.google.com\/teams\/brain\/\">Google Brain<\/a> team in 2015. It provides a set of tools for specifying training instructions, and then translating those instructions into commands that can be run quickly and take advantage of GPUs.<\/p>\n<p>It&rsquo;s a powerful piece of software, and since its release it&rsquo;s picked up a ton of developer mindshare.<\/p>\n<h2 id=\"installation\">Installation<\/h2>\n<p>Assuming you followed the Docker instructions above, you should have <code>nvidia-docker<\/code> available and working. The next step is to <a href=\"https:\/\/www.tensorflow.org\/install\/install_linux#gpu_support\">install the correct Docker image<\/a>.<\/p>\n<p>We want the latest GPU version, so run:<\/p>\n<pre><code>nvidia-docker run -it gcr.io\/tensorflow\/tensorflow:latest-gpu bash\n<\/code><\/pre>\n<p>This command will find the Docker image (remotely or locally), spin it up, and put you at a bash prompt. 
From there, you can <a href=\"https:\/\/www.tensorflow.org\/install\/install_linux#ValidateYourInstallation\">validate your installation<\/a>.<\/p>\n<h2 id=\"validating-your-installation\">Validating your installation<\/h2>\n<p>Start <code>python<\/code>:<\/p>\n<pre><code>python\n<\/code><\/pre>\n<p>Paste this program, <a href=\"https:\/\/www.tensorflow.org\/install\/install_linux#ValidateYourInstallation\">provided by the TensorFlow docs<\/a>:<\/p>\n<pre><code># Python\nimport tensorflow as tf\nhello = tf.constant('Hello, TensorFlow!')\nsess = tf.Session()\nprint(sess.run(hello))\n<\/code><\/pre>\n<p>If you see &ldquo;Hello, TensorFlow!&rdquo; you&rsquo;ll know it&rsquo;s installed correctly.<\/p>\n<h2 id=\"jupyter\">Jupyter<\/h2>\n<p>So we&rsquo;ve got TensorFlow installed and working, but inputting Python commands one line at a time is a terrible way to program. A much better way to get started is to use a Jupyter notebook.<\/p>\n<p><a href=\"http:\/\/jupyter.org\/\">A Jupyter notebook<\/a> is a web interface for documents containing code and the evaluations of that code, displayed side by side on the page.<\/p>\n<img src=\".\/jupyter-sample.png\" \/>\n<p>The TensorFlow Docker image <a href=\"https:\/\/www.tensorflow.org\/install\/install_linux#gpu_support\">supports Jupyter notebooks out of the box<\/a>. You&rsquo;ll use a similar command to spin up the Docker image, with a few tweaks (if the previous Docker image is still running, shut it down with <code>ctrl-c<\/code>):<\/p>\n<pre><code>nvidia-docker run -it -p 8888:8888 gcr.io\/tensorflow\/tensorflow:latest-gpu\n<\/code><\/pre>\n<p>This command tells Docker to expose port <code>8888<\/code> (inside the container) on port <code>8888<\/code> (of your local machine). 
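A quick way to confirm something is actually listening on that port is a stdlib-only Python check; a minimal sketch (the host and port are the ones used above; this is just a convenience, not part of the official setup):

```python
import socket

def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Should report True once the Jupyter container is up and mapped.
    print(is_listening("localhost", 8888))
```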
(Port <code>8888<\/code> is the Jupyter notebook&rsquo;s default port, if you were wondering.)<\/p>\n<p>To make sure that things are working as expected, open up another terminal on your box and run:<\/p>\n<pre><code>curl http:\/\/localhost:8888\n<\/code><\/pre>\n<p>If your machine is like mine, you won&rsquo;t see any output from this <code>curl<\/code> command, but you <em>should<\/em> see a request come in on the terminal running Docker, something like:<\/p>\n<pre><code>[I 02:05:57.912 NotebookApp] 302 GET \/ (xxx.xx.x.x) 0.22ms\n<\/code><\/pre>\n<p>You can access the Docker URL directly on your box. I use a laptop and prefer to access the PC remotely; to do that, you&rsquo;ll need to be on the same wifi network as your Machine Learning PC and get its IP.<\/p>\n<p>On your Machine Learning PC, type:<\/p>\n<pre><code>ifconfig\n<\/code><\/pre>\n<p>Look for an IP address that starts with <code>192.168.x.x<\/code>. On your laptop, you&rsquo;ll take the URL provided by Docker and replace <code>localhost<\/code> with this IP. So, if the Machine Learning box&rsquo;s IP is <code>192.168.1.1<\/code>, you would type in your browser something like:<\/p>\n<pre><code>http:\/\/192.168.1.1:8888\/?token=7f6b36f9d6b15272c76003b8c1cdfcdf306dc52ff310\n<\/code><\/pre>\n<p>At this URL, you should see the Jupyter notebook:<\/p>\n<img src=\".\/jupyter-overview.png\" \/>\n<p>Having a Jupyter notebook handy will make it much easier to run through the TensorFlow tutorials.<\/p>\n<h1 id=\"pytorch\">PyTorch<\/h1>\n<p>The other machine learning tool we&rsquo;re going to install is PyTorch.<\/p>\n<p><a href=\"https:\/\/github.com\/cdw\/pytorch\">PyTorch<\/a> was released in 2016 by <a href=\"https:\/\/twitter.com\/fbOpenSource?ref_src=twsrc%5Etfw&amp;ref_url=http%3A%2F%2Fpytorch.org%2F\">Facebook&rsquo;s team<\/a>. 
It&rsquo;s still fairly new and changing quickly, but it&rsquo;s already picked up a lot of steam in the community.<\/p>\n<h2 id=\"installation-1\">Installation<\/h2>\n<p>Let&rsquo;s install the appropriate pytorch docker image. Run:<\/p>\n<pre><code>nvidia-docker run --rm -ti --ipc=host -p 8888:8888 pytorch\/pytorch:latest\n<\/code><\/pre>\n<p>This puts you right at a bash prompt.<\/p>\n<p>I like learning via example, and <a href=\"http:\/\/pytorch.org\/tutorials\/beginner\/pytorch_with_examples.html\">luckily the Pytorch docs provide plenty of examples to learn from<\/a>. You can run through the tutorials by starting <code>python<\/code> and copy pasting code, but a Jupyter notebook is so much better. So let&rsquo;s get that working.<\/p>\n<h2 id=\"jupyter-1\">Jupyter<\/h2>\n<p>Jupyter doesn&rsquo;t come standard on the <code>pytorch<\/code> docker image, <a href=\"http:\/\/jupyter.org\/install.html\">but it&rsquo;s easy to install<\/a>. In your docker container (that you started above), type:<\/p>\n<pre><code>python3 -m pip install --upgrade pip\npython3 -m pip install jupyter\njupyter notebook\n<\/code><\/pre>\n<p>When I ran <code>jupyter notebook<\/code> for the first time, I got the following error:<\/p>\n<pre><code>OSError: [Errno 99] Cannot assign requested address\n<\/code><\/pre>\n<p>To fix this, I had to provide an explicit IP and allow root (<a href=\"https:\/\/github.com\/ipython\/ipython\/issues\/6193#issuecomment-350613300\">hat tip to this comment<\/a>):<\/p>\n<pre><code>jupyter notebook --allow-root --ip=0.0.0.0\n<\/code><\/pre>\n<h3 id=\"validating-the-jupyter-notebook\">Validating the Jupyter notebook<\/h3>\n<p>From another terminal, run curl:<\/p>\n<pre><code>curl http:\/\/localhost:8888\n<\/code><\/pre>\n<p>And make sure you see logs appear in Docker. 
If that works, try accessing in the browser at <code>http:\/\/192.168.x.xx:8888<\/code>.<\/p>\n<h3 id=\"saving-the-jupyter-notebook\">Saving the Jupyter notebook<\/h3>\n<p><strong>IMPORTANT:<\/strong> If you exit Docker now, you&rsquo;ll lose the installation of Jupyter you just performed. You need to commit those changes (as <a href=\"https:\/\/www.techrepublic.com\/article\/how-to-commit-changes-to-a-docker-image\/\">outlined in this article<\/a>) if you want to avoid installing every time you spin up the container.<\/p>\n<p>First, in another terminal, get the name of the container:<\/p>\n<pre><code>docker ps -l\n<\/code><\/pre>\n<p>This should give you the most recently created container, which should be PyTorch. Then, commit with:<\/p>\n<pre><code>docker commit &lt;CONTAINER_NAME&gt; pytorch\n<\/code><\/pre>\n<p>You can then run the newly committed image with:<\/p>\n<pre><code>nvidia-docker run --rm -ti --ipc=host -p 8888:8888 pytorch\n<\/code><\/pre>\n<p>And start up your notebook:<\/p>\n<pre><code>jupyter notebook --allow-root --ip=0.0.0.0\n<\/code><\/pre>\n<p>If that works, great! Try running through one or two of the tutorials to make sure everything&rsquo;s working.<\/p>\n<img src=\".\/pytorch-output.png\" \/>\n<h1 id=\"next-steps\">Next steps<\/h1>\n<p>At this point, you should have both <a href=\"https:\/\/github.com\/cdw\/pytorch\">PyTorch<\/a> and <a href=\"https:\/\/www.tensorflow.org\/\">TensorFlow<\/a> at your disposal.<\/p>\n<p>If you&rsquo;ve made it this far in the series, congratulations! You built a computer, installed an operating system, began mining cryptocurrencies, and set yourself up to begin training computers to do your bidding. You deserve a pat on the back!<\/p>\n<p>Where to go from here, you may ask? 
First, I&rsquo;d encourage you to subscribe:<\/p>\n<iframe style=\"height:400px;width:100%;max-width:800px;margin:30px auto;\" src=\"https:\/\/upscri.be\/96fcab\/?as_embed\"><\/iframe>\n<p>I&rsquo;m going to continue blogging my adventures learning this stuff, and I&rsquo;d love to share it with you.<\/p>\n<p>If you want an immediate next step, I&rsquo;d recommend <a href=\"https:\/\/www.coursera.org\/learn\/machine-learning\">Andrew Ng&rsquo;s course<\/a>. It&rsquo;s a deep and thorough introduction to the field.<\/p>\n<p>Finally, I&rsquo;ve been collecting various machine learning links and resources to work through once I have a base of knowledge. Some of these may come in handy for you! (I haven&rsquo;t gone through all of these so I can&rsquo;t vouch for them - feel free to recommend others in the comments).<\/p>\n<hr \/>\n<p><a href=\"https:\/\/www.reddit.com\/r\/MachineLearning\/comments\/7nrzhn\/d_results_from_best_of_machine_learning_2017\/?st=JBZ1N05J&amp;sh=aa234160\">https:\/\/www.reddit.com\/r\/MachineLearning\/comments\/7nrzhn\/d_results_from_best_of_machine_learning_2017\/?st=JBZ1N05J&amp;sh=aa234160<\/a><\/p>\n<p><a href=\"http:\/\/www.wildml.com\/2017\/12\/ai-and-deep-learning-in-2017-a-year-in-review\/\">http:\/\/www.wildml.com\/2017\/12\/ai-and-deep-learning-in-2017-a-year-in-review\/<\/a><\/p>\n<p><a href=\"https:\/\/explosion.ai\/blog\/prodigy-annotation-tool-active-learning\">https:\/\/explosion.ai\/blog\/prodigy-annotation-tool-active-learning<\/a><\/p>\n<p><a href=\"http:\/\/blog.kaggle.com\/2017\/09\/11\/how-can-i-find-a-dataset-on-kaggle\/\">http:\/\/blog.kaggle.com\/2017\/09\/11\/how-can-i-find-a-dataset-on-kaggle\/<\/a><\/p>\n<p><a href=\"https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/6zvszj\/another_keras_tutorial_for_neural_network\/?st=J7JRX00A&amp;sh=90a37148\">https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/6zvszj\/another_keras_tutorial_for_neural_network\/?st=J7JRX00A&amp;sh=90a37148<\/a><\/p>\n<p><a 
href=\"https:\/\/www.reddit.com\/r\/MachineLearning\/comments\/70c5zd\/n_google_launches_tensorboard_api_to_enhance\/?st=J7MJ4JWJ&amp;sh=2ece7122\">https:\/\/www.reddit.com\/r\/MachineLearning\/comments\/70c5zd\/n_google_launches_tensorboard_api_to_enhance\/?st=J7MJ4JWJ&amp;sh=2ece7122<\/a><\/p>\n<p><a href=\"https:\/\/medium.com\/@ageitgey\/machine-learning-is-fun-80ea3ec3c471\">https:\/\/medium.com\/@ageitgey\/machine-learning-is-fun-80ea3ec3c471<\/a><\/p>\n<p><a href=\"https:\/\/www.wired.com\/story\/when-websites-design-themselves\/\">https:\/\/www.wired.com\/story\/when-websites-design-themselves\/<\/a><\/p>\n<p><a href=\"https:\/\/chatbotsmagazine.com\/contextual-chat-bots-with-tensorflow-4391749d0077\">https:\/\/chatbotsmagazine.com\/contextual-chat-bots-with-tensorflow-4391749d0077<\/a><\/p>\n<p><a href=\"https:\/\/medium.com\/machine-learning-for-humans\/supervised-learning-740383a2feab\">https:\/\/medium.com\/machine-learning-for-humans\/supervised-learning-740383a2feab<\/a><\/p>\n<p><a href=\"https:\/\/medium.com\/intuitionmachine\/the-brute-force-method-of-deep-learning-innovation-58b497323ae5\">https:\/\/medium.com\/intuitionmachine\/the-brute-force-method-of-deep-learning-innovation-58b497323ae5<\/a><\/p>\n<p><a href=\"https:\/\/hackernoon.com\/how-i-started-with-learning-ai-in-the-last-2-months-251d19b23597?source=userActivityShare-f31f03e60056-1506529741\">https:\/\/hackernoon.com\/how-i-started-with-learning-ai-in-the-last-2-months-251d19b23597?source=userActivityShare-f31f03e60056-1506529741<\/a><\/p>\n<p><a href=\"https:\/\/github.com\/rhdeck\/bostonai\/blob\/master\/README.md\">https:\/\/github.com\/rhdeck\/bostonai\/blob\/master\/README.md<\/a><\/p>\n<p><a href=\"http:\/\/nicodjimenez.github.io\/2017\/10\/08\/tensorflow.html\">http:\/\/nicodjimenez.github.io\/2017\/10\/08\/tensorflow.html<\/a><\/p>\n<p><a 
href=\"https:\/\/www.reddit.com\/r\/hackernews\/comments\/7dlltw\/a_cookbook_for_machine_learning_vol_1\/?st=JA5IEFE8&amp;sh=b5513326\">https:\/\/www.reddit.com\/r\/hackernews\/comments\/7dlltw\/a_cookbook_for_machine_learning_vol_1\/?st=JA5IEFE8&amp;sh=b5513326<\/a><\/p>\n<p><a href=\"https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/7dcog4\/neural_networks_for_beginners_popular_types_and\/?st=JA2YU96R&amp;sh=fc6787ce\">https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/7dcog4\/neural_networks_for_beginners_popular_types_and\/?st=JA2YU96R&amp;sh=fc6787ce<\/a><\/p>\n<p><a href=\"https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/7centc\/learning_machine_learning_01_machine_learning\/?st=J9WVF138&amp;sh=9b166f71\">https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/7centc\/learning_machine_learning_01_machine_learning\/?st=J9WVF138&amp;sh=9b166f71<\/a><\/p>\n<p><a href=\"https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/7c8ogk\/simple_deep_learning_model_for_stock_price\/?st=J9W5R9AZ&amp;sh=22315e0b\">https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/7c8ogk\/simple_deep_learning_model_for_stock_price\/?st=J9W5R9AZ&amp;sh=22315e0b<\/a><\/p>\n<p><a href=\"http:\/\/blog.kaggle.com\/2017\/11\/27\/introduction-to-neural-networks\/\">http:\/\/blog.kaggle.com\/2017\/11\/27\/introduction-to-neural-networks\/<\/a><\/p>\n<p><a href=\"https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/7g5zx9\/predicting_cryptocurrency_prices_with_deep\/?st=JAK71BXS&amp;sh=a283370f\">https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/7g5zx9\/predicting_cryptocurrency_prices_with_deep\/?st=JAK71BXS&amp;sh=a283370f<\/a><\/p>\n<p><a href=\"https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/7he36r\/what_is_nlp_get_started\/?st=JARMLMP7&amp;sh=2886d0a9\">https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/7he36r\/what_is_nlp_get_started\/?st=JARMLMP7&amp;sh=2886d0a9<\/a><\/p>\n<p><a 
href=\"https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/7h7grz\/essential_guide_to_keep_up_with_aimlcv\/?st=JARCN2DT&amp;sh=fe113479\">https:\/\/www.reddit.com\/r\/learnmachinelearning\/comments\/7h7grz\/essential_guide_to_keep_up_with_aimlcv\/?st=JARCN2DT&amp;sh=fe113479<\/a><\/p>\n<p><a href=\"http:\/\/blog.kaggle.com\/2017\/12\/06\/introduction-to-neural-networks-2\/\">http:\/\/blog.kaggle.com\/2017\/12\/06\/introduction-to-neural-networks-2\/<\/a><\/p>\n<p><a href=\"https:\/\/docs.google.com\/presentation\/d\/1kSuQyW5DTnkVaZEjGYCkfOxvzCqGEFzWBy4e9Uedd9k\/preview?imm_mid=0f9b7e&amp;cmp=em-data-na-na-newsltr_20171213&amp;slide=id.g183f28bdc3_0_90\">https:\/\/docs.google.com\/presentation\/d\/1kSuQyW5DTnkVaZEjGYCkfOxvzCqGEFzWBy4e9Uedd9k\/preview?imm_mid=0f9b7e&amp;cmp=em-data-na-na-newsltr_20171213&amp;slide=id.g183f28bdc3_0_90<\/a><\/p>"},{"title":"Building a Deep Learning \/ Cryptocurrency PC (#3): Mining","link":"https:\/\/thekevinscott.com\/deep-learning-cryptocurrency-pc-3-mining\/","pubDate":"Sun, 10 Dec 2017 09:00:00 +0000","guid":"https:\/\/thekevinscott.com\/deep-learning-cryptocurrency-pc-3-mining\/","description":"<p>Among the buzzwords in the tech world of 2017, two tower above the rest: <strong>deep\nlearning<\/strong> and <strong>cryptocurrencies<\/strong>. It seems that everyone I know (in tech)\nwants to learn these things. And guess what \u2014 so do I! So much so that I&rsquo;m\nbuilding my own computer in order to facilitate that learning.<\/p>\n<p>What follows are my notes-to-self as I build a computer to learn about deep\nlearning and cryptocurrency mining. 
In the previous installments we discussed\n<a href=\"https:\/\/medium.com\/@thekevinscott\/noobs-guide-to-custom-computer-for-cryptocurrency-and-deep-learning-7caa255adfaf\">assembling the\nhardware<\/a>\nand <a href=\"https:\/\/medium.com\/@thekevinscott\/noobs-guide-to-building-a-deep-learning-cryptocurrency-pc-2-the-os-39dd20bd9b21\">installing the\nOS<\/a>.\nIn this installment I&rsquo;ll talk about how to set up a cryptocurrency miner and\nconnect to a pool.<\/p>\n<iframe style=\"height:400px;width:100%;max-width:800px;margin:30px auto;\" src=\"https:\/\/upscri.be\/96fcab\/?as_embed\"><\/iframe>\n<hr \/>\n<p>To recap, in case you&rsquo;re just getting started with this series: my goal in\npurchasing and building my own PC was to have hardware on hand to run machine\nlearning algorithms on, and bring myself up to speed on the exciting advances\nhappening in AI. But in between running training algorithms, my computer (and\nits hungry hungry GPUs) will sit fallow. We can&rsquo;t have that!<\/p>\n<p>When I&rsquo;m not running the computer over training data, I want to have it mining.\nEven if it&rsquo;s making only a little profit, it&rsquo;s still better than nothing.<\/p>\n<h1 id=\"cryptocurrencies\">Cryptocurrencies<\/h1>\n<p>Your first instinct when getting into mining might be to try and mine bitcoin.\nThis would almost certainly be a mistake.<\/p>\n<p>Bitcoin today is mined primarily via <a href=\"https:\/\/www.buybitcoinworldwide.com\/mining\/hardware\/\">ASIC\nhardware<\/a>, equipment\nspecialized for mining bitcoins and other cryptocurrencies (but mostly\nbitcoins). Unless your goal in life is to mine bitcoins \u2014 and I suppose there&rsquo;s\nnothing wrong with that \u2014 ASIC hardware is not a good investment. 
And without\nASIC hardware, it&rsquo;s hard to compete with the other miners.<\/p>\n<p>There are cryptocurrencies specifically <a href=\"https:\/\/www.coindesk.com\/scrypt-miners-cryptocurrency-arms-race\/\">designed to prevent mining via\nspecialized hardware, called scrypt\ncoins<\/a>:<\/p>\n<blockquote>\n<p>One of the biggest differences between scrypt and SHA-256 is that the former\nrelies heavily on computing resources aside from the processing unit itself,\nparticularly memory. Conversely, SHA-256 doesn&rsquo;t. This makes it difficult for\nscrypt-based systems to scale up and use lots of computing power, because they\nwould have to use a proportional amount of memory, and that is expensive. \u2014\n<a href=\"https:\/\/www.coindesk.com\/scrypt-miners-cryptocurrency-arms-race\/\">Danny Bradbury,\nCoindesk<\/a><\/p>\n<\/blockquote>\n<p>While more specialized hardware is beginning to come to market, you&rsquo;re probably\nsafe picking one of these memory-hard currencies to mine. Ethereum (whose Ethash\nalgorithm is memory-hard in the same spirit as scrypt) is a good choice, so\nthat&rsquo;s what I&rsquo;m going to start with.<\/p>\n<h1 id=\"the-tools\">The tools<\/h1>\n<p>To get started mining, you need\n<a href=\"https:\/\/github.com\/ethereum\/go-ethereum\">ethereum<\/a>, a\n<a href=\"https:\/\/github.com\/ethereum-mining\/ethminer\">miner<\/a>, and a <strong>wallet<\/strong> to send\nthe mined coins to.<\/p>\n<p>First, enable the ethereum repository:<\/p>\n<pre><code>sudo add-apt-repository -y ppa:ethereum\/ethereum\nsudo apt-get update\n<\/code><\/pre>\n<p>Then, install ethereum:<\/p>\n<pre><code>sudo apt-get install ethereum\n<\/code><\/pre>\n<p>Next, you need to install the miner, <code>ethminer<\/code>. (There are other miners as\nwell, like qtMiner, cudaminer, eth-proxy). 
You can either install via <code>apt-get<\/code>\nor from source; because I&rsquo;m a masochist I chose the latter.<\/p>\n<p>Head on over to the <a href=\"https:\/\/github.com\/ethereum-mining\/ethminer\/releases\">releases\npage<\/a> and download the\nmost recent release:<\/p>\n<pre><code>wget\ntar xvzf ethminer-0.12.0-Linux.tar.gz\n<\/code><\/pre>\n<p>And then try running <code>ethminer<\/code>. You should see something like:<\/p>\n<img src=\".\/ethminer.png\" \/>\n<p>This is good! It&rsquo;s working; it just needs to be configured with options.<\/p>\n<p>Finally, the <strong>wallet<\/strong>. There are lots of wallets available, with Mist being the\nofficially supported version. You don&rsquo;t actually need your wallet to be local;\nit can be hosted anywhere, including on an exchange like Coinbase.<\/p>\n<h1 id=\"pools\">Pools<\/h1>\n<p>Mining cryptocurrencies is kind of like a bunch of people in a field of\nhaystacks looking for needles. Every so often somebody gets lucky and finds one,\nshouts it out to the world, and makes a chunk of money, and then the process\nrepeats.<\/p>\n<p>This is all well and good for that lucky finder, but \u2014 especially nowadays, when\nyou&rsquo;re <a href=\"https:\/\/qz.com\/1055126\/photos-china-has-one-of-worlds-largest-bitcoin-mines\/\">competing against industrial-strength mining\noperations<\/a>\n\u2014 the chances of you solving a particular cryptographic puzzle first are slim to\nnone. If you&rsquo;re a small fry like me, you&rsquo;re better off joining a mining pool.<\/p>\n<p>A mining pool allows a group of miners&rsquo; computers to join forces and work on\nearning cryptocurrency as a team. 
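Fees and share-accounting details aside, the payout rule pools use is plain pro-rata arithmetic. A toy sketch with made-up hashrates and a made-up reward (real pools track submitted shares over time, not raw hashrate):

```python
# Toy proportional payout: each miner's cut of a block reward is
# their fraction of the pool's total contributed hashrate.
# The hashrates (MH/s) and the 3 ETH reward are made-up numbers.

def split_reward(reward, hashrates):
    total = sum(hashrates.values())
    return {miner: reward * rate / total for miner, rate in hashrates.items()}

pool = {"me": 25.0, "big_rig": 50.0, "other": 25.0}
payouts = split_reward(3.0, pool)  # "big_rig" gets half, the others a quarter each
```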
When a new coin is mined, the profits will be\nshared with the contributors based on the amount of computing power they put in.<\/p>\n<p><a href=\"https:\/\/www.buybitcoinworldwide.com\/ethereum\/mining-pools\/\">https:\/\/www.buybitcoinworldwide.com\/ethereum\/mining-pools\/<\/a><\/p>\n<h2 id=\"ethermine\">ethermine<\/h2>\n<p>The first pool I joined was <a href=\"http:\/\/ethpool.org\/\">ethpool<\/a>, and I subsequently\nswitched to <a href=\"https:\/\/ethermine.org\/\">ethermine<\/a> (which appears to be running the\nsame code? it&rsquo;s unclear) as their payout scheme was more predictable.<\/p>\n<p>To start up mining, I ran:<\/p>\n<pre><code>.\/ethminer --farm-recheck 200 --cuda-parallel-hash 8 --cuda-grid-size 1024 --cuda-streams 16 --cuda-block-size 128 -G -S us1.ethermine.org:4444 -FS eu1.ethermine.org:4444 -O &lt;My_Ethereum_Address&gt;.&lt;My_RigName&gt;\n<\/code><\/pre>\n<p>Here are the definitions for each option:<\/p>\n<ul>\n<li><code>farm-recheck<\/code> \u2014 <em>&lt;n&gt; Leave n ms between checks for changed work (default: 500).\nWhen using stratum, use a high value (i.e. 2000) to get more stable hashrate\noutput<\/em><\/li>\n<li><code>cuda-parallel-hash<\/code> \u2014 <em>Define how many hashes to calculate in a kernel, can\nbe scaled to achieve better performance. Default=4<\/em><\/li>\n<li><code>cuda-block-size<\/code> \u2014 <em>Set the CUDA block work size. Default is 128<\/em><\/li>\n<li><code>cuda-grid-size<\/code> \u2014 <em>Set the CUDA grid size. Default is 8192<\/em><\/li>\n<li><code>cuda-streams<\/code> \u2014 <em>Set the number of CUDA streams. Default is 2<\/em><\/li>\n<\/ul>\n<p>My understanding of <code>farm-recheck<\/code> is that it&rsquo;s an <a href=\"https:\/\/forum.ethereum.org\/discussion\/5379\/mining-parameters\">option to set how often the\nprogram checks<\/a>\nfor new work. 
The lower you set it, the lower the chances of\nworking on stale blocks, but set it too low and you might see instability in the\nhashing output.<\/p>\n<p><em>The other options I&rsquo;m not 100% familiar with, and so I just went with the\ndefaults. Feel free to leave a comment if you have a plain English explanation\nof them.<\/em><\/p>\n<p>Finally, you&rsquo;ll pick the servers to connect to \u2014 I chose <code>us1<\/code> and <code>eu1<\/code> \u2014 and\nthen put in your wallet address and a name to identify your mining computer.\nYou may need to open up port <code>4444<\/code> on your machine to allow connections.<\/p>\n<p><em>If you&rsquo;re having trouble with ethminer, there&rsquo;s an active <a href=\"https:\/\/gitter.im\/ethereum-mining\/ethminer\">Gitter\nroom<\/a> as well.<\/em><\/p>\n<p>Once you start mining, you can go to your miner page and check out your stats.\nFor instance, the <a href=\"https:\/\/ethermine.org\/miners\/45d961bddef7bd389d76b22907eb4856a4383aa6\">dashboard view\nshows<\/a>:<\/p>\n<img src=\".\/miner-dashboard.png\" \/>\n<p>Which gives you a nice overview of your stats, and also a rough estimate of your payouts:<\/p>\n<img src=\".\/payouts.png\" \/>\n<h2 id=\"tuning\">Tuning<\/h2>\n<p>Once you&rsquo;ve got your rig mining, you may want to squeeze out every ounce of\nprocessing power. 
If so, here&rsquo;s a few links to point you in the right direction!<\/p>\n<h3 id=\"the-awesome-wiki-article-at-rethermining\">The awesome wiki article at <code>\/r\/EtherMining<\/code>:<\/h3>\n<p><a href=\"https:\/\/www.reddit.com\/r\/EtherMining\/wiki\/index\">https:\/\/www.reddit.com\/r\/EtherMining\/wiki\/index<\/a><\/p>\n<h3 id=\"a-conversation-on-overclocking\">A conversation on overclocking:<\/h3>\n<p><a href=\"https:\/\/bitcointalk.org\/index.php?topic=1712831.0\">https:\/\/bitcointalk.org\/index.php?topic=1712831.0<\/a><\/p>\n<h3 id=\"a-few-articles-about-tweaking-miners\">A few articles about tweaking miners:<\/h3>\n<p><a href=\"https:\/\/robekworld.com\/hash-rate-improvement-with-nvidia-asus-1060-1070-gpu-for-ether-like-mining-gpu-tweak-ii-e3cde220812f\">https:\/\/robekworld.com\/hash-rate-improvement-with-nvidia-asus-1060-1070-gpu-for-ether-like-mining-gpu-tweak-ii-e3cde220812f<\/a><\/p>\n<p><a href=\"http:\/\/www.legitreviews.com\/geforce-gtx-1070-ethereum-mining-small-tweaks-great-hashrate-low-power_195451\">http:\/\/www.legitreviews.com\/geforce-gtx-1070-ethereum-mining-small-tweaks-great-hashrate-low-power_195451<\/a><\/p>\n<p><a href=\"http:\/\/cryptomining-blog.com\/7341-how-to-squeeze-some-extra-performance-mining-ethereum-on-nvidia\/\">http:\/\/cryptomining-blog.com\/7341-how-to-squeeze-some-extra-performance-mining-ethereum-on-nvidia\/<\/a><\/p>\n<h3 id=\"how-about-writing-your-own-miner\">How about writing your own miner?<\/h3>\n<p><a 
href=\"https:\/\/www.reddit.com\/r\/ethereum\/comments\/7caqpb\/a_tiny_miner_i_wrote_to_understand_how_mining\/?sh=f25cfa84&amp;st=J9W5B865&amp;utm_content=title&amp;utm_medium=post_embed&amp;utm_name=ef770faa323446d0909650522f22e37a&amp;utm_source=embedly&amp;utm_term=7caqpb\">https:\/\/www.reddit.com\/r\/ethereum\/comments\/7caqpb\/a_tiny_miner_i_wrote_to_understand_how_mining\/?sh=f25cfa84&amp;st=J9W5B865&amp;utm_content=title&amp;utm_medium=post_embed&amp;utm_name=ef770faa323446d0909650522f22e37a&amp;utm_source=embedly&amp;utm_term=7caqpb<\/a><\/p>\n<hr \/>\n<p>And that&rsquo;s what you need to get a mining rig set up! Piece of cake, right? The\ngood news is once it&rsquo;s set up you can just sort of let it run without touching\nit. Just make sure nobody trips over the power cable.<\/p>\n<p>The final installment of this series will be about getting some basic AI\nlearning algorithms running on the hardware. If you want to hear about those,\ndrop your email below and I&rsquo;ll let you know when I publish it!<\/p>\n<iframe style=\"height:400px;width:100%;max-width:800px;margin:30px auto;\" src=\"https:\/\/upscri.be\/96fcab\/?as_embed\"><\/iframe>"},{"title":"Building a Deep Learning \/ Cryptocurrency PC (#2): The OS","link":"https:\/\/thekevinscott.com\/deep-learning-cryptocurrency-pc-2-os\/","pubDate":"Fri, 06 Oct 2017 09:00:00 +0000","guid":"https:\/\/thekevinscott.com\/deep-learning-cryptocurrency-pc-2-os\/","description":"<p>Among the buzzwords in the tech world of 2017, two tower above the rest: <strong>deep\nlearning<\/strong> and <strong>cryptocurrencies<\/strong>. It seems that everyone I know (in tech)\nwants to learn these things. And guess what \u2014 so do I! So much so that I&rsquo;m\nbuilding my own computer in order to facilitate that learning.<\/p>\n<p>What follows are my notes-to-self as I build a computer to learn about deep\nlearning and cryptocurrency mining. 
<a href=\"https:\/\/medium.com\/@thekevinscott\/noobs-guide-to-custom-computer-for-cryptocurrency-and-deep-learning-7caa255adfaf\">In the previous installment we discussed\nassembling the hardware.\n<\/a>In\nthis installment I&rsquo;ll talk about choosing and installing the OS, configuring it\nand making sure the hardware is working, and some basic OS security.<\/p>\n<iframe style=\"height:400px;width:100%;max-width:800px;margin:30px auto;\" src=\"https:\/\/upscri.be\/96fcab\/?as_embed\"><\/iframe>\n<hr \/>\n<h1 id=\"the-operating-system\">The operating system<\/h1>\n<p>Last time we left off immediately after assembling our computer. Our computer\ncame without an operating system, so the next step is to choose and install one.<\/p>\n<p>I chose Ubuntu Desktop. Lots of the tutorials I see for deep learning are\nwritten for Ubuntu, and I&rsquo;m comfortable with Linux. Windows also seems to have\nplenty of support but I&rsquo;m not nearly as familiar with the MS ecosystem.<\/p>\n<p><a href=\"https:\/\/www.quora.com\/What-are-the-most-popular-linux-distros-for-deep-learning-research\">I don&rsquo;t think\nit<\/a>\n<a href=\"https:\/\/bitcointalk.org\/index.php?topic=166986.0\">matters a ton which OS you\nchoose<\/a>; <a href=\"https:\/\/forums.servethehome.com\/index.php?threads\/best-os-and-software-for-machine-learning-and-deep-learning.15890\/\">most of what you&rsquo;ll\nbe using it for is\nplatform-agnostic<\/a>.<\/p>\n<p>The big thing to keep an eye out for is CUDA support. <a href=\"http:\/\/nvidia.custhelp.com\/app\/answers\/detail\/a_id\/2135\/~\/which-operating-systems-are-supported-by-cuda?\">Luckily, most of the\noperating systems (Windows, Linux and\nMac)<\/a>\nare supported. 
So don&rsquo;t go choosing some crazy new-fangled OS nobody&rsquo;s heard of!<\/p>\n<p>So, the conclusion: pick a Linux distro (Ubuntu, unless you&rsquo;re already\ncomfortable with another one), or pick Windows and go find\nyourself <a href=\"https:\/\/www.lifewire.com\/cryptocoin-mining-for-beginners-2483064\">another\ntutorial<\/a>.<\/p>\n<h2 id=\"installing\">Installing<\/h2>\n<p>Once you&rsquo;ve chosen your OS, the first step is to burn an image of the distro to\na USB stick. <a href=\"https:\/\/tutorials.ubuntu.com\/tutorial\/tutorial-create-a-usb-stick-on-windows#0\">Ubuntu provides a very straightforward\ntutorial<\/a>.\nOnce you&rsquo;ve got that, you&rsquo;ll restart your computer and reboot from USB.\nInstructions on doing this differ depending on the motherboard so read your\nmanual; for me, I had to restart holding down F10.<\/p>\n<p>You&rsquo;ll then run through the OS installation, of which plenty of guides exist\non the internet.<\/p>\n<p>Now at some point, this being a computer you assembled from scratch, it&rsquo;s likely\nyou&rsquo;ll run into problems. For me, I had a hell of a time getting my display to\ncome on and stay on. I&rsquo;m still not 100% sure why. Assuming your hardware differs\nfrom mine, your problems are likely to be unique to you. Don&rsquo;t panic. Google is\nyour friend, and it&rsquo;s likely that any problem you run into, others have run into\ntoo.<\/p>\n<p>The problems I struggled with:<\/p>\n<ul>\n<li>The display, as mentioned. 
Setting <code>acpi=off<\/code> in the\nGRUB bootloader, <a href=\"https:\/\/askubuntu.com\/questions\/861743\/installation-of-ubuntu-16-04-from-a-usb-drive-freezes\">as described\nhere<\/a>,\nhelped me (<a href=\"https:\/\/www.howtogeek.com\/196655\/how-to-configure-the-grub2-boot-loaders-settings\/\">here&rsquo;s how to persist settings in the GRUB\nbootloader<\/a>).\nThe computer may struggle to figure out which graphics card (the built-in one,\nor one of the GPUs) to use for the monitor as well. <a href=\"https:\/\/ubuntuforums.org\/showthread.php?t=1613132\">Here&rsquo;s some more info on <code>acpi<\/code><\/a>.<\/li>\n<li>My ethernet didn&rsquo;t work out of the box. <a href=\"http:\/\/www.ubuntugeek.com\/ubuntu-networking-configuration-using-command-line.html\">This guide does a good\nrundown<\/a>\nof Ubuntu internet issues, and <a href=\"https:\/\/ubuntuforums.org\/showthread.php?t=25557\">this guide by\ndataw0lf<\/a> was a godsend.<\/li>\n<li><a href=\"http:\/\/www.linuxandubuntu.com\/home\/how-to-install-latest-nvidia-drivers-in-linux\">How to install NVIDIA\ndrivers<\/a>\nonce you have internet.<\/li>\n<li>Wait, did I install a 32-bit or 64-bit distro? <a href=\"https:\/\/unix.stackexchange.com\/questions\/12453\/how-to-determine-linux-kernel-architecture\">Here&rsquo;s how to\ntell.<\/a><\/li>\n<\/ul>\n<p>Just keep at it, and don&rsquo;t hesitate to ask the community if you&rsquo;re wrestling\nwith a particularly thorny issue. They&rsquo;re plenty friendly, so long as you&rsquo;ve\ndone your homework beforehand!<\/p>\n<h1 id=\"system-diagnosis\">System Diagnosis<\/h1>\n<p>Now we&rsquo;re booting into Linux and we&rsquo;ve got the basics installed. As my next\nstep, I want to check that the hardware is all running correctly and there&rsquo;s\nnothing out of the ordinary. To check that, I learned about a few good tools on\nLinux. 
You may have to install a few of these; if something is missing, try\ninstalling it with:<\/p>\n<p><code>sudo apt-get install xxxxx<\/code><\/p>\n<h2 id=\"system-wide-commands\">System-wide commands<\/h2>\n<p><code>lshw<\/code> is used to <strong>list hardware<\/strong>. This will dump out a ton of information about\nyour system.<\/p>\n<p><code>inxi<\/code> is another good one. Here&rsquo;s my <code>inxi -Fx<\/code> output:<\/p>\n<p><span class=\"figcaption_hack\">inxi -Fx<\/span><\/p>\n<p>Let&rsquo;s step through component by component and run a sanity check to make sure\neverything is working.<\/p>\n<h2 id=\"cpu\">CPU<\/h2>\n<p>There are a few commands you can use to get more information on your CPU:<\/p>\n<p><code>cat \/proc\/cpuinfo<\/code> prints out information on your CPU. <a href=\"https:\/\/unix.stackexchange.com\/questions\/146051\/number-of-processors-in-proc-cpuinfo\">Here&rsquo;s a good Stack\nExchange\nthread<\/a>\ngoing into more detail on what this output means.<\/p>\n<p><span class=\"figcaption_hack\">cat \/proc\/cpuinfo<\/span><\/p>\n<p><code>lshw<\/code> will list out all hardware, and the CPU section has some relevant\ninformation:<\/p>\n<p><span class=\"figcaption_hack\">lshw<\/span><\/p>\n<p><code>lscpu<\/code> will give you some more CPU-specific information, including MHz and\ncache information:<\/p>\n<p><span class=\"figcaption_hack\">lscpu<\/span><\/p>\n<p>These commands are great if you&rsquo;re familiar with the ins and outs of hardware,\nbut I barely have a clue what I&rsquo;m looking at. 
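<\/p>\n<p>If you just want the headline numbers instead of pages of output, a couple of one-liners will pull them out (a sketch; the exact <code>\/proc\/cpuinfo<\/code> field names can vary by CPU and distro):<\/p>\n<pre><code># Quick CPU sanity check: core count, model, and clock speed\nnproc\ngrep -m1 'model name' \/proc\/cpuinfo\nlscpu | grep MHz\n<\/code><\/pre>\n<p>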
I just want to know if the CPU is\nworking or not.<\/p>\n<p>For that, <code>top<\/code> and <code>htop<\/code> will do the trick.<\/p>\n<p><code>top<\/code>:<\/p>\n<p><img src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*zlUXYjZbjuJFbzFwRIxDIQ.png\" alt=\"\">\n<span class=\"figcaption_hack\">Output from <code>top<\/code><\/span><\/p>\n<p>And to get more graphical, use <code>htop<\/code>:<\/p>\n<p><span class=\"figcaption_hack\">Output from htop<\/span><\/p>\n<p><a href=\"http:\/\/www.deonsworld.co.za\/2012\/12\/20\/understanding-and-using-htop-monitor-system-resources\/\">This\narticle<\/a>\ndoes a pretty deep dive into <code>htop<\/code>. To ensure I&rsquo;m getting 100% throughput on\neach of the cores, this article at\n<a href=\"https:\/\/peteris.rocks\/blog\/htop\/#load-average\">peteris.rocks<\/a> does a good job\nexplaining how:<\/p>\n<blockquote>\n<p>If you run <code>cat \/dev\/urandom &gt; \/dev\/null<\/code> which repeatedly generates random\nbytes and writes them to a special file that is never read from, you will see\nthat there are now two running process.<\/p>\n<\/blockquote>\n<p>Running this gets me to 100.0% on all cores. I&rsquo;m not 100% sure this is a valid\ntest of the CPU but it seems good enough for me.<\/p>\n<h2 id=\"hard-drive\">Hard Drive<\/h2>\n<p>Running <code>lsblk<\/code> prints information about hard drive partitions and other storage\ndevices. Running that gives me:<\/p>\n<p><img src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*7TFaMOHq7OohkB0X-ZW7WQ.png\" alt=\"\">\n<span class=\"figcaption_hack\">lsblk<\/span><\/p>\n<p>Which is roughly the size I expect. 
<code>df<\/code> is another one that gives you\ninformation about hard drive space:<\/p>\n<p><span class=\"figcaption_hack\">df<\/span><\/p>\n<h2 id=\"ram\">RAM<\/h2>\n<p><code>free -m<\/code> will check the amount of RAM:<\/p>\n<p><img src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*zlk3wpJnoECxLDDh01e_-A.png\" alt=\"\">\n<span class=\"figcaption_hack\">free -m<\/span><\/p>\n<p>That doesn&rsquo;t look so good \u2014 I expected 16 GB! Let&rsquo;s check <code>lshw<\/code>:<\/p>\n<p><span class=\"figcaption_hack\">lshw<\/span><\/p>\n<p>This tells me I installed one of the RAM chips wrong. Time to break out the\nscrewdriver again.<\/p>\n<h2 id=\"gpus\">GPUs<\/h2>\n<p><code>lspci<\/code> will give you information on PCI buses and devices in your system,\nincluding your GPUs:<\/p>\n<p><span class=\"figcaption_hack\">lspci<\/span><\/p>\n<p>Another command you can use is <code>nvidia-smi<\/code>:<\/p>\n<p><span class=\"figcaption_hack\">nvidia-smi<\/span><\/p>\n<p>This gives you information like fan usage, temperature, memory usage, and more\ngoodies.<\/p>\n<h2 id=\"cooling\">Cooling<\/h2>\n<p>Check CPU temperature with <code>sensors<\/code>:<\/p>\n<p><span class=\"figcaption_hack\">sensors<\/span><\/p>\n<p>Between this and <code>nvidia-smi<\/code> above you should be able to get a good idea of\nwhether your cooling setup is adequate or not.<\/p>\n<h2 id=\"power\">Power<\/h2>\n<p>Is the system getting enough power? Short of <a href=\"https:\/\/askubuntu.com\/questions\/421955\/software-to-find-desktop-power-usage\">measuring the power between the\ncomputer and the\nelectrical outlet<\/a>,\nI don&rsquo;t think there&rsquo;s a 100% software solution to find this out.<\/p>\n<p><code>powertop<\/code> is a utility that measures consumption. 
It seems like it&rsquo;s mostly\nuseful for laptops, but <a href=\"http:\/\/fsckin.com\/2007\/10\/21\/intel-powertop-not-just-for-laptops\/\">there&rsquo;s some good info for desktops as Wayne points out\nhere<\/a>.<\/p>\n<p>Ultimately, making sure your power needs are met is something you should do\nbefore assembling your system, either by checking\n<a href=\"https:\/\/pcpartpicker.com\/\">pcpartpicker.com<\/a> when choosing your parts or\n<a href=\"http:\/\/www.buildcomputers.net\/power-consumption-of-pc-components.html\">checking something like this\nlink<\/a>\n(thanks Sean!). And if you can afford it, leave yourself plenty of breathing room\nin the watts department.<\/p>\n<h1 id=\"security\">Security<\/h1>\n<p>Having satisfied my worrywart side that everything is running smoothly, the next\nstep is to make sure we&rsquo;ve configured our system following standard Linux\nsecurity procedures.<\/p>\n<p>I&rsquo;m going to be using this machine to practice deep learning (not too worried\nabout someone hacking this) and for mining cryptocurrencies (more concerned\nabout something hacking this!)<\/p>\n<p>A good place to start is the <a href=\"https:\/\/wiki.ubuntu.com\/BasicSecurity\">Basic Security guide hosted on the Ubuntu\nWiki<\/a>.<\/p>\n<p>I assume you&rsquo;re familiar with the general Linux security procedures \u2014 strong\npasswords, limited root access, that sort of thing. If not, spend some time\n<a href=\"https:\/\/ubuntuforums.org\/showthread.php?t=510812\">catching up on that<\/a>.<\/p>\n<h4 id=\"automatic-security-updates\">Automatic Security Updates<\/h4>\n<p>I&rsquo;m lazy as sin and if there&rsquo;s one thing I hate it&rsquo;s manual security updates.\nLuckily Ubuntu provides an automatic update mechanism.<\/p>\n<p>I followed the instructions for the \u201cunattended-upgrades\u201d package and it was a straightforward\nprocess.<\/p>\n<p>This will likely cause issues if I&rsquo;m running deep learning simulations\novernight, but I can tackle that issue in the future. 
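<\/p>\n<p>For reference, the whole setup boils down to a couple of commands (package and file names as of Ubuntu 16.04; check the Ubuntu wiki for your release):<\/p>\n<pre><code>sudo apt-get install unattended-upgrades\nsudo dpkg-reconfigure --priority=low unattended-upgrades\n# Confirm it stuck:\ncat \/etc\/apt\/apt.conf.d\/20auto-upgrades\n<\/code><\/pre>\n<p>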
For now I&rsquo;m erring on the\nside of caution.<\/p>\n<h2 id=\"firewall\">Firewall<\/h2>\n<p>A firewall will allow you control over which ports are accessible publicly.\nGenerally you want the most restrictive policy you can stomach.<\/p>\n<p>Here&rsquo;s the <a href=\"https:\/\/wiki.ubuntu.com\/BasicSecurity\/Firewall\">Ubuntu wiki for\nfirewalls<\/a> (<em>if you&rsquo;re noticing a trend, it&rsquo;s that the Ubuntu wiki has some pretty darn good information!<\/em>):<\/p>\n<p>I&rsquo;m going to use <code>ufw<\/code> because I&rsquo;m on the terminal. I went with the\nrecommendations the guide laid out \u2014 DHCP, web and mail access \u2014 but left out the torrent ports since I don&rsquo;t need them.<\/p>\n<p><a href=\"https:\/\/ubuntuforums.org\/showthread.php?t=510812\">This is an absolutely massive security\nguide<\/a> by bodhi.zazen. It&rsquo;s well worth a read and implementation.<\/p>\n<hr \/>\n<p>If you want to hear about how I tackled the crypto mining and deep learning setups, drop your email below and I&rsquo;ll let you know when I publish those\nsections.<\/p>\n<iframe style=\"height:400px;width:100%;max-width:800px;margin:30px auto;\" src=\"https:\/\/upscri.be\/96fcab\/?as_embed\"><\/iframe>\n<hr \/>"},{"title":"Building a Deep Learning \/ Cryptocurrency PC (#1): The Hardware","link":"https:\/\/thekevinscott.com\/deep-learning-cryptocurrency-pc-1-hardware\/","pubDate":"Mon, 25 Sep 2017 09:00:00 +0000","guid":"https:\/\/thekevinscott.com\/deep-learning-cryptocurrency-pc-1-hardware\/","description":"<p><span class=\"dropcap\">A<\/span>mong the buzzwords in the tech world of 2017, two tower above the rest: <strong>deep\nlearning<\/strong> and <strong>cryptocurrencies<\/strong>. It seems that everyone wants to learn more\nabout these things. And guess what \u2014 so do I! 
So much so that I&rsquo;m building my\nown computer in order to facilitate that learning.<\/p>\n<p>What follows are my notes-to-self as I build a computer to learn about deep\nlearning and cryptocurrency mining. In this installment we&rsquo;ll just discuss\nbuilding the hardware. If you&rsquo;d like to hear about configuring the OS, getting\nstarted with crypto mining, or getting started with deep learning algorithms,\ndrop me your email below and I&rsquo;ll keep you in the loop.<\/p>\n<hr>\n<p>Some quick definitions, for those unfamiliar:<\/p>\n<blockquote>\n<p>A <strong>cryptocurrency<\/strong> \u2026 is a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Digital_asset\">digital\nasset<\/a> designed to work as a\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Medium_of_exchange\">medium of exchange<\/a> using\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Cryptography\">cryptography<\/a> to secure the\ntransactions and to control the creation of additional units of the currency. \u2014\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Cryptocurrency\">Wikipedia<\/a><\/p>\n<\/blockquote>\n<blockquote>\n<p><strong>Machine learning<\/strong> is the subfield of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Computer_science\">computer\nscience<\/a> that, according to\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Arthur_Samuel\">Arthur Samuel<\/a>, gives \u201ccomputers\nthe ability to learn without being explicitly programmed.\u201d \u2014 also\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Machine_learning\">Wikipedia<\/a><\/p>\n<\/blockquote>\n<blockquote>\n<p><strong>Deep learning<\/strong> \u2026 is part of a broader family of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Machine_learning\">machine\nlearning<\/a> methods based on\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Learning_representation\">learning data\nrepresentations<\/a>, as\nopposed to task-specific algorithms. 
\u2014\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Deep_learning\">Wikipedia<\/a><\/p>\n<\/blockquote>\n<hr>\n<p><span class=\"dropcap\">W<\/span>hy build a PC to learn this stuff? It&rsquo;s important to note that you don&rsquo;t have\nto. Your laptop is capable of running the same software, but the performance\nyou\u2018ll get out of a dedicated GPU is miles beyond what your laptop&rsquo;s CPU can\ndeliver. You&rsquo;ll spend more money replacing your worn-out laptop than you&rsquo;ll make\nmining cryptocurrencies, and anything beyond basic deep learning training will\ntake forever.<\/p>\n<h2 id=\"gpus\">GPUs<\/h2>\n<p>GPUs are specialized chips for processing data in parallel. Originally developed\nto power intensive graphics (like in video games), more recently their\narchitecture has been discovered to be a perfect fit for the short, repetitive\nand parallelizable tasks at the heart of both machine learning and\ncryptocurrency mining.<\/p>\n<h2 id=\"the-cloud\">The cloud<\/h2>\n<p>You can rent GPUs in the cloud, for instance, <a href=\"https:\/\/aws.amazon.com\/about-aws\/whats-new\/2016\/09\/introducing-amazon-ec2-p2-instances-the-largest-gpu-powered-virtual-machine-in-the-cloud\/\">with\nAWS<\/a>.\nUnfortunately they&rsquo;re expensive and it&rsquo;s a much better deal to run things\nlocally. (This will probably change in the future).<\/p>\n<p>So it makes economic and temporal sense to run these things locally.<\/p>\n<h1 id=\"hardware\">Hardware<\/h1>\n<p>I haven&rsquo;t built a computer from scratch in over 20 years. I bought a Mac in 2004\nand haven&rsquo;t looked back since. 
I was lucky enough to have a good friend with\nsome experience who was able to guide me in the right direction.<\/p>\n<h2 id=\"picking-a-gpu\">Picking a GPU<\/h2>\n<p>The most important question: Which GPU to purchase?<\/p>\n<p><a href=\"http:\/\/timdettmers.com\/2017\/04\/09\/which-gpu-for-deep-learning\/\">Tim Dettmers has a fantastic in-depth\narticle<\/a>\ncomparing a number of GPUs out on the market. Go read his article if you want a\ntruly exhaustive look at the costs and benefits of available GPUs.<\/p>\n<p>He writes:<\/p>\n<blockquote>\n<p>NVIDIA&rsquo;s standard libraries made it very easy to establish the first deep\nlearning libraries in CUDA, while there were no such powerful standard libraries\nfor AMD&rsquo;s OpenCL. Right now, there are just no good deep learning libraries for\nAMD cards \u2014 so NVIDIA it is.<\/p>\n<\/blockquote>\n<p>I&rsquo;d heard this sentiment from others. It seems that if you want to do machine\nlearning, you&rsquo;re best off going with NVIDIA.<\/p>\n<p>I ended up picking a pair of GTX 970s, which was primarily a budget decision\n(this is a hobby, after all). 
There are lots available on eBay.<\/p>\n<h2 id=\"the-rest-of-the-parts\">The rest of the parts<\/h2>\n<p>A number of folks have written about their experiences building deep\nlearning-capable machines, and they were invaluable for helping me figure out\nwhat parts to buy:<\/p>\n<ul>\n<li><a href=\"https:\/\/medium.com\/towards-data-science\/building-your-own-deep-learning-box-47b918aea1eb\">Building your own deep learning box<\/a><\/li>\n<li><a href=\"https:\/\/medium.com\/@acrosson\/building-a-deep-learning-box-d17d97e2905c\">Building a deep learning box<\/a><\/li>\n<li><a href=\"https:\/\/www.oreilly.com\/learning\/build-a-super-fast-deep-learning-machine-for-under-1000\">Build a super fast deep learning machine for under 1000<\/a><\/li>\n<\/ul>\n<p>As these articles point out, make sure to enter your components into\n<a href=\"https:\/\/pcpartpicker.com\/\">pcpartpicker.com<\/a> before purchasing, to ensure\ncomponents work together. I failed to do this, and as you&rsquo;ll read below, this\nnecessitated a second trip to the Amazon store.<\/p>\n<img src=\"parts.jpeg\" \/>\n<span class=\"figcaption_hack\">The parts!<\/span>\n<h1 id=\"shopping-list\">Shopping List<\/h1>\n<p>Here&rsquo;s what I bought (<em>affiliate links<\/em>):<\/p>\n<ul>\n<li>Motherboard: <a href=\"https:\/\/www.amazon.com\/gp\/product\/B012N6ESTC\/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=B012N6ESTC&amp;linkCode=as2&amp;tag=theoryincorpo-20&amp;linkId=d41ae2f81f2541aef2deccbb45e833bf\">Gigabyte\nGA-Z170X<\/a><\/li>\n<li>CPU: <a href=\"https:\/\/www.amazon.com\/gp\/product\/B012M8M7TY\/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=B012M8M7TY&amp;linkCode=as2&amp;tag=theoryincorpo-20&amp;linkId=9f1372b58ef093b59173144815d494fd\">Intel Core i5 6600K 3.50\nGHz<\/a><\/li>\n<li>RAM: <a 
href=\"https:\/\/www.amazon.com\/gp\/product\/B01HKF450S\/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=B01HKF450S&amp;linkCode=as2&amp;tag=theoryincorpo-20&amp;linkId=68496b76cc73a6eaaee01e222fee703e\">Corsair Vengeance\n16GB<\/a><\/li>\n<li>PSU: <a href=\"https:\/\/www.amazon.com\/gp\/product\/B00K85X2A2\/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=B00K85X2A2&amp;linkCode=as2&amp;tag=theoryincorpo-20&amp;linkId=32845692521e8b74f6bda127b2e82917\">EVGA SuperNova\n750W<\/a><\/li>\n<li>Hard Drive: <a href=\"https:\/\/www.amazon.com\/gp\/product\/B01LXS4TY6\/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=B01LXS4TY6&amp;linkCode=as2&amp;tag=theoryincorpo-20&amp;linkId=ada0a1fc170f85869054605fbab24684\">Samsung 960\n1TB<\/a><\/li>\n<li>CPU Cooler: <a href=\"https:\/\/www.amazon.com\/gp\/product\/B005O65JXI\/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=B005O65JXI&amp;linkCode=as2&amp;tag=theoryincorpo-20&amp;linkId=c0c8d938f6267fdd4e2f3047319003ae\">Cooler Master\nHyper<\/a><\/li>\n<li>Case (first attempt): <a href=\"https:\/\/www.amazon.com\/gp\/product\/B0716715R1\/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=B0716715R1&amp;linkCode=as2&amp;tag=theoryincorpo-20&amp;linkId=5f5509b9f3f463d9abf46d3666410bfb\">AeroCool\nAero300<\/a><\/li>\n<li>Case (second attempt): <a href=\"https:\/\/www.amazon.com\/gp\/product\/B01M3Y5FJ2\/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=B01M3Y5FJ2&amp;linkCode=as2&amp;tag=theoryincorpo-20&amp;linkId=7d67eab5133134d5ed151de4dbd5f984\">Corsair Carbide Series\n270R<\/a><\/li>\n<li>Wifi adapter: this <a href=\"https:\/\/www.amazon.com\/gp\/product\/B003MTTJOY\/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=B003MTTJOY&amp;linkCode=as2&amp;tag=theoryincorpo-20&amp;linkId=2755d7e0d0a7f51bd3798ac05f5cdfa5\">exceptionally tiny wifi USB\nadapter<\/a>\ncame in handy (and continues 
to) as I struggle with ethernet issues<\/li>\n<\/ul>\n<h2 id=\"psu\">PSU<\/h2>\n<p>The PSU is the power supply that runs the whole rig.<\/p>\n<p>There are three types of PSUs: modular, non-modular and semi-modular. Modular PSUs\nhave cables you can disconnect, whereas non-modulars have cables that are\npermanently attached. Semi-modular PSUs usually have the CPU and motherboard cables attached\nand the rest pluggable.<\/p>\n<p>I bought a modular 750W PSU. In my limited experience of one, modular PSUs need\nsome additional clearance behind them to accommodate their cables. The original\ncase I bought, AeroCool, lacked enough space behind the PSU to fit the cables,\nnecessitating a second trip to the Amazon store. A visit to <a href=\"https:\/\/pcpartpicker.com\/\">pcpartpicker.com<\/a> would have alerted me beforehand. Lesson learned!<\/p>\n<p>To determine what kind of wattage you need, add up all your parts&rsquo; wattage\nneeds. Pay particular attention to the GPUs&rsquo; needs, and give yourself some extra\nbreathing room. I&rsquo;m not sure what happens if you run out of wattage but I would\nguess it sucks.<\/p>\n<img src=\"psu.jpeg\" \/>\n<p>The first step out of the box is to make sure the PSU turns on and power is being\ndelivered. And in my case, all systems were go!<\/p>\n<p><a href=\"https:\/\/www.youtube.com\/watch?v=2h_NYl4DRF4&amp;index=1&amp;list=PLxvB6AORmso4SILfs3sj8vYCohM-RnwcD\">Here&rsquo;s a great video I watched about installing a\nPSU<\/a>,\nand if you happen to buy an EVGA PSU, here&rsquo;s <a href=\"https:\/\/www.youtube.com\/watch?v=rucfmsGjPow&amp;list=PLxvB6AORmso4SILfs3sj8vYCohM-RnwcD&amp;index=2\">EVGA&rsquo;s specific\ntutorial<\/a>.<\/p>\n<h2 id=\"motherboard--cpu\">Motherboard &amp; CPU<\/h2>\n<p>For machine and deep learning applications, the CPU is less important than the\nGPUs, which do most of the heavy lifting. 
You need a CPU that&rsquo;ll accommodate the\nGPUs, and <a href=\"https:\/\/www.quora.com\/Which-hardware-components-CPU-RAM-GC-etc-are-needed-for-a-machine-learning-deep-learning-home-PC-computer-to-run-fast\">it should have as many cores as you have\nGPUs<\/a>.<\/p>\n<p>For the motherboard, go with one that supports PCIe 3.0. If you&rsquo;re planning to\ngo up to 4 GPUs, make sure your motherboard supports that (you&rsquo;ll also probably\nwant a stronger PSU and some serious cooling \u2014 I elected to use only 2 GPUs).<\/p>\n<p>Here&rsquo;s a photo of the motherboard:<\/p>\n<img src=\"motherboard.jpeg\" \/>\n<span class=\"figcaption_hack\">Ain't it a beaut?<\/span>\n<p>The first step is to install the CPU. <a href=\"https:\/\/www.youtube.com\/watch?v=1w6UZNeGgXU&amp;index=3&amp;list=PLxvB6AORmso4SILfs3sj8vYCohM-RnwcD\">I followed this video on installing a\nCPU.<\/a>\nI was surprised by how easy it is; the thing just clicks into place!<\/p>\n<img src=\"cpu.jpeg\" \/>\n<span class=\"figcaption_hack\">This time, with more CPU!<\/span>\n<p><em>Side note:<\/em> The eagle-eyed reader may observe me assembling this on a carpet.\nDon&rsquo;t be me. Carpets cause static electricity and <strong>static electricity is bad\nfor electronics<\/strong>. Soon after this photo was taken I realized the error\nof my ways, migrated to a non-carpeted floor, and became obsessive about\ntouching metal objects for the remainder of the build.<\/p>\n<h2 id=\"thermal-paste\">Thermal Paste<\/h2>\n<p>Not having done this in over 20 years, I found this a completely new experience.<\/p>\n<p>CPUs require that a paste be applied in order to dissipate heat. 
Here&rsquo;s a video on\napplying <a href=\"https:\/\/www.youtube.com\/watch?v=-hNgFNH7zhQ&amp;list=PLxvB6AORmso4SILfs3sj8vYCohM-RnwcD&amp;index=5\">thermal\npaste<\/a>.\nThe narrator&rsquo;s pasting technique is \ud83d\udd25.<\/p>\n<h2 id=\"cpu-cooler\">CPU Cooler<\/h2>\n<p>The CPU cooler sits on top of the CPU and sucks the heat out to be dissipated by\na fan. It looks super cool, and it was fun to install too! <a href=\"https:\/\/www.youtube.com\/watch?v=XLlrqzwxJig&amp;index=4&amp;list=PLxvB6AORmso4SILfs3sj8vYCohM-RnwcD\">Here&rsquo;s the video I\nfollowed to install\nit<\/a>.<\/p>\n<p>And here it is, looking so snazzy:<\/p>\n<img src=\"cooler.jpeg\" \/>\n<span class=\"figcaption_hack\">So cool, Cooler Master Hyper<\/span>\n<h2 id=\"ram\">RAM<\/h2>\n<p>RAM is a cinch to install. Here&rsquo;s a <a href=\"https:\/\/www.youtube.com\/watch?v=O5WMyYrEq1Y&amp;list=PLxvB6AORmso4SILfs3sj8vYCohM-RnwcD&amp;index=7\">video that&rsquo;ll show you\nhow<\/a>.<\/p>\n<img src=\"ram.jpeg\" \/>\n<h2 id=\"hard-drive\">Hard Drive<\/h2>\n<p>I bought an M.2 hard drive, which my limited experience indicates is the easiest\nhard drive to install (<em>note: I did not attempt to install other types of hard\ndrives<\/em>). <a href=\"https:\/\/www.youtube.com\/watch?v=XmvNQvFxKcs&amp;index=10&amp;list=PLxvB6AORmso4SILfs3sj8vYCohM-RnwcD\">Here&rsquo;s a video demonstrating how to install\nit<\/a>.\nIt basically snaps right onto the motherboard and you&rsquo;re done.<\/p>\n<p><em>Side note:<\/em> I actually installed the GPUs before installing the hard drive, and\nhad to remove those GPUs to get it in as the M.2 sits underneath them (at least,\non this particular motherboard). 
So if you have an M.2, do this step before the\nGPUs.<\/p>\n<h2 id=\"gpus-1\">GPUs<\/h2>\n<p>Finally, here come the racehorses!<\/p>\n<p><a href=\"https:\/\/www.youtube.com\/watch?v=Yinrkn4TvnU&amp;list=PLxvB6AORmso4SILfs3sj8vYCohM-RnwcD&amp;index=8\">Here&rsquo;s a video on installing\nGPUs<\/a>.\nIt&rsquo;s pretty straightforward: you line the chips up with the PCI-e connectors,\npop one of the tabs, and push (softly \u2014 you don&rsquo;t want to force it) until the\ntab clicks back into place.<\/p>\n<p>Here&rsquo;s a before shot of the empty PCI slots:<\/p>\n<img src=\"empty-pci.jpeg\" \/>\n<p>And with the GPUs installed:<\/p>\n<img src=\"gpu.jpeg\" \/>\n<h2 id=\"putting-it-in-the-case\">Putting it in the case<\/h2>\n<p>The last step is to get the motherboard into the case and seal it up.<\/p>\n<p><a href=\"https:\/\/www.youtube.com\/watch?v=iTkGuioG5RU&amp;list=PLxvB6AORmso4SILfs3sj8vYCohM-RnwcD&amp;index=9\">Here&rsquo;s a video walking through how to install into a\ncase.<\/a>\nI suspect every case is different and you&rsquo;re better off searching for your\nparticular build, but c&rsquo;est la vie. You&rsquo;ll have to connect the case&rsquo;s cables to\nthe motherboard&rsquo;s, and for that you will need to refer to the respective manuals\nfor instructions.<\/p>\n<hr>\n<img src=\"frankenstein.jpeg\" \/>\n<p>With everything assembled, all that was left was to hit the power button on the\nPSU. I felt a little like Dr. Frankenstein hovering over his monster. Would the\nbeast wake up?<\/p>\n<p>I flipped the PSU. The machine did not turn on. I spent five minutes freaking\nout thinking I had fried some circuitry or installed something wrong.<\/p>\n<p>I then realized the case also has an on switch. So I turned that on.<\/p>\n<p>And the beast awoke!<\/p>\n<img src=\"final.jpeg\" \/>\n<p>At this point I had a functioning machine, and it was into the land of software\n(and a nice cold beer). 
I&rsquo;ll save that for next time.<\/p>\n<p>If you want to hear about my travails configuring the BIOS, installing Linux,\nand actually tackling the crypto mining and deep learning setups, drop your\nemail below and I&rsquo;ll let you know when I publish those.<\/p>"},{"title":"Background Images in React Native","link":"https:\/\/thekevinscott.com\/background-images-in-react-native\/","pubDate":"Tue, 09 May 2017 09:00:00 +0000","guid":"https:\/\/thekevinscott.com\/background-images-in-react-native\/","description":"<p>A <a href=\"http:\/\/stackoverflow.com\/questions\/29322973\/whats-the-best-way-to-add-a-full-screen-background-image-in-react-native\">common question<\/a> amongst React Native developers is how to put a background image on a view.<\/p>\n<p>On the web, it\u2019s a piece of cake:<\/p>\n<pre><code>&lt;div style={{ backgroundImage: 'url(\/my-image.png)' }}&gt;...&lt;\/div&gt;\n<\/code><\/pre>\n<p>In React Native, there\u2019s no <code>background-image<\/code> style property; instead, the <code>&lt;Image&gt;<\/code>\ncomponent does the heavy lifting.<\/p>\n<h2 id=\"layouts\">Layouts<\/h2>\n<p><img src=\"images\/sample.jpg\" alt=\"The sample image we&amp;rsquo;ll be using\">\n<capt>The sample image we&rsquo;ll be using<\/capt><\/p>\n<p>There are <a href=\"https:\/\/facebook.github.io\/react-native\/docs\/image.html#resizemode\">5 layouts<\/a> to\nbe aware of that an image can take.<\/p>\n<ul>\n<li><code>center<\/code> - Centers the image, without resizing it.<\/li>\n<li><code>repeat<\/code> - Repeats the image, without resizing it.<\/li>\n<li><code>stretch<\/code> - Stretches the image to fit its bounds, without preserving the\nimage\u2019s aspect ratio.<\/li>\n<li><code>contain<\/code> - Resizes the image to fit its bounds, while also preserving its\naspect ratio.<\/li>\n<li><code>cover<\/code> - Resizes the image so its shorter side fits its bounds, while also\npreserving its aspect ratio. 
In practice, this means that the longer side will\noverlap the borders of its bounds.<\/li>\n<\/ul>\n<p>Here are examples of each in practice:<\/p>\n<p><img src=\"images\/center.png\" alt=\"Center Image\">\n<capt>center<\/capt><\/p>\n<p><img src=\"images\/contain.jpg\" alt=\"Contain Image\">\n<capt>contain<\/capt><\/p>\n<p><img src=\"images\/cover.png\" alt=\"Cover Image\">\n<capt>cover<\/capt><\/p>\n<p><img src=\"images\/repeat.jpg\" alt=\"Repeat Image\">\n<capt>repeat<\/capt><\/p>\n<p><img src=\"images\/stretch.png\" alt=\"Stretch Image\">\n<capt>stretch<\/capt><\/p>\n<h2 id=\"referencing-images\">Referencing images<\/h2>\n<p>If you haven\u2019t used <code>&lt;Image \/&gt;<\/code> before, a quick note on assets. There are two ways\nof serving images: over the network and locally. Using local images will be\nfaster but result in a larger app binary, and they can\u2019t be updated on the fly.<\/p>\n<p>If you\u2019re using remote images, keep in mind two things:<\/p>\n<ol>\n<li>Use <code>https<\/code> links instead of <code>http<\/code>. <a href=\"https:\/\/developer.apple.com\/news\/?id=12212016b\">Apple will block\nnon-<\/a><code>https<\/code><a href=\"https:\/\/developer.apple.com\/news\/?id=12212016b\">\nlinks<\/a>, and in my experience\nthis error will happen silently.<\/li>\n<li>For larger images, explore the caching policies <a href=\"https:\/\/facebook.github.io\/react-native\/docs\/images.html#cache-control-ios-only\">detailed\nhere<\/a>\nto reduce network requests for your users.<\/li>\n<\/ol>\n<p>If instead you decide to serve images locally, keep in mind images are served\nrelative to your app root folder. 
I usually put my local images into an assets\nfolder with other media, so from <code>index.ios.js<\/code> I can call them with:<\/p>\n<p><code>require('.\/assets\/my-image.png')<\/code><\/p>\n<p>Finally, if you add a new image to your app and come across an error like this:<\/p>\n<p><img src=\"images\/error.png\" alt=\"Error Message\">\n<capt>Error Message<\/capt><\/p>\n<p>It probably means you need to restart your packager, so it can pick up the\nimported image.<\/p>\n<h2 id=\"examples\">Examples<\/h2>\n<p>Let\u2019s show an example where we fetch an image from a public URL and position it\nabsolutely:<\/p>\n<script src=\"https:\/\/gist.github.com\/thekevinscott\/0381ad0ff8e2fe29c47f0e1ab71d5b74.js\"><\/script>\n<p>Easy as that! The key is the use of <code>flex: 1<\/code>, which will cause the <code>&lt;Image \/&gt;<\/code> component to fill its container. You can read <a href=\"https:\/\/facebook.github.io\/react-native\/docs\/flexbox.html\">more about Flexbox\nhere<\/a>.<\/p>\n<p>You can play around with <code>resizeMode<\/code> to see the different layout options.<\/p>\n<h2 id=\"with-text\">With Text<\/h2>\n<p>Usually a background image sits behind something else. 
There\u2019s two ways to\nachieve that: using the <code>&lt;Image \/&gt;<\/code> as the view layer itself, or wrapping it in\nanother <code>&lt;View \/&gt;<\/code>.<\/p>\n<p>Here\u2019s an example using the <code>&lt;Image \/&gt;<\/code> as the wrapper component:<\/p>\n<script src=\"https:\/\/gist.github.com\/thekevinscott\/0b2ba3dbd3e3c0b2efd9fd91a08a7696.js\"><\/script>\n<p>And here\u2019s an example wrapping the <code>&lt;Image \/&gt;<\/code> in a container <code>&lt;View \/&gt;<\/code>:<\/p>\n<script src=\"https:\/\/gist.github.com\/thekevinscott\/114fc100d47f68b5bd805c9fd32c35c0.js\"><\/script>\n<p>I slightly prefer the latter approach, as I think it\u2019s more flexible if you need\nto make further adjustments or include other elements, but either approach\nworks.<\/p>"},{"title":"Tabbing Through Input Fields","link":"https:\/\/thekevinscott.com\/tabbing-through-input-fields\/","pubDate":"Fri, 05 May 2017 09:00:00 +0000","guid":"https:\/\/thekevinscott.com\/tabbing-through-input-fields\/","description":"<p>On the web, it\u2019s common to tab through forms, an intuitive and <a href=\"https:\/\/www.nngroup.com\/articles\/web-form-design\/\">UX-friendly pattern<\/a>. You get this out of the box with web forms, but when building apps with React Native, you need to implement this functionality yourself. Fortunately, it\u2019s a cinch to set up.<\/p>\n<h2 id=\"native-form-ux-vs-web-form-ux\">Native Form UX vs. 
Web Form UX<\/h2>\n<p>First, let\u2019s understand what native UX we\u2019re trying to emulate on React Native.<\/p>\n<p>Here\u2019s a video of navigation through the native iOS contacts app:<\/p>\n<p><embed height=\"315\" src=\"https:\/\/www.youtube.com\/embed\/rHjJh0qk1dg\" frameborder=\"0\" allow=\"autoplay; encrypted-media\" allowfullscreen=\"1\" caption=\"iOS\"><\/embed><\/p>\n<p>And here\u2019s a video of navigation through a Web form:<\/p>\n<p><embed height=\"315\" src=\"https:\/\/www.youtube.com\/embed\/JN_y8E4Erh8\" frameborder=\"0\" allow=\"autoplay; encrypted-media\" allowfullscreen=\"1\" caption=\"Web form\"><\/embed><\/p>\n<p>In summary, the iOS web browser gives us next and previous buttons, but for a native iOS app, these aren\u2019t present, and <a href=\"https:\/\/github.com\/facebook\/react-native\/issues\/641#issuecomment-94522058\">React Native doesn\u2019t support them natively, either<\/a>.<\/p>\n<p>I believe the reason for this discrepancy is that, natively, the \u201creturn\u201d key performs double duty, tabbing through the form and submitting once the form is complete. 
On the web, the \u201creturn\u201d key will submit the form by default, necessitating the next\/previous buttons.<\/p>\n<p>We\u2019ll focus on emulating the native functionality, relying on the \u201creturn\u201d key\nto tab through the form and submit it when complete.<\/p>\n<h2 id=\"keyboards-and-textinput-on-react-native\">Keyboards and TextInput on React Native<\/h2>\n<p>We\u2019ll be using <code>TextInput<\/code> and <code>View<\/code> from the <code>react-native<\/code> library, like so:<\/p>\n<pre><code>import {\nTextInput,\nView,\n} from 'react-native';\n<\/code><\/pre>\n<p>Each <code>TextInput<\/code> <a href=\"https:\/\/facebook.github.io\/react-native\/docs\/textinput.html#keyboardtype\">defines its own keyboard that appears when\nfocused<\/a>.\nThis allows a particular input field to specify <code>numeric<\/code>, <code>numpad<\/code>, or a number\nof different options.<\/p>\n<p><code>TextInput<\/code>s are also responsible for determining which input to send focus to\nnext, and <a href=\"https:\/\/facebook.github.io\/react-native\/docs\/textinput.html#onsubmitediting\">they provide a handy prop for implementing\nthis<\/a>.<\/p>\n<h2 id=\"capturing-the-field-reference\">Capturing the field reference<\/h2>\n<p>The first thing we\u2019ll need is to capture the <code>ref<\/code> of a particular input field.<\/p>\n<p>If you\u2019re not familiar, <a href=\"https:\/\/facebook.github.io\/react\/docs\/refs-and-the-dom.html\">a\n<\/a><code>ref<\/code><a href=\"https:\/\/facebook.github.io\/react\/docs\/refs-and-the-dom.html\"> is a\nreference to the React\ncomponent<\/a>. It\u2019s\nbest practice to specify a callback function and capture the referenced\ncomponent from the arguments.<\/p>\n<p>In our example, we\u2019re storing each <code>TextInput<\/code>\u2018s ref on an internal <code>inputs<\/code>\nobject we\u2019ll define in the constructor. 
<strong>We specify a custom index we\u2019ll use\nlater to focus on the input.<\/strong><\/p>\n<pre><code>&lt;TextInput\n  ref={ input =&gt; {\n    this.inputs['one'] = input;\n  }}\n  ...\n\/&gt;\n<\/code><\/pre>\n<p>Since the ref is defined in the <code>render<\/code> function, don\u2019t store the reference\nwith <code>setState<\/code>; <a href=\"https:\/\/github.com\/facebook\/react\/issues\/5591\">doing so will cause an infinite\nloop<\/a> and many tears will be shed.<\/p>\n<h2 id=\"triggering-focus\">Triggering focus<\/h2>\n<p>Next, we need to focus on the next element. We do that by hooking into the\n<code>onSubmitEditing<\/code> prop and supplying it with a custom focus function on the\ncomponent.<\/p>\n<pre><code>onSubmitEditing={() =&gt; {\n  \/\/ specify the key of the ref, as done in the previous section.\n  this.focusNextField('next-field');\n}}\n<\/code><\/pre>\n<p>Then, we set up the field. If we zoom out to the component level:<\/p>\n<pre><code>class App extends React.Component {\n  constructor(props) {\n    super(props);\n    this.focusNextField = this.focusNextField.bind(this);\n    \/\/ to store our input refs\n    this.inputs = {};\n  }\n  focusNextField(key) {\n    this.inputs[key].focus();\n  }\n  ...\n}\n<\/code><\/pre>\n<p>Two things to point out:<\/p>\n<ul>\n<li><a href=\"http:\/\/egorsmirnov.me\/2015\/08\/16\/react-and-es6-part3.html\">We need to bind the focus\nfunction<\/a> to the\nclass so we have an accurate reference to <code>this<\/code>. This is generally done in the\nconstructor.<\/li>\n<li>The focus action accepts a key indicating which input to focus on. That key\nmatches what we use in the <code>ref<\/code> callback above.<\/li>\n<\/ul>\n<h2 id=\"avoiding-the-disappearing-keyboard\">Avoiding the disappearing keyboard<\/h2>\n<p>Sometimes as we\u2019re tabbing between fields, the keyboard will disappear and\nreappear. 
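Before tackling that, here is the ref-and-focus wiring above reduced to plain JavaScript, so you can see the mechanics in isolation. This is a sketch for illustration: the mock inputs and the `Form`/`register` names are mine, standing in for real `<TextInput />` components and the `ref` callback.

```javascript
// A plain-JavaScript sketch of the ref registry and focusNextField pattern.
// Mock inputs stand in for real <TextInput /> components; only the focus()
// call matters for the wiring.
function makeMockInput(name) {
  return {
    name,
    focused: false,
    focus() { this.focused = true; },
  };
}

class Form {
  constructor() {
    // to store our input refs, as in the component constructor above
    this.inputs = {};
  }
  // stands in for the ref callback: store each input under a custom key
  register(key, input) {
    this.inputs[key] = input;
  }
  // same lookup-and-focus logic as focusNextField in the component
  focusNextField(key) {
    this.inputs[key].focus();
  }
}

const form = new Form();
form.register('one', makeMockInput('one'));
form.register('two', makeMockInput('two'));

// simulates onSubmitEditing firing on field 'one' and advancing to 'two':
form.focusNextField('two');
console.log(form.inputs['two'].focused); // true
```

The real components differ only in that `register` happens inside the `ref` callback and `focusNextField` is triggered by `onSubmitEditing`.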
We can avoid this by using a prop on <code>TextInput<\/code> called\n<code>blurOnSubmit<\/code>:<\/p>\n<pre><code>&lt;TextInput\n  blurOnSubmit={ false }\n  ...\n\/&gt;\n<\/code><\/pre>\n<p>This property forces the keyboard to remain visible. Since we\u2019re immediately\ntabbing to our next field, this behavior works nicely for us.<\/p>\n<h2 id=\"return-key\">Return key<\/h2>\n<p>Updating <a href=\"https:\/\/facebook.github.io\/react-native\/docs\/textinput.html#returnkeytype\">the return\nkey<\/a>\nto match the correct action isn\u2019t strictly necessary (and natively iOS doesn\u2019t\nchange its appearance) but I think updating to the relevant return key type is a\nnice touch:<\/p>\n<pre><code>&lt;TextInput\n  returnKeyType={ &quot;next&quot; }\n  ...\n\/&gt;\n&lt;TextInput\n  returnKeyType={ &quot;done&quot; }\n  ...\n\/&gt;\n<\/code><\/pre>\n<p>This shows a <code>done<\/code> return key on the final input, and a\n<code>next<\/code> return key on the rest of \u2019em.<\/p>\n<h3 id=\"putting-it-all-together\">Putting it all together<\/h3>\n<p>The final gist is here:<\/p>\n<script src=\"https:\/\/gist.github.com\/thekevinscott\/22b66e5fe9ae35d633a28e27c129bc8b.js\"><\/script>\n<p>You can see it in action on iOS and Android:<\/p>\n<p><embed height=\"315\" src=\"https:\/\/www.youtube.com\/embed\/rHjJh0qk1dg\" frameborder=\"0\" allow=\"autoplay; encrypted-media\" allowfullscreen=\"1\" caption=\"iOS\"><\/embed><\/p>\n<p><embed height=\"315\" src=\"https:\/\/www.youtube.com\/embed\/KFazq6ZBFyw\" frameborder=\"0\" allow=\"autoplay; encrypted-media\" allowfullscreen=\"1\" caption=\"Android\"><\/embed><\/p>"},{"title":"Emojis in Javascript","link":"https:\/\/thekevinscott.com\/emojis-in-javascript\/","pubDate":"Sat, 05 Nov 2016 07:00:00 +0000","guid":"https:\/\/thekevinscott.com\/emojis-in-javascript\/","description":"<p>Parsing emoji in Javascript is\u2026 not easy.<\/p>\n<p>This article is a collection of the research I did while getting up to speed on emoji in 
Javascript. First, a quick dive into the inner workings of Unicode, followed by how emoji in particular are represented in JavaScript. Finally, let\u2019s walk through writing a regular expression designed to handle all manner of emoji, heavily inspired by <a href=\"https:\/\/github.com\/lodash\/lodash\/blob\/4.16.6\/lodash.js\">lodash\u2019s implementation of split<\/a>.<\/p>\n<p><a href=\"https:\/\/www.npmjs.com\/package\/emoji-tree\">If your eyes glaze over like mine did while reading this, I put all the code into an npm library you can find here.<\/a><\/p>\n<hr>\n<h3 id=\"a-digression-into-unicode\">A digression into Unicode<\/h3>\n<p>The terms you want to digest are the following:<\/p>\n<ul>\n<li><strong>Code point<\/strong> \u2014 <a href=\"https:\/\/mathiasbynens.be\/notes\/javascript-escapes\">A numerical representation of a specific Unicode\ncharacter<\/a>.<\/li>\n<li><strong>Character Code<\/strong>\u2014 Another name for a code point.<\/li>\n<li><strong>Code Unit<\/strong>\u2014 An encoding of a code point, measured in bits. Javascript uses\nUTF-16.<\/li>\n<li><strong>Decimal<\/strong>\u2014 A way to represent code points in base 10.<\/li>\n<li><strong>Hexadecimal<\/strong> \u2014 A way to represent code points in base 16.<\/li>\n<\/ul>\n<p>Let\u2019s demonstrate with an example. Take as our specimen, the letter <strong>A<\/strong>.<\/p>\n<p><img src=\"a.png\" alt=\"An image of an a\">\n<capt>Sad sack A. 
Cheer up bud, we\u2019re about to turn you into a code point!<\/capt><\/p>\n<p>The letter <strong>A<\/strong> is represented by the code point 65 (in decimal), or 41 (in\nhexadecimal).<\/p>\n<script src=\"https:\/\/gist.github.com\/thekevinscott\/c6d7511bb078dfda54785974ddf8d0de.js\"><\/script>\n<p><a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/JavaScript\/Reference\/Global_Objects\/String\/codePointAt\">codePointAt<\/a>\nand\n<a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/JavaScript\/Reference\/Global_Objects\/String\/fromCodePoint\">fromCodePoint<\/a>\nare new methods introduced in ES2015 that can handle unicode characters whose\nUTF-16 encoding is greater than 16 bits, which includes emojis. Use these\ninstead of <strong>charCodeAt<\/strong>, which doesn\u2019t handle emoji correctly.<\/p>\n<p>Here\u2019s an example of using these methods, courtesy of\n<a href=\"http:\/\/xahlee.info\/js\/js_unicode_code_point.html\">xahlee.info<\/a>:<\/p>\n<pre><code>console.log(\n&quot;\ud83d\ude38&quot;.charCodeAt(0)\n); \/\/ prints 55357, WRONG!\nconsole.log(\n&quot;\ud83d\ude38&quot;.codePointAt(0)\n); \/\/ prints 128568, correct\n<\/code><\/pre>\n<p>I will be using hexadecimal representations (<strong>\\u0041<\/strong>) from now on, because\nour future regex will be built that way. 
A few things to note about hexadecimal\nrepresentation in Javascript:<\/p>\n<h4 id=\"all-hexadecimal-code-points-must-be-4-characters\">All hexadecimal code points must be 4 characters.<\/h4>\n<p>If the character code is less than 4 characters, it must be left padded with\nzeros.<\/p>\n<pre><code>\/\/ This is invalid\n\\u41\n\/\/ This is valid\n\\u0041\n<\/code><\/pre>\n<h4 id=\"all-hexadecimal-code-points-are-case-insensitive\">All hexadecimal code points are case insensitive.<\/h4>\n<pre><code>\/\/ these are equivalent\n&quot;\\uD83D&quot;\n&quot;\\ud83d&quot;\n<\/code><\/pre>\n<h4 id=\"they-can-be-notated-in-two-forms\">They can be notated in two forms<\/h4>\n<p>In Javascript, hexadecimal can be represented in two ways: <strong>\\u0041<\/strong> and\n<strong>0x0041<\/strong>. Jump into your browser console and you\u2019ll see the following are\nequivalent:<\/p>\n<pre><code>String.fromCodePoint(0x0041);\n&gt; 'A'\n'\\u0041';\n&gt; 'A'\n<\/code><\/pre>\n<h3 id=\"back-to-emojis\">Back to Emojis<\/h3>\n<p>Originally, the range of code points was 16 bits, which encompassed the English\nalphabet (now known as the Basic Multilingual Plane). Now, in addition to that\noriginal range, there are <a href=\"https:\/\/en.wikipedia.org\/wiki\/Plane_(Unicode)\">16 more\nplanes<\/a> (17 total) to choose\nfrom.<\/p>\n<p>The rest of the planes beyond the BMP are referred to as the \u201castral planes\u201d,\nwhich include emoji. Emoji live on Plane 1, the Supplementary Multilingual\nPlane.<\/p>\n<p><img src=\"consortium.jpeg\" alt=\"The Consotrium\"><\/p>\n<div class=\"caption\">And the [Consortium](http:\/\/unicode.org\/consortium\/consort.html) said, let there be emoji<\/div>\n<p>What do you think the following will produce?<\/p>\n<pre><code>&quot;\ud83d\ude00&quot;.length\n<\/code><\/pre>\n<p><strong>If you said 1, you are mistaken my friend! 
The correct answer is 2.<\/strong><\/p>\n<p>In Javascript, <a href=\"http:\/\/www.2ality.com\/2013\/09\/javascript-unicode.html\">a string is a sequence of 16-bit code\nunits<\/a>. Since emoji are\nencoded above the BMP, it means that they are represented by a pair of code\nunits, also known as a surrogate pair.<\/p>\n<p>So for instance, <strong>0x1F600<\/strong>, which is \ud83d\ude00, is represented by:<\/p>\n<pre><code>&quot;\\uD83D\\uDE00&quot;\n<\/code><\/pre>\n<p>(The first unit is called the <strong>lead surrogate<\/strong>, and the second the <strong>tail\nsurrogate<\/strong>.)<\/p>\n<p>Go ahead and copy that surrogate pair into your browser, and you\u2019ll see \ud83d\ude00.\nJavascript interprets this pair of code units as having a length of 2. That\u2019s\nwhy you can\u2019t just do something like:<\/p>\n<pre><code>&quot;abc\ud83d\ude00&quot;.split('')\n&gt;[&quot;a&quot;, &quot;b&quot;, &quot;c&quot;, &quot;\ufffd&quot;, &quot;\ufffd&quot;]\n<\/code><\/pre>\n<p>So, how do we get the surrogate pair? <a href=\"http:\/\/www.2ality.com\/2013\/09\/javascript-unicode.html\">There\u2019s a great explanation\nhere<\/a>, and here\u2019s a gist\nillustrating how to go from emoji to decimal to surrogate pair and back again:<\/p>\n<script src=\"https:\/\/gist.github.com\/thekevinscott\/54dec813e12e7984d17e1badc30b930c.js\"><\/script>\n<p>Because of these limitations within Javascript, in order to parse strings\ncontaining emoji, we need some fancy footwork.<\/p>\n<h3 id=\"writing-a-regular-expression\">Writing a regular expression<\/h3>\n<p>Luckily, the internet is awash in smarter folks than I. The\n<a href=\"https:\/\/github.com\/lodash\/lodash\/blob\/4.16.6\/lodash.js\">lodash<\/a> library has\nproduced a rock solid emoji regular expression. 
It is:<\/p>\n<pre><code>(?:[\\u2700-\\u27bf]|(?:\\ud83c[\\udde6-\\uddff]){2}|[\\ud800-\\udbff][\\udc00-\\udfff])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|\\ud83c[\\udffb-\\udfff])?(?:\\u200d(?:[^\\ud800-\\udfff]|(?:\\ud83c[\\udde6-\\uddff]){2}|[\\ud800-\\udbff][\\udc00-\\udfff])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|\\ud83c[\\udffb-\\udfff])?)*\n<\/code><\/pre>\n<p>Woof, that\u2019s a monster! Still, we\u2019re enterprising programmers, we\u2019re not afraid\nof a little regex, right? Let\u2019s reverse engineer this.<\/p>\n<p>From the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Emoji#Unicode_blocks\">Wikipedia emoji\nentry<\/a>, there are a couple of\nranges of emoji (many of which have unassigned values, presumably for future\nemoji):<\/p>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Dingbat#Unicode\">Dingbats<\/a> (U+2700 to U+27BF, 33\nout of 192 of which are emoji)<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Miscellaneous_Symbols_and_Pictographs\">Miscellaneous Symbols and\nPictographs<\/a>\n(U+1F300 to U+1F5FF, 637 of 768 of which are emoji)<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Supplemental_Symbols_and_Pictographs\">Supplemental Symbols and\nPictographs<\/a>\n(U+1F900 to U+1F9FF, 80 out of 82 of which are emoji)<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Emoticons_(Unicode_block)\">Emoticons<\/a> (U+1F600 to\nU+1F64F)<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Transport_and_Map_Symbols\">Transport and Map\nSymbols<\/a> (U+1F680 to\nU+1F6FF, 92 out of 103 of which are emoji)<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Miscellaneous_Symbols\">Miscellaneous Symbols<\/a>\n(U+2600 to U+26FF, 77 out of 256 of which are emoji)<\/li>\n<\/ul>\n<p>To make this easier, I\u2019m assuming anything in those ranges is emoji. 
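That tradeoff is easy to demonstrate in Node: a character class over an entire block matches every code point in the block, emoji or not. (The sample characters below are my own illustrations, not from the article.)

```javascript
// The Dingbats block spans U+2700–U+27BF, but only 33 of its 192 code
// points are emoji. A character class over the whole block can't tell
// them apart:
const dingbats = /[\u2700-\u27bf]/;

console.log(dingbats.test('\u2702')); // U+2702 BLACK SCISSORS, an emoji: true
console.log(dingbats.test('\u2701')); // U+2701 UPPER BLADE SCISSORS, generally not treated as emoji, but still: true

// Surrogate-pair ranges behave the same way; this matches any astral-plane
// character, emoji or otherwise:
const astral = /[\ud800-\udbff][\udc00-\udfff]/;
console.log(astral.test(String.fromCodePoint(0x1F600))); // U+1F600: true
```

In other words, matching by block is an over-approximation, which is exactly the simplification being made here.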
Our\naudience uses the English alphabet over SMS, so tough luck if I trawl up any\nother unsuspecting characters.<\/p>\n<h4 id=\"dingbats\">Dingbats<\/h4>\n<p>They range from U+2700 to U+27BF, so the regular expression for that looks like:<\/p>\n<pre><code>[\\u2700-\\u27bf]\n\/[\\u2700-\\u27bf]\/.test('\n')\n&gt; true\n<\/code><\/pre>\n<h4 id=\"miscellaneous-symbols-and-pictographs\"><strong>Miscellaneous Symbols and Pictographs<\/strong><\/h4>\n<p>These range from U+1F300 to U+1F5FF, with the following surrogate pairs:<\/p>\n<pre><code>toUTF16(0x1F300)\n&gt; &quot;\\uD83C\\uDF00&quot;\ntoUTF16(0x1F5FF)\n&gt; &quot;\\uD83D\\uDDFF&quot;\n<\/code><\/pre>\n<p>The regex for this range, from lodash\u2019s implementation, is:<\/p>\n<pre><code>[\\ud800-\\udbff][\\udc00-\\udfff]\n\/[\\ud800-\\udbff][\\udc00-\\udfff]\/.test(String.fromCodePoint(0x1F5FF))\n&gt; true\n\/[\\ud800-\\udbff][\\udc00-\\udfff]\/.test(String.fromCodePoint(0x1F300))\n&gt; true\n<\/code><\/pre>\n<h4 id=\"supplemental-symbols-and-pictographs\"><strong>Supplemental Symbols and Pictographs<\/strong><\/h4>\n<p>From U+1F900 to U+1F9FF, with the following surrogate pairs:<\/p>\n<pre><code>toUTF16(0x1F910)\n&gt; &quot;\\uD83E\\uDD10&quot;\ntoUTF16(0x1F9C0)\n&gt; &quot;\\uD83E\\uDDC0&quot;\n<\/code><\/pre>\n<p>We can reuse the same regex as above:<\/p>\n<pre><code>\/[\\ud800-\\udbff][\\udc00-\\udfff]\/.test(String.fromCodePoint(0x1F910))\n&gt; true\n\/[\\ud800-\\udbff][\\udc00-\\udfff]\/.test(String.fromCodePoint(0x1F9C0))\n&gt; true\n<\/code><\/pre>\n<h4 id=\"emoticons\">Emoticons<\/h4>\n<p>From U+1F600 to U+1F64F, with surrogate pairs:<\/p>\n<pre><code>toUTF16(0x1F600)\n&gt; &quot;\\uD83D\\uDE00&quot;\ntoUTF16(0x1F64F)\n&gt; &quot;\\uD83D\\uDE4F&quot;\n<\/code><\/pre>\n<p>Also covered by that same regex:<\/p>\n<pre><code>\/[\\ud800-\\udbff][\\udc00-\\udfff]\/.test(String.fromCodePoint(0x1F600))\n&gt; true\n\/[\\ud800-\\udbff][\\udc00-\\udfff]\/.test(String.fromCodePoint(0x1F64F))\n&gt; 
true\n<\/code><\/pre>\n<h4 id=\"transport-and-map-symbols\"><strong>Transport and Map Symbols<\/strong><\/h4>\n<p>Includes U+1F680 to U+1F6FF, with surrogate pairs:<\/p>\n<pre><code>toUTF16(0x1F680)\n&gt; &quot;\\uD83D\\uDE80&quot;\ntoUTF16(0x1F6FF)\n&gt; &quot;\\uD83D\\uDEFF&quot;\n<\/code><\/pre>\n<p>Also covered by that same regex:<\/p>\n<pre><code>\/[\\ud800-\\udbff][\\udc00-\\udfff]\/.test(String.fromCodePoint(0x1F680))\n&gt; true\n\/[\\ud800-\\udbff][\\udc00-\\udfff]\/.test(String.fromCodePoint(0x1F6FF))\n&gt; true\n<\/code><\/pre>\n<h4 id=\"miscellaneous-symbols\"><strong>Miscellaneous Symbols<\/strong><\/h4>\n<p>Includes U+2600 to U+26FF, with surrogate pairs:<\/p>\n<pre><code>toUTF16(0x2600)\n&gt; &quot;\\u2600&quot;\ntoUTF16(0x26FF)\n&gt; &quot;\\u26FF&quot;\n<\/code><\/pre>\n<p>We can write a regex for this like so:<\/p>\n<pre><code>\/[\\u2600-\\u26FF]\/\n\/[\\u2600-\\u26FF]\/.test(String.fromCodePoint(0x2600))\n&gt; true\n\/[\\u2600-\\u26FF]\/.test(String.fromCodePoint(0x26FF))\n&gt; true\n<\/code><\/pre>\n<h4 id=\"lodashs-mysterious-other-regex\">lodash\u2019s mysterious other regex<\/h4>\n<p>There\u2019s another section in the beginning of that original lodash regex we\nhaven\u2019t looked at yet:<\/p>\n<pre><code>(?:\\ud83c[\\udde6-\\uddff]){2}\n<\/code><\/pre>\n<p>If we examine what those characters represent, we get:<\/p>\n<pre><code>&quot;\\ud83c\\udde6&quot;\n&gt; &quot;\ud83c\udde6&quot;\n&quot;\\ud83c\\uddff&quot;\n&gt; &quot;\ud83c\uddff&quot;\n<\/code><\/pre>\n<p>Holy camoley, what the heck are those? I\u2019ll tell you what those are: those are\nthe <a href=\"https:\/\/en.wikipedia.org\/wiki\/Regional_Indicator_Symbol\">regional indicator symbol\nletters<\/a>\n<a href=\"http:\/\/emojipedia.org\/regional-indicator-symbol-letter-a\/\">A<\/a>-<a href=\"http:\/\/emojipedia.org\/regional-indicator-symbol-letter-z\/\">Z<\/a>.\nThese are used to create flags for various countries. 
For instance:<\/p>\n<pre><code>&quot;\\ud83c\\uddfa&quot;\n&gt; &quot;\ud83c\uddfa&quot;\n&quot;\\ud83c\\uddf8&quot;\n&gt; &quot;\ud83c\uddf8&quot;\n\/\/ when combining &quot;u&quot; + &quot;s&quot;:\n&quot;\\ud83c\\uddfa&quot; + &quot;\\ud83c\\uddf8&quot;\n&gt; &quot;\ud83c\uddfa\ud83c\uddf8&quot;\n<\/code><\/pre>\n<p>So that\u2019s a good section to keep around. <strong>The regex so far is:<\/strong><\/p>\n<pre><code>(?:[\\u2700-\\u27bf]|(?:\\ud83c[\\udde6-\\uddff]){2}|[\\ud800-\\udbff][\\udc00-\\udfff])\n<\/code><\/pre>\n<h3 id=\"lets-test-it-out\">Let\u2019s test it out<\/h3>\n<p>I\u2019m relying on <a href=\"https:\/\/raw.githubusercontent.com\/iamcal\/emoji-data\/master\/emoji.json\">Emoji-data\u2019s\njson<\/a> to\nprovide a library of every emoji. When we run this regular expression against\nthat list, we get 746 matches, 99 misses. Let\u2019s go through the misses:<\/p>\n<h4 id=\"keycaps\">Keycaps<\/h4>\n<p>There are <a href=\"https:\/\/en.wikipedia.org\/wiki\/Basic_Latin_(Unicode_block)\">12 keycap\nemojis<\/a>\n(<a href=\"http:\/\/emojipedia.org\/keycap-number-sign\/\">#\ufe0f\u20e3\ufe0f<\/a>,\n<a href=\"http:\/\/emojipedia.org\/keycap-asterisk\/\">*\ufe0f\u20e3<\/a> and\n<a href=\"http:\/\/emojipedia.org\/keycap-digit-zero\/\">0\ufe0f\u20e3\ufe0f<\/a>\u2013<a href=\"http:\/\/emojipedia.org\/keycap-digit-nine\/\">9\ufe0f\u20e3\ufe0f<\/a>),\nwhich look like:<\/p>\n<pre><code>&quot;\\u0030\\uFE0F\\u20E3&quot;\n&gt; &quot;0\ufe0f\u20e3\ufe0f&quot;\n&quot;\\u0039\\uFE0F\\u20E3&quot;\n&gt; &quot;9\ufe0f\u20e3&quot;\n&quot;\\u0023\\uFE0F\\u20E3&quot;\n&gt; &quot;#\ufe0f\u20e3&quot;\n&quot;\\u002A\\uFE0F\\u20E3&quot;\n&gt; &quot;*\ufe0f\u20e3&quot;\n<\/code><\/pre>\n<p>(That middle \u201c\\uFE0F\u2019 is optional, by the way.)<\/p>\n<p>These are covered by the following:<\/p>\n<pre><code>\/[\\u0023-\\u0039]\\ufe0f?\\u20e3\/\n\/[\\u0023-\\u0039]\\ufe0f?\\u20e3\/.test(&quot;\\u0023\\uFE0F\\u20E3&quot;)\n&gt; 
true\n\/[\\u0023-\\u0039]\\ufe0f?\\u20e3\/.test(&quot;\\u0039\\u20E3&quot;)\n&gt; true\n<\/code><\/pre>\n<h4 id=\"other-miscellaneous-emoji\"><strong>Other Miscellaneous Emoji<\/strong><\/h4>\n<p>Towards the bottom of the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Emoji#Unicode_blocks\">Unicode Block Emoji\nentry<\/a> on Wikipedia is the\nfollowing:<\/p>\n<blockquote>\n<p>Additional emoji can be found in the following Unicode blocks:\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Arrows_(Unicode_block)\">Arrows<\/a> (8 codepoints\nconsidered emoji), <a href=\"https:\/\/en.wikipedia.org\/wiki\/Basic_Latin_(Unicode_block)\">Basic\nLatin<\/a> (12), <a href=\"https:\/\/en.wikipedia.org\/wiki\/CJK_Symbols_and_Punctuation\">CJK\nSymbols and\nPunctuation<\/a> (2),\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Enclosed_Alphanumeric_Supplement\">Enclosed Alphanumeric\nSupplement<\/a>(41),\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Enclosed_Alphanumerics\">Enclosed Alphanumerics<\/a>\n(1), <a href=\"https:\/\/en.wikipedia.org\/wiki\/Enclosed_CJK_Letters_and_Months\">Enclosed CJK Letters and\nMonths<\/a> (2),\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Enclosed_Ideographic_Supplement\">Enclosed Ideographic\nSupplement<\/a> (15),\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/General_Punctuation\">General Punctuation<\/a> (2),\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Geometric_Shapes\">Geometric Shapes<\/a> (8), <a href=\"https:\/\/en.wikipedia.org\/wiki\/Latin-1_Supplement_(Unicode_block)\">Latin-1\nSupplement<\/a>\n(2), <a href=\"https:\/\/en.wikipedia.org\/wiki\/Letterlike_Symbols\">Letterlike Symbols<\/a> (2),\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Mahjong_Tiles_(Unicode_block)\">Mahjong Tiles<\/a>\n(1), <a href=\"https:\/\/en.wikipedia.org\/wiki\/Miscellaneous_Symbols_and_Arrows\">Miscellaneous Symbols and\nArrows<\/a> (7),\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Miscellaneous_Technical\">Miscellaneous Technical<\/a>\n(18), <a 
href=\"https:\/\/en.wikipedia.org\/wiki\/Playing_cards_in_Unicode\">Playing Cards<\/a>\n(1), and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Supplemental_Arrows-B\">Supplemental\nArrows-B<\/a> (2).<\/p>\n<\/blockquote>\n<p>Why the heck are these other random emoji scattered around like detritus? I\nbelieve the reason is: \u201cbecause of history\u201d. But I don\u2019t really know. If you\nknow, leave a comment and educate us all!<\/p>\n<p>I won\u2019t go through these one by one. <a href=\"https:\/\/github.com\/thekevinscott\/emoji-tree\/blob\/master\/lib\/emojiRegex.js\">You can look in my Github repo for a\nbreakdown of the regex for each\nblock.<\/a>\nSuffice to say the regex that covers all these pesky buggers is:<\/p>\n<pre><code>[\\u0023-\\u0039]\\ufe0f?\\u20e3|\\u3299|\\u3297|\\u303d|\\u3030|\\u24c2|\\ud83c[\\udd70-\\udd71]|\\ud83c[\\udd7e-\\udd7f]|\\ud83c\\udd8e|\\ud83c[\\udd91-\\udd9a]|\\ud83c[\\udde6-\\uddff]|[\\ud83c\\ude01-\\ude02]|\\ud83c\\ude1a|\\ud83c\\ude2f|[\\ud83c\\ude32-\\ude3a]|[\\ud83c\\ude50-\\ude51]|\\u203c|\\u2049|[\\u25aa-\\u25ab]|\\u25b6|\\u25c0|[\\u25fb-\\u25fe]|\\u00a9|\\u00ae|\\u2122|\\u2139|\\ud83c\\udc04|[\\u2600-\\u26FF]|\\u2b05|\\u2b06|\\u2b07|\\u2b1b|\\u2b1c|\\u2b50|\\u2b55|\\u231a|\\u231b|\\u2328|\\u23cf|[\\u23e9-\\u23f3]|[\\u23f8-\\u23fa]|\\ud83c\\udccf|\\u2934|\\u2935|[\\u2190-\\u21ff]\n<\/code><\/pre>\n<h3 id=\"conclusion\">Conclusion<\/h3>\n<p>Which means that\u2026 drum roll\u2026 the final regex for parsing emojis 
is:<\/p>\n<pre><code>(?:[\\u2700-\\u27bf]|(?:\\ud83c[\\udde6-\\uddff]){2}|[\\ud800-\\udbff][\\udc00-\\udfff]|[\\u0023-\\u0039]\\ufe0f?\\u20e3|\\u3299|\\u3297|\\u303d|\\u3030|\\u24c2|\\ud83c[\\udd70-\\udd71]|\\ud83c[\\udd7e-\\udd7f]|\\ud83c\\udd8e|\\ud83c[\\udd91-\\udd9a]|\\ud83c[\\udde6-\\uddff]|[\\ud83c\\ude01-\\ude02]|\\ud83c\\ude1a|\\ud83c\\ude2f|[\\ud83c\\ude32-\\ude3a]|[\\ud83c\\ude50-\\ude51]|\\u203c|\\u2049|[\\u25aa-\\u25ab]|\\u25b6|\\u25c0|[\\u25fb-\\u25fe]|\\u00a9|\\u00ae|\\u2122|\\u2139|\\ud83c\\udc04|[\\u2600-\\u26FF]|\\u2b05|\\u2b06|\\u2b07|\\u2b1b|\\u2b1c|\\u2b50|\\u2b55|\\u231a|\\u231b|\\u2328|\\u23cf|[\\u23e9-\\u23f3]|[\\u23f8-\\u23fa]|\\ud83c\\udccf|\\u2934|\\u2935|[\\u2190-\\u21ff])\n<\/code><\/pre>\n<p>Hopefully that dispels some of the confusion around parsing emoji.<\/p>\n<p><em>Update: Thanks to Matteo for pointing out a bug in the above regex!<\/em><\/p>"},{"title":"Popular Use Cases for Chatbots","link":"https:\/\/thekevinscott.com\/popular-use-cases-for-chatbots\/","pubDate":"Tue, 18 Oct 2016 07:00:00 +0000","guid":"https:\/\/thekevinscott.com\/popular-use-cases-for-chatbots\/","description":"<p><img src=\"marcus.jpeg\" alt=\"David Marcus talking about bots\">\n<capt>David Marcus talking \u2018bout bots.<\/capt><\/p>\n<p>It\u2019s been six months since Facebook launched their support for chatbots on\nMessenger. Bots hit a <a href=\"https:\/\/news.ycombinator.com\/item?id=11914812\">huge hype\ncycle<\/a>, leading even <a href=\"https:\/\/www.engadget.com\/2016\/09\/13\/facebook-messenger-chief-admits-bot-launch-was-overhyped\/\">David\nMarcus to\nadmit<\/a>\nthat bots were a bit overhyped.<\/p>\n<p>Nonetheless, some bots have carved out a strong niche for themselves and\npersevered by playing to their strengths. 
I\u2019m curious to understand what the\nideal use cases for bots are, and when they can match or exceed equivalent app or web\nexperiences.<\/p>\n<h3 id=\"mo-apps-mo-problems\">Mo\u2019 Apps, Mo\u2019 Problems<\/h3>\n<p><a href=\"https:\/\/chatbotsmagazine.com\/how-bots-will-completely-kill-websites-and-mobile-apps-656db8e6fc03\">Chatbots can often be thought of as replacements or supplements to existing\napps or\nwebsites<\/a>.\nWhat are some common gripes around apps or websites?<\/p>\n<ol>\n<li><strong>Friction<\/strong>\u2014 Apps need to be downloaded, and websites often suffer from bloat\nand long loading times. <a href=\"http:\/\/www.emarketer.com\/Article\/How-Many-Apps-Do-Smartphone-Owners-Use\/1013309\">A majority of\napps<\/a>\ndon\u2019t even last a <em>day<\/em>, and users are spending the <a href=\"https:\/\/techcrunch.com\/2015\/06\/22\/consumers-spend-85-of-time-on-smartphones-in-apps-but-only-5-apps-see-heavy-use\/\">majority of their\ntime<\/a>\nin just 5 apps. Meanwhile, people are <a href=\"https:\/\/blog.kissmetrics.com\/loading-time\/\">notoriously unforgiving of slow-loading\nwebsites<\/a>, and Google even penalizes\n<a href=\"https:\/\/webmasters.googleblog.com\/2010\/04\/using-site-speed-in-web-search-ranking.html\">sites that load too\nslowly<\/a>,\nbut despite all that the <a href=\"https:\/\/www.soasta.com\/blog\/page-bloat-average-web-page-2-mb\/\">average website weighs in at more than\n2MB<\/a>.<\/li>\n<li><strong>Confusing Design<\/strong>\u2014 Poorly designed apps can be a nightmare to navigate, and\nlead to frustrated users. If a team doesn\u2019t have the bandwidth or the expertise\nto craft a properly designed experience it can make a product unusable.<\/li>\n<li><strong>Information Overload<\/strong>\u2014 Websites can be complicated and suffer from way too\nmuch information. 
In the interest of trying to appeal to as many people as\npossible, the majority of that information is going to be irrelevant or\ndistracting.<\/li>\n<\/ol>\n<p><img src=\"microphone.png\" alt=\"An image of a microphone icon\"><\/p>\n<h3 id=\"bringing-a-microphone-to-a-lan-party\">Bringing a Microphone to a LAN Party<\/h3>\n<p>Another way to think about it is: what advantages does a conversational UI bring\nto the table? When is natural language a better way of interacting with a\ncomputer than a well-designed GUI?<\/p>\n<ol>\n<li><strong>Specific and urgent information<\/strong>\u2014 When a customer needs an answer\nimmediately, and that question is not easily discoverable, a conversational UI\ncan offer the most direct path to satisfying that need. Customer service is a\ngreat example of this use case.<\/li>\n<li><strong>Disparate Information Sources<\/strong>\u2014 Monitoring different information streams\nacross domains, providing combined feedback, and facilitating user actions\ncan often be done more easily in a bot, on the fly. <a href=\"https:\/\/www.linkedin.com\/pulse\/5-futuristic-use-cases-bots-business-tech-pietro-casella\">As Pietro Casella puts\nit<\/a>:\n\u201cMulti domain Bots will be Bots that present relevant information from multiple\nsources in a combined manner. They will also be bots that are able to translate\nanswers into actions on the underlying systems.\u201d<\/li>\n<li><strong>Basic actions<\/strong>\u2014 For rote or fairly basic purchases, it can be faster to\ninteract with a bot. Things like <a href=\"https:\/\/www.messenger.com\/t\/dominos\/\">ordering a\npizza<\/a> or\n<a href=\"https:\/\/www.messenger.com\/t\/1800flowers\">flowers<\/a> can often be done faster via\ntext message than through an app.<\/li>\n<li><strong>Appealing to a non-tech-savvy audience<\/strong>\u2014 We shouldn\u2019t ignore the fact that a\nwhole swath of the world is still acquiring technological literacy. 
A natural\nlanguage interface can often be a more intuitive method of interaction, if the\ninteraction is well designed. IM interactions <a href=\"https:\/\/chatbotslife.com\/the-half-life-of-knowledge-5f13b3ae3a07#.tavlqrhfs\">receive some of the highest\ncustomer satisfaction\nscores<\/a>\ncompared to email, phone calls, and websites.<\/li>\n<\/ol>\n<h3 id=\"opportunities-for-chatbots-to-step-up\">Opportunities for Chatbots to Step Up<\/h3>\n<p>So with all that, where are the areas where chatbots can match or exceed GUIs?<\/p>\n<h4 id=\"a-users-need-is-urgent-andor-specific\">A User\u2019s Need is Urgent and\/or Specific<\/h4>\n<p><a href=\"https:\/\/chatbotsmagazine.com\/how-with-the-help-of-chatbots-customer-service-costs-could-be-reduced-up-to-30-b9266a369945\">Customer service is probably the flagship use case for a\nchatbot<\/a>.\nRequests that are context-specific (location, time-based,\npurchase history, etc.) are good candidates to be offloaded to a chatbot.<\/p>\n<p>Additionally, because downloading an app or navigating to a bloated website are\nhigh-friction activities, interacting via text can often be a much more\nsatisfying experience.<\/p>\n<p><a href=\"https:\/\/www.messenger.com\/t\/1800flowers\">1\u2013800-Flowers<\/a> is a great example of a\npurchase interaction that doesn\u2019t need an app.<\/p>\n<h4 id=\"a-user-is-struggling-to-navigate-your-app-or-website\">A User is Struggling to Navigate Your App or Website<\/h4>\n<p>In a perfect world, every app would be wildly intuitive and users would never be\nconfused. 
Obviously that\u2019s not our world.<\/p>\n<p><a href=\"https:\/\/chatbotsmagazine.com\/when-do-bots-beat-apps-when-context-and-convenience-matter-most-443c9191bb2b\">Bots can simplify a complicated system by responding directly to a user\u2019s\nneeds<\/a>.\nAlternatively, chatbots can act as a user\u2019s designated agent, going out and\nperforming their bidding.<\/p>\n<p>Two great examples of chatbots that navigate tricky systems on the user\u2019s behalf\n(and also tie together disparate data sources) are\n<a href=\"http:\/\/www.donotpay.co.uk\/signup.php\">DoNotPay<\/a>, a chatbot for challenging\nparking tickets, and <a href=\"http:\/\/m.me\/mytruebill\">Truebill<\/a>, a bot that identifies\nand cancels unnecessary subscriptions.<\/p>\n<h4 id=\"the-information-youre-delivering-can-be-personalized\">The Information You\u2019re Delivering Can Be Personalized<\/h4>\n<p>Stefan Kojouharov calls this a \u201csuperpower\u201d: the <a href=\"http:\/\/venturebeat.com\/2016\/10\/16\/chatbots-have-a-superpower-over-apps\/\">ability for bots to be\npersonalized<\/a>.<\/p>\n<blockquote>\n<p>Websites and apps are meant to solve problems for groups, but bots can solve\nproblems for individuals.<\/p>\n<\/blockquote>\n<p>It feels like this could potentially upend the way we as designers interact with\nusers. Designing truly unique interactions with users will require new methods\nand tools beyond what we have now, but could unlock far more engaging experiences\nthan what we have today.<\/p>\n<h4 id=\"a-user-wants-to-remain-engaged\">A User Wants to Remain Engaged<\/h4>\n<p>Proactive notifications could be an opportunity for bots. 
Being able to reach\nout and let a user know something in a personalized or context-aware fashion\ngives bots a leg up on more naive app push notifications, and being able to\nmessage a user where they already are (for instance, on Facebook Messenger or\nvia SMS) and drop right into the interface is an opportunity to reduce friction.<\/p>\n<p>News bots, like <a href=\"https:\/\/www.messenger.com\/t\/cnn\">CNN\u2019s bot<\/a>, are great examples\nof chatbots that proactively keep the user abreast of developments.<\/p>\n<p>The flipside of this is that when bots inevitably overstep the bounds of good\ntaste, they\u2019ll be treated as spam and banned. <strong>Teach your chatbots to be good\ninternet citizens and don\u2019t over-notify.<\/strong><\/p>\n<h4 id=\"a-user-needs-specific-answers-outside-of-business-hours\">A User Needs Specific Answers Outside of Business Hours<\/h4>\n<p>Before the internet, contacting a business meant waiting until business hours.\nOnce the internet came along, you could interact with a digital storefront at\nall hours of the day. Bots promise to do the same thing to real-time\ncommunication with businesses, by making staff available 24\/7.<\/p>\n<blockquote>\n<p>\u201cOn a conversation you expect immediate response. Major advances in real time\ndata availability are making the conversation possible.\u201d \u2014 <a href=\"https:\/\/www.linkedin.com\/pulse\/5-futuristic-use-cases-bots-business-tech-pietro-casella\">Pietro\nCasella<\/a><\/p>\n<\/blockquote>\n<h4 id=\"a-designer-needs-a-rough-and-dirty-prototyping-tool\">A Designer Needs a Rough and Dirty Prototyping Tool<\/h4>\n<p>With little visual design and, increasingly, no need to write a line of code,\nbots could be an ideal way to test ideas and assumptions on the fly. 
No need to\nresort to visual mockups, wireframes, or websites; just launch a bot and collect\nyour insights!<\/p>\n<h4 id=\"a-designer-needs-a-tool-to-break-the-ice-during-a-group-interaction\">A Designer Needs a Tool to Break the Ice During a Group Interaction<\/h4>\n<p>Group interactions, especially among strangers or colleagues, can be staid. A\nbot could be a welcome participant for introducing some levity or breaking the\nice.<\/p>\n<p>A great example of this from my personal experience is the <a href=\"http:\/\/giphy.com\/posts\/slack-adds-giphy-to-every-chatroom-wut\">Giphy\nchatbot<\/a> in\nSlack, ubiquitous in most groups I frequent.<\/p>\n<blockquote>\n<p>\u201cThey hold out the promise of being able to generate more conversation than\nmight have otherwise occurred between the humans. To act as conversation\nstarters\/primers. And\/or conversational maintainers.\u201d \u2014 <a href=\"https:\/\/techcrunch.com\/2016\/05\/05\/a-few-words-on-chatbots\/\">Natasha\nLomas<\/a><\/p>\n<\/blockquote>\n<h3 id=\"conclusion\">Conclusion<\/h3>\n<p>While undoubtedly overhyped, chatbots still have a number of use cases where\nthey win out over traditional apps or websites. And it feels like we have yet to\nsee the killer bots (no pun intended) emerge on the platform.<\/p>"},{"title":"Testing Chatbots","link":"https:\/\/thekevinscott.com\/testing-chatbots-how-to-ensure-a-bot-says-the-right-thing-at-the-right-time\/","pubDate":"Wed, 20 Jul 2016 09:00:00 +0000","guid":"https:\/\/thekevinscott.com\/testing-chatbots-how-to-ensure-a-bot-says-the-right-thing-at-the-right-time\/","description":"<p>I love tests. I used to abhor them, but I\u2019ve been burned enough times by enough codebases to appreciate a robust testing suite, and its ability to ensure well-functioning code and developer sanity.<\/p>\n<p>That\u2019s why, as we developed our scripted bot, Emoji Salad, an Emoji Pictionary\nSMS-based chat game, tests were baked in from the start. 
These tests, and in\nparticular the integration tests, have been vital throughout development.\nThey\u2019ve allowed us to iterate faster, cheaper, and with confidence that we\u2019re\nnot introducing new bugs along the way.<\/p>\n<p>Chatbots bring their own set of challenges to testing. For us, those include,\nforemost, testing across environments: integrating third party platforms, like\nFacebook Messenger or Telegram, into a test suite is hard without a\nservice-level separation between message delivery and bot logic. Additionally,\nif your platform charges for usage (like Twilio) and you\u2019re running your test\nsuite on it, testing can quickly send you to the poorhouse. Finally, while\nthere\u2019s no substitute for human QA, testing that the bot can coordinate across\nmultiple people is a huge drag; having an automated solution for testing means\nyou\u2019ll test more often.<\/p>\n<p><img src=\"bot.gif\" alt=\"Conspiring Bot\"><\/p>\n<h3 id=\"bots-conspiring-against-your-tests\">Bots Conspiring Against Your Tests<\/h3>\n<p>Some of the challenges we\u2019ve faced while writing tests for our scripted Chatbot\ninclude:<\/p>\n<h4 id=\"chatbots-can-vary-the-things-they-say\">Chatbots Can Vary the Things They Say<\/h4>\n<p>A well-designed bot <a href=\"https:\/\/chatbotsmagazine.com\/designing-a-chatbots-personality-52dcf1f4df7d\">brings some personality to the\ntable<\/a>,\nand a prime component of that is variability in conversation. 
Though our bot\nfollows a script, it picks at random from a number of prewritten responses.\nOther sources of variability might be messages dependent on time of day, or\nlocation.<\/p>\n<h4 id=\"a-bot-can-say-the-wrong-thing-in-the-right-way\">A Bot Can Say the Wrong Thing in the Right Way<\/h4>\n<p>A common scenario in our game is a message like:<\/p>\n<p><img src=\"common-scenario.png\" alt=\"A common scenario\"><\/p>\n<p>Where the original code is:<\/p>\n<pre><code>\ud83d\udc7e Hey %(avatar)s %(nickname)s, this is your time to shine. Your phrase is: %(clue)s\n<\/code><\/pre>\n<p>Our Test Suite accepts wildcards for any of the arguments, allowing us to\nseparate the contents from the container, and test each separately.<\/p>\n<h4 id=\"bots-initiating-conversations-need-attention-too\">Bots Initiating Conversations Need Attention, Too<\/h4>\n<p>Not all bot communication is initiated by the user. Bots can proactively message\nusers too. For instance, Poncho sent me a severe weather alert the other day:<\/p>\n<p><img src=\"poncho.png\" alt=\"Poncho\"><\/p>\n<h4 id=\"bots-might-coordinate-across-multiple-users\">Bots Might Coordinate Across Multiple Users<\/h4>\n<p>A particular challenge for us is QAing functionality across multiple users.<\/p>\n<p>Having a testing suite has made group QA much less painful. 
In our suite we can\nspecify the user doing the sending in an array of messages:<\/p>\n<pre><code>[\n{ player: players[0], msg: 'invite ' + players[1].number },\n{ player: players[1], msg: 'yes' },\n{ player: players[1], msg: players[1].nickname },\n{ player: players[1], msg: players[1].avatar },\n{ player: players[1], msg: 'invite ' + players[2].number },\n{ player: players[2], msg: 'yes' },\n{ player: players[2], msg: players[2].nickname }\n];\n<\/code><\/pre>\n<h4 id=\"isolating-third-party-components\">Isolating Third Party Components<\/h4>\n<p>We rely on a semantic API as a backup check for answer submissions (like\nmatching <em>Mario Brothers<\/em> to <em>Super Mario Bros<\/em>.) We don\u2019t currently have a\ngreat solution for mocking third party components within integration tests.<\/p>\n<p>As we move into more NLP-driven interactions with our bot, this will quickly\nbecome more of a concern. If you\u2019ve got thoughts on good ways to test this,\nleave a comment!<\/p>\n<h3 id=\"how-we-built-our-test-suite\">How We Built Our Test Suite<\/h3>\n<p>Our Bot is <a href=\"https:\/\/chatbotsmagazine.com\/we-moved-to-a-services-based-architecture-while-building-our-bot-and-it-is-awesome-e64316d83922\">built on a number of different micro\nservices<\/a>,\neach one with its own unit tests, while the integration tests sit apart in their\nown codebase.<\/p>\n<p><img src=\"suite.png\" alt=\"Our testing suite\">\n<capt>Our integration testing architecture<\/capt><\/p>\n<p>We have a dedicated testing queue that receives messages from the Test Suite,\nand captures responses from the Bot. 
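<\/p>
<p>To make the response matching concrete, here is a minimal sketch (all names here are illustrative, not our actual internals) of how a captured response can be checked against every scripted variation, treating <code>%(name)s<\/code> placeholders as wildcards:<\/p>

```javascript
// Illustrative sketch: match a captured bot response against every
// possible scripted variation, treating %(name)s placeholders as wildcards.
// These helper names are hypothetical, not Emoji Salad's actual code.

// Escape regex metacharacters in the literal parts of a template.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Turn a script template like "Your phrase is: %(clue)s" into a RegExp,
// replacing each (already-escaped) %(name)s placeholder with a wildcard.
function templateToRegExp(template) {
  return new RegExp(
    '^' + escapeRegExp(template).replace(/%\\\(\w+\\\)s/g, '(.+)') + '$'
  );
}

// A response is valid if it matches any of the prewritten variations.
function matchesAnyVariation(received, variations) {
  return variations.some((t) => templateToRegExp(t).test(received));
}

const variations = [
  'Hey %(nickname)s, this is your time to shine. Your phrase is: %(clue)s',
  'Your turn, %(nickname)s! Your phrase is: %(clue)s',
];

console.log(
  matchesAnyVariation(
    'Hey Kevin, this is your time to shine. Your phrase is: 🍕',
    variations
  )
); // → true
```

<p>A fuller version could use the capture groups to report whether it was the container or the content that failed to match.<\/p>
<p>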
The Test Suite pings the testing queue\nperiodically for new messages, and each incoming message ID is matched up with\nthe received message ID.<\/p>\n<p>There are four types of responses the Test Suite can expect to get back:<\/p>\n<ol>\n<li>No response \u2014 <em>for instance, if someone is blacklisted<\/em><\/li>\n<li>An invalid\nresponse \u2014 <em>the wrong message entirely<\/em><\/li>\n<li>A valid response with invalid\ndata \u2014 <em>aka, the correct container but the wrong content<\/em><\/li>\n<li>A valid response \u2014 <em>any of a number of variations<\/em><\/li>\n<\/ol>\n<p>The Test Suite will get the response by querying for a specific message ID (the\n\u2018sent\u2019 ID).<\/p>\n<p>For our bot, message variability is typically random, so our testing framework\npulls all possible variations the bot might respond with and checks for the\npresence of any of them. If there\u2019s a match, that\u2019s a valid response. For\ninvalid responses, it\u2019s easy to see from the reported error whether it\u2019s the\ndata or the container that\u2019s incorrect.<\/p>\n<p>Finally, for messages where we expect no response, we set up a timer to check\nthat no message matching the sent message was received within a certain timespan\n(2 seconds). This timed solution works for us since we only have a handful of\nedge cases where we expect no response.<\/p>"},{"title":"Services-based architectures","link":"https:\/\/thekevinscott.com\/we-moved-to-a-services-based-architecture-while-building-our-bot-and-it-is-awesome\/","pubDate":"Tue, 12 Jul 2016 07:00:00 +0000","guid":"https:\/\/thekevinscott.com\/we-moved-to-a-services-based-architecture-while-building-our-bot-and-it-is-awesome\/","description":"<p>The first draft of Emoji Salad, our Emoji Pictionary bot, was a monolithic\nNode.js app. The server would respond only when a request came in via SMS. 
If\nthere was a critical error, the whole server would fall on our heads, making it\ndifficult to diagnose the error and impossible to continue conversations. On\ntop of that, deployments were nightmares that brought the possibility of\ndisrupted conversations if messages were received during a deployment.<\/p>\n<p>Our current implementation uses a microservice-based approach:<\/p>\n<p><img src=\"diagram.png\" alt=\"A diagram of our architecture\"><\/p>\n<p>Moving to a services-based architecture has brought some big wins for us:<\/p>\n<h4 id=\"separation-of-concerns\">Separation of concerns<\/h4>\n<p>Isolating each service to its core functionality has made it easier to reason\nabout what each one does. Since each service must interact through a defined\nAPI, we can drastically refactor one service without affecting any of the\nothers.<\/p>\n<p>For example, our first draft used an HTTP-like model, where server routes would\nbe matched via regex against incoming messages. At a certain point in\ndevelopment, we realized we wanted the bot to be able to initiate messages to\nusers, things like<\/p>\n<blockquote>\n<p>Hey, we\u2019re still waiting on that clue submission!<\/p>\n<\/blockquote>\n<p>and gentle encouragements like:<\/p>\n<blockquote>\n<p>This clue\u2019s a doozy! 
Don\u2019t forget, the emojis are \ud83d\udcfd\ud83e\udd16\u2764\ufe0f\ud83e\udd16<\/p>\n<\/blockquote>\n<p>With a service-oriented architecture, we were able to rebuild how the Bot\nservice processed messages, so it could spin itself up to initiate messages to\nusers, without affecting the other services (or even having to redeploy them).<\/p>\n<p>Also, since we\u2019ve done the work of defining how exactly the Bot expects to\nreceive incoming messages, adding support for other messaging platforms (like\nFacebook Messenger) is a cinch.<\/p>\n<h4 id=\"improved-testability\">Improved Testability<\/h4>\n<p>With isolated services, it\u2019s easier to write targeted tests.<\/p>\n<p>By requiring each service to have a well-designed API for interaction, it\nbecomes easier to make sure you\u2019re testing the right things. Smaller services\nshould have fewer dependencies, too. And as an added bonus, service-only test\nsuites should take less time to run, which means you can run them more often.<\/p>\n<p>We do have extensive integration tests across all the services, which is a topic\nfor another article.<\/p>\n<h4 id=\"easier-deployments\">Easier Deployments<\/h4>\n<p>It\u2019s easier to deploy individual services, and fix them when they go down. 
For\ninstance, our message queues rarely change; we have the confidence to know that\nif the Bot itself goes down, we\u2019ll still be collecting incoming messages from\nusers, and we\u2019ll be able to handle those messages once the Bot comes back\nonline.<\/p>\n<p>Services do introduce overhead around managing multiple services and their\ndependencies, but Vagrant and Docker go a long way towards smoothing out that\nprocess.<\/p>\n<hr>\n<p>There\u2019s plenty more refactoring we could do; for instance, we\u2019re eager to break\napart our Bot service further, which is currently handling both the message\nparsing and the logic of the script driving the bot interactions.<\/p>\n<p>Overall, moving to a services-based architecture has allowed us to iterate\nfaster with much more overall stability, meaning a better bot experience from\nstart to finish.<\/p>"},{"title":"Cross Platform Bots","link":"https:\/\/thekevinscott.com\/cross-platform-bots\/","pubDate":"Wed, 25 May 2016 09:56:00 +0000","guid":"https:\/\/thekevinscott.com\/cross-platform-bots\/","description":"<p>In 2007, when Apple released the iPhone, Jobs made a controversial decision not\nto allow Flash on iOS. Given the long history between Adobe and Apple, not to\nmention the prevalence of Flash on the 2007-era internet, this was more than a\nlittle shocking. As a result of the flak, Jobs penned an essay <a href=\"http:\/\/www.apple.com\/hotnews\/thoughts-on-flash\/\">where he wrote\nthat the most important reason<\/a>\nfor not allowing Flash on the iPhone was because:<\/p>\n<blockquote>\n<p>Flash is a cross platform development tool. It is not Adobe\u2019s goal to help\ndevelopers write the best iPhone, iPod and iPad apps. 
It is their goal to help\ndevelopers write cross platform apps.\n\u2026\nOur motivation is simple \u2014 we want to provide the most advanced and innovative\nplatform to our developers, and we want them to stand directly on the shoulders\nof this platform and create the best apps the world has ever seen. \u2014 <a href=\"http:\/\/www.apple.com\/hotnews\/thoughts-on-flash\/\">Steve\nJobs<\/a><\/p>\n<\/blockquote>\n<p>Today, in the face of a rapid onslaught of bots, designers are faced with\nensuring a high-quality experience across a myriad of messaging platforms. Some\nargue for consistency across platforms:<\/p>\n<blockquote>\n<p>Users should experience a similar, consistent interaction across platforms with\nyour bot, and be able to share these services with multiple users on different\nplatforms. For instance, if you\u2019re on FB Messenger, you should be able to\ncoordinate your shopping with users on Kik, WhatsApp, LINE &amp; etc easily without\nleaving the Facebook interface. \u2014\n<a href=\"https:\/\/medium.com\/chat-bots\/why-the-future-of-bots-will-be-multi-platform-67c503afaa7#.nfe3hmg88\">Kip<\/a><\/p>\n<\/blockquote>\n<p>And:<\/p>\n<blockquote>\n<p>Does cross platform design become a thing of the past?? Adding functionality is\nas simple as adding a user to your chat \u2014 no longer a need to design for\nseparate platforms. 
\u2014 <a href=\"https:\/\/medium.com\/chat-bots\/the-message-is-the-medium-11e2a4da145c#.6l0ch4xo4\">The medium is the\nmessage<\/a><\/p>\n<\/blockquote>\n<p>I see parallels between what happened with apps, and what is to come with bots.\n<a href=\"https:\/\/www.taoeffect.com\/blog\/2010\/04\/steve-jobs-response-on-section-3-3-1\/\">Jobs put it\neloquently<\/a>:<\/p>\n<blockquote>\n<p>We\u2019ve been there before, and intermediate layers between the platform and the\ndeveloper ultimately produces sub-standard apps and hinders the progress of the\nplatform.<\/p>\n<\/blockquote>\n<p>Should bots be consistent across platforms, or not? To answer that, let\u2019s see what the major differences between messaging platforms are. From there we can see if there\u2019s some grand unified \u201ctheory of everything\u201d for how to approach bot design across platforms.<\/p>\n<hr>\n<h3 id=\"differences-between-messaging-platforms\">Differences between messaging platforms<\/h3>\n<p><img src=\".\/platforms.png\" alt=\"Differences between platforms\"><\/p>\n<p>Currently, the messaging ecosystem is divided between a number of players. You\ncan think of each of these like its own Operating System. 
Every Operating System\nhas its own culture, its own language, its own best practices for facilitating\ncommunication.<\/p>\n<p><img src=\".\/city.gif\" alt=\"WeChat\"><\/p>\n<blockquote>\n<p>\u201cWeChat is like a fully fledged cityscape where all the electrical and plumbing\nhave been installed\u2026developers can come in and build all kinds of unique and\ndistinctive real estate that assists people as they go about their daily\nlives.\u201d<br> \u2014 <a href=\"http:\/\/www.mslgroup.cn\/whitepapers\/MSLGROUP_We_Chat_about_WeChat_Dec2013_EN.pdf\">We chat about\nWeChat<\/a>,\nMSL Group<\/p>\n<\/blockquote>\n<p>There are three broad areas where platforms differentiate themselves from each\nother: technical capabilities, audience, and communication styles.<\/p>\n<h4 id=\"1-technical-capabilities\">1) Technical capabilities<\/h4>\n<p>Every platform supports a unique subset of technical capabilities.<\/p>\n<p>The most basic, and certainly the most ubiquitous, is SMS. Almost everyone with\na cell phone has SMS, and everyone knows how to text. That ubiquity is nothing\nto be scoffed at; it\u2019s hard to find an easier onboarding process than sending a\ntext.<\/p>\n<p><img src=\".\/typing.gif\" alt=\"A Chat Bot typing indicator\"><\/p>\n<p>At the same time, (non-iMessage) SMS lacks some features that would be <em>really<\/em>\nnice to have. There are no delivery receipts, no typing indicators. You can send\ntext and pictures, but forget about anything like Facebook\u2019s structured\nmessages. And if your app depends on sending rapid-fire messages that arrive in\norder, well, you\u2019re in for a <a href=\"https:\/\/www.twilio.com\/help\/faq\/sms\/can-my-sms-messages-arrive-in-order\">world of\nhurt<\/a>.<\/p>\n<p><img src=\".\/fatcats.png\" alt=\"Fat Cats\">\n<capt>Fat cats on Facebook<\/capt><\/p>\n<p>At the other end of the spectrum, Facebook Messenger supports typing indicators\nand read receipts, along with structured messages. 
Messenger also supports\nsending pictures, stickers, and audio. Messenger comes with its own limitations,\nhowever, like requiring bots to go through an approval process against\nFacebook\u2019s Terms of Service.<\/p>\n<p>The limitations of particular platforms lead to some interesting tradeoff\ndecisions. If you support both SMS and Facebook Messenger, should you design for\nthe lowest common denominator (SMS) and build your interactions in text-only? Or\nshould you offer structured messages to your Facebook customers, and text-only\ninteractions over SMS? What if you\u2019re building a game, and speed is of the\nessence? Are you comfortable putting your SMS customers at a disadvantage? Do\nyou force SMS customers to only play with other SMS customers?<\/p>\n<h4 id=\"2-audience\">2) Audience<\/h4>\n<p>Every platform brings with it a unique audience, with potentially unique\nexpectations.<\/p>\n<p><img src=\".\/top-3.jpeg\" alt=\"Top 3\">\n<capt>From <a href=\"http:\/\/www.emarketer.com\/\">http:\/\/www.emarketer.com\/<\/a><\/capt><\/p>\n<p>Platforms can be skewed by geography. For instance, <a href=\"http:\/\/www.visionmobile.com\/blog\/2016\/04\/messenger-vs-skype-vs-slack-vs-telegram-how-to-spot-the-winners\/\">WeChat dominates in China,\nwith 700m\nusers.<\/a>\nIn the US, Facebook Messenger tends to dominate with <a href=\"https:\/\/contently.com\/strategist\/2015\/06\/30\/the-state-of-messaging-apps-in-5-charts\/\">60% of the\nmarket<\/a>.\nAppealing to different parts of the world will drastically impact the design of\nyour app; beyond the language differences, designers need to remain cognizant of\nthe cultural implications their interactions could provoke.<\/p>\n<p>Platforms are often skewed by age. <a href=\"http:\/\/www.statista.com\/statistics\/326452\/snapchat-age-group-usa\/\">60% of Snapchat\nusers<\/a> are\nunder 24; SMS almost certainly skews older. 
Different age groups will exhibit\ndifferent expectations for how to engage with bots.<\/p>\n<p>Race, gender, levels of education, or technical proficiency are all additional\nways platforms can and will differ from one another. For instance, Slack is\nknown for its highly technical early adopters, who are presumably more\ncomfortable engaging with bots.<\/p>\n<h4 id=\"3-communication-styles\">3) Communication Styles<\/h4>\n<p>Finally, the same user may adopt drastically different styles of communication\nacross different platforms.<\/p>\n<p><img src=\"slackbot.png\" alt=\"Slackbot\">\n<capt>Brandon gets it done<\/capt><\/p>\n<p>For instance, does the platform support group or solo conversations, or both?\nBots on Slack generally interact with groups, while bots on Facebook Messenger\nare (for now) solo. SMS falls somewhere in between. These require fundamentally\ndifferent strategies for design, and bots should know whether they\u2019re speaking\nto a crowd or not.<\/p>\n<p>Average message length and expected time-to-response are other ways platforms\ncan differentiate themselves. My conversations over Facebook Messenger tend\ntowards rapid fire communication, and I find the platform encourages this with\ninstant delivery and read receipts. On the other hand, email leads to much\nlonger response times, measured anywhere from minutes to days. Bots can and\nshould mediate their verbosity and response times as appropriate.<\/p>\n<p>Finally, the types of language used across each platform may also differ.\nCertain platforms are present only on mobile, which can lead to typos and a\nshorter, more to the point communication style. Or, a user might expect\nsomething closer to natural language.<\/p>\n<h3 id=\"divide-and-conquer\">Divide and Conquer<\/h3>\n<p>Surveying the landscape, it\u2019s clear that providing a best-in-class experience\nfor a particular platform requires a thorough understanding of that platform\u2019s\nlimitations and strengths. 
At the same time, there\u2019s clearly a strong incentive,\nboth from a branding perspective and from the a resource perspective, in\nproviding consistent experiences across platforms.<\/p>\n<p>Two conclusions stand out to me: 1) We need a way of separating content from\npresentation. 2) Tools used to build bots should allow and encourage designers\nto iterate, measure and test cheaply and easily.<\/p>\n<h4 id=\"separating-content-from-presentation\">Separating Content from Presentation<\/h4>\n<p>On the web, it\u2019s <a href=\"http:\/\/www.programmableweb.com\/news\/cope-create-once-publish-everywhere\/2009\/10\/13\">common to decouple the business\nlogic<\/a>\n(the API) from the view logic (the UI). The server can speak JSON, and the\nclients can consume that JSON and display in whatever format is best for the\nplatform.<\/p>\n<p>A similar technique might come in handy with bots. We could think of the server\nas the <em>content<\/em> and the client as the <em>personality<\/em>.<\/p>\n<p><strong>Content<\/strong><\/p>\n<p>Content is the business logic and should be the same across platforms.<\/p>\n<p>Whether you order Seamless via Messenger, via SMS, or via the Seamless app, you\nknow you\u2019re going to decide what to eat and then you\u2019re going to order it and\nthen you\u2019re going to eat it. This should be consistent across platforms.<\/p>\n<p><strong>Personality<\/strong><\/p>\n<p>The way we interact with bots \u2014 that\u2019s the part that can be unique to a\nparticular platform. For instance, Poncho on Facebook Messenger is witty, and\nthat\u2019s part of its charm. However, consider that you\u2019re interacting with Poncho\nover SMS. Is it appropriate to send so many messages? What if I\u2019ve got a limited\ndata plan? Or consider a bot on Slack. Do I want a chatty Cathy bot adding to\nthe already considerable Firehose of Slack information? 
Probably not.<\/p>\n<p><img src=\".\/bender.jpeg\" alt=\"Bender\"><\/p>\n<p>While we can strive for a consistent experience when it comes to the <em>content<\/em>,\nwe should embrace diversity when it comes to the <em>personality<\/em>. Design and build\nfor a specific platform.<\/p>\n<h4 id=\"richer-tools-for-design\">Richer Tools for Design<\/h4>\n<p>On the web, designers have developed a rich set of conventions and tools for\nbuilding best-in-breed experiences. Designers will need similar tools in the bot\nworld to produce the same results, things like:<\/p>\n<ul>\n<li>A\/B Tests<\/li>\n<li>Detailed analytics<\/li>\n<li>Prototyping and wireframing tools<\/li>\n<li><a href=\"https:\/\/medium.com\/chat-bots\/usability-heuristics-for-bots-7075132d2c92#.5wcn9dift\">Software\nHeuristics<\/a>\nand best practices<\/li>\n<\/ul>\n<p>The key is that iteration be quick and easy. The faster designers are able to\niterate, the better the bots will be. I\u2019m excited by how much innovation is\nhappening in this space, and am looking forward to more; the faster\nand easier it becomes to create a bot, the faster we as designers will be able\nto find our way to best practices.<\/p>\n<hr>\n<blockquote>\n<p>We\u2019ve been there before, and intermediate layers between the platform and the\ndeveloper ultimately produces sub-standard apps and hinders the progress of the\nplatform. \u2014 <a href=\"https:\/\/www.taoeffect.com\/blog\/2010\/04\/steve-jobs-response-on-section-3-3-1\/\">Steve\nJobs<\/a><\/p>\n<\/blockquote>\n<p>Apple was able to <a href=\"http:\/\/daringfireball.net\/2010\/04\/iphone_agreement_bans_flash_compiler\">explicitly dictate which\ntechnologies<\/a>\nwere allowed on the iPhone. When violations of messenger platform etiquette\nhappen, they won\u2019t be because of technology considerations; they\u2019ll be because\nbots fail to implement design best practices. 
As a result, designers will be a\nkey differentiator between a bot that succeeds and one that fails.<\/p>"},{"title":"Usability Heuristics For Bots","link":"https:\/\/thekevinscott.com\/usability-heuristics-for-bots\/","pubDate":"Tue, 03 May 2016 10:06:00 +0000","guid":"https:\/\/thekevinscott.com\/usability-heuristics-for-bots\/","description":"<p>In 1990, Jakob Nielsen <a href=\"https:\/\/www.nngroup.com\/articles\/ten-usability-heuristics\/\">developed 10 usability\nheuristics<\/a>\nfor evaluating user interfaces. These heuristics have stood the test of time,\nproviding designers with a quick and easy way of evaluating the usability of\nsoftware interfaces against a set of universal design principles.<\/p>\n<p>While standards and best practices for building bots will continue to emerge\nover time, for now it\u2019s a little bit Wild West. Are Nielsen\u2019s heuristics still\napplicable to bots? Let\u2019s take a look and see which ones are still relevant;\nthen, let\u2019s run a heuristic evaluation against three popular bots.<\/p>\n<hr>\n<h3 id=\"nielsens-ten-heuristics\">Nielsen\u2019s Ten Heuristics<\/h3>\n<blockquote>\n<ol>\n<li>Visibility of system status\u2014 The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.<\/li>\n<\/ol>\n<\/blockquote>\n<p>The medium of bots is a conversation, and a bot conversation is governed by a\nfew fundamental facts:<\/p>\n<ol>\n<li><strong>Conversations are ephemeral<\/strong>\u2014 Messages are cheap, disposable and become\nstale with time. A message from a day ago is less valuable than a message from a\nfew seconds ago.<\/li>\n<li><strong>Old messages are not up to date<\/strong>\u2014 There\u2019s no guarantee that older\nmessages accurately reflect the current state of the system. 
The older the\nmessage, the less confident we can be in its relevance.<\/li>\n<li><strong>Limited real estate<\/strong>\u2014 There\u2019s a hard limit on how many characters are\nvisible at any one time, and a soft limit on how many words a user can\ncomprehend before becoming overwhelmed.<\/li>\n<\/ol>\n<p>So, since messages are ephemeral we should constantly keep the user updated on\nwhat\u2019s going on, while also avoiding overwhelming the user with a wall of\ninformation. How do we satisfy both these imperatives?<\/p>\n<p>Here\u2019s what I propose: <strong>The system should allow the user to request\ninformation about what is going on, through appropriate feedback within\nreasonable time.<\/strong> By putting the onus on the user to request system status\nas necessary, we avoid overwhelming the user with potentially redundant\ninformation.<\/p>\n<p>Also, <em>appropriate feedback within reasonable time<\/em> should go without saying.\nBots should be responsive, and if a network request is taking a long time, should\nnot leave the user waiting for a response. Providing a typing indicator, like a\nhuman does, is a delightful touch, and indicating how long a request will take,\nlike \u201chey, still working on this\u2026 give me a few minutes\u201d, is just being\nthoughtful.<\/p>\n<blockquote>\n<ol start=\"2\">\n<li>Match between system and the real world\u2014 The system should speak the\nusers\u2019 language, with words, phrases and concepts familiar to the user, rather\nthan system-oriented terms. Follow real-world conventions, making information\nappear in a natural and logical order.<\/li>\n<\/ol>\n<\/blockquote>\n<p>You can\u2019t get closer to speaking users\u2019 language than, well, actually speaking\nusers\u2019 language.<\/p>\n<p>The problem is that understanding language is a really hard problem, and bots\nare going to fall short of being smart enough, at least in the short term.<\/p>\n<blockquote>\n<p>Chatbots, you see, don\u2019t chat very well. 
Even those built atop the latest tech\nare limited in what they can understand and how well they can respond. For now,\ntalking to a bot is like talking to, well, a machine. That makes conversational\ncommerce feel like a false promise. But maybe the problem isn\u2019t the tech. Maybe\nit\u2019s the promise. \u2014 <a href=\"http:\/\/www.wired.com\/2016\/04\/tech-behind-bots-isnt-good-enough-deliver-promise\/\">Cade Metz,\nWired<\/a><\/p>\n<\/blockquote>\n<p><img src=\"images\/ketchup.gif\" alt=\"Ketchup bot\"><\/p>\n<p>I think the key here is to <strong>know your audience<\/strong>. Some users will appreciate a\ncommand-line style of interaction, and others will expect to converse in\nnatural language. Still others might speak in slang or abbreviations. Bots\nshould be built with a solid understanding of the audience they seek to appeal\nto.<\/p>\n<blockquote>\n<ol start=\"3\">\n<li>User control and freedom\u2014 Users often choose system functions by mistake\nand will need a clearly marked \u201cemergency exit\u201d to leave the unwanted state\nwithout having to go through an extended dialogue. Support undo and redo.<\/li>\n<\/ol>\n<\/blockquote>\n<p>Conversations in real life don\u2019t generally support the ability to undo and redo\n(I wish they did), but a conversation with a bot could.<\/p>\n<p>We should assume that <a href=\"https:\/\/en.wikipedia.org\/wiki\/Typographical_error\">fat finger\nsyndrome<\/a> will lead to all\nkinds of typos and misinterpreted messages. Interactions with bots should provide\nan escape hatch, and keep the user aware of valid options during any stage of an\ninteraction.<\/p>\n<p><img src=\"images\/fat-finger.png\" alt=\"Fat Fingers\"><\/p>\n<blockquote>\n<ol start=\"4\">\n<li>Consistency and standards\u2014 Users should not have to wonder whether\ndifferent words, situations, or actions mean the same thing. 
Follow platform\nconventions.<\/li>\n<\/ol>\n<\/blockquote>\n<p>Platform conventions are still being worked out, but in the meantime we can\ninterpret this heuristic to mean that bots should be <strong>internally consistent<\/strong>;\na bot should stick to a single style of language, <a href=\"https:\/\/chatbotsmagazine.com\/which-is-best-for-you-rule-based-bots-or-ai-bots-298b9106c81d\">whether that\u2019s natural\nlanguage, command line, or something in\nbetween.<\/a><\/p>\n<p>In the case of command-line interactions, it\u2019s extra important to distinguish\nbetween keywords and natural language interactions. One technique we\u2019ve used in\nour Emojinary bot is to denote commands as capitalized words: e.g., HELP or\nNEXT.<\/p>\n<blockquote>\n<ol start=\"5\">\n<li>Error prevention\u2014 Even better than good error messages is a careful\ndesign which prevents a problem from occurring in the first place. Either\neliminate error-prone conditions or check for them and present users with a\nconfirmation option before they commit to the action.<\/li>\n<\/ol>\n<\/blockquote>\n<p>Getting confirmation is imperative in bot interactions. Designers should build\ninteractions with the assumption that errors will happen early and often, given\nthe ambiguity and imprecision of most human dialogue. <strong>Ask for confirmation\nfrom the user for any critical step in an interaction.<\/strong><\/p>\n<blockquote>\n<ol start=\"6\">\n<li>Recognition rather than recall\u2014 Minimize the user\u2019s memory load by\nmaking objects, actions, and options visible. The user should not have to\nremember information from one part of the dialogue to another. 
Instructions for\nuse of the system should be visible or easily retrievable whenever appropriate.<\/li>\n<\/ol>\n<\/blockquote>\n<p>Judging from the research we\u2019ve done, users don\u2019t seem to read much, if at\nall. For folks who\u2019ve been building websites for years, this won\u2019t come as a\nsurprise, but it\u2019s a particularly rich irony given that the medium of bot\ncommunication is mostly text.<\/p>\n<p>While building our Emojinary bot, we\u2019ve run usability tests seeking to\nunderstand why users get lost in our onboarding flow. We\u2019ve found that a common\nproblem is users will read the first message we send and then their eyes glaze\nover. They skim the rest. After we ask them to reread the messages in more detail,\ntheir confusion evaporates.<\/p>\n<p>This is a clear example of the bot design failing the users. <strong>If users fail\nto read and comprehend the messages we send, it\u2019s our fault.<\/strong><\/p>\n<p>We need to, again, satisfy two competing goals: avoid overwhelming\nthe user with a wall of text, while also providing clues as to what her options\nare at any given point in the interaction.<\/p>\n<p><img src=\".\/images\/structured-messages.png\" alt=\"Structured Messages\"><\/p>\n<p>Facebook\u2019s <a href=\"https:\/\/developers.facebook.com\/docs\/messenger-platform\/quickstart\">structured\nmessages<\/a>\nare a great solution to this problem. They remove the ambiguity from a\nparticular interaction by providing a discrete set of options for the user to\nchoose from. Over-reliance on structured messages can feel contrived, however, as\nI explore in the next section when evaluating the bots.<\/p>\n<blockquote>\n<ol start=\"7\">\n<li>Flexibility and efficiency of use\u2014 Accelerators, unseen by the novice\nuser, may often speed up the interaction for the expert user such that the system\ncan cater to both inexperienced and experienced users. 
Allow users to tailor\nfrequent actions.<\/li>\n<\/ol>\n<\/blockquote>\n<p>Lots of Slack bots can be invoked with something like:<\/p>\n<pre><code>\/giphy hotdog\n<\/code><\/pre>\n<p>This loads a relevant GIF into the channel.<\/p>\n<p><img src=\"images\/hotdog.gif\" alt=\"Hot Dogs\">\n<capt>\/giphy hotdog<\/capt><\/p>\n<p>Bots are supremely well positioned to provide these types of invisible\naccelerators to power users. While one user might say \u201cHey, Giphybot, can you\nlook for a picture of hotdogs?\u201d, another user cuts right to the chase with a\ncommand.<\/p>\n<p>An open question for me is: what\u2019s the best way to provide affordance for\ndiscovering these power moves? How can we teach users to become power users\nwithout resorting to a clunky help menu?<\/p>\n<blockquote>\n<ol start=\"8\">\n<li>Aesthetic and minimalist design\u2014 Dialogues should not contain\ninformation which is irrelevant or rarely needed. Every extra unit of information\nin a dialogue competes with the relevant units of information and diminishes\ntheir relative visibility.<\/li>\n<\/ol>\n<\/blockquote>\n<p><img src=\"images\/mullet.gif\" alt=\"Mullet\">\n<capt>Business in the front, party in the back<\/capt><\/p>\n<p>Judging from the interactions I\u2019ve seen with my two bots, first-time users\ninvariably launch into a very human-like interaction:<\/p>\n<pre><code>\u201cHey there. How\u2019s it going?\u201d\n<\/code><\/pre>\n<p>How should bots respond to queries unrelated to their core competency? Should\nthey be all business, or engage in some frivolity? If I\u2019m a bot selling shoes,\nand you ask me about my life, should I engage in some small talk? Should I\ngently nudge you back to shoes? If I\u2019m nudging you, how hard do I nudge? 
If I\nengage in some banter, how much banter should I support?<\/p>\n<p>These questions get at the heart of what a bot\u2019s personality is and should be,\nand crafting compelling personalities will be a key differentiator between\nsuccessful bots and the rest of the pack.<\/p>\n<p><img src=\"images\/zappos.jpeg\" alt=\"Zappos Bot\">\n<capt>ZapposBot wants to sell you these shoes<\/capt><\/p>\n<p>Imagine you\u2019re shopping on Zappos for a pair of shoes. The brand voice of Zappos\nis friendly, chatty, super helpful. You wouldn\u2019t expect your conversation to get\nright down to business; I want to get to know ZapposBot a little bit, find out\nwhat makes it tick. Conversely, if I\u2019m talking to some LawyerBot, I expect a\nmore professional interaction, because you\u2019re probably billing me by the message\n(kidding!).<\/p>\n<p>There\u2019s a distinction between content (the <em>information<\/em>) and the medium (the\n<a href=\"https:\/\/chatbotsmagazine.com\/designing-a-chatbots-personality-52dcf1f4df7d\"><em>personality<\/em><\/a>). The\ncontent can stay minimal, but the medium doesn\u2019t have to. If I ask you how your\nday is going, I expect an answer. Bots that fail to oblige users in this way\ninevitably let them down and lead to a subpar experience.<\/p>\n<blockquote>\n<ol start=\"9\">\n<li>Help users recognize, diagnose, and recover from errors\u2014 Error messages\nshould be expressed in plain language (no codes), precisely indicate the\nproblem, and constructively suggest a solution.<\/li>\n<\/ol>\n<\/blockquote>\n<p>Still applicable. If your bot barfs out a \u201c500 error\u201d you\u2019re doing it wrong.<\/p>\n<blockquote>\n<ol start=\"10\">\n<li>Help and documentation\u2014 Even though it is better if the system can be\nused without documentation, it may be necessary to provide help and\ndocumentation. 
Any such information should be easy to search, focused on the\nuser\u2019s task, list concrete steps to be carried out, and not be too large.<\/li>\n<\/ol>\n<\/blockquote>\n<p>Also still applicable. Help and documentation should be accessible via the bot\nitself. I suspect a convention will be established over time regarding the best\nway to fetch bot documentation; in the meantime, make it quick and easy to get\nhelp and documentation.<\/p>\n<hr>\n<h4 id=\"the-relevant-heuristics\">The Relevant Heuristics<\/h4>\n<p>A few heuristics go together: <strong>Visibility of System Status<\/strong> &amp; <strong>Recognition\nRather than Recall<\/strong> speak to the difficulty of balancing too much information\nagainst providing enough for the user to make informed choices. <strong>User\nControl and Freedom<\/strong> &amp; <strong>Error Prevention<\/strong> both prescribe the same solutions,\nspecifically demanding confirmation for critical steps and providing escape\nhatches. Finally, <strong>Match between system and real world<\/strong> &amp; <strong>Help users\nrecognize, diagnose and recover from errors<\/strong> both speak to the necessity of\nconsistency in language.<\/p>\n<p>This leaves us with six relevant heuristics:<\/p>\n<ol>\n<li><strong>Visibility of System Status &amp; Recognition rather than recall<\/strong>\u2014 Keep the user\napprised of the system and their options at critical points, and give the user\noptions to request additional information at any point.<\/li>\n<li><strong>Match between system and real world &amp; Help users recognize, diagnose and\nrecover from errors<\/strong>\u2014 Know your audience. 
Don\u2019t switch communication styles.<\/li>\n<li><strong>User control and freedom &amp; Error Prevention<\/strong>\u2014 Get confirmation from the\nuser at critical points, and provide escape hatches for multi-step interactions.<\/li>\n<li><strong>Flexibility and efficiency of use<\/strong>\u2014 Provide accelerators to power users.<\/li>\n<li><strong>Consistency and standards &amp; Aesthetic and minimalist design<\/strong>\u2014 Keep the\ncommunication style and personality \/ voice consistent.<\/li>\n<li><strong>Help and documentation<\/strong>\u2014 Provide help within the bot.<\/li>\n<\/ol>\n<p>Let\u2019s take a look at some popular bots and see how they hold up.<\/p>\n<hr>\n<h3 id=\"the-contenders\">The Contenders<\/h3>\n<p><img src=\"images\/contenders.jpeg\" alt=\"The contenders\"><\/p>\n<p>I\u2019m going to look at the following three bots since they were some of the first\nout of the gate on Facebook\u2019s Messenger platform:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.messenger.com\/t\/hiponcho\/\">Poncho<\/a><\/li>\n<li><a href=\"https:\/\/www.messenger.com\/t\/cnn\/\">CNN<\/a><\/li>\n<li><a href=\"https:\/\/www.messenger.com\/t\/1800flowers\/\">1\u2013800-Flowers<\/a><\/li>\n<\/ul>\n<h4 id=\"poncho\">Poncho<\/h4>\n<p>Poncho starts out strong by providing a hint as to what to do: talk about the\nweather.<\/p>\n<p><img src=\"images\/poncho1.png\" alt=\"Image of Poncho chat\"><\/p>\n<p>Ok, sounds good, let\u2019s talk about it, Poncho!<\/p>\n<p><img src=\"images\/poncho2.png\" alt=\"Image of Poncho chat\"><\/p>\n<p>This is an awesome interaction so far. 
Let\u2019s recap:<\/p>\n<ul>\n<li>There\u2019s no ambiguity about what to do or say<\/li>\n<li>Poncho has confirmed the information provided (my location)<\/li>\n<li>Poncho is giving me the ability to \u201cundo\u201d; that is, amend the information I\u2019ve\nprovided.<\/li>\n<\/ul>\n<p>Let\u2019s see what it looks like if I say \u201cNope\u201d:<\/p>\n<p><img src=\"images\/poncho3.png\" alt=\"Image of Poncho chat\"><\/p>\n<p>Decent, though I\u2019m not a fan of Poncho running me through identical language\nhere (Is that the right city?). Up to this point, I\u2019d forgotten I was conversing with\na bot; now my innocence is lost.<\/p>\n<p>Anyway, let\u2019s keep going and say \u201cYea\u201d.<\/p>\n<p><img src=\"images\/poncho4.png\" alt=\"Image of Poncho chat\"><\/p>\n<p>I get the weather along with the next CTA; I don\u2019t want to set any notifications\nfor now, but thanks for asking, Poncho.<\/p>\n<p><img src=\"images\/poncho5.png\" alt=\"Image of Poncho chat\"><\/p>\n<p>This is just a fantastic job of providing inline documentation, and Poncho even\nprompted me to ask for this help. Awesome. Also a great example of structured\nmessages in Facebook to remove any potential for ambiguity.<\/p>\n<p>Let\u2019s see if we can get at some system visibility:<\/p>\n<p><img src=\"images\/poncho6.png\" alt=\"Image of Poncho chat\"><\/p>\n<p>Succinct and clear.<\/p>\n<p>Poncho also does a great job with banter, and when you reach the limits, he pulls\nyou back:<\/p>\n<p><img src=\"images\/poncho7.png\" alt=\"Image of Poncho chat\"><\/p>\n<h4 id=\"verdict\">Verdict<\/h4>\n<ol>\n<li><strong>Visibility of System Status &amp; Recognition rather than recall<\/strong>\u2014 Responds to\nqueries regarding system status. 
Provides structured messages to guide the user.<\/li>\n<li><strong>Match between system and real world &amp; Help users recognize, diagnose and\nrecover from errors<\/strong>\u2014 Poncho uses vocabulary I\u2019m familiar with.<\/li>\n<li><strong>User control and freedom &amp; Error Prevention<\/strong>\u2014 I made an \u201cerror\u201d entering my\nlocation and Poncho let me fix it. Awesome.<\/li>\n<li><strong>Flexibility and efficiency of use<\/strong> \u2014 If I come back, Poncho will remember my\nlocation, saving me a few keystrokes.<\/li>\n<li><strong>Consistency and standards &amp; Aesthetic and minimalist design<\/strong>\u2014 Poncho is\npretty straight to the point, and the structured messages are clear and concise.\nPoncho is also a delight to talk to, most of the time.<\/li>\n<li><strong>Help and documentation<\/strong>\u2014 Fantastic example of inline documentation done\nright.<\/li>\n<\/ol>\n<p>Poncho knocks it out of the park.<\/p>\n<h4 id=\"cnn\">CNN<\/h4>\n<p>Next up, CNN.<\/p>\n<p><img src=\"images\/cnn1.png\" alt=\"Image of CNN chat\"><\/p>\n<p>CNN makes good use of structured messages. However, this already feels less like\na conversation and more like a command line.<\/p>\n<p><img src=\"images\/cnn2.png\" alt=\"Image of CNN chat\"><\/p>\n<p>While the CNN bot is definitely not something I\u2019d want to have a beer with, it\nworks in its favor when it comes to guiding my actions; I\u2019m more inclined to\ntreat it like a command line and less like a pal.<\/p>\n<p><img src=\"images\/cnn3.png\" alt=\"Image of CNN chat\"><\/p>\n<p>Other than giving me stories, it\u2019s unclear what I can do, and the bot seems to rely\ntoo heavily on accepting interactions via structured message. \u201cAsk cnn\u201d doesn\u2019t\nreally do too much.<\/p>\n<p><img src=\"images\/cnn4.png\" alt=\"Image of CNN chat\"><\/p>\n<p>This bot gets the job done, but is a tad underwhelming. 
I\u2019d rather just use the\nwebsite.<\/p>\n<h4 id=\"verdict-1\">Verdict<\/h4>\n<ol>\n<li><strong>Visibility of System Status &amp; Recognition rather than recall<\/strong>\u2014 Structured\nmessages are very clear, but the bot fails to accommodate regular messaging; unclear\nCTAs at times.<\/li>\n<li><strong>Match between system and real world &amp; Help users recognize, diagnose and\nrecover from errors<\/strong>\u2014 Very little opportunity for interacting outside of\nstructured messages.<\/li>\n<li><strong>User control and freedom &amp; Error Prevention<\/strong>\u2014 Not too many opportunities for\nerrors here.<\/li>\n<li><strong>Flexibility and efficiency of use<\/strong> \u2014 CNN bot all but asks you to treat it\nlike a command line, though it doesn\u2019t seem to differentiate between power users\nand novices.<\/li>\n<li><strong>Consistency and standards &amp; Aesthetic and minimalist design<\/strong>\u2014 Consistent,\nbut no voice here; it\u2019s like I\u2019m looking at an RSS feed in Messenger.<\/li>\n<li><strong>Help and documentation<\/strong>\u2014 Provides inline help, though it seems like some\nstuff is missing: for instance, what should \u201cask cnn\u201d do?<\/li>\n<\/ol>\n<h4 id=\"1800-flowers\">1\u2013800-Flowers<\/h4>\n<p><img src=\"images\/flowers1.png\" alt=\"Image of 1-800-Flowers chat\"><\/p>\n<p>Leads with a structured message. Doesn\u2019t really make me want to have a\nconversation with this thing. Let\u2019s see what happens if I hit \u201cTalk to support\u201d:<\/p>\n<p><img src=\"images\/flowers2.png\" alt=\"Image of 1-800-Flowers chat\"><\/p>\n<p>Oh crap, I don\u2019t want to talk to a human. 
Cancel, cancel!<\/p>\n<p><img src=\"images\/flowers3.png\" alt=\"Image of 1-800-Flowers chat\"><\/p>\n<p>Coincidentally, that is an awesome example of undo in action right there.<\/p>\n<p><img src=\"images\/flowers4.png\" alt=\"Image of 1-800-Flowers chat\"><\/p>\n<p>Great use of confirmation right here.<\/p>\n<p><img src=\"images\/flowers5.png\" alt=\"Image of 1-800-Flowers chat\"><\/p>\n<p>Not a fan of this interaction here. They\u2019re forcing the user into using\nstructured messages, which feels limiting. However, it is nice that they\nprompted me to ask \u2018help\u2019. Let\u2019s ask for help.<\/p>\n<p><img src=\"images\/flowers6.png\" alt=\"Image of 1-800-Flowers chat\"><\/p>\n<p>I let the bot sit fallow for a while, and then this happened:<\/p>\n<p><img src=\"images\/flowers7.png\" alt=\"Image of 1-800-Flowers chat\"><\/p>\n<p>That\u2019s a really nice touch!<\/p>\n<h4 id=\"verdict-2\">Verdict<\/h4>\n<ol>\n<li><strong>Visibility of System Status &amp; Recognition rather than recall<\/strong>\u2014 Great job\nusing structured messages to guide a user through the interaction.<\/li>\n<li><strong>Match between system and real world &amp; Help users recognize, diagnose and\nrecover from errors<\/strong>\u2014 Fails, especially at the end: when I tried to enter an\nexact date, it choked.<\/li>\n<li><strong>User control and freedom &amp; Error Prevention<\/strong>\u2014 Did a great job of allowing me\nto go back and change my order.<\/li>\n<li><strong>Flexibility and efficiency of use<\/strong> \u2014 Nice use of humans to fill in the gaps;\nnovice users would appreciate the human touch, while power users can get right\nto ordering.<\/li>\n<li><strong>Consistency and standards &amp; Aesthetic and minimalist design<\/strong>\u2014 It feels like\nI\u2019m interacting with a webpage inside Messenger.<\/li>\n<li><strong>Help and documentation<\/strong>\u2014 Great inline help.<\/li>\n<\/ol>\n<hr>\n<h3 id=\"conclusion\">Conclusion<\/h3>\n<p>While I think there\u2019s 
still plenty of opportunity to figure out some\nbot-specific design heuristics, Nielsen\u2019s heuristics hold up pretty well. We can see that\nall three bots we looked at do a great job of providing affordance around error\nprevention and recovery, and all three provide inline help right in the bot.\nThese go a long way towards grounding the user in the experience.<\/p>\n<p>Crafting a compelling, delightful bot experience is going to be a key\ndifferentiator between the bots that see adoption and those that don\u2019t. Nielsen\u2019s\nheuristics continue to provide a great benchmark to point us in the right\ndirection.<\/p>"},{"title":"Programming A Bot With Facebook Messenger","link":"https:\/\/thekevinscott.com\/programming-a-bot-with-facebook-messenger\/","pubDate":"Sat, 16 Apr 2016 10:06:00 +0000","guid":"https:\/\/thekevinscott.com\/programming-a-bot-with-facebook-messenger\/","description":"<p>If you watched the <a href=\"https:\/\/www.fbf8.com\/\">F8 conference<\/a> this week you\u2019ll know\nthat Facebook introduced new APIs for messaging through Facebook Messenger. You\nmight be asking why you should care. Well, I\u2019ll tell you why:\nbots are the <a href=\"https:\/\/medium.com\/chris-messina\/2016-will-be-the-year-of-conversational-commerce-1586e85e3991#.ybruqcfxt\">hot new\nthing<\/a>\nand <a href=\"http:\/\/www.telegraph.co.uk\/technology\/2016\/03\/31\/the-end-of-apps-is-here-long-live-chat-bots\/\">apps are\nextinct<\/a>\nso get on the bot train or get left in the station.<\/p>\n<p>Hyperbole aside (apps are obviously not dying), bots are legitimately cool and\npromise to revolutionize how we interact with services. And it\u2019s a piece of cake\nto get one up and running. 
Let\u2019s create a simple WeatherBot that tells you the\nweather in some given location, because you\u2019re a busy person who doesn\u2019t have\nthe 2 seconds it takes to Google that information.<\/p>\n<p>Full code is up on\n<a href=\"https:\/\/github.com\/thekevinscott\/getting-started-with-facebook-bots\">Github<\/a>,\nand a <a href=\"http:\/\/m.me\/248054885544417\">live version of the bot is here<\/a>.<\/p>\n<p><strong>Updated December 13th to reflect latest Facebook updates<\/strong><\/p>\n<h4 id=\"getting-started\">Getting Started<\/h4>\n<p>Clone the repo to get started:<\/p>\n<pre><code>git clone https:\/\/github.com\/thekevinscott\/getting-started-with-facebook-bots.git\ncd getting-started-with-facebook-bots\nnpm install\n<\/code><\/pre>\n<p><em>Included is a simple express server, ngrok (a tunneling package that opens our\nserver to the internet), and a few other bells and whistles to ease our lives.<\/em><\/p>\n<p>Try running the server:<\/p>\n<pre><code>npm run dev\n<\/code><\/pre>\n<p>You\u2019ll see \u201cHello world\u201d at <a href=\"http:\/\/localhost:5000\/\">http:\/\/localhost:5000<\/a>. Do\nyou see it? Gnarly, you\u2019re a pro! Keep going.<\/p>\n<h4 id=\"ngrok\">Ngrok<\/h4>\n<p>In a new terminal window, run ngrok, pointing it at port 5000.<\/p>\n<p>You\u2019ll see something like:<\/p>\n<p><img src=\"ngrok.png\" alt=\"Ngrok\"><\/p>\n<p>What this means is that any requests to the Forwarding URLs will hit your\nlocally running server. Visit the Forwarding URL and you\u2019ll see \u201cHello world\u201d.\nYour local web server is now visible to the entire internet.<\/p>\n<p><em>Make a note of the https version of the Forwarding URL; we\u2019ll need it soon.<\/em><\/p>\n<p>Next, fire up a text editor and open up index.js. Change the first line:<\/p>\n<pre><code>var VERIFY_TOKEN = '&lt;YOUR_VERIFICATION_TOKEN&gt;';\n<\/code><\/pre>\n<p>To a super secret token that only you know. 
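<\/p>\n<p><em>Aside: this token exists so that Facebook can prove it\u2019s really talking to your server. The included server answers the verification handshake for you; conceptually (a sketch, not necessarily the repo\u2019s exact code), the check looks like this:<\/em><\/p>\n<pre><code>\/\/ Facebook sends a GET request containing hub.verify_token and\n\/\/ hub.challenge; echo the challenge back only if the token matches.\nfunction verifyWebhook(query, verifyToken) {\nif (query['hub.verify_token'] === verifyToken) {\nreturn { status: 200, body: query['hub.challenge'] };\n}\nreturn { status: 403, body: 'Failed validation.' };\n}\n\/\/ Wired into express, roughly:\n\/\/ app.get('\/webhook\/', function (req, res) {\n\/\/ var result = verifyWebhook(req.query, VERIFY_TOKEN);\n\/\/ res.status(result.status).send(result.body);\n\/\/ });\n<\/code><\/pre>\n<p>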
We\u2019ll use this token to authenticate\nwith Facebook in the next step.<\/p>\n<hr>\n<h4 id=\"facebook\">Facebook<\/h4>\n<p>Now it\u2019s time to set up our bot on Facebook. Facebook <a href=\"https:\/\/developers.facebook.com\/docs\/messenger-platform\/quickstart\">has great\ndocumentation<\/a>\nfor creating a bot from scratch, which I will summarize here.<\/p>\n<p>First, create a Facebook <a href=\"https:\/\/www.facebook.com\/pages\/create\">page<\/a> and\n<a href=\"https:\/\/developers.facebook.com\/apps\/\">app<\/a>. Then, in your app settings, head\nto the Messenger tab and look for the <em>Token Generation<\/em> section. Select the\npage you created, and Facebook will generate a unique token for you. Copy this\nto line 2 of index.js:<\/p>\n<pre><code>var PAGE_ACCESS_TOKEN = '&lt;YOUR_PAGE_ACCESS_TOKEN&gt;';\n<\/code><\/pre>\n<p>Next, below in the <em>Webhooks<\/em> section, click on Edit events.<\/p>\n<p>In the modal that appears:<\/p>\n<ol>\n<li>For Callback URL, enter the Forwarding URL you got from ngrok (the https\nversion) with an endpoint of \/webhook. So, for instance, I used:\n<strong><a href=\"https:\/\/5fe0bf6f.ngrok.io\/webhook\">https:\/\/5fe0bf6f.ngrok.io\/webhook<\/a><\/strong><\/li>\n<li>For Verify Token, enter the custom token you invented in the first line of your\nindex.js file.<\/li>\n<li>For Subscription Fields, select everything. After you Verify and Save, you\u2019ll\nsee a request appear in ngrok and Facebook should indicate verification was\nsuccessful.<\/li>\n<\/ol>\n<p>Finally, select the page you created above; this will subscribe your app to that\npage, which means that when someone messages the bot from the page, you\u2019ll get\nthe notification. HOW COOL IS THAT? Go try it! It\u2019s cool.<\/p>\n<p>When you send a message to WeatherBot, you\u2019ll see WeatherBot echo it back. 
The\nserver will also indicate that it received the message.<\/p>\n<p>The relevant code that\u2019s handling the message receiving is here:<\/p>\n<pre><code>\/\/ respond to post calls from facebook\napp.post('\/webhook\/', function (req, res) {\nvar data = req.body;\n\/\/ Make sure this is a page subscription\nif (data.object === 'page') {\n\/\/ Iterate over each entry - there may be multiple if batched\ndata.entry.forEach(function(entry) {\nvar pageID = entry.id;\nvar timeOfEvent = entry.time;\n\/\/ Iterate over each messaging event\nentry.messaging.forEach(function(event) {\nconsole.log(&quot;Success!&quot;, event);\n});\n});\n\/\/ Assume all went well.\n\/\/\n\/\/ You must send back a 200, within 20 seconds, to let us know\n\/\/ you've successfully received the callback. Otherwise, the request\n\/\/ will time out and we will keep trying to resend.\nres.sendStatus(200);\n}\n});\n<\/code><\/pre>\n<p>Sending a text message looks like this:<\/p>\n<pre><code>var request = require('request');\nfunction callSendAPI(messageData) {\nrequest({\nuri: 'https:\/\/graph.facebook.com\/v2.6\/me\/messages',\nqs: { access_token: PAGE_ACCESS_TOKEN },\nmethod: 'POST',\njson: messageData\n}, function (error, response, body) {\nif (!error &amp;&amp; response.statusCode == 200) {\nvar recipientId = body.recipient_id;\nvar messageId = body.message_id;\nconsole.log(&quot;Successfully sent generic message with id %s to recipient %s&quot;,\nmessageId, recipientId);\n} else {\nconsole.error(&quot;Unable to send message.&quot;);\n\/\/console.error(response);\nconsole.error(error);\n}\n});\n}\n<\/code><\/pre>\n<p>Great! We\u2019re off and running.<\/p>\n<hr>\n<p>At this point, we\u2019ve got ourselves a functioning bot. 
However, he \/ she \/ it\ndoesn\u2019t really do anything yet, unless you get a kick out of people echoing your\nwords back to you, in which case, our work here is done!<\/p>\n<p>For the rest of us, let\u2019s teach our bot some tricks to make it do something\nuseful.<\/p>\n<h4 id=\"weather\">Weather<\/h4>\n<p>The bot should respond with the current weather in a particular location. We\u2019ll\nuse <a href=\"https:\/\/developer.yahoo.com\/weather\/\">Yahoo weather<\/a> as our API. The\nendpoint we\u2019ll be using is this gnarly-looking URL:<\/p>\n<pre><code>var weatherEndpoint = 'https:\/\/query.yahooapis.com\/v1\/public\/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%20in%20(select%20woeid%20from%20geo.places(1)%20where%20text%3D%22' + location + '%22)&amp;format=json&amp;env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys';\n<\/code><\/pre>\n<p>Replace the POST response (the \/webhook listener) with this:<\/p>\n<pre><code>\/\/ respond to post calls from facebook\napp.post('\/webhook\/', function (req, res) {\n  var data = req.body;\n  \/\/ Make sure this is a page subscription\n  if (data.object === 'page') {\n    \/\/ Iterate over each entry - there may be multiple if batched\n    data.entry.forEach(function(entry) {\n      var pageID = entry.id;\n      var timeOfEvent = entry.time;\n      \/\/ Iterate over each messaging event\n      entry.messaging.forEach(function(event) {\n        if (event.message) {\n          receivedMessage(event);\n        } else {\n          console.log(&quot;Webhook received unknown event: &quot;, event);\n        }\n      });\n    });\n    \/\/ Assume all went well.\n    \/\/\n    \/\/ You must send back a 200, within 20 seconds, to let us know\n    \/\/ you've successfully received the callback. Otherwise, the request\n    \/\/ will time out and we will keep trying to resend.\n    res.sendStatus(200);\n  }\n});\n\nfunction receivedMessage(event) {\n  console.log('incoming event', event);\n  var senderID = event.sender.id;\n  var recipientID = event.recipient.id;\n  var timeOfMessage = event.timestamp;\n  var message = event.message;\n  console.log(JSON.stringify(message));\n  var messageId = message.mid;\n  var messageText = message.text;\n  var messageAttachments = message.attachments;\n  if (messageText) {\n    sendTextMessage(senderID, messageText);\n  }\n}\n\n\/\/ look up the weather for a location and pass a summary to the callback\nfunction getWeather(callback, location) {\n  var weatherEndpoint = 'https:\/\/query.yahooapis.com\/v1\/public\/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%20in%20(select%20woeid%20from%20geo.places(1)%20where%20text%3D%22'\n    + location + '%22)&amp;format=json&amp;env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys';\n  console.log(weatherEndpoint);\n  request({\n    url: weatherEndpoint,\n    json: true\n  }, function(error, response, body) {\n    try {\n      var condition = body.query.results.channel.item.condition;\n      callback(&quot;Today is &quot; + condition.temp + &quot; and &quot; + condition.text + &quot; in &quot; + location);\n    } catch(err) {\n      console.error('error caught', err);\n      callback(&quot;There was an error&quot;);\n    }\n  });\n}\n\n\/\/ despite the name, this replies with the weather for the incoming text\nfunction sendTextMessage(recipientId, messageText) {\n  console.log('incoming message text', messageText);\n  getWeather(function(message) {\n    var messageData = {\n      recipient: {\n        id: recipientId\n      },\n      message: {\n        text: message\n      }\n    };\n    callSendAPI(messageData);\n  }, messageText);\n}\n<\/code><\/pre>\n<p>This code sends the incoming message text to Yahoo\u2019s servers as a location, and\nreplies with the current weather there.<\/p>\n<hr>\n<h4 id=\"conclusion\">Conclusion<\/h4>\n<p>Gosh, they grow up so fast, don\u2019t they? 
Feels like just yesterday our baby bot\nwas a few twinkling lines of code in our eyes.<\/p>\n<p>At this point you have all the tools you need to build an army of bots to do\nyour bidding.<\/p>"},{"title":"Javascript Internationalization","link":"https:\/\/thekevinscott.com\/javascript-internationalization\/","pubDate":"Thu, 23 Jul 2015 07:00:00 +0000","guid":"https:\/\/thekevinscott.com\/javascript-internationalization\/","description":"<p>I recently did some research on Javascript internationalization for a (mostly)\nclient-side Javascript app. Here are my thoughts and findings.<\/p>\n<h3 id=\"some-definitions\">Some definitions<\/h3>\n<p><strong>i18n<\/strong> (<em>internationalization<\/em>) \u2014 The process by which software is made\nlanguage and locale neutral. The 18 stands for the 18 letters between the first and\nlast letters of <em>internationalization<\/em>.<\/p>\n<p><strong>l10n<\/strong> (<em>localization<\/em>) \u2014 The process of localizing software for a specific\nlocale (including translations and rules about numbers, currencies, dates, and\nmore).<\/p>\n<h2 id=\"high-level-considerations\">High-Level Considerations<\/h2>\n<p>Besides language translation, there are other localization requirements to\nconsider.<\/p>\n<ul>\n<li><strong>Dates<\/strong> \u2014 Date formats change across cultures. For example, 10\/4\/15 means\nOctober 4th in the US, and April 10th in the UK.<\/li>\n<li><strong>Times<\/strong> \u2014 Different locales use either a 24-hour or a 12-hour clock.\nAlso, some locales use different notations, like 5h10 in French.<\/li>\n<li><strong>Formatting of numbers<\/strong> \u2014 Different locales format numbers\ndifferently. 
So, 3,025.23 in English would be 3.025,23 in German, and 3 025,23 in\nFrench.<\/li>\n<li><strong>Images<\/strong> \u2014 If you have images with text, you need to make sure to provide\nversions for each locale.<\/li>\n<li><strong>UI Spacing<\/strong> \u2014 You need to provide enough space in the UI to handle expanded\nlengths of words. IBM has provided <a href=\"http:\/\/www-01.ibm.com\/software\/globalization\/guidelines\/a3.html\">design\nguidelines<\/a>\nthat specify an additional 200% space for short words; the W3C provides an\nexample of a translation for <a href=\"http:\/\/www.w3.org\/International\/articles\/article-text-size.en\">Flickr requiring an additional 300% in\nItalian<\/a>.<\/li>\n<li><strong>Text Sorting<\/strong> \u2014 Text sorting can vary by language. For instance, German has\ntwo types of sort order, <a href=\"https:\/\/hacks.mozilla.org\/2014\/12\/introducing-the-javascript-internationalization-api\/\">phonebook and\ndictionary<\/a>,\nwhich determine whether to sort by sounds (umlauted vowels become character\npairs: \u00e4 \u2192 ae, \u00f6 \u2192 oe, \u00fc \u2192 ue) or by character order.<\/li>\n<li><strong>Punctuation<\/strong> \u2014 Different languages use different punctuation. 
<a href=\"https:\/\/en.wikipedia.org\/wiki\/Internationalization_and_localization\">For\ninstance<\/a>,\ndouble quotes in English (\u201c \u201c) are represented as guillemets in French (\u00ab \u00bb).<\/li>\n<li><strong>Keyboard shortcuts<\/strong> \u2014 If you have hotkeys that map to English words, these\nshould be updated with a mapping for each locale.<\/li>\n<li><strong>Outbound links<\/strong> \u2014 External links to documentation will need to take language\ninto account.<\/li>\n<li><strong>Accessibility<\/strong> \u2014 If the software offers accessibility options, those will\nneed to take the locales into consideration.<\/li>\n<\/ul>\n<p>There\u2019s a few more that I don\u2019t need to handle for this particular project:<\/p>\n<ul>\n<li><strong>Currencies<\/strong><\/li>\n<li><strong>Addresses<\/strong> \u2014 including zipcodes<\/li>\n<li><strong>Phone numbers<\/strong><\/li>\n<li><strong>Validation<\/strong> \u2014 Luckily, we have no data input fields that would require\nlocale-specific validation (like number inputs or date\/time inputs).<\/li>\n<\/ul>\n<h2 id=\"technical-considerations\">Technical Considerations<\/h2>\n<h3 id=\"keys\">Keys<\/h3>\n<p>When translating strings, there\u2019s no consensus on <a href=\"https:\/\/stackoverflow.com\/questions\/10654056\/best-practice-for-key-values-in-translation-files\">how to specify\nkeys<\/a>\nused for the strings. 
Three main strategies are used: English strings,\ndescriptive keys, and object keys.<\/p>\n<h4 id=\"english-strings\">English Strings<\/h4>\n<p>Using a string as a key might look like this:<\/p>\n<pre><code>\/\/ returns &quot;Welcome&quot; in English,\n\/\/ and &quot;Willkommen&quot; in German\n_(&quot;Welcome&quot;)\n<\/code><\/pre>\n<p>This returns the given string in English, and the translation in another\nlanguage.<\/p>\n<p>This is the format that both <a href=\"https:\/\/www.gnu.org\/software\/gettext\/\">gettext<\/a>\n(Linux\u2019s i18n implementation) and\n<a href=\"https:\/\/developer.apple.com\/library\/mac\/documentation\/Darwin\/Reference\/ManPages\/man1\/genstrings.1.html\">genstrings<\/a>\n(Apple\u2019s i18n tool) use. Using English strings has a number of clear benefits:<\/p>\n<ul>\n<li>It directly shows the meaning of the text.<\/li>\n<li>If a translation is unavailable, it\u2019s possible to fall back to the given string.<\/li>\n<li>Given that this strategy is the closest thing to an i18n key standard, if\nbrowsers were to implement an i18n translation standard in the future, this would\nprobably be the strategy.<\/li>\n<\/ul>\n<p>However, there are drawbacks:<\/p>\n<ul>\n<li>Using English as the default can lead to conflicting keys in other languages.\nFor instance, an English word \u201cEmail\u201d might require two different texts in\nFrench, <a href=\"https:\/\/stackoverflow.com\/questions\/10654056\/best-practice-for-key-values-in-translation-files\">\u201cE-Mail\u201d or \u201cEnvoyer un\ne-mail\u201d<\/a>.<\/li>\n<li>If an English translation changes, every translation file\u2019s keys need to be\nupdated (though, presumably, if an English translation changes, every other\nlanguage will have to change as well).<\/li>\n<li>For longer texts, specifying English-string keys can become verbose,\nparticularly for paragraph-length text.<\/li>\n<\/ul>\n<h4 id=\"descriptive-keys\">Descriptive keys<\/h4>\n<p>Using descriptive keys for each 
translation is another oft-used solution that\nwould look like this:<\/p>\n<pre><code>\/\/ returns &quot;Welcome&quot; in English,\n\/\/ and &quot;Willkommen&quot; in German\n_(&quot;WELCOME_MESSAGE&quot;)\n<\/code><\/pre>\n<p>This method solves many of the drawbacks of using strings (it handles homonyms,\nallows changing an English translation without updating every language file, and\nis less verbose).<\/p>\n<p>However, using abstract keys comes with its own set of disadvantages:<\/p>\n<ul>\n<li>It\u2019s not immediately clear what a given string means, so metadata or comments\ndictating the purpose and placement of strings would be necessary.<\/li>\n<li>No fallback is possible if a translation doesn\u2019t exist.<\/li>\n<li>Collisions in key names could happen much more frequently.<\/li>\n<\/ul>\n<h4 id=\"objects\">Objects<\/h4>\n<p>Finally, we could use objects, which are effectively a more advanced key\nstructure:<\/p>\n<pre><code>\/\/ returns &quot;Welcome&quot; in English,\n\/\/ and &quot;Willkommen&quot; in German\n_(&quot;messages.welcome&quot;)\n<\/code><\/pre>\n<p>The main benefit here would be to keep things better organized by namespacing\nvarious texts. This is the strategy that <a href=\"http:\/\/guides.rubyonrails.org\/i18n.html\">Rails uses for its i18n\ntool<\/a>. It would also avoid the collision\nissue, highlighted above, that comes with a flat JSON object.<\/p>\n<h3 id=\"passing-arguments\">Passing arguments<\/h3>\n<p>In many languages, <a href=\"https:\/\/blogs.oracle.com\/userassistance\/entry\/keeping_it_simple_yet_effective_facebooks_i18n_best_practices\">word order can\nchange<\/a>.\nTherefore, it\u2019s important that translations maintain the ability to change word\norder in sentences. 
This means string concatenations should be avoided:<\/p>\n<pre><code>\/\/ Bad!\n_(&quot;File moved to &quot;) + folder_name + _(&quot;a few minutes ago&quot;)\n\/\/ Good!\n_(&quot;File moved to %s a few minutes ago&quot;, folder_name)\n<\/code><\/pre>\n<p>Most libraries accept arguments in order or as named parameters, which allows for\nflexible input. The developer can decide which method of input to\nuse, depending on whether verbosity or clarity is desired.<\/p>\n<pre><code>\/\/ Passing arguments in order\n_('The first letters in the alphabet: %s %s %s', 'a', 'b', 'c')\n\/\/ Passing named arguments\n_('Welcome to %(version), %(user_name)', { version: 'Awesome Software', user_name: 'admin' })\n<\/code><\/pre>\n<h3 id=\"plurals\">Plurals<\/h3>\n<p>Another problem between different locales <a href=\"http:\/\/unicode.org\/repos\/cldr-tmp\/trunk\/diff\/supplemental\/language_plural_rules.html\">concerns\nplurals<\/a>.<\/p>\n<p>Different languages have different rules for plurals. Polish, for instance, <a href=\"http:\/\/alistapart.com\/article\/pluralization-for-javascript\">has\nfour plural forms<\/a>:<\/p>\n<blockquote>\n<p><em>A plural rule defines a plural form using a formula that includes a counter. A\ncounter is the number of items you\u2019re trying to pluralize. Say we\u2019re working\nwith \u201c2 rabbits.\u201d The number before the word \u201crabbits\u201d is the counter. In this\ncase, it has the value 2. Now, if we take the English language as an example, it\nhas two plural forms: singular and plural. 
Therefore, our rules look like this:<\/em><\/p>\n<\/blockquote>\n<blockquote>\n<p><em>If the counter has the integer value of 1, use the singular: \u201crabbit.\u201d If the\ncounter has a value that is not equal to 1, use the plural: \u201crabbits.\u201d<\/em><\/p>\n<\/blockquote>\n<blockquote>\n<p><em>However, the same isn\u2019t true in Polish, where the same word \u2014 \u201crabbit,\u201d or\n\u201ckr\u00f3lik\u201d \u2014 can take more than two forms:<\/em><\/p>\n<\/blockquote>\n<blockquote>\n<p><em>If the counter has the integer value of 1, use \u201ckr\u00f3lik.\u201d If the counter has a\nvalue that ends in 2\u20134, excluding 12\u201314, use \u201ckr\u00f3lika.\u201d If the counter is not 1\nand has a value that ends in either 0 or 1, or the counter ends in 5\u20139, or the\ncounter ends in 12\u201314, use \u201ckr\u00f3lik\u00f3w.\u201d If the counter has any other value than\nthe above, use \u201ckr\u00f3liki.\u201d<\/em><\/p>\n<\/blockquote>\n<p>To solve this, <a href=\"http:\/\/userguide.icu-project.org\/formatparse\/messages\">ICU\u2019s\nMessageFormat<\/a> provides a\nstandard for formatting plurals in strings across languages. (It also specifies\nhow to handle genders in different languages.)<\/p>\n<p>A message with plurals might look like this:<\/p>\n<pre><code>'There {scans, plural, one{is # scan} other{are # scans}}';\n<\/code><\/pre>\n<p>The example above shows plurals inlined in the message. Another strategy is to\ndefine each plural message separately:<\/p>\n<pre><code>{\n  SCAN_MESSAGE: {\n    one: &quot;There is # scan&quot;,\n    other: &quot;There are # scans&quot;\n  }\n}\n<\/code><\/pre>\n<p>Defining plurals using the latter strategy can become very verbose, especially\nfor longer text strings. 
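To make the mechanics concrete, here is a toy selector (my own illustrative sketch, not a real MessageFormat implementation) that picks a form from the messages object above and substitutes the counter for the # placeholder:

```javascript
// Illustrative sketch only; real code should use a MessageFormat
// implementation, which encodes the CLDR plural rules for every locale.
var messages = {
  SCAN_MESSAGE: {
    one: 'There is # scan',
    other: 'There are # scans'
  }
};

// English plural rule: 'one' for exactly 1, 'other' for everything else.
// (A Polish rule would need four categories, as described above.)
function pluralForm(n) {
  return n === 1 ? 'one' : 'other';
}

function formatPlural(key, n) {
  var forms = messages[key];
  return forms[pluralForm(n)].replace('#', n);
}

formatPlural('SCAN_MESSAGE', 1); // "There is 1 scan"
formatPlural('SCAN_MESSAGE', 3); // "There are 3 scans"
```

A real library swaps pluralForm out per locale, so application code never hard-codes these rules.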
Inlining plurals as shown in the first example is much\nmore concise.<\/p>\n<h3 id=\"translation-formats\">Translation formats<\/h3>\n<p>The obvious file format contender is JSON.<\/p>\n<p>However, <a href=\"https:\/\/www.gnu.org\/software\/gettext\/\">gettext<\/a>, Linux\u2019s standard\ntranslation library, uses a format called <a href=\"https:\/\/www.gnu.org\/software\/gettext\/manual\/html_node\/PO-Files.html\">PO\nfiles<\/a>,\nwhich is also worth considering.<\/p>\n<p>The <a href=\"http:\/\/pology.nedohodnik.net\/doc\/user\/en_US\/ch-poformat.html\">format of a PO\nfile<\/a> is:<\/p>\n<pre><code>white-space\n# translator-comments\n#. extracted-comments\n#: reference\u2026\n#, flag\u2026\n#| msgid previous-untranslated-string\nmsgid untranslated-string\nmsgstr translated-string\n<\/code><\/pre>\n<p>There is wide support across languages for reading PO files, including\n<a href=\"https:\/\/github.com\/mikejholly\/node-po\">Node<\/a>,\n<a href=\"https:\/\/polib.readthedocs.org\/en\/latest\/quickstart.html\">Python<\/a>, and\n<a href=\"https:\/\/www.gnu.org\/software\/gettext\/manual\/html_node\/Java.html\">Java<\/a>.<\/p>\n<p>A major argument in favor of using PO files is that it would allow us to take\nadvantage of the ecosystem that\u2019s been built up around the PO format. 
For\ninstance, <a href=\"https:\/\/poedit.net\/\">POEdit<\/a> is a popular translating tool; numerous\n<a href=\"https:\/\/webtranslateit.com\/en\/docs\/file_formats\/\">online web services<\/a> offer PO\nsupport; and any experienced translator will certainly be familiar with the PO\nformat.<\/p>\n<p>Additionally, if you\u2019re sharing translation files with a particular\nnon-Javascript backend, PO might be a more appropriate file format than JSON.<\/p>\n<h2 id=\"debugging-and-development\">Debugging and development<\/h2>\n<p>When it comes to debugging translation integration, a number of recommendations\nare worth highlighting.<\/p>\n<h3 id=\"pseudo-language\">Pseudo Language<\/h3>\n<p>A common recommendation is to set up a \u201cpseudo language\u201d for testing. This will\nhighlight all translated strings in the application and allow developers to\nquickly locate any untranslated strings.<\/p>\n<p><a href=\"http:\/\/www.agileconnection.com\/article\/internationalization-best-practices-agile-teams?page=0,1\">One\nsolution<\/a>\nis to simply pad English strings, making it easy to see missing strings:<\/p>\n<blockquote>\n<p><em>\u201c\u91cc\u00ee\u00dfEnter your name:\u91cc\u00ee\u00df\u201d<\/em><\/p>\n<\/blockquote>\n<p>Another option is to replace English with a <a href=\"http:\/\/www.techrepublic.com\/blog\/10-things\/10-tips-and-best-practices-for-software-localization\/\">repeating\ncharacter<\/a>:<\/p>\n<blockquote>\n<p><em>\u201cXXXXX\u201d<\/em><\/p>\n<\/blockquote>\n<p><a href=\"http:\/\/www.hanselman.com\/blog\/GlobalizationInternationalizationAndLocalizationInASPNETMVC3JavaScriptAndJQueryPart1.aspx\">My\nfavorite<\/a>\nis replacing English characters with similar, yet distinctly foreign,\nreplacements:<\/p>\n<blockquote>\n<p><em>\u015c\u0119\u013e\u0119\u010d\u0167 \u00e4\u0149 \u00e4\u010d\u010d\u0151\u016b\u0149\u0167 \u00fe\u0119\u013e\u0151\u0175 \u0167\u0151 v\u012f\u0119\u0175 \u0151\u0159 \u0111\u0151\u0175\u0149\u013e\u0151\u00e4\u0111 
y\u0151\u016b\u0159 \u00e4v\u00e4\u012f\u013e\u00e4\u00fe\u013e\u0119 \u0151\u0149\u013e\u012f\u0149\u0119 \u015f\u0167\u00e4\u0167\u0119m\u0119\u0149\u0167\u015f.<\/em><\/p>\n<\/blockquote>\n<h3 id=\"identifiers\">Identifiers<\/h3>\n<p>HTML elements containing translations should be tagged with the particular key\nof the translation in development mode. This will make it easy to identify a\nparticular translation if it acts up.<\/p>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>On Github <a href=\"https:\/\/github.com\/scottlabs\/i18n-research\">I\u2019ve evaluated<\/a> a number\nof i18n libraries. For my purposes, I\u2019ve chosen to stitch together a number of\nlibraries in lieu of a framework, specifically MessageFormat.js,\nMoment.js, and Intl (with the Intl.js polyfill).<\/p>"}]}}