{"@attributes":{"version":"2.0"},"channel":{"title":"Brilliantly wrong","description":"Brilliantly Wrong \u2014 Alex Rogozhnikov's blog about math, machine learning, programming, physics and biology.","link":"https:\/\/arogozhnikov.github.io\/","pubDate":"Thu, 12 Feb 2026 06:12:48 +0000","lastBuildDate":"Thu, 12 Feb 2026 06:12:48 +0000","generator":"Jekyll v3.10.0","item":[{"title":"State of Wall in Protein Language Models in 2026","description":"<p>Last year Pascal Notin wrote a great post summarizing important observations about AI + proteins:\n<a href=\"https:\/\/pascalnotin.substack.com\/p\/have-we-hit-the-scaling-wall-for\">Have we hit the scaling wall for protein language models?<\/a>. (Spoiler: the answer is \u2018yes\u2019)<\/p>\n\n<p>Briefest summary if you didn\u2019t read it:<\/p>\n\n<ul>\n  <li>PLMs\u2019 performance on fitness prediction (\u2018transferability\u2019 of skills) plateaus after 1B and declines after 5B parameters. This holds for multiple PLM families<\/li>\n  <li>leading approaches combine MSAs and 3D structure. Even very simple methods that combine these sources of information outperform billion-parameter models<\/li>\n  <li>training on genetic sequences (that\u2019s quite a lot of additional signal!) doesn\u2019t help \u2014 Evo and Evo-2 are near the bottom of the leaderboard<\/li>\n<\/ul>\n\n<blockquote>\n  <p><strong>Remark:<\/strong> I\u2019ll focus on sequence-based models, and declare folding and inverse folding as out-of-scope for this post.<\/p>\n<\/blockquote>\n\n<p>New models have appeared on the ProteinGym leaderboard since Pascal\u2019s post, but the conclusions hold. 
\nAnd later analysis from another group corroborates this:\n<a href=\"https:\/\/pmc.ncbi.nlm.nih.gov\/articles\/PMC11601519\/\">Medium-sized PLMs perform well at transfer learning on realistic datasets<\/a>.\nFolding models keep using embeddings from (very old) ESM-2.<\/p>\n\n<p>We\u2019re in a weird position where we have a lot of sequencing data (and computing power), but we can\u2019t put it to work.\nLet\u2019s take a tour across recent literature and see if there are any signs of going beyond this scaling wall.<\/p>\n\n<blockquote>\n  <p><strong>Remark:<\/strong> for comparison, widely used structure models (AlphaFold2 \/ AlphaFold3 \/ proteinMPNN) are even below 1B parameters. This could be explained by the smaller size of PDB compared to UniProt, or maybe it\u2019s just a common trait of molecular biology.<\/p>\n<\/blockquote>\n\n<h2 id=\"amplify-is-scaling-necessary\">AMPLIFY: is scaling necessary?<\/h2>\n\n<p><a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2024.09.23.614603v2\">preprint<\/a><\/p>\n\n<p>Interestingly, the authors explicitly start by noting that the premise that \u201cscale leads to performance\u201d is likely false in PLMs, and then use recent LLM pretraining techniques to achieve better perplexity than ESM-2 with a cheaper and smaller model.<\/p>\n\n<p><img src=\"\/images\/protein_lms\/amplify_perplexity.png\" alt=\"perplexity of AMPLIFY\" \/><\/p>\n\n<p>They explore removal of UniProt clustering (used in most models) to increase size\/diversity of training data. 
Their main argument: clustering adds too much weight to non-realistic sequences.<\/p>\n\n<p>Validation, interestingly, is a subset of human proteome \u2014 choice here is important because final ranking in perplexity is highly affected by similarity of distribution to training data.<\/p>\n\n<p>Turns out, quality of sequencing data matters a lot \u2014 significant improvements correlate with largest \u201cclean-ups\u201d in UniProt.<\/p>\n\n<p>Other interesting bits:<\/p>\n\n<ul>\n  <li>AF2 can\u2019t distinguish between non-proteins and disordered proteins (PLMs of course can)<\/li>\n  <li>sequence recovery is very good (a lot of analysis in supplements)<\/li>\n  <li>analysis of performance on downstream tasks (like protein properties) is lacking, but this was covered in other papers.<\/li>\n<\/ul>\n\n<p>Overall: yes, we can significantly improve perplexity\/recovery, and model size isn\u2019t crucial.<\/p>\n\n<h2 id=\"structure-alignment-of-esm2-and-amplify\">Structure-alignment of ESM2 and AMPLIFY<\/h2>\n\n<p><a href=\"https:\/\/arxiv.org\/pdf\/2505.16896v2\">preprint<\/a><\/p>\n\n<p>Multiple works in this list sprinkle structure tokens in training (and sometimes inference). This work instead utilizes a CLIP-like contrastive alignment step between PLM token and protein GNN (GearNet) structure token. 
The second loss is a direct prediction of structure tokens.<\/p>\n\n<p>This delivers good improvements on contact prediction, fold and secondary structure, but interestingly not so much for downstream tasks (notably, in Table 8 \/ Figure 10 SaAMPLIFY isn\u2019t better than plain AMPLIFY).<\/p>\n\n<p>SaESM-2 (aligned ESM-2) transfers to downstream tasks better than SaAMPLIFY \u2014 again confirming very poor correlation between perplexity and transferability.<\/p>\n\n<p><img src=\"\/images\/protein_lms\/sa_proteins_transfer.png\" alt=\"SaESM \/ SaAMPLIFY transferability\" \/><\/p>\n\n<h2 id=\"prosst-quantized-structure-tokens\">ProSST: quantized structure tokens<\/h2>\n\n<p><a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2024.04.15.589672v3.full.pdf\">preprint<\/a><\/p>\n\n<p>ProSST heads the ProteinGym leaderboard, let\u2019s see the recipe:<\/p>\n\n<ol>\n  <li>introduced structure tokens by encoding 40 neighbors<\/li>\n  <li>attention separately encodes sequence, structure tokens and relative position (ablation against plain attention shows an unrealistic improvement, could they have forgotten relpos?)<\/li>\n  <li>pre-trained on AFDB (18.8M structures selected) using ESM-style MLM objective<\/li>\n<\/ol>\n\n<p>Result is SOTA generalization to downstream tasks. Peak performance is reached at ~110M parameters, and then goes down.<\/p>\n\n<p>Model requires knowing the protein structure during prediction, which is somewhat limiting. A huge structural database was used, and perplexity still improves with size, but not downstream performance.<\/p>\n\n<p><img src=\"\/images\/protein_lms\/proSST_trasnfer.png\" alt=\"proSST transfer\" \/><\/p>\n\n<h2 id=\"vespag\">VespaG<\/h2>\n\n<p><a href=\"https:\/\/academic.oup.com\/bioinformatics\/article\/40\/11\/btae621\/7907184\">paper<\/a><\/p>\n\n<p>VespaG is a tiny projection on top of ESM-2 embeddings, and achieves SOTA performance among sequence-only models. 
The trick is to \u201calign\u201d the token embedding produced by ESM-2 (or another PLM) to MSA-based statistics computed by GEMME.<\/p>\n\n<p>From their analysis, again, the highest performance is reached on 650M ESM-2, and then goes down \u2014 mirroring results of the plain ESM family with some additional boost in quality.<\/p>\n\n<h2 id=\"scaling-and-data-saturation-in-protein-language-models\">Scaling and Data Saturation in Protein Language Models<\/h2>\n\n<p><a href=\"https:\/\/arxiv.org\/pdf\/2507.22210\">paper<\/a><\/p>\n\n<p>The paper starts with a nice <a href=\"https:\/\/arxiv.org\/abs\/2507.00885\">reference<\/a>: in the LLM world the relation between scaling laws and downstream performance is not direct (likely even less so with RL finetuning strategies).<\/p>\n\n<p>The authors show how this observation translates to the world of proteins by training a number of AMPLIFY models:<\/p>\n<ul>\n  <li>let\u2019s chunk every sequence. Training on more chunks from <em>the same<\/em> sequences consistently improves performance, while adding newer sequences can hurt it<\/li>\n  <li>when stratifying by MSA depth, proteins with larger MSAs (as measured by Neff\/L) tended to show improved prediction performance with later model training years, unlike those with smaller MSAs<\/li>\n  <li>\u201cwhen partitioning by functional assay type, proteins evaluated using Organismal Fitness as the readout exhibited the most consistent improvement over time, whereas other categories showed more variable or flat trajectories\u201d \u2014 this is reasonable, after all, nature crafts sequences only by fitness<\/li>\n<\/ul>\n\n<p>Finally, an experiment with one specific family shows that a supervised dataset can replace a decade of collecting protein data in the wild, so \u2026 just collecting sequences-in-the-wild is still useful but inefficient.<\/p>\n\n<h2 id=\"training-compute-optimal-protein-language-models\">Training Compute-Optimal Protein Language Models<\/h2>\n\n<p><a 
href=\"https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2024\/file\/8066ae1446b2bbccb5159587cc3b3bcc-Paper-Conference.pdf\">neurips proceedings<\/a><\/p>\n\n<p>Metagenomic sequences are diverse and abundant, likely a good complement to UniProt \u2014 so the authors add ColabFoldDB to training.<\/p>\n\n<p>The paper builds a good contrast between MLMs and causal LMs (CLMs): MLMs are efficient but easy to overfit, the opposite of CLMs.<\/p>\n\n<p>They claim that the optimal training recipe is starting from CLMs, then switching the loss to MLM; surprisingly, training on the two losses at the same time isn\u2019t better. The authors argue that flops-optimal scaling favors larger models (and they train up to 10B parameters). Results are mixed:<\/p>\n\n<ul>\n  <li>transfer to downstream tasks isn\u2019t impressive<\/li>\n  <li>contact prediction: minor fine-tuning of a ~1B model achieves higher quality than a larger model<\/li>\n<\/ul>\n\n<p>Interesting observation: BERT\u2019s 15% masking ratio (used in ESMs) is still a good choice in protein MLMs.<\/p>\n\n<h2 id=\"ankh3-combining-sequence-denoising-and-completion\">Ankh3: combining sequence denoising and completion<\/h2>\n\n<p><a href=\"https:\/\/arxiv.org\/pdf\/2505.20052\">preprint<\/a><\/p>\n\n<p>This paper stands out because 1. it shows good improvement in contact prediction and 2. the 6B model is overall better than the 2B model.<\/p>\n\n<p>A model jointly optimized on two objectives: encoder-decoder protein completion and MLM denoising (with 15%, 20% or 50% masking probability, and apparently short spans were masked, not individual tokens). 
Both points contradict the previous paper in this list \u2014 this could be a result of the encoder-decoder architecture.<\/p>\n\n<p>The preprint leaves many questions unanswered:<\/p>\n<ul>\n  <li>the model is deep (72 layers), so it could be just inefficient<\/li>\n  <li>evaluation is limited to datasets without an easy \u2018leaderboard\u2019 to estimate downstream performance<\/li>\n  <li>I\u2019m a bit concerned that ESM-2 and Ankh results were \u201csourced from ankh paper\u201d instead of being reproduced<\/li>\n<\/ul>\n\n<h2 id=\"progen3-scaling-unlocks-broader-generation-and-deeper-functional-understanding-of-proteins\">ProGen3: Scaling Unlocks Broader Generation and Deeper Functional Understanding of Proteins<\/h2>\n\n<p><a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2025.04.15.649055v1\">preprint<\/a><\/p>\n\n<ol>\n  <li>Employs a huge curated dataset (PPA-1) that combines genomic and metagenomic sources and excludes fragments.<\/li>\n  <li>The model is trained on left-to-right, right-to-left and span infilling objectives (finally!). Then aligned on downstream tasks using IRPO \u2014 a modification of DPO.<\/li>\n<\/ol>\n\n<p>Results: non-aligned performance frequently peaks at ~3B, aligned performance usually still improves. 
\nLarger models can generate proteins from more clusters, with tiny improvements in expression.<\/p>\n\n<p>Exact numbers on ProteinGym aren\u2019t impressive, but the overall dynamics after alignment look encouraging.<\/p>\n\n<h2 id=\"dplm-1--dplm-2--esm-3\"><a href=\"https:\/\/arxiv.org\/abs\/2402.18567\">DPLM-1<\/a> \/ <a href=\"https:\/\/arxiv.org\/abs\/2410.13782\">DPLM-2<\/a> \/ <a href=\"https:\/\/www.science.org\/doi\/10.1126\/science.ads0018\">ESM-3<\/a><\/h2>\n\n<p>These models were trained with a sufficient amount of structural information in the form of structure tokens.<\/p>\n\n<p>DPLM-1 achieves better downstream performance on multiple tasks with a 3B model (no larger model was analyzed), but DPLM-2 (with a primary focus on structure tokens based on LFQ) reports only a 650M model \u2014 I treat this as an implicit signal of a scaling boundary. Interestingly, DPLM-2 shows worse downstream performance, and the authors link this to the missing PLM pretraining in DPLM-2.<\/p>\n\n<p>A combination of scaling + PLM pretraining + better structure tokens would be very interesting, but this hasn\u2019t happened yet with DPLMs (or happened and the result wasn\u2019t good enough for publication).<\/p>\n\n<p>ESM-3 is somewhat close, but they don\u2019t report any actual translatable properties of the model; performance reported on ProteinGym isn\u2019t impressive, and ESM-C 300M has similar performance to ESM-C 600M.<\/p>\n\n<h2 id=\"msa-as-a-context-for-plms\">MSA as a context for PLMs<\/h2>\n\n<p>MSA-based models (like MsaPairformer) show better transferability compared to PLMs (and they are smaller than PLMs).<\/p>\n\n<p>The PoET model started a direction in PLMs where homologous sequences are passed as context while the architecture is still a classical transformer.<\/p>\n\n<p>This direction inherits the weak sides of both PLMs and MSA-based models: 1. one still has to retrieve MSAs 2. alignment should be done by the model implicitly 3. more weights compared to MSA-based models and 4. 
long+deep MSAs are expensive because of quadratic attention.<\/p>\n\n<p>One paper from this family (<a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2025.11.12.688125v1.abstract\">Profluent E1<\/a>, also trained on PPA-1) claims good performance on ProteinGym and contact prediction (better than MsaPairformer and other PLMs) and shows positive scaling \u2026 up to 600M. From the plots I\u2019d expect further improvement on contact prediction, but not on downstream tasks.\nGiven the cost of training, it isn\u2019t surprising that the largest model is only 600M.<\/p>\n\n<h1 id=\"final-thoughts--directions\">Final thoughts \/ directions<\/h1>\n\n<p>Multiple years of research in PLMs did not bring a recognized recipe to utilize vast sequencing data.\nRecent literature contains some interesting hints, but no strong hypotheses about how to do this. \nPLMs increasingly incorporate structural or MSA features, which pushes performance; model sizes still mostly don\u2019t matter.<\/p>\n\n<p>PLMs started from the assumption that better perplexity means overall better understanding of protein sequences, as it worked in NLP. This assumption is wrong, and likely in NLP it isn\u2019t true either: longer training on natural language worked because pretty much any reasonable problem was already discussed, with examples, in the training data. Later progress in NLP was guaranteed by numerous problem-oriented curated datasets; scaling only helped in storing knowledge\/patterns in the model.<\/p>\n\n<p>If, in addition to protein sequences, training data contained various tokens related to expression, function, interaction, biophysical properties, etc., then all those metrics would go up. \nProtein sequences alone don\u2019t provide enough training signal. \nCan correlation with other genes from the same organism provide a more useful context? Can a functional description form a better prompt? \nSome teams are working on this, so we\u2019ll see soon.<\/p>\n\n<p>Is there a double descent in biology? 
Given the size of ESM-3, I\u2019ll take this hypothesis off the table.<\/p>\n\n<p>Are we memorizing phylogenetic noise? Almost surely yes. Larger models can generate proteins from more families (as shown by E1), while the best property prediction is still provided by analysis of MSAs (within the same family).<\/p>\n\n<p>Maybe nature does not care much about <em>our<\/em> downstream tasks. Maybe that much memory isn\u2019t necessary to memorize everything useful in biology (we\u2019re far from optimal performance, so probably not).<\/p>\n\n<p>A simple but likely more fruitful direction at this point would be to curate a large dataset with diverse downstream properties.<\/p>\n\n<p><strong>Confounding factors?<\/strong> We don\u2019t accept assay results at face value, but we generally assume that protein sequences are free of confounding effects (except for phylogenetic noise). \nIn <a href=\"https:\/\/arxiv.org\/pdf\/2512.20924\">\u201cClever Hans in Chemistry\u201d<\/a> the authors show that models can guess the author of a molecule; knowing the author, they can guess the activity without looking at the molecule itself. \n<em>Could similar cues appear in infrequent sequences?<\/em> \nLike the sequencing technology, or assembly method? \nThis is yet another hypothesis for why we don\u2019t see generalization.<\/p>\n","pubDate":"Sun, 01 Feb 2026 12:00:00 +0000","link":"https:\/\/arogozhnikov.github.io\/2026\/02\/01\/protein-lms.html","guid":"https:\/\/arogozhnikov.github.io\/2026\/02\/01\/protein-lms.html","category":["protein","language models","deep learning"]},{"title":"Fastest Autograd in the West","description":"<p>Who needs fast autograd? 
Seemingly everyone these days!<\/p>\n\n<p>And once upon a time I needed an autograd that is <strong>actually fast<\/strong>.\nLeaving project details aside, here are the requirements:<\/p>\n\n<ul>\n  <li>we test many computation graphs (the graph changes constantly)<\/li>\n  <li>many-many scalar operations with roughly <strong>10k\u2014100k nodes<\/strong> in each graph<\/li>\n  <li>every graph should be compiled and run around <strong>10k times<\/strong> both forward and backward<\/li>\n  <li>this should be done <strong>wicked fast<\/strong>, and with a convenient pythonic interface<\/li>\n<\/ul>\n\n<p>The path that awaits us:<\/p>\n<ol>\n  <li>autograd in torch<\/li>\n  <li>autograd in jax<\/li>\n  <li>autograd in python<\/li>\n  <li>autograd in rust<\/li>\n  <li>autograd in C<\/li>\n  <li>autograd in assembly<\/li>\n<\/ol>\n\n<p>Plus a significant amount of sloppy code and timings on an M1 macbook.<\/p>\n\n<h3 id=\"lets-autograd-in-pytorch\">Let\u2019s autograd in pytorch<\/h3>\n\n<p>We start our journey with pytorch \u2014 the default autograd engine in research. 
\nWe\u2019ll create a graph with many nodes, and to keep things simple our benchmark has only several kinds of operations: unary (softplus), binary (multiplication), n-ary (sum) and n-to-n (softmax).<\/p>\n\n<p>This allows using just a few operations, but resembles a realistic load.\nAll benchmarks in this post will reimplement the same logic as below.<\/p>\n\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">def<\/span> <span class=\"nf\">run_graph<\/span><span class=\"p\">(<\/span><span class=\"n\">initial_variables<\/span><span class=\"p\">,<\/span> <span class=\"n\">n_operations<\/span><span class=\"p\">:<\/span> <span class=\"nb\">int<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">nodes<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"o\">*<\/span><span class=\"n\">initial_variables<\/span><span class=\"p\">]<\/span>\n\n    <span class=\"k\">for<\/span> <span class=\"n\">op<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"n\">n_operations<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">match<\/span> <span class=\"n\">op<\/span> <span class=\"o\">%<\/span> <span class=\"mi\">4<\/span><span class=\"p\">:<\/span>\n            <span class=\"n\">case<\/span> <span class=\"mi\">0<\/span><span class=\"p\">:<\/span>\n                <span class=\"c1\"># softplus\n<\/span>                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">(<\/span><span class=\"n\">F<\/span><span class=\"p\">.<\/span><span class=\"n\">softplus<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">10<\/span><span class=\"p\">]))<\/span>\n            <span class=\"n\">case<\/span> <span class=\"mi\">1<\/span><span class=\"p\">:<\/span>\n                <span 
class=\"c1\"># sum\n<\/span>                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">(<\/span><span class=\"nb\">sum<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">30<\/span><span class=\"p\">:<\/span><span class=\"o\">-<\/span><span class=\"mi\">10<\/span><span class=\"p\">:<\/span><span class=\"mi\">5<\/span><span class=\"p\">]))<\/span>\n            <span class=\"n\">case<\/span> <span class=\"mi\">2<\/span><span class=\"p\">:<\/span>\n                <span class=\"c1\"># prod\n<\/span>                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">20<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">10<\/span><span class=\"p\">])<\/span>\n            <span class=\"n\">case<\/span> <span class=\"mi\">3<\/span><span class=\"p\">:<\/span>\n                <span class=\"c1\"># softmax\n<\/span>                <span class=\"n\">softmaxes<\/span> <span class=\"o\">=<\/span> <span class=\"n\">F<\/span><span class=\"p\">.<\/span><span class=\"n\">softmax<\/span><span class=\"p\">(<\/span><span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"n\">stack<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">4<\/span><span class=\"p\">:],<\/span> <span class=\"n\">dim<\/span><span class=\"o\">=<\/span><span class=\"mi\">0<\/span><span class=\"p\">),<\/span> <span class=\"n\">dim<\/span><span class=\"o\">=<\/span><span class=\"mi\">0<\/span><span class=\"p\">)<\/span>\n                <span class=\"n\">nodes<\/span><span 
class=\"p\">.<\/span><span class=\"n\">extend<\/span><span class=\"p\">(<\/span><span class=\"n\">softmaxes<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"k\">return<\/span> <span class=\"n\">nodes<\/span>\n\n\n<span class=\"k\">def<\/span> <span class=\"nf\">run_benchmark_pytorch<\/span><span class=\"p\">(<\/span><span class=\"n\">n_iterations<\/span><span class=\"p\">,<\/span> <span class=\"n\">n_operations<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">init_vars<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"n\">arange<\/span><span class=\"p\">(<\/span><span class=\"mi\">100<\/span><span class=\"p\">,<\/span> <span class=\"n\">dtype<\/span><span class=\"o\">=<\/span><span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"n\">float32<\/span><span class=\"p\">,<\/span> <span class=\"n\">requires_grad<\/span><span class=\"o\">=<\/span><span class=\"bp\">True<\/span><span class=\"p\">)<\/span>\n    <span class=\"k\">for<\/span> <span class=\"n\">_<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"n\">n_iterations<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">nodes<\/span> <span class=\"o\">=<\/span> <span class=\"n\">run_graph<\/span><span class=\"p\">(<\/span>\n            <span class=\"n\">initial_variables<\/span><span class=\"o\">=<\/span><span class=\"n\">init_vars<\/span><span class=\"p\">,<\/span>\n            <span class=\"n\">n_operations<\/span><span class=\"o\">=<\/span><span class=\"n\">n_operations<\/span><span class=\"p\">,<\/span>\n        <span class=\"p\">)<\/span>\n        <span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">].<\/span><span class=\"n\">backward<\/span><span class=\"p\">()<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Run-time for 10k ops x 100 iterations: 11.3 seconds\n<br 
\/>Run-time for 10k ops x 10k iterations: <strong>1130 seconds<\/strong> (estimate)<\/p>\n\n<p>Given we created 100M python objects, it\u2019s actually quite fast.\nAnd yes, that\u2019s not going to deliver an interactive experience.<\/p>\n\n<p>Let\u2019s also discuss <code class=\"language-plaintext highlighter-rouge\">torch.compile<\/code>, a major innovation in pytorch 2.0.<\/p>\n\n<p>At 100 operations torch.compile takes 4.5 seconds. \nExecution gets faster: for 100 operations and 10k iterations it takes 4.52 seconds with torch.compile and 10.4 seconds without. \nCompilation + execution are still in the same ballpark. \nFor bigger graphs (1k operations) <code class=\"language-plaintext highlighter-rouge\">torch.compile<\/code> crashes.<\/p>\n\n<h3 id=\"lets-autograd-in-jax\">Let\u2019s autograd in jax<\/h3>\n\n<p>Jax is the new cool kid\u2026 well, not that new anymore.\nBut in some aspects it is very interesting. Jax\u2019s focus on JIT-compiling static graphs is very suitable for the problem at hand.<\/p>\n\n<p>Implementation for benchmark is similar to pytorch:<\/p>\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">import<\/span> <span class=\"nn\">jax<\/span>\n<span class=\"kn\">import<\/span> <span class=\"nn\">numpy<\/span> <span class=\"k\">as<\/span> <span class=\"n\">np<\/span>\n\n<span class=\"k\">def<\/span> <span class=\"nf\">run_graph_jax<\/span><span class=\"p\">(<\/span><span class=\"n\">initial_variables<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">nodes<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"o\">*<\/span><span class=\"n\">initial_variables<\/span><span class=\"p\">]<\/span>\n    <span class=\"k\">for<\/span> <span class=\"n\">op<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"n\">n_operations<\/span><span class=\"p\">):<\/span>\n        <span 
class=\"n\">match<\/span> <span class=\"n\">op<\/span> <span class=\"o\">%<\/span> <span class=\"mi\">4<\/span><span class=\"p\">:<\/span>\n            <span class=\"n\">case<\/span> <span class=\"mi\">0<\/span><span class=\"p\">:<\/span>\n                <span class=\"c1\"># softplus\n<\/span>                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">(<\/span><span class=\"n\">jax<\/span><span class=\"p\">.<\/span><span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"n\">softplus<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">10<\/span><span class=\"p\">]))<\/span>\n            <span class=\"n\">case<\/span> <span class=\"mi\">1<\/span><span class=\"p\">:<\/span> \n                <span class=\"c1\"># sum\n<\/span>                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">(<\/span><span class=\"nb\">sum<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">30<\/span><span class=\"p\">:<\/span><span class=\"o\">-<\/span><span class=\"mi\">10<\/span><span class=\"p\">:<\/span><span class=\"mi\">5<\/span><span class=\"p\">]))<\/span>\n            <span class=\"n\">case<\/span> <span class=\"mi\">2<\/span><span class=\"p\">:<\/span> \n                <span class=\"c1\"># prod \n<\/span>                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">20<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">10<\/span><span class=\"p\">])<\/span>\n            <span 
class=\"n\">case<\/span> <span class=\"mi\">3<\/span><span class=\"p\">:<\/span> \n                <span class=\"c1\"># softmax\n<\/span>                <span class=\"n\">softmaxes<\/span> <span class=\"o\">=<\/span> <span class=\"n\">jax<\/span><span class=\"p\">.<\/span><span class=\"n\">nn<\/span><span class=\"p\">.<\/span><span class=\"n\">softmax<\/span><span class=\"p\">(<\/span><span class=\"n\">jax<\/span><span class=\"p\">.<\/span><span class=\"n\">numpy<\/span><span class=\"p\">.<\/span><span class=\"n\">stack<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">4<\/span><span class=\"p\">:]),<\/span> <span class=\"n\">axis<\/span><span class=\"o\">=<\/span><span class=\"mi\">0<\/span><span class=\"p\">)<\/span>\n                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span class=\"n\">extend<\/span><span class=\"p\">(<\/span><span class=\"n\">softmaxes<\/span><span class=\"p\">)<\/span>\n                \n    <span class=\"k\">return<\/span> <span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">]<\/span>\n\n<span class=\"n\">run_graph_and_grad<\/span> <span class=\"o\">=<\/span> <span class=\"n\">jax<\/span><span class=\"p\">.<\/span><span class=\"n\">value_and_grad<\/span><span class=\"p\">(<\/span><span class=\"n\">run_graph_jax<\/span><span class=\"p\">)<\/span>\n<span class=\"c1\"># or \n<\/span><span class=\"n\">run_graph_and_grad<\/span> <span class=\"o\">=<\/span> <span class=\"n\">jax<\/span><span class=\"p\">.<\/span><span class=\"n\">jit<\/span><span class=\"p\">(<\/span><span class=\"n\">jax<\/span><span class=\"p\">.<\/span><span class=\"n\">value_and_grad<\/span><span class=\"p\">(<\/span><span class=\"n\">run_graph_jax<\/span><span class=\"p\">))<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Without jit computations are extremely slow: \n<br \/> 1k ops x 10 
iterations =&gt; 15.9 seconds\n<br \/> 10k ops x 10k iterations =&gt; 159,000 seconds (estimate)<\/p>\n\n<p>That\u2019s a bit longer than forever! But whole point of jax is to JIT-compile stuff. So let\u2019s do it.<\/p>\n\n<p>jit: compilation of 1k ops = 47 seconds\n<br \/> jit: run-time for 1k ops x 10k iterations = 0.66 seconds\n<br \/> jit: 10k ops x 10k iterations (compilation + run-time) =&gt; <strong>470 seconds<\/strong> (estimate)<\/p>\n\n<p>Speed up in execution time is more than impressive, but we spend  &gt;99% of time compiling.<\/p>\n\n<h4 id=\"tensorflow\">Tensorflow<\/h4>\n<p>Someone will mention TF anyway. I\u2019ll leave this as an exercise for you, TF fans.<\/p>\n\n<h3 id=\"lets-autograd-in-python\">Let\u2019s autograd in python<\/h3>\n\n<p>Done with baselines, time to see if we can speed things up.<\/p>\n\n<p>Let\u2019s create a simplistic pseudo-framework and see how it competes with previous candidates.\nWe\u2019ll implement a tape-like autograd where operations order is explicitly tracked in a tape.<\/p>\n\n<details>\n  <summary class=\"code-summary\">show autograd engine in plain python\n<\/summary>\n  <div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">class<\/span> <span class=\"nc\">NaiveVar<\/span><span class=\"p\">:<\/span>\n    <span class=\"k\">def<\/span> <span class=\"nf\">__init__<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">val<\/span><span class=\"p\">):<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">val<\/span> <span class=\"o\">=<\/span> <span class=\"n\">val<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">grad<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.<\/span>\n    \n<span class=\"k\">class<\/span> <span class=\"nc\">NaiveTape<\/span><span class=\"p\">:<\/span>\n    <span 
class=\"k\">def<\/span> <span class=\"nf\">__init__<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">input_values<\/span><span class=\"p\">):<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">ops<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[]<\/span>\n        \n    <span class=\"k\">def<\/span> <span class=\"nf\">sum<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"o\">*<\/span><span class=\"nb\">vars<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">res<\/span> <span class=\"o\">=<\/span> <span class=\"n\">NaiveVar<\/span><span class=\"p\">(<\/span><span class=\"nb\">sum<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">.<\/span><span class=\"n\">val<\/span> <span class=\"k\">for<\/span> <span class=\"n\">v<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">vars<\/span><span class=\"p\">))<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">ops<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">((<\/span><span class=\"s\">'sum'<\/span><span class=\"p\">,<\/span> <span class=\"nb\">vars<\/span><span class=\"p\">,<\/span> <span class=\"n\">res<\/span><span class=\"p\">))<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">res<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">prod<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">var1<\/span><span class=\"p\">,<\/span> <span class=\"n\">var2<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">res<\/span> <span class=\"o\">=<\/span> <span class=\"n\">NaiveVar<\/span><span class=\"p\">(<\/span><span class=\"n\">var1<\/span><span class=\"p\">.<\/span><span class=\"n\">val<\/span> <span class=\"o\">*<\/span> <span 
class=\"n\">var2<\/span><span class=\"p\">.<\/span><span class=\"n\">val<\/span><span class=\"p\">)<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">ops<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">((<\/span><span class=\"s\">'prod'<\/span><span class=\"p\">,<\/span> <span class=\"p\">[<\/span><span class=\"n\">var1<\/span><span class=\"p\">,<\/span> <span class=\"n\">var2<\/span><span class=\"p\">],<\/span> <span class=\"n\">res<\/span><span class=\"p\">))<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">res<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">softmax<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"o\">*<\/span><span class=\"nb\">vars<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">vals<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"n\">v<\/span><span class=\"p\">.<\/span><span class=\"n\">val<\/span> <span class=\"k\">for<\/span> <span class=\"n\">v<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">vars<\/span><span class=\"p\">]<\/span>\n        <span class=\"n\">maxval<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">max<\/span><span class=\"p\">(<\/span><span class=\"n\">vals<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">vals<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"n\">v<\/span> <span class=\"o\">-<\/span> <span class=\"n\">maxval<\/span> <span class=\"k\">for<\/span> <span class=\"n\">v<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">vals<\/span><span class=\"p\">]<\/span>\n        <span class=\"n\">denom<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">sum<\/span><span class=\"p\">(<\/span><span class=\"n\">math<\/span><span class=\"p\">.<\/span><span class=\"n\">exp<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span 
class=\"p\">)<\/span> <span class=\"k\">for<\/span> <span class=\"n\">v<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">vals<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">res<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"n\">NaiveVar<\/span><span class=\"p\">(<\/span><span class=\"n\">math<\/span><span class=\"p\">.<\/span><span class=\"n\">exp<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">)<\/span> <span class=\"o\">\/<\/span> <span class=\"n\">denom<\/span><span class=\"p\">)<\/span> <span class=\"k\">for<\/span> <span class=\"n\">v<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">vals<\/span><span class=\"p\">]<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">ops<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">((<\/span><span class=\"s\">'softmax'<\/span><span class=\"p\">,<\/span> <span class=\"nb\">vars<\/span><span class=\"p\">,<\/span> <span class=\"n\">res<\/span><span class=\"p\">))<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">res<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">softplus<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">var<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">res<\/span> <span class=\"o\">=<\/span> <span class=\"n\">NaiveVar<\/span><span class=\"p\">(<\/span><span class=\"n\">math<\/span><span class=\"p\">.<\/span><span class=\"n\">log1p<\/span><span class=\"p\">(<\/span><span class=\"n\">math<\/span><span class=\"p\">.<\/span><span class=\"n\">exp<\/span><span class=\"p\">(<\/span><span class=\"n\">var<\/span><span class=\"p\">.<\/span><span class=\"n\">val<\/span><span class=\"p\">)))<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">ops<\/span><span class=\"p\">.<\/span><span 
class=\"n\">append<\/span><span class=\"p\">((<\/span><span class=\"s\">'splus'<\/span><span class=\"p\">,<\/span> <span class=\"n\">var<\/span><span class=\"p\">,<\/span> <span class=\"n\">res<\/span><span class=\"p\">))<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">res<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">backward<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">var<\/span><span class=\"p\">):<\/span>\n        <span class=\"k\">assert<\/span> <span class=\"n\">var<\/span><span class=\"p\">.<\/span><span class=\"n\">grad<\/span> <span class=\"o\">==<\/span> <span class=\"mi\">0<\/span>\n        <span class=\"n\">var<\/span><span class=\"p\">.<\/span><span class=\"n\">grad<\/span> <span class=\"o\">+=<\/span> <span class=\"mi\">1<\/span>\n        <span class=\"k\">for<\/span> <span class=\"n\">op<\/span><span class=\"p\">,<\/span> <span class=\"n\">inputs<\/span><span class=\"p\">,<\/span> <span class=\"n\">outputs<\/span> <span class=\"ow\">in<\/span> <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">ops<\/span><span class=\"p\">[::<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">]:<\/span>\n            <span class=\"n\">match<\/span> <span class=\"n\">op<\/span><span class=\"p\">:<\/span>\n                <span class=\"n\">case<\/span> <span class=\"s\">'sum'<\/span><span class=\"p\">:<\/span>\n                    <span class=\"n\">out<\/span> <span class=\"o\">=<\/span> <span class=\"n\">outputs<\/span>\n                    <span class=\"k\">for<\/span> <span class=\"n\">v<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">inputs<\/span><span class=\"p\">:<\/span>\n                        <span class=\"n\">v<\/span><span class=\"p\">.<\/span><span class=\"n\">grad<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">out<\/span><span class=\"p\">.<\/span><span class=\"n\">grad<\/span>\n     
           <span class=\"n\">case<\/span> <span class=\"s\">'prod'<\/span><span class=\"p\">:<\/span>\n                    <span class=\"n\">out<\/span> <span class=\"o\">=<\/span> <span class=\"n\">outputs<\/span>\n                    <span class=\"n\">in1<\/span><span class=\"p\">,<\/span> <span class=\"n\">in2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">inputs<\/span>\n                    <span class=\"n\">in1<\/span><span class=\"p\">.<\/span><span class=\"n\">grad<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">in2<\/span><span class=\"p\">.<\/span><span class=\"n\">val<\/span> <span class=\"o\">*<\/span> <span class=\"n\">out<\/span><span class=\"p\">.<\/span><span class=\"n\">grad<\/span>\n                    <span class=\"n\">in2<\/span><span class=\"p\">.<\/span><span class=\"n\">grad<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">in1<\/span><span class=\"p\">.<\/span><span class=\"n\">val<\/span> <span class=\"o\">*<\/span> <span class=\"n\">out<\/span><span class=\"p\">.<\/span><span class=\"n\">grad<\/span>\n                <span class=\"n\">case<\/span> <span class=\"s\">'splus'<\/span><span class=\"p\">:<\/span>\n                    <span class=\"n\">inputs<\/span><span class=\"p\">.<\/span><span class=\"n\">grad<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">outputs<\/span><span class=\"p\">.<\/span><span class=\"n\">grad<\/span> <span class=\"o\">\/<\/span> <span class=\"p\">(<\/span><span class=\"mi\">1<\/span> <span class=\"o\">+<\/span> <span class=\"n\">math<\/span><span class=\"p\">.<\/span><span class=\"n\">exp<\/span><span class=\"p\">(<\/span><span class=\"o\">-<\/span><span class=\"n\">inputs<\/span><span class=\"p\">.<\/span><span class=\"n\">val<\/span><span class=\"p\">))<\/span>\n                <span class=\"n\">case<\/span> <span class=\"s\">'softmax'<\/span><span class=\"p\">:<\/span>\n                    <span class=\"k\">pass<\/span> <span class=\"c1\"># skip for now\n<\/span>                <span 
class=\"n\">case<\/span> <span class=\"n\">_<\/span><span class=\"p\">:<\/span>\n                    <span class=\"k\">raise<\/span> <span class=\"nb\">NotImplementedError<\/span><span class=\"p\">()<\/span>\n<\/code><\/pre><\/div>  <\/div>\n<\/details>\n\n<p>and reimplement the reference task using our new pseudo-framework:<\/p>\n<details>\n  <summary class=\"code-summary\">show benchmarking code\n<\/summary>\n  <div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">def<\/span> <span class=\"nf\">run_graph_python_and_backward<\/span><span class=\"p\">(<\/span><span class=\"n\">initial_variables<\/span><span class=\"p\">,<\/span> <span class=\"n\">n_operations<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">nodes<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"n\">NaiveVar<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span> <span class=\"k\">for<\/span> <span class=\"n\">x<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">initial_variables<\/span><span class=\"p\">]<\/span>\n    <span class=\"n\">tape<\/span> <span class=\"o\">=<\/span> <span class=\"n\">NaiveTape<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">)<\/span>\n    <span class=\"k\">for<\/span> <span class=\"n\">op<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"n\">n_operations<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">match<\/span> <span class=\"n\">op<\/span> <span class=\"o\">%<\/span> <span class=\"mi\">4<\/span><span class=\"p\">:<\/span>\n            <span class=\"n\">case<\/span> <span class=\"mi\">0<\/span><span class=\"p\">:<\/span> \n                <span class=\"c1\"># softplus\n<\/span>                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">(<\/span><span 
class=\"n\">tape<\/span><span class=\"p\">.<\/span><span class=\"n\">softplus<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">10<\/span><span class=\"p\">]))<\/span>\n            <span class=\"n\">case<\/span> <span class=\"mi\">1<\/span><span class=\"p\">:<\/span> \n                <span class=\"c1\"># sum\n<\/span>                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">(<\/span><span class=\"n\">tape<\/span><span class=\"p\">.<\/span><span class=\"nb\">sum<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">30<\/span><span class=\"p\">:<\/span><span class=\"o\">-<\/span><span class=\"mi\">10<\/span><span class=\"p\">:<\/span><span class=\"mi\">5<\/span><span class=\"p\">]))<\/span>\n            <span class=\"n\">case<\/span> <span class=\"mi\">2<\/span><span class=\"p\">:<\/span> \n                <span class=\"c1\"># prod \n<\/span>                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">(<\/span><span class=\"n\">tape<\/span><span class=\"p\">.<\/span><span class=\"n\">prod<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">20<\/span><span class=\"p\">],<\/span> <span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">10<\/span><span class=\"p\">]))<\/span>\n            <span class=\"n\">case<\/span> <span class=\"mi\">3<\/span><span class=\"p\">:<\/span> \n                <span class=\"c1\"># softmax\n<\/span>                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span class=\"n\">extend<\/span><span class=\"p\">(<\/span><span class=\"n\">tape<\/span><span 
class=\"p\">.<\/span><span class=\"n\">softmax<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">4<\/span><span class=\"p\">:]))<\/span>\n\n    <span class=\"n\">tape<\/span><span class=\"p\">.<\/span><span class=\"n\">backward<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">])<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">tape<\/span>\n<\/code><\/pre><\/div>  <\/div>\n<\/details>\n\n<p>Run-time for 10k ops and 10k iterations: <strong>312 seconds<\/strong>.<\/p>\n\n<p>As expected, it\u2019s not fast. But compared to the previous candidates, that\u2019s actually quite competitive!<\/p>\n\n<h3 id=\"lets-autograd-in-python-again\">Let\u2019s autograd in python, again<\/h3>\n\n<p>This time we move all values into the tape instead of keeping them in variables.\nAdditionally, the tape will keep a \u2018static graph\u2019 of computations by recording the indices of the variables participating in every operation.<\/p>\n\n<details>\n  <summary class=\"code-summary\">show code for autograd in plain python\n<\/summary>\n  <div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">import<\/span> <span class=\"nn\">numba<\/span>\n<span class=\"kn\">import<\/span> <span class=\"nn\">math<\/span>\n\n<span class=\"k\">class<\/span> <span class=\"nc\">VarInd<\/span><span class=\"p\">:<\/span>\n    <span class=\"k\">def<\/span> <span class=\"nf\">__init__<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">index<\/span><span class=\"p\">):<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">index<\/span> <span class=\"o\">=<\/span> <span class=\"n\">index<\/span> <span class=\"c1\"># variable is just a unique 
index in tape\n<\/span>    \n<span class=\"k\">class<\/span> <span class=\"nc\">TapeInd<\/span><span class=\"p\">:<\/span>\n    <span class=\"k\">def<\/span> <span class=\"nf\">__init__<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">):<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">ops<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[]<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">vals<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[]<\/span>  <span class=\"c1\"># flat memory with values\n<\/span>        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">grads<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[]<\/span> <span class=\"c1\"># flat memory with gradients\n<\/span>\n    <span class=\"k\">def<\/span> <span class=\"nf\">make_var<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">value<\/span><span class=\"p\">):<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">vals<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">(<\/span><span class=\"n\">value<\/span><span class=\"p\">)<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">grads<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.<\/span><span class=\"p\">)<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">VarInd<\/span><span class=\"p\">(<\/span><span class=\"nb\">len<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">vals<\/span><span class=\"p\">)<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">val<\/span><span 
class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">v<\/span><span class=\"p\">:<\/span> <span class=\"n\">VarInd<\/span><span class=\"p\">):<\/span>\n        <span class=\"k\">return<\/span> <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"n\">v<\/span><span class=\"p\">.<\/span><span class=\"n\">index<\/span><span class=\"p\">]<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">add_op<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">kls<\/span><span class=\"p\">,<\/span> <span class=\"n\">input_vars<\/span><span class=\"p\">,<\/span> <span class=\"n\">output_vars<\/span><span class=\"p\">):<\/span>\n        <span class=\"c1\"># translate variables to indices. self.ops keeps only indices\n<\/span>        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">ops<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">((<\/span><span class=\"n\">kls<\/span><span class=\"p\">,<\/span> <span class=\"p\">[<\/span><span class=\"n\">x<\/span><span class=\"p\">.<\/span><span class=\"n\">index<\/span> <span class=\"k\">for<\/span> <span class=\"n\">x<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">input_vars<\/span><span class=\"p\">],<\/span> <span class=\"p\">[<\/span><span class=\"n\">x<\/span><span class=\"p\">.<\/span><span class=\"n\">index<\/span> <span class=\"k\">for<\/span> <span class=\"n\">x<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">output_vars<\/span><span class=\"p\">]))<\/span>        \n        \n    <span class=\"k\">def<\/span> <span class=\"nf\">sum<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"o\">*<\/span><span class=\"nb\">vars<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">res<\/span> <span 
class=\"o\">=<\/span> <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">make_var<\/span><span class=\"p\">(<\/span><span class=\"nb\">sum<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">val<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">)<\/span> <span class=\"k\">for<\/span> <span class=\"n\">v<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">vars<\/span><span class=\"p\">))<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">add_op<\/span><span class=\"p\">(<\/span><span class=\"s\">'sum'<\/span><span class=\"p\">,<\/span> <span class=\"nb\">vars<\/span><span class=\"p\">,<\/span> <span class=\"p\">[<\/span><span class=\"n\">res<\/span><span class=\"p\">])<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">res<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">prod<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">var1<\/span><span class=\"p\">,<\/span> <span class=\"n\">var2<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">res<\/span> <span class=\"o\">=<\/span> <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">make_var<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">val<\/span><span class=\"p\">(<\/span><span class=\"n\">var1<\/span><span class=\"p\">)<\/span> <span class=\"o\">*<\/span> <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">val<\/span><span class=\"p\">(<\/span><span class=\"n\">var2<\/span><span class=\"p\">))<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">add_op<\/span><span class=\"p\">(<\/span><span class=\"s\">'prod'<\/span><span class=\"p\">,<\/span> <span class=\"p\">[<\/span><span class=\"n\">var1<\/span><span 
class=\"p\">,<\/span> <span class=\"n\">var2<\/span><span class=\"p\">],<\/span> <span class=\"p\">[<\/span><span class=\"n\">res<\/span><span class=\"p\">])<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">res<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">softmax<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"o\">*<\/span><span class=\"nb\">vars<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">vals<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">val<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">)<\/span> <span class=\"k\">for<\/span> <span class=\"n\">v<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">vars<\/span><span class=\"p\">]<\/span>\n        <span class=\"n\">maxval<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">max<\/span><span class=\"p\">(<\/span><span class=\"n\">vals<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">vals<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"n\">v<\/span> <span class=\"o\">-<\/span> <span class=\"n\">maxval<\/span> <span class=\"k\">for<\/span> <span class=\"n\">v<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">vals<\/span><span class=\"p\">]<\/span>\n        <span class=\"n\">denom<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">sum<\/span><span class=\"p\">(<\/span><span class=\"n\">math<\/span><span class=\"p\">.<\/span><span class=\"n\">exp<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">)<\/span> <span class=\"k\">for<\/span> <span class=\"n\">v<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">vals<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">res<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"bp\">self<\/span><span 
class=\"p\">.<\/span><span class=\"n\">make_var<\/span><span class=\"p\">(<\/span><span class=\"n\">math<\/span><span class=\"p\">.<\/span><span class=\"n\">exp<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">)<\/span> <span class=\"o\">\/<\/span> <span class=\"n\">denom<\/span> <span class=\"p\">)<\/span> <span class=\"k\">for<\/span> <span class=\"n\">v<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">vals<\/span><span class=\"p\">]<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">add_op<\/span><span class=\"p\">(<\/span><span class=\"s\">'softmax'<\/span><span class=\"p\">,<\/span> <span class=\"nb\">vars<\/span><span class=\"p\">,<\/span> <span class=\"n\">res<\/span><span class=\"p\">)<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">res<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">softplus<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">var<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">res<\/span> <span class=\"o\">=<\/span> <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">make_var<\/span><span class=\"p\">(<\/span><span class=\"n\">math<\/span><span class=\"p\">.<\/span><span class=\"n\">log1p<\/span><span class=\"p\">(<\/span> <span class=\"n\">math<\/span><span class=\"p\">.<\/span><span class=\"n\">exp<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">val<\/span><span class=\"p\">(<\/span><span class=\"n\">var<\/span><span class=\"p\">))<\/span> <span class=\"p\">))<\/span>\n        <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">add_op<\/span><span class=\"p\">(<\/span><span class=\"s\">'splus'<\/span><span class=\"p\">,<\/span> <span class=\"p\">[<\/span><span class=\"n\">var<\/span><span class=\"p\">],<\/span> <span class=\"p\">[<\/span><span 
class=\"n\">res<\/span><span class=\"p\">])<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">res<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">forward_backward_external<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">grad_var<\/span><span class=\"p\">:<\/span> <span class=\"n\">VarInd<\/span><span class=\"p\">):<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">forward_backward_external<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">vals<\/span><span class=\"p\">,<\/span> <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">grads<\/span><span class=\"p\">,<\/span> <span class=\"bp\">self<\/span><span class=\"p\">.<\/span><span class=\"n\">ops<\/span><span class=\"p\">,<\/span> <span class=\"n\">grad_var_index<\/span><span class=\"o\">=<\/span><span class=\"n\">grad_var<\/span><span class=\"p\">.<\/span><span class=\"n\">index<\/span><span class=\"p\">)<\/span>\n\n<span class=\"k\">def<\/span> <span class=\"nf\">forward_backward_external<\/span><span class=\"p\">(<\/span>\n\t<span class=\"n\">vals<\/span><span class=\"p\">:<\/span> <span class=\"nb\">list<\/span><span class=\"p\">[<\/span><span class=\"nb\">float<\/span><span class=\"p\">],<\/span> \n\t<span class=\"n\">grads<\/span><span class=\"p\">:<\/span> <span class=\"nb\">list<\/span><span class=\"p\">[<\/span><span class=\"nb\">float<\/span><span class=\"p\">],<\/span> \n\t<span class=\"n\">ops<\/span><span class=\"p\">:<\/span> <span class=\"nb\">list<\/span><span class=\"p\">[<\/span><span class=\"nb\">tuple<\/span><span class=\"p\">[<\/span><span class=\"nb\">str<\/span><span class=\"p\">,<\/span> <span class=\"nb\">list<\/span><span class=\"p\">[<\/span><span class=\"nb\">int<\/span><span class=\"p\">],<\/span> <span class=\"nb\">list<\/span><span class=\"p\">[<\/span><span class=\"nb\">int<\/span><span 
class=\"p\">]]],<\/span>\n\t<span class=\"n\">grad_var_index<\/span><span class=\"p\">:<\/span> <span class=\"nb\">int<\/span>\n<span class=\"p\">):<\/span>\n    <span class=\"n\">v<\/span><span class=\"p\">:<\/span> <span class=\"nb\">list<\/span><span class=\"p\">[<\/span><span class=\"nb\">float<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">vals<\/span>\n    <span class=\"n\">g<\/span><span class=\"p\">:<\/span> <span class=\"nb\">list<\/span><span class=\"p\">[<\/span><span class=\"nb\">float<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">grads<\/span>\n    <span class=\"c1\"># forward pass\n<\/span>    <span class=\"k\">for<\/span> <span class=\"n\">op<\/span><span class=\"p\">,<\/span> <span class=\"n\">ins<\/span><span class=\"p\">,<\/span> <span class=\"n\">outs<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">ops<\/span><span class=\"p\">:<\/span>\n        <span class=\"n\">match<\/span> <span class=\"n\">op<\/span><span class=\"p\">:<\/span>\n            <span class=\"n\">case<\/span> <span class=\"s\">'sum'<\/span><span class=\"p\">:<\/span>\n                <span class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">outs<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">sum<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"k\">for<\/span> <span class=\"n\">i<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">ins<\/span><span class=\"p\">)<\/span>\n            <span class=\"n\">case<\/span> <span class=\"s\">'prod'<\/span><span class=\"p\">:<\/span>\n                <span class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">outs<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">=<\/span> <span 
class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">]]<\/span>\n            <span class=\"n\">case<\/span> <span class=\"s\">'splus'<\/span><span class=\"p\">:<\/span>\n                <span class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">outs<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">math<\/span><span class=\"p\">.<\/span><span class=\"n\">log1p<\/span><span class=\"p\">(<\/span><span class=\"n\">math<\/span><span class=\"p\">.<\/span><span class=\"n\">exp<\/span><span class=\"p\">(<\/span> <span class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"p\">))<\/span>\n            <span class=\"n\">case<\/span> <span class=\"s\">'softmax'<\/span><span class=\"p\">:<\/span>\n                <span class=\"n\">maximal<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">max<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"k\">for<\/span> <span class=\"n\">i<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">ins<\/span><span class=\"p\">)<\/span>\n                <span class=\"n\">exps<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"n\">math<\/span><span class=\"p\">.<\/span><span class=\"n\">exp<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"o\">-<\/span> <span class=\"n\">maximal<\/span><span 
class=\"p\">)<\/span> <span class=\"k\">for<\/span> <span class=\"n\">i<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">ins<\/span><span class=\"p\">]<\/span>\n                <span class=\"n\">denom<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">sum<\/span><span class=\"p\">(<\/span><span class=\"n\">exps<\/span><span class=\"p\">)<\/span>\n                <span class=\"k\">for<\/span> <span class=\"n\">i<\/span><span class=\"p\">,<\/span> <span class=\"n\">exp<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">zip<\/span><span class=\"p\">(<\/span><span class=\"n\">outs<\/span><span class=\"p\">,<\/span> <span class=\"n\">exps<\/span><span class=\"p\">):<\/span>\n                    <span class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">exp<\/span> <span class=\"o\">\/<\/span> <span class=\"n\">denom<\/span>\n\n    <span class=\"n\">g<\/span><span class=\"p\">[<\/span><span class=\"n\">grad_var_index<\/span><span class=\"p\">]<\/span> <span class=\"o\">+=<\/span> <span class=\"mi\">1<\/span>\n\n    <span class=\"c1\"># backward pass\n<\/span>    <span class=\"k\">for<\/span> <span class=\"n\">op<\/span><span class=\"p\">,<\/span> <span class=\"n\">ins<\/span><span class=\"p\">,<\/span> <span class=\"n\">outs<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">ops<\/span><span class=\"p\">[::<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">]:<\/span>\n        <span class=\"n\">match<\/span> <span class=\"n\">op<\/span><span class=\"p\">:<\/span>\n            <span class=\"n\">case<\/span> <span class=\"s\">'sum'<\/span><span class=\"p\">:<\/span>\n                <span class=\"k\">for<\/span> <span class=\"n\">i<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">ins<\/span><span class=\"p\">:<\/span>\n                    <span class=\"n\">g<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span 
class=\"p\">]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">g<\/span><span class=\"p\">[<\/span><span class=\"n\">outs<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span>\n            <span class=\"n\">case<\/span> <span class=\"s\">'prod'<\/span><span class=\"p\">:<\/span>\n                <span class=\"n\">out<\/span><span class=\"p\">:<\/span> <span class=\"nb\">int<\/span> <span class=\"o\">=<\/span> <span class=\"n\">outs<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span>\n                <span class=\"n\">in1<\/span><span class=\"p\">,<\/span> <span class=\"n\">in2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">ins<\/span>\n                <span class=\"n\">g<\/span><span class=\"p\">[<\/span><span class=\"n\">in1<\/span><span class=\"p\">]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">in2<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">g<\/span><span class=\"p\">[<\/span><span class=\"n\">out<\/span><span class=\"p\">]<\/span>\n                <span class=\"n\">g<\/span><span class=\"p\">[<\/span><span class=\"n\">in2<\/span><span class=\"p\">]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">in1<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">g<\/span><span class=\"p\">[<\/span><span class=\"n\">out<\/span><span class=\"p\">]<\/span>\n            <span class=\"n\">case<\/span> <span class=\"s\">'splus'<\/span><span class=\"p\">:<\/span>\n                <span class=\"n\">g<\/span><span class=\"p\">[<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">g<\/span><span class=\"p\">[<\/span><span class=\"n\">outs<\/span><span class=\"p\">[<\/span><span 
class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">\/<\/span> <span class=\"p\">(<\/span><span class=\"mi\">1<\/span> <span class=\"o\">+<\/span> <span class=\"n\">math<\/span><span class=\"p\">.<\/span><span class=\"n\">exp<\/span><span class=\"p\">(<\/span><span class=\"o\">-<\/span><span class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]))<\/span>\n            <span class=\"n\">case<\/span> <span class=\"s\">'softmax'<\/span><span class=\"p\">:<\/span>\n                <span class=\"n\">avg_grad<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">sum<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">j<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">g<\/span><span class=\"p\">[<\/span><span class=\"n\">j<\/span><span class=\"p\">]<\/span> <span class=\"k\">for<\/span> <span class=\"n\">j<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">outs<\/span><span class=\"p\">)<\/span>\n                <span class=\"k\">for<\/span> <span class=\"n\">i<\/span><span class=\"p\">,<\/span> <span class=\"n\">j<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">zip<\/span><span class=\"p\">(<\/span><span class=\"n\">ins<\/span><span class=\"p\">,<\/span> <span class=\"n\">outs<\/span><span class=\"p\">):<\/span>\n                    <span class=\"n\">g<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">v<\/span><span class=\"p\">[<\/span><span class=\"n\">j<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"p\">(<\/span><span class=\"n\">g<\/span><span class=\"p\">[<\/span><span class=\"n\">j<\/span><span class=\"p\">]<\/span> <span class=\"o\">-<\/span> <span class=\"n\">avg_grad<\/span><span class=\"p\">)<\/span>\n<\/code><\/pre><\/div>  <\/div>\n  <p>and the corresponding 
launching code<\/p>\n  <div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">def<\/span> <span class=\"nf\">run_graph_python_and_backward<\/span><span class=\"p\">(<\/span><span class=\"n\">n_operations<\/span><span class=\"p\">,<\/span> <span class=\"n\">n_iterations<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">tape<\/span> <span class=\"o\">=<\/span> <span class=\"n\">TapeInd<\/span><span class=\"p\">()<\/span>\n    <span class=\"n\">nodes<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"n\">tape<\/span><span class=\"p\">.<\/span><span class=\"n\">make_var<\/span><span class=\"p\">(<\/span><span class=\"nb\">float<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">))<\/span> <span class=\"k\">for<\/span> <span class=\"n\">x<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"mi\">100<\/span><span class=\"p\">)]<\/span>\n    \n    <span class=\"k\">for<\/span> <span class=\"n\">op<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"n\">n_operations<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">match<\/span> <span class=\"n\">op<\/span> <span class=\"o\">%<\/span> <span class=\"mi\">4<\/span><span class=\"p\">:<\/span>\n            <span class=\"n\">case<\/span> <span class=\"mi\">0<\/span><span class=\"p\">:<\/span> \n                <span class=\"c1\"># softplus\n<\/span>                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">(<\/span><span class=\"n\">tape<\/span><span class=\"p\">.<\/span><span class=\"n\">softplus<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">10<\/span><span class=\"p\">]))<\/span>\n            <span 
class=\"n\">case<\/span> <span class=\"mi\">1<\/span><span class=\"p\">:<\/span> \n                <span class=\"c1\"># sum\n<\/span>                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">(<\/span><span class=\"n\">tape<\/span><span class=\"p\">.<\/span><span class=\"nb\">sum<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">30<\/span><span class=\"p\">:<\/span><span class=\"o\">-<\/span><span class=\"mi\">10<\/span><span class=\"p\">:<\/span><span class=\"mi\">5<\/span><span class=\"p\">]))<\/span>\n            <span class=\"n\">case<\/span> <span class=\"mi\">2<\/span><span class=\"p\">:<\/span> \n                <span class=\"c1\"># prod \n<\/span>                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">(<\/span><span class=\"n\">tape<\/span><span class=\"p\">.<\/span><span class=\"n\">prod<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">20<\/span><span class=\"p\">],<\/span> <span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">10<\/span><span class=\"p\">]))<\/span>\n            <span class=\"n\">case<\/span> <span class=\"mi\">3<\/span><span class=\"p\">:<\/span> \n                <span class=\"c1\"># softmax\n<\/span>                <span class=\"n\">softmaxes<\/span> <span class=\"o\">=<\/span> <span class=\"n\">tape<\/span><span class=\"p\">.<\/span><span class=\"n\">softmax<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">4<\/span><span class=\"p\">:])<\/span>\n                <span class=\"n\">nodes<\/span><span class=\"p\">.<\/span><span 
class=\"n\">extend<\/span><span class=\"p\">(<\/span><span class=\"n\">softmaxes<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"k\">for<\/span> <span class=\"n\">_<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"n\">n_iterations<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">tape<\/span><span class=\"p\">.<\/span><span class=\"n\">forward_backward<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">])<\/span>\n<\/code><\/pre><\/div>  <\/div>\n<\/details>\n\n<p>Run-time for 10k ops x 10k iterations: <strong>94 seconds<\/strong><\/p>\n\n<p>As we see, moving all values into the tape and switching to operating on indices is quite an efficient strategy. \nWe still use python, but are now ~5-10-fold faster than <code class=\"language-plaintext highlighter-rouge\">pytorch<\/code> or <code class=\"language-plaintext highlighter-rouge\">jax<\/code>.<\/p>\n\n<p>At this point, I want to mention one more experiment: the code above is organized to be <code class=\"language-plaintext highlighter-rouge\">numba<\/code>-friendly. \n<a href=\"https:\/\/numba.readthedocs.io\/en\/stable\/\">Numba<\/a> is famous for speeding up number crunching in python with minimal changes by providing just-in-time compilation. \nThe recent addition of <code class=\"language-plaintext highlighter-rouge\">numba.typed.List<\/code> makes it possible to efficiently handle lists of lists.<\/p>\n\n<p>Run-time with numba, 10k ops x 10k iterations: <strong>41 seconds<\/strong>. <br \/>\nAt this point we\u2019re &gt;10-fold faster than jax\/pytorch (and still writing code in python).<\/p>\n\n<h3 id=\"lets-autograd-in-rust\">Let\u2019s autograd in rust<\/h3>\n\n<p>Once we moved graph tracking to tape, we can now use something fast to run computations for us. For instance, rust. 
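<\/p>

<p>The tape layout in miniature (a toy sketch with illustrative names, not the TapeInd class from above): everything is flat lists of floats plus integer indices, which is exactly the shape of code that a JIT or a compiled backend digests well.<\/p>

```python
import math

def forward_backward(vals, grads, ops, ins_list, outs_list, loss_index):
    # forward pass: each op reads and writes slots of the flat `vals` list
    for op, ins, outs in zip(ops, ins_list, outs_list):
        if op == 'sum':
            vals[outs[0]] = sum(vals[i] for i in ins)
        elif op == 'prod':
            vals[outs[0]] = vals[ins[0]] * vals[ins[1]]
        elif op == 'splus':
            x = vals[ins[0]]
            # numerically stable softplus: log(1 + e^x)
            vals[outs[0]] = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    # seed the gradient of the node we differentiate with respect to
    grads[loss_index] += 1.0
    # backward pass: same ops in reverse order, accumulating chain-rule terms
    for op, ins, outs in zip(ops[::-1], ins_list[::-1], outs_list[::-1]):
        if op == 'sum':
            for i in ins:
                grads[i] += grads[outs[0]]
        elif op == 'prod':
            grads[ins[0]] += vals[ins[1]] * grads[outs[0]]
            grads[ins[1]] += vals[ins[0]] * grads[outs[0]]
        elif op == 'splus':
            grads[ins[0]] += grads[outs[0]] / (1.0 + math.exp(-vals[ins[0]]))
```

<p>E.g. with vals = [2.0, 3.0, 0.0] and a single prod op writing to slot 2, calling with loss_index=2 leaves vals[2] == 6.0 and grads == [3.0, 2.0, 1.0].<\/p>

<p>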
\nFor rust\u2194python interop I\u2019ve used a small wrapper around <a href=\"https:\/\/github.com\/mityax\/rustimport\">rustimport<\/a>.\n<code class=\"language-plaintext highlighter-rouge\">Rustimport<\/code> makes it possible to conveniently \u201cimport\u201d a single rust file without creating a full-fledged rust project.<\/p>\n\n<p>Some optimization remarks:<\/p>\n<ul>\n  <li><code class=\"language-plaintext highlighter-rouge\">softmax<\/code> was a bottleneck, so I switched to creating temporary arrays on the stack instead of Vecs, which required specializing on input sizes<\/li>\n  <li>I followed the rust-y approach with iterators to reduce the number of bounds checks<\/li>\n  <li>I wondered whether a match with multiple options checked one-by-one is slow. In synthetic tests it seemed to be relatively fast, but I wish jump table optimization were implemented here\n(e.g. it is supported for <a href=\"https:\/\/users.rust-lang.org\/t\/match-statement-efficiency\/4488\">enums<\/a> in rust, \nand clang <a href=\"https:\/\/stackoverflow.com\/questions\/60109992\/why-is-a-switch-not-optimized-the-same-way-as-chained-if-else-in-c-c\">uses<\/a> this optimization in C for switch-case)<\/li>\n<\/ul>\n\n<details>\n  <summary class=\"code-summary\">show rust code for minimal autograd\n<\/summary>\n  <div class=\"language-rust highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ rustimport:pyo3<\/span>\n<span class=\"k\">use<\/span> <span class=\"nn\">pyo3<\/span><span class=\"p\">::<\/span><span class=\"nn\">prelude<\/span><span class=\"p\">::<\/span><span class=\"o\">*<\/span><span class=\"p\">;<\/span>\n\n\n<span class=\"c1\">\/\/ slower softmax version for a larger number of inputs<\/span>\n<span class=\"k\">fn<\/span> <span class=\"nf\">softmax_varlength<\/span><span class=\"p\">(<\/span><span class=\"n\">vals<\/span><span class=\"p\">:<\/span> <span class=\"o\">&amp;<\/span><span class=\"k\">mut<\/span> <span class=\"nb\">Vec<\/span><span 
class=\"o\">&lt;<\/span><span class=\"nb\">f32<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ins<\/span><span class=\"p\">:<\/span> <span class=\"o\">&amp;<\/span><span class=\"p\">[<\/span><span class=\"nb\">usize<\/span><span class=\"p\">],<\/span> <span class=\"n\">outs<\/span><span class=\"p\">:<\/span> <span class=\"o\">&amp;<\/span><span class=\"p\">[<\/span><span class=\"nb\">usize<\/span><span class=\"p\">])<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">let<\/span> <span class=\"k\">mut<\/span> <span class=\"n\">max<\/span> <span class=\"o\">=<\/span> <span class=\"o\">-<\/span><span class=\"mf\">1e20_f32<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">let<\/span> <span class=\"n\">loc_vals<\/span><span class=\"p\">:<\/span> <span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">f32<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">=<\/span> <span class=\"n\">ins<\/span><span class=\"nf\">.into_iter<\/span><span class=\"p\">()<\/span><span class=\"nf\">.map<\/span><span class=\"p\">(|<\/span><span class=\"n\">i<\/span><span class=\"p\">|<\/span> <span class=\"p\">{<\/span> <span class=\"k\">let<\/span> <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"o\">*<\/span><span class=\"n\">i<\/span><span class=\"p\">];<\/span> <span class=\"n\">max<\/span> <span class=\"o\">=<\/span> <span class=\"n\">max<\/span><span class=\"nf\">.max<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">);<\/span> <span class=\"n\">x<\/span><span class=\"p\">}<\/span> <span class=\"p\">)<\/span><span class=\"nf\">.collect<\/span><span class=\"p\">();<\/span>\n    <span class=\"k\">let<\/span> <span class=\"k\">mut<\/span> <span class=\"n\">sum<\/span><span class=\"p\">:<\/span> <span class=\"nb\">f32<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.0_f32<\/span><span 
class=\"p\">;<\/span>\n    <span class=\"k\">let<\/span> <span class=\"n\">exps<\/span><span class=\"p\">:<\/span> <span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">f32<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">=<\/span> <span class=\"n\">loc_vals<\/span><span class=\"nf\">.iter<\/span><span class=\"p\">()<\/span><span class=\"nf\">.map<\/span><span class=\"p\">(|<\/span><span class=\"n\">v<\/span><span class=\"p\">|<\/span> <span class=\"p\">{<\/span><span class=\"k\">let<\/span> <span class=\"n\">_exp<\/span> <span class=\"o\">=<\/span> <span class=\"nn\">f32<\/span><span class=\"p\">::<\/span><span class=\"nf\">exp<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">v<\/span> <span class=\"o\">-<\/span> <span class=\"n\">max<\/span><span class=\"p\">);<\/span> <span class=\"n\">sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">_exp<\/span><span class=\"p\">;<\/span> <span class=\"n\">_exp<\/span><span class=\"p\">})<\/span><span class=\"nf\">.collect<\/span><span class=\"p\">();<\/span>\n    <span class=\"n\">outs<\/span><span class=\"nf\">.iter<\/span><span class=\"p\">()<\/span><span class=\"nf\">.zip<\/span><span class=\"p\">(<\/span><span class=\"n\">exps<\/span><span class=\"nf\">.iter<\/span><span class=\"p\">())<\/span><span class=\"nf\">.for_each<\/span><span class=\"p\">(|(<\/span><span class=\"n\">j<\/span><span class=\"p\">,<\/span> <span class=\"n\">exp<\/span><span class=\"p\">)|<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"o\">*<\/span><span class=\"n\">j<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">exp<\/span> <span class=\"o\">\/<\/span> <span class=\"n\">sum<\/span> <span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n\n\n<span class=\"c1\">\/\/ vecs are slow! 
so allocate arrays on the stack, and explicit grouping of computations also helps<\/span>\n<span class=\"k\">fn<\/span> <span class=\"n\">softmax<\/span><span class=\"o\">&lt;<\/span><span class=\"k\">const<\/span> <span class=\"n\">N<\/span><span class=\"p\">:<\/span> <span class=\"nb\">usize<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">vals<\/span><span class=\"p\">:<\/span> <span class=\"o\">&amp;<\/span><span class=\"k\">mut<\/span> <span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">f32<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ins<\/span><span class=\"p\">:<\/span> <span class=\"o\">&amp;<\/span><span class=\"p\">[<\/span><span class=\"nb\">usize<\/span><span class=\"p\">],<\/span> <span class=\"n\">outs<\/span><span class=\"p\">:<\/span> <span class=\"o\">&amp;<\/span><span class=\"p\">[<\/span><span class=\"nb\">usize<\/span><span class=\"p\">])<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">let<\/span> <span class=\"k\">mut<\/span> <span class=\"n\">loc_vals<\/span><span class=\"p\">:<\/span> <span class=\"p\">[<\/span><span class=\"nb\">f32<\/span><span class=\"p\">;<\/span> <span class=\"n\">N<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"mf\">0_f32<\/span><span class=\"p\">;<\/span> <span class=\"n\">N<\/span><span class=\"p\">];<\/span>\n    <span class=\"k\">let<\/span> <span class=\"k\">mut<\/span> <span class=\"n\">exps<\/span><span class=\"p\">:<\/span> <span class=\"p\">[<\/span><span class=\"nb\">f32<\/span><span class=\"p\">;<\/span> <span class=\"n\">N<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"mf\">0_f32<\/span><span class=\"p\">;<\/span> <span class=\"n\">N<\/span><span class=\"p\">];<\/span>\n    <span class=\"k\">let<\/span> <span class=\"k\">mut<\/span> <span class=\"n\">max<\/span> <span class=\"o\">=<\/span> 
<span class=\"o\">-<\/span><span class=\"mf\">1e20_f32<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">let<\/span> <span class=\"k\">mut<\/span> <span class=\"n\">sum<\/span><span class=\"p\">:<\/span> <span class=\"nb\">f32<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">,<\/span> <span class=\"n\">i<\/span><span class=\"p\">)<\/span> <span class=\"k\">in<\/span> <span class=\"n\">ins<\/span><span class=\"nf\">.into_iter<\/span><span class=\"p\">()<\/span><span class=\"nf\">.enumerate<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span>\n        <span class=\"k\">let<\/span> <span class=\"n\">v<\/span> <span class=\"o\">=<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"o\">*<\/span><span class=\"n\">i<\/span><span class=\"p\">];<\/span>\n        <span class=\"n\">loc_vals<\/span><span class=\"p\">[<\/span><span class=\"n\">n<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">v<\/span><span class=\"p\">;<\/span>\n        <span class=\"n\">max<\/span> <span class=\"o\">=<\/span> <span class=\"n\">max<\/span><span class=\"nf\">.max<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">);<\/span>\n    <span class=\"p\">}<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">,<\/span> <span class=\"n\">_i<\/span><span class=\"p\">)<\/span> <span class=\"k\">in<\/span> <span class=\"n\">ins<\/span><span class=\"nf\">.into_iter<\/span><span class=\"p\">()<\/span><span class=\"nf\">.enumerate<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span>\n        <span class=\"k\">let<\/span> <span class=\"n\">exp<\/span> <span class=\"o\">=<\/span> <span class=\"nn\">f32<\/span><span class=\"p\">::<\/span><span class=\"nf\">exp<\/span><span 
class=\"p\">(<\/span><span class=\"n\">loc_vals<\/span><span class=\"p\">[<\/span><span class=\"n\">n<\/span><span class=\"p\">]<\/span> <span class=\"o\">-<\/span> <span class=\"n\">max<\/span><span class=\"p\">);<\/span>\n        <span class=\"n\">exps<\/span><span class=\"p\">[<\/span><span class=\"n\">n<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">exp<\/span><span class=\"p\">;<\/span>\n        <span class=\"n\">sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">exp<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">}<\/span>\n    <span class=\"k\">let<\/span> <span class=\"n\">invsum<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">1.0_f32<\/span> <span class=\"o\">\/<\/span> <span class=\"n\">sum<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">,<\/span> <span class=\"n\">j<\/span><span class=\"p\">)<\/span> <span class=\"k\">in<\/span> <span class=\"n\">outs<\/span><span class=\"nf\">.into_iter<\/span><span class=\"p\">()<\/span><span class=\"nf\">.enumerate<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span>\n        <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"o\">*<\/span><span class=\"n\">j<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">exps<\/span><span class=\"p\">[<\/span><span class=\"n\">n<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">invsum<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">}<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"k\">fn<\/span> <span class=\"nf\">sigmoid<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">:<\/span> <span class=\"nb\">f32<\/span><span class=\"p\">)<\/span> <span class=\"k\">-&gt;<\/span> <span class=\"nb\">f32<\/span> <span class=\"p\">{<\/span>\n    <span class=\"mf\">1.0<\/span> <span class=\"o\">\/<\/span> <span 
class=\"p\">(<\/span><span class=\"mf\">1.0<\/span> <span class=\"o\">+<\/span> <span class=\"p\">(<\/span><span class=\"o\">-<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span><span class=\"nf\">.exp<\/span><span class=\"p\">())<\/span>\n<span class=\"p\">}<\/span>\n\n\n<span class=\"nd\">#[pyfunction]<\/span>\n<span class=\"k\">unsafe<\/span> <span class=\"k\">fn<\/span> <span class=\"nf\">autograd<\/span><span class=\"p\">(<\/span>\n    <span class=\"n\">vals_input<\/span><span class=\"p\">:<\/span> <span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">f32<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">,<\/span>\n    <span class=\"n\">ops<\/span><span class=\"p\">:<\/span> <span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">i32<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">,<\/span>\n    <span class=\"n\">input_ids<\/span><span class=\"p\">:<\/span> <span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">usize<\/span><span class=\"o\">&gt;&gt;<\/span><span class=\"p\">,<\/span> \n    <span class=\"n\">output_ids<\/span><span class=\"p\">:<\/span> <span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">usize<\/span><span class=\"o\">&gt;&gt;<\/span><span class=\"p\">,<\/span>\n    <span class=\"n\">backward_node_id<\/span><span class=\"p\">:<\/span> <span class=\"nb\">usize<\/span><span class=\"p\">,<\/span>\n    <span class=\"n\">n_iteration<\/span><span class=\"p\">:<\/span> <span class=\"nb\">i32<\/span><span class=\"p\">,<\/span>\n<span class=\"p\">)<\/span> <span class=\"k\">-&gt;<\/span> <span class=\"p\">(<\/span><span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">f32<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">,<\/span> <span class=\"nb\">Vec<\/span><span 
class=\"o\">&lt;<\/span><span class=\"nb\">f32<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">let<\/span> <span class=\"k\">mut<\/span> <span class=\"n\">vals<\/span><span class=\"p\">:<\/span> <span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">f32<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">=<\/span> <span class=\"n\">vals_input<\/span><span class=\"nf\">.iter<\/span><span class=\"p\">()<\/span><span class=\"nf\">.map<\/span><span class=\"p\">(|<\/span><span class=\"n\">x<\/span><span class=\"p\">|<\/span> <span class=\"o\">*<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span><span class=\"nf\">.collect<\/span><span class=\"p\">();<\/span>\n    <span class=\"k\">let<\/span> <span class=\"k\">mut<\/span> <span class=\"n\">grad<\/span><span class=\"p\">:<\/span> <span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">f32<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">=<\/span> <span class=\"n\">vals_input<\/span><span class=\"nf\">.into_iter<\/span><span class=\"p\">()<\/span><span class=\"nf\">.map<\/span><span class=\"p\">(|<\/span><span class=\"n\">_<\/span><span class=\"p\">|<\/span> <span class=\"mf\">0.0_f32<\/span><span class=\"p\">)<\/span><span class=\"nf\">.collect<\/span><span class=\"p\">();<\/span>\n\n    <span class=\"k\">for<\/span> <span class=\"n\">_<\/span> <span class=\"k\">in<\/span> <span class=\"mi\">0<\/span><span class=\"o\">..<\/span><span class=\"n\">n_iteration<\/span> <span class=\"p\">{<\/span>\n        <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"n\">i_op<\/span><span class=\"p\">,<\/span> <span class=\"n\">op<\/span><span class=\"p\">)<\/span> <span class=\"k\">in<\/span> <span class=\"n\">ops<\/span><span class=\"nf\">.iter<\/span><span class=\"p\">()<\/span><span class=\"nf\">.enumerate<\/span><span class=\"p\">(){<\/span>\n            <span 
class=\"k\">let<\/span> <span class=\"n\">ins<\/span><span class=\"p\">:<\/span> <span class=\"o\">&amp;<\/span><span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">usize<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">=<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">input_ids<\/span><span class=\"p\">[<\/span><span class=\"n\">i_op<\/span><span class=\"p\">];<\/span>\n            <span class=\"k\">let<\/span> <span class=\"n\">outs<\/span><span class=\"p\">:<\/span> <span class=\"o\">&amp;<\/span><span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">usize<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">=<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">output_ids<\/span><span class=\"p\">[<\/span><span class=\"n\">i_op<\/span><span class=\"p\">];<\/span>\n            \n            <span class=\"k\">match<\/span> <span class=\"n\">op<\/span> <span class=\"p\">{<\/span>\n                <span class=\"mi\">0<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span>\n                    <span class=\"c1\">\/\/ softplus<\/span>\n                    <span class=\"k\">let<\/span> <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]];<\/span>\n                    <span class=\"k\">let<\/span> <span class=\"n\">max<\/span> <span class=\"o\">=<\/span> <span class=\"nn\">f32<\/span><span class=\"p\">::<\/span><span class=\"nf\">max<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.<\/span><span class=\"p\">,<\/span> <span class=\"n\">x<\/span><span class=\"p\">);<\/span>\n                    <span class=\"k\">let<\/span> <span class=\"n\">min<\/span> <span class=\"o\">=<\/span> <span class=\"nn\">f32<\/span><span class=\"p\">::<\/span><span class=\"nf\">min<\/span><span class=\"p\">(<\/span><span 
class=\"mf\">0.<\/span><span class=\"p\">,<\/span> <span class=\"n\">x<\/span><span class=\"p\">);<\/span>\n                    <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"n\">outs<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">max<\/span> <span class=\"o\">+<\/span> <span class=\"nn\">f32<\/span><span class=\"p\">::<\/span><span class=\"nf\">ln_1p<\/span><span class=\"p\">(<\/span><span class=\"nn\">f32<\/span><span class=\"p\">::<\/span><span class=\"nf\">exp<\/span><span class=\"p\">(<\/span><span class=\"n\">min<\/span> <span class=\"o\">-<\/span> <span class=\"n\">max<\/span><span class=\"p\">));<\/span>\n                <span class=\"p\">}<\/span>\n                <span class=\"mi\">1<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span>\n                    <span class=\"c1\">\/\/ sum<\/span>\n                    <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"n\">outs<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">ins<\/span><span class=\"nf\">.iter<\/span><span class=\"p\">()<\/span><span class=\"nf\">.map<\/span><span class=\"p\">(|<\/span><span class=\"n\">i<\/span><span class=\"p\">|<\/span> <span class=\"n\">vals<\/span><span class=\"nf\">.get_unchecked<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">i<\/span><span class=\"p\">))<\/span><span class=\"nf\">.sum<\/span><span class=\"p\">();<\/span>\n                <span class=\"p\">}<\/span>\n                <span class=\"mi\">2<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span>\n                    <span class=\"c1\">\/\/ prod<\/span>\n                    <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"n\">outs<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span 
class=\"p\">]]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">]];<\/span>\n                <span class=\"p\">}<\/span>\n                <span class=\"mi\">3<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span>\n                    <span class=\"c1\">\/\/ softmax. we will need switch-case resolution here for most common cases<\/span>\n                    <span class=\"k\">match<\/span> <span class=\"n\">ins<\/span><span class=\"nf\">.len<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span>\n                        <span class=\"mi\">1<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span><span class=\"nn\">softmax<\/span><span class=\"p\">::<\/span><span class=\"o\">&lt;<\/span><span class=\"mi\">1<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"o\">&amp;<\/span><span class=\"k\">mut<\/span> <span class=\"n\">vals<\/span><span class=\"p\">,<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">ins<\/span><span class=\"p\">,<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">outs<\/span><span class=\"p\">)}<\/span>\n                        <span class=\"mi\">2<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span><span class=\"nn\">softmax<\/span><span class=\"p\">::<\/span><span class=\"o\">&lt;<\/span><span class=\"mi\">2<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"o\">&amp;<\/span><span class=\"k\">mut<\/span> <span class=\"n\">vals<\/span><span class=\"p\">,<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">ins<\/span><span class=\"p\">,<\/span> <span class=\"o\">&amp;<\/span><span 
class=\"n\">outs<\/span><span class=\"p\">)}<\/span>\n                        <span class=\"mi\">3<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span><span class=\"nn\">softmax<\/span><span class=\"p\">::<\/span><span class=\"o\">&lt;<\/span><span class=\"mi\">3<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"o\">&amp;<\/span><span class=\"k\">mut<\/span> <span class=\"n\">vals<\/span><span class=\"p\">,<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">ins<\/span><span class=\"p\">,<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">outs<\/span><span class=\"p\">)}<\/span>\n                        <span class=\"mi\">4<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span><span class=\"nn\">softmax<\/span><span class=\"p\">::<\/span><span class=\"o\">&lt;<\/span><span class=\"mi\">4<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"o\">&amp;<\/span><span class=\"k\">mut<\/span> <span class=\"n\">vals<\/span><span class=\"p\">,<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">ins<\/span><span class=\"p\">,<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">outs<\/span><span class=\"p\">)}<\/span>\n                        <span class=\"mi\">5<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span><span class=\"nn\">softmax<\/span><span class=\"p\">::<\/span><span class=\"o\">&lt;<\/span><span class=\"mi\">5<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"o\">&amp;<\/span><span class=\"k\">mut<\/span> <span class=\"n\">vals<\/span><span class=\"p\">,<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">ins<\/span><span class=\"p\">,<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">outs<\/span><span class=\"p\">)}<\/span>\n                        <span class=\"n\">_<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span><span class=\"nf\">softmax_varlength<\/span><span 
class=\"p\">(<\/span><span class=\"o\">&amp;<\/span><span class=\"k\">mut<\/span> <span class=\"n\">vals<\/span><span class=\"p\">,<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">ins<\/span><span class=\"p\">,<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">outs<\/span><span class=\"p\">)}<\/span>\n                    <span class=\"p\">}<\/span>\n                <span class=\"p\">}<\/span>\n                <span class=\"n\">_<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span> <span class=\"nd\">panic!<\/span><span class=\"p\">(<\/span><span class=\"s\">\"unknown op\"<\/span><span class=\"p\">);<\/span> <span class=\"p\">}<\/span>\n           <span class=\"p\">}<\/span>\n        <span class=\"p\">}<\/span>\n        <span class=\"n\">grad<\/span><span class=\"p\">[<\/span><span class=\"n\">backward_node_id<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">1.<\/span><span class=\"p\">;<\/span>\n        \n        <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"n\">i_op<\/span><span class=\"p\">,<\/span> <span class=\"n\">op<\/span><span class=\"p\">)<\/span> <span class=\"k\">in<\/span> <span class=\"n\">ops<\/span><span class=\"nf\">.iter<\/span><span class=\"p\">()<\/span><span class=\"nf\">.enumerate<\/span><span class=\"p\">(){<\/span>\n            <span class=\"k\">let<\/span> <span class=\"n\">ins<\/span><span class=\"p\">:<\/span> <span class=\"o\">&amp;<\/span><span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span class=\"nb\">usize<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">=<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">input_ids<\/span><span class=\"p\">[<\/span><span class=\"n\">i_op<\/span><span class=\"p\">];<\/span>\n            <span class=\"k\">let<\/span> <span class=\"n\">outs<\/span><span class=\"p\">:<\/span> <span class=\"o\">&amp;<\/span><span class=\"nb\">Vec<\/span><span class=\"o\">&lt;<\/span><span 
class=\"nb\">usize<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">=<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">output_ids<\/span><span class=\"p\">[<\/span><span class=\"n\">i_op<\/span><span class=\"p\">];<\/span>\n            \n            <span class=\"k\">match<\/span> <span class=\"n\">op<\/span> <span class=\"p\">{<\/span>\n                <span class=\"mi\">0<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span>\n                    <span class=\"c1\">\/\/ softplus<\/span>\n                    <span class=\"n\">grad<\/span><span class=\"p\">[<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">grad<\/span><span class=\"p\">[<\/span><span class=\"n\">outs<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">*<\/span> <span class=\"nf\">sigmoid<\/span><span class=\"p\">(<\/span><span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]);<\/span>\n                <span class=\"p\">}<\/span>\n                <span class=\"mi\">1<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span>\n                    <span class=\"c1\">\/\/ sum<\/span>\n                    <span class=\"n\">ins<\/span><span class=\"nf\">.iter<\/span><span class=\"p\">()<\/span><span class=\"nf\">.for_each<\/span><span class=\"p\">(|<\/span><span class=\"n\">i<\/span><span class=\"p\">|<\/span> <span class=\"n\">grad<\/span><span class=\"p\">[<\/span><span class=\"o\">*<\/span><span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">grad<\/span><span class=\"p\">[<\/span><span class=\"n\">outs<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]);<\/span>\n                <span class=\"p\">}<\/span>\n  
              <span class=\"mi\">2<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span>\n                    <span class=\"c1\">\/\/ prod<\/span>\n                    <span class=\"n\">grad<\/span><span class=\"p\">[<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">grad<\/span><span class=\"p\">[<\/span><span class=\"n\">outs<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">]];<\/span>\n                    <span class=\"n\">grad<\/span><span class=\"p\">[<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">]]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">grad<\/span><span class=\"p\">[<\/span><span class=\"n\">outs<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]];<\/span>\n                <span class=\"p\">}<\/span>\n                <span class=\"mi\">3<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span>\n\t                <span class=\"c1\">\/\/ softmax<\/span>\n                    <span class=\"k\">let<\/span> <span class=\"n\">avg_grad<\/span><span class=\"p\">:<\/span> <span class=\"nb\">f32<\/span> <span class=\"o\">=<\/span> <span class=\"n\">outs<\/span><span class=\"nf\">.iter<\/span><span class=\"p\">()<\/span><span class=\"nf\">.map<\/span><span class=\"p\">(|<\/span><span class=\"n\">j<\/span><span class=\"p\">|<\/span> <span class=\"n\">grad<\/span><span class=\"p\">[<\/span><span 
class=\"o\">*<\/span><span class=\"n\">j<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"o\">*<\/span><span class=\"n\">j<\/span><span class=\"p\">]<\/span> <span class=\"p\">)<\/span><span class=\"nf\">.sum<\/span><span class=\"p\">();<\/span>\n                    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"n\">i<\/span><span class=\"p\">,<\/span> <span class=\"n\">j<\/span><span class=\"p\">)<\/span> <span class=\"k\">in<\/span> <span class=\"n\">ins<\/span><span class=\"nf\">.iter<\/span><span class=\"p\">()<\/span><span class=\"nf\">.zip<\/span><span class=\"p\">(<\/span><span class=\"n\">outs<\/span><span class=\"nf\">.iter<\/span><span class=\"p\">())<\/span> <span class=\"p\">{<\/span>\n                        <span class=\"n\">grad<\/span><span class=\"p\">[<\/span><span class=\"o\">*<\/span><span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"o\">*<\/span><span class=\"n\">j<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"p\">(<\/span><span class=\"n\">grad<\/span><span class=\"p\">[<\/span><span class=\"o\">*<\/span><span class=\"n\">j<\/span><span class=\"p\">]<\/span> <span class=\"o\">-<\/span> <span class=\"n\">avg_grad<\/span><span class=\"p\">);<\/span>\n                    <span class=\"p\">}<\/span>\n                <span class=\"p\">}<\/span>\n                <span class=\"n\">_<\/span> <span class=\"k\">=&gt;<\/span> <span class=\"p\">{<\/span> <span class=\"nd\">panic!<\/span><span class=\"p\">(<\/span><span class=\"s\">\"\"<\/span><span class=\"p\">);<\/span> <span class=\"p\">}<\/span>\n           <span class=\"p\">}<\/span>\n        <span class=\"p\">}<\/span>        \n    <span class=\"p\">}<\/span>\n    <span class=\"p\">(<\/span><span class=\"n\">vals<\/span><span class=\"p\">,<\/span> <span 
class=\"n\">grad<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div>  <\/div>\n<\/details>\n\n<p>Run-time for 10k ops x 10k iterations: <strong>1.4 seconds<\/strong><\/p>\n\n<p>Success: we are in the realm of interactive experiences. <br \/>\nRecall we started from &gt;1000 seconds. But should we stop here?<\/p>\n\n<h3 id=\"lets-autograd-in-c\">Let\u2019s autograd in C<\/h3>\n\n<p>Time to implement the autograd logic in C. \nFor interop with Python I use <a href=\"https:\/\/cffi.readthedocs.io\/en\/stable\/index.html\">python-cffi<\/a>.<\/p>\n\n<p>I went bananas on optimization:<\/p>\n<ul>\n  <li>I used the fact that output nodes are placed consecutively in memory, so we pass only the index of the first output<\/li>\n  <li>the number of inputs is limited to 8, and they are baked into the struct as <code class=\"language-plaintext highlighter-rouge\">int[8]<\/code>, not <code class=\"language-plaintext highlighter-rouge\">int *<\/code>, to avoid jumps in memory<\/li>\n  <li>dynamic stack allocations of variable size (unlike in Rust, these are straightforward in C)<\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">-O3<\/code>, and unsafe math: <code class=\"language-plaintext highlighter-rouge\">-ffast-math<\/code>. 
I even experimented with memory alignment and restrict-ing pointers, but no luck<\/li>\n<\/ul>\n\n<details>\n  <summary class=\"code-summary\">show me some code in C\n<\/summary>\n  <div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">#include<\/span> <span class=\"cpf\">&lt;math.h&gt;<\/span><span class=\"cp\">\n<\/span><span class=\"cp\">#include<\/span> <span class=\"cpf\">&lt;stdlib.h&gt;<\/span> <span class=\"c1\">\/\/ for malloc<\/span><span class=\"cp\">\n<\/span>\n<span class=\"k\">typedef<\/span> <span class=\"k\">struct<\/span> <span class=\"p\">{<\/span> \n    <span class=\"kt\">int<\/span> <span class=\"n\">opcode<\/span><span class=\"p\">;<\/span>\n    <span class=\"kt\">size_t<\/span> <span class=\"n\">n_arguments<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ used for softmax and sum<\/span>\n    <span class=\"kt\">int<\/span> <span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">8<\/span><span class=\"p\">];<\/span>         <span class=\"c1\">\/\/ at most 8 inputs<\/span>\n    <span class=\"kt\">int<\/span> <span class=\"n\">out<\/span><span class=\"p\">;<\/span>            <span class=\"c1\">\/\/ points to the first output variable<\/span>\n<span class=\"p\">}<\/span> <span class=\"n\">MyOperation<\/span><span class=\"p\">;<\/span>\n\n\n<span class=\"n\">MyOperation<\/span> <span class=\"o\">*<\/span> <span class=\"n\">allocate_memory<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n_elements<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"p\">(<\/span><span class=\"n\">MyOperation<\/span> <span class=\"o\">*<\/span><span class=\"p\">)<\/span> <span class=\"n\">malloc<\/span><span class=\"p\">(<\/span><span class=\"k\">sizeof<\/span><span class=\"p\">(<\/span><span class=\"n\">MyOperation<\/span><span class=\"p\">)<\/span> <span class=\"o\">*<\/span> <span class=\"n\">n_elements<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ stable implementation<\/span>\n<span 
class=\"kt\">double<\/span> <span class=\"n\">logaddexp<\/span><span class=\"p\">(<\/span><span class=\"kt\">double<\/span> <span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"kt\">double<\/span> <span class=\"n\">y<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">x<\/span> <span class=\"o\">&gt;<\/span> <span class=\"n\">y<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">x<\/span> <span class=\"o\">+<\/span> <span class=\"n\">log1p<\/span><span class=\"p\">(<\/span><span class=\"n\">exp<\/span><span class=\"p\">(<\/span><span class=\"n\">y<\/span> <span class=\"o\">-<\/span> <span class=\"n\">x<\/span><span class=\"p\">));<\/span> <span class=\"p\">}<\/span>\n    <span class=\"k\">else<\/span>       <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">y<\/span> <span class=\"o\">+<\/span> <span class=\"n\">log1p<\/span><span class=\"p\">(<\/span><span class=\"n\">exp<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span> <span class=\"o\">-<\/span> <span class=\"n\">y<\/span><span class=\"p\">));<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">double<\/span> <span class=\"n\">sigmoid<\/span><span class=\"p\">(<\/span><span class=\"kt\">double<\/span> <span class=\"n\">x<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"mf\">1.0<\/span> <span class=\"o\">\/<\/span> <span class=\"p\">(<\/span><span class=\"mf\">1.0<\/span> <span class=\"o\">+<\/span> <span class=\"n\">exp<\/span><span class=\"p\">(<\/span><span class=\"o\">-<\/span><span class=\"n\">x<\/span><span class=\"p\">));<\/span> <span class=\"p\">}<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"n\">run_multiple_passes<\/span><span class=\"p\">(<\/span>\n    <span class=\"kt\">int<\/span> <span 
class=\"n\">n_operations<\/span><span class=\"p\">,<\/span>\n    <span class=\"n\">MyOperation<\/span> <span class=\"o\">*<\/span><span class=\"n\">ops<\/span><span class=\"p\">,<\/span>\n    <span class=\"kt\">double<\/span> <span class=\"o\">*<\/span><span class=\"n\">values<\/span><span class=\"p\">,<\/span>\n    <span class=\"kt\">double<\/span> <span class=\"o\">*<\/span><span class=\"n\">grads<\/span><span class=\"p\">,<\/span>\n    <span class=\"kt\">int<\/span> <span class=\"n\">n_iterations<\/span>\n<span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">for<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">iteration<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">iteration<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">n_iterations<\/span><span class=\"p\">;<\/span> <span class=\"n\">iteration<\/span><span class=\"o\">++<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n        <span class=\"k\">for<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">operation<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">operation<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">n_operations<\/span><span class=\"p\">;<\/span> <span class=\"n\">operation<\/span><span class=\"o\">++<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n            <span class=\"n\">MyOperation<\/span> <span class=\"n\">op<\/span> <span class=\"o\">=<\/span> <span class=\"n\">ops<\/span><span class=\"p\">[<\/span><span class=\"n\">operation<\/span><span class=\"p\">];<\/span>\n            <span class=\"k\">switch<\/span><span class=\"p\">(<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">opcode<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n                <span class=\"k\">case<\/span> <span 
class=\"mi\">1<\/span><span class=\"p\">:<\/span> \n                    <span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">logaddexp<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.<\/span><span class=\"p\">,<\/span> <span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]);<\/span>\n                    <span class=\"k\">break<\/span><span class=\"p\">;<\/span>\n                <span class=\"k\">case<\/span> <span class=\"mi\">2<\/span><span class=\"p\">:<\/span> \n                    <span class=\"p\">{<\/span>\n                        <span class=\"kt\">double<\/span> <span class=\"n\">out<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.<\/span><span class=\"p\">;<\/span>\n                        <span class=\"k\">for<\/span><span class=\"p\">(<\/span><span class=\"kt\">size_t<\/span> <span class=\"n\">i<\/span><span class=\"o\">=<\/span><span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">n_arguments<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span><span class=\"o\">++<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n                            <span class=\"n\">out<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]];<\/span>\n                        <span class=\"p\">}<\/span>\n                        <span class=\"n\">values<\/span><span class=\"p\">[<\/span><span 
class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">out<\/span><span class=\"p\">;<\/span>\n                    <span class=\"p\">}<\/span>\n                    <span class=\"k\">break<\/span><span class=\"p\">;<\/span>\n                <span class=\"k\">case<\/span> <span class=\"mi\">3<\/span><span class=\"p\">:<\/span>\n                    <span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">]];<\/span>\n                    <span class=\"k\">break<\/span><span class=\"p\">;<\/span>\n                <span class=\"k\">case<\/span> <span class=\"mi\">4<\/span><span class=\"p\">:<\/span>\n                    <span class=\"p\">{<\/span>\n                        <span class=\"kt\">double<\/span> <span class=\"n\">maximal<\/span> <span class=\"o\">=<\/span> <span class=\"o\">-<\/span><span class=\"mf\">1e20<\/span><span class=\"p\">;<\/span>\n                        <span class=\"kt\">size_t<\/span> <span class=\"n\">n_arg<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"kt\">size_t<\/span><span class=\"p\">)<\/span> <span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">n_arguments<\/span><span class=\"p\">;<\/span>\n                        <span class=\"k\">for<\/span><span class=\"p\">(<\/span><span 
class=\"kt\">size_t<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">n_arg<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span><span class=\"o\">++<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n                            <span class=\"n\">maximal<\/span> <span class=\"o\">=<\/span> <span class=\"n\">fmax<\/span><span class=\"p\">(<\/span><span class=\"n\">maximal<\/span><span class=\"p\">,<\/span> <span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]]);<\/span>\n                        <span class=\"p\">}<\/span>\n                        <span class=\"kt\">double<\/span> <span class=\"n\">exps<\/span><span class=\"p\">[<\/span><span class=\"n\">n_arg<\/span><span class=\"p\">];<\/span>\n                        <span class=\"kt\">double<\/span> <span class=\"n\">sum<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n                        <span class=\"k\">for<\/span><span class=\"p\">(<\/span><span class=\"kt\">size_t<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">n_arg<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span><span class=\"o\">++<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n                            <span class=\"n\">exps<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">exp<\/span><span class=\"p\">(<\/span><span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]]<\/span> <span class=\"o\">-<\/span> <span class=\"n\">maximal<\/span><span class=\"p\">);<\/span>\n                            <span class=\"n\">sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">exps<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">];<\/span>\n                        <span class=\"p\">}<\/span>\n                        <span class=\"k\">for<\/span><span class=\"p\">(<\/span><span class=\"kt\">size_t<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">n_arg<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span><span class=\"o\">++<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n                            <span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span> <span class=\"o\">+<\/span> <span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">exps<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"o\">\/<\/span> <span class=\"n\">sum<\/span><span class=\"p\">;<\/span>\n                        <span class=\"p\">}<\/span>\n                    <span class=\"p\">}<\/span>\n                    <span class=\"k\">break<\/span><span class=\"p\">;<\/span>\n            <span class=\"p\">}<\/span>\n        <span class=\"p\">}<\/span>  <span class=\"c1\">\/\/ end forward<\/span>\n\n        <span class=\"c1\">\/\/ TODO set grad for target variable.<\/span>\n\n        <span class=\"k\">for<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">operation<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">operation<\/span> 
<span class=\"o\">&lt;<\/span> <span class=\"n\">n_operations<\/span><span class=\"p\">;<\/span> <span class=\"n\">operation<\/span><span class=\"o\">++<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n            <span class=\"n\">MyOperation<\/span> <span class=\"n\">op<\/span> <span class=\"o\">=<\/span> <span class=\"n\">ops<\/span><span class=\"p\">[<\/span><span class=\"n\">n_operations<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span> <span class=\"o\">-<\/span> <span class=\"n\">operation<\/span><span class=\"p\">];<\/span>\n            <span class=\"k\">switch<\/span><span class=\"p\">(<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">opcode<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n                <span class=\"k\">case<\/span> <span class=\"mi\">1<\/span><span class=\"p\">:<\/span> \n                    <span class=\"n\">grads<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">grads<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">sigmoid<\/span><span class=\"p\">(<\/span><span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]);<\/span>\n                    <span class=\"k\">break<\/span><span class=\"p\">;<\/span>\n                <span class=\"k\">case<\/span> <span class=\"mi\">2<\/span><span class=\"p\">:<\/span> \n                    <span class=\"p\">{<\/span>\n                        <span class=\"k\">for<\/span><span class=\"p\">(<\/span><span class=\"kt\">size_t<\/span> 
<span class=\"n\">i<\/span><span class=\"o\">=<\/span><span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">n_arguments<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span><span class=\"o\">++<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"n\">grads<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">grads<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span><span class=\"p\">];<\/span> <span class=\"p\">}<\/span>\n                    <span class=\"p\">}<\/span>\n                    <span class=\"k\">break<\/span><span class=\"p\">;<\/span>\n                <span class=\"k\">case<\/span> <span class=\"mi\">3<\/span><span class=\"p\">:<\/span>\n                    <span class=\"n\">grads<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">grads<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">]];<\/span>\n                    <span class=\"n\">grads<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span 
class=\"mi\">1<\/span><span class=\"p\">]]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">grads<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]];<\/span>\n                    <span class=\"k\">break<\/span><span class=\"p\">;<\/span>\n                <span class=\"k\">case<\/span> <span class=\"mi\">4<\/span><span class=\"p\">:<\/span>\n                    <span class=\"p\">{<\/span>\n                        <span class=\"kt\">size_t<\/span> <span class=\"n\">n_arg<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"kt\">size_t<\/span><span class=\"p\">)<\/span> <span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">n_arguments<\/span><span class=\"p\">;<\/span>\n                        <span class=\"kt\">double<\/span> <span class=\"n\">avg_grad<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.0<\/span><span class=\"p\">;<\/span>\n                        <span class=\"k\">for<\/span><span class=\"p\">(<\/span><span class=\"kt\">size_t<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">n_arg<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span><span class=\"o\">++<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n                            <span class=\"n\">avg_grad<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span> <span class=\"o\">+<\/span> <span class=\"n\">i<\/span><span 
class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">grads<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span> <span class=\"o\">+<\/span> <span class=\"n\">i<\/span><span class=\"p\">];<\/span>\n                        <span class=\"p\">}<\/span>\n                        <span class=\"k\">for<\/span><span class=\"p\">(<\/span><span class=\"kt\">size_t<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">n_arg<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span><span class=\"o\">++<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n                            <span class=\"n\">grads<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">ins<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span> <span class=\"o\">+<\/span> <span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"p\">(<\/span><span class=\"n\">grads<\/span><span class=\"p\">[<\/span><span class=\"n\">op<\/span><span class=\"p\">.<\/span><span class=\"n\">out<\/span> <span class=\"o\">+<\/span> <span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"o\">-<\/span> <span class=\"n\">avg_grad<\/span><span class=\"p\">);<\/span>\n                        <span class=\"p\">}<\/span>\n                    <span class=\"p\">}<\/span>\n                    <span class=\"k\">break<\/span><span class=\"p\">;<\/span>\n            <span class=\"p\">}<\/span>\n        <span class=\"p\">}<\/span>  <span class=\"c1\">\/\/ end backward<\/span>\n    <span 
class=\"p\">}<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div>  <\/div>\n<\/details>\n\n<p>Run-time for 10k ops x 10k iterations: <strong>0.99 second<\/strong><\/p>\n\n<p>I liked ergonomics of rust better, but achieving high speed in C is way easier.\nRust\u2019s interop with python is also way more convenient.<\/p>\n\n<h3 id=\"lets-autograd-in-c-again\">Let\u2019s autograd in C (again)<\/h3>\n\n<p>Another approach I\u2019ve taken is to \u2018compile\u2019 traced graph to C.\nSo python produces a long C file where operations are called one-by-one with explicit indices, something like<\/p>\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">...<\/span>\n<span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"mi\">215<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"mi\">195<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"mi\">205<\/span><span class=\"p\">];<\/span>\n<span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"mi\">216<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"mi\">196<\/span><span class=\"p\">]<\/span> <span class=\"o\">+<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"mi\">201<\/span><span class=\"p\">]<\/span> <span class=\"o\">+<\/span> <span class=\"n\">vals<\/span><span class=\"p\">[<\/span><span class=\"mi\">204<\/span><span class=\"p\">];<\/span>\n<span class=\"p\">...<\/span> <span class=\"c1\">\/\/ etcetc, and then backward steps are also written the same way<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Source code is lengthy, outputs are enormous, and to speed up compilation we can set <code class=\"language-plaintext 
highlighter-rouge\">-O0<\/code> in clang. Using <code class=\"language-plaintext highlighter-rouge\">-O0<\/code> produces slower binaries, but interestingly <em>did not<\/em> speed up compilation.\nBest results I got are around 1 minute for compilation and 1 second for a full run. Surprisingly, eliminating switch\/case and memory lookups for arguments did not result in faster execution.<\/p>\n\n<p>Given that recompilation is needed any time the graph is changed, real time experienced by user is 1 minute. That\u2019s a no go.<\/p>\n\n<h3 id=\"assembly\">Assembly<\/h3>\n\n<p>In this endeavor to get maximal speed, I decided to go down to assembly. Otherwise it feels like an incomplete journey. \nWe can map a computational graph to just a set of low-level instruction, and avoid \u201ccostly\u201d compilation.\nThese days x86\/64 is not a king anymore, but neither armv7\/armv8 is \u2014 \nand writing assembly for several architectures is totally unreasonable.<\/p>\n\n<p>So \u2026 how about using webassembly? It is low-level, fast to compile, and still cross-platform. \nProjects like <code class=\"language-plaintext highlighter-rouge\">wasmer<\/code>\/<code class=\"language-plaintext highlighter-rouge\">wasmtime<\/code> allow interacting with wasm code from other languages.\nThat\u2019s my first encounter with WASM, and I\u2019ve got quite positive impression: WASM mixes lisp-style syntax (for efficient streaming parsing) and execution model of stack machine. 
\nUnlike canonical stack machines, and unlike canonical assembly, WASM allows grouping expressions, e.g.<\/p>\n\n<div class=\"language-lisp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">;; canonical stack-machine way to compute a * b + c<\/span>\n<span class=\"p\">(<\/span><span class=\"nv\">local.get<\/span> <span class=\"nv\">$a<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">(<\/span><span class=\"nv\">local.get<\/span> <span class=\"nv\">$b<\/span><span class=\"p\">)<\/span>\n<span class=\"nv\">f32.mul<\/span>\n<span class=\"p\">(<\/span><span class=\"nv\">local.get<\/span> <span class=\"nv\">$c<\/span><span class=\"p\">)<\/span>\n<span class=\"nv\">f32.add<\/span>\n\n<span class=\"c1\">;; another way to write the same, also perfectly legal in wasm<\/span>\n<span class=\"p\">(<\/span><span class=\"nv\">f32.add<\/span> \n    <span class=\"p\">(<\/span><span class=\"nv\">f32.mul<\/span> <span class=\"p\">(<\/span><span class=\"nv\">local.get<\/span> <span class=\"nv\">$a<\/span><span class=\"p\">)<\/span> <span class=\"p\">(<\/span><span class=\"nv\">local.get<\/span> <span class=\"nv\">$b<\/span><span class=\"p\">))<\/span>  \n    <span class=\"p\">(<\/span><span class=\"nv\">local.get<\/span> <span class=\"nv\">$c<\/span><span class=\"p\">)<\/span> \n<span class=\"p\">)<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This convenience allows writing significantly more readable code in WASM compared to ye-olde-assembly. \nThe level of abstraction looks just right to me \u2014 low-level instructions, but no need to manage register allocations.<\/p>\n\n<p>WebAssembly is still very close to assembly in terms of instructions, i.e. there is no <code class=\"language-plaintext highlighter-rouge\">exp<\/code>, <code class=\"language-plaintext highlighter-rouge\">log<\/code>, let alone <code class=\"language-plaintext highlighter-rouge\">log1p<\/code> and the like. 
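<\/p>

<p>Missing <code class=\"language-plaintext highlighter-rouge\">exp<\/code>\/<code class=\"language-plaintext highlighter-rouge\">log<\/code> are not a real blocker though: both reduce to <code class=\"language-plaintext highlighter-rouge\">exp2<\/code>\/<code class=\"language-plaintext highlighter-rouge\">log2<\/code> at the cost of one extra multiplication by a precomputed constant. A minimal sketch of the identities (Python here, just to show the math \u2014 not actual WASM code):<\/p>

```python
import math

LOG2_E = 1.4426950408889634  # log2(e), precomputed constant
LN_2 = 0.6931471805599453    # ln(2),  precomputed constant

def exp_via_exp2(x):
    # exp(x) = 2 ** (x * log2(e)): only an exp2 primitive is required
    return 2.0 ** (x * LOG2_E)

def log_via_log2(x):
    # ln(x) = log2(x) * ln(2): only a log2 primitive is required
    return math.log2(x) * LN_2
```

<p>So a single <code class=\"language-plaintext highlighter-rouge\">exp2<\/code>\/<code class=\"language-plaintext highlighter-rouge\">log2<\/code> pair covers all the exponentials and logarithms needed here.<\/p>

<p>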
\nFortunately, there is a WASM <a href=\"https:\/\/gist.github.com\/going-digital\/02e46c44d89237c07bc99cd440ebfa43\">implementation<\/a> of <code class=\"language-plaintext highlighter-rouge\">exp2<\/code>\/<code class=\"language-plaintext highlighter-rouge\">log2<\/code> by Peter Knight.<\/p>\n\n<p>My major question was whether the speed of exponentiation would be sufficient, as <code class=\"language-plaintext highlighter-rouge\">exp<\/code> consumes significant time in the C implementation. \nAlas, in a simple benchmark computing just exponents in wasm takes ~1.9 seconds, leaving it behind Rust\/C. \nFor reference, JavaScript computes the same number of exponents in 0.7 seconds.\nHence, I take WASM\u2019s branding of \u2018near-native speed\u2019 with a grain of salt, at least in the context of number crunching. \nHopefully this will improve, but for now WASM is out of the competition.<\/p>\n\n<h2 id=\"summary\">Summary<\/h2>\n\n<p>So, we achieved a <strong>1000X speed-up<\/strong> compared to leading libraries.<\/p>\n\n<p>I don\u2019t find this surprising \u2014 the major use case for an autograd system is manipulating large ndarrays. \nMemory management, copy elimination, device synchronization, parallelization of computations \u2014 these things are the main focus, \nand a throughput of 1 million ops per second is totally reasonable for the vast majority of scenarios and users.<\/p>\n\n<p>Not for me though. 
My scenario is totally different in terms of numbers and setup, and tensor-focused autograds are too slow.\nFor the problem at hand, departing from the common autograd systems was the right and the only possible choice.\nExploring different options was quite fun, and my expectations were challenged several times along the way.<\/p>\n\n<div style=\"text-align: center; font-size: 40px; padding: 110px\">\ud83d\udc4b<\/div>\n","pubDate":"Thu, 28 Dec 2023 12:00:00 +0000","link":"https:\/\/arogozhnikov.github.io\/2023\/12\/28\/fastest-autograd.html","guid":"https:\/\/arogozhnikov.github.io\/2023\/12\/28\/fastest-autograd.html","category":["autograd","optimization"]},{"title":"Optical pooled screens of cells (overview of emerging biotechnology)","description":"<p><em>This month brought two preprints describing optical pooled CRISPR screens.\nWhat is this new technology, what can it be used for, and why have I been waiting for it?\nI\u2019ll make a small comparison of approaches and critically review the papers.<\/em><\/p>\n\n<p><em>Best of all \u2014 \nI am not affiliated with either team, and this is likely the most unbiased review you\u2019ll find<\/em> \ud83d\ude05<\/p>\n\n<h2 id=\"papers-discussed\">Papers discussed:<\/h2>\n\n<ul>\n  <li><strong>PERISCOPE<\/strong> <br \/> aka <em>Perturbation Effect Readout In situ with Single Cell Optical Phenotyping<\/em> \nfrom  <a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2023.08.06.552164v1.full\">A genome-wide atlas of human cell morphology<\/a>\n(Broad Institute)<\/li>\n  <li><strong>CP-POSH<\/strong> <br \/> aka <em>Cell Painting Pooled Optical Screening in Human cells<\/em> from  <a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2023.08.13.553051v2.full.pdf\">A Pooled Cell Painting CRISPR Screening Platform Enables de novo Inference of\nGene Function by Self-supervised Deep Learning<\/a>\n(Insitro Inc.)<\/li>\n<\/ul>\n\n<p>In the next parts I discuss some details from these 
preprints.<\/p>\n\n<h2 id=\"preface\">Preface<\/h2>\n\n<p>To drive experiments in biological systems you need two components:<\/p>\n<ol>\n  <li>\n    <p><strong>intervention:<\/strong> change something in a cell (or organoid, or organism).\n<!--- Fine-grained interventions allow precise verification of hypotheses. ---><\/p>\n\n    <p>For a broad understanding of a biological system you want to have detailed control of all of its parts. \nCRISPR solves this by individually acting on any selected gene. \nThis makes CRISPR-driven experiments more interpretable and ensures high coverage of biological processes.<\/p>\n  <\/li>\n  <li>\n    <p><strong>readout:<\/strong> detect a change in some characteristic.\nA better characterization of the system would involve a high-dimensional description. \nE.g. just measuring cell size, cell death and pH provides little insight into what\u2019s happening.<\/p>\n\n    <p>Several sequencing-based assays provide a rich description, and many of them provide single-cell readouts.\n<a href=\"https:\/\/www.nature.com\/articles\/nprot.2016.105\">Cell painting<\/a> stands out: it is much cheaper, \nmicroscopy-based, and still captures a lot of biologically-relevant information.<\/p>\n  <\/li>\n<\/ol>\n\n<p>The effectiveness of the system for unbiased discovery, \n roughly, <em>is a product of these two dimensions<\/em>: \n how well you control the biology and how well you can describe the results of an intervention.<\/p>\n\n<p>Pooled CRISPR screens with scRNAseq\/scATAC stand out in both dimensions. <br \/> \n They combine 1. complete control via CRISPR with 2. a very high-dimensional interpretable readout.\n Sounds awesome (and it is!), but we need to introduce one more factor to the equation:<\/p>\n\n<ol start=\"3\">\n  <li>\n    <p><strong>price per experiment.<\/strong> The more observations you have, the merrier. 
\nWe already found there are a ton of things happening in our biology,\nand to find at least a majority of them in an unbiased manner, a large number of attempts is required.<\/p>\n\n    <p>Pooled screens are very efficient with experimental material: every cell is turned into a tiny individual experiment.\nStill, with all multiplexing\/overloading tricks, a <em>cost-per-cell<\/em> in scRNAseq is comparable to a <em>cost-per-well<\/em> in cell painting. \nQuite a difference!<\/p>\n  <\/li>\n<\/ol>\n\n<p>Optical pooled CRISPR screening, the focus of this post, replaces expensive sequencing with cheap microscopy, and drops price-per-cell &gt;200 fold (PERISCOPE reports price-per-cell ~$0.001).\nCompared to <em>arrayed<\/em> optical screens, lower requirements for automation can be expected as all conditions share the well.<\/p>\n\n<p>Overall, the technology opens an opportunity for massive experimentation.<\/p>\n\n<h2 id=\"why-do-we-need-an-even-more-scalable-assay-\">Why do we need an even more scalable assay? \ud83e\udd14<\/h2>\n\n<p>Great question! \nA number of whole-genome pooled screens have been conducted, \nand arrayed whole-genome screens were run with cell painting. \nRecursion, which pioneered the adoption of Cell Painting, <a href=\"https:\/\/www.recursion.com\/operating-system\">scaled it<\/a> to 2 million wells a week.\nWhy would you wish for <em>even more<\/em>?<\/p>\n\n<p><em>Gene perturbation can be more nuanced<\/em> than just knockout. \nCRISPR tiling, an approach to scan for important positions in the genome, requires a lot of experiments.<\/p>\n\n<p>The space of interventions also goes <em>beyond a single gene<\/em> at a time. \nIf e.g. 
two proteins can perform a similar function (\u201calternative pathways\u201d), downregulating just one of them won\u2019t have as much effect \n(incidentally, the PERISCOPE paper needs a double KO of M6PR and IGF2R).\nThese cases, where the effect of a combination is different from the combination of effects, are of high interest and give a more direct hint at the underlying biology than just similarity of images.\nAt the same time such cases are (likely) sparse, and should be found across 20k x 20k = 400m combinations\u2026<\/p>\n\n<p>Sometimes you need to interact with more than two genes at a time, for instance to create iPSCs.\nRecall that iPSC creation relies on simultaneous expression of 4 <a href=\"https:\/\/en.wikipedia.org\/wiki\/Induced_pluripotent_stem_cell#Production\">Yamanaka factors<\/a>.\nFor reference, the original <a href=\"https:\/\/www.cell.com\/cell\/fulltext\/S0092-8674(06)00976-7?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0092867406009767%3Fshowall%3Dtrue\">Yamanaka paper<\/a> screened 24 candidate genes.\nTo improve upon this \u201crecipe\u201d, a large number of combinations should be tried.\nScanning just combinations of 4 factors out of 100 <a href=\"https:\/\/en.wikipedia.org\/wiki\/Transcription_factor\">TFs<\/a> already takes around 4 million attempts.<\/p>\n\n<p>The combinatorial space stays almost unexplored.\nDropping the price even more still won\u2019t make it possible to check all possible combinations, and this exploration should be driven by ML.\nML-friendliness thus becomes a requirement.<\/p>\n\n<!---\n<div style=\"float: right; width: 200px; margin: 20px;\" >\n<img src=\"\/images\/opticalscreen\/peptides.png\" height=\"200\" \/><br \/>\n<small markdown=\"True\"><a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/23316341\/\">J. 
Thundimadathil, 2012<\/a>  <\/small>\n<\/div> -->\n<p>There are non-genetic perturbations that are of high interest: cell environment, additions of chemicals or biologics.\nUnfortunately, usually there is no way to \u2018massively multiplex\u2019 these conditions, and a microwell stays the minimal possible unit of experiment. \nA notable exception is <strong>peptides<\/strong>, as they can similarly be barcoded and participate in a pooled screen.\nPeptides can be used both as a discovery tool (e.g. to block some interaction or activate a receptor) and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Peptide_therapeutics\">as a therapeutic<\/a>.<\/p>\n\n<h2 id=\"challenges-needed-to-be-solved\">Challenges that needed to be solved<\/h2>\n\n<p><img src=\"\/images\/opticalscreen\/cp_posh_imaging_pipeline.png\" width=\"700\" \/>\n<small>\nCell Painting (left, 5 channels + composite)\nand base calling in ISS (right) have significant overlap in channels. <br \/>\nImage from CP-POSH preprint.\n<\/small><\/p>\n\n<p>Interventions are encoded with <a href=\"https:\/\/en.wikipedia.org\/wiki\/Guide_RNA\">sgRNA<\/a> barcodes. \nIn situ sequencing (ISS) is used to read the barcode back.<\/p>\n\n<ul>\n  <li>\n    <p><strong>The main issue is merging ISS with cell painting<\/strong>. \nThere is a spectral overlap between channels used for cell painting and ISS, and thus ISS becomes unreliable.<\/p>\n  <\/li>\n  <li>\n    <p>Cell painting degrades RNA and <strong>destroys the barcode<\/strong>. Both teams addressed this by running reverse transcription and RCA (rolling circle amplification) of DNA before cell painting. 
\nISS imaging is quite destructive (multiple cycles) and happens after the cell painting step.<\/p>\n  <\/li>\n<\/ul>\n\n<h3 id=\"how-periscope-solves-spectral-overlap\">How PERISCOPE solves spectral overlap<\/h3>\n\n<p><img src=\"\/images\/opticalscreen\/periscope_linker.png\" style=\"float: right; width: 400px;\" \/>\nThe PERISCOPE team replaced two dyes in cell painting with fluorescent labels attached to probes with a disulfide linker (see image).\nThe linker is cleaved right after \u201cphenotypic\u201d (cell painting) imaging, and these two channels can then be used for ISS.\nFloating fluorescent labels are partially washed away, and the remaining (uniform) signal is cancelled out by the image processing pipeline.<\/p>\n\n<p>More specifically, the membrane label Concanavalin-A was SS-conjugated to the fluorophore directly, \nwhile the mitochondria stain mitotracker was replaced with an anti-TOMM20 Ab + a secondary Ab SS-linked to a fluorophore.\n<!-- TODO (can this place be optimized to remove secondary?). --> \nOriginal cell painting avoided antibodies to make the process cheaper and more reproducible.<\/p>\n\n<p>As expected, perturbation of TOMM20 distorts the signal from this channel \u2014 something to keep in mind.<\/p>\n\n<h3 id=\"how-cp-posh-solves-spectral-overlap\">How CP-POSH solves spectral overlap<\/h3>\n\n<div style=\"float: right; width: 400px; padding-left: 20px;\">\n<img src=\"\/images\/opticalscreen\/mitotracker_correlation.png\" style=\"width: 400px;\" \/>\n<small>Correlation of mitoprobe with TOMM20 and Hoechst<\/small>\n<\/div>\n<p>Mitotracker was replaced with Mitoprobe \u2014 a novel RNA-based label for mitochondria, linked to a Cy5 fluorophore.\nInterestingly, they optimized a sequence to have high correlation with TOMM20 <strong>and<\/strong> low correlation with Hoechst (nuclei).<\/p>\n\n<p>The resulting image (on the right) shows the optimization was successful.<\/p>\n\n<p>RNA sequences were taken from the ribosome after a search for fragments that would bind to 12S rRNA and 16S rRNA (two 
different locations); \n8 of them were then tested and two were kept: one for 12S and one for 16S, in proportion 1:1. \nThis is an interesting solution that seems to overcome the issues seen in the PERISCOPE approach, and is likely to work in other species too.<\/p>\n\n<p>This replacement of mitotracker with mitoprobe <em>does not<\/em> remove the spectral overlap (there is overlap with base A), \nbut makes it non-essential because RNA is degraded during cell painting.\nTwo additional spectral overlaps (WGA &lt;&gt; base G) and (phalloidin &lt;&gt; base T) are also solved by degradation, \nand additional steps in the protocol were necessary.\nThese overlaps still seem to play a negative role in the ISS step (see later).<\/p>\n\n<p>CP-POSH has an additional channel that can be utilized for one study-specific marker, which is later featured in one of the experiments.\n(They use deep red \u2014 a good choice, as shorter wavelengths can be used for phenotyping!)<\/p>\n\n<!-- I am curious if something similar to mitoprobe can be developed for F-actin (i.e. RNA-based label). \nThis could make ethanol unnecessary. -->\n\n<p>All in all, neither protocol is straightforward.<\/p>\n\n<h3 id=\"in-situ-sequencing-iss\"><em>In situ<\/em> sequencing (ISS)<\/h3>\n\n<p><img src=\"\/images\/opticalscreen\/in_situ_sequencing.png\" \/>\n<small>Source: <a href=\"https:\/\/www.cell.com\/cell\/pdf\/S0092-8674(19)31067-0.pdf\">Feldman<\/a> et al., 2019<\/small><\/p>\n\n<p>ISS reads the barcode to determine the perturbed gene.\nThis part is very similar, as both groups:<\/p>\n<ul>\n  <li>use Illumina\u2019s miseq kit for ISS (sequence-by-synthesis), and both groups used lower resolution (10X) for imaging.<\/li>\n  <li>use a padlock with a gap to amplify the barcode and get a reliable signal during sequencing<\/li>\n  <li>finally, barcodes used in both cases are not additional genetic sequences, but sgRNAs themselves. 
<br \/>\nNo barcodes \u2014 no problems!<\/li>\n<\/ul>\n\n<p>CP-POSH additionally uses a tiny <em>image-to-image convnet to improve calling<\/em>, gaining +18% correct calls. \nSuch a model can be trained on the screen data itself: \nalmost-correctly called barcodes (from a simpler pipeline) are used for training the model.<\/p>\n\n<!---\nAbsence of separate barcodes, while very reliable, has its demerits too: \ncells that replicate from the same transfected cells, are not \u2018true independent observations\u2019, \nas e.g. they can carry the same mutation introduced during transfection. \nAdditional barcodes could tell apart independent transfections and help in lineage tracking.\nOptical pooling has partial remedy to this problem: cells coming from the same origin usually colocalize within a well. \nIt could be an interesting analysis if \u2018families\u2019 of cells carry any additional visual signature that is not shared by other cells with the same sgRNA.\n--->\n\n<h3 id=\"sgrnas\">sgRNAs<\/h3>\n\n<p>The quality of ISS quickly drops with sequence length, so instead of sequencing all ~20 bases of the sgRNA, \nthe guides are selected so that reading only the first 12-13 bases is enough to guess which sgRNA is expressed in the cell.\nBoth groups start from existing pools of sgRNAs to guide Cas9, with minor differences in the selection procedure:<\/p>\n<ul>\n  <li>PERISCOPE uses 12 cycles and minimal Levenshtein distance \u2265 2, which means they detect if a barcode contains one error (and discard the barcode).<\/li>\n  <li>\n    <p>CP-POSH uses 13 cycles and Levenshtein distance \u2265 3, and allows up to 1 error correction.\nMost cells have more than one amplicon, which makes barcode calling even more reliable.\nError correction adds +80% of barcoded cells in their largest screen.<\/p>\n\n    <p>I hypothesize the high error rate (despite CNN filtering) is connected to spectral overlaps.<\/p>\n  <\/li>\n<\/ul>\n\n<p>The scope of experiments is different: PERISCOPE covers 20k genes with 4 guides 
per gene, \nwhile the largest experiment in CP-POSH targets the druggable genome \u2014 1.6k genes with 10 guides per gene.<\/p>\n\n<h2 id=\"phenotypic-pipeline-and-analysis\">Phenotypic pipeline and analysis<\/h2>\n\n<p>Both teams avoid training the system on known labels.\nI\u2019ve also been avoiding training with supervision for a while, for a couple of reasons:<\/p>\n\n<ol>\n  <li>no need to drop any data from analysis (no labels \u2192 no cross-validation)<\/li>\n  <li>by providing labels you already bias the model towards what <em>you believe<\/em> is important. \nCorrespondingly, the model works to ignore all \u201cirrelevant\u201d information, and the same model can\u2019t be used (reliably) \nfor studying orthogonal questions (e.g. well-to-well variations)<\/li>\n  <li>should there be any confounder, it is less likely to be picked up<\/li>\n<\/ol>\n\n<p>It\u2019s actually <strong>impressive how little prior knowledge is required to get a decent grasp of biology just from looking at static cells<\/strong>.\nWe only need to know all genes of the organism to run CRISPR; neural networks don\u2019t need even this piece of information.<\/p>\n\n<p>PERISCOPE relies on <a href=\"https:\/\/cellprofiler.org\/\">Cell Profiler<\/a>, and does not train any specific pipeline. \nAfter averaging morphological profiles across the cells for the same gene, a matrix of gene similarities is computed.<\/p>\n\n<p>CP-POSH relies on <a href=\"https:\/\/github.com\/mouseland\/cellpose\">CellPose<\/a> for segmentation, and either uses a CellProfiler-like pipeline (dubbed CellStats) or self-supervised <a href=\"https:\/\/arxiv.org\/abs\/2104.14294\">DINO-ViT<\/a> from FAIR. \nUnsurprisingly, DINO-ViT demonstrates better quality, which improves with a higher diversity of interventions provided during training.\nPre-training on cells, not ImageNet, works much better, as you\u2019d expect (Insitro-ers for some reason like ImageNet-pretrained models as a baseline). 
\nDINO-ViT also uses 8x8 patches, more relevant to the scale of a cell.<\/p>\n\n<p>A nice detail: they use well-level compensation. That\u2019s possible thanks to pooling!<\/p>\n\n<p><img src=\"\/images\/opticalscreen\/diffexp_visual_features.png\" style=\"width: 400px; float: right;\" \/>\nBoth papers delve into \u2018differential expression\u2019 of hand-crafted morphological features to provide arguments that the readout is valid. \nFor instance, PERISCOPE shows that the most important features for detecting interventions connected to common pathways point to the right cell compartment.<\/p>\n\n<p>In the picture from PERISCOPE you see that disturbing a pathway results in some enrichment of \nimportant features (\u2018differentially expressed\u2019 features) from the corresponding cell compartment.<\/p>\n\n<div style=\"clear: both;\"><\/div>\n\n<h2 id=\"verification--discovery\">Verification &amp; Discovery<\/h2>\n\n<p>\u201cMethod papers\u201d are a special genre of literature: 1) the author\u2019s focus is the technology, 2) the editor\u2019s focus is novel biology, 3) the authors must provide convincing validation which no one wants to dive into.<\/p>\n\n<p>This rarely converts into a consistent story for screens, and this time is no exception.<\/p>\n\n<p>PERISCOPE compares two different media, running whole-genome screens in each of them \u2014 an interesting experiment with unclear interpretation: \nthere are genes that \u201cland in different clusters\u201d depending on the medium \u2014 \nbut it is unclear what to do with this information. \nAs I understand, the goal was to demonstrate that running the screen \nin a more physiologically relevant medium would yield better insights, \nbut it is unclear if the differences (Ext Fig.8) indeed show the superiority of either medium.<\/p>\n\n<p>Another interesting shot is the TMEM251 investigation with significant additional research beyond PERISCOPE. 
\nIf the TMEM251 story really matters, I\u2019d prefer to see it published separately and better verified (using available info from other pooled screens as well). \nPERISCOPE in this story was needed only for the initial guess based on GSEA \u2014 but this guess could come from other public screens as well.<\/p>\n\n<p>Speaking of GSEA\u2026 the usage of GSEA in the paper (e.g. fig. 6a) makes no sense \ud83d\ude1e.\nGSEA\u2019s power is combining signal from multiple genes with low expression.\nThis problem <em>does not exist<\/em> in optical screens \u2014 as no expression is measured.\nPreranked GSEA (erroneously) relies on zero correlation between genes, \nbut correlation in optical screens is very high. \nIn fact, this high correlation is the subject of several plots in the paper.\nTo compare pathways, just define another direction in embedding space for each pathway, \nas you do for single genes. \nThe direction is a (weighted) average of the directions for individual genes; then measure the separation of distributions along this direction \n(e.g. ROC AUC).<\/p>\n\n<p><img src=\"\/images\/opticalscreen\/umap_leiden_from_cellposh.png\" width=\"700\" \/>\n<small>Example UMAP from CP-POSH for one of the screens<\/small><\/p>\n\n<p>CP-POSH focuses on the druggable genome (1640 genes) with a couple of smaller screens.\nEach version of the pipeline (data + phenotyping model) is compared against <a href=\"https:\/\/string-db.org\/\">StringDB<\/a>,\nproviding a quantifiable comparison, so they can e.g. demonstrate that targeting more genes is slightly better. \nThey also confirm that trained models generalize to new experiments.<\/p>\n\n<p>I was confused by the notable divergence between models trained on 300 and 1640 genes, figure 5a. 
\nIn particular, their lists of significant genes (AUC &gt; 0.55) should markedly diverge across models.\nAlso, 0.55 may sound small \u2014 however, bear in mind this is a cell-level classification, \nand combining multiple cells will result in strong discrimination.<\/p>\n\n<p>Both ViT and CellStats \u201cnominate the potential role of TUT1 in cell cycle regulation\u201d. \n(No research was done to confirm this.)\nInterestingly, sgRNA consistency failed for several genes, \nand half of the genes have at least one \u2018outlier\u2019 sgRNA (out of 10).<\/p>\n\n<p>In my opinion, CP-POSH has a consistent storyline and a more \u2018standardized\u2019 analysis.\nIt looks more like a validation of the approach\/platform, \nand less like a bunch of interesting observations (though CP-POSH has these too).\nThe PERISCOPE presentation is more aligned with \u201cget published in an AAA journal\u201d.<\/p>\n\n<p>Neither paper discusses cell cycle, a well-known confounder in single-cell studies \u2014 how so? \ud83e\udd37\nOptical screens previously characterized full images, not individual cells, \nand thus did not have to deal with this issue (as there are other cells to get signal from).\nSince neither team used supervision, \nthe pipelines likely cluster dividing cells together, \npreferring this characteristic over the perturbation.\nCancelling this in an optical screen is an interesting challenge.<\/p>\n\n<h2 id=\"so-which-one-to-choose\">So which one to choose?<\/h2>\n\n<p>Great question, and fortunately we have the papers to help us! So here is my insight: I don\u2019t know.\n<strong>I can\u2019t meaningfully compare the performance of the two systems after reading the preprints.<\/strong>\nPerformance, I guess, is similar \u2014 but that\u2019s only a guess. 
\nIf some lab wants to select which one to go with, this becomes a matter of trust \u2014 not how science is supposed to work.\n(Ok-ok, the one additional channel can actually decide this choice.)<\/p>\n\n<p>The main selling points of optical pooled screens are simple scalability and fewer confounders,\nwhich ultimately means hypothesis-free or hypothesis-light research.\nI doubt that interpretable morphological features are important for practitioners.<\/p>\n\n<p>The papers lack a \u201cpower analysis\u201d of how many cells are needed to reconstruct a perturbation profile.\nVery little is said about cost ($0.001 per cell \u2014 an estimate from PERISCOPE; no cost estimates from CP-POSH).\nThese two factors determine if the pooled strategy pays off.<\/p>\n\n<p>Speaking of potential, it is unclear if two sgRNAs per cell can be confidently called with either approach.<\/p>\n\n<h2 id=\"can-we-do-better\">Can we do better?<\/h2>\n\n<p><strong>Screen validation should become a benchmark.<\/strong>\nIt\u2019s about time we had a benchmark of reproduction of gene networks\/gene ontology with some predefined procedure. \nThe community would benefit from comparing across the screens rather than \u201crediscovering\u201d mTOR in every screen paper.<\/p>\n\n<p>The number one question is \u2014 can a screen discover culture-specific biology?\nWhen comparing several cell lines, are gene similarities in an optical screen and scRNAseq similar for the same cell line?<\/p>\n\n<p>It would be of high interest to highlight which pathways are detectable in scRNAseq but hardly noticeable in optical pooled screening (and vice versa). 
\nKnowing this would help in choosing the right instrument for the problem.<\/p>\n\n<p><strong>Compare screen to screen, not screen to \u201ccommon knowledge\u201d.<\/strong>\nCommon pathways are a very rough sanity check.\nA single UMAP with genes grouped by their similarity is descriptive enough.\nGSEA is a poor argument: it is embarrassingly easy to find something pleasing with GSEA \nand throw a bunch of impressively small (incorrect) p-values at readers.<\/p>\n\n<p>A screen-to-screen comparison can detect more subtle biology, specific to the culture, and can actually bring interesting insight.<\/p>\n\n<p><strong>Discoveries are usually irrelevant for the story and should not be demanded by journals.<\/strong>\nMethod papers are required to \u201cshow novel biology\u201d, and most \u201cbyproduct discoveries\u201d have no value for readers or authors \u2014 \notherwise those would be a separate paper.<\/p>\n\n<p><em>Faster, cheaper, easier to scale, more reliable, easier to implement<\/em> are \n<strong>great<\/strong> arguments for a technology. \nIf the whole smartphone industry can\u2019t deliver \u201ca killer feature\u201d every year, \nhow can that be a requirement for every method? \ud83e\udd37<\/p>\n\n<h2 id=\"where-would-this-go\">Where would this go?<\/h2>\n\n<p>Back to the point. Pooled optical screening is an exciting technology, \nand it has a number of immediate applications.\nAnd it is super valuable to understand its current limits.<\/p>\n\n<p>For instance, I have the following questions on my mind:<\/p>\n\n<ul>\n  <li>does it transfer? 
When two labs experiment with the same cell line, would they get similar results?\nIn theory, yes, but how about in practice?<\/li>\n  <li>similarity and difference with arrayed screens: shared media means the studied processes are limited to a single cell,\nbecause cell interactions are not restricted to cells with the same perturbation. \nThis has both pros (clearer signal) and cons (if cell interactions\/collective behavior are of interest).<\/li>\n  <li>is it suitable for automatically finding \u2018interesting\u2019 combinations of genes? \nCan we train RL to discover those for us?<\/li>\n  <li>can it handle tissue slices? Can we pool-screen a <a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/fragi.2021.714926\/full\">whole mouse<\/a>?<\/li>\n  <li>can the vision pipeline handle neurons? Is DINO a good choice for that?<\/li>\n<\/ul>\n\n<p>Hopefully more research will come and we\u2019ll get answers to these and other questions soon.<\/p>\n\n<div style=\"text-align: center; font-size: 40px; padding: 110px\">\ud83d\udc4b<\/div>\n\n<h4 id=\"acknowledgments\">Acknowledgments<\/h4>\n\n<p>Thanks to Kevan Shah and Tatiana Dvorkina for proofreading and comments.\nThanks to the CP-POSH team (Ci Chu, Max Salick) and the PERISCOPE team (Meraj Ramezani, Paul C. Blainey) for answering questions.<\/p>\n\n<h4 id=\"comments\">Comments<\/h4>\n\n<p>Paul C. 
Blainey provided some pointers to prior work of his lab, relevant to the questions I discuss in the post:<\/p>\n\n<blockquote>\n  <p>\u2026 a couple of comments that you may find interesting:<\/p>\n  <ul>\n    <li>In Figure S2 of <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC6886477\/\">Feldman et al., 2019<\/a> we showed efficient detection of 2 guides per cell (in ~80% of cells)<\/li>\n    <li>In <a href=\"https:\/\/www.pnas.org\/doi\/10.1073\/pnas.2210623120\">Carlson et al, 2023<\/a> we use a different \nand simple strategy to overlap IHC and SBS in the same channels, which is to titrate down the IHC reagents<\/li>\n    <li>Both of these works demonstrate a potentially standardizable validation approach to do a follow-up (\u201csecondary\u201d) \nscreen in an independent experiment with higher replication (more cells and\/or guides per gene). \nThe hit ranks or feature scores can be compared gene-wise or guide-wise across the primary and secondary to check reproducibility of the results. \nThis can be for technical validation (same assay and guides) or biological validation (new assay and\/or new biological model system).<br \/>\nSo far we\u2019re seeing impressive reproducibility which supports some of the more challenging and informative use cases you suggest.<\/li>\n    <li><a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2021.11.28.470116v1.full\">Funk et al, 2022<\/a> demonstrated that cell cycle can be treated more explicitly: we added 24-hour live imaging of cells prior to fixation<\/li>\n  <\/ul>\n<\/blockquote>\n\n<!--\nMy comment: for some processes like mitosis \/ cell movement, \nlive imaging can be done together with pooled screen \nand used as a functional validation to provide \"arbitrage\" between different screens.\nThis still requires compared approaches to be implemented in the same lab, \nor, at least, with the same culture. 
\n-->\n\n<!-- \n\n\n# Cell painting channels:\n\nOriginal cell paingting from the paper: https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5223290\/\n\n\nPhenotypic images were acquired using a 20X 0.75 NA CFI Plan Apo Lambda objective (Nikon MRD00205) and the following Semrock filters for each phenotypic probe: \n\nNucleus (DAPI) dual-band emission 408\/473, dichroic. \nActin (phalloidin) emission ET530\/30 nm, dichroic 495 nm. \nMitochondria (TOMM20) emission 615\/24 nm, dichroic 565 nm. \nEndoplasmic reticulum (Concanavalin A) emission 680\/42 nm, dichroic 660 nm. \nGolgi and plasma membrane (WGA) emission 820\/110 nm, dichroic 765 nm. \n\n\n\nISS cycles were imaged using a 10X 0.45 NA CFl Plan Apo Lambda objective (Nikon) with the following Semrock filters for each base: \nMiseq G emission 575\/30 nm, dichroic 555 nm. excitation 543\/4 nm, \nMiseq T emission 615\/24 nm, dichroic 565 nm. \nMiseq A emission 680\/42 nm, dichroic 660 nm. \nMiseq C emission 732\/68 nm, dichroic 660 nm.\n\n575 (-30) - 732 (+ 68)\nTOMM20 intersects with T\nConA intersects with miseq A\n\n\n\nSame for cell painting -POSH\n\nStain Target Imaging Type Stain Laser Source Laser (nm) Emission Filter (nm) Objective Exposure time (ms) \n\nNucleus Phenotyping Hoechst Celesta Light Source, Lumencor, 90-10525 405 Pentacube , 441x30 20x 0.75 NA, OFN25 DIC N2\nCellular Membranes\/ endoplasmic reticulum Phenotyping ConA Celesta Light Source, Lumencor, 90-10525 488 Pentacube, 511x26 20x 0.75 NA, OFN25 DIC N2\nCellular membrane\/ Golgi\/ER Phenotyping Wheat Germ Agglutinin Celesta Light Source, Lumencor, 90-10525 545 567\/15nm Filter, Semrock, FF01-567\/ 15-25 20x 0.75 NA, OFN25 DIC N2\nCytoskeleton\/ F-actin Phenotyping Phalloidin Celesta Light Source, Lumencor, 90-10525 545 624\/40nm Filter, Semrock, FF01-624\/ 40-25 20x 0.75 NA, OFN25 DIC N2\nMitochondria Phenotyping Mitoprobe Celesta Light Source, Lumencor, 90-10525 637 Pentacube 684x34 20x 0.75 NA, OFN25 DIC N2\nribosomal protein Phenotyping pS6 
primary and secondary antibody Celesta Light Source, Lumencor, 90-10525 748 Pentacube 817x66 20x 0.75 NA, OFN25 DIC N2\n\n\nG    545  -> 567\/15nm   <> WGA\nT    545  -> 624\/40nm   <>  Phalloidin one-to-one   - degraded by ethanol\nA    637  -> 676\/29nm   <>  Mitoprobe\nC    637  -> 732\/68nm\n\n\n-->\n\n","pubDate":"Sun, 20 Aug 2023 12:00:00 +0000","link":"https:\/\/arogozhnikov.github.io\/2023\/08\/20\/optical-pooled-screens.html","guid":"https:\/\/arogozhnikov.github.io\/2023\/08\/20\/optical-pooled-screens.html","category":"biology"},{"title":"Einops, retrospective of 5 years","description":"<p>Einops will soon turn 5 years old. The right time to take a look back.<\/p>\n\n<p>Some intro: einops is widely used \u2014 around 4 million downloads a month on pypi (for calibration: pytorch is at 10 million) and is used in thousands of projects on github.<\/p>\n\n<p>In a number of ways einops is unique:<\/p>\n\n<ul>\n  <li>bends tensors for a number of very different frameworks. AFAIK all other efforts to make something truly multi-framework either died too soon or avoided touching internals of models<\/li>\n  <li>never pulled back released features. At the same time einops lived much longer than any major version of tensorflow or pytorch. Some backends it originally supported (mxnet, chainer) are dead by now<\/li>\n  <li>the bug tracker was empty for years, compared to the usual hundreds in projects of similar scope. Now it reports several hard-to-fix inconsistencies that appeared as frameworks introduced more features<\/li>\n  <li>einops adoption happens mostly through code sharing between teams\/projects, and not by hype-waving. 
Several mentions on twitter brought waves of likes, but almost none converted to users at that point.\nThe paper appeared only after einops had circulated for three years in the wilds of github, when it was crystal clear that the idea \u201cclicks\u201d.<\/li>\n  <li>\u201cmagical\u201d universal dispatching, so users could write <code class=\"language-plaintext highlighter-rouge\">rearrange(x, 'b c h w -&gt; b h w c')<\/code> and not care about <code class=\"language-plaintext highlighter-rouge\">x<\/code>\u2019s framework\/device\/dtype\/C-ordering. While this is more of a \u2018fancy\u2019 functionality, it was important during initial adoption. <!-- Magical is not a great description for technology, but einops was many times described as \"magic\" with a positive vibe in this word. --><\/li>\n  <li>no dependencies (except Python). Everything else is optional, even numpy<\/li>\n  <li>there is no corporation\/university behind einops, it is mostly a single-person effort<\/li>\n<\/ul>\n\n<h2 id=\"tough-place\">Tough place?<\/h2>\n\n<p>A while ago Stephan H. asked <em>what is challenging about einops<\/em> as a project.<\/p>\n\n<p>I don\u2019t think I gave a great answer back then. \nAnd probably couldn\u2019t anyway, because the question assumes there is a specific \u201ctough place\u201d, and that assumption is wrong.<\/p>\n\n<p>Also, \u201ctough place\u201d is very subjective: after working on any project for some time,\nif you\u2019re successful, there will be no \u201ctough\u201d place, because you focus on the parts that are \u201ctough\u201d \nand improve them, either by decomposing their complexity or by just learning to manage it.<\/p>\n\n<h2 id=\"unique-technical-challenges\">Unique technical challenges<\/h2>\n\n<p>I decided to dedicate some time to writing a better answer to this question.\nThe first prototype was built in a couple of hours, but the project itself took months, so clearly there were non-trivial parts. 
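<\/p>

<p>To illustrate the \u201ccouple of hours\u201d part: a toy, stdlib-only sketch (illustrative only, not einops internals) that turns a pure-transposition pattern into a permutation is genuinely short.<\/p>

```python
def parse_transpose(pattern: str) -> tuple:
    """Toy rearrange for pure transpositions:
    'b c h w -> b h w c' becomes the permutation (0, 2, 3, 1)."""
    left, right = (side.split() for side in pattern.split('->'))
    assert sorted(left) == sorted(right), 'axes on both sides must match'
    return tuple(left.index(axis) for axis in right)

assert parse_transpose('b c h w -> b h w c') == (0, 2, 3, 1)
```

<p>Decompositions, reductions, caching and a dozen backends are where the months went.<\/p>

<p>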
\nEinops as a project has a number of (conflicting) technical restrictions that create significant pressure:<\/p>\n\n<ul>\n  <li>\n    <p>frameworks. Einops supports a dozen of them, and that\u2019s unique. \nWorse, each framework has its specifics, and this creates significant internal tension within the project, \nwhich I\u2019ll discuss a lot in the next points<\/p>\n  <\/li>\n  <li>\n    <p>even worse, frameworks have multiple regimes of work within the same framework (i.e. torch alone has torch.compile, tracing, scripting, \u2018plain run\u2019, torch.fx, cuda graph capturing, and maybe more). They all have different behaviors<\/p>\n  <\/li>\n  <li>\n    <p>the landscape is not steady: frameworks come and go and, even worse, sometimes change their APIs, sometimes by breaking the existing API (looking at you, keras and TF). Their dependencies may contradict each other (stares at protobuf)<\/p>\n  <\/li>\n  <li>\n    <p>support for eager computations.<\/p>\n\n    <p>That\u2019s how code usually runs these pytorchy days. In this case, the hot path should be <em>really<\/em> fast and have absolutely minimal overhead. Einops deals with this via a number of caches that make the usual loopy computations super-efficient. Shape checks (usually skipped by lazy everyone) are conducted only once per shape.<\/p>\n  <\/li>\n  <li>\n    <p>support for symbolic computations and traceability.<\/p>\n\n    <p>Two little-known facts first: 1. einops can deal with symbolic tensors (i.e. can operate on tensors where one or several axes have unknown size, which may sound slightly impossible at first) and 2. einops \u201cdisappears\u201d during tracing and provides models that contain an equivalent set of framework-native operations; moreover, traced operations work correctly for inputs of different shape.<\/p>\n\n    <p>As a result, execution flow has to rely only on traceable operations over shape\u2019s elements, and e.g. 
one can\u2019t just compute the correct result shape in cpp\/rust<\/p>\n  <\/li>\n  <li>\n    <p>shape checks for symbolic tensors.<\/p>\n\n    <p>For example <code class=\"language-plaintext highlighter-rouge\">rearrange(x, '(h h2) (w w2) -&gt; (h w) h2 w2', h2=4, w2=w2)<\/code> demands that the first axis is divisible by 4, and the second axis is divisible by <code class=\"language-plaintext highlighter-rouge\">w2<\/code>, while the dimensions of the tensors are unknown.\nAn additional restriction: einops can\u2019t use built-in graph asserts like tf.Asserts because of their framework-specificity.\nClever organization of computations in ops ensures that code fails for wrong inputs without introducing additional elements into the static graph<\/p>\n  <\/li>\n  <li>\n    <p>support for scripting: this requirement dramatically narrows the subset of Python that can be used, and in some cases demands specifying wrong type hints for internal functions because correct types like <code class=\"language-plaintext highlighter-rouge\">tuple[str, ...]<\/code> are not supported by <code class=\"language-plaintext highlighter-rouge\">torchscript<\/code><\/p>\n  <\/li>\n  <li>\n    <p>support for tensor-rank polymorphism, that is, the same operation with ellipsis can handle inputs of different dimensions. Initially this was done by a clever trick that pre-packed \u2018ellipsis axes\u2019 into one, but recent changes in frameworks (see next point) required developing a new approach<\/p>\n  <\/li>\n  <li>\n    <p>special axes. Frameworks try to extend the concept of tensor = ndarray, which worked so well. \nExamples are sharding axes in distributed tensors and jagged arrays. 
\nThis was clearly outside the initial design and, as I mentioned, required a significant redesign of einops.<\/p>\n  <\/li>\n  <li>\n    <p>framework divergences: differences in the names\/interfaces of operations, missing operations like logsumexp, inconsistencies in support of einsum.<\/p>\n  <\/li>\n  <li>\n    <p>layer definition is quite different across frameworks, and <code class=\"language-plaintext highlighter-rouge\">flax<\/code> especially required an individual approach.<\/p>\n  <\/li>\n  <li>\n    <p>view semantics. Einops tries to provide a view of the input if possible, \nmaking the operation itself very cheap, as no real computation happens.<\/p>\n  <\/li>\n  <li>\n    <p>an additional pressure is my perfectionism and trying to keep the bar very high.\nThese days I don\u2019t think extreme reliability should be assumed from side\/personal projects.<\/p>\n  <\/li>\n<\/ul>\n\n<!-- - python's typing does not know how to exclude lists -->\n\n<p>Problems that appear with new features like <code class=\"language-plaintext highlighter-rouge\">torch.fx<\/code> may be interpreted as <em>einops giving cracks<\/em>; in reality, einops as a notation and approach is just fine.\nIt is enjoyed by many, and the community wants to use the notation with new framework features. \nAnd the notation fits that.\nBut the terrible foundation that tensor manipulation is built upon (i.e. reshape\/view\/transpose and similar) is cracking, more and more visibly, and pouring a layer of cement on top is \u2026 not wise.\nAs I discussed several times, einops\u2019 core operation should be available at the lowest level of graph representation\n\u2014 but I don\u2019t expect this advice to be heard.<\/p>\n\n<p>Support for a large zoo of frameworks is (retrospectively) a questionable investment.\nExamples: cupy and chainer were almost never used, but they were also trivial to maintain and develop. 
\nMxnet\/gluon, meanwhile, required very special treatment.\nSupporting multiple frameworks was, to me, insurance that frameworks would not try to create \u201ctheir very own version of einops\u201d and would not create non-compatible extensions (as they did for numpy).<\/p>\n\n<p>These days projects that don\u2019t use einops still use its core ideas\nby writing parts of einops patterns: <code class=\"language-plaintext highlighter-rouge\">(b h) t c<\/code>, <code class=\"language-plaintext highlighter-rouge\">b*h  t  c<\/code> and similar.\nBecause that\u2019s the best way to communicate the internal structure of a tensor \n(\u2026 when you agree on C-ordering, of course; the construct relies on it significantly).<\/p>\n\n<h2 id=\"unique-conceptual-challenges\">Unique conceptual challenges<\/h2>\n\n<!-- It is easy to think about einops as a python package, but it is more of **approach** to write a readable, reliable and efficient code, that was conveniently provided to python users. -->\n\n<p>Einops is more of an approach to writing code than a package, but the package is a necessary tool to bring those ideas into practice. On the approach level there are a number of hurdles too.<\/p>\n\n<p>It turns out the design of operations is very challenging: einops received a long list of suggestions and ideas, and very few were accepted. 
Folks just introduced to einops think \u201ceinops are helpful, so let\u2019s invent something similar\u201d, but <em>similar<\/em> does not imply <em>helpful<\/em>.<\/p>\n\n<p>Let\u2019s take a story of <code class=\"language-plaintext highlighter-rouge\">einops.pack<\/code> and <code class=\"language-plaintext highlighter-rouge\">einops.unpack<\/code> for a demonstration of this point: \nconcatenation of different-shape tensors was of interest (for me) even before the first public release.\nMy design at that time was universal enough, similar to the rest of einops, but too verbose and inconvenient:<\/p>\n\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">[<\/span><span class=\"n\">r<\/span><span class=\"p\">,<\/span> <span class=\"n\">g<\/span><span class=\"p\">,<\/span> <span class=\"n\">b<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">rechunk<\/span><span class=\"p\">([<\/span><span class=\"n\">rgb<\/span><span class=\"p\">],<\/span> <span class=\"s\">'b h w [r+g+b] -&gt; b h w [r, g, b]'<\/span><span class=\"p\">,<\/span> <span class=\"n\">r<\/span><span class=\"o\">=<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">g<\/span><span class=\"o\">=<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">b<\/span><span class=\"o\">=<\/span><span class=\"mi\">1<\/span><span class=\"p\">)<\/span>\n<\/code><\/pre><\/div><\/div>\n<p>\u2026 thus it was not included. 
Later it was minimized by restricting transpositions:<\/p>\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\"># this one poorly works with type hinting\n<\/span><span class=\"p\">[<\/span><span class=\"n\">r<\/span><span class=\"p\">,<\/span> <span class=\"n\">g<\/span><span class=\"p\">,<\/span> <span class=\"n\">b<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">rechunk<\/span><span class=\"p\">(<\/span><span class=\"n\">rgb<\/span><span class=\"p\">,<\/span> <span class=\"s\">'b h w *'<\/span><span class=\"p\">,<\/span> <span class=\"s\">'r+g+b -&gt; [r, g, b]'<\/span><span class=\"p\">,<\/span> <span class=\"n\">r<\/span><span class=\"o\">=<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">g<\/span><span class=\"o\">=<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">b<\/span><span class=\"o\">=<\/span><span class=\"mi\">1<\/span><span class=\"p\">)<\/span> \n<\/code><\/pre><\/div><\/div>\n<p>until I finally realized that this operation better to be totally different from <code class=\"language-plaintext highlighter-rouge\">rearrange<\/code> and should not have any names for the concatenated\/split axes:<\/p>\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">[<\/span><span class=\"n\">r<\/span><span class=\"p\">,<\/span> <span class=\"n\">g<\/span><span class=\"p\">,<\/span> <span class=\"n\">b<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">unpack<\/span><span class=\"p\">(<\/span><span class=\"n\">rgb<\/span><span class=\"p\">,<\/span> <span class=\"s\">'b h w *'<\/span><span class=\"p\">,<\/span> <span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1<\/span><span 
class=\"p\">])<\/span>\n<\/code><\/pre><\/div><\/div>\n<p>which was soon generalized into unpacking with arbitrary shapes.<\/p>\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">[<\/span><span class=\"n\">r<\/span><span class=\"p\">,<\/span> <span class=\"n\">g<\/span><span class=\"p\">,<\/span> <span class=\"n\">b<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">unpack<\/span><span class=\"p\">(<\/span><span class=\"n\">rgb<\/span><span class=\"p\">,<\/span> <span class=\"s\">'b h w *'<\/span><span class=\"p\">,<\/span> <span class=\"p\">[[<\/span><span class=\"mi\">1<\/span><span class=\"p\">],<\/span> <span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">],<\/span> <span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">]])<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Original design of operation could not support arbitrary shapes.\nOk, technically it could, but that would be ugly and miserable.\nNew design solved another issue \u2014 memorizing axes that were composed, another common request for einops.<\/p>\n\n<p>I\u2019ve come up with a final design (which I still find optimal) only <em>two years later<\/em>. 
\nA number of suggestions popped up that were similar to the original version.<\/p>\n\n<p>To see whether an operation \u2018clicks\u2019, <strong>a whole research effort is needed<\/strong>:<\/p>\n\n<ul>\n  <li>collect use-cases (and this requires a broad view of SOTA and how it may change over the next years)<\/li>\n  <li>convert use-cases to code examples, and prepare baseline implementations without the new operation<\/li>\n  <li>implement with your suggestion, and in most cases, conclude that it doesn\u2019t look good enough<\/li>\n<\/ul>\n\n<p>There are more complicated parts, like \u201cis it easy to read?\u201d, \u201cis this code confusing?\u201d and finally \u201chow to make this all efficient given all the restrictions above?\u201d.<\/p>\n\n<p>Allocating time for these (mostly unsuccessful) attempts is tough.<\/p>\n\n<!-- Python. Python stands in a way sometimes. Julia's line-level macros maybe would be a more convenient syntax, and e.g. writing something like\n```python\nx_out['b h w c'] = x['b c h w']\n``` -->\n\n<p>An additional challenge: \u201cfewer, but more universal operations\u201d.<\/p>\n\n<p>There is a gap between \u201cI find this helpful\u201d and \u201cthis will be actively used\u201d.\nIt is easy to come up with a long list of operations that will be helpful in <em>some<\/em> cases, but how would users figure this out? I don\u2019t think anyone checks einops\u2019 docs regularly, so the operation would never pop up in mind. \nSee, <em>the usefulness of an operation strongly depends on its universality<\/em>, i.e. its ability to cover many cases, and einops is good at this because universality was one of the requirements.<\/p>\n\n<h2 id=\"adoption-challenges-management-challenges\">Adoption challenges, management challenges<\/h2>\n\n<p>Einops adoption was very slow. If it were a commercial project, it would likely have run out of money before getting sufficient traction.<\/p>\n\n<p>But the project was designed to be resilient. 
Somewhat of an internal requirement: the project should stay usable for at least a couple of years even in the worst scenario: no maintenance at all, while the deep learning landscape changes even faster than before.<\/p>\n\n<p>From the very beginning maintenance debt was minimized \u2014  that means a very restricted design and fewer features. \nI assessed very carefully which things could break.\nOnce I was asked during an interview: why might it stop working? I said \u2014  only if the API of core operations changes. Time has shown this was the correct answer.<\/p>\n\n<p>Another issue is <em>extremely low adoption of layers<\/em>. I have no good explanation for it; they are very useful.<\/p>\n\n<h2 id=\"reasons-for-slow-adoption\">Reasons for slow adoption?<\/h2>\n\n<p><strong>No hyping<\/strong>. In part because I am bad at it, and in part because I am not that interested in answering \nbasic questions from folks attracted by new shiny things. \nAs a byproduct, early adopters of einops are mostly very advanced folks who knew what to expect from the tool\nand cared more about the quality of their code than the rest of the ML community.<\/p>\n\n<p>Consequently, einops has <em>no dedicated community<\/em> (discord server or so). \nIn the long run I think no community is better than an abandoned community (which happens in many projects).\nThere are a number of ein-tools around github addressing specific cases; \nmaybe a somewhat centralized community could help with initial adoption.<\/p>\n\n<p>Another important factor is <strong>a significant prejudice against string-templated operations<\/strong>, which exists for \nthree reasons: 1. einsum was historically slow; 2. einsum is the only operation of this kind in the frameworks; 3. everyone knows parsing is slow, and the idea of \u2018parse once\u2019 rarely crosses the mind.<\/p>\n\n<p>Einops <em>caches results of pattern parsing<\/em>. 
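<\/p>

<p>The \u2018parse once\u2019 idea fits in a few lines (a hypothetical sketch, not einops internals): key a cache by the pattern string, so the expensive parsing runs once and every later call is a dictionary lookup.<\/p>

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def parse_pattern(pattern: str) -> tuple:
    # the expensive string work happens once per distinct pattern
    left, right = (side.strip().split() for side in pattern.split('->'))
    return tuple(left), tuple(right)

parse_pattern('b c h w -> b h w c')   # parsed
parse_pattern('b c h w -> b h w c')   # cache hit
assert parse_pattern.cache_info().hits == 1
```

<p>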
\nBut even repeating this many times in the paper\/documentation will not overcome the prejudice \u2014  because if you\u2019re already convinced it is slow, why would you read the paper?<\/p>\n\n<p>A couple of speed issues were reported to the einops repo that were not even related to einops \u2014  a vivid demonstration of this bias.<\/p>\n\n<p><strong>No critical case<\/strong>. A tool becomes an immediate hit only if it addresses an existing case that is very \npoorly covered by previous tools. Or, rarely, because of hype.<\/p>\n\n<p>Not that you can\u2019t bend tensors without einops. And not that adding a single <code class=\"language-plaintext highlighter-rouge\">rearrange<\/code> magically makes your code better.\nEinops is an approach \u2014  and an approach still requires investment to build a habit of writing and reading a new kind of code. Real conversion happens only after one needs to read someone else\u2019s code and finds out that reading einopsy code is significantly easier.<\/p>\n\n<h1 id=\"concluding-thought\">Concluding thought<\/h1>\n\n<p>Einops, as said, is one of a kind, and its development trajectory deviates significantly from \u2018normal\u2019 development.<\/p>\n\n<p>What would you call a system that is shaped by hard constraints? 
\nI\u2019d call this \u201cengineering art\u201d.<\/p>\n\n","pubDate":"Thu, 13 Jul 2023 12:00:00 +0000","link":"https:\/\/arogozhnikov.github.io\/2023\/07\/13\/retrospective-thoughts-on-einops.html","guid":"https:\/\/arogozhnikov.github.io\/2023\/07\/13\/retrospective-thoughts-on-einops.html","category":["einops","tensor manipulations"]},{"title":"Schema migration should be a responsibility of DB","description":"<p>A great achievement of the past decade in programming is a shift in paradigm from <em>transition<\/em>-focused to <em>state<\/em>-focused.<\/p>\n\n<p>This shift is clearly seen in front-end (user interfaces):\nIn react\/preact\/vue and other frontend frameworks a component has a state and defines how that state should be represented (rendered) in html. \nThe aim of a framework is to \u2018migrate the DOM\u2019 to the desired html representation with minimal overhead.<\/p>\n\n<p>This shift is clearly seen in management of cloud resources. \nIn AWS CDK, pulumi, terraform and other <a href=\"https:\/\/en.wikipedia.org\/wiki\/Infrastructure_as_code\">IaC<\/a> tools the user defines the desired state of infrastructure, and it is the responsibility of the tool to produce a correct \u2018migration of infrastructure\u2019.<\/p>\n\n<p>This shift is visible in dependency management:\nDependency management relies on expected state (which packages\/libraries are required) and less on imperative instructions that dictate the order of installation.\nImperative glue here is still very common \u2014 e.g. dockerfiles, but tools like nix\/nixos eliminate the glue as well.<\/p>\n\n<!-- Streamlit (tool used by data\/ml folks) uses state (kept on client-side) to define the contents of the page. Every user action changes the state, and triggers computation of a new content with (mostly) preserved state.  -->\n\n<p>In databases, in particular in ORMs, this shift (only partially) happened around two decades ago. 
\nThe user changes ORM classes, and the framework produces migrations.<\/p>\n\n<p>Generally speaking, in all these cases we define the desired state of the system, <em>not<\/em> the necessary changes.\nThe move to state-focused programming dramatically simplified the management of complex systems. \nIt\u2019s like laying out a plan of a street while the problem of moving all the belongings\/walls is solved for you.<\/p>\n\n<h2 id=\"whats-wrong-with-migrations-in-rdbms\">What\u2019s wrong with migrations in RDBMS?<\/h2>\n\n<p>Switching to auto-migration tools helps to focus on what\u2019s important - e.g. the current relations in the RDBMS - and not on how we ended up with this set of relations.\nPlus, coherence between DB and code (ORMs or schema-definition tools) is now guaranteed.<\/p>\n\n<p>Adoption of auto-migration tools is still very low (even compared to ORMs), and in my opinion, because of <strong>how this process is organized<\/strong>.<\/p>\n\n<p>We have dozens of relational DBMS, and yes, they look similar, but there are tons of nuances that make them all different.<\/p>\n\n<p>And we have a number of tools to produce migrations: sqlalchemy+alembic in python, entity framework in .net, a dozen tools for Hibernate in Java, and every community\/ecosystem tries to develop a solution that can migrate a large number of deviating databases in a uniform way.<\/p>\n\n<p>No big surprise that all of them have very limited success, given that the scope of the project is unlimited.<\/p>\n\n<p>Auto-migration tools like alembic are also tough to develop and maintain:<\/p>\n<ul>\n  <li>they need to understand schema definition in a language (in python, in this case)<\/li>\n  <li>they need to introspect the current schema of the database<\/li>\n  <li>they need to compute a \u2018diff\u2019 based on matching these two schema definitions, neither of which was created with automated schema migration in mind<\/li>\n  <li>they need to deal with all peculiarities of dialects in schema definition and schema migration<\/li>\n  <li>for all operations 
alembic creates counterparts in python code, which is like introducing +1 language<\/li>\n<\/ul>\n\n<p>The same problems don\u2019t hurt frontend frameworks as much, because there are currently ~2.5 browser engines and a ton of work done by standardization committees around js, and \u2026 after ditching react\/vue you still have to deal with discrepancies, this time yourself.\nThe same problems are faced by IaC tools, and this will eventually become one more (significant) barrier for migration between clouds.<\/p>\n\n<p><img src=\"\/images\/migrations\/migration-db.png\" width=\"800\" \/>\n<small>\nComparison of existing solutions (python\u2019s alembic is taken as an example) \nwith this proposal. Note that on the left there are multiple steps that cross the boundary\nof ORM\/migrator or migrator\/DB.\n<\/small><\/p>\n\n<h2 id=\"solution\">Solution<\/h2>\n\n<ul>\n  <li>schema migration is generated by the database<\/li>\n  <li>the tool only declares the desired state<\/li>\n<\/ul>\n\n<p>This will move responsibility for db-specific migration to db developers, and that\u2019s for the good.<\/p>\n\n<h3 id=\"where-to-start\">Where to start?<\/h3>\n\n<p>In a minimal implementation, the DB provides a function. The function is given two db <code class=\"language-plaintext highlighter-rouge\">schemas<\/code> (think of postgresql\/oracle\/sql server schemas, or individual databases in mysql) and compares them to produce a migration from the observed difference. \nThe migration tool would create a temporary schema with the desired state and call a procedure to produce the migration.<\/p>\n\n<p>That\u2019s not something unseen: pgAdmin has \u2018Schema Diff\u2019, SQL Server Data Tools has \u2018Schema Compare\u2019. 
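<\/p>

<p>The contract itself is tiny. A toy diff over dict-shaped schemas (entirely hypothetical, not any database\u2019s actual API) shows the shape of the interface the migration tool would call:<\/p>

```python
def diff_schemas(current: dict, desired: dict) -> list:
    """Toy schema diff: {table: {column: type}} in, DDL statements out."""
    ops = []
    for table, columns in desired.items():
        if table not in current:
            body = ', '.join(f'{col} {typ}' for col, typ in columns.items())
            ops.append(f'CREATE TABLE {table} ({body});')
            continue
        for col, typ in columns.items():
            if col not in current[table]:
                ops.append(f'ALTER TABLE {table} ADD COLUMN {col} {typ};')
    # tables present now but absent from the desired state get dropped
    ops.extend(f'DROP TABLE {table};' for table in current if table not in desired)
    return ops

assert diff_schemas(
    {'person': {'name': 'text'}},
    {'person': {'name': 'text', 'age': 'integer'}},
) == ['ALTER TABLE person ADD COLUMN age integer;']
```

<p>A real in-database implementation would also need renames, constraints and data-preserving type changes; the point is only where this responsibility lives.<\/p>

<p>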
\nSo tools do exist, but they are not part of the database, and they don\u2019t have a uniform interface.<\/p>\n\n<h3 id=\"consequences\">Consequences<\/h3>\n\n<p>When we push migrations to database developers\u2026<\/p>\n<ul>\n  <li>migrations would be almost immediately available in any programming language<\/li>\n  <li>in the longer run, we should expect improvements in SDL (schema definition languages) to account for common migration scenarios.<\/li>\n<\/ul>\n<details>\n<summary> Example of these changes <\/summary>\n<div>\n    <p>For example, if you start from something like<\/p>\n    <div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">Relation<\/span> <span class=\"n\">Person<\/span><span class=\"p\">:<\/span>\n  <span class=\"n\">name<\/span><span class=\"p\">:<\/span> <span class=\"n\">string<\/span>\n<\/code><\/pre><\/div>    <\/div>\n    <p>and migrate it to<\/p>\n    <div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">Relation<\/span> <span class=\"n\">Person<\/span><span class=\"p\">:<\/span>\n   <span class=\"n\">full_name<\/span><span class=\"p\">:<\/span>  <span class=\"n\">string<\/span>\n<\/code><\/pre><\/div>    <\/div>\n\n    <p>From the point of view of a migration tool, it is not clear that you just renamed a field rather than deleted \u2018name\u2019 and created \u2018full_name\u2019. 
\nThus an additional technical identifier is necessary, for instance:<\/p>\n\n    <div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">Relation<\/span> <span class=\"n\">Person<\/span><span class=\"p\">:<\/span>\n  <span class=\"n\">name<\/span><span class=\"p\">:<\/span> <span class=\"n\">string<\/span><span class=\"p\">,<\/span> <span class=\"n\">oid<\/span><span class=\"o\">=<\/span><span class=\"err\">\u2018<\/span><span class=\"mi\">7<\/span><span class=\"n\">dsd8<\/span><span class=\"err\">\u2019<\/span>\n<\/code><\/pre><\/div>    <\/div>\n    <p>to<\/p>\n    <div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">Relation<\/span> <span class=\"n\">Person<\/span><span class=\"p\">:<\/span>\n  <span class=\"n\">full_name<\/span><span class=\"p\">:<\/span> <span class=\"n\">string<\/span><span class=\"p\">,<\/span> <span class=\"n\">oid<\/span><span class=\"o\">=<\/span><span class=\"err\">\u2018<\/span><span class=\"mi\">7<\/span><span class=\"n\">dsd8<\/span><span class=\"err\">\u2019<\/span>\n<\/code><\/pre><\/div>    <\/div>\n\n    <p>now it is clear that a renaming happened. There are a number of other ways to get smoother support for migrations.<\/p>\n\n    <p>However, this will remain just an idea for as long as DB developers don\u2019t have to think about migrations.<\/p>\n\n  <\/div>\n<\/details>\n\n<ul>\n  <li>there are cases when the db just does not provide tools to produce migrations. Like the postgresql enum, which just can\u2019t be migrated safely by alembic, so <a href=\"https:\/\/github.com\/sqlalchemy\/alembic\/issues\/278\">this issue<\/a> has been unresolved for years, and that\u2019s not on alembic\u2019s side.<\/li>\n<\/ul>\n\n<p><br \/><\/p>\n\n<p>Well\u2026 we can just implement improvements as a stand-alone solution, e.g. within an ORM, right?<\/p>\n\n<p>No, we can\u2019t. 
As I described, to make it somewhat useful, you need to support numerous dialects, and creating such migration tools is a big job (comparable to creating a new database). \nCreating such tools for multiple languages is probably more work than just creating a db from scratch.<\/p>\n\n<p><br \/><\/p>\n\n<p><br \/><\/p>\n\n<p>That\u2019s the main feature I expect from my next db: declarative SDL with schema migrations handled by the DB.\nI know that EdgeDB already provides such functionality, but if you know other tools that have this implemented - drop me a line.<\/p>\n\n","pubDate":"Sun, 29 Jan 2023 01:00:00 +0000","link":"https:\/\/arogozhnikov.github.io\/2023\/01\/29\/migrations.html","guid":"https:\/\/arogozhnikov.github.io\/2023\/01\/29\/migrations.html","category":["schema migrations","databases"]},{"title":"Delimiter-first code","description":"<style>\n.alex-boxes {\n    display: flex;\n    justify-content: space-around;\n}\n.lvl1 {\n    color: darkred;\n}\n.lvl2 {\n    color: darkgreen;\n}\n.lvl3 {\n    color: darkblue;\n}\n.lvl1, .lvl2, .lvl3 {\n    padding-right: 2px;\n}\n.lvl1:before, .lvl2:before, .lvl3:before {\n    content: \"<lvl\";\n}\n.lvl1:after, .lvl2:after, .lvl3:after {\n    content: \">\";\n}\ncmnt {\n    \/* comments *\/\n    display: inline;\n    color: #7f9f7f;\n}\nstrn {\n    \/* string literals *\/\n    display: inline;\n    color: #cc9393;\n}\npnct { \n    \/* punctuation *\/\n    display: inline;\n    color: #41706f;\n}\nkwrg {\n    \/* kwarg *\/\n    display: inline;\n    color: #eee;\n}\nhngr {\n    \/* hanging elements - bracket \/ parenthesis \/ start of multiline *\/\n    display: inline;\n    color: #d8f;\n}\ncaret {\n    display: inline;\n}\ncaret:after {\n    content: \"\u13c6\";\n    color: #AAA;\n}\n\n.precode {\n    background-color: #2b2b2b; \n    color: #dcdccc;\n    overflow-x: visible;\n}\n\ncaret:after {\n    animation: blink-animation 1.5s infinite;\n}\n@keyframes blink-animation {\n    0%  { opacity: 0.8; }\n    10% { opacity: 
0.4; }\n    40% { opacity: 0.4; }\n    50% { opacity: 0.8; }\n}\n<\/style>\n\n<h2 id=\"summary\">Summary<\/h2>\n\n<p>I argue for wider usage of the delimiter-first style in code:<\/p>\n<ul>\n  <li><code class=\"language-plaintext highlighter-rouge\">three friends [tic, tac, toe]<\/code> becomes <code class=\"language-plaintext highlighter-rouge\">three friends \u30fbtic \u30fbtac \u30fbtoe<\/code>.<\/li>\n<\/ul>\n\n<p>A new top-level syntax for programming languages is proposed to show the advantages of this method.\nThe new syntax is arguably as simple, but more consistent: it better preserves visual structure and solves some issues in code formatting.<\/p>\n\n<h2 id=\"related-comma-first-formatting\">Related: comma-first formatting<\/h2>\n\n<p>A well-known proposal is to write commas first in languages like javascript, JSON or SQL, which don\u2019t have trailing commas (JS has them these days, but not the other two):<\/p>\n\n<div class=\"alex-boxes\">\n  <div class=\"language-sql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>    <span class=\"c1\">-- trailing commas              <\/span>\n    <span class=\"k\">SELECT<\/span> <span class=\"n\">employee_name<\/span><span class=\"p\">,<\/span>\n      <span class=\"n\">company_name<\/span><span class=\"p\">,<\/span>\n      <span class=\"n\">salary<\/span><span class=\"p\">,<\/span>\n      <span class=\"n\">state_code<\/span><span class=\"p\">,<\/span>\n      <span class=\"n\">city<\/span>\n    <span class=\"k\">FROM<\/span> <span class=\"nv\">`employees`<\/span>\n<\/code><\/pre><\/div>  <\/div>\n  <div class=\"language-sql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>    <span class=\"c1\">-- leading commas               <\/span>\n    <span class=\"k\">SELECT<\/span> <span class=\"n\">employee_name<\/span>\n         <span class=\"p\">,<\/span> <span class=\"n\">company_name<\/span>\n         <span class=\"p\">,<\/span> <span class=\"n\">salary<\/span>\n         <span 
class=\"p\">,<\/span> <span class=\"n\">state_code<\/span>\n         <span class=\"p\">,<\/span> <span class=\"n\">city<\/span>\n    <span class=\"k\">FROM<\/span> <span class=\"nv\">`employees`<\/span>\n<\/code><\/pre><\/div>  <\/div>\n<\/div>\n\n<p>While it is <strong>not what I am discussing here<\/strong>, there is a large overlap.\nThis style wasn\u2019t widely adopted, and it is interesting to ask why.<\/p>\n\n<p>All criticism essentially comes down to: \n1) tools can solve the issues this notation addresses\n2) it is not natural \/ you don\u2019t write text like this.<\/p>\n\n<p>Argument 1) is irrelevant since tools can handle any notation, even one completely unreadable for humans. \nArgument 2) is weak; however, similarity to familiar things drastically simplifies adoption.<\/p>\n\n<p>Over time, however, code culture diverged in multiple ways from \u2018usual writing\u2019: \nwe enumerate from zero, write identifiers with underscores, don\u2019t follow usual rules for quotes, and indent code instead of writing in paragraphs.\nOnce some tools have shown that an alternative way works, further adoption happens more easily.<\/p>\n\n<p>More importantly, argument 2) is really broken:<\/p>\n\n<div class=\"alex-boxes\">\n  <div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>    \u30fbthis version          \n    \u30fbis far more \n    \u30fbnatural\n<\/code><\/pre><\/div>  <\/div>\n  <div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>    than this version\u30fb          \n    with a delimiter\u30fb\n    after\n<\/code><\/pre><\/div>  <\/div>\n<\/div>\n\n<p>so when it came to enumerating in a visually distinctive way, \n\u2018usual writing\u2019 settled on delimiter-first.<\/p>\n\n<p>I want to pinpoint the source of this controversy with one more example:<\/p>\n\n<pre>\nYou need eggs, cheese, bread.               
<span style=\"color: #484\"># ok<\/span>  \nYou need ,eggs ,cheese ,bread.              <span style=\"color: #844\"># sucks<\/span>\nYou need a) eggs b) cheese c) bread.        <span style=\"color: #484\"># ok<\/span>\nYou need 1. eggs 2. cheese 3. bread.        <span style=\"color: #484\"># ok<\/span>\nYou need \u30fbeggs \u30fbcheese \u30fbbread.            <span style=\"color: #484\"># ok<\/span>   \n<\/pre>\n\n<p>So complaints are not because delimiter-first looks wrong - in fact, it is common.\nThey are about commas being used as <em>leading<\/em> elements, not trailing - a lesson to remember.<\/p>\n\n<p>Both arguments 1) and 2) pinpoint the reasons <em>why things are the way they are<\/em>: habit and tools.\nBut different code examples (<a href=\"https:\/\/hoffa.medium.com\/winning-arguments-with-data-leading-with-commas-in-sql-672b3b81eac9\">SQL examples<\/a> by Felipe Hoffa and <a href=\"https:\/\/gist.github.com\/isaacs\/357981\">JS examples<\/a> by Isaac Z. Schlueter) show the benefits of delimiter-first.<\/p>\n\n<p>I expected to find some code examples in the discussions where delimiter-last is better, but I didn\u2019t.<\/p>\n\n<p><em>Later addition:<\/em> the haskell community <a href=\"https:\/\/github.com\/tibbe\/haskell-style-guide\/blob\/master\/haskell-style.md\">adopted<\/a> leading commas in many projects, because trailing commas were not supported at first.\nLater haskell got support for trailing commas, but now the majority <a href=\"https:\/\/www.reddit.com\/r\/haskell\/comments\/hr5c2n\/comment\/fy25hpm\/?utm_source=share&amp;utm_medium=web2x&amp;context=3\">votes<\/a> for leading commas.<\/p>\n\n<h2 id=\"is-delimiter-a-right-word\">Is \u2018delimiter\u2019 the right word?<\/h2>\n\n<p>A delimiter (just like a separator) separates items, though there is <a href=\"https:\/\/stackoverflow.com\/questions\/9118769\/when-to-use-the-terms-delimiter-terminator-and-separator\">no consensus<\/a> about the terminology.<\/p>\n\n<p>E.g. 
in <code class=\"language-plaintext highlighter-rouge\">[ 1, 2, 3 ]<\/code> we have a sequence of tokens:<\/p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>start  item delimiter  item  delimiter   item   end\n  [     1       ,        2        ,       3      ]\n<\/code><\/pre><\/div><\/div>\n\n<p>So what I\u2019m arguing for is having a start-of-item token. \nLike this: <code class=\"language-plaintext highlighter-rouge\">\u30fb1 \u30fb2 \u30fb3<\/code>.\nDo we need to mark the end of the last item? As we\u2019ll see next, that\u2019s usually not the case.<\/p>\n\n<p>We have a special word for an end-of-item token: terminator, but no \u2018startinator\u2019 or any similar word.\nI see some irony in this.<br \/> \n<em>(update: see some interesting thoughts I received about this in the comments section)<\/em><\/p>\n\n<p>Meanwhile, I keep using the word \u2018delimiter\u2019 (albeit maybe incorrectly).<\/p>\n\n<h2 id=\"collections-in-html\">Collections in HTML<\/h2>\n\n<p>Different markup languages give some food for thought, as they commonly deal with collections.<\/p>\n\n<p>E.g. 
html allows using start-of-item (<code class=\"language-plaintext highlighter-rouge\">&lt;li&gt;<\/code>) and skipping end-of-item (<code class=\"language-plaintext highlighter-rouge\">&lt;\/li&gt;<\/code>)<\/p>\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;ul&gt;<\/span>\n    <span class=\"nt\">&lt;li&gt;<\/span> first item\n    <span class=\"nt\">&lt;li&gt;<\/span> second item\n<span class=\"nt\">&lt;\/ul&gt;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h2 id=\"collections-in-yaml\">Collections in YAML<\/h2>\n\n<p>Yaml, which focuses on a hierarchy of collections, also uses a delimiter-first approach.<\/p>\n\n<div class=\"language-yaml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"pi\">-<\/span> <span class=\"s\">point <\/span><span class=\"m\">1<\/span>\n  <span class=\"pi\">-<\/span> <span class=\"s\">point <\/span><span class=\"m\">1.1<\/span>\n  <span class=\"pi\">-<\/span> <span class=\"s\">point <\/span><span class=\"m\">1.2<\/span>\n    <span class=\"pi\">-<\/span> <span class=\"s\">point 1.2.1<\/span>\n    <span class=\"pi\">-<\/span> <span class=\"s\">point 1.2.2<\/span>\n  <span class=\"pi\">-<\/span> <span class=\"s\">point <\/span><span class=\"m\">1.3<\/span>\n<span class=\"pi\">-<\/span> <span class=\"s\">point 2<\/span> \n<\/code><\/pre><\/div><\/div>\n\n<p>Let me reinterpret this example. 
<strong>This reinterpretation is important in the further discussion<\/strong>.<\/p>\n\n<p>There are 3 delimiters: <code class=\"language-plaintext highlighter-rouge\">\\n-<\/code>, <code class=\"language-plaintext highlighter-rouge\">\\n__-<\/code> and <code class=\"language-plaintext highlighter-rouge\">\\n____-<\/code> (underscore = whitespace).\nAll three delimiters are distinct, and the whole structure now reads as<\/p>\n\n<pre>\n<span class=\"lvl1\">1<\/span>point 1\n<span class=\"lvl2\">2<\/span>point 1.1\n<span class=\"lvl2\">2<\/span>point 1.2\n<span class=\"lvl3\">3<\/span>point 1.2.1\n<span class=\"lvl3\">3<\/span>point 1.2.2\n<span class=\"lvl2\">2<\/span>point 1.3\n<span class=\"lvl1\">1<\/span>point 2 \n<\/pre>\n\n<p>No end token is needed in yaml: the last item ends when a collection ends, i.e. at a delimiter of a higher level.\nThere is no need to know or parse anything about the internal structure between two <lvl1> tokens.<\/lvl1><\/p>\n\n<p>Correspondingly, the only expectation we have from contents enclosed between <code class=\"language-plaintext highlighter-rouge\">&lt;lvl2&gt;<\/code> is \nthat there are no tokens <code class=\"language-plaintext highlighter-rouge\">&lt;lvl1&gt;<\/code> or <code class=\"language-plaintext highlighter-rouge\">&lt;lvl2&gt;<\/code> - and that\u2019s it.<\/p>\n\n<p>Intermediate conclusion: delimiter-first is very common, \nand in markup languages it is even standard (but not in programming languages!)<\/p>\n\n<h2 id=\"line-should-start-from-n-not-end-with-it\">Line should start from <code class=\"language-plaintext highlighter-rouge\">\\n<\/code>, not end with it<\/h2>\n\n<p>This sounds mad (after many years of programming, newline-last just feels right), but see for yourself:<\/p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>Let's assume I've had some very long text ending here.\n\nChapter 2.\nLet's learn about belonging of indentation elements to logical 
elements.\n<\/code><\/pre><\/div><\/div>\n\n<p>Pay attention to the blank line between the last line of the previous chapter and the header of the new chapter.\nUndoubtedly, the blank line is part of the \u2018Chapter 2\u2019 logical element, \nbecause the empty line focuses our attention on the \u2018Chapter 2\u2019 label. \nIt is not there because we need to end the paragraph.<\/p>\n\n<p>For the same reason, in html additional margins \u2018belong\u2019 to headers, not to preceding elements.<\/p>\n\n<p>Same for lines: <em>we highlight the beginning of a new line<\/em>, not the end of the previous one.\nIronically, that\u2019s in the name: it is a newline, not an endline.<\/p>\n\n<p>When we turn to code, the same thought is seen in this small snippet, \nwhere I compare the normal <code class=\"language-plaintext highlighter-rouge\">print<\/code> with a hypothetical <code class=\"language-plaintext highlighter-rouge\">print<\/code> that outputs a newline before the output:<\/p>\n\n<div class=\"alex-boxes\">\n  <div>\n    <div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">print<\/span><span class=\"p\">(<\/span><span class=\"s\">'step1. downloading'<\/span><span class=\"p\">,<\/span> <span class=\"n\">end<\/span><span class=\"o\">=<\/span><span class=\"s\">''<\/span><span class=\"p\">)<\/span>\n<span class=\"k\">for<\/span> <span class=\"n\">chunk<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">download<\/span><span class=\"p\">(...):<\/span>\n    <span class=\"k\">print<\/span><span class=\"p\">(<\/span><span class=\"n\">end<\/span><span class=\"o\">=<\/span><span class=\"s\">'.'<\/span><span class=\"p\">)<\/span>\n<span class=\"k\">print<\/span><span class=\"p\">()<\/span> <span class=\"c1\"># to keep steps on separate lines\n<\/span>\n<span class=\"k\">print<\/span><span class=\"p\">(<\/span><span class=\"s\">'step2. 
processing'<\/span><span class=\"p\">,<\/span> <span class=\"n\">end<\/span><span class=\"o\">=<\/span><span class=\"s\">''<\/span><span class=\"p\">)<\/span>\n<span class=\"k\">for<\/span> <span class=\"n\">chunk<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">process<\/span><span class=\"p\">(...):<\/span>\n    <span class=\"k\">print<\/span><span class=\"p\">(<\/span><span class=\"n\">end<\/span><span class=\"o\">=<\/span><span class=\"s\">'.'<\/span><span class=\"p\">)<\/span>\n<span class=\"k\">print<\/span><span class=\"p\">()<\/span> <span class=\"c1\"># to keep steps on separate lines\n<\/span><\/code><\/pre><\/div>    <\/div>\n    <center>\nCode with \\n auto-printed after the arguments\n<\/center>\n  <\/div>\n  <div>\n    <div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">print<\/span><span class=\"p\">(<\/span><span class=\"s\">'step1. downloading'<\/span><span class=\"p\">)<\/span>              \n<span class=\"k\">for<\/span> <span class=\"n\">chunk<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">download<\/span><span class=\"p\">(...):<\/span>\n    <span class=\"k\">print<\/span><span class=\"p\">(<\/span><span class=\"n\">start<\/span><span class=\"o\">=<\/span><span class=\"s\">'.'<\/span><span class=\"p\">)<\/span>\n\n<span class=\"k\">print<\/span><span class=\"p\">(<\/span><span class=\"s\">'step2. 
processing'<\/span><span class=\"p\">)<\/span>\n<span class=\"k\">for<\/span> <span class=\"n\">chunk<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">process<\/span><span class=\"p\">(...):<\/span>\n    <span class=\"k\">print<\/span><span class=\"p\">(<\/span><span class=\"n\">start<\/span><span class=\"o\">=<\/span><span class=\"s\">'.'<\/span><span class=\"p\">)<\/span>\n    \n    \n<\/code><\/pre><\/div>    <\/div>\n    <center>\nCode with \\n auto-printed before the arguments\n<\/center>\n  <\/div>\n<\/div>\n\n<p>result:<\/p>\n<div class=\"language-text highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>step1. downloading.........\nstep2. processing.........\n<\/code><\/pre><\/div><\/div>\n\n<p>Version of code with leading <code class=\"language-plaintext highlighter-rouge\">\\n<\/code> is more straightforward.<\/p>\n\n<p>If things were the opposite way:<\/p>\n<div class=\"language-text highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>.......step1. downloaded\n.......step2. 
processed\n<\/code><\/pre><\/div><\/div>\n<p>then <code class=\"language-plaintext highlighter-rouge\">\\n<\/code> at the end would be more optimal, but this order is not natural.\nNormally we first describe the collection, then enumerate its items, not vice versa.<\/p>\n\n<h2 id=\"unixs-newline-in-the-end-of-line\">Unix\u2019s newline in the end of line<\/h2>\n\n<p>Unix does not use <code class=\"language-plaintext highlighter-rouge\">\\n<\/code> as a delimiter of lines.\nInstead, it is more of a line terminator, because a text file <em>should<\/em> end with <code class=\"language-plaintext highlighter-rouge\">\\n<\/code>.\nNot doing so would break the simplicity of unix tools and the simplicity of definitions, see <a href=\"https:\/\/stackoverflow.com\/questions\/729692\/why-should-text-files-end-with-a-newline\">this SO thread<\/a>.<\/p>\n\n<p>For the layman, here is why the newline is required in unix:<\/p>\n<div class=\"language-bash highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ <\/span><span class=\"nb\">echo<\/span> <span class=\"nt\">-n<\/span> <span class=\"s1\">'good file with newline in the end\\n'<\/span> <span class=\"o\">&amp;&amp;<\/span> <span class=\"nb\">echo<\/span> <span class=\"nt\">-n<\/span> <span class=\"s1\">'another good file with newline in the end\\n'<\/span>\ngood file with newline <span class=\"k\">in <\/span>the end\nanother good file with newline <span class=\"k\">in <\/span>the end\n<\/code><\/pre><\/div><\/div>\n\n<p>A missing newline in the first file:<\/p>\n<div class=\"language-bash highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ <\/span><span class=\"nb\">echo<\/span> <span class=\"nt\">-n<\/span> <span class=\"s1\">'bad file without newline in the end'<\/span> <span class=\"o\">&amp;&amp;<\/span> <span class=\"nb\">echo<\/span> <span class=\"nt\">-n<\/span> <span class=\"s1\">'another good file with newline in the end\\n'<\/span>\nbad file without newline 
<span class=\"k\">in <\/span>the endanother good file with newline <span class=\"k\">in <\/span>the end\n<\/code><\/pre><\/div><\/div>\n<p>The problem is in the first file, but it is the second one that gets printed the wrong way.\nThere is no such misattribution issue with newline-first.<\/p>\n\n<p>If it is ok to end each file with <code class=\"language-plaintext highlighter-rouge\">\\n<\/code>, then it is ok to start it with <code class=\"language-plaintext highlighter-rouge\">\\n<\/code>.<\/p>\n\n<p>Having lines start with <code class=\"language-plaintext highlighter-rouge\">\\n<\/code> maintains the simplicity of unix utilities, and is a bit simpler to visualize in an editor.<\/p>\n\n<p>Imagine that in a parallel universe text and binary files differed in the very first character. What science fiction we could live in!<\/p>\n\n<p><strong>Do I really want to change all files to newline-first?<\/strong> \nOf course not.\nBut I have to point out that if, in the course of history, files had been newline-first from the start, that would have been a better system.<\/p>\n\n<p>I hypothesize that newline-last comes from unix mainframes:\nwhen a line in the shell is entered, it can be passed to a mainframe for processing.\nI can\u2019t confirm this, but it sounds plausible. \nIf so, time has shown that to be a wrong choice: \nall the messengers these days make a distinction between a new line (enter) and sending a message (shift+enter).\nJupyter knows that, IDEs know that, messengers know that. Terminals still don\u2019t know that.<\/p>\n\n<h2 id=\"using-indentation-to-structure-code\">Using indentation to structure code<\/h2>\n\n<p>Code indentation is available in all major languages, \nbut python (and scala 3, F#, nim, haskell, \u2026) relies on indentation to define logical structure.<\/p>\n\n<p>And that works very well. 
Let\u2019s see how we can re-interpret the python code the way we did with yaml<\/p>\n\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">class<\/span> <span class=\"nc\">MyClass<\/span><span class=\"p\">:<\/span>\n    <span class=\"k\">def<\/span> <span class=\"nf\">__init__<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">):<\/span>\n        <span class=\"k\">pass<\/span>\n    \n    <span class=\"k\">def<\/span> <span class=\"nf\">some_method<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">):<\/span>\n        <span class=\"k\">pass<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>now we reinterpret the structure with <code class=\"language-plaintext highlighter-rouge\">&lt;lvl1&gt;=\\n<\/code>,  <code class=\"language-plaintext highlighter-rouge\">&lt;lvl2&gt;=\\n____<\/code>,  <code class=\"language-plaintext highlighter-rouge\">&lt;lvl3&gt;=\\n________<\/code>.<\/p>\n\n<pre>\n<span class=\"lvl1\">1<\/span>class MyClass\n<span class=\"lvl2\">2<\/span>def __init__(self)\n<span class=\"lvl3\">3<\/span>pass\n<span class=\"lvl2\">2<\/span>\n<span class=\"lvl2\">2<\/span>def some_method(self):\n<span class=\"lvl3\">3<\/span>pass\n<\/pre>\n\n<p>so, we see very basic organization of code is available just by looking at sequence of start tokens (which simply mirrors indentation).<\/p>\n\n<h2 id=\"some-problems-with-multiline-strings\">Some problems with multiline strings<\/h2>\n\n<p>There are places where python allows code to \u2018escape\u2019 indentation:\ncontinuation of previous line (explicit with \\ or implicit with different brackets)\nand multiline strings.<\/p>\n\n<p>Continuations are \u2018solvable\u2019 with code formatting tools, but not multiline literals:<\/p>\n\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">if<\/span> <span 
class=\"bp\">True<\/span><span class=\"p\">:<\/span>\n    <span class=\"k\">print<\/span><span class=\"p\">(<\/span><span class=\"s\">\"\"\"\n    This is python's\n    multiline string\n    \"\"\"<\/span><span class=\"p\">)<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Output (###### just shows where the line ends):<\/p>\n\n<pre class=\"precode\">\n<cmnt>######<\/cmnt>\n    This is python's<cmnt>######<\/cmnt>\n    multiline string<cmnt>######<\/cmnt>\n    <cmnt>######<\/cmnt>    \n<\/pre>\n\n<p>To get proper output we need to break the visual alignment:<\/p>\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">if<\/span> <span class=\"bp\">True<\/span><span class=\"p\">:<\/span>\n    <span class=\"k\">print<\/span><span class=\"p\">(<\/span><span class=\"s\">\"\"\"This is python's\nmultiline string\n\"\"\"<\/span><span class=\"p\">)<\/span>\n    <span class=\"c1\"># takes effort to realize that the same block of code continues here\n<\/span>    <span class=\"k\">return<\/span> <span class=\"bp\">False<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>There are three problem spots with multiline strings: the first line, the last line and the indentation.\nMultilines in javascript\/go face all the same issues, so it is a generic problem.<\/p>\n\n<p>I think there is a way to solve this issue too, and it will be discussed below.<\/p>\n\n<h2 id=\"delimiter-first-pseudo-python\">Delimiter-first pseudo-python<\/h2>\n\n<p>To better demonstrate how all these ideas come together, I\u2019ll imagine a new language (pseudo-python).\nTo focus only on syntax changes, I\u2019ll keep all other aspects of the language the same.<\/p>\n\n<p>I will consider an artificially complicated example. 
It includes different kinds of arguments, a list, an empty list, a string, a multiline string, method chaining, multiline logical arithmetic, and calls with few or no arguments.<\/p>\n\n<p>The goal is to demonstrate that any wild mix is representable and does not produce a mess.<\/p>\n\n<div class=\"alex-boxes\">\n  <div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">prepare_message<\/span><span class=\"p\">(<\/span>\n    <span class=\"n\">title<\/span><span class=\"o\">=<\/span><span class=\"s\">\"Hey {}, ready for Christmas?\"<\/span><span class=\"p\">.<\/span><span class=\"nb\">format<\/span><span class=\"p\">(<\/span><span class=\"n\">user_name<\/span><span class=\"p\">),<\/span>\n    <span class=\"n\">email<\/span><span class=\"o\">=<\/span><span class=\"n\">email<\/span><span class=\"p\">,<\/span>\n    <span class=\"n\">body<\/span><span class=\"o\">=<\/span><span class=\"sa\">f<\/span><span class=\"s\">\"\"\"Reminder: please clean your chimneys!\n\nOh, and prepare \"Santa Landing Spot\" on your roof\n\nThank you <\/span><span class=\"si\">{<\/span><span class=\"n\">user_name<\/span><span class=\"si\">}<\/span><span class=\"s\"> for cooperation,<\/span><span class=\"se\">\\n<\/span><span class=\"s\">Santa Corp.\n\"\"\"<\/span><span class=\"p\">,<\/span>\n    <span class=\"n\">additional_sections<\/span><span class=\"o\">=<\/span><span class=\"p\">[<\/span>\n        <span class=\"n\">get_current_promotions<\/span><span class=\"p\">(<\/span><span class=\"n\">n_promotions<\/span><span class=\"o\">=<\/span><span class=\"mi\">4<\/span><span class=\"p\">),<\/span>\n        <span class=\"n\">get_recent_news<\/span><span class=\"p\">(),<\/span>\n    <span class=\"p\">],<\/span>\n    <span class=\"n\">unsubscribe_link<\/span><span class=\"o\">=<\/span><span class=\"n\">generate_unsubscribe_link<\/span><span class=\"p\">(<\/span>\n        <span class=\"n\">email<\/span><span class=\"p\">,<\/span> \n        <span 
class=\"n\">message<\/span><span class=\"o\">=<\/span><span class=\"n\">message<\/span><span class=\"p\">,<\/span>\n        <span class=\"o\">**<\/span><span class=\"n\">unsubscribe_settings<\/span><span class=\"p\">,<\/span>\n    <span class=\"p\">),<\/span>\n    <span class=\"n\">attachments<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[],<\/span>\n<span class=\"p\">).<\/span><span class=\"n\">schedule_for_submission<\/span><span class=\"p\">(<\/span>\n    <span class=\"n\">holidays_queue<\/span><span class=\"p\">,<\/span>\n    <span class=\"n\">important<\/span><span class=\"o\">=<\/span><span class=\"n\">user_is_santa<\/span> <span class=\"o\">|<\/span>  <span class=\"n\">user_is_deer<\/span> \\\n     <span class=\"o\">|<\/span> <span class=\"n\">user_previously_had_issues_with_christmas_delivery<\/span><span class=\"p\">,<\/span>\n<span class=\"p\">)<\/span>\n\n<\/code><\/pre><\/div>  <\/div>\n  <pre class=\"precode\">\nprepare_message<hngr>(<\/hngr>\n    <pnct>,<\/pnct> <kwrg>title=<\/kwrg><strn>\"Hey {}, ready for Christmas?\"<\/strn>.format(user_name)\n    <pnct>,<\/pnct> <kwrg>email=<\/kwrg>email\n    <pnct>,<\/pnct> <kwrg>body=<\/kwrg><hngr>f\"\"\"<\/hngr>\n        <strn>\"Reminder: please clean your chimneys!              
<\/strn>\n        <strn>\"                                                   <\/strn>\n        <strn>\"Oh, and prepare \"Santa Landing Spot\" on your roof  <\/strn>\n        <strn>\"                                                   <\/strn>\n        <strn>\"Thank you {<kwrg>user_name<\/kwrg>} for cooperation,\\nSanta Corp.<\/strn>\n    <pnct>,<\/pnct> additional_sections=<hngr>[<\/hngr>\n        <pnct>,<\/pnct> get_current_promotions(n_promotions=4)\n        <pnct>,<\/pnct> get_recent_news()\n    <hngr>]<\/hngr>\n    <pnct>,<\/pnct> unsubscribe_link=generate_unsubscribe_link<hngr>(<\/hngr>\n        <pnct>,<\/pnct> email\n        <pnct>,<\/pnct> message=message\n        <pnct>,<\/pnct> **unsubscribe_settings\n    <hngr>)<\/hngr>\n    <pnct>,<\/pnct> attachments = []\n<hngr>)<\/hngr>\n<pnct>\\<\/pnct>.schedule_for_submission<hngr>(<\/hngr>\n    <pnct>,<\/pnct> holidays_queue\n    <pnct>,<\/pnct> important=user_is_santa | user_is_deer \n      \\| user_previously_had_issues_with_christmas_delivery\n<hngr>)<\/hngr>\n<\/pre>\n<\/div>\n\n<p>I welcome you to study this example for a minute.\nStructure overall did not change much. 
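<\/p>

<p>As a toy sketch (my own illustration, not part of the proposal itself), the mechanical part of this transformation for a flat argument list looks like this:<\/p>

```python
# Hypothetical sketch: reformat a flat trailing-comma call into the
# delimiter-first layout used in the example above. Function name and
# the 4-space indent are my own choices for illustration.

def delimiter_first(name: str, args: list[str], indent: str = "    ") -> str:
    """Render `name(arg1, arg2, ...)` with leading commas, one arg per line."""
    lines = [f"{name}("]
    lines += [f"{indent}, {arg}" for arg in args]
    lines.append(")")
    return "\n".join(lines)

print(delimiter_first("prepare_message", ["email=email", "attachments=[]"]))
```

<p>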
Note the differences in line breaks (<code class=\"language-plaintext highlighter-rouge\">\\<\/code>) and multiline strings.<\/p>\n\n<p>An important distinction:\nleading commas play the same role as hyphens in yaml: they define structure, so their position is not arbitrary.<\/p>\n\n<div class=\"alex-boxes\">\n  <div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\"># normal python\n# this is legal code            \n<\/span><span class=\"k\">print<\/span><span class=\"p\">(<\/span>\n    <span class=\"mi\">1<\/span><span class=\"p\">,<\/span> \n        <span class=\"mi\">2<\/span><span class=\"p\">,<\/span>\n<span class=\"p\">)<\/span>\n<\/code><\/pre><\/div>  <\/div>\n  <div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\"># proposed\n# this is incorrect code        \n<\/span><span class=\"k\">print<\/span><span class=\"p\">(<\/span>\n    <span class=\"p\">,<\/span> <span class=\"mi\">1<\/span>\n        <span class=\"p\">,<\/span> <span class=\"mi\">2<\/span>\n<span class=\"p\">)<\/span>\n<\/code><\/pre><\/div>  <\/div>\n<\/div>\n\n<p>In the new code there is no need for closing brackets (see that yourself by staring at the code more!). 
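<\/p>

<p>Since leading commas are structural, a checker for a single flat collection can be sketched as follows (a hypothetical illustration; the function and the exact rule are mine, and nested collections would need per-level tracking):<\/p>

```python
# Hypothetical sketch: in the proposed syntax leading commas define
# structure, so all items of one flat collection must start in the
# same column; a misaligned comma is a syntax error, not a style issue.

def leading_commas_aligned(lines: list[str]) -> bool:
    """Check that every leading-comma line uses the same indentation."""
    columns = set()
    for line in lines:
        stripped = line.lstrip(" ")
        if stripped.startswith(","):
            columns.add(len(line) - len(stripped))
    return len(columns) <= 1

print(leading_commas_aligned(["print(", "    , 1", "    , 2"]))      # aligned items
print(leading_commas_aligned(["print(", "    , 1", "        , 2"]))  # misaligned items
```

<p>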
<br \/>\nSo let\u2019s remove closing elements:<\/p>\n\n<div class=\"alex-boxes\">\n  <div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">prepare_message<\/span><span class=\"p\">(<\/span>\n    <span class=\"n\">title<\/span><span class=\"o\">=<\/span><span class=\"s\">\"Hey {}, ready for Christmas?\"<\/span><span class=\"p\">.<\/span><span class=\"nb\">format<\/span><span class=\"p\">(<\/span><span class=\"n\">user_name<\/span><span class=\"p\">),<\/span>\n    <span class=\"n\">email<\/span><span class=\"o\">=<\/span><span class=\"n\">email<\/span><span class=\"p\">,<\/span>\n    <span class=\"n\">body<\/span><span class=\"o\">=<\/span><span class=\"sa\">f<\/span><span class=\"s\">\"\"\"Reminder: please clean your chimneys!\n\nOh, and prepare \"Santa Landing Spot\" on your roof\n\nThank you <\/span><span class=\"si\">{<\/span><span class=\"n\">user_name<\/span><span class=\"si\">}<\/span><span class=\"s\"> for cooperation,<\/span><span class=\"se\">\\n<\/span><span class=\"s\">Santa Corp.\n\"\"\"<\/span><span class=\"p\">,<\/span>\n    <span class=\"n\">additional_sections<\/span><span class=\"o\">=<\/span><span class=\"p\">[<\/span>\n        <span class=\"n\">get_current_promotions<\/span><span class=\"p\">(<\/span><span class=\"n\">n_promotions<\/span><span class=\"o\">=<\/span><span class=\"mi\">4<\/span><span class=\"p\">),<\/span>\n        <span class=\"n\">get_recent_news<\/span><span class=\"p\">(),<\/span>\n    <span class=\"p\">],<\/span>\n    <span class=\"n\">unsubscribe_link<\/span><span class=\"o\">=<\/span><span class=\"n\">generate_unsubscribe_link<\/span><span class=\"p\">(<\/span>\n        <span class=\"n\">email<\/span><span class=\"p\">,<\/span> \n        <span class=\"n\">message<\/span><span class=\"o\">=<\/span><span class=\"n\">message<\/span><span class=\"p\">,<\/span>\n        <span class=\"o\">**<\/span><span class=\"n\">unsubscribe_settings<\/span><span 
class=\"p\">,<\/span>\n    <span class=\"p\">),<\/span>\n    <span class=\"n\">attachments<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[],<\/span>\n<span class=\"p\">).<\/span><span class=\"n\">schedule_for_submission<\/span><span class=\"p\">(<\/span>\n    <span class=\"n\">holidays_queue<\/span><span class=\"p\">,<\/span>\n    <span class=\"n\">important<\/span><span class=\"o\">=<\/span><span class=\"n\">user_is_santa<\/span> <span class=\"o\">|<\/span>  <span class=\"n\">user_is_deer<\/span> \\\n     <span class=\"o\">|<\/span> <span class=\"n\">user_previously_had_issues_with_christmas_delivery<\/span><span class=\"p\">,<\/span>\n<span class=\"p\">)<\/span>\n<\/code><\/pre><\/div>  <\/div>\n  <pre class=\"precode\">\nprepare_message<hngr>(<\/hngr>\n    <pnct>,<\/pnct> <kwrg>title=<\/kwrg><strn>\"Hey {}, ready for Christmas?\"<\/strn>.format(user_name)\n    <pnct>,<\/pnct> <kwrg>email=<\/kwrg>email\n    <pnct>,<\/pnct> <kwrg>body=<\/kwrg><hngr>f\"\"\"<\/hngr>\n        <strn>\"Reminder: please clean your chimneys!                <\/strn>\n        <strn>\"                                                     <\/strn>\n        <strn>\"Oh, and prepare \"Santa Landing Spot\" on your roof    <\/strn>\n        <strn>\"                                                     <\/strn>\n        <strn>\"Thank you {<kwrg>user_name<\/kwrg>} for cooperation,\\nSanta Corp.  
<\/strn>\n    <pnct>,<\/pnct> additional_sections=<hngr>[<\/hngr>\n        <pnct>,<\/pnct> get_current_promotions(n_promotions=4)\n        <pnct>,<\/pnct> get_recent_news()\n    <pnct>,<\/pnct> unsubscribe_link=generate_unsubscribe_link<hngr>(<\/hngr>\n        <pnct>,<\/pnct> email\n        <pnct>,<\/pnct> message=message\n        <pnct>,<\/pnct> **unsubscribe_settings\n    <pnct>,<\/pnct> attachments = []\n<pnct>\\<\/pnct>.schedule_for_submission<hngr>(<\/hngr>\n    <pnct>,<\/pnct> holidays_queue\n    <pnct>,<\/pnct> important=user_is_santa | user_is_deer \n      \\| user_previously_had_issues_with_christmas_delivery\n<\/pre>\n<\/div>\n\n<p>Don\u2019t pay much attention to the number of lines - denser code is a byproduct, not a goal.<\/p>\n\n<p>Next I\u2019ll discuss several advantages of this syntax.<\/p>\n\n<h2 id=\"new-multiline-strings\">New multiline strings<\/h2>\n\n<pre class=\"precode\" style=\"overflow-x: scroll;\">\nprint<hngr>(f\"\"\"<\/hngr>\n    <strn>\"This is new<\/strn>\n    <strn>\"multiline string<\/strn>\n<\/pre>\n<p>output:<\/p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>This is new\nmultiline string\n<\/code><\/pre><\/div><\/div>\n\n<p>Everything looks perfect, multiple issues are solved in one shot. But \u2026 there is a minor catch: this is how the output looks in raw form:\n<code><cmnt>\\n<\/cmnt>This is new<cmnt>\\n<\/cmnt>multiline string<\/code> \n(i.e. it is newline-first).\nTechnically, one can produce newline-last outputs, but that\u2019s artificial.\nSee the elegance of the match between the delimiter-first and newline-first approaches: the delimiter just gets replaced with a newline. 
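<\/p>

<p>The \u2018delimiter becomes newline\u2019 rule can be sketched in a few lines (a hypothetical illustration of the semantics just described, not a reference implementation):<\/p>

```python
# Hypothetical sketch of the proposed multiline-string semantics:
# each leading `"` delimiter is replaced by a newline, which naturally
# yields a newline-first raw string.

def new_multiline(parts: list[str]) -> str:
    """Join string parts, turning each leading delimiter into '\n'."""
    return "".join("\n" + part for part in parts)

s = new_multiline(["This is new", "multiline string"])
print(repr(s))  # the raw, newline-first form of the string
```

<p>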
That\u2019s an operation one can visually imagine by shifting all lines to the left.<\/p>\n\n<p>One more example:<\/p>\n<pre class=\"precode\">\nprint<hngr>(f\"\"\"<\/hngr>\n    <strn>\"you can place anything here: ' '' ''' \" \"\" \"\"\" f\"\"\" etc etc.<\/strn>\n    <cmnt># and you can put comments in the middle of multiline<\/cmnt>\n    <strn>\"multiline string can't be broken or terminated by any sequence within a line <\/strn>\n<\/pre>\n\n<p>Now, python literals do not work like that.<\/p>\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"s\">'''\n\"\"\" and '''<\/span> <span class=\"n\">should<\/span> <span class=\"n\">be<\/span> <span class=\"n\">escaped<\/span> <span class=\"p\">(<\/span><span class=\"n\">otherwise<\/span> <span class=\"n\">interpreted<\/span> <span class=\"k\">as<\/span> <span class=\"n\">literal<\/span> <span class=\"n\">terminator<\/span><span class=\"p\">)<\/span>\n<span class=\"s\">'''\n\n\n'''''<\/span>\n<span class=\"s\">'''  # this trick (available in markdown) does not work in python\n'''''<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h2 id=\"new-parsing\">New parsing<\/h2>\n\n<p>In contrast to normal python, a line alone does not tell whether the instruction is complete or should be continued on the next line. 
\nParsing one more line is required to confirm that the current code section is complete \n(more precisely, only a prefix of the next line needs to be parsed).<\/p>\n\n<p>In this approach, top-level parsing is quite ignorant of language details, and it relies on the same visual cues as we humans do: the parser does not need to analyze a line in detail to figure out whether the instruction continues or not.<\/p>\n\n<p>Let me \u2018parse\u2019 this example:<\/p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>Delimiter   Token class    Rest of line\n            <span class=\"nt\">&lt;lvl1-instr<\/span>   <span class=\"nt\">&gt;<\/span>prepare_message(\n    ,       <span class=\"nt\">&lt;lvl2-item<\/span>    <span class=\"nt\">&gt;<\/span>title=\"Hey {}, ready for Christmas?\".format(user_name)\n    ,       <span class=\"nt\">&lt;lvl2-item<\/span>    <span class=\"nt\">&gt;<\/span>email=email\n    ,       <span class=\"nt\">&lt;lvl2-item<\/span>    <span class=\"nt\">&gt;<\/span>body= f\"\"\"\n        \"   <span class=\"nt\">&lt;lvl3-literal<\/span> <span class=\"nt\">&gt;<\/span>Reminder: please clean your chimneys!\n        \"   <span class=\"nt\">&lt;lvl3-literal<\/span> <span class=\"nt\">&gt;<\/span>\n        \"   <span class=\"nt\">&lt;lvl3-literal<\/span> <span class=\"nt\">&gt;<\/span>Oh, and prepare \"Santa Landing Spot\" on your roof\n        \"   <span class=\"nt\">&lt;lvl3-literal<\/span> <span class=\"nt\">&gt;<\/span>\n        \"   <span class=\"nt\">&lt;lvl3-literal<\/span> <span class=\"nt\">&gt;<\/span>Thank you {user_name} for cooperation,\\nSanta Corp.\n    ,       <span class=\"nt\">&lt;lvl2-item<\/span>    <span class=\"nt\">&gt;<\/span>additional_sections=[\n        ,   <span class=\"nt\">&lt;lvl3-item<\/span>    <span class=\"nt\">&gt;<\/span>get_current_promotions(n_promotions=4)\n        ,   <span class=\"nt\">&lt;lvl3-item<\/span>    <span class=\"nt\">&gt;<\/span>get_recent_news()\n    ,       <span 
class=\"nt\">&lt;lvl2-item<\/span>    <span class=\"nt\">&gt;<\/span>unsubscribe_link=generate_unsubscribe_link(\n        ,   <span class=\"nt\">&lt;lvl3-item<\/span>    <span class=\"nt\">&gt;<\/span>email\n        ,   <span class=\"nt\">&lt;lvl3-item<\/span>    <span class=\"nt\">&gt;<\/span>message=message\n        ,   <span class=\"nt\">&lt;lvl3-item<\/span>    <span class=\"nt\">&gt;<\/span>**unsubscribe_settings\n    ,       <span class=\"nt\">&lt;lvl2-item<\/span>    <span class=\"nt\">&gt;<\/span>attachments = []\n\\           <span class=\"nt\">&lt;lvl1-continue&gt;<\/span>.schedule_for_submission(\n    ,       <span class=\"nt\">&lt;lvl2-item<\/span>    <span class=\"nt\">&gt;<\/span>holidays_queue\n    ,       <span class=\"nt\">&lt;lvl2-item<\/span>    <span class=\"nt\">&gt;<\/span>important=user_is_santa | user_is_deer \n      \\|    <span class=\"nt\">&lt;lvl2-continue&gt;<\/span>| user_previously_had_issues_with_christmas_delivery\n<\/code><\/pre><\/div><\/div>\n\n<p>By looking only at the sequence of delimiters (there are several subtypes of them), \none can deduct limits of every code block \/ call \/ literal, i.e. 
derive the top-level structure of the program.\nThe parser now deals with the simpler task of checking that elements fit this pre-defined structure,\nand can point to places where \u2018structure\u2019 does not match \u2018content\u2019.<\/p>\n\n<p>Goodbye, old times when one deleted bracket caused a complete rebuild of the AST and numerous errors.<\/p>\n\n<h2 id=\"new-code-suggestions\">New code suggestions<\/h2>\n\n<p><em>This paragraph was added later, to unwrap the point that was missed by many readers.<\/em><\/p>\n\n<p>Parsing of correct code has not been a problem since the 1960s or so.\nThe real challenge is on-the-fly parsing of partially incorrect and quickly-changing code in the process of editing.<\/p>\n\n<p>Say I\u2019m a complete novice and typed something wrong:<\/p>\n\n<div class=\"alex-boxes\">\n<pre class=\"precode\">\ndef myfunction(\n    var1 = 'some default value',\n    var2 = (1, (2, 3),\n)\n    var3 = \"variable number 3\"\n\n    var4 = \"\"\"\nSimple unfinished multiline string\n\"\"\" + \\\nvar<caret><\/caret>\n\n    var5 = ())\n<\/pre>\n<\/div>\n\n<p>what should be autosuggested? var1\/2\/3\/4? or nothing? Which would be more helpful?<\/p>\n\n<p>How to inform the user which places should be fixed?\nVS Code blames the bracket on the first line, saying it is not closed (while it is closed!), \nand the last line for a missing colon (no, I don\u2019t want a colon there).\nPycharm\u2019s diagnostic messages are slightly better, but it blames the line with var3 (which is completely ok).<\/p>\n\n<p>Now, in pseudo-python there is no way to \u2018escape\u2019 indentation, and thus code analysis can rely on indentation.\nAnd it is immediately deducible that the lines with var2 and var5 have problems, and the indent of var3 is incorrect (since the colon is missing on the previous line).<\/p>\n\n<p>Autosuggestion even in code with multiple unfinished places would still be useful (in a similar scenario in pseudo-python it can still suggest var3\/var4, and depending on tolerance additionally var1\/var2). 
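<\/p>

<p>The indentation-plus-delimiter analysis described above can be sketched as a toy classifier (my illustration, assuming a 4-space indent unit; it mirrors the lvlN token classes from the parse table, and is not a real implementation):<\/p>

```python
# Toy sketch: classify a line of delimiter-first pseudo-python using only
# indentation and the leading delimiter, without analyzing the rest of the
# line. Assumes a 4-space indent unit (an assumption for this illustration).
def classify(line, indent_unit=4):
    stripped = line.lstrip()
    level = (len(line) - len(stripped)) // indent_unit + 1
    if stripped.startswith(","):
        kind = "item"       # next element of a call / collection
    elif stripped.startswith('"'):
        kind = "literal"    # next line of a multiline string
    elif stripped.startswith("\\"):
        kind = "continue"   # continuation of the previous instruction
    else:
        kind = "instr"      # a new instruction starts here
    return f"lvl{level}-{kind}"

assert classify("prepare_message(") == "lvl1-instr"
assert classify("    , email=email") == "lvl2-item"
assert classify('        "Reminder: please clean your chimneys!') == "lvl3-literal"
```

<p>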
Currently tools don\u2019t suggest anything.<\/p>\n\n<p>As I mentioned, the AST undergoes only small changes during editing, thus providing highly efficient autosuggestion, code analysis, and highlighting for such a language would be simpler, much simpler.<\/p>\n\n<h2 id=\"new-editing\">New editing<\/h2>\n\n<div class=\"alex-boxes\">\n<div>\n    <p>Normal python. <br \/>\nsuppose you want to start a list of arguments<\/p>\n    <pre class=\"precode\">\nprint(<caret><\/caret>)\n<\/pre>\n    <p>after you hit enter in IDE:<\/p>\n    <pre class=\"precode\">\nprint(\n    <caret><\/caret>\n)\n<\/pre>\n    <p>then you type argument and comma. <br \/>\nReady to proceed<\/p>\n    <pre class=\"precode\">\nprint(\n    42,\n    <caret><\/caret>\n)\n<\/pre>\n    <p>Done? Arrow down + enter<\/p>\n    <pre class=\"precode\">\nprint(\n    42,\n    43,\n)\n<caret><\/caret>\n<\/pre>\n    <p>Forgot something? <br \/>\nDouble arrow up, <br \/>\nmove cursor to end of line,<br \/>\nenter<\/p>\n    <pre class=\"precode\">\nprint(\n    42,\n    43,\n    <caret><\/caret>\n)\n<\/pre>\n  <\/div>\n&nbsp;\n<div>\n    <p>Delimiter-first pseudo-python. <br \/>\nsuppose you want to start a list of arguments<\/p>\n    <pre class=\"precode\">\nprint(<caret><\/caret>)\n<\/pre>\n    <p>after you hit enter in IDE comma is auto-added:<\/p>\n    <pre class=\"precode\">\nprint(\n    , <caret><\/caret>\n\n<\/pre>\n    <p>you type only argument. <br \/>\nReady to proceed<\/p>\n    <pre class=\"precode\">\nprint(\n    , 42\n    , <caret><\/caret>\n<\/pre>\n    <p>Done? 
Enter + shift-tab<\/p>\n    <pre class=\"precode\">\nprint(\n    , 42\n    , 43 \n<caret><\/caret>\n<\/pre>\n    <p>Forgot something?\nTab<\/p>\n    <pre class=\"precode\">\nprint(\n    , 42\n    , 43 \n    , <caret><\/caret>\n<\/pre>\n  <\/div>\n<\/div>\n\n<p>The process of editing such structures was polished with hierarchical lists in Word and other text processors.<\/p>\n\n<p>Below is an animated example from workflowy (taken from <a href=\"https:\/\/www.process.st\/take-better-notes\/\">post<\/a> by B. Brandall):\n<img src=\"https:\/\/www.process.st\/wp-content\/uploads\/2016\/01\/ezgif.com-crop-1.gif\" \/><\/p>\n\n<p>Even minimalist note-taking apps these days recognize the importance of hierarchical organization. \nTheir interfaces focus on effectively traversing and modifying this structure.<\/p>\n\n<p>But with code - these extremely structured and standardized pieces of linked information - we continue the game of imitation: \u2018hey, that\u2019s just text files, you can use notepad here!\u2019.<\/p>\n\n<h2 id=\"new-versioning\">New versioning<\/h2>\n\n<p>Missing trailing commas make diffs a bit annoying, because an additional line gets included.<\/p>\n\n<p>The new syntax has this solved. 
In other aspects, versioning should work the same.<\/p>\n\n<h2 id=\"new-formatting\">New formatting<\/h2>\n\n<p>The goal of formatting is to produce a visual code structure that is easy to read,\nas if you already see all main components without reading anything.<\/p>\n\n<p>The new syntax enforces this and leaves fewer degrees of freedom.\nWriting something non-readable would be challenging\u2026 I suppose.<\/p>\n\n<p>The role of formatters would thus be minor, or they could be skipped altogether.<\/p>\n\n<h2 id=\"limitations\">Limitations<\/h2>\n\n<p>First, I did not try to solve the following perceptual problems:<\/p>\n\n<ul>\n  <li>commas are leading, and I\u2019ve mentioned that this was a problem for comma-first formatting<\/li>\n  <li>open brackets without a matching pair create visual discomfort. Also my eyes are already trained to focus on closing brackets, but a proper color scheme seems to solve this<\/li>\n<\/ul>\n\n<p>This post is already long, and keeping things closer to python simplifies the examples.\nI think both points can be improved, so feel free to post your ideas on this.<\/p>\n\n<p>Second, I intentionally focused only on improving multi-line constructs, but single-line collections were left untouched. That does not mean delimiter-first does not work there, but the scale of necessary changes is just too high to justify the gains. At least for now.<\/p>\n\n<h2 id=\"if-you-made-it-this-far\">If you made it this far<\/h2>\n\n<p>Wow, thank you!<\/p>\n\n<p>I hope the adventure was interesting and slightly mind-blowing.<\/p>\n\n<p>Don\u2019t be too surprised if this proposal evokes a \u201chey this looks wrong, just plain wrong\u201d reaction. 
<br \/>\nAfter all, ideas we enjoy these days: enumeration from zero, using registers in names, structured programming, \nand even python\u2019s approach to defining code blocks with indentation \u2014 \nevery single one of them was met with a storm of criticism.<\/p>\n\n<div style=\"text-align: center; font-size: 40px; padding: 110px\">\ud83d\udc4b<\/div>\n\n<h3 id=\"comments-\">Comments \ud83d\udcac<\/h3>\n\n<ul>\n  <li>\n    <p>I received and collected a number of links on using delimiter-first in different contexts (lisp\/scheme, formulas, translatable languages), and will organize that material when I get time.<\/p>\n  <\/li>\n  <li>\n    <p>Isaac Z. Schlueter advised there is a term \u2018initiator\u2019, used in <em>\u201c\u2026 specification discussion threads, where it\u2019s common to dig deep into the particulars of parsing semantics.  Very much a \u2018deep in the weeds\u2019 kind of technical term.\u201d<\/em> \n  <br \/><br \/>\n  In the context of parsing I found the word \u2018initiator\u2019 in several papers, and only one mention on stackoverflow, so I\u2019ll stick to using the word \u2018delimiter\u2019.<\/p>\n  <\/li>\n  <li>\n    <p>Other options mentioned in discussions: introducer, starter<\/p>\n  <\/li>\n  <li>\n    <p>Peter Hilton noticed that <em>\u201c\u2026 startinators in prose usually called bullets. Some English-language style guides even treat the following punctuation as equivalent.<\/em><\/p>\n\n    <p>Brilliantly Wrong \u2014 Alex Rogozhnikov\u2019s blog about math, machine learning, programming, physics and biology.*<\/p>\n\n    <p>Brilliantly Wrong \u2014 Alex Rogozhnikov\u2019s blog about:<\/p>\n    <ul>\n      <li>math<\/li>\n      <li>machine learning<\/li>\n      <li>programming<\/li>\n      <li>physics<\/li>\n      <li>biology.<\/li>\n    <\/ul>\n\n    <p><em>Note the bullet list\u2019s trailing full stop (period). 
It\u2019s still one punctuated sentence.\u201d<\/em><\/p>\n\n    <p>Indeed, the name \u2018bullet\u2019 sounds very appropriate when discussing code written in delimiter-first style.\n  From the parsing side, I don\u2019t feel it\u2019s a good partner to the word \u2018terminator\u2019.\n  <br \/><br \/><\/p>\n  <\/li>\n  <li>\n    <p>Thanks to Alexander Molchanov for proofreading, improving the text, and leaving comments.<\/p>\n  <\/li>\n  <li>\n    <p>Question: \u201cWho did you write this for?\u201d<\/p>\n\n    <p>I believe this is a better way to structure code (for readability, editing, and better language tools).\nBased on what I\u2019ve learnt so far, I am sceptical about integrating additional syntax into existing languages:\ntwo notations side-by-side are worse for users than one. \nFrom the perspective of language maintainers, all tooling would need to deal with two dialects, which is also a downgrade.<\/p>\n\n    <p>So the main audience is <em>authors of new programming languages.<\/em> \nHowever, it is not only authors - to get adopted, any new feature needs at least minimal support from the community. 
That\u2019s where this page can help.\nSo more generally, I target people <em>interested in experimenting with new programming languages<\/em> and interested in challenging the status quo.<\/p>\n  <\/li>\n  <li>\n    <p>Question: \u201cBut how will you represent a couple of multiline lists next to each other?\u201d<\/p>\n\n    <p>This case is handled normally:<\/p>\n    <div class=\"alex-boxes\">\n      <p><\/p>\n      <div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>  f([                \n      a,\n      b,\n  ], [\n      c,\n      d,\n  ])\n<\/code><\/pre><\/div>      <\/div>\n      <div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>  f([                \n      , a\n      , b\n  \\,[\n      , c\n      , d\n\n<\/code><\/pre><\/div>      <\/div>\n      <p><\/p>\n    <\/div>\n\n    <p>For the record, I\u2019d prefer to introduce variables in any case.<\/p>\n  <\/li>\n  <li>\n    <p>Question: \u201cDon\u2019t you think that current tools have already solved the issues solved by delimiter-first?\u201d<\/p>\n\n    <p>I wrote a simple 4-line snippet with a missed comma that is completely fine for flake8 and ruff. 
And the black formatter considers it well-formatted.\nIt took me less than a minute to develop this example, and if you start thinking, I\u2019m sure you\u2019ll find a handful of similar cases.\nAuthors of one utility that is supposed to mark these cases <a href=\"https:\/\/blog.devgenius.io\/5-of-666-python-repos-had-comma-typos-including-tensorflow-and-pytorch-sentry-and-v8-7bc3ad9a1bb7\">claim<\/a> that \u20185% of 666 Python repos had comma typos (including Tensorflow, and PyTorch, Sentry, and V8)\u2019.<\/p>\n\n    <p>We can continue patching problems with even more tools and more special cases, but I\u2019d rather have it solved by design.\nThe core point is - <em>delimiter-last is flawed<\/em>.\nThe main visual cue (indentation) is on the left, while there are still control sequences that can override indentation, and they are on the right. \nFor this reason <code class=\"language-plaintext highlighter-rouge\">\\<\/code> at the end of a line is a bad choice.<\/p>\n  <\/li>\n<\/ul>\n\n<!---\nTODO mention differences in code suggestions\n\nTODO\njtree allows conversion between syntaxes\nhttps:\/\/jtree.treenotation.org\/designer\/#hakon-readme\n\nlisp version of syntax\n    https:\/\/gist.github.com\/armstnp\/bb2a88bcb053d2195f42c60a0cf15a65\nlisp proposals, more (Via Nikishkin) \n    https:\/\/srfi.schemers.org\/srfi-49\/\n    https:\/\/srfi.schemers.org\/srfi-110\/\none more version of lisp:\n    http:\/\/calcit-lang.org\/\n\nelm https:\/\/elm-lang.org\/docs\/style-guide\nocaml https:\/\/github.com\/ocaml-ppx\/ocamlformat\/blob\/main\/test\/failing\/gen\/gen.ml\n\nRuby has no-delimiter lists (not so interesting)\n\nCoffeescript and Civet\nhttps:\/\/github.com\/DanielXMoore\/Civet \"Coffeescript for typescript\"\n\nhttp:\/\/www.rebol.com\/pre-view.html\n\nleslie lamport and formulas https:\/\/www.hpl.hp.com\/techreports\/Compaq-DEC\/SRC-RR-119.pdf\n-->\n","pubDate":"Tue, 29 Nov 2022 01:00:00 
+0000","link":"https:\/\/arogozhnikov.github.io\/2022\/11\/29\/delimiter-comes-first.html","guid":"https:\/\/arogozhnikov.github.io\/2022\/11\/29\/delimiter-comes-first.html","category":["delimiter","separator"]},{"title":"Things I wish someone told me about microscopy","description":"<h2 id=\"if-you-want-to-learn-some-culprits-of-microscopy\">If you want to learn some culprits of microscopy<\/h2>\n\n<p>\u2026 you\u2019d better watch this video by microbehunter,\nbecause the rest of the post is an ML person\u2019s view of things \nyou should (not) expect from lab microscopy during experiment design.<\/p>\n\n<iframe width=\"560\" height=\"315\" src=\"https:\/\/www.youtube.com\/embed\/Ir9TGt6zljI\" frameborder=\"0\" allow=\"clipboard-write; encrypted-media; picture-in-picture\" allowfullscreen=\"\">\n<\/iframe>\n\n<p><strong>Warning:<\/strong><br \/>\nThis post contains reflections and is not meant to be easy reading.<br \/>\nThis post assumes that you understand wave mechanics.<\/p>\n\n<p>I have a nice general background in physics,\nhowever just that was clearly insufficient \u2014 there is a lot of specific knowledge that is hard to deduce from first principles.<\/p>\n\n<h2 id=\"general-remarks\">General remarks<\/h2>\n\n<ul>\n  <li>there are myriads of different microscopes, from trivial ones for middle schools to EM (electron microscopes) and light-sheets\n    <ul>\n      <li>Prices range from hundreds of dollars to millions. In some applications a 100x cheaper microscope can still be more useful<\/li>\n      <li>Manual and automated. 
Terribly expensive ones may still be non-automated<\/li>\n    <\/ul>\n  <\/li>\n  <li>microscopes are typically designed to be modular, many parts are interchangeable;\nthere is still vendor- and format- specificity<\/li>\n  <li>when a microscope is automated, that typically means that it can at least move its specimen\n(yes, the specimen is moved; the microscope\u2019s camera and light path are usually steady)\n    <ul>\n      <li>it may or may not be able to switch excitation \/ emission filters automatically, so \u2018automated\u2019 is not a descriptive word. \nAsk about what exactly is automated<\/li>\n    <\/ul>\n  <\/li>\n  <li>while typically microscopes are just \u2018make a photo with light\u2019 devices, software for microscopes is a tough topic.\n    <ul>\n      <li>manufacturers desire to provide a visual interface with windows and buttons, \nand mapping all countless scenarios to a sequence of buttons is \u2026 challenging<\/li>\n      <li>as a result both API and interface are far from satisfactory<\/li>\n    <\/ul>\n  <\/li>\n  <li>the light source is not moved with the specimen, but instead aligned and fixed relative to the camera.\n    <ul>\n      <li>You can\u2019t image with different shifts but \u2018same light position\u2019<\/li>\n    <\/ul>\n  <\/li>\n  <li>immersion is quite critical when going to higher resolutions (above 20x)<\/li>\n  <li>the objective on a microscope has everything aligned, and focusing depth can\u2019t simply be adjusted or changed.\n(objectives are also pretty expensive). That\u2019s not your smartphone\u2019s refocusing camera. \nSo 40x on your microscope means that an object of size n\u00d7m in the focusing plane (which is fixed) \nliterally projects to 40n\u00d740m on the detector plane. 
\nTo complete the arithmetic you only need the physical size of a pixel in the camera - and voila - you have the \u2018size of a specimen pixel\u2019.<\/li>\n  <li>for a long time I was surprised that biologists are so limited by the number of fluorescent channels\nthey can image simultaneously (emission spectra overlap, so you want them to be separable).\n    <ul>\n      <li>At the same time they don\u2019t switch to quantum dots (which have much narrower emission spectra).\nPermeability may be an issue here<\/li>\n      <li>And they don\u2019t try to go significantly outside of the visible spectrum.\n        <ul>\n          <li><em>probably<\/em> this is due to objectives - correcting aberrations for a wide spectrum range is tough<\/li>\n        <\/ul>\n      <\/li>\n      <li>Another factor is penetration depth variability (even within water) for different wavelengths<\/li>\n      <li>You can take images in IR, but going to deep IR is ultra-rare<\/li>\n    <\/ul>\n  <\/li>\n  <li>there is an uncountable number of imaging techniques. <br \/>\nDozens of them with all their variations, each covering only some part of the information.\n    <ul>\n      <li>Very hard to combine many in the same system (while some useful combinations exist)<\/li>\n      <li>The dream of a machine learner - having different imaging systems for the same specimen - can be implemented only in specific cases<\/li>\n    <\/ul>\n  <\/li>\n  <li>a more powerful microscope requires matching efforts on the sample\/environment side\n    <ul>\n      <li>Higher magnification requires better compensation of motion<\/li>\n      <li>More sensitivity to optical properties means you\u2019ll see more artifacts from anything in your system, including plates or slides.\n        <ul>\n          <li>E.g. 
if a method can detect birefringence, any plastic labware is likely to add some birefringence patterns<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n  <li>well edges introduce significant effects, and plate edges also introduce some effects for imaging (both also affect biological processes)<\/li>\n  <li><a href=\"https:\/\/www.youtube.com\/user\/iBioEducation\">ibiology<\/a> provides an amazing combination of theory and practice of imaging.\nIt was incredibly helpful<\/li>\n  <li>imaging protocols are hardly readable. Too many things and parameters, no deduplication.\n    <ul>\n      <li>They resemble completely unwrapped low-level code for execution by a machine, not \u2018settings\u2019.<\/li>\n      <li>I\u2019ve mentioned that software is a tough topic here, right? There are issues with interfaces at all levels<\/li>\n    <\/ul>\n  <\/li>\n  <li>imaging time is a real issue\n    <ul>\n      <li>\u201coh, we can just increase stack size\u201d is the correct solution to many questions in theory, \n but not in practice<\/li>\n    <\/ul>\n  <\/li>\n  <li>reproducible focusing may be an issue<\/li>\n  <li>the richest sources of information are available only for ex-vivo cells and tissues<\/li>\n  <li>anything that produces nice high-resolution images will be called \u201cconfocal\u201d by biologists, \nno matter if confocality is actually used there :)<\/li>\n  <li>believe data, always believe data. \nIf you think something is misaligned - it almost surely is.<\/li>\n<\/ul>\n\n<h2 id=\"contrasting-methods\">Contrasting methods<\/h2>\n\n<iframe width=\"560\" height=\"315\" src=\"https:\/\/www.youtube.com\/embed\/FUa1GTc69y4\" frameborder=\"0\" allow=\"autoplay; clipboard-write; encrypted-media; picture-in-picture\" allowfullscreen=\"\"><\/iframe>\n\n<p>The main way to achieve contrast is to use monochromatic (i.e. laser) light \nand achieve a shift in phase between \u201crays\u201d that started from the same source. 
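<\/p>

<p>The phase-to-brightness relation can be sketched with textbook two-beam interference (a generic illustration of the principle, not tied to any particular instrument):<\/p>

```python
import math

# Two unit-amplitude waves from the same source are recombined after the
# specimen delays one of them by a phase shift dphi (textbook two-beam
# interference; a generic illustration, not a model of a specific scope).
def detected_intensity(dphi):
    # |1 + exp(i * dphi)|^2 = 2 * (1 + cos(dphi))
    return 2.0 * (1.0 + math.cos(dphi))

# In-phase waves add up (intensity 4), a half-wavelength delay cancels them
# (intensity 0) - so phase shifts become visible brightness variations.
```

<p>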
\nThe shift in phase introduced by the specimen provides contrast visible to a simple detector.<\/p>\n\n<ul>\n  <li>The simplest example is <a href=\"https:\/\/www.olympus-lifescience.com\/en\/microscope-resource\/primer\/techniques\/dic\/dicconfiguration\/\">DIC<\/a> \n(differential interference contrast) - light is split in two parts, \nwhich come through neighboring positions in the slide<\/li>\n  <li>Another example is polarization contrast, where light comes through the same specimen, but \ndue to <a href=\"https:\/\/en.wikipedia.org\/wiki\/Birefringence\">birefringence<\/a> of some materials different polarizations travel with different speed, \nwhich produces retardation of one polarization<\/li>\n  <li><a href=\"https:\/\/www.microscopyu.com\/tutorials\/comparison-of-phase-contrast-and-dic-microscopy\">Phase contrast<\/a> \norganizes interference between scattered and passed-through waves.\nPhase delay adds phase to scattered light. Simplest to set up of these three.<\/li>\n<\/ul>\n\n<p>An important property of contrasting optical paths is that optical path lengths \nfor light arriving at the same location should be identical (unless sample perturbations prevent this).\nOptical path is not distance, but the time taken by light to travel along a trajectory.<\/p>\n\n<p>That\u2019s a simple thought and sounds natural, but when you look at an optical system with all its lenses, \nyou realize it\u2019s non-trivial behavior.<\/p>\n\n<h2 id=\"amazing-variability-of-imaging-techniques\">Amazing variability of imaging techniques<\/h2>\n\n<p>The microscopy world is very limited within one lab (even an optics lab), \nbut the whole large world of microscopy out there is rich and interesting.<\/p>\n\n<ul>\n  <li>Multi-photon imaging\n    <ul>\n      <li>delivers the energy required for excitation with several photons simultaneously<\/li>\n      <li>requires an expensive laser, but imaging is simple<\/li>\n      <li>can go quite deep into tissue<\/li>\n      <li>can\u2019t guarantee narrow emission 
spectra, because a varying number of photons can be involved in excitation<\/li>\n    <\/ul>\n  <\/li>\n  <li>Electron microscopy\n    <ul>\n      <li>super precise (it\u2019s a completely different part of the spectrum)<\/li>\n      <li>ex-vivo samples only<\/li>\n      <li>requires isolated rooms and strong movement compensation<\/li>\n      <li>not something you will simply hold in a lab, but provides extremely detailed images<\/li>\n    <\/ul>\n  <\/li>\n  <li>LSM: light-sheet microscopy is a demonstration that the light source does not have to be on the same axis,\nthough that sounds like an axiom after lab scopes\n    <ul>\n      <li>LLSM is many times cooler<\/li>\n    <\/ul>\n  <\/li>\n  <li>\n    <p>TIRF (total internal reflection) microscopy when combined with photo-activatable fluorescent proteins (PALM\/STORM) \ncan get to tracking trajectories of individual proteins (while still using the visible range of the spectrum).<\/p>\n  <\/li>\n  <li>\n    <p>Another interesting idea is FRET - it allows detecting interaction between single molecules \nif those have appropriate fluorescent tags. <br \/>\nEnergy from the excited fluorophore on one molecule is transferred to the fluorophore on the other if the molecules are in close proximity.<\/p>\n  <\/li>\n  <li><a href=\"https:\/\/www.youtube.com\/watch?v=HJnNJIUPm4s\">optical coherence tomography<\/a> OCT\n    <ul>\n      <li>has nothing to do with tomography and even works based on reflected light<\/li>\n      <li>widely used for retina scanning<\/li>\n    <\/ul>\n  <\/li>\n  <li><a href=\"https:\/\/www.youtube.com\/watch?v=tTHvVCPaeWQ\">Ghost imaging<\/a>. 
Not-yet-there, but the idea is mind-blowing\n    <ul>\n      <li>entangle two photons<\/li>\n      <li>the first one hits the target, while the second goes to a detector<\/li>\n      <li>entanglement allows partially reconstructing properties of the photon that hit the target<\/li>\n      <li>there are classical variations as well<\/li>\n    <\/ul>\n  <\/li>\n  <li>Structured illumination (SIM)\n    <ul>\n      <li>Moir\u00e9 patterns + a bit of computational magic let you go slightly \nabove the optical resolution limit<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<p>You may want to check this video \nto orient yourself a bit and get a sense of what sounds appropriate for your case.<\/p>\n\n<iframe width=\"560\" height=\"315\" src=\"https:\/\/www.youtube.com\/embed\/01v2kR8dlnQ\" frameborder=\"0\" allow=\"autoplay; clipboard-write; encrypted-media; picture-in-picture\" allowfullscreen=\"\"><\/iframe>\n\n","pubDate":"Sun, 01 Nov 2020 12:00:00 +0000","link":"https:\/\/arogozhnikov.github.io\/2020\/11\/01\/microscopy.html","guid":"https:\/\/arogozhnikov.github.io\/2020\/11\/01\/microscopy.html","category":"Microscopy"},{"title":"Don't write command-line interfaces (generate them)","description":"<p style=\"color: #666677\">\n(a friendly reminder that reading the post before commenting is a great idea. 
\nSome people see this as an argument for GUI, but that is completely misleading)\n<\/p>\n\n<p>A favourite activity of fresh github-bers is writing CLI (command-line interfaces) for anything.<\/p>\n\n<p>Every programmer uses CLI <strong>(true)<\/strong>, so writing CLI makes you more professional <strong>(false)<\/strong>.<\/p>\n\n<p>CLIs are required in everyday maintenance, env\/pipeline\/db management, and checking this and that.\nIt is glue that keeps different subsystems together, but CLI is hardly a reliable programming interface.\nProgress in software engineering has left bash calls far behind in terms of reliability and flexibility.<\/p>\n\n<h3 id=\"whats-wrong-with-writing-cli-as-an-interface\">What\u2019s wrong with writing CLI as an \u2018interface\u2019?<\/h3>\n\n<ul>\n  <li>CLI support is additional logic in your program that does <strong>no real work<\/strong><\/li>\n  <li>While typically being dumb, CLI logic is frequently <strong>filled with <a href=\"https:\/\/github.com\/search?q=bug+command+line&amp;type=Issues\">mistakes<\/a><\/strong>;\nthus it requires constant maintenance and additional testing<\/li>\n  <li><strong>Error (exception) handling<\/strong> with CLI is very poor.\nAnother layer of (faulty) code is required to make it possible<\/li>\n  <li><strong>Scaling\/extending<\/strong> is not as easy as with programming language APIs \n(see the example at the end)<\/li>\n  <li>CLIs are detached from essential code, which in most cases is a disadvantage.\n    <details>\n      <summary>more on this<\/summary>\n      <p>Forcing users to use CLI means: stay away from my code, you\u2019d better not work with it.\n  Maybe that\u2019s ok \u2014 but if users can code a bit (otherwise why do they use CLI?), \n  that\u2019s not an optimal way \u2014 if something goes wrong, \n  do you want to directly see the code+calls that failed, or do you want to spend \n  several minutes\/hours walking thru the command-args parsing machinery someone else wrote? 
\n  <br \/>\n  While being questionable in small projects, a virtual fence becomes more and more obvious when parsing logic\n  (validation, transformation, routing)  grows.<\/p>\n    <\/details>\n  <\/li>\n<\/ul>\n\n<h3 id=\"writing-command-line-interfaces-the-right-way\">Writing command-line interfaces the right way<\/h3>\n\n<ul>\n  <li>write functions<\/li>\n  <li>leave CLI-fication to a special package<\/li>\n<\/ul>\n\n<h3 id=\"which-tool-to-use-for-writing-command-line-interfaces-in-python\">Which tool to use for writing command-line interfaces in Python?<\/h3>\n\n<p>Here are the options that you should consider \u2026<\/p>\n\n<ul>\n  <li><a href=\"https:\/\/docs.python.org\/3\/library\/argparse.html\">argparse<\/a> (or ancient optparse)<\/li>\n  <li><a href=\"https:\/\/click.palletsprojects.com\/en\/7.x\/\">click<\/a><\/li>\n  <li><a href=\"http:\/\/docopt.org\/\">docopt<\/a><\/li>\n  <li><a href=\"https:\/\/github.com\/google\/python-fire\">python-fire<\/a><\/li>\n<\/ul>\n\n<p>\u2026 <strong>deprecated<\/strong>. 
Yes, consider them deprecated.<\/p>\n\n<p>Prefer <a href=\"https:\/\/hugapi.github.io\/hug\/\">hug<\/a> and <a href=\"https:\/\/github.com\/tiangolo\/typer\">typer<\/a>.\nExample for the latter:<\/p>\n\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">import<\/span> <span class=\"nn\">typer<\/span>\n<span class=\"kn\">from<\/span> <span class=\"nn\">pathlib<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">Path<\/span>\n\n<span class=\"n\">app<\/span> <span class=\"o\">=<\/span> <span class=\"n\">typer<\/span><span class=\"p\">.<\/span><span class=\"n\">Typer<\/span><span class=\"p\">()<\/span>\n\n<span class=\"o\">@<\/span><span class=\"n\">app<\/span><span class=\"p\">.<\/span><span class=\"n\">command<\/span><span class=\"p\">()<\/span>\n<span class=\"k\">def<\/span> <span class=\"nf\">find_dragon<\/span><span class=\"p\">(<\/span><span class=\"n\">name<\/span><span class=\"p\">:<\/span> <span class=\"nb\">str<\/span><span class=\"p\">,<\/span> <span class=\"n\">path<\/span><span class=\"p\">:<\/span> <span class=\"n\">Path<\/span><span class=\"p\">,<\/span> <span class=\"n\">min_age_years<\/span><span class=\"p\">:<\/span> <span class=\"nb\">int<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">200<\/span><span class=\"p\">):<\/span>\n    <span class=\"o\">&lt;<\/span><span class=\"n\">implementation<\/span> <span class=\"n\">goes<\/span> <span class=\"n\">here<\/span><span class=\"o\">&gt;<\/span>\n\n<span class=\"o\">@<\/span><span class=\"n\">app<\/span><span class=\"p\">.<\/span><span class=\"n\">command<\/span><span class=\"p\">()<\/span>\n<span class=\"k\">def<\/span> <span class=\"nf\">feed_dragon<\/span><span class=\"p\">(<\/span><span class=\"n\">dragon_name<\/span><span class=\"p\">:<\/span> <span class=\"nb\">str<\/span><span class=\"p\">,<\/span> <span class=\"n\">n_humans<\/span><span class=\"p\">:<\/span> <span class=\"nb\">int<\/span> <span 
class=\"o\">=<\/span> <span class=\"mi\">3<\/span><span class=\"p\">):<\/span>\n    <span class=\"o\">&lt;<\/span><span class=\"n\">implementation<\/span> <span class=\"n\">goes<\/span> <span class=\"n\">here<\/span><span class=\"o\">&gt;<\/span>\n\n<span class=\"k\">if<\/span> <span class=\"n\">__name__<\/span> <span class=\"o\">==<\/span> <span class=\"s\">\"__main__\"<\/span><span class=\"p\">:<\/span>\n    <span class=\"n\">app<\/span><span class=\"p\">()<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Now it\u2019s ready to be invoked from the shell:<\/p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>python example.py find_dragon 'Drake' --path \/on\/my\/planet\n<\/code><\/pre><\/div><\/div>\n<p>That\u2019s it! Types are parsed, checked and converted. \nDefaults and descriptions are picked up from the function itself. \nIt even provides bash completions you can install. \nThe best part is that you wrote no code for any of that!<\/p>\n\n<h3 id=\"-i-need-to-invoke-my-code-from-bash-with-complex-parameterization\">\u2014 I need to invoke my code from bash with complex parameterization<\/h3>\n\n<p>The exact wording of this question may also include job schedulers, calls on remote machines \nand docker run\/exec \u2014 common reasons that force people to write CLIs.<\/p>\n\n<p>The previous recipe may not work in this case; you have two options:<\/p>\n\n<p><strong>Option A.<\/strong><\/p>\n\n<p>Read the documentation for the <em>deprecated<\/em> packages, \nwrite a ton of code for conversion, validation, testing and mocking.\nAdd documentation, make presentations about CLI logic and neat ways of using bash, \nget promoted to Senior CLI architect, give talks and interviews. 
\nSome junior in your company discovers <em>option B<\/em> and ruins your career.<\/p>\n\n<p><strong>Option B<\/strong>.<\/p>\n\n<p>When there is much to configure, \ndon\u2019t try to build large parsing machinery to handle all cases, \njust <strong>use code<\/strong> to parameterize calls:<\/p>\n\n<div class=\"language-bash highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>python <span class=\"nt\">-c<\/span> <span class=\"s2\">\"\nfrom mymodule import set_dragon_feeding_schedule, Creatures, Date\nset_dragon_feeding_schedule(\n    feeding_times=['10:00', '14:00', '18:00'],\n    dishes={Creatures.Tiger: 2, Creatures.Human: 1},\n    start_day=Date('1020-03-01'),\n)\n\"<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Instead of<\/p>\n<div class=\"language-bash highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>python <span class=\"nt\">-m<\/span> mymodule <span class=\"se\">\\<\/span>\n    set_dragon_feeding_schedule <span class=\"se\">\\<\/span>\n    <span class=\"nt\">--feeding-times<\/span> <span class=\"o\">[<\/span><span class=\"s1\">'10:00'<\/span>,<span class=\"s1\">'14:00'<\/span>,<span class=\"s1\">'18:00'<\/span><span class=\"o\">]<\/span> <span class=\"c\"># hopefully this way it gets recognized \\<\/span>\n    <span class=\"c\"># how will you define parsing a dict with enum to integer mapping? <\/span>\n    <span class=\"nt\">--dishes<\/span><span class=\"o\">=<\/span>Creatures.Tiger:2 <span class=\"se\">\\<\/span>\n    <span class=\"nt\">--dishes<\/span><span class=\"o\">=<\/span>Creatures.Human:1 <span class=\"se\">\\<\/span>\n    <span class=\"nt\">--start-day<\/span><span class=\"o\">=<\/span>1020-03-21 <span class=\"c\"># BTW bash allows no comments in multiline calls<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<ul>\n  <li>How many lines of code do you need to cover the parsing logic in the previous example?\n    <ul>\n      <li>Try to be reasonable, not optimistic. 
Don\u2019t forget documentation.<\/li>\n      <li>Add testing, mocking, \u2026 have you <em>ever<\/em> seen that part done properly for CLIs?<\/li>\n    <\/ul>\n  <\/li>\n  <li>Is there anything you gain from writing explicit CLI parsing? Double quoting, maybe?<\/li>\n  <li>Exception handling \u2014 simple to add in one case, very tough in the other<\/li>\n<\/ul>\n\n<h3 id=\"-never-realized-that-cli-command-can-be-replaced-by-python-command\">\u2014 Never realized that a CLI command can be replaced by a python command<\/h3>\n\n<p>You\u2019re welcome! This can save you weeks of time and sleepless nights.<\/p>\n\n<p>Here is the definitive guide:<\/p>\n\n<ol>\n  <li>Don\u2019t write yet-another-parser \u2014 Python can parse all you need<\/li>\n  <li>Don\u2019t reinvent ways to represent lists, dicts, enums, objects, etc. in text \u2014 every programming language has already solved this<\/li>\n  <li>Don\u2019t create new <em>types<\/em> of interfaces \u2014 functions <em>are<\/em> interfaces<\/li>\n  <li>Don\u2019t write parsing logic\/validation \u2014 check parameters instead<\/li>\n<\/ol>\n\n<p>Focus on writing a useful and friendly functional interface, not a CLI.<\/p>\n\n<h3 id=\"-how-about-an-example-for-dealing-with-more-complex-parameterization\">\u2014 How about an example for dealing with more complex parameterization?<\/h3>\n\n<p>Sure! 
Here is an example from machine learning.<\/p>\n\n<p>A common headache is supporting multiple optimization algorithms (each with its own set of parameters)\nwhile allowing a number of architectures (each, again, with different parameters).<\/p>\n\n<div class=\"language-bash highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>python <span class=\"nt\">-c<\/span> <span class=\"s2\">\"\nfrom yourpackage import ResidualNetwork, AdamOptimizer, train, activations\ntrain(\n    optimizer=AdamOptimizer(lr=0.0001, some_param=42, converge=True),\n    model=ResidualNetwork(n_layers_in_each_group=[3,4,5,6], act=activations.ReLU, n_classes=1234),\n    save_path='\/research\/my_experiment_number9999',\n)\n\"<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Compare this piece of clarity and versatility to the parsing nightmare happening in some popular packages.<\/p>\n\n<p>Why does it become such a nightmare? That\u2019s a great question!<\/p>\n\n<ul>\n  <li>parameters depend on each other in a non-trivial way.\nDifferent model \u2192 different parameters. Added a model \u2014 update the CLI.<\/li>\n  <li>there should be a way to associate parameters with the entity they come from\n    <ul>\n      <li>is this parameter for an architecture? for an optimizer? 
for a dataset?<\/li>\n      <li>entities that appear naturally in programming interfaces are not in the style of bash calls<\/li>\n    <\/ul>\n  <\/li>\n  <li>at some point a second model appears (hi GANs!), and possibly a second optimizer, \nseveral types of datasets\u2026 now you need to support all of that in the CLI and avoid flag collisions\n    <ul>\n      <li>it is unlikely that you want to frequently drop the previous interface, so backward-compatibility will multiply your problems<\/li>\n    <\/ul>\n  <\/li>\n  <li>validation logic that is capable of handling all these scenarios would be huge, buggy \nand not helpful at all<\/li>\n<\/ul>\n\n<p><strong>CLIs don\u2019t scale up well<\/strong>.<br \/>\nThey work well only when you can decompose things into simpler components \u2018each doing one job\u2019.\nBefore writing a CLI, it is thus important to know what functionality \nyour project provides and how it may change in a year or two.\nIt is very easy to add a CLI when the project is in its initial stage \u2014 \nbut as functionality grows, you\u2019ll find it exponentially harder to fit all the knobs into the CLI.<\/p>\n\n<p>Other programming interfaces survive growth quite easily.<\/p>\n\n<h2 id=\"looking-forward\">Looking forward<\/h2>\n\n<p>In the bright future of programming there will be more natural bridges between different languages.\nWith growing capabilities for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Reflection_(computer_programming)\">reflection<\/a>, \nit will be easier to invoke particular functions from other languages without intermediate bash calls.\n<a href=\"https:\/\/pyo3.rs\/\">Python&lt;&gt;rust<\/a> is a good example of going in this direction.<\/p>\n\n<p>By not writing CLI logic and focusing on the programming interface you make your code future-proof.\n<a href=\"https:\/\/fastapi.tiangolo.com\/\">Different<\/a> <a href=\"https:\/\/fastapi.tiangolo.com\/alternatives\/\">utilities<\/a> can already convert functions to a REST API \n(we may later use some other 
network APIs like gRPC, and you\u2019ll be able to add it with a couple of lines).\nMore is to come: we may expect utilities to auto-wrap your functions for calling from other languages\/hosts\/universes.<\/p>\n\n<p>Code should be designed to be used by other code first.\nConvenience \u2018temporary\u2019 command-line utilities sooner or later become part of bigger automated pipelines \nif no other API is proposed.<\/p>\n\n<h2 id=\"tldr\">TL;DR<\/h2>\n\n<ul>\n  <li>simple CLIs should be auto-generated today; don\u2019t write them yourself\n    <ul>\n      <li>other types of APIs can be auto-generated as well<\/li>\n    <\/ul>\n  <\/li>\n  <li>complex CLIs are a problem; think twice (better, 5 times) before trying to replace a programming API with a CLI\n    <ul>\n      <li>convenient command-line calls are available without writing a single line of CLI code<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<p><br \/><\/p>\n\n<p><br \/><\/p>\n\n<details>\n  <summary>\n<span style=\"font-size: 1.5em;\"> Additional comments <\/span>\n<\/summary>\n  <ul>\n    <li>I use python as an example because 1) I need to show some code, 2) it is popular, and 3) I know it well enough. <br \/>\nHowever, the points made should be valid for all modern languages (C++ is not a modern language, just in case).<\/li>\n    <li>Itamar Turner-Trauring has an article on a relevant topic in his blog, called \n<a href=\"https:\/\/pythonspeed.com\/articles\/shell-scripts\/\">please stop writing shell scripts<\/a>. 
\nItamar provides numerous helpful recommendations and tips in his blog, and this is no exception.<\/li>\n  <\/ul>\n<\/details>\n\n<details>\n  <summary>\n<span style=\"font-size: 1.5em;\"> Possible objections <\/span> \n<\/summary>\n  <ul>\n    <li>A CLI allows abstracting away from the implementation\n      <ul>\n        <li>Exposed functions can also be detached from the actual implementation<\/li>\n      <\/ul>\n    <\/li>\n    <li>A user may not know the programming language I use\n      <ul>\n        <li>An import and a function call are unlikely to mislead anyone. By hiding details you leave the user clueless in case something doesn\u2019t work<\/li>\n        <li>The actual choice is whether the user should learn a bit of your language or yet-another-CLI system. It is hard to find an argument for the latter<\/li>\n        <li>If your tool requires detailed configuration, \nyou shouldn\u2019t be afraid to say: you need to write several lines of code, here is an example<\/li>\n      <\/ul>\n    <\/li>\n    <li>My application heavily uses bash\/shell features: pipes, process substitutions and filename expansions\n      <ul>\n        <li>In this case you do want to keep using and supporting a CLI<\/li>\n      <\/ul>\n    <\/li>\n  <\/ul>\n<\/details>\n\n<details>\n  <summary>\n<span style=\"font-size: 1.5em;\"> Comments on packages <\/span>\n<\/summary>\n\n  <p><strong>What\u2019s wrong with <code class=\"language-plaintext highlighter-rouge\">python-fire<\/code>?<\/strong><\/p>\n\n  <p>While it builds a CLI on top of exposed functions\/methods,\n<code class=\"language-plaintext highlighter-rouge\">fire<\/code> ignores annotations and tries to guess types based on the input.<\/p>\n\n  <p>An example from the official documentation confirms this:<\/p>\n  <div class=\"language-bash highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ <\/span>python example.py 10\nint\n<span class=\"nv\">$ <\/span>python example.py <span class=\"s2\">\"10\"<\/span>\nint\n<span 
class=\"nv\">$ <\/span>python example.py <span class=\"s1\">'\"10\"'<\/span>\nstr\n<\/code><\/pre><\/div>  <\/div>\n  <p>So: 1) no types are guaranteed, 2) the logic is convoluted, 3) to make sure an argument is not converted to int,\nyou wrap it in both single and double quotes. \nNow wrap that in a bash call (e.g. when building a docker image).\nHave fun escaping quotes for every string argument.<\/p>\n\n  <p><strong><code class=\"language-plaintext highlighter-rouge\">Hug<\/code> has poor support for CLIs (as of now)<\/strong><\/p>\n\n  <p>Be warned: it ignores flag names. \nStill, it has the right direction of thought and directly supports <code class=\"language-plaintext highlighter-rouge\">marshmallow<\/code> types.\nBut in the meantime (Oct 2020) <code class=\"language-plaintext highlighter-rouge\">typer<\/code> is a safer choice.<\/p>\n\n  <p>The interface package of my dreams is not released yet \u2014 it should support both CLI and web APIs and include some elements from python-fire.\nHowever, this should not stop you, as switching between these packages is almost painless as long as you write no custom logic.<\/p>\n\n<\/details>\n\n<details>\n  <summary>\n<span style=\"font-size: 1.5em;\"> Acknowledgements <\/span>\n<\/summary>\n  <p>Thanks to <a href=\"https:\/\/github.com\/tlikhomanenko\">Tatiana<\/a> for proof-reading an initial version of this post.<\/p>\n<\/details>\n\n<!-- maybe mention TAP https:\/\/github.com\/swansonk14\/typed-argument-parser -->\n","pubDate":"Thu, 01 Oct 2020 12:00:00 +0000","link":"https:\/\/arogozhnikov.github.io\/2020\/10\/01\/dont-write-cli.html","guid":"https:\/\/arogozhnikov.github.io\/2020\/10\/01\/dont-write-cli.html","category":["Programming","Python","Command-line interfaces"]},{"title":"Twin training: trick for better model comparisons","description":"<p>Abstract: <em>Frequently comparing deep learning models?<br \/>\nA simple way to improve comparisons is discussed here; \nthis trick becomes especially handy when comparing segmentation 
models.<\/em><\/p>\n\n<p>Reliable comparison of models is important for DL \u201ctheorists\u201d (to evaluate new approaches) \nas well as for practitioners\/engineers (to select an approach for the particular task at hand).\nComparison is a time-consuming process, frequently with noisy results.<\/p>\n\n<p>The usual setting incorporates a fixed dataset split into train\/val\/test and a fixed metric of choice. \nNext, independent runs are conducted for all models under comparison and the achieved quality is registered.<\/p>\n\n<p>As a result,<\/p>\n\n<ul>\n  <li>There is significant noise in the comparison (it is rare to rerun each model several times, especially in applications),<\/li>\n  <li>Validation can be done only using the whole dataset,<\/li>\n  <li>you need to remember which version of the code was used to generate a particular number, as you can \naccidentally compare things that are not \u2018comparable\u2019 because of e.g. changed augmentation or updates in the dataset\n    <ul>\n      <li>yes, practitioners have to deal with frequent updates in the dataset<\/li>\n    <\/ul>\n  <\/li>\n  <li>you can\u2019t use augmentations while testing, since it is hard to guarantee that exactly the same augmentations were applied.\nSometimes it is handy to evaluate using several batches as a fast intermediate check. 
Augmentations in test allow a \u2018broader\u2019 check.<\/li>\n<\/ul>\n\n<h2 id=\"what-is-suggested-twin-training\">What is suggested: twin training<\/h2>\n\n<p>Models can be trained <strong>side-by-side within the same process<\/strong>, with as high similarity in the training process as possible.\nSame batches, same augmentations, and of course the same datasets.<\/p>\n\n<ul>\n  <li>If models, say, have identical architecture, their initial weights should be identical (easy to achieve in any DL framework).\n    <ul>\n      <li>As we know, the initial state influences optimization, in some cases drastically (that\u2019s not desirable, but it happens).<\/li>\n    <\/ul>\n  <\/li>\n  <li>During training, the same exact batches with the same exact augmentations should be used to optimize the models.\n    <ul>\n      <li>That\u2019s right, you need to augment only once, thus CPU is not a bottleneck.<\/li>\n      <li>Similarly, one should always compare on the same batches.\nTo achieve smooth monitoring rather than \u2018validate once in a while\u2019, take one batch at a time and compute metrics on that batch.<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<p>Pseudo-code may look like this (fragment):<\/p>\n\n<!-- TODO fix display here -->\n\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">for<\/span> <span class=\"n\">batch<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">train_data<\/span><span class=\"p\">:<\/span>\n    <span class=\"n\">batch<\/span> <span class=\"o\">=<\/span> <span class=\"n\">augment<\/span><span class=\"p\">(<\/span><span class=\"n\">batch<\/span><span class=\"p\">)<\/span>\n    <span class=\"k\">for<\/span> <span class=\"n\">model<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">models<\/span><span class=\"p\">:<\/span>\n        <span class=\"c1\"># make an optimization step for each model using the same batch\n<\/span><\/code><\/pre><\/div><\/div>\n\n<p>Things usually tuned 
(architecture, loss, augmentations, parameters, optimizers, learning schedules, etc.) \u2014 \nall of these can be compared more efficiently this way.<\/p>\n\n<h2 id=\"example\">Example:<\/h2>\n\n<p><img src=\"\/images\/model_comparison\/tensorboard1.png\" width=\"700\" \/><\/p>\n\n<p>There are three models trained in parallel in this screenshot from tensorboard.\nOne can tell when one of the models has a lower loss and estimate the level of \u2018noise\u2019. \nIt is also clear that most jumps and falls in the learning curves are due to the batches sampled, and are not model-specific behavior. \nIn other words, you can better see the difference between <strong>models<\/strong>, not the difference between <strong>runs<\/strong>.<\/p>\n\n<p>This demonstrates a typical comparison \u2014 the things compared are extremely similar and there is little practical difference.\nThe models\u2019 responses to the same training input are close to identical. \nIt\u2019s not easy to reach the same conclusion by looking at just the final scores. \nThat\u2019s a good argument for including learning curves in the paper.<\/p>\n\n<h2 id=\"bonus-simpler-comparison-of-segmentation-models\">Bonus: simpler comparison of segmentation models<\/h2>\n\n<p>When training models for image segmentation (such as instance segmentation or class-segmentation),\nlack of memory becomes a critical factor. \nBatch sizes become very small, and it is almost impossible to train several segmentation models at once on a single GPU.<\/p>\n\n<p>During segmentation training each sample contributes a lot, since it provides a lot of labels (one per pixel!).<br \/>\nIt is also unlikely that you have thousands of well-labelled high-resolution segmentation images.<\/p>\n\n<p>However, when you train several models inside a single script\/notebook, there are no such problems, \n<em>because you never keep intermediate activations for more than one model at a time<\/em>. 
\nWeights of all models should still be kept in (GPU) memory, but that\u2019s a small fraction of the space taken by activations.<\/p>\n\n<h2 id=\"bonus-simple-organization-of-experiments-in-tensorboard\">Bonus: simple organization of experiments in tensorboard<\/h2>\n\n<p><img src=\"\/images\/model_comparison\/folder_organization.png\" height=\"200\" \/><\/p>\n\n<p>Tensorboard recursively scans subfolders for logs, so you can keep each \u2018comparison\u2019 in a separate folder, \nand each compared option saves its logs to a corresponding subfolder.<\/p>\n\n<h2 id=\"alternative-fix-random-seed\">Alternative: fix the random seed?<\/h2>\n\n<p>I don\u2019t think that a fixed random seed is reliable enough to be considered an alternative way to achieve similarity in training.<\/p>\n\n<p>There are many different RNGs provided by different modules, and RNGs are used in too many places, \nso you need to precisely control the RNG flow in your program.\nIf some of your functions use global RNGs like <code class=\"language-plaintext highlighter-rouge\">random<\/code> or <code class=\"language-plaintext highlighter-rouge\">np.random<\/code> directly, \nthen <em>any<\/em> side call to those from anywhere in your program completely changes all following sampled numbers.\nAny \u2018interruption\u2019 in the sequence breaks it. 
\nRandom numbers on GPU are a whole other story.<\/p>\n\n<p>So, you should look through all the augmentations, samplers, dropouts (basically, everything) to verify that they don\u2019t use global RNGs \n(and find that some of them actually do).<\/p>\n\n<p>Long story short, if you <em>have<\/em> to rely on random seeds in DL, \nat least log some control sums to verify that the sequence was not broken by an unexpected call from somewhere else.<\/p>\n\n<p>You can still use a random seed to achieve reproducible training of the same model.<\/p>\n","pubDate":"Tue, 01 Jan 2019 12:00:00 +0000","link":"https:\/\/arogozhnikov.github.io\/2019\/01\/01\/trick-for-model-comparison.html","guid":"https:\/\/arogozhnikov.github.io\/2019\/01\/01\/trick-for-model-comparison.html","category":["Machine Learning","Engineering","Code improvements"]},{"title":"Einops \u2014 a new style of deep learning code","description":"<p>Recently I\u2019ve open-sourced <a href=\"https:\/\/github.com\/arogozhnikov\/einops\">einops<\/a> \n\u2014 a new (and better) way to write deep learning code.<\/p>\n\n<p>Einops introduces a new notation and new operations.<\/p>\n\n<video controls=\"\" autoplay=\"\">\n  <source src=\"http:\/\/arogozhnikov.github.io\/images\/einops\/einops_video.mp4\" type=\"video\/mp4\" \/>\n  <img src=\"http:\/\/arogozhnikov.github.io\/images\/einops\/einops_video.gif\" alt=\"einops package examples\" \/>\n<\/video>\n\n<p>It perfectly complements existing frameworks (pytorch, tensorflow, gluon, chainer, numpy and others),\nallowing you to write better deep learning code (see <a href=\"http:\/\/arogozhnikov.github.io\/einops\/pytorch-examples.html\">examples for pytorch<\/a>).<\/p>\n\n<p><a href=\"https:\/\/github.com\/arogozhnikov\/einops\">Einops at Github<\/a><\/p>\n\n<p>Tutorials: <a href=\"https:\/\/github.com\/arogozhnikov\/einops\/blob\/master\/docs\/1-einops-basics.ipynb\">part 1<\/a> \nand <a 
href=\"https:\/\/github.com\/arogozhnikov\/einops\/blob\/master\/docs\/2-einops-for-deep-learning.ipynb\">part 2<\/a>.<\/p>\n","pubDate":"Thu, 06 Dec 2018 12:00:00 +0000","link":"https:\/\/arogozhnikov.github.io\/2018\/12\/06\/einops.html","guid":"https:\/\/arogozhnikov.github.io\/2018\/12\/06\/einops.html","category":["Machine Learning","Engineering","Code improvements"]}]}}