Web Browser Engineering

Pavel Panchekha & Chris Harrelson; one page edition

Twitter · Blog · Discussions

Preface

A computer science degree traditionally includes courses in operating systems, compilers, and databases that replace mystery with code. These courses transform Linux, Postgres, and LLVM into improvements, additions, and optimizations of an understandable core architecture. The lesson transcends the specific system studied: all computer systems, no matter how big and seemingly complex, can be studied and understood.

But web browsers are still opaque, not just to students but to industry programmers and even to researchers. This book dissipates that mystery by systematically explaining all major components of a modern web browser.

Reading This Book

Parts 1–3 of this book construct a basic browser weighing in at around 1000 lines of code, twice that after exercises. The average chapter takes 4–6 hours to read, implement, and debug for someone with a few years’ programming experience. Part 4 of this book covers advanced topics; those chapters are longer and have more code. The final browser weighs in at about 3000 lines.

Your browserThis book assumes that you will be building a web browser along the way while reading it. However, it does present nearly all the code—inlined into the book—for a working browser for every chapter. So most of the time, the book uses the term “our browser”, which refers to the conceptual browser we (you and us, the authors) have built so far. In cases where the book is referring specifically to the implementation you have built, the book says “your browser”. will “work” at each step of the way, and every chapter will build upon the last.This idea is from J. R. Wilcox, inspired in turn by S. Zdancewic’s course on compilers. That way, you will also practice growing and improving complex software. If you feel particularly interested in some component, please do flesh it out, complete the exercises, and add missing features. We’ve tried to arrange it so that this doesn’t make later chapters more difficult.

The code in this book uses Python 3, and we recommend you follow along in the same. When the book shows Python command lines, it calls the Python binary python3.This is for clarity. On some operating systems, python means Python 3, but on others that means Python 2. Check which version you have! That said, the text avoids dependencies where possible and you can try to follow along in another language. Make sure your language has libraries for TLS connections (Python has one built in), graphics (the text uses Tk, Skia, and SDL), and JavaScript evaluation (the text uses DukPy).

This book’s browser is irreverent toward standards: it handles only a sliver of the full HTML, CSS, and JavaScript languages, mishandles errors, and isn’t resilient to malicious inputs. It is also quite slow. Despite that, its architecture matches that of real browsers, providing insight into those 10 million line of code behemoths.

That said, we’ve tried to explicitly note when the book’s browser simplifies or diverges from standards. If you’re not sure how your browser should behave in some edge case, fire up your favorite web browser and try it out.

Acknowledgments

We’d like to recognize the countless people who built the web and the various web browsers. They are wonders of the modern world. Thank you! We learned a lot from the books and articles listed in this book’s bibliography—thank you to their authors. And we’re especially grateful to the many contributors to articles on Wikipedia (especially those on historic software, formats, and protocols). We are grateful for this amazing resource, one which in turn was made possible by the very thing this book is about.

Pavel: James R. Wilcox and I dreamed up this book during a late-night chat at ICFP 2018. Max Willsey proofread and helped sequence the chapters. Zach Tatlock encouraged me to develop the book into a course. And the students of CS 6968, CS 4962, and CS 4560 at the University of Utah found countless errors and suggested important simplifications. I am thankful to all of them. Most of all, I am thankful to my wife Sara, who supported my writing and gave me the strength to finish this many-year-long project.

Chris: I am eternally grateful to my wife Sara for patiently listening to my endless musings about the web, and encouraging me to turn my idea for a browser book into reality. I am also grateful to Dan Gildea for providing feedback on my browser-book concept on multiple occasions. Finally, I’m grateful to Pavel for doing the hard work of getting this project off the ground and allowing me to join the adventure. (Turns out Pavel and I had the same idea!)

A final note

This book is, and will remain, a work in progress. Please leave comments and mark typos; the book has built-in feedback tools, which you can enable with Ctrl-E (or Cmd-E on a Mac). The full source code is also available on GitHub, though we prefer to receive comments through the built-in tools.

Browsers and the Web

I—this is Chris speaking—have known the webBroadly defined, the web is the interlinked network (“web”) of web pages on the internet. If you’ve never made a web page, I recommend MDN’s Learn Web Development series, especially the Getting Started guide. This book will be easier to read if you’re familiar with the core technologies. for all of my adult life. The web for me is something of a technological companion, and I’ve never been far from it in my studies or my work. Perhaps it’s been the same for you. And using the web means using a browser. I hope, as you read this book, that you fall in love with web browsers, just like I did.

The Browser and Me

Since I first encountered the web and its predecessors,For me, bulletin board systems (BBSs) over a dial-up modem connection. A BBS, like a browser, is a window into dynamic content somewhere else on the internet. in the early 1990s, I’ve been fascinated by browsers and the concept of networked user interfaces. When I surfed the web, even in its earliest form, I felt I was seeing the future of computing. In some ways, the web and I grew together—for example, 1994, the year the web went commercial, was the same year I started college; while there I spent a fair amount of time surfing the web, and by the time I graduated in 1999, the browser had fueled the famous dot-com speculation gold rush. Not only that, but the company for which I now work, Google, is a child of the web and was founded during that time.

In my freshman year at college, I attended a presentation by a RedHat salesman. The presentation was of course aimed at selling RedHat Linux, probably calling it the “operating system of the future” and speculating about the “year of the Linux desktop”. But when asked about challenges RedHat faced, the salesman mentioned not Linux but the web: he said that someone “needs to make a good browser for Linux”.Netscape Navigator was available for Linux at that time, but it wasn’t viewed as especially fast or featureful compared to its implementation on other operating systems. Even back then, in the first years of the web, the browser was already a necessary component of every computer. He even threw out a challenge: “How hard could it be to build a better browser?” Indeed, how hard could it be? What makes it so hard? That question stuck with me for a long time.Meanwhile, the “better Linux browser than Netscape” took a long time to appear…

How hard indeed! After eleven years in the trenches working on Chrome, I now know the answer to his question: building a browser is both easy and incredibly hard, both intentional and accidental. And everywhere you look, you see the evolution and history of the web wrapped up in one codebase. It’s fun and endlessly interesting.

So that’s how I fell in love with web browsers. Now let me tell you why you will, too.

The Web in History

The web is a grand, crazy experiment. It’s natural, nowadays, to watch videos, read news, and connect with friends on the web. That can make the web seem simple and obvious, finished, already built. But the web is neither simple nor obvious (and is certainly not finished). It is the result of experiments and research, reaching back to nearly the beginning of computing,And the web also needed rich computer displays, powerful user-interface-building libraries, fast networks, and sufficient computing power and information storage capacity. As so often happens with technology, the web had many similar predecessors, but only took its modern form once all the pieces came together. about how to help people connect and learn from each other.

In the early days, the internet was a world-wide network of computers, largely at universities, labs, and major corporations, linked by physical cables and communicating over application-specific protocols. The (very) early web mostly built on this foundation. Web pages were files in a specific format stored on specific computers. The addresses for web pages named the computer and the file, and early servers did little besides read files from a disk. The logical structure of the web mirrored its physical structure.

A lot has changed. The HyperText Markup Language (HTML) for web pages is now usually dynamically assembled on the fly“Server-side rendering” is the process of assembling HTML on the server when loading a web page. Server-side rendering can use web technologies like JavaScript and even headless browsers. Yet one more place browsers are taking over! and sent on demand to your browser. The pieces being assembled are themselves filled with dynamic content—news, inbox contents, and advertisements adjusted to your particular tastes. Even the addresses no longer identify a specific computer—content distribution networks route requests to any of thousands of computers all around the world. At a higher level, most web pages are served not from someone’s home computerPeople actually did this! And when their website became popular, it often ran out of bandwidth or computing power and became inaccessible. but from a major corporation’s social media platform or cloud computing service.

With all that’s changed, some things have stayed the same, the core building blocks that are the essence of the web:

As a philosophical matter, perhaps one or another of these principles is secondary. One could try to distinguish between the networking and rendering aspects of the web. One could abstract linking and networking from the particular choice of protocol and data format. One could ask whether the browser is necessary in theory, or argue that HTTP, URLs, and hyperlinking are the only truly essential parts of the web.

Perhaps.It is indeed true that one or more of the implementation choices could be replaced, and perhaps that will happen over time. For example, JavaScript might eventually be replaced by another language or technology, HTTP by some other protocol, or HTML by a successor. Yet the web will stay the web, because any successor format is sure to support a superset of functionality, and have the same fundamental structure. The web is, after all, an experiment; the core technologies evolve and grow. But the web is not an accident; its original design reflects truths not just about computing, but about how human beings can connect and interact. The web not only survived but thrived during the virtualization of hosting and content, specifically due to the elegance and effectiveness of this original design.

The key thing to understand is that this grand experiment is not over. The essence of the web will stay, but by building web browsers you have the chance to shape its future.

Real Browser Codebases

So let me tell you what it’s like to contribute to a browser. Some time during my first few months of working on Chrome, I came across the code implementing the<br> tag—look at that, the good old <br> tag, which I’ve used many times to insert newlines into web pages! And the implementation turns out to be barely any code at all, both in Chrome and in this book’s simple browser.

But Chrome as a whole—its features, speed, security, reliability—wow. Thousands of person-years went into it. There is constant pressure to do more—to add more features, to improve performance, to keep up with the “web ecosystem”—for the thousands of businesses, millions of developers,I usually prefer “engineer”—hence the title of this book—but “developer” or “web developer” is much more common on the web. One important reason is that anyone can build a web page—not just trained software engineers and computer scientists. “Web developer” also is more inclusive of additional, critical roles like designers, authors, editors, and photographers. A web developer is anyone who makes web pages, regardless of how. and billions of users on the web.

Working on such a codebase can feel daunting. I often find lines of code last touched 15 years ago by someone I’ve never met; or even now discover files and classes that I never knew existed; or see lines of code that don’t look necessary, yet turn out to be important. What does that 15-year-old code do? What is the purpose of these new-to-me files? Is that code there for a reason?

Every browser has thousands of unfixed bugs, from the smallest of mistakes to myriad mix ups and mismatches. Every browser must be endlessly tuned and optimized to squeeze out that last bit of performance. Every browser requires painstaking work to continuously refactor the code to reduce its complexity, often through the carefulBrowsers are so performance-sensitive that, in many places, merely the introduction of an abstraction—a function call or branching overhead—can have an unacceptable performance cost! introduction of modularization and abstraction.

What makes a browser different from most massive code bases is their urgency. Browsers are nearly as old as any “legacy” codebase, but are not legacy, not abandoned or half-deprecated, not slated for replacement. On the contrary, they are vital to the world’s economy. Browser engineers must therefore fix and improve rather than abandon and replace. And since the character of the web itself is highly decentralized, the use cases met by browsers are to a significant extent not determined by the companies “owning” or “controlling” a particular browser. Other people—including you—can and do contribute ideas, proposals, and implementations.

What’s amazing is that, despite the scale and the pace and the complexity, there is still plenty of room to contribute. Every browser today is open source, which opens up its implementation to the whole community of web developers. Browsers evolve like giant research projects, where new ideas are constantly being proposed and tested out. As you would expect, some features fail and some succeed. The ones that succeed end up in specifications and are implemented by other browsers. Every web browser is open to contributions—whether fixing bugs or proposing new features or implementing promising optimizations.

And it’s worth contributing, because working on web browsers is a lot of fun.

Browser Code Concepts

HTML and CSS are meant to be black boxes—declarative application programming interfaces (APIs)—where one specifies what outcome to achieve, and the browser itself is responsible for figuring out how to achieve it. Web developers don’t, and mostly can’t, draw their web pages’ pixels on their own.

That can make the browser magical or frustrating—depending on whether it is doing the right thing! But that also makes a browser a pretty unusual piece of software, with unique challenges, interesting algorithms, and clever optimizations. Browsers are worth studying for the pure pleasure of it.

What makes that all work is the web browser’s implementations of inversion of control, constraint programming, and declarative programming. The web inverts control, with an intermediary—the browser—handling most of the rendering, and the web developer specifying rendering parameters and content to this intermediary. For example, in HTML there are many built-in form control elements that take care of the various ways the user of a web page can provide input. The developer need only specify parameters such as button names, sizing, and look-and-feel, or JavaScript extension points to handle form submission to the server. The rest of the implementation is taken care of by the browser. Further, these parameters usually take the form of constraints between the relative sizes and positions of on-screen elements instead of specifying their values directly;Constraint programming is clearest during web page layout, where font and window sizes, desired positions and sizes, and the relative arrangement of widgets is rarely specified directly. the browser solves the constraints to find those values. The same idea applies for actions: web pages mostly require that actions take place without specifying when they do. This declarative style means that from the point of view of a developer, changes “apply immediately”, but under the hood, the browser can be lazy and delay applying the changes until they become externally visible, either due to subsequent API calls or because the page has to be displayed to the user.For example, when exactly does the browser compute HTML element styles? Any change to the styles is visible to all subsequent API calls, so in that sense it applies “immediately”. But it is better for the browser to delay style recalculation, avoiding redundant work if styles change twice in quick succession. Maximally exploiting the opportunities afforded by declarative programming makes real-world browsers very complex.

There are practical reasons for the unusual design of a browser. Yes, developers lose some control and agency—when pixels are wrong, developers cannot fix them directly.Loss of control is not necessarily specific to the web—much of computing these days relies on mountains of other people’s code. But they gain the ability to deploy content on the web without worrying about the details, to make that content instantly available on almost every computing device in existence, and to keep it accessible in the future, mostly avoiding software’s inevitable obsolescence.

To me, browsers are where algorithms come to life. A browser contains a rendering engine more complex and powerful than any computer game; a full networking stack; clever data structures and parallel programming techniques; a virtual machine, an interpreted language, and a just-in-time compiler; a world-class security sandbox; and a uniquely dynamic system for storing data.

And the truth is—you use a browser all the time, maybe for reading this book! That makes the algorithms more approachable in a browser than almost anywhere else, because the web is already familiar.

The Role of the Browser

The web is at the center of modern computing. Every year the web expands its reach to more and more of what we do with computers. It now goes far beyond its original use for document-based information sharing: many people now spend their entire day in a browser, not using a single other application! Moreover, desktop applications are now often built and delivered as web apps: web pages loaded by a browser but used like installed applications.Related to the notion of a web app is a Progressive Web App, which is a web app that becomes indistinguishable from a native app through progressive enhancement. Even on mobile devices, apps often embed a browser to render parts of the application user interface (UI).The fraction of such “hybrid” apps that are shown via a “web view” is likely increasing over time. In some markets like China, “super-apps” act like a mobile web browser for web-view-based games and widgets. Perhaps in the future both desktop and mobile devices will largely be containers for web apps. Already, browsers are a critical and indispensable part of computing.

So given this centrality, it’s worth knowing how the web works. And in particular, it’s worth focusing on the browser, which is the user agentThe user agent concept views a computer, or software within the computer, as a trusted assistant and advocate of the human user. and the mediator of the web’s interactions, which ultimately is what makes the web’s principles real. The browser is also the implementer of the web: its sandbox keeps web browsing safe; its algorithms implement the declarative document model; its UI navigates links. Web pages load fast and react smoothly only when the browser is hyper-efficient.

Browsers and You

This book explains how to build a simple browser, one that can—despite its simplicity—display interesting-looking web pages and support many interesting behaviors. As you’ll see, it’s surprisingly easy, and it demonstrates all the core concepts you need to understand a real-world browser. The browser stops being a mystery when it becomes code.

The intention is for you to build your own browser as you work through the early chapters. Once it is up and running, there are endless opportunities to improve performance or add features, some of which are suggested as exercises. Many of these exercises are features implemented in real browsers, and I encourage you to try them—adding features is one of the best parts of browser development!

The book then moves on to details and advanced features that flesh out the architecture of a real browser’s rendering engine, based on my experiences with Chrome. After finishing the book, you should be able to dig into the source code of Chromium, Gecko, or WebKit and understand it without too much trouble.

I hope the book lets you appreciate a browser’s depth, complexity, and power. I hope the book passes along a browser’s beauty—its clever algorithms and data structures, its co-evolution with the culture and history of computing, its centrality in our world. But most of all, I hope the book lets you see in yourself someone building the browser of the future.

History of the Web

This chapter dives into the history of the web itself: where it came from, and how the web and browsers have evolved to date. This history is not exhaustive;For example, there is nothing much about Standard Generalized Markup Language ( SGML) or other predecessors to HTML. (Except in this footnote!) the focus is the key events and ideas that led to the web, and the goals and motivations of its inventors.

The Memex Concept

Figure 1: The original publication of “As We May Think”. (Dunkoman from Wikipedia, CC BY 2.0.)

An influential early exploration of how computers might revolutionize information is a 1945 essay by Vannevar Bush entitled “As We May Think”. This essay envisioned a machine called a memex that helps an individual human see and explore all the information in the world (see Figure 1). It was described in terms of the microfilm screen technology of the time, but its purpose and concept has some clear similarities to the web as we know it today, even if the user interface and technology details differ.

The web is, at its core, organized around the Memex-like goal of representing and displaying information, providing a way for humans to effectively learn and explore. The collective knowledge and wisdom of the species long ago exceeded the capacity of a single mind, organization, library, country, culture, group or language. However, while we as humans cannot possibly know even a tiny fraction of what it is possible to know, we can use technology to learn more efficiently than before, and, in particular, to quickly access information we need to learn, remember, or recall. Consider this imagined research session described by Vannevar Bush—one that is remarkably similar to how we sometimes use the web:

The owner of the memex, let us say, is interested in the origin and properties of the bow and arrow. […] He has dozens of possibly pertinent books and articles in his memex. First he runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together. Thus he goes, building a trail of many items.

Computers, and the internet, allow us to process and store the information we want. But it is the web that helps us organize and find that information, that knowledge, making it useful.Google’s well-known mission statement to “organize the world’s information and make it universally accessible and useful” is almost exactly the same. This is not a coincidence—the search engine concept is inherently connected to the web, and was inspired by the design of the web and its antecedents.

“As We May Think” highlighted two features of the memex: information record lookup, and associations between related records. In fact, the essay emphasizes the importance of the latter—we learn by making previously unknown connections between known things:

When data of any sort are placed in storage, they are filed alphabetically or numerically. […] The human mind does not work that way. It operates by association.

By “association”, Bush meant a trail of thought leading from one record to the next via a human-curated link. He imagined not just a universal library, but a universal way to record the results of what we learn.

The Web Emerges

The concept of hypertext documents linked by hyperlinks was invented in 1964–65 by Project Xanadu, led by Ted Nelson.He was inspired by the long tradition of citation and criticism in academic and literary communities. The Project Xanadu research papers were heavily motivated by this use case. Hypertext is text that is marked up with hyperlinks to other text.A successor called the Hypertext Editing System was the first to introduce the back button, which all browsers now have. Since the system only had text, the “button” was itself text. Sound familiar? A web page is hypertext, and links between web pages are hyperlinks. The format for writing web pages is HTML and the protocol for loading web pages is HTTP, both of which abbreviations contain “HyperText”. See Figure 2 for an example of the early Hypertext Editing System.

Figure 2: A computer operator using the Hypertext Editing System in 1969. (Gregory Lloyd from Wikipedia, CC BY-SA 4.0 International.)

Independently of Project Xanadu, the first hyperlink system appeared for scrolling within a single document; it was later generalized to linking between multiple documents. And just like those original systems, the web has linking within documents as well as between them. For example, the URL http://browser.engineering/history.html#the-web-emerges refers to a document called “history.html”, and specifically to the element in it with the name “the-web-emerges”: this section. Visiting that URL will load this chapter and scroll to this section.

This work also formed and inspired one of the key parts of Douglas Engelbart’s mother of all demos, perhaps the most influential technology demonstration in the history of computing (see Figure 3). That demo not only showcased the key concepts of the web, but also introduced the computer mouse and graphical user interface, both of which are central components of a browser UI.That demo went beyond even this. There are some parts of it that have not yet been realized in any computer system. Watch it!

Figure 3: Doug Engelbart presenting the mother of all demos. (SRI International, via the Doug Engelbart Institute.)

There is of course a very direct connection between this research and the document–URL–hyperlink setup of the web, which built on the hypertext idea and applied it in practice. The HyperTIES system, for example, had highlighted hyperlinks and was used to develop the world’s first electronically published academic journal, the 1988 issue of the Communications of the ACM. Tim Berners-Lee cites that 1988 issue as inspiration for the World Wide Web,Nowadays the World Wide Web is called just “the web”, or “the web ecosystem”—ecosystem being another way to capture the same concept as “World Wide”. The original wording lives on in the “www” in many website domain names. in which he joined the link concept with the availability of the internet, thus realizing many of the original goals of all this work from previous decades.Just as the web itself is a realization of previous ambitions and dreams, today we strive to realize the vision laid out by the web. (No, it’s not done yet!)

The word “hyperlink” may have been coined in 1987, in connection with the HyperCard system on Apple computers. This system was also one of the first, or perhaps the first, to introduce the concept of augmenting hypertext with scripts that handle user events like clicks and perform actions that enhance the UI—just like JavaScript on a web page! It also had graphical UI elements, not just text, unlike most predecessors.

In 1989–1990, the first web browser (named WorldWideWeb, see Figure 4) and web server (named httpd, for HTTP Daemon, according to UNIX naming conventions) were born, written by Tim Berners-Lee. Interestingly, while that browser’s capabilities were in some ways inferior to the browser you will implement in this book,No CSS! No JS! Not even images! in other ways they go beyond the capabilities available even in modern browsers.For example, the first browser included the concept of an index page meant for searching within a site (vestiges of which exist today in the “index.html” convention when a URL path ends in /”), and had a WYSIWYG web page editor (the “contenteditable” HTML attribute on DOM elements (see Chapter 16) have similar semantic behavior, but built-in file saving is gone). Today, the index is replaced with a search engine, and web page editors as a concept are somewhat obsolete due to the highly dynamic nature of today’s web page rendering. On December 20, 1990 the first web page was created. The browser we will implement in this book is easily able to render this web page, even today.Also, as you can see clearly, that web page has not been updated in the meantime, and retains its original aesthetics! In 1991, Berners-Lee advertised his browser and the concept on the alt.hypertext Usenet group.

Figure 4: Screenshot of the WorldWideWeb browser. (Communications of the ACM, August 1994.)

Berners-Lee’s Brief History of the Web highlights a number of other key factors that led to the World Wide Web becoming the web we know today. One key factor was its decentralized nature, which he describes as arising from the academic culture of CERN, where he worked. The decentralized nature of the web is a key feature that distinguishes it from many systems that came before or after, and his explanation of it is worth quoting here (the italics are mine):

There was clearly a need for something like EnquireEnquire was a predecessor web-like database system, also written by Berners-Lee. but accessible to everyone. I wanted it to scale so that if two people started to use it independently, and later started to work together, they could start linking together their information without making any other changes. This was the concept of the web.

This quote captures one of the key value propositions of the web: its decentralized nature. The web was successful for several reasons, but they all had to do with decentralization:

Browsers

The first widely distributed browser may have been ViolaWWW (see Figure 5); this browser also pioneered multiple interesting features such as applets and images. It was in turn the inspiration for NCSA Mosaic (see Figure 6), which launched in 1993. One of the two original authors of Mosaic went on to co-found Netscape, which built Netscape Navigator (see Figure 7), the first commercial browser,By commercial I mean built by a for-profit entity. Netscape’s early versions were also not free software—you had to buy them from a store. They cost about $50. which launched in 1994. Feeling threatened, Microsoft launched Internet Explorer (see Figure 8) in 1995 and soon bundled it with Windows 95.

Figure 5: ViolaWWW. (Viola in a Nutshell.)
Figure 6: Mosaic. (Wikipedia, CC0 1.0.)
Figure 7: Netscape Navigator 1.22. (Wikipedia.)
Figure 8: Internet Explorer 1.0. (Wikipedia, used with permission from Microsoft.)

The era of the “first browser war” ensued: a competition between Netscape Navigator and Internet Explorer. There were also other browsers with smaller market shares; one notable example is Opera. The WebKit project began in 1999; Safari and Chromium-based browsers, such as Chrome and newer versions of Edge, descend from this codebase. Likewise, the Gecko rendering engine was originally developed by Netscape starting in 1997; the Firefox browser is descended from that codebase. During the first browser war, nearly all of the core features of this book’s simple browser were added, including CSS, DOM, and JavaScript.

The “second browser war”, which according to Wikipedia was 2004–2017, was fought between a variety of browsers, in particular Internet Explorer, Firefox, Safari, and Chrome. Initially, Safari and Chrome used the same rendering engine, but Chrome forked into Blink in 2013, which Microsoft Edge adopted by 2020. The second browser war saw the development of many features of the modern web, including widespread use of AJAXAsynchronous JavaScript and XML, where XML stands for eXtensible Markup Language., HTML5 features like <canvas>, and a huge explosion in third-party JavaScript libraries and frameworks.

Web Standards

In parallel with these developments was another, equally important, one—the standardization of web APIs. In October 1994, the World Wide Web Consortium (W3C) was founded to provide oversight and standards for web features. Prior to this point, browsers would often introduce new HTML elements or APIs, and competing browsers would have to copy them. With a standards organization, those elements and APIs could subsequently be agreed upon and documented in specifications. (These days, an initial discussion, design, and specification precedes any new feature.) Later on, the HTML specification ended up moving to a different standards body called the WHATWG, but CSS and other features are still standardized at the W3C. JavaScript is standardized at yet another standards body, TC39 (Technical Committee 39) at ECMA. HTTP is standardized by the IETF. The important point is that the standards process set up in the mid-1990s is still with us.

In the first years of the web, it was not so clear that browsers would remain standard and that one browser might not end up “winning” and becoming another proprietary software platform. There are multiple reasons this didn’t happen, among them the egalitarian ethos of the computing community and the presence and strength of the W3C. Another important reason was the networked nature of the web, and therefore the necessity for web developers to make sure their pages worked correctly in most or all of the browsers (otherwise they would lose customers), leading them to avoid proprietary extensions. On the contrary, browsers worked hard to carefully reproduce each other’s undocumented behaviors—even bugs—to make sure they continued supporting the whole web.

There never really was a point where any browser openly attempted to break away from the standard, despite fears that that might happen.Perhaps the closest the web came to fragmenting was with the late-1990s introduction of features for DHTML—early versions of the Document Object Model you’ll learn about in this book. Netscape and Internet Explorer at first had incompatible implementations of these features, and it took years, the development of a common specification, and significant pressure campaigns on the browsers before standardization was achieved. You can read about this story in much more depth from Jay Hoffman. Instead, intense competition for market share was channeled into very fast innovation and an ever-expanding set of APIs and capabilities for the web, which we nowadays refer to as the web platform, not just the “World Wide Web”. This recognizes the fact that the web is no longer a document viewing mechanism, but has evolved into a fully realized computing platform and ecosystem.There have even been operating systems built around the web! Examples include webOS, which powered some Palm smartphones, Firefox OS (that today lives on in KaiOS-based phones), and ChromeOS, which is a desktop operating system. All of these operating systems are based on using the web as the UI layer for all applications, with some JavaScript-exposed APIs on top for system integration.

Given the outcomes—multiple competing browsers and well-developed standards—it is in retrospect not that relevant which browser “won” or “lost” each of the browser “wars”. In each case the web won, because it gained users and grew in capability.

Open Source

Another important and interesting outcome of the second browser war was that all mainstream browsers todayExamples of Chromium-based browsers include Chrome, Edge, Opera (which switched to Chromium from the Presto engine in 2013), Samsung Internet, Yandex Browser, UC Browser, and Brave. In addition, there are many “embedded” browsers, based on one or another of the three engines, for a wide variety of automobiles, phones, TVs, and other electronic devices. are based on three open-source web rendering / JavaScript engines: Chromium, Gecko, and WebKit.The JavaScript engines are actually in different repositories (as are various other subcomponents), and can and do get used outside the browser as JavaScript virtual machines. One important application is the use of V8 to power node.js. However, each of the three rendering engines does have a corresponding JavaScript implementation, so conflating the two is reasonable. Since Chromium and WebKit have a common ancestral codebase, while Gecko is an open-source descendant of Netscape, all three date back to the 1990s—almost to the beginning of the web.

This is not an accident, and in fact tells us something quite interesting about the most cost-effective way to implement a rendering engine based on a commodity set of platform APIs. For example, it’s common for independent developers, not paid by the company nominally controlling the browser, to contribute code and features. There are even companies and individuals that specialize in implementing browser features! It’s also common for features in one browser to copy code from another. And every major browser being open source feeds back into the standards process, reinforcing the web’s decentralized nature.

Summary

In summary, the history went like this:

  1. Basic research was performed into ways to represent and explore information.

  2. Once the necessary technology became mature enough, the web proper was proposed and implemented.

  3. The web became popular quite quickly, and many browsers appeared in order to capitalize on the web’s opportunity.

  4. Standards organizations were introduced in order to negotiate between the browsers and avoid proprietary control.

  5. Competition between browsers grew their power and complexity at a rapid pace.

  6. Browsers appeared on all devices and operating systems, from desktop to mobile to embedded.

  7. Eventually, all web rendering engines became open source, as a recognition of their being a shared effort larger than any single entity.

The web has come a long way! But one thing seems clear: it isn’t done yet.

Exercises

iii-1 What comes next? Based on what you learned about how the web came about and took its current form, what trends do you predict for its future evolution? For example, do you think it’ll compete effectively against other non-web technologies and platforms?

iii-2 What became of the original ideas? The way the web works in practice is significantly different than the memex; one key difference is that there is no built-in way for the user of the web to add links between pages or notate them. Why do you think this is? Can you think of other goals from the original work that remain unrealized?

Downloading Web Pages

A web browser displays information identified by a URL. And the first step is to use that URL to connect to and download information from a server somewhere on the internet.

Connecting to a Server

Browsing the internet starts with a URL,“URL” stands for “uniform resource locator”, meaning that it is a portable (uniform) way to identify web pages (resources) and also that it describes how to access those files (locator). a short string that identifies a particular web page that the browser should visit.

http://example.org/index.html

Figure 1: The syntax of URLs.

A URL has three parts (see Figure 1): the scheme explains how to get the information; the host name explains where to get it; and the path explains what information to get. There are also optional parts to the URL, like ports, queries, and fragments, which we’ll see later.

From a URL, the browser can start the process of downloading the web page. The browser first asks the local operating system (OS) to put it in touch with the server described by the host name. The OS then talks to a Domain Name System (DNS) server which convertsYou can use a DNS lookup tool like nslookup.io or the dig command to do this conversion yourself. a host name like example.org into a destination IP address like 93.184.216.34.Today there are two versions of IP (Internet Protocol): IPv4 and IPv6. IPv6 addresses are a lot longer and are usually written in hexadecimal, but otherwise the differences don’t matter here. Then the OS decides which hardware is best for communicating with that destination IP address (say, wireless or wired) using what is called a routing table, and then uses device drivers to send signals over a wire or over the air.I’m skipping steps here. On wires you first have to wrap communications in ethernet frames, on wireless you have to do even more. I’m trying to be brief. Those signals are picked up and transmitted by a series of routersOr a switch, or an access point; there are a lot of possibilities, but eventually there is a router. which each choose the best direction to send your message so that it eventually gets to the destination.They may also record where the message came from so they can forward the reply back. When the message reaches the server, a connection is created. Anyway, the point of this is that the browser tells the OS, “Hey, put me in touch with example.org”, and it does.

On many systems, you can set up this kind of connection using the telnet program, like this:The “80” is the port, discussed below.

telnet example.org 80

(Note: When you see a gray outline, it means that the code in question is an example only, and not actually part of our browser’s code.)

You might need to install telnet; it is often disabled by default. On Windows, go to Programs and Features / Turn Windows features on or off in the Control Panel; you’ll need to reboot. When you run it, it’ll clear the screen instead of printing something, but other than that works normally. On macOS, you can use the nc -v command as a replacement for telnet:

nc -v example.org 80

The output is a little different but it works in the same way. On most Linux systems, you can install telnet or nc from the package manager, usually from packages called telnet and netcat.

You’ll get output that looks like this:

Trying 93.184.216.34...
Connected to example.org.
Escape character is '^]'.

This means that the OS converted the host name example.org into the IP address 93.184.216.34 and was able to connect to it.The line about escape characters is just instructions for using obscure telnet features. You can now talk to example.org.

The URL syntax is defined in RFC 3987, whose first author is Tim Berners-Lee—no surprise there! The second author is Roy Fielding, a key contributor to the design of HTTP and also well known for describing the Representational State Transfer (REST) architecture of the web in his Ph.D. thesis, which explains how REST allowed the web to grow in a decentralized way. Today, many services provide “RESTful APIs” that also follow these principles, though there does seem to be some confusion about it.

Requesting Information

Once it’s connected, the browser requests information from the server by giving its path, the path being the part of a URL that comes after the host name, like /index.html. The structure of the request is shown in Figure 2. Type this into telnet to try it.

GET /index.html HTTP/1.0
Host: example.org

Figure 2: An annotated HTTP GET request.

Here, the word GET means that the browser would like to receive information,It could say POST if it intended to send information, plus there are some other, more obscure, options. then comes the path, and finally there is the word HTTP/1.0 which tells the host that the browser speaks version 1.0 of HTTP. There are several versions of HTTP (0.9, 1.0, 1.1, 2.0, and 3.0). The HTTP 1.1 standard adds a variety of useful features, like keep-alive, but in the interest of simplicity our browser won’t use them. We’re also not implementing HTTP 2.0; it is much more complex than the 1.x series, and is intended for large and complex web applications, which our browser can’t run anyway.

After the first line, each line contains a header, which has a name (like Host) and a value (like example.org). Different headers mean different things; the Host header, for example, tells the server who you think it is.This is useful when the same IP address corresponds to multiple host names and hosts multiple websites (for example, example.com and example.org). The Host header tells the server which of multiple websites you want. These websites basically require the Host header to function properly. Hosting multiple domains on a single computer is very common. There are lots of other headers one could send, but let’s stick to just Host for now.

Finally, after the headers comes a single blank line; that tells the host that you are done with headers. So type a blank line into telnet (hit Enter twice after typing the two lines of the request) and you should get a response from example.org.

HTTP/1.0 is standardized in RFC 1945, and HTTP/1.1 in RFC 2616. HTTP was designed to be simple to understand and implement, making it easy for any kind of computer to adopt it. It’s no coincidence that you can type HTTP directly into telnet! Nor is it an accident that HTTP is a “line-based protocol”, using plain text and newlines, similar to the Simple Mail Transfer Protocol (SMTP) for email. Ultimately, the whole pattern derives from early computers only having line-based text input. In fact, one of the first two browsers had a line-mode UI.

The Server’s Response

The server’s response starts with the line in Figure 3.

HTTP/1.0 200 OK

Figure 3: Annotated first line of an HTTP response.

This tells you that the host confirms that it, too, speaks HTTP/1.0, and that it found your request to be “OK” (which has a numeric code of 200). You may be familiar with 404 Not Found; that’s another numeric code and response, as are 403 Forbidden or 500 Server Error. There are lots of these codes, and they have a pretty neat organization scheme:The status text like OK can actually be anything and is just there for humans, not for machines.

Note the genius of having two sets of error codes (400s and 500s) to tell you who is at fault, the server or the browser.More precisely, who the server thinks is at fault. You can find a full list of the different codes on Wikipedia, and new ones do get added here and there.

After the 200 OK line, the server sends its own headers. When I did this, I got these headers (but yours will differ):

Age: 545933
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Mon, 25 Feb 2019 16:49:28 GMT
Etag: "1541025663+gzip+ident"
Expires: Mon, 04 Mar 2019 16:49:28 GMT
Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
Server: ECS (sec/96EC)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1270
Connection: close

There is a lot here, about the information you are requesting (Content-Type, Content-Length, and Last-Modified), about the server (Server, X-Cache), about how long the browser should cache this information (Cache-Control, Expires, Etag), and about all sorts of other stuff. Let’s move on for now.

After the headers there is a blank line followed by a bunch of HTML code. This is called the body of the server’s response, and your browser knows that it is HTML because of the Content-Type header, which says that it is text/html. It’s this HTML code that contains the content of the web page itself.

The HTTP request/response transaction is summarized in Figure 4. Let’s now switch gears from making manual connections to Python.

Figure 4: An HTTP request and response pair are how a web browser gets web pages from a web server.

Wikipedia has nice lists of HTTP headers and response codes. Some of the HTTP response codes are almost never used, like 402 “Payment Required”. This code was intended to be used for “digital cash or (micro) payment systems”. While e-commerce is alive and well without the response code 402, micropayments have not (yet?) gained much traction, even though many people (including me!) think they are a good idea.

Telnet in Python

So far we’ve communicated with another computer using telnet. But it turns out that telnet is quite a simple program, and we can do the same programmatically. It’ll require extracting the host name and path from the URL, creating a socket, sending a request, and receiving a response.In Python, there’s a library called urllib.parse for parsing URLs, but I think implementing our own will be good for learning. Plus, it makes this book less Python-specific.

Let’s start with parsing the URL. I’m going to make parsing a URL return a URL object, and I’ll put the parsing code into the constructor:

class URL:
    def __init__(self, url):
        # ...

The __init__ method is Python’s peculiar syntax for class constructors, and the self parameter, which you must always make the first parameter of any method, is Python’s analog of this in C++ or Java.

Let’s start with the scheme, which is separated from the rest of the URL by ://. Our browser only supports http, so let’s check that, too:

class URL:
    def __init__(self, url):
        self.scheme, url = url.split("://", 1)
        assert self.scheme == "http"

Now we must separate the host from the path. The host comes before the first /, while the path is that slash and everything after it:

class URL:
    def __init__(self, url):
        # ...
        if "/" not in url:
            url = url + "/"
        self.host, url = url.split("/", 1)
        self.path = "/" + url

(When you see a code block with a # ..., like this one, that means you’re adding code to an existing method or block.) The split(s, n) method splits a string at the first n copies of s. Note that there’s some tricky logic here for handling the slash between the host name and the path. That (optional) slash is part of the path.

Now that the URL has the host and path fields, we can download the web page at that URL. We’ll do that in a new method, request:

class URL:
    def request(self):
        # ...

Note that you always need to write the self parameter for methods in Python. In the future, I won’t always make such a big deal out of defining a method—if you see a code block with code in a method or function that doesn’t exist yet, that means we’re defining it.

The first step to downloading a web page is connecting to the host. The operating system provides a feature called “sockets” for this. When you want to talk to other computers (either to tell them something, or to wait for them to tell you something), you create a socket, and then that socket can be used to send information back and forth. Sockets come in a few different kinds, because there are multiple ways to talk to other computers:

By picking all of these options, we can create a socket like so:While this code uses the Python socket library, your favorite language likely contains a very similar library; the API is basically standardized. In Python, the flags we pass are defaults, so you can actually call socket.socket(); I’m keeping the flags here in case you’re following along in another language.

import socket

class URL:
    def request(self):
        s = socket.socket(
            family=socket.AF_INET,
            type=socket.SOCK_STREAM,
            proto=socket.IPPROTO_TCP,
        )

Once you have a socket, you need to tell it to connect to the other computer. For that, you need the host and a port. The port depends on the protocol you are using; for now it should be 80.

class URL:
    def request(self):
        # ...
        s.connect((self.host, 80))

This talks to example.org to set up the connection and prepare both computers to exchange data.

Naturally this won’t work if you’re offline. It also might not work if you’re behind a proxy, or in a variety of more complex networking environments. The workaround will depend on your setup—it might be as simple as disabling your proxy, or it could be much more complex.

Note that there are two parentheses in the connect call: connect takes a single argument, and that argument is a pair of a host and a port. This is because different address families have different numbers of arguments.

The “sockets” API, which Python more or less implements directly, derives from the original “Berkeley sockets” API design for 4.2 BSD Unix in 1983. Of course, Windows and Linux merely reimplement the API, but macOS and iOS actually do still use large amounts of code descended from BSD Unix.

Request and Response

Now that we have a connection, we make a request to the other server. To do so, we send it some data using the send method:

class URL:
    def request(self):
        # ...
        request = "GET {} HTTP/1.0\r\n".format(self.path)
        request += "Host: {}\r\n".format(self.host)
        request += "\r\n"
        s.send(request.encode("utf8"))

The send method just sends the request to the server.send actually returns a number, in this case 47. That tells you how many bytes of data you sent to the other computer; if, say, your network connection failed midway through sending the data, you might want to know how much you sent before the connection failed. There are a few things in this code that have to be exactly right. First, it’s very important to use \r\n instead of \n for newlines. It’s also essential that you put two \r\n newlines at the end, so that you send that blank line at the end of the request. If you forget that, the other computer will keep waiting on you to send that newline, and you’ll keep waiting on its response.Computers are endlessly literal-minded.

Also note the encode call. When you send data, it’s important to remember that you are sending raw bits and bytes; they could form text or an image or video. But a Python string is specifically for representing text. The encode method converts text into bytes, and there’s a corresponding decode method that goes the other way.When you call encode and decode you need to tell the computer what character encoding you want it to use. This is a complicated topic. I’m using utf8 here, which is a common character encoding and will work on many pages, but in the real world you would need to be more careful. Python reminds you to be careful by giving different types to text and to bytes:

>>> type("text")
<class 'str'>
>>> type("text".encode("utf8"))
<class 'bytes'>

If you see an error about str versus bytes, it’s because you forgot to call encode or decode somewhere.

To read the server’s response, you could use the read function on sockets, which gives whatever bits of the response have already arrived. Then you write a loop to collect those bits as they arrive. However, in Python you can use the makefile helper function, which hides the loop:If you’re in another language, you might only have socket.read available. You’ll need to write the loop, checking the socket status, yourself.

class URL:
    def request(self):
        # ...
        response = s.makefile("r", encoding="utf8", newline="\r\n")

Here, makefile returns a file-like object containing every byte we receive from the server. I am instructing Python to turn those bytes into a string using the utf8 encoding, or method of associating bytes to letters.Hard-coding utf8 is not correct, but it’s a shortcut that will work alright on most English-language websites. In fact, the Content-Type header usually contains a charset declaration that specifies the encoding of the body. If it’s absent, browsers still won’t default to utf8; they’ll guess, based on letter frequencies, and you will see ugly � strange áççêñ£ß when they guess wrong. I’m also informing Python of HTTP’s weird line endings.

Let’s now split the response into pieces. The first line is the status line:I could have asserted that 200 is required, since that’s the only code our browser supports, but it’s better to just let the browser render the returned body, because servers will generally output a helpful and user-readable HTML error page even for error codes. This is another way in which the web is easy to implement incrementally.

class URL:
    def request(self):
        # ...
        statusline = response.readline()
        version, status, explanation = statusline.split(" ", 2)

Note that I do not check that the server’s version of HTTP is the same as mine; this might sound like a good idea, but there are a lot of misconfigured servers out there that respond in HTTP 1.1 even when you talk to them in HTTP 1.0.Luckily the protocols are similar enough to not cause confusion.

After the status line come the headers:

class URL:
    def request(self):
        # ...
        response_headers = {}
        while True:
            line = response.readline()
            if line == "\r\n": break
            header, value = line.split(":", 1)
            response_headers[header.casefold()] = value.strip()

For the headers, I split each line at the first colon and fill in a map of header names to header values. Headers are case-insensitive, so I normalize them to lower case.I used casefold instead of lower, because it works better for more languages. Also, whitespace is insignificant in HTTP header values, so I strip off extra whitespace at the beginning and end.

Headers can describe all sorts of information, but a couple of headers are especially important because they tell us that the data we’re trying to access is being sent in an unusual way. Let’s make sure none of those are present.Exercise 1-9 describes how your browser should handle these headers if they are present.

class URL:
    def request(self):
        # ...
        assert "transfer-encoding" not in response_headers
        assert "content-encoding" not in response_headers

The usual way to get the sent data, then, is everything after the headers:

class URL:
    def request(self):
        # ...
        content = response.read()
        s.close()

It’s the body that we’re going to display, so let’s return that:

class URL:
    def request(self):
        # ...
        return content

Now let’s actually display the text in the response body.

The Content-Encoding header lets the server compress web pages before sending them. Large, text-heavy web pages compress well, and as a result the page loads faster. The browser needs to send an Accept-Encoding header in its request to list the compression algorithms it supports. Transfer-Encoding is similar and also allows the data to be “chunked”, which many servers seem to use together with compression.

Displaying the HTML

The HTML code in the response body defines the content you see in your browser window when you go to http://example.org/index.html. I’ll be talking much, much more about HTML in future chapters, but for now let me keep it very simple.

In HTML, there are tags and text. Each tag starts with a < and ends with a >; generally speaking, tags tell you what kind of thing some content is, while text is the actual content.That said, some tags, like img, are content, not information about it. Most tags come in pairs of a start and an end tag; for example, the title of the page is enclosed in a pair of tags: <title> and </title>. Each tag, inside the angle brackets, has a tag name (like title here), and then optionally a space followed by attributes, and its pair has a / followed by the tag name (and no attributes).

So, to create our very, very simple web browser, let’s take the page HTML and print all the text, but not the tags, in it.If this example causes Python to produce a SyntaxError pointing to the end on the last line, it is likely because you are running Python 2 instead of Python 3. Make sure you are using Python 3. I’ll do this in a new function, show:Note that this is a global function and not in the URL class.

def show(body):
    in_tag = False
    for c in body:
        if c == "<":
            in_tag = True
        elif c == ">":
            in_tag = False
        elif not in_tag:
            print(c, end="")

This code is pretty complex. It goes through the request body character by character, and it has two states: in_tag, when it is currently between a pair of angle brackets, and not in_tag. When the current character is an angle bracket, it changes between those states; normal characters, not inside a tag, are printed.The end argument tells Python not to print a newline after the character, which it otherwise would.

We can now load a web page just by stringing together request and show:Like show, this is a global function.

def load(url):
    body = url.request()
    show(body)

Add the following code to run load from the command line:

if __name__ == "__main__":
    import sys
    load(URL(sys.argv[1]))

The first line is Python’s version of a main function, run only when executing this script from the command line. The code reads the first argument (sys.argv[1]) from the command line and uses it as a URL. Try running this code on the URL http://example.org/:

python3 browser.py http://example.org/

You should see some short text welcoming you to the official example web page. You can also try using it on this chapter!

HTML, just like URLs and HTTP, is designed to be very easy to parse and display at a basic level. And in the beginning there were very few features in HTML, so it was possible to code up something not so much more fancy than what you see here, yet still display the content in a usable way. Even our super simple and basic HTML parser can already print out the text of the browser.engineering website.

Encrypted Connections

So far, our browser supports the http scheme. That’s a pretty common scheme. But more and more websites are migrating to the https scheme, and many websites require it.

The difference between http and https is that https is more secure—but let’s be a little more specific. The https scheme, or more formally HTTP over TLS (Transport Layer Security), is identical to the normal http scheme, except that all communication between the browser and the host is encrypted. There are quite a few details to how this works: which encryption algorithms are used, how a common encryption key is agreed to, and of course how to make sure that the browser is connecting to the correct host. The difference in the protocol layers involved is shown in Figure 5.

Figure 5: The difference between HTTP and HTTPS is the addition of a TLS layer.

Luckily, the Python ssl library implements all of these details for us, so making an encrypted connection is almost as easy as making a regular connection. That ease of use comes with accepting some default settings which could be inappropriate for some situations, but for teaching purposes they are fine.

Making an encrypted connection with ssl is pretty easy. Suppose you’ve already created a socket, s, and connected it to example.org. To encrypt the connection, you use ssl.create_default_context to create a context ctx and use that context to wrap the socket s:

import ssl
ctx = ssl.create_default_context()
s = ctx.wrap_socket(s, server_hostname=host)

Note that wrap_socket returns a new socket, which I save back into the s variable. That’s because you don’t want to send any data over the original socket; it would be unencrypted and also confusing. The server_hostname argument is used to check that you’ve connected to the right server. It should match the Host header.

On macOS, you’ll need to run a program called “Install Certificates” before you can use Python’s ssl package on most websites.

Let’s try to take this code and add it to request. First, we need to detect which scheme is being used:

import ssl

class URL:
    def __init__(self, url):
        self.scheme, url = url.split("://", 1)
        assert self.scheme in ["http", "https"]
        # ...

(Note that here you’re supposed to replace the existing scheme parsing code with this new code. It’s usually clear from context, and the code itself, what you need to replace.)

Encrypted HTTP connections usually use port 443 instead of port 80:

class URL:
    def __init__(self, url):
        # ...
        if self.scheme == "http":
            self.port = 80
        elif self.scheme == "https":
            self.port = 443

We can use that port when creating the socket:

class URL:
    def request(self):
        # ...
        s.connect((self.host, self.port))
        # ...

Next, we’ll wrap the socket with the ssl library:

class URL:
    def request(self):
        # ...
        s.connect((self.host, self.port))
        if self.scheme == "https":
            ctx = ssl.create_default_context()
            s = ctx.wrap_socket(s, server_hostname=self.host)
        # ...

Your browser should now be able to connect to HTTPS sites.

While we’re at it, let’s add support for custom ports, which are specified in a URL by putting a colon after the host name, as in Figure 6.

http://example.org:8080/index.html

Figure 6: Where the port goes in a URL.

If the URL has a port we can parse it out and use it:

class URL:
    def __init__(self, url):
        # ...
        if ":" in self.host:
            self.host, port = self.host.split(":", 1)
            self.port = int(port)

Custom ports are handy for debugging. Python has a built-in web server you can use to serve files on your computer. For example, if you run

python3 -m http.server 8000 -d /some/directory

then going to http://localhost:8000/ should show you all the files in that directory. This is a good way to test your browser.

TLS is pretty complicated. You can read the details in RFC 8446, but implementing your own is not recommended. It’s very difficult to write a custom TLS implementation that is not only correct but secure.

At this point you should be able to run your program on any web page. Here is what it should output for a simple example:


  
    This is a simple
    web page with some
    text in it.
  

Summary

This chapter went from an empty file to a rudimentary web browser that can:

Yes, this is still more of a command-line tool than a web browser, but it already has some of the core capabilities of a browser.

Outline

The complete set of functions, classes, and methods in our browser should look something like this:

class URL: def __init__(url) def request() def show(body) def load(url)

Exercises

1-1 HTTP/1.1. Along with Host, send the Connection header in the request function with the value close. Your browser can now declare that it is using HTTP/1.1. Also add a User-Agent header. Its value can be whatever you want—it identifies your browser to the host. Make it easy to add further headers in the future.

1-2 File URLs. Add support for the file scheme, which allows the browser to open local files. For example, file:///path/goes/here should refer to the file on your computer at location /path/goes/here. Also make it so that, if your browser is started without a URL being given, some specific file on your computer is opened. You can use that file for quick testing.

1-3 data. Yet another scheme is data, which allows inlining HTML content into the URL itself. Try navigating to data:text/html,Hello world! in a real browser to see what happens. Add support for this scheme to your browser. The data scheme is especially convenient for making tests without having to put them in separate files.

1-4 Entities. Implement support for the less-than (&lt;) and greater-than (&gt;) entities. These should be printed as < and >, respectively. For example, if the HTML response was &lt;div&gt;, the show method of your browser should print <div>. Entities allow web pages to include these special characters without the browser interpreting them as tags.

1-5 view-source. Add support for the view-source scheme; navigating to view-source:http://example.org/ should show the HTML source instead of the rendered page. Add support for this scheme. Your browser should print the entire HTML file as if it was text. You’ll want to have also implemented Exercise 1-4.

1-6 Keep-alive. Implement Exercise 1-1; however, do not send the Connection: close header (send Connection: keep-alive instead). When reading the body from the socket, only read as many bytes as given in the Content-Length header and don’t close the socket afterward. Instead, save the socket, and if another request is made to the same server reuse the same socket instead of creating a new one. (You’ll also need to pass the "rb" option to makefile or the value reported by Content-Length might not match the length of the string you’re reading.) This will speed up repeated requests to the same server, which are common.

1-7 Redirects. Error codes in the 300 range request a redirect. When your browser encounters one, it should make a new request to the URL given in the Location header. Sometimes the Location header is a full URL, but sometimes it skips the host and scheme and just starts with a / (meaning the same host and scheme as the original request). The new URL might itself be a redirect, so make sure to handle that case. You don’t, however, want to get stuck in a redirect loop, so make sure to limit how many redirects your browser can follow in a row. You can test this with the URL http://browser.engineering/redirect, which redirects back to this page, and its /redirect2 and /redirect3 cousins which do more complicated redirect chains.

1-8 Caching. Typically, the same images, styles, and scripts are used on multiple pages; downloading them repeatedly is a waste. It’s generally valid to cache any HTTP response, as long as it was requested with GET and received a 200 response.Some other status codes like 301 and 404 can also be cached. Implement a cache in your browser and test it by requesting the same file multiple times. Servers control caches using the Cache-Control header. Add support for this header, specifically for the no-store and max-age values. If the Cache-Control header contains any value other than these two, it’s best not to cache the response.

1-9 Compression. Add support for HTTP compression, in which the browser informs the server that compressed data is acceptable. Your browser must send the Accept-Encoding header with the value gzip. If the server supports compression, its response will have a Content-Encoding header with value gzip. The body is then compressed. Add support for this case. To decompress the data, you can use the decompress method in the gzip module. GZip data is not utf8-encoded, so pass "rb" to makefile to work with raw bytes instead. Most web servers send compressed data in a Transfer-Encoding called chunked.There are also a couple of Transfer-Encodings that compress the data. They aren’t commonly used. You’ll need to add support for that, too.

Drawing to the Screen

A web browser doesn’t just download a web page; it also has to show that page to the user. In the twenty-first century, that means a graphical application.There are some obscure text-based browsers: I used w3m as my main browser for most of 2011. I don’t anymore. So in this chapter we’ll equip our browser with a graphical user interface.

Creating Windows

Desktop and laptop computers run operating systems that provide desktop environments: windows, buttons, and a mouse. So responsibility ends up split: programs control their windows, but the desktop environment controls the screen. Therefore:

Doing all of this by hand is a bit of a drag, so programs usually use a graphical toolkit to simplify these steps. Python comes with a graphical toolkit called Tk in the Python package tkinter.The library is called Tk, and it was originally written for a different language called Tcl. Python contains an interface to it, hence the name. Using it is quite simple:

import tkinter
window = tkinter.Tk()
tkinter.mainloop()

Here, tkinter.Tk() asks the desktop environment to create a window and returns an object that you can use to draw to the window. The tkinter.mainloop() call enters a loop that looks like this:This pseudocode may look like an infinite loop that locks up the computer, but it’s not. Either the operating system will multitask among threads and processes, or the pendingEvents call will sleep until events are available, or both; in any case, other code will run and create events for the loop to respond to.

while True:
    for evt in pendingEvents():
        handleEvent(evt)
    drawScreen()
Figure 1: Flowchart of an event-handling cycle.

Here, pendingEvents first asks the desktop environment for recent mouse clicks or key presses, then handleEvent calls your application to update state, and then drawScreen redraws the window. This event loop pattern (see Figure 1) is common in many applications, from web browsers to video games, because in complex graphical applications it ensures that all events are eventually handled and the screen is eventually updated.

Although you’re probably writing your browser on a desktop computer, many people access the web through mobile devices such as phones or tablets. On mobile devices there’s still a screen, a rendering loop, and most other things discussed in this book.For example, most real browsers have both desktop and mobile editions, and the rendering engine code is almost exactly the same for both.

But there are several differences worth noting. Applications are usually full-screen, with only one application drawing to the screen at a time. There’s no mouse and only a virtual keyboard, so the main form of interaction is touch. There is the concept of a “visual viewport” that is not present on a desktop, to accommodate “desktop-only” and “mobile-ready” sites, as well as pinch zoom.Look at the source of this chapter’s webpage. In the <head> you’ll see a “viewport” <meta> tag. This tag tells the browser that the page supports mobile devices; without it, the browser assumes that the site is “desktop-only” and renders it differently, such as allowing the user to use a pinch-zoom or double-tap gesture to focus in on one part of the page. Once zoomed in, the part of the page visible on the screen is the “visual viewport” and the whole documents’ bounds are the “layout viewport”. This is kind of a mix between zooming and scrolling that’s usually absent on desktop. And screen pixel density is much higher, but the total screen resolution is usually lower. Supporting all of these differences is doable, but quite a bit of work. This book won’t go further into implementing them, except in some cases as exercises.

Also, power efficiency is much more important, because the device runs on a battery, while at the same time the central processing unit (CPU) and memory are significantly slower and less capable. That makes it much more important to take advantage of any graphical processing unit (GPU)—the slow CPU makes good performance harder to achieve. Mobile browsers are challenging!

Drawing to the Window

Our browser will draw the web page text to a canvas, a rectangular Tk widget that you can draw circles, lines, and text on. For example, you can create a canvas with Tk like this:You may be familiar with the HTML <canvas> element, which is a similar idea: a two-dimensional rectangle in which you can draw shapes.

window = tkinter.Tk()
canvas = tkinter.Canvas(window, width=800, height=600)
canvas.pack()

The first line creates the window, and the second creates the Canvas inside that window. We pass the window as an argument, so that Tk knows where to display the canvas. The other arguments define the canvas’s size; I chose 800 × 600 because that was a common old-timey monitor size.This size, called Super Video Graphics Array (SVGA), was standardized in 1987, and probably did seem super back then. The third line is a Tk peculiarity, which positions the canvas inside the window. Tk also has widgets like buttons and dialog boxes, but our browser won’t use them: we will need finer-grained control over appearance, which a canvas provides.This is why desktop applications are more uniform than web pages: desktop applications generally use widgets provided by a common graphical toolkit, which makes them look similar.

To keep it all organized let’s put this code in a class:

WIDTH, HEIGHT = 800, 600

class Browser:
    def __init__(self):
        self.window = tkinter.Tk()
        self.canvas = tkinter.Canvas(
            self.window, 
            width=WIDTH,
            height=HEIGHT
        )
        self.canvas.pack()

Once you’ve made a canvas, you can call methods that draw shapes on the canvas. Let’s do that inside load, which we’ll move into the new Browser class:

class Browser:
    def load(self, url):
        # ...
        self.canvas.create_rectangle(10, 20, 400, 300)
        self.canvas.create_oval(100, 100, 150, 150)
        self.canvas.create_text(200, 150, text="Hi!")

To run this code, create a Browser, call load, and then start the Tk mainloop:

if __name__ == "__main__":
    import sys
    Browser().load(URL(sys.argv[1]))
    tkinter.mainloop()

You ought to see: a rectangle, starting near the top-left corner of the canvas and ending at its center; then a circle inside that rectangle; and then the text “Hi!” next to the circle, as in Figure 2.

Figure 2: The expected example output with a rectangle, circle, and text.

Coordinates in Tk refer to x positions from left to right and y positions from top to bottom. In other words, the bottom of the screen has larger y values, the opposite of what you might be used to from math. Play with the coordinates above to figure out what each argument refers to.The answers are in the online documentation.

The Tk canvas widget is quite a bit more powerful than what we’re using it for here. As you can see from the tutorial, you can move the individual things you’ve drawn to the canvas, listen to click events on each one, and so on. I’m not using those features in this book, because I want to teach you how to implement them yourself.

Laying Out Text

Let’s draw a simple web page on this canvas. So far, our browser steps through the web page source code character by character and prints the text (but not the tags) to the console window. Now we want to draw the characters on the canvas instead.

To start, let’s change the show function from the previous chapter into a function that I’ll call lexForeshadowing future developments… which just returns the textual content of an HTML document without printing it:

def lex(body):
    text = ""
    # ...
    for c in body:
        # ...
        elif not in_tag:
            text += c
    return text

Then, load will draw that text, character by character:

def load(self, url):
    # ...
    for c in text:
        self.canvas.create_text(100, 100, text=c)

Let’s test this code on a real web page. For reasons that might seem inscrutable,It’s to delay a discussion of basic typography to the next chapter. let’s test it on the first chapter of 西游记 or Journey to the West, a classic Chinese novel about a monkey. Run this URLRight click on the link and "Copy URL". through request, lex, and load. You should see a window with a big blob of black pixels inset a little from the top left corner of the window.

Why a blob instead of letters? Well, of course, because we are drawing every letter in the same place, so they all overlap! Let’s fix that:

HSTEP, VSTEP = 13, 18
cursor_x, cursor_y = HSTEP, VSTEP
for c in text:
    self.canvas.create_text(cursor_x, cursor_y, text=c)
    cursor_x += HSTEP

The variables cursor_x and cursor_y point to where the next character will go, as if you were typing the text into a word processor. I picked the magic numbers—13 and 18—by trying a few different values and picking one that looked most readable.In Chapter 3, we’ll replace the magic numbers with font metrics.

The text now forms a line from left to right. But with an 800-pixel-wide canvas and 13 pixels per character, one line only fits about 60 characters. You need more than that to read a novel, so we also need to wrap the text once we reach the edge of the screen:

for c in text:
    # ...
    if cursor_x >= WIDTH - HSTEP:
        cursor_y += VSTEP
        cursor_x = HSTEP

The code increases cursor_y and resets cursor_xIn the olden days of typewriters, increasing y meant feeding in a new line, and resetting x meant returning the carriage that printed letters to the left edge of the page. So the American Standard Code for Information Interchange (ASCII) standardized two separate characters—“carriage return” and “line feed”—for these operations, so that ASCII could be directly executed by teletypewriters. That’s why headers in HTTP are separated by \r\n, even though modern computers have no mechanical carriage. once cursor_x goes past 787 pixels.Not 800, because we started at pixel 13 and I want to leave an even gap on both sides. The sequence is shown in Figure 3. Wrapping the text this way makes it possible to read more than a single line.

Here’s a widget demonstrating that concept:

At this point you should be able to load up our example page in your browser and have it look something like Figure 4.

Figure 4: The first chapter of Journey to the West rendered in our browser.

Now we can read a lot of text, but still not all of it: if there’s enough text, not all of the lines will fit on the screen. We want users to scroll the page to look at different parts of it.

In English text, you can’t wrap to the next line in the middle of a word (without hyphenation at least), but in Chinese that’s the default, even for words made up of multiple characters. For example, 开关 meaning “switch” is composed of “on” and “off”, but it’s just fine to line-break after . You can change the default with the word-break CSS property: break-all allows line breaks anywhere, while auto-phrase prevents them inside even inside Chinese or Japanese words or phrases such as 开关. The “auto” part here refers to the fact that the words aren’t identified by the author but instead auto-detected, often using dynamic programming based on a word frequency table.

Scrolling Text

Scrolling introduces a layer of indirection between page coordinates (this text is 132 pixels from the top of the page) and screen coordinates (since you’ve scrolled 60 pixels down, this text is 72 pixels from the top of the screen)—see Figure 5. Generally speaking, a browser lays out the page—determines where everything on the page goes—in terms of page coordinates and then rasters the page—draws everything—in terms of screen coordinates.Sort of. What actually happens is that the page is first drawn into a bitmap or GPU texture, then that bitmap/texture is shifted according to the scroll, and the result is rendered to the screen. Chapter 11 will have more on this topic.

Figure 5: The difference between page and screen coordinates.

Our browser will have the same split. Right now load computes both the position of each character and draws it: layout and rendering. Let’s instead have a layout function to compute and store the position of each character, and a separate draw function to then draw each character based on the stored position. This way, layout can operate with page coordinates and only draw needs to think about screen coordinates.

Let’s start with layout. Instead of calling canvas.create_text on each character, let’s add it to a list, together with its position. Since layout doesn’t need to access anything in Browser, it can be a standalone function:

def layout(text):
    display_list = []
    cursor_x, cursor_y = HSTEP, VSTEP
    for c in text:
        display_list.append((cursor_x, cursor_y, c))
        # ...
    return display_list

The resulting list of things to display is called a display list.The term “display list” is standard. Since layout is all about page coordinates, we don’t need to change anything else about it to support scrolling.

Once the display list is computed, draw needs to loop through it and draw each character. Since draw does need access to the canvas, we make it a method on Browser:

class Browser:
    def draw(self):
        for x, y, c in self.display_list:
            self.canvas.create_text(x, y, text=c)

Now load just needs to call layout followed by draw:

class Browser:
    def load(self, url):
        body = url.request()
        text = lex(body)
        self.display_list = layout(text)
        self.draw()

Now we can add scrolling. Let’s add a field for how far you’ve scrolled:

class Browser:
    def __init__(self):
        # ...
        self.scroll = 0

The page coordinate y then has screen coordinate y - self.scroll:

def draw(self):
    for x, y, c in self.display_list:
        self.canvas.create_text(x, y - self.scroll, text=c)

If you change the value of scroll the page will now scroll up and down. But how does the user change scroll?

Most browsers scroll the page when you press the up and down keys, rotate the scroll wheel, drag the scroll bar, or apply a touch gesture to the screen. To keep things simple, let’s just implement the down key.

Tk allows you to bind a function to a key, which instructs Tk to call that function when the key is pressed. For example, to bind to the down arrow key, write:

def __init__(self):
    # ...
    self.window.bind("<Down>", self.scrolldown)

Here, self.scrolldown is an event handler, a function that Tk will call whenever the down arrow key is pressed.scrolldown is passed an event object as an argument by Tk, but since scrolling down doesn’t require any information about the key press besides the fact that it happened, scrolldown ignores that event object. All it needs to do is increment scroll and redraw the canvas:

SCROLL_STEP = 100

def scrolldown(self, e):
    self.scroll += SCROLL_STEP
    self.draw()

If you try this out, you’ll find that scrolling draws all the text a second time. That’s because we didn’t erase the old text before drawing the new text. Call canvas.delete to clear the old text:

def draw(self):
    self.canvas.delete("all")
    # ...

Scrolling should now work!

Storing the display list makes scrolling faster: the browser isn’t doing layout every time you scroll. Modern browsers take this further, retaining much of the display list even when the web page changes due to JavaScript or user interaction.

In general, scrolling is the most common user interaction with web pages. Real browsers have accordingly invested a tremendous amount of time making it fast; we’ll get to some more of the ways they do this later in the book.

Faster Rendering

Applications have to redraw page contents quickly for interactions to feel fluid,On older systems, applications drew directly to the screen, and if they didn’t update, whatever was there last would stay in place, which is why in error conditions you’d often have one window leave “trails” on another. Modern systems use compositing, which avoids trails and also improves performance and isolation. Applications still redraw their window contents, though, to change what is displayed. Chapter 13 discusses compositing in more detail. and must respond quickly to clicks and key presses so the user doesn’t get frustrated. “Feel fluid” can be made more precise. Graphical applications such as browsers typically aim to redraw at a speed equal to the refresh rate, or frame rate, of the screen, and/or a fixed 60 Hz.Most screens today have a refresh rate of 60 Hz, and that is generally considered fast enough to look smooth. However, new hardware is increasingly appearing with higher refresh rates, such as 120 Hz. It’s not yet clear if browsers can be made that fast. Some rendering engines, games in particular, refresh at lower rates on purpose if they know the rendering speed can’t keep up. This means that the browser has to finish all its work in less than 1/60th of a second, or 16 ms, in order to keep up. For this reason, 16 ms is called the animation frame budget of the application.

But scrolling in our browser is pretty slow.How fast exactly seems to depend a lot on your operating system and default font. Why? It turns out that loading information about the shape of a character, inside create_text, takes a while. To speed up scrolling we need to make sure to do it only when necessary (while at the same time ensuring the pixels on the screen are always correct).

Real browsers have a lot of quite tricky optimizations for this, but for our browser let’s limit ourselves to a simple improvement: skip drawing characters that are offscreen:

for x, y, c in self.display_list:
    if y > self.scroll + HEIGHT: continue
    if y + VSTEP < self.scroll: continue
    # ...

The first if statement skips characters below the viewing window; the second skips characters above it. In that second if statement, y + VSTEP is the bottom edge of the character, because characters that are halfway inside the viewing window still have to be drawn.

Scrolling should now be pleasantly fast, and hopefully close to the 16 ms animation frame budget.On my computer, it was still about double that budget, so there is work to do—we’ll get to that in future chapters. And because we split layout and draw, we don’t need to change layout at all to implement this optimization.

You should also keep in mind that not all web page interactions are animations—there are also discrete actions such as mouse clicks. Research has shown that it usually suffices to respond to a discrete action in [100 ms]—below that threshold, most humans are not sensitive to discrete action speed. This is very different from interactions such as scroll, where a speed of less than 60 Hz or so is quite noticeable. The difference between the two has to do with the way the human mind processes movement (animation) versus discrete action, and the time it takes for the brain to decide upon such an action, execute it, and understand its result.

Summary

This chapter went from a rudimentary command-line browser to a graphical user interface with text that can be scrolled. The browser now:

And here is our browser rendering this very web page (it’s fullly interactive—after clicking on it to focus, you should be able to scroll with the down arrow):This is the full browser source code, cross-compiled to JavaScript and running in an iframe. Click “restart” to choose a new web page to render, then “start” to render it. Subsequent chapters will include one of these at the end of the chapter so you can see how it improves.

Next, we’ll make this browser work on English text, handling complexities like variable-width characters, line layout, and formatting.

Outline

The complete set of functions, classes, and methods in our browser should look something like this:

class URL: def __init__(url) def request() def lex(body) WIDTH, HEIGHT HSTEP, VSTEP def layout(text) SCROLL_STEP class Browser: def __init__() def draw() def load(url) def scrolldown(e)

Exercises

2-1 Line breaks. Change layout to end the current line and start a new one when it sees a newline character. Increment y by more than VSTEP to give the illusion of paragraph breaks. There are poems embedded in Journey to the West; now you’ll be able to make them out.

2-2 Mouse wheel. Add support for scrolling up when you hit the up arrow. Make sure you can’t scroll past the top of the page. Then bind the <MouseWheel> event, which triggers when you scroll with the mouse wheel.It will also trigger with touchpad gestures, if you don’t have a mouse. The associated event object has an event.delta value which tells you how far and in what direction to scroll. Unfortunately, macOS and Windows give the event.delta objects opposite sign and different scales, and on Linux scrolling instead uses the <Button-4> and <Button-5> events.The Tk manual has more information about this. Cross-platform applications are much harder to write than cross-browser ones!

2-3 Resizing. Make the browser resizable. To do so, pass the fill and expand arguments to canvas.pack, and call and bind to the <Configure> event, which happens when the window is resized. The window’s new width and height can be found in the width and height fields on the event object. Remember that when the window is resized, the line breaks must change, so you will need to call layout again.

2-4 Scrollbar. Stop your browser from scrolling down past the last display list entry.This is not quite right in a real browser; the browser needs to account for extra whitespace at the bottom of the screen or the possibility of objects purposefully drawn offscreen. In Chapter 5, we’ll implement this correctly. At the right edge of the screen, draw a blue, rectangular scrollbar. Make sure the size and position of the scrollbar reflects what part of the full document the browser can see, as in Figure 5. Hide the scrollbar if the whole document fits onscreen.

2-5 Emoji. Add support for emoji to your browser 😀. Emoji are characters, and you can call create_text to draw them, but the results aren’t very good. Instead, head to the OpenMoji project, download the emoji for “grinning face” as a PNG file, resize it to 16 × 16 pixels, and save it to the same folder as the browser. Use Tk’s PhotoImage class to load the image and then the create_image method to draw it to the canvas. In fact, download the whole OpenMoji library (look for the “Get OpenMojis” button at the top right)—then your browser can look up whatever emoji is used in the page.

2-6 about:blank. Currently, a malformed URL causes the browser to crash. It would be much better to have error recovery for that, and instead show a blank page, so that the user can fix the error. To do this, add support for the special about:blank URL, which should just render a blank page, and cause malformed URLs to automatically render as if they were about:blank.

2-7 Alternate text direction. Not all languages read and lay out from left to right. Arabic, Persian, and Hebrew are good examples of right-to-left languages. Implement basic support for this with a command-line flag to your browser.Once we get to Chapter 4 you could instead use the dir attribute on the <body> element. English sentences should still lay out left-to-right, but they should grow from the right side of the screen (load this example in your favorite browser to see what I mean).Sentences in an actual right-to-left language should do the opposite. And then there is vertical writing mode for some East Asian languages like Chinese and Japanese.

Formatting Text

In the last chapter, our browser created a graphical window and drew a grid of characters to it. That’s OK for Chinese, but English text features characters of different widths grouped into words that you can’t break across lines.There are lots of languages in the world, and lots of typographic conventions. A real web browser supports every language from Arabic to Zulu, but this book focuses on English. Text is near-infinitely complex, but this book cannot be infinitely long! In this chapter, we’ll add those capabilities. You’ll even be able to read this chapter in your browser!

What is a Font?

So far, we’ve called create_text with a character and two coordinates to write text to the screen. But we never specified its font, size, or style. To talk about those things, we need to create and use font objects.

What is a font, exactly? Well, in the olden days, printers arranged little metal slugs on rails, covered them with ink, and pressed them to a sheet of paper, creating a printed page (see Figure 1). The metal shapes came in boxes, one per letter, so you’d have a (large) box of e’s, a (small) box of x’s, and so on. The boxes came in cases (see Figure 2), one for upper-case and one for lower-case letters. The set of cases was called a font.The word is related to foundry, which would create the little metal shapes. Naturally, if you wanted to print larger text, you needed different (bigger) shapes, so those were a different font; a collection of fonts was called a type, which is why we call it typing. Variations—like bold or italic letters—were called that type’s “faces”.

Figure 1: A drawing of printing press workers. (By Daniel Nikolaus Chodowiecki. Wikipedia, public domain.)
Figure 2: Metal types in letter cases and a composing stick. (By Willi Heidelbach. Wikipedia, CC BY 2.5.)

This nomenclature reflects the world of the printing press: metal shapes in boxes in cases from different foundries. Our modern world instead has dropdown menus, and the old words no longer match it. “Font” can now mean font, typeface, or type,Let alone “font family”, which can refer to larger or smaller collections of types. and we say a font contains several different weights (like “bold” and “normal”),But sometimes other weights as well, like “light”, “semibold”, “black”, and “condensed”. Good fonts tend to come in many weights. several different styles (like “italic” and “roman”, which is what not-italic is called),Sometimes there are other options as well, like maybe there’s a small-caps version; these are sometimes called options as well. And don’t get me started on automatic versus manual italics. and arbitrary sizes.A font looks especially good at certain sizes where hints tell the computer how best to align it to the pixel grid. Welcome to the world of magic ink.This term comes from an essay by Bret Victor that discusses how the graphical possibilities of computers can make for better and easier-to-use applications.

Yet Tk’s font objects correspond to the older meaning of font: a type at a fixed size, style, and weight. For example:You can only create Font objects, or any other kinds of Tk objects, after calling tkinter.Tk(), and you need to import tkinter.font separately.

import tkinter.font
window = tkinter.Tk()
bi_times = tkinter.font.Font(
    family="Times",
    size=16,
    weight="bold",
    slant="italic",
)

Your computer might not have “Times” installed; you can list the available fonts with tkinter.font.families() and pick something else.

Font objects can be passed to create_text’s font argument:

canvas.create_text(200, 100, text="Hi!", font=bi_times)

In the olden times, American typesetters kept their boxes of metal shapes arranged in a California job case, which combined lower- and upper-case letters side by side in one case, making typesetting easier. The upper-/lower-case nomenclature dates from centuries earlier.

Measuring Text

Text takes up space vertically and horizontally, and the font object’s metrics and measure methods measure that space:On your computer, you might get different numbers. That’s right—text rendering is OS-dependent, because it is complex enough that everyone uses one of a few libraries to do it, usually libraries that ship with the OS. That’s why macOS fonts tend to be “blurrier” than the same font on Windows: different libraries make different trade-offs.

>>> bi_times.metrics()
{'ascent': 15, 'descent': 4, 'linespace': 19, 'fixed': 0}
>>> bi_times.measure("Hi!")
24

The metrics call yields information about the vertical dimensions of the text (see Figure 3): the linespace is how tall the text is, which includes an ascent which goes “above the line” and a descent that goes “below the line”.The fixed parameter is actually a boolean and tells you whether all letters are the same width, so it doesn’t really fit here. The ascent and descent matter when words in different sizes sit on the same line: they ought to line up “along the line”, not along their tops or bottoms.

Figure 3: The various vertical metrics of a font. All glyphs in a font share the same ascent, x-height, and descent, and are laid out on a shared baseline. However, the measure (or advance) of glyphs can differ.

Let’s dig deeper. Remember that bi_times is size-16 Times: why does font.metrics report that it is actually 19 pixels tall? Well, first of all, a size of 16 means 16 points, which are defined as 72nds of an inch, not 16 pixels,Actually, the definition of a “point” is a total mess, with many different length units all called “point” around the world. The Wikipedia page has the details, but a traditional American/British point is actually slightly less than 1/72 of an inch. The 1/72 standard comes from PostScript, but some systems predate it; TeX , for example, hews closer to the traditional point, approximating it as 1/72.27 of an inch. which your monitor probably has around 100 of per inch.Tk doesn’t use points anywhere else in its API. It’s supposed to use pixels if you pass it a negative number, but that doesn’t appear to work. Those 16 points measure not the individual letters but the metal blocks the letters were once carved from, so the letters themselves must be less than 16 points. In fact, different size-16 fonts have letters of varying heights:You might even notice that Times has different metrics in this code block than in the earlier one where we specified a bold, italic Times font. The bold, italic Times font is taller, at least on my current macOS system!

>>> tkinter.font.Font(family="Courier", size=16).metrics()
{'fixed': 1, 'ascent': 13, 'descent': 4, 'linespace': 17}
>>> tkinter.font.Font(family="Times", size=16).metrics()
{'fixed': 0, 'ascent': 14, 'descent': 4, 'linespace': 18}
>>> tkinter.font.Font(family="Helvetica", size=16).metrics()
{'fixed': 0, 'ascent': 15, 'descent': 4, 'linespace': 19}

The measure() method is more direct: it tells you how much horizontal space text takes up, in pixels. This depends on the text, of course, since different letters have different widths:Note that the sum of the individual letters’ lengths is not the length of the word. Tk uses fractional pixels internally, but rounds up to return whole pixels in the measure call. Plus, some fonts use something called kerning to shift letters a little bit when particular pairs of letters are next to one another, or even shaping to make two letters look one glyph.

>>> bi_times.measure("Hi!")
24
>>> bi_times.measure("H")
13
>>> bi_times.measure("i")
5
>>> bi_times.measure("!")
7
>>> 13 + 5 + 7
25

You can use this information to lay text out on the page. For example, suppose you want to draw the text “Hello, world!” in two pieces, so that “world!” is italic. Let’s use two fonts:

font1 = tkinter.font.Font(family="Times", size=16)
font2 = tkinter.font.Font(family="Times", size=16, slant='italic')

We can now lay out the text, starting at (200, 200):

x, y = 200, 200
canvas.create_text(x, y, text="Hello, ", font=font1)
x += font1.measure("Hello, ")
canvas.create_text(x, y, text="world!", font=font2)

You should see “Hello,” and “world!”, correctly aligned and with the second word italicized.

Unfortunately, this code has a bug, though one masked by the choice of example text: replace “world!” with “overlapping!” and the two words will overlap. That’s because the coordinates x and y that you pass to create_text tell Tk where to put the center of the text. It only worked for “Hello, world!” because “Hello,” and “world!” are the same length!

Luckily, the meaning of the coordinate you pass in is configurable. We can instruct Tk to treat the coordinate we gave as the top-left corner of the text by setting the anchor argument to "nw", meaning the “northwest” corner of the text:

x, y = 200, 225
canvas.create_text(x, y, text="Hello, ", font=font1, anchor='nw')
x += font1.measure("Hello, ")
canvas.create_text(
    x, y, text="overlapping!", font=font2, anchor='nw')

Modify the draw function to set anchor to "nw"; we didn’t need to do that in the previous chapter because all Chinese characters are the same width.

If you find font metrics confusing, you’re not the only one! In 2012, the Michigan Supreme Court heard Stand Up for Democracy v. Secretary of State, a case ultimately about a ballot referendum’s validity that centered on the definition of font size. The court decided (correctly) that font size is the size of the metal blocks that letters were carved from and not the size of the letters themselves.

Word by Word

In Chapter 2, the layout function looped over the text character by character and moved to the next line whenever we ran out of space. That’s appropriate in Chinese, where each character more or less is a word. But in English you can’t move to the next line in the middle of a word. Instead, we need to lay out the text one word at a time:This code splits words on whitespace. It’ll thus break on Chinese, since there won’t be whitespace between words. Real browsers use language-dependent rules for laying out text, including for identifying word boundaries.

def layout(text):
    # ...
    for word in text.split():
        # ...
    return display_list

Unlike Chinese characters, words are different sizes, so we need to measure the width of each word:

import tkinter.font

def layout(text):
    font = tkinter.font.Font()
    # ...
    for word in text.split():
        w = font.measure(word)
    # ...

Here I’ve chosen to use Tk’s default font. Now, if we draw the text at cursor_x, its right end would be at cursor_x + w. That might be past the right edge of the page, and in this case we need to make space by wrapping to the next line:

def layout(text):
    for word in text.split():
        # ...
        if cursor_x + w > WIDTH - HSTEP:
            cursor_y += font.metrics("linespace") * 1.25
            cursor_x = HSTEP

Note that this code block only shows the insides of the for loop. The rest of layout should be left alone. Also, I call metrics with an argument; that just returns the named metric directly. Finally, note that I multiply the linespace by 1.25 when incrementing y. Try removing the multiplier: you’ll see that the text is harder to read because the lines are too close together.Designers say the text is too “tight”. Instead, it is common to add “line spacing” or “leading”So named because in metal type days, thin pieces of lead were placed between the lines to space them out. Lead is a softer metal than what the actual letter pieces were made of, so it could compress a little to keep pressure on the other pieces. Pronounce it “led-ing” not “leed-ing”. between lines. The 25% line spacing is a typical amount.

So now cursor_x and cursor_y have the location to the start of the word, so we add to the display list and update cursor_x to point to the end of the word:

def layout(text):
    for word in text.split():
        # ...
        display_list.append((cursor_x, cursor_y, word))
        cursor_x += w + font.measure(" ")

I increment cursor_x by w + font.measure(" ") instead of w because I want to have spaces between the words: the call to split() removed all of the whitespace, and this adds it back. I don’t add the space to w in the if condition, though, because you don’t need a space after the last word on a line.

Breaking lines in the middle of a word is called hyphenation, and can be turned on via the hyphens CSS property. The state of the art is the Knuth–Liang hyphenation algorithm, which uses a dictionary of word fragments to prioritize possible hyphenation points. At first, the CSS specification was incompatible with this algorithm, but the recent text-wrap-style property fixed that.

Styling Text

Right now, all of the text on the page is drawn with one font. But web pages sometimes specify that text should be bold or italic using the <b> and <i> tags. It’d be nice to support that, but right now, the code resists this: the layout function only receives the text of the page as input, and so has no idea where the bold and italics tags are.

Let’s change lex to return a list of tokens, where a token is either a Text object (for a run of characters outside a tag) or a Tag object (for the contents of a tag). You’ll need to write the Text and Tag classes:If you’re familiar with Python, you might want to use the dataclass library, which makes it easier to define these sorts of utility classes.

class Text:
    def __init__(self, text):
        self.text = text

class Tag:
    def __init__(self, tag):
        self.tag = tag

lex must now gather text into Text and Tag objects:If you’ve done some or all of the exercises in prior chapters, your code will look different. Code snippets in the book always assume you haven’t done the exercises, so you’ll need to port your modifications.

def lex(body):
    out = []
    buffer = ""
    in_tag = False
    for c in body:
        if c == "<":
            in_tag = True
            if buffer: out.append(Text(buffer))
            buffer = ""
        elif c == ">":
            in_tag = False
            out.append(Tag(buffer))
            buffer = ""
        else:
            buffer += c
    if not in_tag and buffer:
        out.append(Text(buffer))
    return out

Here I’ve renamed the text variable to buffer, since it now stores either text or tag contents before they can be used. The name also reminds us that, at the end of the loop, we need to check whether there’s buffered text and what we should do with it. Here, lex dumps any accumulated text as a Text object. Otherwise, if you never saw an angle bracket, you’d return an empty list of tokens. But unfinished tags, like in Hi!<hr, are thrown out.This may strike you as an odd decision: why not finish up the tag for the author? I don’t know, but dropping the tag is what browsers do.

Note that Text and Tag are asymmetric: lex avoids empty Text objects, but not empty Tag objects. That’s because an empty Tag object represents the HTML code <>, while an empty Text object represents no content at all.

Since we’ve modified lex, we are now passing layout not just the text of the page, but also the tags in it. So layout must loop over tokens, not text:

def layout(tokens):
    # ...
    for tok in tokens:
        if isinstance(tok, Text):
            for word in tok.text.split():
                # ...
    # ...

layout can also examine tag tokens to change font when directed by the page. Let’s start with support for weights and styles, with two corresponding variables:

weight = "normal"
style = "roman"

Those variables must change when the bold and italics open and close tags are seen:

if isinstance(tok, Text):
    # ...
elif tok.tag == "i":
    style = "italic"
elif tok.tag == "/i":
    style = "roman"
elif tok.tag == "b":
    weight = "bold"
elif tok.tag == "/b":
    weight = "normal"

Note that this code correctly handles not only <b>bold</b> and <i>italic</i> text, but also <b><i>bold italic</i></b> text.It even handles incorrectly nested tags like <b>b<i>bi</b>i</i>, but it does not handle <b><b>twice</b>bolded</b> text. We’ll return to this in Chapter 6.

The style and weight variables are used to select the font:

if isinstance(tok, Text):
    for word in tok.text.split():
        font = tkinter.font.Font(
            size=16,
            weight=weight,
            slant=style,
        )
        # ...

Since the font is computed in layout but used in draw, we’ll need to add the font used to each entry in the display list:

if isinstance(tok, Text):
    for word in tok.text.split():
        # ...
        display_list.append((cursor_x, cursor_y, word, font))

Make sure to update draw to expect and use this extra font field in display list entries.

Italic fonts were developed in Italy (hence the name) to mimic a cursive handwriting style called “chancery hand”. Non-italic fonts are called roman because they mimic text on Roman monuments. There is an obscure third option: oblique fonts, which look like roman fonts but are slanted.

A Layout Object

With all of these tags, layout has become quite large, with lots of local variables and some complicated control flow. That is one sign that something deserves to be a class, not a function:

class Layout:
    def __init__(self, tokens):
        self.display_list = []

Every local variable in layout then becomes a field of Layout:

self.cursor_x = HSTEP
self.cursor_y = VSTEP
self.weight = "normal"
self.style = "roman"

The core of the old layout is a loop over tokens, and we can move the body of that loop to a method on Layout:

def __init__(self, tokens):
    # ...
    for tok in tokens:
        self.token(tok)

def token(self, tok):
    if isinstance(tok, Text):
        for word in tok.text.split():
            # ...
    elif tok.tag == "i":
        self.style = "italic"
    # ...

In fact, the body of the isinstance(tok, Text) branch can be moved to its own method:

def word(self, word):
    font = tkinter.font.Font(
        size=16,
        weight=self.weight,
        slant=self.style,
    )
    w = font.measure(word)
    # ...

Now that everything has moved out of Browser’s old layout function, it can be replaced with calls into Layout:

class Browser:
    def load(self, url):
        body = url.request()
        tokens = lex(body)
        self.display_list = Layout(tokens).display_list
        self.draw()

When you do big refactors like this, it’s important to work incrementally. It might seem more efficient to change everything at once, but that efficiency brings with it a risk of failure: trying to do so much that you get confused and have to abandon the whole refactor. So take a moment to test that your browser still works before you move on.

Anyway, this refactor isolated all of the text-handling code into its own method, with the main token function just branching on the tag name. Let’s take advantage of the new, cleaner organization to add more tags. With font weights and styles working, size is the next frontier in typographic sophistication. One simple way to change font size is the <small> tag and its deprecated sister tag <big>.In your web design projects, use the CSS font-size property to change text size instead of <big> and <small>. But since we haven’t yet implemented CSS for our browser (see Chapter 6), we’re stuck using tags here.

Our experience with font styles and weights suggests a simple approach that customizes the size field in Layout. It starts out with:

self.size = 12

That variable is used to create the font object:

font = tkinter.font.Font(
    size=self.size,
    weight=self.weight,
    slant=self.style,
)

And we can change the size in <big> and <small> tags by updating this variable:

def token(self, tok):
    # ...
    elif tok.tag == "small":
        self.size -= 2
    elif tok.tag == "/small":
        self.size += 2
    elif tok.tag == "big":
        self.size += 4
    elif tok.tag == "/big":
        self.size -= 4

Try wrapping a whole paragraph in <small>, like you would a bit of fine print, and enjoy your newfound typographical freedom.

All of <b>, <i>, <big>, and <small> date from an earlier, pre-CSS era of the web. Nowadays, CSS can change how an element appears, so visual tag names like <b> and <small> are out of favor. That said, <b>, <i>, and <small> still have some appearance-independent meanings.

Text of Different Sizes

Start mixing font sizes, like <small>a</small><big>A</big>, and you’ll quickly notice a problem with the font size code: the text is aligned along its top, as if it’s hanging from a clothes line. But you know that English text is typically written with all letters aligned at an invisible baseline instead.

Let’s think through how to fix this. If the bigger text is moved up, it would overlap with the previous line, so the smaller text has to be moved down. That means its vertical position has to be computed later, after the big text passes through token. But since the small text comes through the loop first, we need a two-pass algorithm for lines of text: the first pass identifies what words go in the line and computes their x positions, while the second pass vertically aligns the words and computes their y positions (see Figure 4).

Figure 4: How lines are laid out when multiple fonts are involved. All words are drawn using a shared baseline. The ascent and descent of the whole line is then determined by the maximum ascent and descent of all words in the line, and leading is added before and after the line.

Let’s start with phase one. Since one line contains text from many tags, we need a field on Layout to store the line-to-be. That field, line, will be a list, and text will add words to it instead of to the display list. Entries in line will have x but not y positions, since y positions aren’t computed in the first phase:

class Layout:
    def __init__(self, tokens):
        # ...
        self.line = []
        # ...
    
    def word(self, word):
        # ...
        self.line.append((self.cursor_x, word, font))

The new line field is essentially a buffer, where words are held temporarily before they can be placed. The second phase is that buffer being flushed when we’re finished with a line:

class Layout:
    def word(self, word):
        if self.cursor_x + w > WIDTH - HSTEP:
            self.flush()

As usual with buffers, we also need to make sure the buffer is flushed once all tokens are processed:

class Layout:
    def __init__(self, tokens):
        # ...
        self.flush()

This new flush function has three responsibilities:

  1. it must align the words along the baseline (see Figure 5);
  2. it must add all those words to the display list; and
  3. it must update the cursor_x and cursor_y fields.

Here’s what it looks like, step by step:

Figure 5: Aligning the words on a line.

Since we want words to line up “on the line”, let’s start by computing where that line should be. That depends on the tallest word on the line:

def flush(self):
    if not self.line: return
    metrics = [font.metrics() for x, word, font in self.line]
    max_ascent = max([metric["ascent"] for metric in metrics])

The baseline is then max_ascent below self.y—or actually a little more to account for the leading:Actually, 25% leading doesn’t add 25% of the ascent above the ascender and 25% of the descent below the descender. Instead, it adds 12.5% of the line height in both places, which is subtly different when fonts are mixed. But let’s skip that subtlety here.

baseline = self.cursor_y + 1.25 * max_ascent

Now that we know where the line is, we can place each word relative to that line and add it to the display list:

for x, word, font in self.line:
    y = baseline - font.metrics("ascent")
    self.display_list.append((x, y, word, font))

Note how y starts at the baseline, and moves up by just enough to accommodate that word’s ascent. Now cursor_y must move far enough down below baseline to account for the deepest descender:

max_descent = max([metric["descent"] for metric in metrics])
self.cursor_y = baseline + 1.25 * max_descent

Finally, flush must update the Layout’s cursor_x and line fields:

self.cursor_x = HSTEP
self.line = []

Now all the text is aligned along the line, even when text sizes are mixed. Plus, this new flush function is convenient for other line-breaking jobs. For example, in HTML the <br> tagWhich is a self-closing tag, so there’s no </br>. Many tags that are content, instead of annotating it, are like this. Some people like adding a final slash to self-closing tags, as in <br/>, but this is not required in HTML. ends the current line and starts a new one:

def token(self, tok):
    # ...
    elif tok.tag == "br":
        self.flush()

Likewise, paragraphs are defined by the <p> and </p> tags, so </p> also ends the current line:

def token(self, tok):
    # ...
    elif tok.tag == "/p":
        self.flush()
        self.cursor_y += VSTEP

I add a bit extra to cursor_y here to create a little gap between paragraphs.

By this point you should be able to load up your browser and display an example page, which should look something like Figure 6.

Figure 6: Screenshot of a web page demonstrating different text sizes.

Actually, browsers support not only horizontal but also vertical writing systems, like some traditional East Asian writing styles. A particular challenge is Mongolian script, which is written in lines running top to bottom, left to right. Many Mongolian government websites use the script.

Font Caching

Now that you’ve implemented styled text, you’ve probably noticed—unless you’re on macOSWhile we can’t confirm this in the documentation, it seems that the macOS “Core Text” APIs cache fonts more aggressively than Linux and Windows. The optimization described in this section won’t hurt any on macOS, but also won’t improve speed as much as on Windows and Linux.—that on a large web page like this chapter our browser has slowed significantly from the previous chapter. That’s because text layout, and specifically the part where you measure each word, is quite slow.You can profile Python programs by replacing your python3 command with python3 -m cProfile. Look for the lines corresponding to the measure and metrics calls to see how much time is spent measuring text.

Unfortunately, it’s hard to make text measurement much faster. With proportional fonts and complex font features like hinting and kerning, measuring text can require pretty complex computations. But on a large web page, some words likely appear a lot—for example, this chapter includes the word “the” over 200 times. Instead of measuring these words over and over again, we could measure them once, and then cache the results. On normal English text, this usually results in a substantial speedup.

Caching is such a good idea that most text libraries already implement it, typically caching text measurements in each Font object. But since our text method creates a new Font object for each word, the caching is ineffective. To make caching work, we need to reuse Font objects when possible instead of making new ones.

We’ll store our cache in a global FONTS dictionary:

FONTS = {}

The keys to this dictionary will be size/weight/style triples, and the values will be Font objects.Actually, the values are a font object and a tkinter.Label object. This dramatically improves the performance of metrics for some reason, and is recommended by the Python documentation. We can put the caching logic itself in a new get_font function:

def get_font(size, weight, style):
    key = (size, weight, style)
    if key not in FONTS:
        font = tkinter.font.Font(size=size, weight=weight,
            slant=style)
        label = tkinter.Label(font=font)
        FONTS[key] = (font, label)
    return FONTS[key][0]

Then the word method can call get_font instead of creating a Font object directly:

class Layout:
    def word(self, word):
        font = get_font(self.size, self.weight, self.style)
        # ...

Now identical words will use identical fonts and text measurements will hit the cache.

Fonts for scripts like Chinese can be megabytes in size, so they are generally stored on disk and only loaded into memory on demand. That makes font loading slow and caching even more important. Browsers also have extensive caches for measuring, shaping, and rendering text. Because web pages have a lot of text, these caches turn out to be one of the most important parts of speeding up rendering.

Summary

The previous chapter introduced a browser that laid out characters in a grid. Now it does standard English text layout, so:

You can now use our browser to read an essay, a blog post, or even a book!

Outline

The complete set of functions, classes, and methods in our browser should look something like this:

class URL: def __init__(url) def request() class Text: def __init__(text) class Tag: def __init__(tag) def lex(body) FONTS def get_font(size, weight, style) WIDTH, HEIGHT HSTEP, VSTEP class Layout: def __init__(tokens) def token(tok) def flush() def word(word) SCROLL_STEP class Browser: def __init__() def draw() def load(url) def scrolldown(e)

Exercises

3-1 Centered text. The page titles on this book’s website are centered; make your browser do the same for text between <h1 class="title"> and </h1>. Each line has to be centered individually, because different lines will have different lengths.In early HTML there was a <center> tag that did exactly this, but nowadays centering is typically done in CSS, through the text-align property. The approach in this exercise is of course non-standard, and just for learning purposes.

3-2 Superscripts. Add support for the <sup> tag. Text in this tag should be smaller (perhaps half the normal text size) and be placed so that the top of a superscript lines up with the top of a normal letter.

3-3 Soft hyphens. The soft hyphen character, written \N{soft hyphen} in Python, represents a place where the text renderer can, but doesn’t have to, insert a hyphen and break the word across lines. Add support for it.If you’ve done Exercise 1-4 on HTML entities, you might also want to add support for the &shy; entity, which expands to a soft hyphen. If a word doesn’t fit at the end of a line, check if it has soft hyphens, and if so break the word across lines. Remember that a word can have multiple soft hyphens in it, and make sure to draw a hyphen when you break a word. The word “super­cali­fragi­listic­expi­ali­docious” is a good test case.

3-4 Small caps. Make the <abbr> element render text in small caps, like this. Inside an <abbr> tag, lower-case letters should be small, capitalized, and bold, while all other characters (upper case, numbers, etc.) should be drawn in the normal font.

3-5 Preformatted text. Add support for the <pre> tag. Unlike normal paragraphs, text inside <pre> tags doesn’t automatically break lines, and whitespace like spaces and newlines are preserved. Use a fixed-width font like Courier New or SFMono as well. Make sure tags work normally inside <pre> tags: it should be possible to bold some text inside a <pre>. The results will look best if you also do Exercise 1-4.

Constructing an HTML Tree

So far, our browser sees web pages as a stream of open tags, close tags, and text. But HTML is actually a tree, and though the tree structure hasn’t been important yet, it will be central to later features like CSS, JavaScript, and visual effects. So this chapter adds a proper HTML parser and converts the layout engine to use it.

A Tree of Nodes

The HTML treeThis is the tree that is usually called the DOM tree, for Document Object Model. I’ll keep calling it the HTML tree for now. has one node for each open and close tag pair and a node for each span of text.In reality there are other types of nodes too, like comments, doctypes, CDATA sections, and processing instructions. There are even some deprecated types! A simple HTML document showing the structure is shown in Figure 1.

Figure 1: An HTML document, showing tags, text, and the nesting structure.

For our browser to use a tree, tokens need to evolve into nodes. That means adding a list of children and a parent pointer to each one. Here’s the new Text class, representing text at the leaf of the tree:

class Text:
    def __init__(self, text, parent):
        self.text = text
        self.children = []
        self.parent = parent

Since it takes two tags (the open and the close tag) to make a node, let’s rename the Tag class to Element, and make it look like this:

class Element:
    def __init__(self, tag, parent):
        self.tag = tag
        self.children = []
        self.parent = parent

I added a children field to both Text and Element, even though text nodes never have children, for consistency.

Constructing a tree of nodes from source code is called parsing. A parser builds a tree one element or text node at a time. But that means the parser needs to store an incomplete tree as it goes. For example, suppose the parser has so far read this bit of HTML:

<html><video></video><section><h1>This is my webpage

The parser has seen five tags (and one text node). The rest of the HTML will contain more open tags, close tags, and text, but no matter which tokens it sees, no new nodes will be added to the <video> tag, which has already been closed. So that node is “finished”. But the other nodes are unfinished: more children can be added to the <html>, <section>, and <h1> nodes, depending on what HTML comes next—see Figure 2.

Figure 2: The finished and unfinished nodes while parsing some HTML.

Since the parser reads the HTML file from beginning to end, these unfinished tags are always in a certain part of the tree. The unfinished tags have always been opened but not yet closed; they are always later in the source than the finished nodes; and they are always children of other unfinished tags. To leverage these facts, let’s represent an incomplete tree by storing a list of unfinished tags, ordered with parents before children. The first node in the list is the root of the HTML tree; the last node in the list is the most recent unfinished tag.In Python, and most other languages, it’s faster to add and remove from the end of a list, instead of the beginning.

Parsing is a little more complex than lex, so we’re going to want to break it into several functions, organized in a new HTMLParser class. That class can also store the source code it’s analyzing and the incomplete tree:

class HTMLParser:
    def __init__(self, body):
        self.body = body
        self.unfinished = []

Before the parser starts, it hasn’t seen any tags at all, so the unfinished list storing the tree starts empty. But as the parser reads tokens, that list fills up. Let’s start that by aspirationally renaming the lex function we have now to parse:

class HTMLParser:
    def parse(self):
        # ...

We’ll need to do a bit of surgery on parse. Right now parse creates Tag and Text objects and appends them to the out array. We need it to create Element and Text objects and add them to the unfinished tree. Since a tree is a bit more complex than a list, I’ll move the adding-to-a-tree logic to two new methods, add_text and add_tag.

def parse(self):
    text = ""
    in_tag = False
    for c in self.body:
        if c == "<":
            in_tag = True
            if text: self.add_text(text)
            text = ""
        elif c == ">":
            in_tag = False
            self.add_tag(text)
            text = ""
        else:
            text += c
    if not in_tag and text:
        self.add_text(text)
    return self.finish()

The out variable is gone, and note that I’ve also moved the return value to a new finish method, which converts the incomplete tree to the final, complete tree. So: how do we add things to the tree?

HTML derives from a long line of document processing systems. Its predecessor, SGML, traces back to RUNOFF and is a sibling to troff, now used for Linux manual pages. The committee that standardized SGML now works on the .odf, .docx, and .epub formats.

Constructing the Tree

Let’s talk about adding nodes to a tree. To add a text node we add it as a child of the last unfinished node:

class HTMLParser:
    def add_text(self, text):
        parent = self.unfinished[-1]
        node = Text(text, parent)
        parent.children.append(node)

On the other hand, tags are a little more complex since they might be an open or a close tag:

class HTMLParser:
    def add_tag(self, tag):
        if tag.startswith("/"):
            # ...
        else:
            # ...

An open tag adds an unfinished node to the end of the list:

def add_tag(self, tag):
    # ...
    else:
        parent = self.unfinished[-1]
        node = Element(tag, parent)
        self.unfinished.append(node)

A close tag instead finishes the last unfinished node by adding it to the previous unfinished node in the list:

def add_tag(self, tag):
    if tag.startswith("/"):
        node = self.unfinished.pop()
        parent = self.unfinished[-1]
        parent.children.append(node)
    # ...

Once the parser is done, it turns our incomplete tree into a complete tree by just finishing any unfinished nodes:

class HTMLParser:
    def finish(self):
        while len(self.unfinished) > 1:
            node = self.unfinished.pop()
            parent = self.unfinished[-1]
            parent.children.append(node)
        return self.unfinished.pop()

This is almost a complete parser, but it doesn’t quite work at the beginning and end of the document. The very first open tag is an edge case without a parent:

def add_tag(self, tag):
    # ...
    else:
        parent = self.unfinished[-1] if self.unfinished else None
        # ...

The very last tag is also an edge case, because there’s no unfinished node to add it to:

def add_tag(self, tag):
    if tag.startswith("/"):
        if len(self.unfinished) == 1: return
        # ...

Ok, that’s all done. Let’s test our parser out and see how well it works!

The ill-considered JavaScript document.write method allows JavaScript to modify the HTML source code while it’s being parsed! This is actually a bad idea. An implementation of document.write must have the HTML parser stop to execute JavaScript, but that slows down requests for images, CSS, and JavaScript used later in the page. To solve this, modern browsers use speculative parsing to start loading additional resources even before parsing is done.

Debugging a Parser

How do we know our parser does the right thing—that it builds the right tree? Well the place to start is seeing the tree it produces. We can do that with a quick, recursive pretty-printer:

def print_tree(node, indent=0):
    print(" " * indent, node)
    for child in node.children:
        print_tree(child, indent + 2)

Here we’re printing each node in the tree, and using indentation to show the tree structure. Since we need to print each node, it’s worth taking the time to give them a nice printed form, which in Python means defining the __repr__ function:

class Text:
    def __repr__(self):
        return repr(self.text)

class Element:
    def __repr__(self):
        return "<" + self.tag + ">"

In general it’s a good idea to define __repr__ methods for any data objects, and to have those __repr__ methods print all the relevant fields.

Try this out on the web page corresponding to this chapter, parsing the HTML source code and then calling print_tree to visualize it:

body = URL(sys.argv[1]).request()
nodes = HTMLParser(body).parse()
print_tree(nodes)

You’ll see something like this at the beginning:

 <!doctype html>
   '\n'
   <html lang="en-US" xml:lang="en-US">
     '\n'
     <head>
       '\n  '
       <meta charset="utf-8" />

Immediately a couple of things stand out. Let’s start at the top, with the <!doctype html> tag.

This special tag, called a doctype, is always the very first thing in an HTML document. But it’s not really an element at all, nor is it supposed to have a close tag. Our browser won’t be using the doctype for anything, so it’s best to throw it away:Real browsers use doctypes to switch between standards-compliant and legacy parsing and layout modes.

def add_tag(self, tag):
    if tag.startswith("!"): return
    # ...

This ignores all tags that start with an exclamation mark, which not only throws out doctype declarations but also comments, which in HTML are written <!-- comment text -->.

Just throwing out doctypes isn’t quite enough though—if you run your parser now, it will crash. That’s because after the doctype comes a newline, which our parser treats as text and tries to insert into the tree. Except there isn’t a tree, since the parser hasn’t seen any open tags. For simplicity, let’s just have our browser skip whitespace-only text nodes to side-step the problem:Real browsers retain whitespace to correctly render make<span></span>up as one word and make<span> </span>up as two. Our browser won’t. Plus, ignoring whitespace simplifies later chapters by avoiding a special case for whitespace-only text tags.

def add_text(self, text):
    if text.isspace(): return
    # ...

The first part of the parsed HTML tree for the browser.engineering home page now looks something like this:

 <html lang="en-US" xml:lang="en-US">
   <head>
     <meta charset="utf-8" /="">
       <link rel="prefetch" ...>
         <link rel="prefetch" ...>

Our next problem: why’s everything so deeply indented? Why aren’t these open elements ever closed?

In SGML, document type declarations contained a URL which defined the valid tags, and in older versions of HTML that was also recommended. Browsers do use the absence of a document type declaration to identify very old, pre-SGML versions of HTML,There’s also this crazy thing called “almost standards” or “limited quirks” mode, due to a backward-incompatible change in table cell vertical layout. Yes. I don’t need to make these up! but don’t use the URL, so <!doctype html> is the best document type declaration for modern HTML.

Self-closing Tags

Elements like <meta> and <link> are what are called self-closing: these tags don’t surround content, so you don’t ever write </meta> or </link>. Our parser needs special support for them. In HTML, there’s a specific list of these self-closing tags (the specification calls them “void” tags):A lot of these tags are obscure. Browsers also support some additional, obsolete self-closing tags not listed here, like keygen.

SELF_CLOSING_TAGS = [
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
]

Our parser needs to auto-close tags from this list:

def add_tag(self, tag):
    # ...
    elif tag in self.SELF_CLOSING_TAGS:
        parent = self.unfinished[-1]
        node = Element(tag, parent)
        parent.children.append(node)

This code looks right, but it doesn’t quite work right. Why not? Because our parser is looking for a tag named meta, but it’s finding a tag named “meta name=...”. The self-closing code isn’t triggered because the <meta> tag has attributes.

HTML attributes add information about an element; open tags can have any number of attributes. Attribute values can be quoted, unquoted, or omitted entirely. Let’s focus on basic attribute support, ignoring values that contain whitespace, which are a little complicated.

Since we’re not handling whitespace in values, we can split on whitespace to get the tag name and the attribute–value pairs:

class HTMLParser:
    def get_attributes(self, text):
        parts = text.split()
        tag = parts[0].casefold()
        attributes = {}
        for attrpair in parts[1:]:
            # ...
        return tag, attributes

HTML tag names are case insensitive, as by the way are attribute names, so I case-fold them.Lower-casing text is the wrong way to do case-insensitive comparisons in languages like Cherokee. In HTML specifically, tag names only use the ASCII characters so lower-casing them would be sufficient, but I’m using Python’s casefold function because it’s a good habit to get into. Then, inside the loop, I split each attribute–value pair into a name and a value. The easiest case is an unquoted attribute, where an equal sign separates the two:

def get_attributes(self, text):
    # ...
    for attrpair in parts[1:]:
        if "=" in attrpair:
            key, value = attrpair.split("=", 1)
            attributes[key.casefold()] = value
    # ...

The value can also be omitted, like in <input disabled>, in which case the attribute value defaults to the empty string:

for attrpair in parts[1:]:
    # ...
    else:
        attributes[attrpair.casefold()] = ""

Finally, the value can be quoted, in which case the quotes have to be stripped out:Quoted attributes allow whitespace between the quotes. Parsing that properly requires something like a finite state machine instead of just splitting on whitespace.

if "=" in attrpair:
    # ...
    if len(value) > 2 and value[0] in ["'", "\""]:
        value = value[1:-1]
    # ...

We’ll store these attributes inside Elements:

class Element:
    def __init__(self, tag, attributes, parent):
        self.tag = tag
        self.attributes = attributes
        # ...

That means we’ll need to call get_attributes at the top of add_tag to get the attributes we need to construct an Element.

def add_tag(self, tag):
    tag, attributes = self.get_attributes(tag)
    # ...

Remember to use tag and attribute instead of text in add_tag, and try your parser again:

 <html>
    <head>
      <meta>
      <link>
      <link>
      <link>
      <link>
      <link>
      <meta>

It’s close! Yes, if you print the attributes, you’ll see that attributes with whitespace (like author on one of the meta tags) are mis-parsed as multiple attributes, and the final slash on the self-closing tags is incorrectly treated as an extra attribute. A better parser would fix these issues. But let’s instead leave our parser as is—these issues aren’t going to be a problem for the browser we’re building—and move on to integrating it with our browser.

Putting a slash at the end of self-closing tags, like <br/>, became fashionable when XHTML looked like it might replace HTML, and old-timers like me never broke the habit. But unlike in XML, in HTML self-closing tags are identified by name, not by some special syntax, so the slash is optional.

Using the Node Tree

Right now, the Layout class works token by token; we now want it to go node by node instead. So let’s separate the old token method into two parts: all the cases for open tags will go into a new open_tag method and all the cases for close tags will go into a new close_tag method:The case for text tokens is no longer needed because our browser can just call the existing add_text method directly.

class Layout:
    def open_tag(self, tag):
        if tag == "i":
            self.style = "italic"
        # ...

    def close_tag(self, tag):
        if tag == "i":
            self.style = "roman"
        # ...

Now we need the Layout object to walk the node tree, calling open_tag, close_tag, and text in the right order:

def recurse(self, tree):
    if isinstance(tree, Text):
        for word in tree.text.split():
            self.word(word)
    else:
        self.open_tag(tree.tag)
        for child in tree.children:
            self.recurse(child)
        self.close_tag(tree.tag)

The Layout constructor can now call recurse instead of looping through the list of tokens. We’ll also need the browser to construct the node tree, like this:

class Browser:
    def load(self, url):
        body = url.request()
        self.nodes = HTMLParser(body).parse()
        self.display_list = Layout(self.nodes).display_list
        self.draw()

Run it—the browser should now use the parsed HTML tree.

The doctype syntax is a form of versioning—declaring which version of HTML the web page is using. But in fact, the html value for doctype signals not just a particular version of HTML, but more generally the HTML living standard.It is not expected that any new doctype version for HTML will ever be added again. It’s called a “living standard” because it changes all the time as features are added. The mechanism for these changes is simply browsers shipping new features, not any change to the “version” of HTML. In general, the web is an unversioned platform—new features are often added as enhancements, but only so long as they don’t break existing ones.Features can be removed, but only if they stop being used by the vast majority of sites. This makes it very hard to remove web features compared with other platforms.

Handling Author Errors

The parser now handles HTML pages correctly—at least when the HTML is written by the sorts of goody-two-shoes programmers who remember the <head> tag, close every open tag, and make their bed in the morning. Mere mortals lack such discipline and so browsers also have to handle broken, confusing, headless HTML. In fact, modern HTML parsers are capable of transforming any string of characters into an HTML tree, no matter how confusing the markup.Yes, it’s crazy, and for a few years in the early 2000s the W3C tried to do away with it. They failed.

The full algorithm is, as you might expect, complicated beyond belief, with dozens of ever-more-special cases forming a taxonomy of human error, but one of its nicer features is implicit tags. Normally, an HTML document starts with a familiar boilerplate:

<!doctype html>
<html>
  <head>
  </head>
  <body>
  </body>
</html>

In reality, all six of these tags, except the doctype, are optional: browsers insert them automatically when the web page omits them. Let’s insert implicit tags in our browser via a new implicit_tags function. We’ll want to call it in both add_text and add_tag:

class HTMLParser:
    def add_text(self, text):
        if text.isspace(): return
        self.implicit_tags(None)
        # ...

    def add_tag(self, tag):
        tag, attributes = self.get_attributes(tag)
        if tag.startswith("!"): return
        self.implicit_tags(tag)
        # ...

Note that implicit_tags isn’t called for the ignored whitespace and doctypes. Let’s also call it in finish, to make sure that an <html> and <body> tag are created even for empty strings:

class HTMLParser:
    def finish(self):
        if not self.unfinished:
            self.implicit_tags(None)
        # ...

The argument to implicit_tags is the tag name (or None for text nodes), which we’ll compare to the list of unfinished tags to determine what’s been omitted:

class HTMLParser:
    def implicit_tags(self, tag):
        while True:
            open_tags = [node.tag for node in self.unfinished]
            # ...

implicit_tags has a loop because more than one tag could have been omitted in a row; every iteration around the loop will add just one. To determine which implicit tag to add, if any, requires examining the open tags and the tag being inserted.

Let’s start with the easiest case, the implicit <html> tag. An implicit <html> tag is necessary if the first tag in the document is something other than <html>:

while True:
    # ...
    if open_tags == [] and tag != "html":
        self.add_tag("html")

Both <head> and <body> can also be omitted, but to figure out which it is we need to look at which tag is being added:

while True:
    # ...
    elif open_tags == ["html"] \
         and tag not in ["head", "body", "/html"]:
        if tag in self.HEAD_TAGS:
            self.add_tag("head")
        else:
            self.add_tag("body")

Here, HEAD_TAGS lists the tags that you’re supposed to put into the <head> element:The <script> tag can go in either the head or the body section, but it goes into the head by default.

class HTMLParser:
    HEAD_TAGS = [
        "base", "basefont", "bgsound", "noscript",
        "link", "meta", "title", "style", "script",
    ]

Note that if both the <html> and <head> tags are omitted, implicit_tags is going to insert both of them by going around the loop twice. In the first iteration open_tags is [], so the code adds an <html> tag; then, in the second iteration, open_tags is ["html"], so it adds a <head> tag.These add_tag methods themselves call implicit_tags, which means you can get into an infinite loop if you forget a case. I’ve been careful to make sure that every tag added by implicit_tags doesn’t itself trigger more implicit tags.

Finally, the </head> tag can also be implicit if the parser is inside the <head> and sees an element that’s supposed to go in the <body>:

while True: