
Practical Programming in Tcl & Tk, Third Edition
By Brent B. Welch

Publisher: Prentice Hall PTR
Pub Date: November 10, 1999
ISBN: 0-13-022028-0
Pages: 832
Supplier: Team FLY

Tcl/Tk 8.2 is the first scripting language that can handle enterprise-wide integration tasks that encompass Windows, Solaris, Macintosh, and other key platforms. Now, in this fully updated Third Edition, Tcl/Tk development team member and best-selling author Brent Welch presents all you need to know to achieve powerful results with Tcl/Tk 8.2 and the new Tcl Web Server. Coverage includes:

Tcl's fundamental mechanisms and operating system interfaces
Basic and advanced coding techniques and tools, including the Tcl script library facility
Tk and X Windows, with detailed examples and sample widgets
The new, extensible Tcl Web Server
New Tcl internationalization features and thread support
New techniques for working with regular expressions and namespaces

You'll find extensive coverage of user interface development, as well as application integration techniques that leverage Tcl/Tk's powerful cross-platform scripting capabilities. Welch covers Tcl's extensive network support, as well as Safe Tcl, C programming with the Tk toolkit, the Tcl compiler, and Tcl/Tk plug-ins for Netscape and Internet Explorer. Whether you're a current Tcl/Tk programmer, or a developer searching for a convenient, powerful multiplatform scripting language, Practical Programming in Tcl and Tk, Third Edition delivers exactly what you're looking for.

"This is an excellent book, loaded with useful examples. Newcomers to Tk will find the widget descriptions particularly helpful."
- John Ousterhout, CEO and founder of Scriptics Corporation and the creator of Tcl/Tk

"Brent Welch fills an important need for an introduction to Tcl/Tk with an applied focus and with coverage of many of the useful extensions available . . . I recommend this book to my new students . . . and I keep a copy handy for my own use."
- Joseph A. Konstan, Professor of Computer Science, University of Minnesota


Table of Contents

Copyright
List of Examples
List of Tables
Preface
    Why Tcl?; Tcl and Tk Versions; Who Should Read This Book; How to Read This Book; Other Tcl Books; On-line Examples; Ftp Archives; World Wide Web; Newsgroups; Typographic Conventions; Hot Tips; Book Organization; What's New in the Third Edition; First Edition Thanks; Second Edition Thanks; Third Edition Thanks; Contact the Author

Part I. Tcl Basics
Chapter 1. Tcl Fundamentals
    Tcl Commands; Hello, World!; Variables; Command Substitution; Math Expressions; Backslash Substitution; Grouping with Braces and Double Quotes; Procedures; A Factorial Example; More about Variables; More about Math Expressions; Comments; Substitution and Grouping Summary; Fine Points; Reference
Chapter 2. Getting Started
    The source Command; UNIX Tcl Scripts; Windows 95 Start Menu; The Macintosh and ResEdit; The console Command; Command-Line Arguments; Predefined Variables
Chapter 3. The Guestbook CGI Application
    A Quick Introduction to HTML; CGI for Dynamic Pages; The guestbook.cgi Script; Defining Forms and Processing Form Data; The cgi.tcl Package; Next Steps
Chapter 4. String Processing in Tcl
    The string Command; The append Command; The format Command; The scan Command; The binary Command; Related Chapters
Chapter 5. Tcl Lists
    Tcl Lists; Constructing Lists; Getting List Elements: llength, lindex, and lrange; Modifying Lists: linsert and lreplace; Searching Lists: lsearch; Sorting Lists: lsort; The split Command; The join Command; Related Chapters
Chapter 6. Control Structure Commands
    If Then Else; Switch; While; Foreach; For; Break and Continue; Catch; Error; Return
Chapter 7. Procedures and Scope
    The proc Command; Changing Command Names with rename; Scope; The global Command; Call by Name Using upvar; Variable Aliases with upvar
Chapter 8. Tcl Arrays
    Array Syntax; The array Command; Building Data Structures with Arrays
Chapter 9. Working with Files and Programs
    Running Programs with exec; The file Command; Cross-Platform File Naming; Manipulating Files and Directories; File Attributes; Input/Output Command Summary; Opening Files for I/O; Reading and Writing; The Current Directory - cd and pwd; Matching File Names with glob; The exit and pid Commands; Environment Variables; The registry Command

Part II. Advanced Tcl
Chapter 10. Quoting Issues and Eval
    Constructing Code with the list Command; Exploiting the concat inside eval; The uplevel Command; The subst Command
Chapter 11. Regular Expressions
    When to Use Regular Expressions; Regular Expression Syntax; Advanced Regular Expressions; Syntax Summary; The regexp Command; The regsub Command; Transforming Data to Program with regsub; Other Commands That Use Regular Expressions
Chapter 12. Script Libraries and Packages
    Locating Packages: The auto_path Variable; Using Packages; Summary of Package Loading; The package Command; Libraries Based on the tclIndex File; The unknown Command; Interactive Conveniences; Tcl Shell Library Environment; Coding Style
Chapter 13. Reflection and Debugging
    The clock Command; The info Command; Cross-Platform Support; Tracing Variable Values; Interactive Command History; Debugging; Scriptics' TclPro; Other Tools; Performance Tuning
Chapter 14. Namespaces
    Using Namespaces; Namespace Variables; Command Lookup; Nested Namespaces; Importing and Exporting Procedures; Callbacks and Namespaces; Introspection; The namespace Command; Converting Existing Packages to use Namespaces; [incr Tcl] Object System; Notes
Chapter 15. Internationalization
    Character Sets and Encodings; Message Catalogs
Chapter 16. Event-Driven Programming
    The Tcl Event Loop; The after Command; The fileevent Command; The vwait Command; The fconfigure Command
Chapter 17. Socket Programming
    Client Sockets; Server Sockets; The Echo Service; Fetching a URL with HTTP; The http Package; Basic Authentication
Chapter 18. TclHttpd Web Server
    Integrating TclHttpd with your Application; Domain Handlers; Application Direct URLs; Document Types; HTML + Tcl Templates; Form Handlers; Programming Reference; Standard Application-Direct URLs; The TclHttpd Distribution; Server Configuration
Chapter 19. Multiple Interpreters and Safe-Tcl
    The interp Command; Creating Interpreters; Safe Interpreters; Command Aliases; Hidden Commands; Substitutions; I/O from Safe Interpreters; The Safe Base; Security Policies
Chapter 20. Safe-Tk and the Browser Plugin
    Tk in Child Interpreters; The Browser Plugin; Security Policies and Browser Plugin; Configuring Security Policies

Part III. Tk Basics
Chapter 21. Tk Fundamentals
    Hello, World! in Tk; Naming Tk Widgets; Configuring Tk Widgets; Tk Widget Attributes and the Resource Database; Summary of the Tk Commands
Chapter 22. Tk by Example
    ExecLog; The Example Browser; A Tcl Shell
Chapter 23. The Pack Geometry Manager
    Packing toward a Side; Horizontal and Vertical Stacking; The Cavity Model; Packing Space and Display Space; Resizing and -expand; Anchoring; Packing Order; Choosing the Parent for Packing; Unpacking a Widget; Packer Summary; Window Stacking Order
Chapter 24. The Grid Geometry Manager
    A Basic Grid; Spanning Rows and Columns; Row and Column Constraints; The grid Command
Chapter 25. The Place Geometry Manager
    place Basics; The Pane Manager; The place Command
Chapter 26. Binding Commands to Events
    The bind Command; The bindtags Command; Event Syntax; Modifiers; Event Sequences; Virtual Events; Event Keywords

Part IV. Tk Widgets
Chapter 27. Buttons and Menus
    Button Commands and Scope Issues; Buttons Associated with Tcl Variables; Button Attributes; Button Operations; Menus and Menubuttons; Keyboard Traversal; Manipulating Menus and Menu Entries; Menu Attributes; A Menu by Name Package
Chapter 28. The Resource Database
    An Introduction to Resources; Loading Option Database Files; Adding Individual Database Entries; Accessing the Database; User-Defined Buttons; User-Defined Menus
Chapter 29. Simple Tk Widgets
    Frames and Toplevel Windows; The Label Widget; The Message Widget; The Scale Widget; The bell Command
Chapter 30. Scrollbars
    Using Scrollbars; The Scrollbar Protocol; The Scrollbar Widget
Chapter 31. The Entry Widget
    Using Entry Widgets; The Entry Widget
Chapter 32. The Listbox Widget
    Using Listboxes; Listbox Bindings; Listbox Attributes
Chapter 33. The Text Widget
    Text Indices; Text Marks; Text Tags; The Selection; Tag Bindings; Searching Text; Embedded Widgets; Embedded Images; Looking inside the Text Widget; Text Bindings; Text Operations; Text Attributes
Chapter 34. The Canvas Widget
    Canvas Coordinates; Hello, World!; The Min Max Scale Example; Canvas Objects; Canvas Operations; Generating Postscript; Canvas Attributes; Hints

Part V. Tk Details
Chapter 35. Selections and the Clipboard
    The Selection Model; The selection Command; The clipboard Command; Selection Handlers
Chapter 36. Focus, Grabs, and Dialogs
    Standard Dialogs; Custom Dialogs; Animation with the update Command
Chapter 37. Tk Widget Attributes
    Configuring Attributes; Size; Borders and Relief; The Focus Highlight; Padding and Anchors
Chapter 38. Color, Images, and Cursors
    Colors; Colormaps and Visuals; Bitmaps and Images; The Text Insert Cursor; The Mouse Cursor
Chapter 39. Fonts and Text Attributes
    Naming a Font; X Font Names; Font Metrics; The font Command; Text Attributes; Gridding, Resizing, and Geometry; A Font Selection Application
Chapter 40. Send
    The send Command; The Sender Script; Communicating Processes; Remote eval through Sockets
Chapter 41. Window Managers and Window Information
    The wm Command; The winfo Command; The tk Command
Chapter 42. Managing User Preferences
    App-Defaults Files; Defining Preferences; The Preferences User Interface; Managing the Preferences File; Tracing Changes to Preference Variables; Improving the Package
Chapter 43. A User Interface to Bindings
    A Pair of Listboxes Working Together; The Editing Interface; Saving and Loading Bindings

Part VI. C Programming
Chapter 44. C Programming and Tcl
    Basic Concepts; Creating a Loadable Package; A C Command Procedure; The blob Command Example; Strings and Internationalization; Tcl_Main and Tcl_AppInit; The Event Loop; Invoking Scripts from C
Chapter 45. Compiling Tcl and Extensions
    Standard Directory Structure; Building Tcl from Source; Using Stub Libraries; Using autoconf; The Sample Extension
Chapter 46. Writing a Tk Widget in C
    Initializing the Extension; The Widget Data Structure; The Widget Class Command; The Widget Instance Command; Configuring and Reconfiguring Attributes; Specifying Widget Attributes; Displaying the Clock; The Window Event Procedure; Final Cleanup
Chapter 47. C Library Overview
    An Overview of the Tcl C Library; An Overview of the Tk C Library

Part VII. Changes
Chapter 48. Tcl 7.4/Tk 4.0
    wish; Obsolete Features; The cget Operation; Input Focus Highlight; Bindings; Scrollbar Interface; pack info; Focus; The send Command; Internal Button Padding; Radiobutton Value; Entry Widget; Menus; Listboxes; No geometry Attribute; Text Widget; Color Attributes; Color Allocation and tk colormodel; Canvas scrollincrement; The Selection; The bell Command
Chapter 49. Tcl 7.5/Tk 4.1
    Cross-Platform Scripts; The clock Command; The load Command; The package Command; Multiple foreach loop variables; Event Loop Moves from Tk to Tcl; Network Sockets; Multiple Interpreters and Safe-Tcl; The grid Geometry Manager; The Text Widget; The Entry Widget
Chapter 50. Tcl 7.6/Tk 4.2
    More file Operations; Virtual Events; Standard Dialogs; New grid Geometry Manager; Macintosh unsupported1 Command
Chapter 51. Tcl/Tk 8.0
    The Tcl Compiler; Namespaces; Safe-Tcl; New lsort; tcl_precision Variable; Year 2000 Convention; Http Package; Serial Line I/O; Platform-Independent Fonts; The tk scaling Command; Application Embedding; Native Menus and Menubars; CDE Border Width; Native Buttons and Scrollbars; Images in Text Widgets; No Errors from destroy; grid rowconfigure; The Patch Releases
Chapter 52. Tcl/Tk 8.1
    Unicode and Internationalization; Thread Safety; Advanced Regular Expressions; New String Commands; The DDE Extension; Miscellaneous
Chapter 53. Tcl/Tk 8.2
    The Trf Patch; Faster String Operations; Empty Array Names; Browser Plugin Compatibility
Chapter 54. Tcl/Tk 8.3
    Proposed Tcl Changes; Proposed Tk Changes
Chapter 55. About The CD-ROM
    Technical Support
Index


Copyright
Library of Congress Cataloging-in-Publication Data

Welch, Brent B.
Practical programming in Tcl and Tk / Brent B. Welch. -- 3rd ed.
p. cm.
ISBN 0-13-022028-0
1. Tcl (Computer program language) 2. Tk toolkit. I. Title.
QA76.73.T44 W45 1999
005.13'3--dc21 99-047206

Credits
Editorial/Production Supervision: Joan L. McNamara
Acquisitions Editor: Mark Taub
Marketing Manager: Kate Hargett
Editorial Assistant: Michael Fredette
Cover Design Director: Jerry Votta
Cover Design: Design Source
Manufacturing Manager: Alexis R. Heydt

© 2000, 1997 by Prentice Hall PTR
Prentice-Hall, Inc.
Upper Saddle River, New Jersey 07458

Prentice Hall books are widely used by corporations and government agencies for training, marketing, and resale. The publisher offers discounts on this book when ordered in bulk quantities. For more information, contact: Corporate Sales Department, Prentice Hall PTR, One Lake Street, Upper Saddle River, NJ 07458. Phone: 800-382-3419; Fax: 201-236-7141; email: [email protected]

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

All product names mentioned herein are the trademarks of their respective owners.

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

Prentice-Hall International (UK) Limited, London
Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall Canada Inc., Toronto
Prentice-Hall Hispanoamericana, S.A., Mexico
Prentice-Hall of India Private Limited, New Delhi
Prentice-Hall of Japan, Inc., Tokyo
Prentice-Hall (Singapore) Pte. Ltd., Singapore
Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro

Dedication
to Jody, Christopher, Daniel, and Michael


List of Examples
1.1 The "Hello, World!" example
1.2 Tcl variables
1.3 Command substitution
1.4 Simple arithmetic
1.5 Nested commands
1.6 Built-in math functions
1.7 Grouping expressions with braces
1.8 Quoting special characters with backslash
1.9 Continuing long lines with backslashes
1.10 Grouping with double quotes vs. braces
1.11 Embedded command and variable substitution
1.12 Defining a procedure
1.13 A while loop to compute factorial
1.14 A recursive definition of factorial
1.15 Using set to return a variable value
1.16 Embedded variable references
1.17 Using info to determine if a variable exists
1.18 Controlling precision with tcl_precision
2.1 A standalone Tcl script on UNIX
2.2 A standalone Tk script on UNIX
2.3 Using /bin/sh to run a Tcl script
2.4 The EchoArgs script
3.1 A simple CGI script
3.2 Output of Example 3-1
3.3 The guestbook.cgi script
3.4 The Cgi_Header procedure
3.5 The Link command formats a hypertext link
3.6 Initial output of guestbook.cgi
3.7 Output of guestbook.cgi
3.8 The newguest.html form
3.9 The newguest.cgi script
4.1 Comparing strings with string compare
4.2 Comparing strings with string equal
4.3 Mapping Microsoft Word special characters to ASCII
5.1 Constructing a list with the list command
5.2 Using lappend to add elements to a list
5.3 Using concat to splice lists together
5.4 Double quotes compared to the concat and list commands
5.5 Modifying lists with linsert and lreplace
5.6 Deleting a list element by value
5.7 Sorting a list using a comparison function
5.8 Use split to turn input data into Tcl lists
5.9 Implementing join in Tcl
6.1 A conditional if then else command
6.2 Chained conditional with elseif
6.3 Using switch for an exact match
6.4 Using switch with substitutions in the patterns
6.5 A switch with "fall through" cases
6.6 Comments in switch commands
6.7 A while loop to read standard input
6.8 Looping with foreach
6.9 Parsing command-line arguments
6.10 Using list with foreach
6.11 Multiple loop variables with foreach
6.12 Multiple value lists with foreach
6.13 A for loop
6.14 A standard catch phrase
6.15 A longer catch phrase
6.16 There are several possible return values from catch
6.17 Raising an error
6.18 Preserving errorInfo when calling error
6.19 Raising an error with return
7.1 Default parameter values
7.2 Variable number of arguments
7.3 Variable scope and Tcl procedures
7.4 A random number generator
7.5 Print variable by name
7.6 Improved incr procedure
8.1 Using arrays
8.2 Referencing an array indirectly
8.3 Referencing an array indirectly using upvar
8.4 ArrayInvert inverts an array
8.5 Using arrays for records, version 1
8.6 Using arrays for records, version 2
8.7 Using a list to implement a stack
8.8 Using an array to implement a stack
8.9 A list of arrays
8.10 A list of arrays
8.11 A simple in-memory database
9.1 Using exec on a process pipeline
9.2 Comparing file modify times
9.3 Determining whether pathnames reference the same file
9.4 Opening a file for writing
9.5 A more careful use of open
9.6 Opening a process pipeline
9.7 Prompting for input
9.8 A read loop using gets
9.9 A read loop using read and split
9.10 Copy a file and translate to native format
9.11 Finding a file by name
9.12 Printing environment variable values
10.1 Using list to construct commands
10.2 Generating procedures dynamically with a template
10.3 Using eval with $args
10.4 lassign: list assignment with foreach
10.5 The File_Process procedure applies a command to each line of a file
11.1 Expanded regular expressions allow comments
11.2 Using regular expressions to parse a string
11.3 A pattern to match URLs
11.4 An advanced regular expression to match URLs
11.5 The Url_Decode procedure
11.6 The Cgi_Parse and Cgi_Value procedures
11.7 Cgi_Parse and Cgi_Value store query data in the cgi array
11.8 Html_DecodeEntity
11.9 Html_Parse
12.1 Maintaining a tclIndex file
12.2 Loading a tclIndex file
13.1 Calculating clicks per second
13.2 Printing a procedure definition
13.3 Mapping form data onto procedure arguments
13.4 Finding built-in commands
13.5 Getting a trace of the Tcl call stack
13.6 A procedure to read and evaluate commands
13.7 Using info script to find related files
13.8 Tracing variables
13.9 Creating array elements with array traces
13.10 Interactive history usage
13.11 Implementing special history syntax
13.12 A Debug procedure
13.13 Time Stamps in log records
14.1 Random number generator using namespaces
14.2 Random number generator using qualified names
14.3 Nested namespaces
14.4 The code procedure to wrap callbacks
14.5 Listing commands defined by a namespace
15.1 MIME character sets and file encodings
15.2 Using scripts in nonstandard encodings
15.3 Three sample message catalog files
15.4 Using msgcat::mcunknown to share message catalogs
16.1 A read event file handler
16.2 Using vwait to activate the event loop
16.3 A read event file handler for a nonblocking channel
17.1 Opening a client socket with a timeout
17.2 Opening a server socket
17.3 The echo service
17.4 A client of the echo service
17.5 Opening a connection to an HTTP server
17.6 Opening a connection to an HTTP server
17.7 Http_Head validates a URL
17.8 Using Http_Head
17.9 Http_Get fetches the contents of a URL
17.10 HttpGetText reads text URLs
17.11 HttpCopyDone is used with fcopy
17.12 Downloading files with http::geturl
17.13 Basic Authentication using http::geturl
18.1 A simple URL domain
18.2 Application Direct URLs
18.3 Alternate types for Application Direct URLs
18.4 A sample document type handler
18.5 A one-level site structure
18.6 A HTML + Tcl template file
18.7 SitePage template procedure
18.8 SiteMenu and SiteFooter template procedures
18.9 The SiteLink procedure
18.10 Mail form results with /mail/forminfo
18.11 Mail message sent by /mail/forminfo
18.12 Processing mail sent by /mail/forminfo
18.13 A self-checking form procedure
18.14 A page with a self-checking form
18.15 The /debug/source application-direct URL implementation
19.1 Creating and deleting an interpreter
19.2 Creating a hierarchy of interpreters
19.3 A command alias for exit
19.4 Querying aliases
19.5 Dumping aliases as Tcl commands
19.6 Substitutions and hidden commands
19.7 Opening a file for an unsafe interpreter
19.8 The Safesock security policy
19.9 The Tempfile security policy
19.10 Restricted puts using hidden commands
19.11 A safe after command
21.1 "Hello, World!" Tk program
21.2 Looking at all widget attributes
22.1 Logging the output of a program run with exec
22.2 A platform-specific cancel event
22.3 A browser for the code examples in the book
22.4 A Tcl shell in a text widget
22.5 Macintosh look and feel
22.6 Windows look and feel
22.7 UNIX look and feel
23.1 Two frames packed inside the main frame
23.2 Turning off geometry propagation
23.3 A horizontal stack inside a vertical stack
23.4 Even more nesting of horizontal and vertical stacks
23.5 Mixing bottom and right packing sides
23.6 Filling the display into extra packing space
23.7 Using horizontal fill in a menu bar
23.8 The effects of internal padding (-ipady)
23.9 Button padding vs. packer padding
23.10 The look of a default button
23.11 Resizing without the expand option
23.12 Resizing with expand turned on
23.13 More than one expanding widget
23.14 Setup for anchor experiments
23.15 The effects of noncenter anchors
23.16 Animating the packing anchors
23.17 Controlling the packing order
23.18 Packing into other relatives
24.1 A basic grid
24.2 A grid with sticky settings
24.3 A grid with row and column specifications
24.4 A grid with external padding
24.5 A grid with internal padding
24.6 All combinations of -sticky settings
24.7 Explicit row and column span
24.8 Grid syntax row and column span
24.9 Row padding compared to widget padding
24.10 Gridding a text widget and scrollbar
25.1 Centering a window with place
25.2 Covering a window with place
25.3 Combining relative and absolute sizes
25.4 Positioning a window above a sibling with place
25.5 Pane_Create sets up vertical or horizontal panes
25.6 PaneDrag adjusts the percentage
25.7 PaneGeometry updates the layout
26.1 Bindings on different binding tags
26.2 Output from the UNIX xmodmap program
26.3 Emacs-like binding convention for Meta and Escape
26.4 Virtual events for cut, copy, and paste
27.1 A troublesome button command
27.2 Fixing the troublesome situation
27.3 A button associated with a Tcl procedure
27.4 Radiobuttons and checkbuttons
27.5 A command on a radiobutton or checkbutton
27.6 A menu sampler
27.7 A menu bar in Tk 8.0
27.8 A simple menu by name package
27.9 Using the Tk 8.0 menu bar facility
27.10 MenuGet maps from name to menu
27.11 Adding menu entries
27.12 A wrapper for cascade entries
27.13 Using the menu by name package
27.14 Keeping the accelerator display up to date
28.1 Reading an option database file
28.2 A file containing resource specifications
28.3 Using resources to specify user-defined buttons
28.4 Resource_ButtonFrame defines buttons based on resources
28.5 Using Resource_ButtonFrame
28.6 Specifying menu entries via resources
28.7 Defining menus from resource specifications
28.8 Resource_GetFamily merges user and application resources
29.1 Macintosh window styles
29.2 A label that displays different strings
29.3 The message widget formats long lines of text
29.4 Controlling the text layout in a message widget
29.5 A scale widget
30.1 A text widget and two scrollbars
30.2 Scroll_Set manages optional scrollbars
30.3 Listbox with optional scrollbars
31.1 A command entry
32.1 Choosing items from a listbox
33.1 Tag configurations for basic character styles
33.2 Line spacing and justification in the text widget
33.3 An active text button
33.4 Delayed creation of embedded widgets
33.5 Using embedded images for a bulleted list
33.6 Finding the current range of a text tag
33.7 Dumping the text widget
33.8 Dumping the text widget with a command callback
34.1 A large scrolling canvas
34.2 The canvas "Hello, World!" example
34.3 A min max scale canvas example
34.4 Moving the markers for the min max scale
34.5 Canvas arc items
34.6 Canvas bitmap items
34.7 Canvas image items
34.8 A canvas stroke drawing example
34.9 Canvas oval items
34.10 Canvas polygon items
34.11 Dragging out a box
34.12 Simple edit bindings for canvas text items
34.13 Using a canvas to scroll a set of widgets
34.14 Generating postscript from a canvas
35.1 Paste the PRIMARY or CLIPBOARD selection
35.2 Separate paste actions
35.3 Bindings for canvas selection
35.4 Selecting objects
35.5 A canvas selection handler
35.6 The copy and cut operations
35.7 Pasting onto the canvas
36.1 Procedures to help build dialogs
36.2 A simple dialog
36.3 A feedback procedure
37.1 Equal-sized labels
37.2 3D relief sampler
37.3 Padding provided by labels and buttons
37.4 Anchoring text in a label or button
37.5 Borders and padding
38.1 Resources for reverse video
38.2 Computing a darker color
38.3 Specifying an image for a widget
38.4 Specifying a bitmap for a widget
38.5 The built-in bitmaps
38.6 The Tk cursors
39.1 The FontWidget procedure handles missing fonts
39.2 Font metrics
39.3 A gridded, resizable listbox
39.4 Font selection dialog
40.1 The sender application
40.2 Hooking the browser to an eval server
40.3 Making the shell into an eval server
40.4 Remote eval using sockets
40.5 Reading commands from a socket
40.6 The client side of remote evaluation
41.1 Gridded geometry for a canvas
41.2 Telling other applications what your name is
42.1 Preferences initialization
42.2 Adding preference items
42.3 Setting preference variables
42.4 Using the preferences package
42.5 A user interface to the preference items
42.6 Interface objects for different preference types
42.7 Displaying the help text for an item
42.8 Saving preferences settings to a file
42.9 Read settings from the preferences file
42.10 Tracing a Tcl variable in a preference item
43.1 A user interface to widget bindings
43.2 Bind_Display presents the bindings for a widget or class
43.3 Related listboxes are configured to select items together
43.4 Controlling a pair of listboxes with one scrollbar
43.5 Drag-scrolling a pair of listboxes together
43.6 An interface to define bindings
43.7 Defining and saving bindings
44.1 The initialization procedure for a loadable package
44.2 The RandomCmd C command procedure
44.3 The RandomObjCmd C command procedure
44.4 The Tcl_Obj structure
44.5 The Plus1ObjCmd procedure
44.6 The Blob and BlobState data structures
44.7 The Blob_Init and BlobCleanup procedures
44.8 The BlobCmd command procedure
44.9 BlobCreate and BlobDelete
44.10 The BlobNames procedure
44.11 The BlobN and BlobData procedures
44.12 The BlobCommand and BlobPoke procedures
44.13 A canonical Tcl main program and Tcl_AppInit
44.14 A canonical Tk main program and Tk_AppInit
44.15 Calling C command procedure directly with Tcl_Invoke
46.1 The Clock_Init procedure
46.2 The Clock widget data structure
46.3 The ClockCmd command procedure
46.4 The ClockObjCmd command procedure
46.5 The ClockInstanceCmd command procedure
46.6 The ClockInstanceObjCmd command procedure
46.7 ClockConfigure allocates resources for the widget
46.8 ClockObjConfigure allocates resources for the widget
46.9 The Tk_ConfigSpec typedef
46.10 Configuration specs for the clock widget
46.11 The Tk_OptionSpec typedef
46.12 The Tk_OptionSpec structure for the clock widget
46.13 ComputeGeometry computes the widget's size
46.14 The ClockDisplay procedure
46.15 The ClockEventProc handles window events
46.16 The ClockDestroy cleanup procedure
46.17 The ClockObjDelete command


List of Tables
1-1 Backslash sequences
1-2 Arithmetic operators from highest to lowest precedence
1-3 Built-in math functions
1-4 Built-in Tcl commands
2-1 Wish command line options
2-2 Variables defined by tclsh and wish
3-1 HTML tags used in the examples
4-1 The string command
4-2 Matching characters used with string match
4-3 Character class names
4-4 Format conversions
4-5 Format flags
4-6 Binary conversion types
5-1 List-related commands
8-1 The array command
9-1 Summary of the exec syntax for I/O redirection
9-2 The file command options
9-3 Array elements defined by file stat
9-4 Platform-specific file attributes
9-5 Tcl commands used for file access
9-6 Summary of the open access arguments
9-7 Summary of POSIX flags for the access argument
9-8 The registry command
9-9 The registry data types
11-1 Basic regular expression syntax
11-2 Additional advanced regular expression syntax
11-3 Character classes
11-4 Backslash escapes in regular expressions
11-5 Embedded option characters used with the (?x) syntax
11-6 Options to the regexp command
11-7 Sample regular expressions
12-1 Options to the pkg_mkIndex command
12-2 The package command
13-1 The clock command
13-2 Clock formatting keywords
13-3 UNIX-specific clock formatting keywords
13-4 The info command
13-5 The history command
13-6 Special history syntax
14-1 The namespace command
15-1 The encoding command
15-2 The msgcat package
16-1 The after command
16-2 The fileevent command
16-3 I/O channel properties controlled by fconfigure
16-4 End of line translation modes
17-1 Options to the http::geturl command
17-2 Elements of the http::geturl state array
17-3 The http support procedures
18-1 Httpd support procedures
18-2 Url support procedures
18-3 Doc procedures for configuration
18-4 Doc procedures for generating responses
18-5 Doc procedures that support template processing
18-6 The form package
18-7 Elements of the page array
18-8 Elements of the env array
18-9 Status application-direct URLs
18-10 Debug application-direct URLs
18-11 Application-direct URLs that e-mail form results
18-12 Basic TclHttpd Parameters
19-1 The interp command
19-2 Commands hidden from safe interpreters
19-3 The safe base master interface
19-4 The safe base slave aliases
20-1 Tk commands omitted from safe interpreters
20-2 Plugin Environment Variables
20-3 Aliases defined by the browser package
20-4 The browser::getURL callbacks
21-1 Tk widget-creation commands
21-2 Tk widget-manipulation commands
21-3 Tk support procedures
23-1 The pack command
23-2 Packing options
24-1 The grid command
24-2 Grid widget options
25-1 The place command
25-2 Placement options
26-1 Event types
26-2 Event modifiers
26-3 The event command
26-4 A summary of the event keywords
27-1 Resource names of attributes for all button widgets
27-2 Button operations
27-3 Menu entry index keywords
27-4 Menu operations
27-5 Menu attribute resource names
27-6 Attributes for menu entries
29-1 Attributes for frame and toplevel widgets
29-2 Label Attributes
29-3 Message Attributes
29-4 Bindings for scale widgets
29-5 Attributes for scale widgets
29-6 Operations on the scale widget
30-1 Bindings for the scrollbar widget
30-2 Attributes for the scrollbar widget
30-3 Operations on the scrollbar widget
31-1 Entry bindings
31-2 Entry attribute resource names
31-3 Entry indices
31-4 Entry operations
32-1 Listbox indices
32-2 Listbox operations
32-3 The values for the selectMode of a listbox
32-4 Bindings for browse selection mode
32-5 Bindings for single selection mode
32-6 Bindings for extended selection mode
32-7 Bindings for multiple selection mode
32-8 Listbox scroll bindings
32-9 Listbox attribute resource names
33-1 Text indices
33-2 Index modifiers for text widgets
33-3 Attributes for text tags
33-4 Options to the search operation
33-5 Window and image alignment options
33-6 Options to the window create operation
33-7 Options to the image create operation
33-8 Bindings for the text widget
33-9 Operations for the text widget
33-10 Text attribute resource names
34-1 Arc attributes
34-2 Bitmap attributes
34-3 Image attributes
34-4 Line attributes
34-5 Oval attributes
34-6 Polygon attributes
34-7 Rectangle attributes
34-8 Indices for canvas text items
34-9 Canvas operations that apply to text items
34-10 Text attributes
34-11 Operations on a canvas widget
34-12 Canvas postscript options
34-13 Canvas attribute resource names
35-1 The selection command
35-2 The clipboard command
36-1 Options to tk_messageBox
36-2 Options to the standard file dialogs
36-3 Options to tk_chooseColor
36-4 The focus command
36-5 The grab command
36-6 The tkwait command
37-1 Size attribute resource names
37-2 Border and relief attribute resource names
37-3 Highlight attribute resource names
37-4 Layout attribute resource names
38-1 Color attribute resource names
38-2 Windows system colors
38-3 Macintosh system colors
38-4 Visual classes for displays
38-5 Summary of the image command
38-6 Bitmap image options
38-7 Photo image attributes
38-8 Photo image operations
38-9 Copy options for photo images
38-10 Read options for photo images
38-11 Write options for photo images
38-12 Cursor attribute resource names
39-1 Font attributes
39-2 X Font specification components
39-3 The font command
39-4 Layout attribute resource names
39-5 Selection attribute resource names
40-1 Options to the send command
41-1 Size, placement and decoration window manager operations
41-2 Window manager commands for icons
41-3 Session-related window manager operations
41-4 Miscellaneous window manager operations
41-5 send command information
41-6 Window hierarchy information
41-7 Window size information
41-8 Window location information
41-9 Virtual root window information
41-10 Atom and window ID information
41-11 Colormap and visual class information
45-1 The Tcl source directory structure
45-2 The installation directory structure
45-3 Standard configure flags
45-4 TEA standard Makefile targets
46-1 Configuration flags and corresponding C types
48-1 Changes in color attribute names
52-1 The testthread command
52-2 The dde command options

Practical Programming in Tcl & Tk, Third Edition By Brent B. Welch

Preface
Tcl stands for Tool Command Language. Tcl is really two things: a scripting language, and an interpreter for that language that is designed to be easy to embed into your application. Tcl and its associated graphical user-interface toolkit, Tk, were designed and crafted by Professor John Ousterhout of the University of California, Berkeley. You can find these packages on the Internet (as explained on page lii) and use them freely in your application, even if it is commercial. The Tcl interpreter has been ported from UNIX to DOS, Windows, OS/2, NT, and Macintosh environments. The Tk toolkit has been ported from the X window system to Windows and Macintosh. I first heard about Tcl in 1988 while I was Ousterhout's Ph.D. student at Berkeley. We were designing a network operating system, Sprite. While the students hacked on a new kernel, John wrote a new editor and terminal emulator. He used Tcl as the command language for both tools so that users could define menus and otherwise customize those programs. This was in the days of X10, and he had plans for an X toolkit based on Tcl that would help programs cooperate with each other by communicating with Tcl commands. To me, this cooperation among tools was the essence of Tcl. This early vision imagined that applications would be large bodies of compiled code and a small amount of Tcl used for configuration and high-level commands. John's editor, mx, and the terminal emulator, tx, followed this model. While this model remains valid, it has also turned out to be possible to write entire applications in Tcl. This is because the Tcl/Tk shell, wish, provides access to other programs, the file system, network sockets, plus the ability to create a graphical user interface. For better or worse, it is now common to find applications that contain thousands of lines of Tcl script. This book was written because, while I found it enjoyable and productive to use Tcl and Tk, there were times when I was frustrated. 
In addition, working at Xerox PARC, with many experts in languages and systems, I was compelled to understand both the strengths and weaknesses of Tcl and Tk. Although many of my colleagues adopted Tcl and Tk for their projects, they were also just as quick to point out its flaws. In response, I have built up a set of programming techniques that exploit the power of Tcl and Tk while avoiding troublesome areas. This book is meant as a practical guide to help you get the most out of Tcl and Tk and avoid some of the frustrations I experienced. It has been about 10 years since I was introduced to Tcl, and about five years since the first edition of this book. During the last several years I have been working under John Ousterhout, first at Sun Microsystems and now at Scriptics Corporation. I have managed to remain mostly a Tcl programmer while others in our group have delved into the C implementation of Tcl itself. I've been building applications like HTML editors, e-mail user interfaces, Web servers, and the customer database we run our business on. This experience is reflected in this book. The bulk of the book is about Tcl scripting, and the aspects of C programming used to create Tcl extensions are given a lighter treatment. I have been lucky to remain involved in the core Tcl development, and I hope I can pass along the insights I have gained by working with Tcl.

Why Tcl?
As a scripting language, Tcl is similar to other UNIX shell languages such as the Bourne Shell (sh), the C Shell (csh), the Korn Shell (ksh), and Perl. Shell programs let you execute other programs. They provide enough programmability (variables, control flow, and procedures) to let you build complex scripts that assemble existing programs into a new tool tailored for your needs. Shells are wonderful for automating routine chores. It is the ability to easily add a Tcl interpreter to your application that sets it apart from other shells. Tcl fills the role of an extension language that is used to configure and customize applications. There is no need to invent a command language for your new application, or struggle to provide some sort of userprogrammability for your tool. Instead, by adding a Tcl interpreter, you structure your application as a set of primitive operations that can be composed by a script to best suit the needs of your users. It also allows other programs to have programmatic control over your application, leading to suites of applications that work well together. The Tcl C library has clean interfaces and is simple to use. The library implements the basic interpreter and a set of core scripting commands that implement variables, flow control, and procedures (see page 22). There is also a set of commands that access operating system services to run other programs, access the file system, and use network sockets. Tk adds commands to create graphical user interfaces. Tcl and Tk provide a "virtual machine" that is portable across UNIX, Windows, and Macintosh environments. The Tcl virtual machine is extensible because your application can define new Tcl commands. These commands are associated with a C or C++ procedure that your application provides. The result is applications that are split into a set of primitives written in a compiled language and exported as Tcl commands. A Tcl script is used to compose the primitives into the overall application. 
The script layer has access to shell-like capability to run other programs, has access to the file system, and can call directly into the compiled part of the application through the Tcl commands you define. In addition, from the C programming level, you can call Tcl scripts, set and query Tcl variables, and even trace the execution of the Tcl interpreter. There are many Tcl extensions freely available on the Internet. Most extensions include a C library that provides some new functionality, and a Tcl interface to the library. Examples include database access, telephone control, MIDI controller access, and expect, which adds Tcl commands to control interactive programs.

The most notable extension is Tk, a toolkit for graphical user interfaces. Tk defines Tcl commands that let you create and manipulate user interface widgets. The script-based approach to user interface programming has three benefits:

Development is fast because of the rapid turnaround; there is no waiting for long compilations.

The Tcl commands provide a higher-level interface than most standard C library user-interface toolkits. Simple user interfaces require just a handful of commands to define them. At the same time, it is possible to refine the user interface in order to get every detail just so. The fast turnaround aids the refinement process.

The user interface can be factored out from the rest of your application. The developer can concentrate on the implementation of the application core and then fairly painlessly work up a user interface. The core set of Tk widgets is often sufficient for all your user interface needs. However, it is also possible to write custom Tk widgets in C, and again there are many contributed Tk widgets available on the network.

There are other choices for extension languages that include Visual Basic, Scheme, Elisp, Perl, Python, and JavaScript. Your choice between them is partly a matter of taste. Tcl has simple constructs and looks somewhat like C. It is easy to add new Tcl primitives by writing C procedures. Tcl is very easy to learn, and I have heard many great stories of users completing impressive projects in a short amount of time (e.g., a few weeks), even though they never used Tcl before.

Java has exploded onto the computer scene since this book was first published. Java is a great systems programming language that in the long run could displace C and C++. This is fine for Tcl, which is designed to glue together building blocks written in any system programming language. Tcl was designed to work with C, but has been adapted to work with the Java Virtual Machine.
Where I say "C or C++", you can now say "C, C++, or Java," but the details are a bit different with Java. This book does not describe the Tcl/Java interface, but you can find TclBlend on the CD-ROM. TclBlend loads the Java Virtual Machine into your Tcl application and lets you invoke Java methods. It also lets you implement Tcl commands in Java instead of C or C++.

JavaScript is a language from Netscape that is designed to script interactions with Web pages. JavaScript is important because Netscape is widely deployed. However, Tcl provides a more general-purpose scripting solution that can be used in a wide variety of applications. The Tcl/Tk Web browser plugin provides a way to run Tcl in your browser. It turns out to be more of a Java alternative than a JavaScript alternative. The plugin lets you run Tcl applications inside your browser, while JavaScript gives you fine-grained control over the browser and HTML display. The plugin is described in Chapter 20.

Tcl and Tk Versions

Tcl and Tk continue to evolve. See http://www.beedub.com/book/ for updates and news about the latest Tcl releases. Tcl and Tk have had separate version numbers for historical reasons, but they are released in pairs that work together. The original edition of this book was based on Tcl 7.4 and Tk 4.0, and there were a few references to features in Tk 3.6. This third edition has been updated to reflect new features added through Tcl/Tk 8.2:

Tcl 7.5 and Tk 4.1 had their final release in May 1996. These releases feature the port of Tk to the Windows and Macintosh environments. The Safe-Tcl security mechanism was introduced to support safe execution of network applets. There is also network socket support and a new Input/Output (I/O) subsystem to support high-performance event-driven I/O.

Tcl 7.6 and Tk 4.2 had their final release in October 1996. These releases include improvements in Safe-Tcl, and improvements to the grid geometry manager introduced in Tk 4.1. Cross-platform support includes virtual events (e.g., <<Copy>> as opposed to <Control-c>), standard dialogs, and more file manipulation commands.

Tcl 7.7 and Tk 4.3 were internal releases used for the development of the Tcl/Tk plug-in for the Netscape Navigator and Microsoft Internet Explorer Web browsers. Their development actually proceeded in parallel to Tcl 7.6 and Tk 4.2. The plug-in has been released for a wide variety of platforms, including Solaris/SPARC, Solaris/INTEL, SunOS, Linux, Digital UNIX, IRIX, HP/UX, Windows 95, Windows NT, and the Macintosh. The browser plug-in supports Tcl applets in Web pages and uses the sophisticated security mechanism of Safe-Tcl to provide safety.

Tcl 8.0 features an on-the-fly compiler for Tcl that provides many-times faster Tcl scripts. Tcl 8.0 supports strings with embedded null characters. The compiler is transparent to Tcl scripts, but extension writers need to learn some new C APIs to take advantage of its potential. The release history of 8.0 spread out over a couple of years as John Ousterhout moved from Sun Microsystems to Scriptics Corporation. The widely used 8.0p2 release was made in the fall of 1997, but the final patch release, 8.0.5, was made in the spring of 1999.

Tk changed its version to match Tcl at 8.0. Tk 8.0 includes a new platform-independent font mechanism, native menus and menu bars, and more native widgets for better native look and feel on Windows and Macintosh.

This book was the second Tcl book after the original book by John Ousterhout, the creator of Tcl. Since then, the number of Tcl books has increased remarkably. The following are just some of the books currently available.

Tcl and the Tk Toolkit (Addison-Wesley, 1994) by John Ousterhout provides a broad overview of all aspects of Tcl and Tk, even though it covers only Tcl 7.3 and Tk 3.6. The book provides a more detailed treatment of C programming for Tcl extensions.

Exploring Expect (O'Reilly & Associates, Inc., 1995) by Don Libes is a great book about an extremely useful Tcl extension. Expect lets you automate the use of interactive programs like ftp and telnet that expect to interact with a user. By combining expect and Tk, you can create graphical user interfaces for old applications that you cannot modify directly.

Graphical Applications with Tcl & Tk (M&T Press, 1996) by Eric Johnson is oriented toward Windows users. The second edition is up-to-date with Tcl/Tk 8.0.

Tcl/Tk Tools (O'Reilly & Associates, Inc., 1997) by Mark Harrison describes many useful Tcl extensions. These include Oracle and Sybase interfaces, object-oriented language enhancements, additional Tk widgets, and much more. The chapters were contributed by the authors of the extensions, so they provide authoritative information on some excellent additions to the Tcl toolbox.

CGI Developers Resource, Web Programming with Tcl and Perl (Prentice Hall, 1997) by John Ivler presents Tcl-based solutions to programming Web sites.

Effective Tcl/Tk Programming (Addison Wesley, 1997) by Michael McLennan and Mark Harrison illustrates Tcl and Tk with examples and application design guidelines.

Interactive Web Applications with Tcl/Tk (AP Professional, 1998) by Michael Doyle and Hattie Schroeder describes Tcl programming in the context of the Web browser plugin.

Tcl/Tk for Programmers (IEEE Computer Society, 1998) by Adrian Zimmer describes Unix and Windows programming with Tcl/Tk. This book also includes solved exercises at the end of each chapter.

Tcl/Tk for Real Programmers (Academic Press, 1999) by Clif Flynt is another example-oriented book.

Tcl/Tk in a Nutshell (O'Reilly, 1999) by Paul Raines and Jeff Tranter is a handy reference guide. It covers several popular extensions including Expect, [incr Tcl], Tix, TclX, BLT, SybTcl, OraTcl, and TclODBC. There is a tiny pocket-reference guide for Tcl/Tk that may eliminate the need to thumb through my large book to find the syntax of a particular Tcl or Tk command.

Web Tcl Complete (McGraw Hill, 1999) by Steve Ball describes programming with the Tcl Web Server. It also covers Tcl/Java integration using TclBlend.

[incr Tcl] From The Ground Up (Osborne McGraw-Hill, 1999) by Chad Smith describes the [incr Tcl] object-oriented extension to Tcl.

On-line Examples
The book comes with a CD-ROM that has source code for all of the examples, plus a selection of Tcl freeware found on the Internet. The CD-ROM is created with the Linux mkhybrid program, so it is readable on UNIX, Windows, and Macintosh. There, you will find the versions of Tcl and Tk that were available as the book went to press. You can also retrieve the sources shown in the book from my personal Web site: http://www.beedub.com/book/

Ftp Archives
The primary site for the Tcl and Tk distributions is given below as a Uniform Resource Locator (URL):

ftp://ftp.scriptics.com/pub/tcl

You can use FTP and log in to the host (e.g., ftp.scriptics.com) under the anonymous user name. Give your e-mail address as the password. The directory is in the URL after the host name (e.g., /pub/tcl). There are many sites that mirror this distribution. The mirror sites provide an archive site for contributed Tcl commands, Tk widgets, and applications. There is also a set of Frequently Asked Questions files. These are some of the sites that maintain Tcl archives:

ftp://ftp.neosoft.com/pub/tcl
ftp://ftp.syd.dit.csiro.au/pub/tk
ftp://ftp.ibp.fr/pub/tcl
ftp://src.doc.ic.ac.uk/packages/tcl/
ftp://ftp.luth.se/pub/unix/tcl/
ftp://sunsite.cnlab-switch.ch/mirror/tcl
ftp://ftp.sterling.com/programming/languages/tcl
ftp://ftp.sunet.se/pub/lang/tcl
ftp://ftp.cs.columbia.edu/archives/tcl
ftp://ftp.uni-paderborn.de/pub/unix/tcl
ftp://sunsite.unc.edu/pub/languages/tcl
ftp://ftp.funet.fi/pub/languages/tcl

You can use a World Wide Web browser like Mosaic, Netscape, Internet Explorer, or Lynx to access these sites. Enter the URL as specified above, and you are presented with a directory listing of that location. From there you can change directories and fetch files. If you do not have direct FTP access, you can use an e-mail server for FTP. Send e-mail to [email protected] with the message Help to get directions. If you are on BITNET, send e-mail to [email protected]. You can search for FTP sites that have Tcl by using the Archie service that indexes the contents of anonymous FTP servers. Information about using Archie can be obtained by sending mail to [email protected] that contains the message Help.

Book Organization
The chapters of the book are divided into seven parts. The first part describes basic Tcl features. The first chapter describes the fundamental mechanisms that characterize the Tcl language. This is an important chapter that provides the basic grounding you will need to use Tcl effectively. Even if you have programmed in Tcl already, you should review Chapter 1. Chapter 2 goes over the details of using Tcl and Tk on UNIX, Windows, and Macintosh. Chapter 3 presents a sample application, a CGI script, that illustrates typical Tcl programming. The rest of Part I covers the basic Tcl commands in more detail, including string handling, data types, control flow, procedures, and scoping issues. Part I finishes with a description of the facilities for file I/O and running other programs.

Part II describes advanced Tcl programming. It starts with eval, which lets you generate Tcl programs on the fly. Regular expressions provide powerful string processing. If your data-processing application runs slowly, you can probably boost its performance significantly with the regular expression facilities. Namespaces partition the global scope of procedures and variables. Unicode and message catalogs support internationalized applications. Libraries and packages provide a way to organize your code for sharing among projects. The introspection facilities of Tcl tell you about the internal state of Tcl. Event-driven I/O helps server applications manage several clients simultaneously. Network sockets are used to implement the HTTP protocol used to fetch pages on the World Wide Web. Safe-Tcl is used to provide a secure environment to execute applets downloaded over the network. TclHttpd is an extensible web server built in Tcl. You can build applications on top of this server, or embed it into your existing applications to give them a web interface.

Part III introduces Tk. It gives an overview of the toolkit facilities. A few complete examples are examined in detail to illustrate the features of Tk. Event bindings associate Tcl commands with events like keystrokes and button clicks. Part III ends with three chapters on the Tk geometry managers that provide powerful facilities for organizing your user interface.

Part IV describes the Tk widgets. These include buttons, menus, scrollbars, labels, text entries, multiline and multifont text areas, drawing canvases, listboxes, and scales. The Tk widgets are highly configurable and very programmable, but their default behaviors make them easy to use as well. The resource database that can configure widgets provides an easy way to control the overall look of your application.

Part V describes the rest of the Tk facilities. These include selections, keyboard focus, and standard dialogs. Fonts, colors, images, and other attributes that are common to the Tk widgets are described in detail. This part ends with a few larger Tk examples.

I am always open to comments about this book. My e-mail address is [email protected]. It helps me sort through my mail if you put the word "book" or the title of the book into the e-mail subject line. Visit my Web site at: http://www.beedub.com/ for current news about the book and my other interests.

Command Substitution
The second form of substitution is command substitution. A nested command is delimited by square brackets, [ ]. The Tcl interpreter takes everything between the brackets and evaluates it as a command. It rewrites the outer command by replacing the square brackets and everything between them with the result of the nested command. This is similar to the use of backquotes in other shells, except that it has the additional advantage of supporting arbitrary nesting of commands.

Example 1-3 Command substitution.

    set len [string length foobar]
    => 6

In Example 1-3, the nested command is:

    string length foobar

This command returns the length of the string foobar. The string command is described in detail starting on page 45. The nested command runs first. Then, command substitution causes the outer command to be rewritten as if it were:

    set len 6

If there are several cases of command substitution within a single command, the interpreter processes them from left to right. As each right bracket is encountered, the command it delimits is evaluated. This results in a sensible ordering in which nested commands are evaluated first so that their result can be used in arguments to the outer command.
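
The ordering rules above can be checked with a short script. This is only a sketch: the Note procedure and the order variable are hypothetical helpers introduced here to record when each nested command runs, and they are not part of the book's examples.

```tcl
# Nested command substitution: the inner [string length ...] runs first,
# and its result becomes an argument to the outer string repeat command.
set stars [string repeat * [string length foobar]]
puts $stars

# Several substitutions in one command are processed left to right.
# The order list records the sequence in which the nested commands ran.
set order {}
proc Note {tag} {
    global order
    lappend order $tag
    return $tag
}
set pair [list [Note first] [Note second]]
puts "order: $order"
```

The order list ends up with first ahead of second, which matches the left-to-right rule, and the value of pair is built only after both nested commands have returned.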

Chapter 1. Tcl Fundamentals

Math Expressions
The Tcl interpreter itself does not evaluate math expressions. Tcl just does grouping, substitutions and command invocations. The expr command is used to parse and evaluate math expressions.

Example 1-4 Simple arithmetic.

    expr 7.2 / 4
    => 1.8

The math syntax supported by expr is the same as the C expression syntax. The expr command deals with integer, floating point, and boolean values. Logical operations return either 0 (false) or 1 (true). Integer values are promoted to floating point values as needed. Octal values are indicated by a leading zero (e.g., 033 is 27 decimal). Hexadecimal values are indicated by a leading 0x. Scientific notation for floating point numbers is supported. A summary of the operator precedence is given on page 20.

You can include variable references and nested commands in math expressions. The following example uses expr to add the value of x to the length of the string foobar. As a result of the innermost command substitution, the expr command sees 6 + 7, and len gets the value 13:

Example 1-5 Nested commands.

    set x 7
    set len [expr [string length foobar] + $x]
    => 13

The expression evaluator supports a number of built-in math functions. (For a complete listing, see page 21.) Example 1-6 computes the value of pi:

Example 1-6 Built-in math functions.

    set pi [expr 2*asin(1.0)]
    => 3.1415926535897931

The implementation of expr is careful to preserve accurate numeric values and avoid conversions between numbers and strings. However, you can make expr operate more efficiently by grouping the entire expression in curly braces. The explanation has to do with the byte code compiler that Tcl uses internally, and its effects are explained in more detail on page 15. For now, you should be aware that these expressions are all valid and run a bit faster than the examples shown above:

Example 1-7 Grouping expressions with braces.

    expr {7.2 / 4}
    set len [expr {[string length foobar] + $x}]
    set pi [expr {2*asin(1.0)}]
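
A few of the numeric rules described in this section can be tried directly. The following is a minimal sketch with variable names chosen here for illustration. It leaves out the leading-zero octal form, because newer Tcl releases may treat that notation differently from the 8.x behavior described above.

```tcl
# Integer operands give an integer result; one floating point operand
# promotes the whole operation to floating point.
set intDiv   [expr {7 / 4}]
set floatDiv [expr {7.0 / 4}]

# Hexadecimal input and scientific notation are accepted.
set hex [expr {0x1b}]
set sci [expr {1.5e2}]

# Comparison and logical operators return 1 (true) or 0 (false).
set isBigger [expr {$intDiv < $floatDiv}]

puts "$intDiv $floatDiv $hex $sci $isBigger"
```

Each expression is grouped in braces, following the advice given with Example 1-7.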

Backslash Substitution
The final type of substitution done by the Tcl interpreter is backslash substitution. This is used to quote characters that have special meaning to the interpreter. For example, you can specify a literal dollar sign, brace, or bracket by quoting it with a backslash. As a rule, however, if you find yourself using lots of backslashes, there is probably a simpler way to achieve the effect you are striving for. In particular, the list command described on page 61 will do quoting for you automatically. In Example 1-8 backslash is used to get a literal $:

Example 1-8 Quoting special characters with backslash.

    set dollar \$foo
    => $foo
    set x $dollar
    => $foo

Only a single round of interpretation is done.

The second set command in the example illustrates an important property of Tcl. The value of dollar does not affect the substitution performed in the assignment to x. In other words, the Tcl parser does not care about the value of a variable when it does the substitution. In the example, the value of x and dollar is the string $foo. In general, you do not have to worry about the value of variables until you use eval, which is described in Chapter 10.

You can also use backslash sequences to specify characters with their Unicode, hexadecimal, or octal value:

    set escape \u001b
    set escape \x1b
    set escape \033

The value of variable escape is the ASCII ESC character, which has character code 27. The table on page 20 summarizes backslash substitutions.

A common use of backslashes is to continue long commands on multiple lines. This is necessary because a newline terminates a command. The backslash in the next example is required; otherwise the expr command gets terminated by the newline after the plus sign.

Example 1-9 Continuing long lines with backslashes.

    set totalLength [expr [string length $one] + \
        [string length $two]]

There are two fine points to escaping newlines. First, if you are grouping an argument as described in the next section, then you do not need to escape newlines; the newlines are automatically part of the group and do not terminate the command. Second, a backslash as the last character in a line is converted into a space, and all the white space at the beginning of the next line is replaced by this substitution. In other words, the backslash-newline sequence also consumes all the leading white space on the next line.
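
The points in this section can be verified with a short script. This is a sketch only; the variable names fromUnicode, fromHex, fromOctal, code, and joined are made up for the illustration.

```tcl
# A backslash makes the dollar sign literal, so no variable lookup occurs.
set dollar \$foo

# Only one round of substitution: $dollar is replaced by its value,
# and that value is not scanned again for a variable named foo.
set x $dollar
puts $x

# Three spellings of the ASCII ESC character (character code 27).
set fromUnicode \u001b
set fromHex     \x1b
set fromOctal   \033
scan $fromHex %c code
puts "ESC has character code $code"

# Backslash-newline, plus the leading white space on the next line,
# collapses into a single space.
set joined "one\
            two"
puts $joined
```

The scan command with %c converts the character back to its numeric code, which is a convenient way to confirm that all three escape forms name the same character.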

Grouping with Braces and Double Quotes

Double quotes and curly braces are used to group words together into one argument. The difference between double quotes and curly braces is that quotes allow substitutions to occur in the group, while curly braces prevent substitutions. This rule applies to command, variable, and backslash substitutions.

Example 1-10 Grouping with double quotes vs. braces.

    set s Hello
    => Hello
    puts stdout "The length of $s is [string length $s]."
    => The length of Hello is 5.
    puts stdout {The length of $s is [string length $s].}
    => The length of $s is [string length $s].

In the second command of Example 1-10, the Tcl interpreter does variable and command substitution on the second argument to puts. In the third command, substitutions are prevented, so the string is printed as is.

In practice, grouping with curly braces is used when substitutions on the argument must be delayed until a later time (or never done at all). Examples include loops, conditional statements, and procedure declarations. Double quotes are useful in simple cases like the puts command previously shown.

Another common use of quotes is with the format command. This is similar to the C printf function. The first argument to format is a format specifier that often includes special characters like newlines, tabs, and spaces. The easiest way to specify these characters is with backslash sequences (e.g., \n for newline and \t for tab). The backslashes must be substituted before the format command is called, so you need to use quotes to group the format specifier.

    puts [format "Item: %s\t%5.3f" $name $value]

Here format is used to align a name and a value with a tab. The %s and %5.3f indicate how the remaining arguments to format are to be formatted. Note that the trailing \n usually found in a C printf call is not needed because puts provides one for us. For more information about the format command, see page 52.
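
To see why the format specifier needs quotes rather than braces, the two groupings can be compared side by side. This is a sketch with sample values for name and value that are assumed here for illustration.

```tcl
set name widget
set value 3.14159

# Double quotes: \t becomes a real tab before format sees the string.
set quoted [format "Item: %s\t%5.3f" $name $value]

# Curly braces: no backslash substitution, so format receives the two
# characters backslash and t, and copies them to its result unchanged.
set braced [format {Item: %s\t%5.3f} $name $value]

puts $quoted
puts $braced
puts "[string length $quoted] vs. [string length $braced] characters"
```

The braced result is one character longer because it holds a backslash and a t where the quoted result holds a single tab.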

Square Brackets Do Not Group

The square bracket syntax used for command substitution does not provide grouping. Instead, a nested command is considered part of the current group. In the command below, the double quotes group the last argument, and the nested command is just part of that group.

    puts stdout "The length of $s is [string length $s]."

If an argument is made up only of a nested command, you do not need to group it with double-quotes because the Tcl parser treats the whole nested command as part of the group.

    puts stdout [string length $s]

The following is a redundant use of double quotes:

    puts stdout "[expr $x + $y]"

Grouping before Substitution

The Tcl parser makes a single pass through a command as it makes grouping decisions and performs string substitutions. Grouping decisions are made before substitutions are performed, which is an important property of Tcl. This means that the values being substituted do not affect grouping because the grouping decisions have already been made. The following example demonstrates how nested command substitution affects grouping. A nested command is treated as an unbroken sequence of characters, regardless of its internal structure. It is included with the surrounding group of characters when collecting arguments for the main command. Example 1-11 Embedded command and variable substitution. set x 7; set y 9 puts stdout $x+$y=[expr $x + $y] => 7+9=16 In Example 1-11, the second argument to puts is: $x+$y=[expr $x + $y]

The white space inside the nested command is ignored for the purposes of grouping the argument. By the time Tcl encounters the left bracket, it has already done some variable substitutions to obtain:

    7+9=

When the left bracket is encountered, the interpreter calls itself recursively to evaluate the nested command. Again, the $x and $y are substituted before calling expr. Finally, the result of expr is substituted for everything from the left bracket to the right bracket. The puts command gets the following as its second argument:

    7+9=16

Grouping before substitution.

The point of this example is that the grouping decision about puts's second argument is made before the command substitution is done. Even if the result of the nested command contained spaces or other special characters, they would be ignored for the purposes of grouping the arguments to the outer command. Grouping and variable substitution interact the same as grouping and command substitution. Spaces or special characters in variable values do not affect grouping decisions because these decisions are made before the variable values are substituted. If you want the output to look nicer in the example, with spaces around the + and =, then you must use double quotes to explicitly group the argument to puts:

    puts stdout "$x + $y = [expr $x + $y]"

The double quotes are used for grouping in this case to allow the variable and command substitution on the argument to puts.
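The rule can be checked with a small experiment. The following is only a sketch: the variable names s, n, and m are made up for illustration, and list and llength are used simply to count how many arguments a command receives.

```tcl
# A value that contains spaces and a semicolon
set s "one two; three"

# Grouping is decided before $s is substituted,
# so list receives exactly one argument
set n [llength [list $s]]
puts "list received $n argument(s)"

# The result of a nested command is also kept together as one argument
set m [llength [list [string toupper $s]]]
puts "list received $m argument(s)"
```

Both lines report one argument, even though the substituted value contains characters that would separate arguments or terminate a command if they were typed directly.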

Grouping Math Expressions with Braces

It turns out that expr does its own substitutions inside curly braces. This is explained in more detail on page 15. This means you can write commands like the one below and the substitutions on the variables in the expression still occur:

    puts stdout "$x + $y = [expr {$x + $y}]"

More Substitution Examples

If you have several substitutions with no white space between them, you can avoid grouping with quotes. The following command sets concat to the value of variables a, b, and c all concatenated together:

    set concat $a$b$c

Again, if you want to add spaces, you'll need to use quotes:

    set concat "$a $b $c"

In general, you can place a bracketed command or variable reference anywhere. The following computes a command name:

    [findCommand $x] arg arg

When you use Tk, you often use widget names as command names:

    $text insert end "Hello, World!"
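Here is a runnable sketch of computed command names that does not need Tk. The variables op and cmd are hypothetical, and format is used only as a stand-in for a command that returns a command name.

```tcl
# The subcommand name comes from a variable
set op length
puts [string $op "Hello, World!"]

# The command name itself comes from a variable
set cmd string
puts [$cmd toupper hello]

# The command name is the result of a nested command
puts [[format %s string] index abc 1]
```

In each case the substitution happens first, and whatever ends up as the first word of the command is looked up as the command name.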

Top

Practical Programming in Tcl & Tk, Third Edition By Brent B. Welch

Table of Contents

Chapter 1. Tcl Fundamentals

Procedures
Tcl uses the proc command to define procedures. Once defined, a Tcl procedure is used just like any of the other built-in Tcl commands. The basic syntax to define a procedure is:

    proc name arglist body

The first argument is the name of the procedure being defined. The second argument is a list of parameters to the procedure. The third argument is a command body that is one or more Tcl commands. The procedure name is case sensitive, and in fact it can contain any characters. Procedure names and variable names do not conflict with each other. As a convention, this book begins procedure names with uppercase letters and it begins variable names with lowercase letters. Good programming style is important as your Tcl scripts get larger. Tcl coding style is discussed in Chapter 12.

Example 1-12 Defining a procedure.

    proc Diag {a b} {
        set c [expr sqrt($a * $a + $b * $b)]
        return $c
    }
    puts "The diagonal of a 3, 4 right triangle is [Diag 3 4]"
    => The diagonal of a 3, 4 right triangle is 5.0

The Diag procedure defined in the example computes the length of the diagonal side of a right triangle given the lengths of the other two sides. The sqrt function is one of many math functions supported by the expr command. The variable c is local to the procedure; it is defined only during execution of Diag. Variable scope is discussed further in Chapter 7. It is not really necessary to use the variable c in this example. The procedure can also be written as:

    proc Diag {a b} {
        return [expr sqrt($a * $a + $b * $b)]
    }

The return command is used to return the result of the procedure. The return command is optional in this example because the Tcl interpreter returns the value of the last command in the body as the value of the procedure. So, the procedure could be reduced to:

    proc Diag {a b} {
        expr sqrt($a * $a + $b * $b)
    }

Note the stylized use of curly braces in the example. The curly brace at the end of the first line starts the third argument to proc, which is the command body. In this case, the Tcl interpreter sees the opening left brace, causing it to ignore newline characters and scan the text until a matching right brace is found. Double quotes have the same property. They group characters, including newlines, until another double quote is found. The result of the grouping is that the third argument to proc is a sequence of commands. When they are evaluated later, the embedded newlines will terminate each command. The other crucial effect of the curly braces around the procedure body is to delay any substitutions in the body until the time the procedure is called. For example, the variables a, b, and c are not defined until the procedure is called, so we do not want to do variable substitution at the time Diag is defined. The proc command supports additional features such as having variable numbers of arguments and default values for arguments. These are described in detail in Chapter 7.
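As a quick check of the shortest form, the sketch below defines Diag once more and calls it. The braces around the expression are my addition, following the advice given later in this chapter, and info args is used only to show that proc stored the parameter list as written.

```tcl
proc Diag {a b} {
    # The value of the last command in the body is the procedure's result
    expr {sqrt($a * $a + $b * $b)}
}

puts [Diag 3 4]         ;# prints 5.0
puts [Diag 5 12]        ;# prints 13.0

# The parameter names are stored when the procedure is defined
puts [info args Diag]   ;# prints a b
```

Nothing in the body is substituted when proc runs. The variables a and b get values only when Diag is called.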


A Factorial Example
To reinforce what we have learned so far, below is a longer example that uses a while loop to compute the factorial function:

Example 1-13 A while loop to compute factorial.

    proc Factorial {x} {
        set i 1; set product 1
        while {$i <= $x} {
            set product [expr $product * $i]
            incr i
        }
        return $product
    }
    Factorial 10
    => 3628800

The semicolon is used on the first line to remind you that it is a command terminator just like the newline character. The while loop is used to multiply all the numbers from one up to the value of x. The first argument to while is a boolean expression, and its second argument is a command body to execute. The while command and other control structures are described in Chapter 6. The same math expression evaluator used by the expr command is used by while to evaluate the boolean expression. There is no need to explicitly use the expr command in the first argument to while, even if you have a much more complex expression. The loop body and the procedure body are grouped with curly braces in the same way. The opening curly brace must be on the same line as proc and while. If you like to put opening curly braces on the line after a while or if statement, you must escape the newline with a backslash:

    while {$i < $x} \
    {
        set product ...
    }

Always group expressions and command bodies with curly braces.

Curly braces around the boolean expression are crucial because they delay variable substitution until the while command implementation tests the expression. The following example is an infinite loop:

    set i 1; while $i<=10 {incr i}

The loop will run indefinitely.[*] The reason is that the Tcl interpreter will substitute for $i before while is called, so while gets a constant expression 1<=10 that will always be true. You can avoid these kinds of errors by adopting a consistent coding style that groups expressions with curly braces:

    set i 1; while {$i<=10} {incr i}

[*] Ironically, Tcl 8.0 introduced a byte-code compiler, and the first releases of Tcl 8.0 had a bug in the compiler that caused this loop to terminate! This bug is fixed in the 8.0.5 patch release.

The incr command is used to increment the value of the loop variable i. This is a handy command that saves us from the longer command:

    set i [expr $i + 1]

The incr command can take an additional argument, a positive or negative integer by which to change the value of the variable. Using this form, it is possible to eliminate the loop variable i and just modify the parameter x. The loop body can be written like this:

    while {$x > 1} {
        set product [expr $product * $x]
        incr x -1
    }

Example 1-14 shows factorial again, this time using a recursive definition. A recursive function is one that calls itself to complete its work. Each recursive call decrements x by one, and when x is one, then the recursion stops.
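For reference, here is the decrementing loop placed in a complete procedure. This is a sketch rather than one of the numbered examples: it assumes product must be initialized to 1 before the loop, and it groups the expression with braces.

```tcl
proc Factorial {x} {
    set product 1
    # Count x down instead of counting a separate loop variable up
    while {$x > 1} {
        set product [expr {$product * $x}]
        incr x -1
    }
    return $product
}

puts [Factorial 10]     ;# prints 3628800
```

Modifying x is safe because the parameter is local to the procedure. The caller's value is not affected.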

Example 1-14 A recursive definition of factorial.

    proc Factorial {x} {
        if {$x <= 1} {
            return 1
        } else {
            return [expr $x * [Factorial [expr $x - 1]]]
        }
    }


More about Variables

The set command will return the value of a variable if it is only passed a single argument. It treats that argument as a variable name and returns the current value of the variable. The dollar-sign syntax used to get the value of a variable is really just an easy way to use the set command. Example 1-15 shows a trick you can play by putting the name of one variable into another variable:

Example 1-15 Using set to return a variable value.

    set var {the value of var}
    => the value of var
    set name var
    => var
    set name
    => var
    set $name
    => the value of var

This is a somewhat tricky example. In the last command, $name gets substituted with var. Then, the set command returns the value of var, which is the string "the value of var". Nested set commands provide another way to achieve a level of indirection. The last set command above can be written as follows:

    set [set name]
    => the value of var

Using a variable to store the name of another variable may seem overly complex. However, there are some times when it is very useful. There is even a special command, upvar, that makes this sort of trick easier. The upvar command is described in detail in Chapter 7.
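One place this indirection is handy is when you loop over a list of variable names. The sketch below uses two hypothetical variables, width and height, and the foreach command from Table 1-4.

```tcl
set width 80
set height 24

foreach name {width height} {
    # $name is replaced by a variable name, and
    # set with one argument returns that variable's value
    puts "$name = [set $name]"
}
```

The loop prints each variable's name followed by its value, without naming width or height in the loop body.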

Funny Variable Names

The Tcl interpreter makes some assumptions about variable names that make it easy to embed variable references into other strings. By default, it assumes that variable names contain only letters, digits, and the underscore. The construct $foo.o represents a concatenation of the value of foo and the literal ".o". If the variable reference is not delimited by punctuation or white space, then you can use curly braces to explicitly delimit the variable name (e.g., ${x}). You can also use this to reference variables with funny characters in their name, although you probably do not want variables named like that. If you find yourself using funny variable names, or computing the names of variables, then you may want to use the upvar command.

Example 1-16 Embedded variable references.

    set foo filename
    set object $foo.o
    => filename.o
    set a AAA
    set b abc${a}def
    => abcAAAdef
    set .o yuk!
    set x ${.o}y
    => yuk!y

The unset Command

You can delete a variable with the unset command:

    unset varName varName2 ...

Any number of variable names can be passed to the unset command. However, unset will raise an error if a variable is not already defined.
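The error from unset can be trapped with the catch command listed in Table 1-4. The following is a small sketch; the variable names tmp and msg are arbitrary.

```tcl
set tmp 1
unset tmp

# A second unset fails because tmp no longer exists.
# catch returns 1 and stores the error message in msg.
if {[catch {unset tmp} msg]} {
    puts "unset failed: $msg"
}
```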

Using info to Find Out about Variables

The existence of a variable can be tested with the info exists command. For example, because incr requires that a variable exist, you might have to test for the existence of the variable first.

Example 1-17 Using info to determine if a variable exists.

    if {![info exists foobar]} {
        set foobar 0
    } else {
        incr foobar
    }

Example 7-6 on page 86 implements a new version of incr which handles this case.
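The same test is useful when counters are created on demand. The sketch below is not Example 7-6. It is a simpler illustration that counts words in a made-up list, and it starts each counter at 1 the first time a word is seen. The count_ prefix is a hypothetical naming choice.

```tcl
foreach word {a b a c a b} {
    # count_$word becomes count_a, count_b, or count_c
    if {![info exists count_$word]} {
        set count_$word 1
    } else {
        incr count_$word
    }
}
puts "a: $count_a, b: $count_b, c: $count_c"
```

This prints a: 3, b: 2, c: 1.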


More about Math Expressions

This section describes a few fine points about math in Tcl scripts. In Tcl 7.6 and earlier versions math is not that efficient because of conversions between strings and numbers. The expr command must convert its arguments from strings to numbers. It then does all its computations with double precision floating point values. The result is formatted into a string that has, by default, 12 significant digits. This number can be changed by setting the tcl_precision variable to the number of significant digits desired. Seventeen digits of precision are enough to ensure that no information is lost when converting back and forth between a string and an IEEE double precision number:

Example 1-18 Controlling precision with tcl_precision.

    expr 1 / 3
    => 0
    expr 1 / 3.0
    => 0.333333333333
    set tcl_precision 17
    => 17
    expr 1 / 3.0
    # The trailing 1 is the IEEE rounding digit
    => 0.33333333333333331

In Tcl 8.0 and later versions, the overhead of conversions is eliminated in most cases by the built-in compiler. Even so, Tcl was not designed to support math-intensive applications. You may want to implement math-intensive code in a compiled language and register the function as a Tcl command as described in Chapter 44. There is support for string comparisons by expr, so you can test string values in if statements. You must use quotes so that expr knows to do string comparisons:

    if {$answer == "yes"} { ... }

However, the string compare and string equal commands described in Chapter 4 are more reliable because expr may do conversions on strings that look like numbers. The issues with string operations and expr are discussed on page 48. Expressions can include variable and command substitutions and still be grouped with curly braces. This is because an argument to expr is subject to two rounds of substitution: one by the Tcl interpreter, and a second by expr itself. Ordinarily this is not a problem because math values do not contain the characters that are special to the Tcl interpreter. The second round of substitutions is needed to support commands like while and if that use the expression evaluator internally.

Grouping expressions can make them run more efficiently.

You should always group expressions in curly braces and let expr do command and variable substitutions. Otherwise, your values may suffer extra conversions from numbers to strings and back to numbers. Not only is this process slow, but the conversions can lose precision in certain circumstances. For example, suppose x is computed from a math function:

    set x [expr {sqrt(2.0)}]

At this point the value of x is a double-precision floating point value, just as you would expect. If you do this:

    set two [expr $x * $x]

then you may or may not get 2.0 as the result! This is because Tcl will substitute $x and expr will concatenate all its arguments into one string, and then parse the expression again. In contrast, if you do this:

    set two [expr {$x * $x}]

then expr will do the substitutions, and it will be careful to preserve the floating point value of x. The expression will be more accurate and run more efficiently because no string conversions will be done. The story behind Tcl values is described in more detail in Chapter 44 on C programming and Tcl.
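The two rounds of substitution are easy to observe with a contrived value. In this sketch the variables a and b are hypothetical, and a deliberately holds the two characters $b rather than a number. The last two commands illustrate the earlier point about strings that look like numbers.

```tcl
set b 5
set a {$b}

# Unbraced: Tcl substitutes $a, then expr substitutes the resulting $b
puts [expr $a + 1]              ;# prints 6

# Braced: only expr substitutes, so it sees a non-numeric string
if {[catch {expr {$a + 1}} msg]} {
    puts "error: $msg"
}

# expr compares strings that look like numbers as numbers
puts [expr {"1.0" == "1"}]      ;# prints 1
puts [string equal 1.0 1]       ;# prints 0
```

With braces, a value is never parsed a second time as part of the expression, which is the behavior you almost always want.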


Comments
Tcl uses the pound character, #, for comments. Unlike in many other languages, the # must occur at the beginning of a command. A # that occurs elsewhere is not treated specially. An easy trick to append a comment to the end of a command is to precede the # with a semicolon to terminate the previous command:

    # Here are some parameters
    set rate 7.0    ;# The interest rate
    set months 60   ;# The loan term

One subtle effect to watch for is that a backslash effectively continues a comment line onto the next line of the script. In addition, a semicolon inside a comment is not significant. Only a newline terminates comments:

    # Here is the start of a Tcl comment \
    and some more of it; still in the comment

The behavior of a backslash in comments is pretty obscure, but it can be exploited as shown in Example 2-3 on page 27. A surprising property of Tcl comments is that curly braces inside comments are still counted for the purposes of finding matching braces. I think the motivation for this mis-feature was to keep the original Tcl parser simpler. However, it means that the following will not work as expected to comment out an alternate version of an if expression:

    # if {boolean expression1} {
    if {boolean expression2} {
        some commands
    }

The previous sequence results in an extra left curly brace, and probably a complaint about a missing close brace at the end of your script! A technique I use to comment out large chunks of code is to put the code inside an if block that will never execute:

    if {0} {
        unused code here
    }
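The sketch below puts three of these rules into one runnable script. It reuses the rate and months variables from the text and assumes neither is defined beforehand. info exists reports which assignments actually ran.

```tcl
set rate 7.0 ;# The interest rate

# The backslash continues this comment onto the next line \
set months 60

puts [info exists rate]     ;# prints 1
puts [info exists months]   ;# prints 0, the set was part of the comment

if {0} {
    set months 60
}
puts [info exists months]   ;# prints 0, the block never executes
```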


Substitution and Grouping Summary

The following rules summarize the fundamental mechanisms of grouping and substitution that are performed by the Tcl interpreter before it invokes a command:

• Command arguments are separated by white space, unless arguments are grouped with curly braces or double quotes as described below.

• Grouping with curly braces, { }, prevents substitutions. Braces nest. The interpreter includes all characters between the matching left and right brace in the group, including newlines, semicolons, and nested braces. The enclosing (i.e., outermost) braces are not included in the group's value.

• Grouping with double quotes, " ", allows substitutions. The interpreter groups everything until another double quote is found, including newlines and semicolons. The enclosing quotes are not included in the group of characters. A double-quote character can be included in the group by quoting it with a backslash (e.g., \").

• Grouping decisions are made before substitutions are performed, which means that the values of variables or command results do not affect grouping.

• A dollar sign, $, causes variable substitution. Variable names can be any length, and case is significant. If variable references are embedded into other strings, or if they include characters other than letters, digits, and the underscore, they can be distinguished with the ${varname} syntax.

• Square brackets, [ ], cause command substitution. Everything between the brackets is treated as a command, and everything including the brackets is replaced with the result of the command. Nesting is allowed.

• The backslash character, \, is used to quote special characters. You can think of this as another form of substitution in which the backslash and the next character or group of characters are replaced with a new character.

• Substitutions can occur anywhere unless prevented by curly brace grouping. Part of a group can be a constant string, and other parts of it can be the result of substitutions. Even the command name can be affected by substitutions.

• A single round of substitutions is performed before command invocation. The result of a substitution is not interpreted a second time. This rule is important if you have a variable value or a command result that contains special characters such as spaces, dollar signs, square brackets, or braces. Because only a single round of substitution is done, you do not have to worry about special characters in values causing extra substitutions.
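The last rule can be demonstrated with a value that is full of special characters. This is a sketch with made-up variable names. The bracketed text would raise an error if it were ever evaluated as a command, so a clean run shows that it never was.

```tcl
# Braces keep the special characters literal when the value is stored
set value {[error oops] costs $5 today}

# Substituting $value happens once; its contents are not scanned again
set copy "value: $value"
puts $copy

# The value is still passed along as a single argument
puts [llength [list $value]]    ;# prints 1
```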


Fine Points
A common error is to forget a space between arguments when grouping with braces or quotes. This is because white space is used as the separator, while the braces or quotes only provide grouping. If you forget the space, you will get syntax errors about unexpected characters after the closing brace or quote. The following is an error because of the missing space between } and {:

    if {$x > 1}{puts "x = $x"}

A double quote is only used for grouping when it comes after white space. This means you can include a double quote in the middle of a group without quoting it with a backslash. This requires that curly braces or white space delimit the group. I do not recommend using this obscure feature, but this is what it looks like:

    set silly a"b

When double quotes are used for grouping, the special effect of curly braces is turned off. Substitutions occur everywhere inside a group formed with double quotes. In the next command, the variables are still substituted:

    set x xvalue
    set y "foo {$x}bar"
    => foo {xvalue}bar

When double quotes are used for grouping and a nested command is encountered, the nested command can use double quotes for grouping, too.

    puts "results [format "%f %f" $x $y]"

Spaces are not required around the square brackets used for command substitution. For the purposes of grouping, the interpreter considers everything between the square brackets as part of the current group. The following sets x to the concatenation of two command results because there is no space between ] and [.

    set x [cmd1][cmd2]

Newlines and semicolons are ignored when grouping with braces or double quotes. They get included in the group of characters just like all the others. The following sets x to a string that contains newlines:

    set x "This is line one.
    This is line two.
    This is line three."

During command substitution, newlines and semicolons are significant as command terminators. If you have a long command that is nested in square brackets, put a backslash before the newline if you want to continue the command on another line. This was illustrated in Example 1-9 on page 8. A dollar sign followed by something other than a letter, digit, underscore, or left parenthesis is treated as a literal dollar sign. The following sets x to the single character $.

    set x $
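Several of these fine points can be tried directly. In the sketch below, cmd1 and cmd2 are replaced with real string commands so that the script runs, and the variable d is introduced only to hold the lone dollar sign.

```tcl
# No space between ] and [, so both results land in one argument
set x [string toupper ab][string length abc]
puts $x                         ;# prints AB3

# A double quote in the middle of a word is an ordinary character
set silly a"b
puts [string length $silly]     ;# prints 3

# Inside double quotes, braces are ordinary characters
set y "foo {$x}bar"
puts $y                         ;# prints foo {AB3}bar

# A dollar sign followed by white space is a literal dollar sign
set d $
puts $d
```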


Reference
Backslash Sequences

Table 1-1. Backslash sequences.

\a          Bell. (0x7)
\b          Backspace. (0x8)
\f          Form feed. (0xc)
\n          Newline. (0xa)
\r          Carriage return. (0xd)
\t          Tab. (0x9)
\v          Vertical tab. (0xb)
\<newline>  Replace the newline and the leading white space on the next line with a space.
\\          Backslash. ('\')
\ooo        Octal specification of character code. 1, 2, or 3 digits.
\xhh        Hexadecimal specification of character code. 1 or 2 digits.
\uhhhh      Hexadecimal specification of a 16-bit Unicode character value. 4 hex digits.
\c          Replaced with literal c if c is not one of the cases listed above. In particular, \$, \", \{, \}, \], and \[ are used to obtain these characters.
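A few of these sequences are shown in the following sketch. The strings are arbitrary. The only point is to see what each sequence turns into.

```tcl
# Octal, hexadecimal, and Unicode forms of the letter A
puts "\101 \x41 \u0041"         ;# prints A A A

# A tab counts as one character
puts [string length "a\tb"]     ;# prints 3

# Backslash-newline and the leading white space become one space
set s "one\
        two"
puts $s                         ;# prints one two

# Quoting characters that are otherwise special
puts "\$x \[cmd\] \"quoted\" \{braces\}"
```

The last command prints $x [cmd] "quoted" {braces} with no substitutions performed.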

Arithmetic Operators

Table 1-2. Arithmetic operators from highest to lowest precedence.

- ~ !        Unary minus, bitwise NOT, logical NOT.
* / %        Multiply, divide, remainder.
+ -          Add, subtract.
<< >>        Left shift, right shift.
< > <= >=    Comparison: less, greater, less or equal, greater or equal.
== !=        Equal, not equal.
&            Bitwise AND.
^            Bitwise XOR.
|            Bitwise OR.
&&           Logical AND.
||           Logical OR.
x?y:z        If x then y else z.
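The precedence order matters when operators are mixed without parentheses. The expressions in this sketch are arbitrary and chosen so that a different grouping would give a different answer.

```tcl
puts [expr {1 + 2 * 3}]             ;# * binds before +, prints 7
puts [expr {1 << 2 + 1}]            ;# + binds before <<, prints 8
puts [expr {6 & 3 == 3}]            ;# == binds before &, prints 0
puts [expr {7 % 3}]                 ;# remainder, prints 1
puts [expr {5 > 3 ? "yes" : "no"}]  ;# prints yes
```

When in doubt, add parentheses. For example, (6 & 3) == 3 compares the result of the bitwise AND instead.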

Built-in Math Functions

Table 1-3. Built-in math functions.

acos(x)      Arc cosine of x.
asin(x)      Arc sine of x.
atan(x)      Arc tangent of x.
atan2(y,x)   Rectangular (x,y) to polar (r,th). atan2 gives th.
ceil(x)      Least integral value greater than or equal to x.
cos(x)       Cosine of x.
cosh(x)      Hyperbolic cosine of x.
exp(x)       Exponential, e^x.
floor(x)     Greatest integral value less than or equal to x.
fmod(x,y)    Floating point remainder of x/y.
hypot(x,y)   Returns sqrt(x*x + y*y). r part of polar coordinates.
log(x)       Natural log of x.
log10(x)     Log base 10 of x.
pow(x,y)     x to the y power, x^y.
sin(x)       Sine of x.
sinh(x)      Hyperbolic sine of x.
sqrt(x)      Square root of x.
tan(x)       Tangent of x.
tanh(x)      Hyperbolic tangent of x.
abs(x)       Absolute value of x.
double(x)    Promote x to floating point.
int(x)       Truncate x to an integer.
round(x)     Round x to an integer.
rand()       Return a random floating point value between 0.0 and 1.0.
srand(x)     Set the seed for the random number generator to the integer x.
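Here is a brief sketch that calls several of these functions from expr. The argument values are arbitrary, and the variable r is introduced only to hold the random number.

```tcl
puts [expr {hypot(3, 4)}]   ;# prints 5.0
puts [expr {pow(2, 10)}]    ;# prints 1024.0
puts [expr {fmod(7, 3)}]    ;# prints 1.0
puts [expr {int(3.7)}]      ;# prints 3
puts [expr {round(3.5)}]    ;# prints 4
puts [expr {abs(-4)}]       ;# prints 4

# rand() returns a different value on each call
set r [expr {rand()}]
puts [expr {$r >= 0.0 && $r < 1.0}]     ;# prints 1
```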

Core Tcl Commands

The pages listed in Table 1-4 give the primary references for the command.

Table 1-4. Built-in Tcl commands.

Command     Pg.   Description
after       218   Schedule a Tcl command for later execution.
append      51    Append arguments to a variable's value. No spaces added.
array       91    Query array state and search through elements.
binary      54    Convert between strings and binary data.
break       77    Exit loop prematurely.
catch       77    Trap errors.
cd          115   Change working directory.
clock       173   Get the time and format date strings.
close       115   Close an open I/O stream.
concat      61    Concatenate arguments with spaces between. Splices lists.
console     28    Control the console used to enter commands interactively.
continue    77    Continue with next loop iteration.
error       79    Raise an error.
eof         109   Check for end of file.
eval        122   Concatenate arguments and evaluate them as a command.
exec        99    Fork and execute a UNIX program.
exit        116   Terminate the process.
expr        6     Evaluate a math expression.
fblocked    223   Poll an I/O channel to see if data is ready.
fconfigure  221   Set and query I/O channel properties.
fcopy       239   Copy from one I/O channel to another.
file        102   Query the file system.
fileevent   219   Register callback for event-driven I/O.
flush       109   Flush output from an I/O stream's internal buffers.
for         76    Loop construct similar to C for statement.
foreach     73    Loop construct over a list, or lists, of values.
format      52    Format a string similar to C sprintf.
gets        112   Read a line of input from an I/O stream.
glob        115   Expand a pattern to matching file names.
global      84    Declare global variables.
history     185   Use command-line history.
if          70    Conditional command. Allows else and elseif clauses.
incr        12    Increment a variable by an integer amount.
info        176   Query the state of the Tcl interpreter.
interp      276   Create additional Tcl interpreters.
join        65    Concatenate list elements with a given separator string.
lappend     61    Add elements to the end of a list.
lindex      63    Fetch an element of a list.
linsert     64    Insert elements into a list.
list        61    Create a list out of the arguments.
llength     63    Return the number of elements in a list.
load        609   Load shared libraries that define Tcl commands.
lrange      63    Return a range of list elements.
lreplace    64    Replace elements of a list.
lsearch     64    Search for an element of a list that matches a pattern.
lsort       65    Sort a list.
namespace   203   Create and manipulate namespaces.
open        110   Open a file or process pipeline for I/O.
package     165   Provide or require code packages.
pid         116   Return the process ID.
proc        81    Define a Tcl procedure.
puts        112   Output a string to an I/O stream.
pwd         115   Return the current working directory.
read        113   Read blocks of characters from an I/O stream.
regexp      148   Match regular expressions.
regsub      152   Substitute based on regular expressions.
rename      82    Change the name of a Tcl command.
return      80    Return a value from a procedure.
scan        54    Parse a string according to a format specification.
seek        114   Set the seek offset of an I/O stream.
set         5     Assign a value to a variable.
socket      228   Open a TCP/IP network connection.
source      26    Evaluate the Tcl commands in a file.
split       65    Chop a string up into list elements.
string      45    Operate on strings.
subst       132   Substitute embedded commands and variable references.
switch      71    Multi-way branch.
tell        114   Return the current seek offset of an I/O stream.
time        191   Measure the execution time of a command.
trace       183   Monitor variable assignments.
unknown     167   Handle unknown commands.
unset       13    Delete variables.
uplevel     130   Execute a command in a different scope.
upvar       85    Reference a variable in a different scope.
variable    197   Declare namespace variables.
vwait       220   Wait for a variable to be modified.

Command while

On UNIX you can create a stand alone Tcl or Tcl/Tk script much like an sh or csh script. The trick is in the first line of the file that contains your script. If the first line of a file begins with #!pathname, then UNIX uses pathname as the interpreter for the rest of the script. The "Hello, World!" program from Chapter 1 is repeated in Example 2-1 with the special starting line:

Example 2-1 A standalone Tcl script on UNIX.

    #!/usr/local/bin/tclsh
    puts stdout {Hello, World!}

Similarly, the Tk hello world program from Chapter 21 is shown in Example 2-2:

Example 2-2 A standalone Tk script on UNIX.

    #!/usr/local/bin/wish
    button .hello -text Hello -command {puts "Hello, World!"}
    pack .hello -padx 10 -pady 10

The actual pathnames for tclsh and wish may be different on your system. If you type the pathname for the interpreter wrong, you receive a confusing "command not found" error. You can find out the complete pathname of the Tcl interpreter with the info nameofexecutable command. This is what appears on my system:

    info nameofexecutable
    => /home/welch/install/solaris/bin/tclsh8.2

Watch out for long pathnames.

On most UNIX systems, this special first line is limited to 32 characters, including the #!. If the pathname is too long, you may end up with /bin/sh trying to interpret your script, giving you syntax errors. You might try using a symbolic link from a short name to the true, long name of the interpreter. However, watch out for systems like Solaris in which the script interpreter cannot be a symbolic link. Fortunately, Solaris doesn't impose a 32-character limit on the pathname, so you can just use a long pathname. The next example shows a trick that works around the pathname length limitation in all cases. The trick comes from a posting to comp.lang.tcl by Kevin Kenny. It takes advantage of a difference between comments in Tcl and the Bourne shell. Tcl comments are described on page 16. In the example, the Bourne shell command that runs the Tcl interpreter is hidden in a comment as far as Tcl is concerned, but it is visible to /bin/sh:

Example 2-3 Using /bin/sh to run a Tcl script.

    #!/bin/sh
    # The backslash makes the next line a comment in Tcl \
    exec /some/very/long/path/to/wish "$0" ${1+"$@"}
    # ... Tcl script goes here ...

You do not even have to know the complete pathname of tclsh or wish to use this trick. You can just do the following:

    #!/bin/sh
    # Run wish from the user's PATH \
    exec wish -f "$0" ${1+"$@"}

The drawback of an incomplete pathname is that many sites have different versions of wish and tclsh that correspond to different versions of Tcl and Tk. In addition, some users may not have these programs in their PATH. If you have Tk version 3.6 or earlier, its version of wish requires a -f argument to make it read the contents of a file. The -f switch is ignored in Tk 4.0 and higher versions. The -f, if required, is also counted in the 32-character limit on #! lines.

    #!/usr/local/bin/wish -f


Chapter 2. Getting Started

Windows 95 Start Menu

You can add your Tcl/Tk programs to the Windows start menu. The command is the complete name of the wish.exe program and the name of the script. The trick is that the name of wish.exe has a space in it in the default configuration, so you must use quotes. Your start command will look something like this:

    "c:\Program Files\TCL82\wish.exe" "c:\My Files\script.tcl"

This starts c:\My Files\script.tcl as a stand alone Tcl/Tk program.


The Macintosh and ResEdit

If you want to create a self-contained Tcl/Tk application on Macintosh, you must copy the Wish program and add a Macintosh resource named tclshrc that has the start-up Tcl code. The Tcl code can be a single source command that reads your script file. Here are step-by-step instructions to create the resource using ResEdit:

1. First, make a copy of Wish and open the copy in ResEdit.

2. Pull down the Resource menu and select Create New Resource operation to make a new TEXT resource.

3. ResEdit opens a window and you can type in text. Type in a source command that names your script:

    source "Hard Disk:Tcl/Tk 8.1:Applications:MyScript.tcl"

4. Set the name of the resource to be tclshrc. You do this through the Get Resource Info dialog under the Resources menu in ResEdit.

This sequence of commands is captured in an application called "Drag n Drop Tclets", which comes with the Macintosh Tcl distribution. If you drag a Tcl script onto this icon, it will create a copy of Wish and create the tclshrc text resource that has a source command that will load that script. If you have a Macintosh development environment, you can build a version of Wish that has additional resources built right in. You add the resources to the applicationInit.r file. If a resource contains Tcl code, you use it like this:

    source -rsrc resource

If you don't want to edit resources, you can just use the Wish Source menu to select a script to run.


The console Command

The Windows and Macintosh platforms have a built-in console that is used to enter Tcl commands interactively. You can control this console with the console command. The console is visible by default. Hide the console like this:

    console hide

Display the console like this:

    console show

The console is implemented by a second Tcl interpreter. You can evaluate Tcl commands in that interpreter with:

    console eval command

There is an alternate version of this console called TkCon. It is included on the CD-ROM, and you can find current versions on the Internet. TkCon was created by Jeff Hobbs and has lots of nice features. You can use TkCon on Unix systems, too.
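A script that must also run where the console command may not be available can guard the call. The following is a minimal sketch that assumes only that info commands reports whether console is defined in the current interpreter. The HideConsole name is invented for this illustration.

```tcl
# HideConsole is a hypothetical helper, not part of Tcl or Tk.
# It hides the console only if the console command exists.
# Returns 1 if the console was hidden, 0 otherwise.
proc HideConsole {} {
    if {[llength [info commands console]] == 0} {
        return 0
    }
    console hide
    return 1
}

HideConsole
```

The same guard works for console show and console eval.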

Command-Line Arguments
If you run a script from the command line, for example from a UNIX shell, you can pass the script command-line arguments. You can also specify these arguments in the shortcut command in Windows. For example, under UNIX you can type this at a shell:

    % myscript.tcl arg1 arg2 arg3

In Windows, you can have a shortcut that runs wish on your script and also passes additional arguments:

    "c:\Program Files\TCL82\wish.exe" c:\your\script.tcl arg1

The Tcl shells pass the command-line arguments to the script as the value of the argv variable. The number of command-line arguments is given by the argc variable. The name of the program, or script, is not part of argv nor is it counted by argc. Instead, it is put into the argv0 variable. Table 2-2 lists all the predefined variables in the Tcl shells. argv is a list, so you can use the lindex command, which is described on page 59, to extract items from it:

    set arg1 [lindex $argv 0]

The following script prints its arguments (foreach is described on page 73):

Example 2-4 The EchoArgs script.

    # Tcl script to echo command line arguments
    puts "Program: $argv0"
    puts "Number of arguments: $argc"
    set i 0
    foreach arg $argv {
        puts "Arg $i: $arg"
        incr i
    }
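To experiment without a command line, the same loop can be wrapped in a procedure that takes the program name and the argument list as parameters. This is only a sketch: FormatArgs is a name invented here, and the program name and arguments in the call are sample values.

```tcl
# FormatArgs is a hypothetical helper that mirrors the EchoArgs script.
# It returns the report as a string instead of printing it directly.
proc FormatArgs {argv0 argv} {
    set lines [list "Program: $argv0"]
    lappend lines "Number of arguments: [llength $argv]"
    set i 0
    foreach arg $argv {
        lappend lines "Arg $i: $arg"
        incr i
    }
    return [join $lines \n]
}

puts [FormatArgs myscript.tcl {arg1 arg2 arg3}]
# Program: myscript.tcl
# Number of arguments: 3
# Arg 0: arg1
# Arg 1: arg2
# Arg 2: arg3
```

Because argv is a list, llength gives the same count that the shell stores in argc.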

Chapter 3. The Guestbook CGI Application

A Quick Introduction to HTML

Web pages are written in a text markup language called HTML (HyperText Markup Language). The idea of HTML is that you annotate, or mark up, regular text with special tags that indicate structure and formatting. For example, the title of a Web page is defined like this:

    <TITLE>My Home Page</TITLE>

The tags provide general formatting guidelines, but the browsers that display HTML pages have freedom in how they display things. This keeps the markup simple. The general syntax for HTML tags is:

    <tag parameters>normal text</tag>

As shown here, the tags usually come in pairs. The open tag may have some parameters, and the close tag name begins with a slash. The case of a tag is not considered, so <title>, <Title>, and <TITLE> are all valid and mean the same thing. The corresponding close tag could be </title>, </Title>, </TITLE>, or even </TiTlE>.

The <A> tag defines hypertext links that reference other pages on the Web. The hypertext links connect pages into a Web so that you can move from page to page to page and find related information. It is the flexibility of the links that makes the Web so interesting. The <A> tag takes an HREF parameter that defines the destination of the link. If you wanted to link to my home page, you would put this in your page:

    <A HREF="http://www.beedub.com/">Brent Welch</A>

When this construct appears in a Web page, your browser typically displays "Brent Welch" in blue underlined text. When you click on that text, your browser switches to the page at the address "http://www.beedub.com/". There is a lot more to HTML, of course, but this should give you a basic idea of what is going on in the examples. The following list summarizes the HTML tags that will be used in the examples:

Table 3-1. HTML tags used in the examples.

    HTML       Main tag that surrounds the whole document.
    HEAD       Delimits head section of the HTML document.
    TITLE      Defines the title of the page.
    BODY       Delimits the body section. Lets you specify page colors.
    H1 - H6    HTML defines 6 heading levels: H1, H2, H3, H4, H5, H6.
    P          Start a new paragraph.
    BR         One blank line.
    B          Bold text.
    I          Italic text.
    A          Used for hypertext links.
    IMG        Specify an image.
    DL         Definition list.
    DT         Term clause in a definition list.
    DD         Definition clause in a definition list.
    UL         An unordered list.
    LI         A bulleted item within a list.
    TABLE      Create a table.
    TR         A table row.
    TD         A cell within a table row.
    FORM       Defines a data entry form.
    INPUT      A one-line entry field, checkbox, radio button, or submit button.
    TEXTAREA   A multiline text field.

CGI for Dynamic Pages

There are two classes of pages on the Web, static and dynamic. A static page is written and stored on a Web server, and the same thing is returned each time a user views the page. This is the easy way to think about Web pages. You have some information to share, so you compose a page and tinker with the HTML tags to get the information to look good. If you have a home page, it is probably in this class.

In contrast, a dynamic page is computed each time it is viewed. This is how pages that give up-to-the-minute stock prices work, for example. A dynamic page does not mean it includes animations; it just means that a program computes the page contents when a user visits the page. The advantage of this approach is that a user might see something different each time he or she visits the page. As we shall see, it is also easier to maintain information in a database of some sort and generate the HTML formatting for the data with a program.

A CGI (Common Gateway Interface) program is used to compute Web pages. The CGI standard defines how inputs are passed to the program as well as a way to identify different types of results, such as images, plain text, or HTML markup. A CGI program simply writes the contents of the document to its standard output, and the Web server takes care of delivering the document to the user's Web browser. The following is a very simple CGI script:

Example 3-1 A simple CGI script.

    puts "Content-Type: text/html"
    puts ""
    puts "<TITLE>The Current Time</TITLE>"
    puts "The time is <B>[clock format [clock seconds]]</B>"

The program computes a simple HTML page that has the current time. Each time a user visits the page they will see the current time on the server. The server that has the CGI program and the user viewing the page might be on different sides of the planet. The output of the program starts with a Content-Type line that tells your Web browser what kind of data comes next. This is followed by a blank line and then the contents of the page.

The clock command is used twice: once to get the current time in seconds, and a second time to format the time into a nice looking string. The clock command is described in detail on page 173. Fortunately, there is no conflict between the markup syntax used by HTML and the Tcl syntax for embedded commands, so we can mix the two in the argument to the puts command. Double quotes are used to group the argument to puts so that the clock commands will be executed. When run, the output of the program will look like this:

Example 3-2 Output of Example 3-1.

    Content-Type: text/html

    <TITLE>The Current Time</TITLE>
    The time is <B>Wed Oct 16 11:23:43 1996</B>

This example is a bit sloppy in its use of HTML, but it should display properly in most Web browsers. Example 3-3 includes all the required tags for a proper HTML document.
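One way to make such a script easier to check by hand is to build the page as a string first and print it at the end. The following is a sketch under that assumption. TimePage is a name invented here, and it takes the time in seconds as a parameter so that the same input always yields the same page.

```tcl
# TimePage is a hypothetical helper, not part of cgilib.tcl.
# It returns the CGI header, a blank line, and the page contents.
proc TimePage {seconds} {
    set page "Content-Type: text/html\n\n"
    append page "<TITLE>The Current Time</TITLE>\n"
    append page "The time is <B>[clock format $seconds]</B>"
    return $page
}

puts [TimePage [clock seconds]]
```

The blank line after the Content-Type line comes from the two newline characters in the first string.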

The guestbook.cgi Script

The guestbook.cgi script computes a page that lists all the registered guests. The example is shown first, and then each part of it is discussed in more detail later. One thing to note right away is that the HTML tags are generated by procedures that hide the details of the HTML syntax. The first lines of the script use the UNIX trick to have tclsh interpret the script. This trick is described on page 26:

Example 3-3 The guestbook.cgi script.

    #!/bin/sh
    # guestbook.cgi
    # Implement a simple guestbook page.
    # The set of visitors is kept in a simple database.
    # The newguest.cgi script will update the database.
    # \
    exec tclsh "$0" ${1+"$@"}

    # The cgilib.tcl file has helper procedures
    # The guestbook.data file has the database
    # Both files are in the same directory as the script

    set dir [file dirname [info script]]
    source [file join $dir cgilib.tcl]
    set datafile [file join $dir guestbook.data]

    Cgi_Header "Brent's Guestbook" {BGCOLOR=white TEXT=black}
    P
    if {![file exists $datafile]} {
        puts "No registered guests, yet."
        P
        puts "Be the first [Link {registered guest!} newguest.html]"
    } else {
        puts "The following folks have registered in my GuestBook."
        P
        puts [Link Register newguest.html]
        H2 Guests
        catch {source $datafile}
        foreach name [lsort [array names Guestbook]] {
            set item $Guestbook($name)
            set homepage [lindex $item 0]
            set markup [lindex $item 1]
            H3 [Link $name $homepage]
            puts $markup
        }
    }
    Cgi_End

Using a Script Library File

The script uses a number of Tcl procedures that make working with HTML and the CGI interface easier. These procedures are kept in the cgilib.tcl file, which is kept in the same directory as the main script. The script starts by sourcing the cgilib.tcl file so that these procedures are available. The following command determines the location of the cgilib.tcl file based on the location of the main script. The info script command returns the file name of the script. The file dirname and file join commands manipulate file names in a platform-independent way. They are described on page 102. I use this trick to avoid putting absolute file names into my scripts, which would have to be changed if the program moves later:

    set dir [file dirname [info script]]
    source [file join $dir cgilib.tcl]
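The effect of file dirname and file join is easiest to see with a literal path. In this sketch the script location is a made-up example that stands in for the value info script would return.

```tcl
# The path below is a hypothetical location for the CGI script.
set script /home/welch/cgi-bin/guestbook.cgi

set dir [file dirname $script]
puts $dir                           ;# /home/welch/cgi-bin

set lib [file join $dir cgilib.tcl]
puts $lib                           ;# /home/welch/cgi-bin/cgilib.tcl
```

If the script directory moves, only the value returned by info script changes, and the library name is recomputed from it.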

Beginning the HTML Page

The following command generates the standard information that comes at the beginning of an HTML page:

    Cgi_Header {Brent's GuestBook} {bgcolor=white text=black}

The Cgi_Header is shown in Example 3-4:

Example 3-4 The Cgi_Header procedure.

    proc Cgi_Header {title {bodyparams {}}} {
        puts stdout \
    "Content-Type: text/html

    <HTML>
    <HEAD>
    <TITLE>$title</TITLE>
    </HEAD>
    <BODY $bodyparams>
    <H1>$title</H1>"
    }

The Cgi_Header procedure takes as arguments the title for the page and some optional parameters for the HTML <BODY> tag. The guestbook.cgi script specifies black text on a white background to avoid the standard gray background of most browsers. The procedure definition uses the syntax for an optional parameter, so you do not have to pass bodyparams to Cgi_Header. Default values for procedure parameters are described on page 81.

The Cgi_Header procedure just contains a single puts command that generates the standard boilerplate that appears at the beginning of the output. Note that several lines are grouped together with double quotes. Double quotes are used so that the variable references mixed into the HTML are substituted properly.

The output begins with the CGI content-type information, a blank line, and then the HTML. The HTML is divided into a head and a body part. The <TITLE> tag goes in the head section of an HTML document. Finally, browsers display the title in a different place than the rest of the page, so I always want to repeat the title as a level-one heading (i.e., H1) in the body of the page.

Simple Tags and Hypertext Links

The next thing the program does is to see whether there are any registered guests or not. The file command, which is described in detail on page 102, is used to see whether there is any data:

    if {![file exists $datafile]} {

If the database file does not exist, a different page is displayed to encourage a registration. The page includes a hypertext link to a registration page. The newguest.html page will be described in more detail later:

    puts "No registered guests, yet."
    P
    puts "Be the first [Link {registered guest!} newguest.html]"

The P command generates the HTML for a paragraph break. This trivial procedure saves us a few keystrokes:

    proc P {} {
        puts <P>
    }

The Link command formats and returns the HTML for a hypertext link. Instead of printing the HTML directly, it is returned, so you can include it in-line with other text you are printing:

Example 3-5 The Link command formats a hypertext link.

    proc Link {text url} {
        return "<A HREF=\"$url\">$text</A>"
    }

The output of the program would be as below if there were no data:

Example 3-6 Initial output of guestbook.cgi.

    Content-Type: text/html

    <HTML>
    <HEAD>
    <TITLE>Brent's Guestbook</TITLE>
    </HEAD>
    <BODY BGCOLOR=white TEXT=black>
    <H1>Brent's Guestbook</H1>
    <P>
    No registered guests, yet.
    <P>
    Be the first <A HREF="newguest.html">registered guest!</A>
    </BODY>
    </HTML>

If the database file exists, then the real work begins. We first generate a link to the registration page, and a level-two header to separate that from the guest list:

    puts [Link Register newguest.html]
    H2 Guests

The H2 procedure handles the detail of including the matching close tag:

    proc H2 {string} {
        puts "<H2>$string</H2>"
    }
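Because Link returns its result instead of printing it, the call can be nested inside a quoted string. The following sketch repeats the Link procedure from Example 3-5 so that it runs on its own. The line variable is introduced only for this illustration.

```tcl
# Link as defined in Example 3-5.
proc Link {text url} {
    return "<A HREF=\"$url\">$text</A>"
}

# The braces group the two-word link text into one argument.
# The space after the close brace separates it from the URL.
set line "Be the first [Link {registered guest!} newguest.html]"
puts $line
# Be the first <A HREF="newguest.html">registered guest!</A>
```

A procedure that called puts itself, like P or H2, could not be embedded in a string this way.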

Using a Tcl Array for the Database

The datafile contains Tcl commands that define an array that holds the guestbook data. If this file is kept in the same directory as the guestbook.cgi script, then you can compute its name:

    set dir [file dirname [info script]]
    set datafile [file join $dir guestbook.data]

By using Tcl commands to represent the data, we can load the data with the source command. The catch command is used to protect the script from a bad data file, which will show up as an error from the source command. Catching errors is described in detail on page 79:

    catch {source $datafile}

The Guestbook variable is the array defined in guestbook.data. Array variables are the topic of Chapter 8. Each element of the array is defined with a Tcl command that looks like this:

    set Guestbook(key) {url markup}

The person's name is the array index, or key. The value of the array element is a Tcl list with two elements: their URL and some additional HTML markup that they can include in the guestbook. Tcl lists are the topic of Chapter 5. The following example shows what the command looks like with real data:

    set {Guestbook(Brent Welch)} {
        http://www.beedub.com/
        {<img src=http://www.beedub.com/welch.gif>}
    }

The spaces in the name result in additional braces to group the whole variable name and each list element. This syntax is explained on page 90. Do not worry about it now. We will see on page 42 that all the braces in the previous statement are generated automatically. The main point is that the person's name is the key, and the value is a list with two elements.

The array names command returns all the indices, or keys, in the array, and the lsort command sorts these alphabetically. The foreach command loops over the sorted list, setting the loop variable name to each key in turn:

    foreach name [lsort [array names Guestbook]] {

Given the key, we get the value like this:

    set item $Guestbook($name)

The two list elements are extracted with lindex, which is described on page 63.

    set homepage [lindex $item 0]
    set markup [lindex $item 1]

We generate the HTML for the guestbook entry as a level-three header that contains a hypertext link to the guest's home page. We follow the link with any HTML markup text that the guest has supplied to embellish his or her entry. The H3 procedure is similar to the H2 procedure already shown, except it generates <H3> tags:

    H3 [Link $name $homepage]
    puts $markup
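The loop can be tried without a data file by defining the array directly. In this sketch the first entry is the one shown above, while the second entry, "Ann Example" at www.example.com, is invented to show the effect of lsort. The result variable is also an assumption of this illustration.

```tcl
# The first entry is from the text; the second is made up.
set {Guestbook(Brent Welch)} {
    http://www.beedub.com/
    {<img src=http://www.beedub.com/welch.gif>}
}
set {Guestbook(Ann Example)} {http://www.example.com/ {}}

# Visit the keys in sorted order and pick apart each two-element list.
set result {}
foreach name [lsort [array names Guestbook]] {
    set item $Guestbook($name)
    set homepage [lindex $item 0]
    lappend result "$name -> $homepage"
}
puts [join $result \n]
# Ann Example -> http://www.example.com/
# Brent Welch -> http://www.beedub.com/
```

The newlines inside the braces of the first entry are just list separators, so lindex sees the same two elements either way.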

Sample Output
The last thing the script does is call Cgi_End to output the proper closing tags. Example 3-7 shows the output of the guestbook.cgi script:

Example 3-7 Output of guestbook.cgi.

    Content-Type: text/html

    <HTML>
    <HEAD>
    <TITLE>Brent's Guestbook</TITLE>
    </HEAD>
    <BODY BGCOLOR=white TEXT=black>
    <H1>Brent's Guestbook</H1>
    <P>
    The following folks have registered in my GuestBook.
    <P>
    <A HREF="newguest.html">Register</A>
    <H2>Guests</H2>
    <H3><A HREF="http://www.beedub.com/">Brent Welch</A></H3>
    <img src=http://www.beedub.com/welch.gif>
    </BODY>
    </HTML>
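The cgilib.tcl file is not reproduced in this section, so the following is only a guess at what the H3 and Cgi_End procedures might look like. It is inferred from the H2 procedure shown earlier and from the closing tags in the sample output, and the real library may differ.

```tcl
# Assumed definition: modeled on the H2 procedure.
proc H3 {string} {
    puts "<H3>$string</H3>"
}

# Assumed definition: emits the tags that close
# the <BODY> and <HTML> tags opened by Cgi_Header.
proc Cgi_End {} {
    puts "</BODY>\n</HTML>"
}

H3 "Brent Welch"
Cgi_End
```

Keeping the closing tags in one procedure means every script that calls Cgi_Header can finish its page the same way.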

Defining Forms and Processing Form Data

The guestbook.cgi script only generates output. The other half of CGI deals with input from the user. Input is more complex for two reasons. First, we have to define another HTML page that has a form for the user to fill out. Second, the data from the form is organized and encoded in a standard form that must be decoded by the script. Example 3-8 on page 40 defines a very simple form, and the procedure that decodes the form data is shown in Example 11-6 on page 155.

The guestbook page contains a link to newguest.html. This page contains a form that lets a user register his or her name, home page URL, and some additional HTML markup. The form has a submit button. When a user clicks that button in their browser, the information from the form is passed to the newguest.cgi script. This script updates the database and computes another page for the user that acknowledges the user's contribution.

The newguest.html Form

An HTML form contains tags that define data entry fields, buttons, checkboxes, and other elements that let the user specify values. For example, a one-line entry field that is used to enter the home page URL is defined like this:

    <INPUT TYPE=text NAME=url>

The INPUT tag is used to define several kinds of input elements, and its type parameter indicates what kind. In this case, TYPE=text creates a one-line text entry field. The submit button is defined with an INPUT tag that has TYPE=submit, and the VALUE parameter becomes the text that appears on the button:

    <INPUT TYPE=submit NAME=submit VALUE=Register>

A general type-in window is defined with the TEXTAREA tag. This creates a multiline, scrolling text field that is useful for specifying lots of information, such as a free-form comment. In our case we will let guests type in HTML that will appear with their guestbook entry. The text between the open and close TEXTAREA tags is inserted into the type-in window when the page is first displayed.

    <TEXTAREA NAME=markup ROWS=10 COLS=50>Hello.</TEXTAREA>

A common parameter to the form tags is NAME=something. This name identifies the data that will come back from the form. The tags also have parameters that affect their display, such as the label on the submit button and the size of the text area. Those details are not important for our example. The complete form is shown in Example 3-8:

Example 3-8 The newguest.html form.

    <!Doctype HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <HTML>
    <HEAD>
    <TITLE>Register in my Guestbook</TITLE>
    <META HTTP-Equiv=Editor Content="SunLabs WebTk 1.0beta 10/11/96">
    </HEAD>
    <BODY>
    <FORM ACTION="newguest.cgi" METHOD="POST">
    <H1>Register in my Guestbook</H1>
    <UL>
    <LI>Name <INPUT TYPE="text" NAME="name" SIZE="40">
    <LI>URL <INPUT TYPE="text" NAME="url" SIZE="40">
    <P>
    If you don't have a home page, you can use an email URL like "mailto:[email protected]
    <LI>Additional HTML to include after your link:
    <BR>
    <TEXTAREA NAME="html" COLS="60" ROWS="15">
    </TEXTAREA>
    <LI><INPUT TYPE="submit" NAME="new" VALUE="Add me to your guestbook">
    <LI><INPUT TYPE="submit" NAME="update" VALUE="Update my guestbook entry">
    </UL>
    </FORM>
    </BODY>
    </HTML>

The newguest.cgi Script

When the user clicks the Submit button in their browser, the data from the form is passed to the program identified by the ACTION parameter of the FORM tag. That program takes the data, does something useful with it, and then returns a new page for the browser to display. In our case the FORM tag names newguest.cgi as the program to handle the data:

    <FORM ACTION=newguest.cgi METHOD=POST>

The CGI specification defines how the data from the form is passed to the program. The data is encoded and organized so that the program can figure out the values the user specified for each form element. The encoding is handled rather nicely with some regular expression tricks that are done in Cgi_Parse. Cgi_Parse saves the form data, and Cgi_Value gets a form value in the script. These procedures are described in Example 11-6 on page 155. Example 3-9 starts out by calling Cgi_Parse:

Example 3-9 The newguest.cgi script.

    #!/bin/sh
    # \
    exec tclsh "$0" ${1+"$@"}
    # source cgilib.tcl from the same directory as newguest.cgi

    set dir [file dirname [info script]]
    source [file join $dir cgilib.tcl]
    set datafile [file join $dir guestbook.data]

    Cgi_Parse

    # Open the datafile in append mode

    if [catch {open $datafile a} out] {
        Cgi_Header "Guestbook Registration Error" \
            {BGCOLOR=black TEXT=red}
        P
        puts "Cannot open the data file"
        P
        puts $out;# the error message
        exit 0
    }

    # Append a Tcl set command that defines the guest's entry

    puts $out ""
    puts $out [list set Guestbook([Cgi_Value name]) \
        [list [Cgi_Value url] [Cgi_Value html]]]
    close $out

    # Return a page to the browser

    Cgi_Header "Guestbook Registration Confirmed" \
        {BGCOLOR=white TEXT=black}

    puts "
    <DL>
    <DT>Name
    <DD>[Cgi_Value name]
    <DT>URL
    <DD>[Link [Cgi_Value url] [Cgi_Value url]]
    </DL>
    [Cgi_Value html]
    "
    Cgi_End

The main idea of the newguest.cgi script is that it saves the data to a file as a Tcl command that defines an element of the Guestbook array. This lets the guestbook.cgi script simply load the data by using the Tcl source command. This trick of storing data as a Tcl script saves us from the chore of defining a new file format and writing code to parse it. Instead, we can rely on the well-tuned Tcl implementation to do the hard work for us efficiently.

The script opens the datafile in append mode so that it can add a new record to the end. Opening files is described in detail on page 110. The script uses a catch command to guard against errors. If an error occurs, a page explaining the error is returned to the user. Working with files is one of the most common sources of errors (permission denied, disk full, file-not-found, and so on), so I always open the file inside a catch statement:

    if [catch {open $datafile a} out] {
        # an error occurred
    } else {
        # open was ok
    }

In this command, the variable out gets the result of the open command, which is either a file descriptor or an error message. This style of using catch is described in detail in Example 6-14 on page 77.

The script writes the data as a Tcl set command. The list command is used to format the data properly:

    puts $out [list set Guestbook([Cgi_Value name]) \
        [list [Cgi_Value url] [Cgi_Value html]]]

There are two lists. First the url and html values are formatted into one list. This list will be the value of the array element. Then, the whole Tcl command is formed as a list. In simplified form, the command is generated from this:

    list set variable value

Using the list command ensures that the result will always be a valid Tcl command that sets the variable to the given value. The list command is described in more detail on page 61.
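The quoting that list adds can be checked with a round trip: build the command, evaluate it, and read the value back. This sketch uses plain variables in place of the Cgi_Value calls. The name, url, html, and cmd variables are assumptions of the illustration, with sample values taken from the guestbook entry shown earlier.

```tcl
# Sample form values; in newguest.cgi these come from Cgi_Value.
set name "Brent Welch"
set url http://www.beedub.com/
set html {<img src=http://www.beedub.com/welch.gif>}

# Build the set command the same way newguest.cgi does.
set cmd [list set Guestbook($name) [list $url $html]]
puts $cmd

# Evaluating the command is what source does when it reads the data file.
eval $cmd
puts [lindex $Guestbook($name) 0]    ;# http://www.beedub.com/
puts [lindex $Guestbook($name) 1]    ;# <img src=http://www.beedub.com/welch.gif>
```

The space in the name and the spaces in the markup survive because list adds braces around any element that needs them.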

The cgi.tcl Package

The cgilib.tcl file included with this book just barely scratches the surface of things you might like to do in a CGI script. Don Libes has created a comprehensive package for CGI scripts known as cgi.tcl. You can find it on the web at

    http://expect.nist.gov/cgi.tcl/

One of Don's goals in cgi.tcl was to eliminate the need to directly write any HTML markup at all. Instead, he has defined a whole suite of Tcl commands similar to the P and H2 procedures shown in this chapter that automatically emit the matching close tags. He also has support procedures to deal with browser cookies, page redirects, and other CGI features.

Next Steps
There are a number of details that can be added to this example. A user may want to update their entry, for example. They could do that now, but they would have to retype everything. They might also like a chance to check the results of their registration and make changes before committing them. This requires another page that displays their guest entry as it would appear on a page, and also has the fields that let them update the data. The details of how a CGI script is hooked up with a Web server vary from server to server. You should ask your local Webmaster for help if you want to try this out on your local web site. The Tcl Web Server comes with this guestbook example already set up, plus it has a number of other very interesting ways to generate pages. My own taste in web page generation has shifted from CGI to a template-based approach supported by the Tcl Web Server. This is the topic of Chapter 18. The next few chapters describe basic Tcl commands and data structures. We return to this example in Chapter 11 on regular expressions.

Part I. Tcl Basics

Chapter 4. String Processing in Tcl

This chapter describes string manipulation and simple pattern matching. Tcl commands described are: string, append, format, scan, and binary. The string command is a collection of several useful string manipulation operations. Strings are the basic data item in Tcl, so it should not be surprising that there are a large number of commands to manipulate strings. A closely related topic is pattern matching, in which string comparisons are made more powerful by matching a string against a pattern. This chapter describes a simple pattern matching mechanism that is similar to that used in many other shell languages. Chapter 11 describes a more complex and powerful regular expression pattern matching mechanism.

The string Command

The string command is really a collection of operations you can perform on strings. The following example calculates the length of the value of a variable.

    set name "Brent Welch"
    string length $name
    => 11

The first argument to string determines the operation. You can ask string for valid operations by giving it a bad one:

    string junk
    => bad option "junk": should be bytelength, compare, equal, first, index, is, last, length, map, match, range, repeat, replace, tolower, totitle, toupper, trim, trimleft, trimright, wordend, or wordstart

This trick of feeding a Tcl command bad arguments to find out its usage is common across many commands. Table 4-1 summarizes the string command.

Table 4-1. The string command.

string bytelength str
    Returns the number of bytes used to store a string, which may be different from the character length returned by string length because of UTF-8 encoding. See page 210 of Chapter 15 about Unicode and UTF-8.

string compare ?-nocase? ?-length len? str1 str2
    Compares strings lexicographically. Use -nocase for case insensitive comparison. Use -length to limit the comparison to the first len characters. Returns 0 if equal, -1 if str1 sorts before str2, else 1.

string equal ?-nocase? str1 str2
    Compares strings and returns 1 if they are the same. Use -nocase for case insensitive comparison.

string first str1 str2
    Returns the index in str2 of the first occurrence of str1, or -1 if str1 is not found.

string index string index
    Returns the character at the specified index. An index counts from zero. Use end for the last character.

string is class ?-strict? ?-failindex varname? string
    Returns 1 if string belongs to class. If -strict, then empty strings never match, otherwise they always match. If -failindex is specified, then varname is assigned the index of the character in string that prevented it from being a member of class. See Table 4-3 on page 50 for character class names.

string last str1 str2
    Returns the index in str2 of the last occurrence of str1, or -1 if str1 is not found.

string length string
    Returns the number of characters in string.

string map ?-nocase? charMap string
    Returns a new string created by mapping characters in string according to the input, output list in charMap. See page 51.

string match pattern str
    Returns 1 if str matches the pattern, else 0. Glob-style matching is used. See page 48.

string range str i j
    Returns the range of characters in str from i to j.

string repeat str count
    Returns str repeated count times.

string replace str first last ?newstr?
    Returns a new string created by replacing characters first through last with newstr, or nothing.

string tolower string ?first? ?last?
    Returns string in lower case. first and last determine the range of string on which to operate.

string totitle string ?first? ?last?
    Capitalizes string by replacing its first character with the Unicode title case, or upper case, and the rest with lower case. first and last determine the range of string on which to operate.

string toupper string ?first? ?last?
    Returns string in upper case. first and last determine the range of string on which to operate.

string trim string ?chars?
    Trims the characters in chars from both ends of string. chars defaults to whitespace.

string trimleft string ?chars?
    Trims the characters in chars from the beginning of string. chars defaults to whitespace.

string trimright string ?chars?
    Trims the characters in chars from the end of string. chars defaults to whitespace.

string wordend str ix
    Returns the index in str of the character after the word containing the character at index ix.

string wordstart str ix
    Returns the index in str of the first character in the word containing the character at index ix.

These are the string operations I use most:

- The equal operation, which is shown in Example 4-2 on page 48.
- String match. This pattern matching operation is described on page 48.
- The tolower, totitle, and toupper operations convert case.
- The trim, trimright, and trimleft operations are handy for cleaning up strings.

These new operations were added in Tcl 8.1 (actually, they first appeared in the 8.1.1 patch release):

- The equal operation, which is simpler than using string compare.
- The is operation that tests for kinds of strings. String classes are listed in Table 4-3 on page 50.
- The map operation that translates characters (e.g., like the Unix tr command).
- The repeat and replace operations.
- The totitle operation, which is handy for capitalizing words.
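A few of the newer operations are easiest to understand from small cases. The following sketch uses made-up sample strings, and the results in the comments follow from the descriptions in Table 4-1.

```tcl
# string is: -strict makes the empty string fail the test.
puts [string is integer -strict 42]     ;# 1
puts [string is integer -strict ""]     ;# 0

# string map: the charMap is a list of input, output pairs.
puts [string map {a 1 b 2} abcab]       ;# 12c12

# string repeat and string replace.
puts [string repeat ab 3]               ;# ababab
puts [string replace abcdef 1 3 XY]     ;# aXYef

# string totitle capitalizes the first character.
puts [string totitle tcl]               ;# Tcl
```

In the replace case, the characters at indices 1 through 3 (bcd) are removed and XY is put in their place.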

String Indices
Several of the string operations involve string indices that are positions within a string. Tcl counts characters in strings starting with zero. The special index end is used to specify the last character in a string:

    string range abcd 2 end
    => cd

Tcl 8.1 added syntax for specifying an index relative to the end. Specify end-N to get the Nth character before the end. For example, the following command returns a new string that drops the first and last characters from the original:

    string range $string 1 end-1

There are several operations that pick apart strings: first, last, wordstart, wordend, index, and range. If you find yourself using combinations of these operations to pick apart data, it will be faster if you can do it with the regular expression pattern matcher described in Chapter 11.
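The index forms can be compared side by side on one short string. This is a small sketch in which the variable s and its value are chosen only for illustration.

```tcl
set s abcdef

# Indices count from zero, so index 2 is the letter c.
puts [string range $s 2 end]      ;# cdef

# end-1 is the character just before the last one.
puts [string index $s end-1]      ;# e

# Drop the first and last characters.
puts [string range $s 1 end-1]    ;# bcde
```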

Strings and Expressions

Strings can be compared with expr, if, and while using the comparison operators ==, !=, < and >. However, there are a number of subtle issues that can cause problems. First, you must quote the string value so that the expression parser can identify it as a string type. Then, you must group the expression with curly braces to prevent the double quotes from being stripped off by the main interpreter:

    if {$x == "foo"} command

expr is unreliable for string comparison.

Ironically, despite the quotes, the expression evaluator first converts items to numbers if possible, and then converts them back if it detects a case of string comparison. The conversion back is always done as a decimal number. This can lead to unexpected conversions between strings that look like hexadecimal or octal numbers. The following boolean expression is true!

    if {"0xa" == "10"} { puts stdout ack! }
    => ack!

A safe way to compare strings is to use the string compare and equal operations. These operations work faster because the unnecessary conversions are eliminated. Like the C library strcmp function, string compare returns 0 if the strings are equal, -1 if the first string is lexicographically less than the second, or 1 if the first string is greater than the second:

Example 4-1 Comparing strings with string compare.

    if {[string compare $s1 $s2] == 0} {
        # strings are equal
    }

The string equal command added in Tcl 8.1 makes this simpler:

Example 4-2 Comparing strings with string equal.

    if {[string equal $s1 $s2]} {
        # strings are equal
    }
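The difference between the two kinds of comparison can be seen side by side in a short sketch. The variable names a and b are assumptions for illustration; the values are chosen because they are different strings that denote the same number.

```tcl
# Two values that differ as strings but are equal as numbers.
set a 0xa
set b 10

# expr converts both operands to numbers before comparing them.
puts [expr {$a == $b}]

# string equal compares the characters themselves.
puts [string equal $a $b]

# string compare orders strings the way strcmp does.
puts [string compare apple banana]
```

The first command prints 1 and the second prints 0, which is why string equal is the safer choice when the values are meant to be treated as text.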

String Matching
The string match command implements glob-style pattern matching that is modeled after the file name pattern matching done by various UNIX shells. The heritage of the word "glob" is rooted in UNIX, and Tcl preserves this historical oddity in the glob command that does pattern matching on file names. The glob command is described on page 115. Table 4-2 shows the three constructs used in string match patterns:

Table 4-2. Matching characters used with string match.

*          Match any number of any characters.
?          Match exactly one character.
[chars]    Match any character in chars.

Any other characters in a pattern are taken as literals that must match the input exactly. The following example matches all strings that begin with a:

    string match a* alpha
    => 1

To match all two-letter strings:

    string match ?? XY
    => 1

To match all strings that begin with either a or b:

    string match {[ab]*} cello
    => 0

Be careful! Square brackets are also special to the Tcl interpreter, so you will need to wrap the pattern up in curly braces to prevent it from being interpreted as a nested command. Another approach is to put the pattern into a variable:

    set pat {[ab]*x}
    string match $pat box
    => 1

You can specify a range of characters with the syntax [x-y]. For example, [a-z] represents the set of all lower-case letters, and [0-9] represents all the digits. You can include more than one range in a set. Any letter, digit, or the underscore is matched with:

    string match {[a-zA-Z0-9_]} $char

The set matches only a single character. To match more complicated patterns, like one or more characters from a set, you need to use regular expression matching, which is described on page 148.

If you need to include a literal *, ?, or bracket in your pattern, preface it with a backslash:

    string match {*\?} what?
    => 1

In this case the pattern is quoted with curly braces because the Tcl interpreter is also doing backslash substitutions. Without the braces, you would have to use two backslashes. They are replaced with a single backslash by Tcl before string match is called.

    string match *\\? what?
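The pattern constructs above can be combined in a small helper. This is only a sketch: IsOption is a hypothetical procedure name, and the rule it applies (a dash followed by a letter) is an assumption chosen for the example.

```tcl
# IsOption is a hypothetical helper: true for words such as -width or -bg2.
proc IsOption {word} {
    # A dash, one letter, then anything. The braces keep Tcl from
    # treating the square brackets as command substitution.
    return [string match {-[a-zA-Z]*} $word]
}

puts [IsOption -width]
puts [IsOption --]
puts [IsOption 42]

# A backslash makes the ? literal, so only strings ending in ? match.
puts [string match {*\?} "ready?"]
puts [string match {*\?} "ready"]
```

Only -width and ready? match their patterns; the other three commands print 0.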

Character Classes
The string is command tests a string to see whether it belongs to a particular class. This is useful for input validation. For example, to make sure something is a number, you do:

    if {![string is integer $input]} {
        error "Invalid input. Please enter a number."
    }

Classes are defined in terms of the Unicode character set, which means they are more general than specifying character sets with ranges over the ASCII encoding. For example, alpha includes many characters outside the range of [A-Za-z] because of different characters in other alphabets. The classes are listed in Table 4-3.

Table 4-3. Character class names.

alnum       Any alphabet or digit character.
alpha       Any alphabet character.
ascii       Any character with a 7-bit character code (i.e., less than 128).
boolean     0, 1, true, false (in any case).
control     Character code less than 32, and not NULL.
digit       Any digit character.
double      A valid floating point number.
false       0 or false (in any case).
graph       Any printing characters, not including space characters.
integer     A valid integer.
lower       A string in all lower case.
print       Any printing character, including space.
punct       Any punctuation character.
space       Space, tab, newline, carriage return, vertical tab, form feed.
true        1 or true (in any case).
upper       A string all in upper case.
wordchar    Alphabet, digit, and the underscore.
xdigit      Valid hexadecimal digits.
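One detail worth knowing when validating input is that, in the Tcl 8 releases, string is accepts an empty string for every class unless the -strict option is given. The sketch below uses -strict for that reason. ValidPort is a hypothetical procedure name, and the port range is an assumption made for the example.

```tcl
# ValidPort is a hypothetical validation helper for this sketch.
proc ValidPort {input} {
    # -strict rejects the empty string, which would otherwise pass.
    if {![string is integer -strict $input]} {
        return 0
    }
    # Only now is it safe to treat the value as a number.
    return [expr {$input > 0 && $input < 65536}]
}

puts [ValidPort 8080]
puts [ValidPort ""]
puts [ValidPort 80a]
```

The class test comes first so that expr never sees a value that is not a number.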

Mapping Strings
The string map command translates a string based on a character map. The map is in the form of an input, output list. Wherever a string contains an input sequence, that is replaced with the corresponding output. The map is the first argument and the string is the second. For example:

    string map {f p d l} food
    => pool

The inputs and outputs can be more than one character and do not have to be the same length:

    string map {f p d ll oo u} food
    => pull

Example 4-3 is more practical. It uses string map to replace fancy quotes and hyphens produced by Microsoft Word into ASCII equivalents. It uses the open, read, and close file operations that are described in Chapter 9, and the fconfigure command described on page 223 to ensure that the file format is UNIX friendly.

Example 4-3 Mapping Microsoft Word special characters to ASCII.

    proc Dos2Unix {filename} {
        set input [open $filename]
        # Open the new file for writing.
        set output [open $filename.new w]
        fconfigure $output -translation lf
        # The double quotes are backslashed so that the list parser
        # does not treat them as the start of a quoted element.
        puts $output [string map {
            \223 \"
            \224 \"
            \222 '
            \226 -
        } [read $input]]
        close $input
        close $output
    }
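Two further uses of string map are sketched here. The first shows that the string is scanned only once, so a replacement is never itself replaced. The second builds the map with list; the variable names and the %NAME% and %PLACE% markers are assumptions invented for this example.

```tcl
# Swap two characters. Because the string is scanned once, the b that
# replaces an a is not mapped back to a.
puts [string map {a b b a} abba]

# Fill in a hypothetical template. Using list to build the map keeps
# a value that contains spaces as a single map element.
set name "Brent Welch"
set template "Dear %NAME%, welcome to %PLACE%."
set letter [string map [list %NAME% $name %PLACE% Tcl] $template]
puts $letter
```

Building the map with list rather than by hand avoids the quoting problems that arise when a replacement value contains white space.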


The append Command

The append command takes a variable name as its first argument and concatenates its remaining arguments onto the current value of the named variable. The variable is created if it does not already exist:

    set foo z
    append foo a b c
    set foo
    => zabc

The append command is efficient with large strings.

The append command provides an efficient way to add items to the end of a string. It modifies a variable directly, so it can exploit the memory allocation scheme used internally by Tcl. Using the append command like this:

    append x " some new stuff"

is always faster than this:

    set x "$x some new stuff"

The lappend command described on page 61 has similar performance benefits when working with Tcl lists.
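A typical use of append is to build up a string inside a loop. The following is a minimal sketch; the variable name line and the sample words are assumptions for illustration.

```tcl
# Start from an empty string and add one word and a comma per iteration.
set line ""
foreach word {alpha beta gamma} {
    # append modifies line in place and concatenates all its arguments.
    append line $word ,
}
puts $line

# Remove the trailing separator left by the last iteration.
set line [string trimright $line ,]
puts $line
```

After the loop the value is alpha,beta,gamma, with a trailing comma, which string trimright removes.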


The format Command

The format command is similar to the C printf function. It formats a string according to a format specification:

    format spec value1 value2 ...

The spec argument includes literals and keywords. The literals are placed in the result as is, while each keyword indicates how to format the corresponding argument. The keywords are introduced with a percent sign, %, followed by zero or more modifiers, and terminate with a conversion specifier. Example keywords include %f for floating point, %d for integer, and %s for string format. Use %% to obtain a single percent character. The most general keyword specification for each argument contains up to six parts:

    position specifier
    flags
    field width
    precision
    word length
    conversion character

These components are explained by a series of examples. The examples use double quotes around the format specification. This is because often the format contains white space, so grouping is required, as well as backslash substitutions like \t or \n, and the quotes allow substitution of these special characters. Table 4-4 lists the conversion characters:

Table 4-4. Format conversions.

d         Signed integer.
u         Unsigned integer.
i         Signed integer. The argument may be in hex (0x) or octal (0) format.
o         Unsigned octal.
x or X    Unsigned hexadecimal. 'x' gives lowercase results.
c         Map from an integer to the ASCII character it represents.
s         A string.
f         Floating point number in the format a.b.
e or E    Floating point number in scientific notation, a.bE+-c.
g or G    Floating point number in either %f or %e format, whichever is shorter.

A position specifier is i$, which means take the value from argument i as opposed to the normally corresponding argument. The position counts from 1. If a position is specified for one format keyword, the position must be used for all of them. If you group the format specification with double quotes, you need to quote the $ with a backslash:

    set lang 2
    format "%${lang}\$s" one un uno
    => un

The position specifier is useful for picking a string from a set, such as this simple language-specific example. The message catalog facility described in Chapter 15 is a much more sophisticated way to solve this problem. The position is also useful if the same value is repeated in the formatted string.

The flags in a format are used to specify padding and justification. In the following examples, the # causes a leading 0x to be printed in the hexadecimal value. The zero in 08 causes the field to be padded with zeros; the field width of 8 includes the 0x prefix. Table 4-5 summarizes the format flag characters.

    format "%#x" 20
    => 0x14
    format "%#08x" 10
    => 0x00000a

Table 4-5. Format flags.

-        Left justify the field.
+        Always include a sign, either + or -.
space    Precede a number with a space, unless the number has a leading sign. Useful for packing numbers close together.
0        Pad with zeros.
#        Leading 0 for octal. Leading 0x for hex. Always include a decimal point in floating point. Do not remove trailing zeros (%g).

After the flags you can specify a minimum field width value. The value is padded to this width with spaces, or with zeros if the 0 flag is used:

    format "%-20s %3d" Label 2
    => Label                  2

You can compute a field width and pass it to format as one of the arguments by using * as the field width specifier. In this case the next argument is used as the field width instead of the value, and the argument after that is the value that gets formatted.

    set maxl 8
    format "%-*s = %s" $maxl Key Value
    => Key      = Value

The precision comes next, and it is specified with a period and a number. For %f and %e it indicates how many digits come after the decimal point. For %g it indicates the total number of significant digits used. For %d and %x it indicates how many digits will be printed, padding with zeros if necessary.

    format "%6.2f %6.2d" 1 1
    =>   1.00     01

The storage length part comes last but it is rarely useful because Tcl maintains all floating point values in double-precision, and all integers as long words.
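The computed field width is handy for lining up a column whose width is not known in advance. The sketch below measures the longest label first and then passes that width through the * specifier. The data in rows and the variable names are assumptions made for the example.

```tcl
# Hypothetical data: each row is a label and a number.
set rows {{apple 0.5} {kiwi 12} {watermelon 3.25}}

# Find the longest label so the first column can be sized to fit.
set width 0
foreach row $rows {
    set len [string length [lindex $row 0]]
    if {$len > $width} {
        set width $len
    }
}

# The * takes the field width from the argument list; the - flag
# left justifies the label, and %6.2f right justifies the number.
set lines {}
foreach row $rows {
    lappend lines [format "%-*s|%6.2f|" $width [lindex $row 0] [lindex $row 1]]
}
puts [join $lines \n]
```

Each output line has the same length, so the vertical bars line up regardless of the label length.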


The scan Command

The scan command parses a string according to a format specification and assigns values to variables. It returns the number of successful conversions it made. The general form of the command is:

    scan string format var ?var? ?var? ...

The format for scan is nearly the same as in the format command. There is no %u scan format. The %c scan format converts one character to its decimal value.

The scan format includes a set notation. Use square brackets to delimit a set of characters. The set matches one or more characters that are copied into the variable. A dash is used to specify a range. The following scans a field of all lowercase letters.

    scan abcABC {%[a-z]} result
    => 1
    set result
    => abc

If the first character in the set is a right square bracket, then it is considered part of the set. If the first character in the set is ^, then characters not in the set match. Again, put a right square bracket immediately after the ^ to include it in the set. Nothing special is required to include a left square bracket in the set. As in the previous example, you will want to protect the format with braces, or use backslashes, because square brackets are special to the Tcl parser.
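Because scan returns the number of conversions, it can validate and extract fields in one step. The following sketch parses a time stamp and a name=value pair; the input strings and variable names are assumptions for illustration.

```tcl
# Parse a hypothetical HH:MM:SS time stamp into three integers.
set stamp "12:05:59"
set count [scan $stamp "%d:%d:%d" hour min sec]
if {$count != 3} {
    error "bad time stamp: $stamp"
}
puts "$hour $min $sec"

# The set notation copies a run of lowercase letters into name,
# then the literal = must match before %d reads the number.
scan "width=80" {%[a-z]=%d} name value
puts "$name $value"
```

Note that %d reads 05 as the decimal number 5, so the first puts prints 12 5 59.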


The binary Command

Tcl 8.0 added support for binary strings. Previous versions of Tcl used null-terminated strings internally, which foils the manipulation of some types of data. Tcl now uses counted strings, so it can tolerate a null byte in a string value without truncating it.

This section describes the binary command that provides conversions between strings and packed binary data representations. The binary format command takes values and packs them according to a template. For example, this can be used to format a floating point vector in memory suitable for passing to Fortran. The resulting binary value is returned:

    binary format template value ?value ...?

The binary scan command extracts values from a binary string according to a similar template. For example, this is useful for extracting data stored in binary format. It assigns values to a set of Tcl variables:

    binary scan value template variable ?variable ...?

Format Templates
The template consists of type keys and counts. The types are summarized in Table 4-6. In the table, count is the optional count following the type letter.

Table 4-6. Binary conversion types.

a    A character string of length count. Padded with nulls in binary format.
A    A character string of length count. Padded with spaces in binary format. Trailing nulls and blanks are discarded in binary scan.
b    A binary string of length count. Low-to-high order.
B    A binary string of length count. High-to-low order.
h    A hexadecimal string of length count. Low-to-high order.
H    A hexadecimal string of length count. High-to-low order. (More commonly used than h.)
c    An 8-bit character code. The count is for repetition.
s    A 16-bit integer in little-endian byte order. The count is for repetition.
S    A 16-bit integer in big-endian byte order. The count is for repetition.
i    A 32-bit integer in little-endian byte order. The count is for repetition.
I    A 32-bit integer in big-endian byte order. The count is for repetition.
f    Single-precision floating point value in native format. count is for repetition.
d    Double-precision floating point value in native format. count is for repetition.
x    Pack count null bytes with binary format. Skip count bytes with binary scan.
X    Backup count bytes.
@    Skip to absolute position specified by count. If count is *, skip to the end.

The count is interpreted differently depending on the type. For types like integer (i) and double (d), the count is a repetition count (e.g., i3 means three integers). For strings, the count is a length (e.g., a3 means a three-character string). If no count is specified, it defaults to 1. If count is *, then binary scan uses all the remaining bytes in the value.

Several type keys can be specified in a template. Each key-count combination moves an imaginary cursor through the binary data. There are special type keys to move the cursor. The x key generates null bytes in binary format, and it skips over bytes in binary scan. The @ key uses its count as an absolute byte offset to which to set the cursor. As a special case, @* skips to the end of the data. The X key backs up count bytes.

Numeric types have a particular byte order that determines how their value is laid out in memory. The type keys are lowercase for little-endian byte order (e.g., Intel) and uppercase for big-endian byte order (e.g., SPARC and Motorola). Different integer sizes are 16-bit (s or S), 32-bit (i or I), and possibly 64-bit (l or L) on those machines that support 64-bit integers. Note that the official byte order for data transmitted over a network is big-endian. Floating point values are always machine-specific, so it only makes sense to format and scan these values on the same machine.

There are three string types: character (a or A), binary (b or B), and hexadecimal (h or H). With these types the count is the length of the string. The a type pads its value to the specified length with null bytes in binary format and the A type pads its value with spaces. If the value is too long, it is truncated. In binary scan, the A type strips trailing blanks and nulls.

A binary string consists of zeros and ones. The b type specifies bits from low-to-high order, and the B type specifies bits from high-to-low order. A hexadecimal string specifies 4 bits (i.e., nybbles) with each character. The h type specifies nybbles from low-to-high order, and the H type specifies nybbles from high-to-low order. The B and H formats match the way you normally write out numbers.

Examples
When you experiment with binary format and binary scan, remember that Tcl treats things as strings by default. A "6", for example, is the character 6 with character code 54 or 0x36. The c type returns these character codes:

    set input 6
    binary scan $input "c" 6val
    set 6val
    => 54

You can scan several character codes at a time:

    binary scan abc "c3" list
    => 1
    set list
    => 97 98 99

The previous example uses a single type key, so binary scan sets one corresponding Tcl variable. If you want each character code in a separate variable, use separate type keys:

    binary scan abc "ccc" x y z
    => 3
    set z
    => 99

Use the H format to get hexadecimal values:

    binary scan 6 "H2" 6val
    set 6val
    => 36

Use the a and A formats to extract fixed width fields. Here the * count is used to get all the rest of the string. Note that A trims trailing spaces:

    binary scan "hello world " a3x2A* first second
    puts "\"$first\" \"$second\""
    => "hel" " world"

Use the @ key to seek to a particular offset in a value. The following command gets the second double-precision number from a vector. Assume the vector is read from a binary data file:

    binary scan $vector "@8d" double

With binary format, the a and A types create fixed width fields. A pads its field with spaces, if necessary. The value is truncated if the string is too long:

    binary format "A9A3" hello world
    => hello    wor

An array of floating point values can be created with this command:

    binary format "f*" 1.2 3.45 7.43 -45.67 1.03e4

Remember that floating point values are always in native format, so you have to read them on the same type of machine on which they were created. With integer data you specify either big-endian or little-endian formats. The tcl_platform variable described on page 182 can tell you the byte order of the current platform.
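To tie binary format and binary scan together, here is a small round trip under assumed conditions: a hypothetical record made of a 16-bit big-endian number followed by a five-byte name field. The record layout and the variable names are invented for this sketch.

```tcl
# Pack a 16-bit big-endian integer followed by a 5-byte name field.
# The a type pads the short name with null bytes.
set record [binary format "Sa5" 258 Tcl]
puts [string length $record]

# Unpack the same layout. The A type strips the trailing nulls.
binary scan $record "SA5" len name
puts "$len $name"

# Look at the first two bytes in hexadecimal, high nybble first.
binary scan $record "H4" hex
puts $hex
```

The record is seven bytes long, and because 258 is 0x0102, the big-endian S key stores the bytes 01 02 in that order.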

Binary Data and File I/O

When working with binary data in files, you need to turn off the newline translations and character set encoding that Tcl performs automatically. These are described in more detail on pages 114 and 209. For example, if you are generating binary data, the following command puts your standard output in binary mode:

    fconfigure stdout -translation binary -encoding binary
    puts [binary format "B8" 11001010]


Related Chapters
To learn more about manipulating data in Tcl, read about lists in Chapter 5 and arrays in Chapter 8. For more about pattern matching, read about regular expressions in Chapter 11. For more about file I/O, see Chapter 9. For information on Unicode and other Internationalization issues, see Chapter 15.


Part I. Tcl Basics

Chapter 5. Tcl Lists

This chapter describes Tcl lists. Tcl commands described are: list, lindex, llength, lrange, lappend, linsert, lreplace, lsearch, lsort, concat, join, and split.

Lists in Tcl have the same structure as Tcl commands. All the rules you learned about grouping arguments in Chapter 1 apply to creating valid Tcl lists. However, when you work with Tcl lists, it is best to think of lists in terms of operations instead of syntax. Tcl commands provide operations to put values into a list, get elements from lists, count the elements of lists, replace elements of lists, and so on. The syntax can sometimes be confusing, especially when you have to group arguments to the list commands themselves.

Lists are used with commands such as foreach that take lists as arguments. In addition, lists are important when you are building up a command to be evaluated later. Delayed command evaluation with eval is described in Chapter 10, and similar issues with Tk callback commands are described in Chapter 27.

However, Tcl lists are not often the right way to build complicated data structures in scripts. You may find Tcl arrays more useful, and they are the topic of Chapter 8. List operations are also not right for handling unstructured data such as user input. Use regular expressions instead, which are described in Chapter 11.


Tcl Lists
A Tcl list is a sequence of values. When you write out a list, it has the same syntax as a Tcl command. A list has its elements separated by white space. Braces or quotes can be used to group words with white space into a single list element. Because of the relationship between lists and commands, the list-related commands described in this chapter are used often when constructing Tcl commands.

Big lists were often slow before Tcl 8.0.

Unlike list data structures in other languages, Tcl lists are just strings with a special interpretation. The string representation must be parsed on each list access, so be careful when you use large lists. A list with a few elements will not slow down your code much. A list with hundreds or thousands of elements can be very slow. If you find yourself maintaining large lists that must be frequently accessed, consider changing your code to use arrays instead.

The performance of lists was improved by the Tcl compiler added in Tcl 8.0. The compiler stores lists in an internal format that requires constant time to access. Accessing the first element costs the same as accessing any other element in the list. Before Tcl 8.0, the cost of accessing an element was proportional to the number of elements before it in the list. The internal format also records the number of list elements, so getting the length of a list is cheap. Before Tcl 8.0, computing the length required reading the whole list.

Table 5-1 briefly describes the Tcl commands related to lists.

Table 5-1. List-related commands.

list arg1 arg2 ...                Creates a list out of all its arguments.
lindex list i                     Returns the ith element from list.
llength list                      Returns the number of elements in list.
lrange list i j                   Returns the ith through jth elements from list.
lappend listVar arg arg ...       Appends elements to the value of listVar.
linsert list index arg arg ...    Inserts elements into list before the element at position index. Returns a new list.
lreplace list i j arg arg ...     Replaces elements i through j of list with the args. Returns a new list.
lsearch ?mode? list value         Returns the index of the element in list that matches the value according to the mode, which is -exact, -glob, or -regexp. -glob is the default. Returns -1 if not found.
lsort ?switches? list             Sorts elements of the list according to the switches: -ascii, -integer, -real, -dictionary, -increasing, -decreasing, -index ix, -command command. Returns a new list.
concat list list ...              Joins multiple lists together into one list.
join list joinString              Merges the elements of a list together by separating them with joinString.
split string splitChars           Splits a string up into list elements, using the characters in splitChars as boundaries between list elements.


Constructing Lists
Constructing a list can be tricky because you must maintain proper list syntax. In simple cases, you can do this by hand. In more complex cases, however, you should use Tcl commands that take care of quoting so that the syntax comes out right.

The list command

The list command constructs a list out of its arguments so that there is one list element for each argument. If any of the arguments contain special characters, the list command adds quoting to ensure that they are parsed as a single element of the resulting list. The automatic quoting is very useful, and the examples in this book use the list command frequently. The next example uses list to create a list with three values, two of which contain special characters.

Example 5-1 Constructing a list with the list command.

    set x {1 2}
    => 1 2
    set y foo
    => foo
    set l1 [list $x "a b" $y]
    => {1 2} {a b} foo
    set l2 "\{$x\} {a b} $y"
    => {1 2} {a b} foo

The list command does automatic quoting.

Compare the use of list with doing the quoting by hand in Example 5-1. The assignment of l2 requires carefully constructing the first list element by using quoted braces. The braces must be turned off so that $x can be substituted, but we need to group the result so that it remains a single list element. We also have to know in advance that $x contains a space, so quoting is required. We are taking a risk by not quoting $y because we know it doesn't contain spaces. If its value changes in the future, the structure of the list can change and even become invalid. In contrast, the list command takes care of all these details automatically.

When I first experimented with Tcl lists, I became confused by the treatment of curly braces. In the assignment to x, for example, the curly braces disappear. However, they come back again when $x is put into a bigger list. Also, the double quotes around a b get changed into curly braces. What's going on? Remember that there are two steps. In the first step, the Tcl parser groups arguments. In the grouping process, the braces and quotes are syntax that define groups. These syntax characters get stripped off. The braces and quotes are not part of the value. In the second step, the list command creates a valid Tcl list. This may require quoting to get the list elements into the right groups. The list command uses curly braces to group values back into list elements.

The lappend Command

The lappend command is used to append elements to the end of a list. The first argument to lappend is the name of a Tcl variable, and the rest of the arguments are added to the variable's value as new list elements. Like list, lappend preserves the structure of its arguments. It may add braces to group the values of its arguments so that they retain their identity as list elements when they are appended onto the string representation of the list.

Example 5-2 Using lappend to add elements to a list.

    lappend new 1 2
    => 1 2
    lappend new 3 "4 5"
    => 1 2 3 {4 5}
    set new
    => 1 2 3 {4 5}

The lappend command is unique among the list-related commands because its first argument is the name of a list-valued variable, while all the other commands take list values as arguments. You can call lappend with the name of an undefined variable and the variable will be created. The lappend command is implemented efficiently to take advantage of the way that Tcl stores lists internally. It is always more efficient to use lappend than to try and append elements by hand.

The concat Command

The concat command is useful for splicing lists together. It works by concatenating its arguments, separating them with spaces. This joins multiple lists into one list where the top-level list elements in each input list become top-level list elements in the resulting list:

Example 5-3 Using concat to splice lists together.

    set x {4 5 6}
    set y {2 3}
    set z 1
    concat $z $y $x
    => 1 2 3 4 5 6

Double quotes behave much like the concat command.

In simple cases, double quotes behave exactly like concat. However, the concat command trims extra white space from the end of its arguments before joining them together with a single separating space character. Example 5-4 compares the use of list, concat, and double quotes:

Example 5-4 Double quotes compared to the concat and list commands.

    set x {1 2}
    => 1 2
    set y "$x 3"
    => 1 2 3
    set y [concat $x 3]
    => 1 2 3
    set s { 2 }
    =>  2
    set y "1 $s 3"
    => 1  2  3
    set y [concat 1 $s 3]
    => 1 2 3
    set z [list $x $s 3]
    => {1 2} { 2 } 3

The distinction between list and concat becomes important when Tcl commands are built dynamically. The basic rule is that list and lappend preserve list structure, while concat (or double quotes) eliminates one level of list structure. The distinction can be subtle because there are examples where list and concat return the same results. Unfortunately, this can lead to data-dependent bugs. Throughout the examples of this book, you will see the list command used to safely construct lists. This issue is discussed more in Chapter 10.
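The data-dependent nature of the problem shows up as soon as a value contains a space. In this sketch, greet is a hypothetical command name and the variable names are assumptions; nothing is evaluated, only the list structure is examined.

```tcl
# A value that happens to contain a space.
set name "Brent Welch"

# list keeps the value as one element; concat splits it into two.
set viaList   [list greet $name]
set viaConcat [concat greet $name]

puts [llength $viaList]
puts [llength $viaConcat]
puts [lindex $viaList 1]
```

With a one-word name both lists would have two elements, which is why this kind of bug can go unnoticed in simple tests.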


The split Command

The split command takes a string and turns it into a list by breaking it at specified characters and ensuring that the result has the proper list syntax. The split command provides a robust way to turn input lines into proper Tcl lists:

    set line {welch:*:28405:100:Brent Welch:/usr/welch:/bin/csh}
    split $line :
    => welch * 28405 100 {Brent Welch} /usr/welch /bin/csh
    lindex [split $line :] 4
    => Brent Welch

Do not use list operations on arbitrary data.

Even if your data has space-separated words, you should be careful when using list operators on arbitrary input data. Otherwise, stray double quotes or curly braces in the input can result in invalid list structure and errors in your script. Your code will work with simple test cases, but when invalid list syntax appears in the input, your script will raise an error. The next example shows what happens when input is not a valid list. The syntax error, an unmatched quote, occurs in the middle of the list. However, you cannot access any of the list because the lindex command tries to convert the value to a list before returning any part of it.

Example 5-8 Use split to turn input data into Tcl lists.

    set line {this is "not a tcl list}
    lindex $line 1
    => unmatched open quote in list
    lindex [split $line] 2
    => "not

The default separator character for split is white space, which includes spaces, tabs, and newlines. If there are multiple separator characters in a row, these result in empty list elements; the separators are not collapsed. The following command splits on commas, periods, spaces, and tabs. The backslash-space sequence is used to include a space in the set of characters. You could also group the argument to split with double quotes:

    set line "\tHello, world."
    split $line \ ,.\t
    => {} Hello {} world {}

A trick that splits each character into a list element is to specify an empty string as the split character. This lets you get at individual characters with list operations:

    split abc {}
    => a b c

However, if you write scripts that process data one character at a time, they may run slowly. Read Chapter 11 about regular expressions for hints on really efficient string processing.
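Two points from this section can be checked directly: adjacent separators produce empty elements, and split yields a valid list even when the input would not parse as one. The input strings below are assumptions made for this sketch.

```tcl
# Two colons in a row produce an empty element between them.
set fields [split "alpha::gamma" :]
puts [llength $fields]

# Input with a stray brace is not a valid list by itself, so a
# list command applied to it directly raises an error.
set text "open { brace"
puts [catch {llength $text} msg]

# After split, every word is a proper list element.
set words [split $text]
puts [llength $words]
puts [lindex $words 1]
```

The catch command returns 1 here because llength fails on the raw text, while the list produced by split has three elements, the second of which is the brace.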


The join Command

The join command is the inverse of split. It takes a list value and reformats it with specified characters separating the list elements. In doing so, it removes any curly braces from the string representation of the list that are used to group the top-level elements. For example:

    join {1 {2 3} {4 5 6}} :
    => 1:2 3:4 5 6

If the treatment of braces is puzzling, remember that the first value is parsed into a list. The braces around element values disappear in the process. Example 5-9 shows a way to implement join in a Tcl procedure, which may help to understand the process:

Example 5-9 Implementing join in Tcl.

    proc join {list sep} {
        set s {}    ;# s is the current separator
        set result {}
        foreach x $list {
            append result $s $x
            set s $sep
        }
        return $result
    }
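Used together, split and join can change the separator in a string. The sketch below rewrites a path; the value of path and the choice of :: as the new separator are assumptions for illustration.

```tcl
# Break a path at the slashes. The leading slash produces an
# empty first element.
set path /usr/local/bin
set parts [split $path /]
puts [llength $parts]

# Put the pieces back together with a different, two-character separator.
set joined [join $parts ::]
puts $joined
```

The list has four elements because of the empty element before the first slash, so the joined result also begins with the separator.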


Chapter 6. Control Structure Commands

Switch
The switch command is used to branch to one of many command bodies depending on the value of an expression. The choice can be made on the basis of pattern matching as well as simple comparisons. Pattern matching is discussed in more detail in Chapter 4 and Chapter 11. The general form of the command is:

    switch flags value pat1 body1 pat2 body2 ...

Any number of pattern-body pairs can be specified. If multiple patterns match, only the body of the first matching pattern is evaluated. You can also group all the pattern-body pairs into one argument:

    switch flags value { pat1 body1 pat2 body2 ... }

The first form allows substitutions on the patterns but will require backslashes to continue the command onto multiple lines. This is shown in Example 6-4 on page 72. The second form groups all the patterns and bodies into one argument. This makes it easy to group the whole command without worrying about newlines, but it suppresses any substitutions on the patterns. This is shown in Example 6-3. In either case, you should always group the command bodies with curly braces so that substitution occurs only on the body with the pattern that matches the value.

There are four possible flags that determine how value is matched.

-exact     Matches the value exactly to one of the patterns. This is the default.
-glob      Uses glob-style pattern matching. See page 48.
-regexp    Uses regular expression pattern matching. See page 134.
--         No flag (or end of flags). Necessary when value can begin with -.

The switch command raises an error if any other flag is specified or if the value begins with -. In practice I always use the -- flag before value so that I don't have to worry about that problem.

If the pattern associated with the last body is default, then this command body is executed if no other patterns match. The default keyword works only on the last pattern-body pair. If you use the default pattern on an earlier body, it will be treated as a pattern to match the literal string default:

Example 6-3 Using switch for an exact match.

    switch -exact -- $value {
        foo { doFoo; incr count(foo) }
        bar { doBar; return $count(foo)}
        default { incr count(other) }
    }

If you have variable references or backslash sequences in the patterns, then you cannot use braces around all the pattern-body pairs. You must use backslashes to escape the newlines in the command:

Example 6-4 Using switch with substitutions in the patterns.

    switch -regexp -- $value \
        ^$key { body1 }\
        \t### { body2 }\
        {[0-9]*} { body3 }

In this example, the first and second patterns have substitutions performed to replace $key with its value and \t with a tab character. The third pattern is quoted with curly braces to prevent command substitution; square brackets are part of the regular expression syntax, too. (See Chapter 11.)

If the body associated with a pattern is just a dash, -, then the switch command "falls through" to the body associated with the next pattern. You can tie together any number of patterns in this manner.

Example 6-5 A switch with "fall through" cases.

    switch -glob -- $value {
        X* -
        Y* { takeXorYaction $value }
    }
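The pieces described above (the -- flag, glob patterns, fall through, and default) can be combined in one small sketch. Classify is a hypothetical procedure invented for this illustration, not a Tcl command:

```tcl
# Classify is a hypothetical helper used only for illustration.
proc Classify {value} {
    switch -glob -- $value {
        -*      { return option }
        [0-9]*  { return number }
        yes     -
        no      { return boolean }
        default { return word }
    }
}
puts [Classify -force]
puts [Classify 42]
puts [Classify yes]
puts [Classify hello]
```

Because all the pattern-body pairs are grouped with braces, the square brackets in the second pattern are passed to switch unchanged. The -- flag lets the value -force through without it being mistaken for a flag.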

Comments in switch Commands

A comment can occur only where the Tcl parser expects a command to begin. This restricts the location of comments in a switch command. You must put them inside the command body associated with a pattern, as shown in Example 6-6. If you put a comment at the same level as the patterns, the switch command will try to interpret the comment as one or more pattern-body pairs.

Example 6-6 Comments in switch commands.

    switch -- $value {
        # this comment confuses switch
        pattern { # this comment is ok }
    }


While
The while command takes two arguments, a test and a command body:

    while booleanExpr body

The while command repeatedly tests the boolean expression and then executes the body if the expression is true (nonzero). Because the test expression is evaluated again before each iteration of the loop, it is crucial to protect the expression from any substitutions before the while command is invoked. The following is an infinite loop (see also Example 1-13 on page 12):

    set i 0 ; while $i<10 {incr i}

The following behaves as expected:

    set i 0 ; while {$i<10} {incr i}

It is also possible to put nested commands in the boolean expression. The following example uses gets to read standard input. The gets command returns the number of characters read, returning -1 upon end of file. Each time through the loop, the variable line contains the next line in the file:

Example 6-7 A while loop to read standard input.

    set numLines 0 ; set numChars 0
    while {[gets stdin line] >= 0} {
        incr numLines
        incr numChars [string length $line]
    }
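The reason the unbraced test loops forever can be seen without running an infinite loop. The following sketch, with illustrative variable names, captures the test string the way the Tcl parser would before while is called:

```tcl
set i 0
# Double quotes allow substitution now, so test holds the fixed
# string "0 < 3". This is what while sees when the test is unbraced.
set test "$i < 3"
puts $test

incr i 5
# The captured string still evaluates to true, although i is now 5.
puts [expr $test]
# A braced expression reads the current value of i when it is evaluated.
puts [expr {$i < 3}]
```

The second puts prints 1 and the third prints 0, which is the difference between the two while commands shown above.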


Foreach
The foreach command loops over a command body assigning one or more loop variables to each of the values in one or more lists. Multiple loop variables were introduced in Tcl 7.5. The syntax for the simple case of a single variable and a single list is:

    foreach loopVar valueList commandBody

The first argument is the name of a variable, and the command body is executed once for each element in the list with the loop variable taking on successive values in the list. The list can be entered explicitly, as in the next example:

Example 6-8 Looping with foreach.

    set i 1
    foreach value {1 3 5 7 11 13 17 19 23} {
        set i [expr $i*$value]
    }
    set i
    => 111546435

It is also common to use a list-valued variable or command result instead of a static list value. The next example loops through command-line arguments. The variable argv is set by the Tcl interpreter to be a list of the command-line arguments given when the interpreter was started:

Example 6-9 Parsing command-line arguments.

    # argv is set by the Tcl shells
    # possible flags are:
    # -max integer
    # -force
    # -verbose
    set state flag
    set force 0
    set verbose 0
    set max 10
    foreach arg $argv {
        switch -- $state {
            flag {
                switch -glob -- $arg {
                    -f* {set force 1}
                    -v* {set verbose 1}
                    -max {set state max}
                    default {error "unknown flag $arg"}
                }
            }
            max {
                set max $arg
                set state flag
            }
        }
    }

The loop uses the state variable to keep track of what is expected next, which in this example is either a flag or the integer value for -max. The -- flag to switch is required in this example because the switch command complains about a bad flag if the value begins with a - character. The -glob option lets the user abbreviate the -force and -verbose options.

If the list of values is to contain variable values or command results, then the list command should be used to form the list. Avoid double quotes because if any values or command results contain spaces or braces, the list structure will be reparsed, which can lead to errors or unexpected results.

Example 6-10 Using list with foreach.

    foreach x [list $a $b [foo]] {
        puts stdout "x = $x"
    }

The loop variable x will take on the value of a, the value of b, and the result of the foo command, regardless of any special characters or whitespace in those values.
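The difference between double quotes and the list command shows up as soon as a value contains a space. A minimal sketch, with made-up values for a and b:

```tcl
set a {New York}
set b Boston

# Double quotes flatten the values into one string, which foreach
# then reparses as a three-element list.
set n 0
foreach x "$a $b" { incr n }
puts $n

# list keeps each value as exactly one element.
set m 0
foreach x [list $a $b] { incr m }
puts $m
```

The first loop runs three times and the second runs twice.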

Multiple Loop Variables

You can have more than one loop variable with foreach. Suppose you have two loop variables x and y. In the first iteration of the loop, x gets the first value from the value list and y gets the second value. In the second iteration, x gets the third value and y gets the fourth value. This continues until there are no more values. If there are not enough values to assign to all the loop variables, the extra variables get the empty string as their value.

Example 6-11 Multiple loop variables with foreach.

    foreach {key value} {orange 55 blue 72 red 24 green} {
        puts "$key: $value"
    }
    => orange: 55
    blue: 72
    red: 24
    green:

If you have a command that returns a short list of values, then you can abuse the foreach command to assign the results of the commands to several variables all at once. For example, suppose the command MinMax returns two values as a list: the minimum and maximum values. Here is one way to get the values:

    set result [MinMax $list]
    set min [lindex $result 0]
    set max [lindex $result 1]

The foreach command lets us do this much more compactly:

    foreach {min max} [MinMax $list] {break}

The break in the body of the foreach loop guards against the case where the command returns more values than we expected. This trick is encapsulated into the lassign procedure in Example 10-4 on page 131.
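To try the assignment trick, the MinMax command has to exist. The version below is only a stand-in written for this sketch; the text does not give its implementation:

```tcl
# MinMax is a hypothetical helper. It assumes a list of integers.
proc MinMax {values} {
    set sorted [lsort -integer $values]
    return [list [lindex $sorted 0] [lindex $sorted end]]
}

# Assign both results in one command. The break stops the loop
# after the first pass.
foreach {min max} [MinMax {7 3 12 5}] {break}
puts "$min $max"
```

This prints 3 12. Tcl 8.5 and later also provide a built-in lassign command that serves the same purpose.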

Multiple Value Lists

The foreach command has the ability to loop over multiple value lists in parallel. In this case, each value list can also have one or more variables. The foreach command keeps iterating until all values are used from all value lists. If a value list runs out of values before the last iteration of the loop, its corresponding loop variables just get the empty string for their value.

Example 6-12 Multiple value lists with foreach.

    foreach {k1 k2} {orange blue red green black} value {55 72 24} {
        puts "$k1 $k2: $value"
    }
    => orange blue: 55
    red green: 72
    black : 24
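A common use of parallel value lists is to walk two related lists together. In this sketch the list names and contents are invented for illustration, and the second list is deliberately one element short:

```tcl
set names  {alice bob carol}
set phones {555-1234 555-5678}

set pairs {}
foreach n $names p $phones {
    # p is the empty string once the phones list runs out.
    lappend pairs "$n=$p"
}
puts $pairs
```

The loop runs three times, and the last element of pairs is carol= because no phone number was left for that name.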


Catch
Until now we have ignored the possibility of errors. In practice, however, a command will raise an error if it is called with the wrong number of arguments, or if it detects some error condition particular to its implementation. An uncaught error aborts execution of a script.[*] The catch command is used to trap such errors. It takes two arguments:

    catch command ?resultVar?

[*] More precisely, the Tcl script unwinds and the current Tcl_Eval procedure in the C runtime library returns TCL_ERROR. There are three cases. In interactive use, the Tcl shell prints the error message. In Tk, errors that arise during event handling trigger a call to bgerror, a Tcl procedure you can implement in your application. In your own C code, you should check the result of Tcl_Eval and take appropriate action in the case of an error.

The first argument to catch is a command body. The second argument is the name of a variable that will contain the result of the command, or an error message if the command raises an error. catch returns zero if there was no error caught, or a nonzero error code if it did catch an error.

You should use curly braces to group the command instead of double quotes because catch invokes the full Tcl interpreter on the command. If double quotes are used, an extra round of substitutions occurs before catch is even called. The simplest use of catch looks like the following:

    catch {command}

A more careful catch phrase saves the result and prints an error message:

Example 6-14 A standard catch phrase.

    if {[catch { command arg1 arg2 ... } result]} {
        puts stderr $result
    } else {
        # command was ok, result contains the return value
    }

A more general catch phrase is shown in the next example. Multiple commands are grouped into a command body. The errorInfo variable is set by the Tcl interpreter after an error to reflect the stack trace from the point of the error:

Example 6-15 A longer catch phrase.

    if {[catch {
        command1
        command2
        command3
    } result]} {
        global errorInfo
        puts stderr $result
        puts stderr "*** Tcl TRACE ***"
        puts stderr $errorInfo
    } else {
        # command body ok, result of last command is in result
    }

These examples have not grouped the call to catch with curly braces. This is acceptable because catch always returns an integer, so the if command will parse correctly. However, if we had used while instead of if, then curly braces would be necessary to ensure that the catch phrase was evaluated repeatedly.
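The standard catch phrase can be exercised with a command that is easy to make fail. SafeDivide is a hypothetical wrapper written for this sketch:

```tcl
# SafeDivide is a hypothetical wrapper used only for illustration.
proc SafeDivide {a b} {
    if {[catch {expr {$a / $b}} result]} {
        # result holds the error message
        return "error: $result"
    }
    # result holds the value of the expression
    return $result
}
puts [SafeDivide 10 2]
puts [SafeDivide 1 0]
```

The first call prints 5. The second prints the error message that expr raised for the division by zero, prefixed with error:, instead of aborting the script.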

Catching More Than Errors

The catch command catches more than just errors. If the command body contains return, break, or continue commands, these terminate the command body and are reflected by catch as nonzero return codes. You need to be aware of this if you try to isolate troublesome code with a catch phrase. An innocent looking return command will cause the catch to signal an apparent error. The next example uses switch to find out exactly what catch returns. Nonerror cases are passed up to the surrounding code by invoking return, break, or continue:

Example 6-16 There are several possible return values from catch.

    switch [catch {
        command1
        command2
        ...
    } result] {
        0 {                   # Normal completion }
        1 {                   # Error case }
        2 { return $result    ;# return from procedure}
        3 { break             ;# break out of the loop}
        4 { continue          ;# continue loop}
        default {             # User-defined error codes }
    }
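The return codes can be observed directly by passing one-line bodies to catch. This sketch assumes it is run as a top-level script:

```tcl
set codes {}
lappend codes [catch {set x 1}]      ;# normal completion
lappend codes [catch {error oops}]   ;# error
lappend codes [catch {return}]       ;# return
lappend codes [catch {break}]        ;# break
lappend codes [catch {continue}]     ;# continue
puts $codes
```

The list printed is 0 1 2 3 4, matching the cases handled by the switch in Example 6-16.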

Global scope is the toplevel scope. This scope is outside of any procedure. Variables defined at the global scope must be made accessible to the commands inside a procedure by using the global command. The syntax for global is:

    global varName1 varName2 ...

The global command goes inside a procedure.

The global command adds a global variable to the current scope. A common mistake is to have a single global command and expect that to apply to all procedures. However, a global command in the global scope has no effect. Instead, you must put a global command in all procedures that access the global variable. The variable can be undefined at the time the global command is used. When the variable is defined, it becomes visible in the global scope.

Example 7-4 shows a random number generator. Before we look at the example, let me point out that the best way to get random numbers in Tcl is to use the rand() math function:

    expr rand()
    => .137287362934

The point of the example is to show a state variable, the seed, that has to persist between calls to random, so it is kept in a global variable. The choice of randomSeed as the name of the global variable associates it with the random number generator. It is important to pick names of global variables carefully to avoid conflict with other parts of your program. For comparison, Example 14-1 on page 196 uses namespaces to hide the state variable:

Example 7-4 A random number generator.[*]

    proc RandomInit { seed } {
        global randomSeed
        set randomSeed $seed
    }
    proc Random {} {
        global randomSeed
        set randomSeed [expr ($randomSeed*9301 + 49297) % 233280]
        return [expr $randomSeed/double(233280)]
    }
    proc RandomRange { range } {
        expr int([Random]*$range)
    }
    RandomInit [pid]
    => 5049
    Random
    => 0.517686899863
    Random
    => 0.217176783265
    RandomRange 100
    => 17

[*] Adapted from Exploring Expect by Don Libes, O'Reilly & Associates, Inc., 1995, and from Numerical Recipes in C by Press et al., Cambridge University Press, 1988.
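The rule that every procedure needs its own global command can be checked with a small counter. The names hits, Hit, and Forgot are invented for this sketch, which assumes it is run at the global scope:

```tcl
set hits 0    ;# global state variable (hypothetical name)

proc Hit {} {
    global hits
    incr hits
}
proc Forgot {} {
    # No global command here, so hits would be a local variable,
    # and no such local variable exists.
    return [info exists hits]
}

Hit
Hit
puts $hits
puts [Forgot]
```

The script prints 2 and then 0: Hit updates the global variable, while Forgot cannot see it at all.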


Chapter 7. Procedures and Scope

Call by Name Using upvar

Use the upvar command when you need to pass the name of a variable, as opposed to its value, into a procedure. The upvar command associates a local variable with a variable in a scope up the Tcl call stack. The syntax of the upvar command is:

    upvar ?level? varName localvar

The level argument is optional, and it defaults to 1, which means one level up the Tcl call stack. You can specify some other number of frames to go up, or you can specify an absolute frame number with a #number syntax. Level #0 is the global scope, so the global foo command is equivalent to:

    upvar #0 foo foo

The variable in the uplevel stack frame can be either a scalar variable, an array element, or an array name. In the first two cases, the local variable is treated like a scalar variable. In the case of an array name, the local variable is treated like an array. The use of upvar and arrays is discussed further in Chapter 8 on page 92. The following procedure uses upvar to print the value of a variable given its name.

Example 7-5 Print variable by name.

    proc PrintByName { varName } {
        upvar 1 $varName var
        puts stdout "$varName = $var"
    }

You can use upvar to fix the incr command. One drawback of the built-in incr is that it raises an error if the variable does not exist. We can define a new version of incr that initializes the variable if it does not already exist:

Example 7-6 Improved incr procedure.

    proc incr { varName {amount 1}} {
        upvar 1 $varName var
        if {[info exists var]} {
            set var [expr $var + $amount]
        } else {
            set var $amount
        }
        return $var
    }
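Call by name also makes it possible for a procedure to modify more than one of the caller's variables. Swap below is a hypothetical example written for this sketch:

```tcl
# Swap is a hypothetical procedure used only for illustration.
proc Swap {nameA nameB} {
    # Link two local names to variables one level up the call stack.
    upvar 1 $nameA a $nameB b
    set tmp $a
    set a $b
    set b $tmp
    return
}

set x 1
set y 2
Swap x y
puts "$x $y"
```

The caller passes the names x and y, not their values, and the script prints 2 1.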


Variable Aliases with upvar

The upvar command is useful in any situation where you have the name of a variable stored in another variable. In Example 7-2 on page 82, the loop variable param holds the names of other variables. Their value is obtained with this construct:

    puts stdout "\t$param = [set $param]"

Another way to do this is to use upvar. It eliminates the need to use awkward constructs like [set $param]. If the variable is in the same scope, use zero as the scope number with upvar. The following is equivalent:

    upvar 0 $param x
    puts stdout "\t$param = $x"

Associating State with Data

Suppose you have a program that maintains state about a set of objects like files, URLs, or people. You can use the name of these objects as the name of a variable that keeps state about the object. The upvar command makes this more convenient:

    upvar #0 $name state

Using the name directly like this is somewhat risky. If there were an object named x, then this trick might conflict with an unrelated variable named x elsewhere in your program. You can modify the name to make this trick more robust:

    upvar #0 state$name state

Your code can pass name around as a handle on an object, then use upvar to get access to the data associated with the object. Your code is just written to use the state variable, which is an alias to the state variable for the current object. This technique is illustrated in Example 17-7 on page 232.
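A minimal sketch of the handle technique follows. Obj_Set and Obj_Get are hypothetical accessor procedures invented here; the only part taken from the text is the upvar #0 state$name state line:

```tcl
# Obj_Set and Obj_Get are hypothetical accessors for per-object state.
proc Obj_Set {name key value} {
    upvar #0 state$name state
    set state($key) $value
}
proc Obj_Get {name key} {
    upvar #0 state$name state
    return $state($key)
}

Obj_Set x color red
Obj_Set y color blue
puts [Obj_Get x color]
puts [Obj_Get y color]
```

Each object gets its own global array (statex and statey here), so an unrelated variable that happens to be named x is left alone.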

Namespaces and upvar

You can use upvar to create aliases for namespace variables, too. Namespaces are described in Chapter 14. For example, as an alternative to reserving all global variables beginning with state, you can use a namespace to hide these variables:

    upvar #0 state::$name state

Now state is an alias to the namespace variable. This upvar trick works from inside any namespace.

Commands That Take Variable Names

Several Tcl commands involve variable names. For example, the Tk widgets can be associated with a global Tcl variable. The vwait and tkwait commands also take variable names as arguments.

Upvar aliases do not work with text variables.

The aliases created with upvar do not work with these commands, nor do they work if you use trace, which is described on page 183. Instead, you must use the actual name of the global variable. To continue the above example where state is an alias, you cannot use:

    vwait state(foo)
    button .b -textvariable state(foo)

Instead, you must use:

    vwait state$name\(foo)
    button .b -textvariable state$name\(foo)

The backslash turns off the array reference so Tcl does not try to access name as an array. You do not need to worry about special characters in $name, except parentheses. Once the name has been passed into the Tk widget it will be used directly as a variable name.


Part I. Tcl Basics

Chapter 8. Tcl Arrays

This chapter describes Tcl arrays, which provide a flexible mechanism to build many other data structures in Tcl. The Tcl command described is: array.

An array is a Tcl variable with a string-valued index. You can think of the index as a key, and the array as a collection of related data items identified by different keys. The index, or key, can be any string value. Internally, an array is implemented with a hash table, so the cost of accessing each array element is about the same. Before Tcl 8.0, arrays had a performance advantage over lists, which took time to access proportional to the size of the list.

The flexibility of arrays makes them an important tool for the Tcl programmer. A common use of arrays is to manage a collection of variables, much as you use a C struct or Pascal record. This chapter shows how to create several simple data structures using Tcl arrays.


Array Syntax
The index of an array is delimited by parentheses. The index can have any string value, and it can be the result of variable or command substitution. Array elements are defined with set:

    set arr(index) value

The value of an array element is obtained with $ substitution:

    set foo $arr(index)

Example 8-1 uses the loop variable value $i as an array index. It sets arr(x) to the product of 1 * 2 * ... * x:

Example 8-1 Using arrays.

    set arr(0) 1
    for {set i 1} {$i <= 10} {incr i} {
        set arr($i) [expr {$i * $arr([expr $i-1])}]
    }

Complex Indices
An array index can be any string, like orange, 5, 3.1415, or foo,bar. The examples in this chapter, and in this book, often use indices that are pretty complex strings to create flexible data structures. As a rule of thumb, you can use any string for an index, but avoid using a string that contains spaces.

Parentheses are not a grouping mechanism.

The main Tcl parser does not know about array syntax. All the rules about grouping and substitution described in Chapter 1 are still the same in spite of the array syntax described here. Parentheses do not group like curly braces or quotes, which is why a space causes problems. If you have complex indices, use a comma to separate different parts of the index. If you use a space in an index instead, then you have a quoting problem. The space in the index needs to be quoted with a backslash, or the whole variable reference needs to be grouped:

    set {arr(I'm asking for trouble)} {I told you so.}
    set arr(I'm\ asking\ for\ trouble) {I told you so.}

If the array index is stored in a variable, then there is no problem with spaces in the variable's value. The following works well:

    set index {I'm asking for trouble}
    set arr($index) {I told you so.}
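Comma-separated indices are a simple way to simulate a two-dimensional table. The array name cell and its contents are made up for this sketch:

```tcl
# A hypothetical 2 x 2 table stored with "row,column" indices.
set cell(0,0) a
set cell(0,1) b
set cell(1,0) c
set cell(1,1) d

set row 1
set col 0
# The index is the plain string "1,0" after substitution.
puts $cell($row,$col)
```

This prints c. The comma has no special meaning to Tcl; it is just a character in the index string that happens to avoid the quoting problems a space would cause.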

Array Variables
You can use an array element as you would a simple variable. For example, you can test for its existence with info exists, increment its value with incr, and append elements to it with lappend:

    if {[info exists stats($event)]} {incr stats($event)}

You can delete an entire array, or just a single array element with unset. Using unset on an array is a convenient way to clear out a big data structure.

It is an error to use a variable as both an array and a normal variable. The following is an error:

    set arr(0) 1
    set arr 3
    => can't set "arr": variable is array

The name of the array can be the result of a substitution. This is a tricky situation, as shown in Example 8-2:

Example 8-2 Referencing an array indirectly.

    set name TheArray
    => TheArray
    set ${name}(xyz) {some value}
    => some value
    set x $TheArray(xyz)
    => some value
    set x ${name}(xyz)
    => TheArray(xyz)
    set x [set ${name}(xyz)]
    => some value

A better way to deal with this situation is to use the upvar command, which is introduced on page 85. The previous example is much cleaner when upvar is used:

Example 8-3 Referencing an array indirectly using upvar.

    set name TheArray
    => TheArray
    upvar 0 $name a
    set a(xyz) {some value}
    => some value
    set x $TheArray(xyz)
    => some value
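The combination of info exists and incr on array elements is enough to build a frequency table. A minimal sketch, with an invented input string:

```tcl
set text {the cat saw the dog chase the cat}

foreach word $text {
    if {[info exists count($word)]} {
        incr count($word)
    } else {
        # First time this word is seen: create the element.
        set count($word) 1
    }
}
puts $count(the)
puts $count(cat)
```

This prints 3 and then 2. Each distinct word becomes one array element, created the first time the word appears.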


The array Command

The array command returns information about array variables. The array names command returns the index names that are defined in the array. If the array variable is not defined, then array names just returns an empty list. It allows easy iteration through an array with a foreach loop:

    foreach index [array names arr pattern] {
        # use arr($index)
    }

The order of the names returned by array names is arbitrary. It is essentially determined by the hash table implementation of the array. You can limit what names are returned by specifying a pattern that matches indices. The pattern is the kind supported by the string match command, which is described on page 48.

It is also possible to iterate through the elements of an array one at a time using the search-related commands listed in Table 8-1. The ordering is also random, and I find the foreach over the results of array names much more convenient. If your array has an extremely large number of elements, or if you need to manage an iteration over a long period of time, then the array search operations might be more appropriate. Frankly, I never use them. Table 8-1 summarizes the array command:

Table 8-1. The array command.

array exists arr
    Returns 1 if arr is an array variable.

array get arr ?pattern?
    Returns a list that alternates between an index and the corresponding array value. pattern selects matching indices. If not specified, all indices and values are returned.

array names arr ?pattern?
    Returns the list of all indices defined for arr, or those that match the string match pattern.

array set arr list
    Initializes the array arr from list, which has the same form as the list returned by array get.

array size arr
    Returns the number of indices defined for arr.

array startsearch arr
    Returns a search token for a search through arr.

array nextelement arr id
    Returns the index of the next element in arr in the search identified by the token id. Returns an empty string if no more elements remain in the search.

array anymore arr id
    Returns 1 if more elements remain in the search.

array donesearch arr id
    Ends the search identified by id.
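Because array names returns indices in arbitrary order, a common idiom is to sort them before looping. The array price and its contents are invented for this sketch:

```tcl
array set price {apple 3 banana 1 cherry 7 avocado 4}

# Select the indices that begin with "a" and visit them in sorted order.
set report {}
foreach name [lsort [array names price a*]] {
    lappend report "$name=$price($name)"
}
puts $report
```

This prints apple=3 avocado=4. Without lsort the two entries could appear in either order.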

Converting Between Arrays and Lists

The array get and array set operations are used to convert between an array and a list. The list returned by array get has an even number of elements. The first element is an index, and the next is the corresponding array value. The list elements continue to alternate between index and value. The list argument to array set must have the same structure.

    array set fruit {
        best kiwi
        worst peach
        ok banana
    }
    array get fruit
    => ok banana best kiwi worst peach

Another way to loop through the contents of an array is to use array get and the two-variable form of the foreach command.

    foreach {key value} [array get fruit] {
        # key is ok, best, or worst
        # value is some fruit
    }
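Since array set accepts exactly what array get returns, the two commands together copy an array. A short sketch using the fruit array from the text and a hypothetical second array named copy:

```tcl
array set fruit {best kiwi worst peach ok banana}

# Copy the whole array in one step.
array set copy [array get fruit]

# The copy is independent of the original.
set fruit(best) mango
puts $copy(best)
puts [array size copy]
```

This prints kiwi and then 3, because changing fruit after the copy does not affect copy.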

Passing Arrays by Name

The upvar command works on arrays. You can pass an array name to a procedure and use the upvar command to get an indirect reference to the array variable in the caller's scope. This is illustrated in Example 8-4, which inverts an array. As with array names, you can specify a pattern to array get to limit what part of the array is returned. This example uses upvar because the array names are passed into the ArrayInvert procedure. The inverse array does not need to exist before you call ArrayInvert.

Example 8-4 ArrayInvert inverts an array.

    proc ArrayInvert {arrName inverseName {pattern *}} {
        upvar $arrName array $inverseName inverse
        foreach {index value} [array get array $pattern] {
            set inverse($value) $index
        }
    }
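A usage sketch for ArrayInvert follows. The procedure is repeated so that the fragment runs on its own; the array names fruit and rank are chosen for illustration:

```tcl
proc ArrayInvert {arrName inverseName {pattern *}} {
    upvar $arrName array $inverseName inverse
    foreach {index value} [array get array $pattern] {
        set inverse($value) $index
    }
}

array set fruit {best kiwi worst peach ok banana}
# rank does not exist yet; ArrayInvert creates it in this scope.
ArrayInvert fruit rank
puts $rank(kiwi)
puts $rank(banana)
```

This prints best and then ok. If two indices shared the same value, the inverse array would keep only one of them, so the technique suits arrays whose values are unique.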


Building Data Structures with Arrays

This section describes several data structures you can build with Tcl arrays. These examples are presented as procedures that implement access functions to the data structure. Wrapping up your data structures in procedures is good practice. It shields the user of your data structure from the details of its implementation.

Use arrays to collect related variables.

A good use for arrays is to collect together a set of related variables for a module, much as one would use a record in other languages. By collecting these together in an array that has the same name as the module, name conflicts between different modules are avoided. Also, in each of the module's procedures, a single global statement will suffice to make all the state variables visible. You can also use upvar to manage a collection of arrays, as shown in Example 8-8 on page 95.

Simple Records
Suppose we have a database of information about people. One approach uses a different array for each class of information. The name of the person is the index into each array:

Example 8-5 Using arrays for records, version 1.

    proc Emp_AddRecord {id name manager phone} {
        global employeeID employeeManager \
            employeePhone employeeName
        set employeeID($name) $id
        set employeeManager($name) $manager
        set employeePhone($name) $phone
        set employeeName($id) $name
    }
    proc Emp_Manager {name} {
        global employeeManager
        return $employeeManager($name)
    }

Simple procedures are defined to return fields of the record, which hides the implementation so that you can change it more easily. The employeeName array provides a secondary key. It maps from the employee ID to the name so that the other information can be obtained if you have an ID instead of a name. Another way to implement the same little database is to use a single array with more complex indices:

Example 8-6 Using arrays for records, version 2.

    proc Emp_AddRecord {id name manager phone} {
        global employee
        set employee(id,$name) $id
        set employee(manager,$name) $manager
        set employee(phone,$name) $phone
        set employee(name,$id) $name
    }
    proc Emp_Manager {name} {
        global employee
        return $employee(manager,$name)
    }

The difference between these two approaches is partly a matter of taste. Using a single array can be more convenient because there are fewer variables to manage. In any case, you should hide the implementation in a small set of procedures.

A Stack
A stack can be implemented with either a list or an array. If you use a list, then the push and pop operations have a runtime cost that is proportional to the size of the stack. If the stack has a few elements this is fine. If there are a lot of items in a stack, you may wish to use arrays instead.

Example 8-7 Using a list to implement a stack.

    proc Push { stack value } {
        upvar $stack list
        lappend list $value
    }
    proc Pop { stack } {
        upvar $stack list
        set value [lindex $list end]
        set list [lrange $list 0 [expr [llength $list]-2]]
        return $value
    }

In these examples, the name of the stack is a parameter, and upvar is used to convert that into the data used for the stack. The variable is a list in Example 8-7 and an array in Example 8-8. The user of the stack module does not have to know.

The array implementation of a stack uses one array element to record the number of items in the stack. The other elements of the array have the stack values. The Push and Pop procedures both guard against a nonexistent array with the info exists command. When the first assignment to S(top) is done by Push, the array variable is created in the caller's scope. The example uses array indices in two ways. The top index records the depth of the stack. The other indices are numbers, so the construct $S($S(top)) is used to reference the top of the stack.

Example 8-8 Using an array to implement a stack.

    proc Push { stack value } {
        upvar $stack S
        if {![info exists S(top)]} {
            set S(top) 0
        }
        set S($S(top)) $value
        incr S(top)
    }
    proc Pop { stack } {
        upvar $stack S
        if {![info exists S(top)]} {
            return {}
        }
        if {$S(top) == 0} {
            return {}
        } else {
            incr S(top) -1
            set x $S($S(top))
            unset S($S(top))
            return $x
        }
    }
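The following sketch exercises the array-based stack. Push and Pop are repeated in a slightly condensed form so that the fragment runs on its own; the stack name work is invented for illustration:

```tcl
proc Push { stack value } {
    upvar $stack S
    if {![info exists S(top)]} { set S(top) 0 }
    set S($S(top)) $value
    incr S(top)
}
proc Pop { stack } {
    upvar $stack S
    # An unknown or empty stack yields the empty string.
    if {![info exists S(top)] || $S(top) == 0} { return {} }
    incr S(top) -1
    set x $S($S(top))
    unset S($S(top))
    return $x
}

Push work a
Push work b
set first  [Pop work]
set second [Pop work]
set third  [Pop work]
puts "$first $second ($third)"
```

The values come back in last-in, first-out order, and the third Pop returns the empty string because the stack is empty. The script prints b a ().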

A List of Arrays
Suppose you have many arrays, each of which stores some data, and you want to maintain an overall ordering among the data sets. One approach is to keep a Tcl list with the name of each array in order. Example 8-9 defines RecordAppend to add an array to the list, and an iterator function, RecordIterate, that applies a script to each array in order. The iterator uses upvar to make data an alias for the current array. The script is executed with eval, which is described in detail in Chapter 10. The Tcl commands in script can reference the arrays with the name data:

Example 8-9 A list of arrays.

    proc RecordAppend {listName arrayName} {
        upvar $listName list
        lappend list $arrayName
    }
    proc RecordIterate {listName script} {
        upvar $listName list
        foreach arrayName $list {
            upvar #0 $arrayName data
            eval $script
        }
    }

Another way to implement this list-of-records structure is to keep references to the arrays that come before and after each record. Example 8-10 shows the insert function and the iterator function when using this approach. Once again, upvar is used to set up data as an alias for the current array in the iterator. In this case, the loop is terminated by testing for the existence of the next array. It is perfectly all right to make an alias with upvar to a nonexistent variable. It is also all right to change the target of the upvar alias. One detail that is missing from the example is the initialization of the very first record so that its next element is the empty string:

Example 8-10 A list of arrays.

    proc RecordInsert {recName afterThis} {
        upvar $recName record $afterThis after
        set record(next) $after(next)
        set after(next) $recName
    }
    proc RecordIterate {firstRecord body} {
        upvar #0 $firstRecord data
        while {[info exists data]} {
            eval $body
            upvar #0 $data(next) data
        }
    }

A Simple In-Memory Database

Suppose you have to manage a lot of records, each of which contain a large chunk of data and one or more key values you use to look up those values. The procedure to add a record is called like this: Db_Insert keylist datablob

The datablob might be a name-value list suitable for passing to array set, or simply a large chunk of text or binary data. One implementation of Db_Insert might just be:

foreach key $keylist {
    lappend Db($key) $datablob
}

The problem with this approach is that it duplicates the data chunks under each key. A better approach is to use two arrays. One stores all the data chunks under a simple ID that is generated automatically. The other array stores the association between the keys and the data chunks. Example 8-11, which uses the namespace syntax described in Chapter 14, illustrates this approach. The example also shows how you can easily dump data structures by writing array set commands to a file, and then load them later with a source command:

Example 8-11 A simple in-memory database.

namespace eval db {
    variable data      ;# Array of data blobs
    variable uid 0     ;# Index into data
    variable index     ;# Cross references into data
}
proc db::insert {keylist datablob} {
    variable data
    variable uid
    variable index
    set data([incr uid]) $datablob
    foreach key $keylist {
        lappend index($key) $uid
    }
}
proc db::get {key} {
    variable data
    variable index
    set result {}
    if {![info exists index($key)]} {
        return {}
    }
    foreach uid $index($key) {
        lappend result $data($uid)
    }
    return $result
}
proc db::save {filename} {
    variable uid
    set out [open $filename w]
    puts $out [list namespace eval db \
        [list variable uid $uid]]
    puts $out [list array set db::data [array get db::data]]
    puts $out [list array set db::index [array get db::index]]
    close $out
}
proc db::load {filename} {
    source $filename
}
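The dump-and-load technique can be tried on its own with a single array. The following is a minimal sketch, not part of the database example: the color array and the file name are hypothetical, and the current directory is assumed to be writable.

```tcl
# Hypothetical array and file name, for illustration only.
array set color {red ff0000 green 00ff00}
set filename [file join [pwd] color_dump_[pid].tcl]

# Dump: write a command that recreates the array when evaluated.
set out [open $filename w]
puts $out [list array set color [array get color]]
close $out

# Load: discard the array, then source the file to rebuild it.
unset color
source $filename
file delete $filename
puts $color(green)
```

Building the command with list ensures that values containing spaces or special characters survive the round trip.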


Chapter 9. Working with Files and Programs

This chapter describes how to run programs, examine the file system, and access environment variables through the env array. Tcl commands described are: exec, file, open, close, read, puts, gets, flush, seek, tell, glob, pwd, cd, exit, pid, and registry.

This chapter describes how to run programs and access the file system from Tcl. These commands were designed for UNIX. In Tcl 7.5 they were implemented in the Tcl ports to Windows and Macintosh. There are facilities for naming files and manipulating file names in a platform-independent way, so you can write scripts that are portable across systems. These capabilities enable your Tcl script to be a general-purpose glue that assembles other programs into a tool that is customized for your needs.


Running Programs with exec

The exec command runs programs from your Tcl script.[*] For example:

set d [exec date]

[*] Unlike the exec command of other UNIX shells, the Tcl exec does not replace the current process with the new one. Instead, the Tcl library forks first and executes the program as a child process.

The standard output of the program is returned as the value of the exec command. However, if the program writes to its standard error channel or exits with a nonzero status code, then exec raises an error. If you do not care about the exit status, or you use a program that insists on writing to standard error, then you can use catch to mask the errors:

catch {exec program arg arg} result

The exec command supports a full set of I/O redirection and pipeline syntax. Each process normally has three I/O channels associated with it: standard input, standard output, and standard error. With I/O redirection, you can divert these I/O channels to files or to I/O channels you have opened with the Tcl open command. A pipeline is a chain of processes that have the standard output of one command hooked up to the standard input of the next command in the pipeline. Any number of programs can be linked together into a pipeline.

Example 9-1 Using exec on a process pipeline.

set n [exec sort < /etc/passwd | uniq | wc -l 2> /dev/null]

Example 9-1 uses exec to run three programs in a pipeline. The first program is sort, which takes its input from the file /etc/passwd. The output of sort is piped into uniq, which suppresses duplicate lines. The output of uniq is piped into wc, which counts the lines. The error output of the command is diverted to the null device to suppress any error messages. Table 9-1 provides a summary of the syntax understood by the exec command.

Table 9-1. Summary of the exec syntax for I/O redirection.

-keepnewline     (First argument.) Do not discard trailing newline from the result.
|                Pipes standard output from one process into another.
|&               Pipes both standard output and standard error output.
< fileName       Takes input from the named file.
<@ fileId        Takes input from the I/O channel identified by fileId.
<< value         Takes input from the given value.
> fileName       Overwrites fileName with standard output.
2> fileName      Overwrites fileName with standard error output.
>& fileName      Overwrites fileName with both standard error and standard out.
>> fileName      Appends standard output to the named file.
2>> fileName     Appends standard error to the named file.
>>& fileName     Appends both standard error and standard output to the named file.
>@ fileId        Directs standard output to the I/O channel identified by fileId.
2>@ fileId       Directs standard error to the I/O channel identified by fileId.
>&@ fileId       Directs both standard error and standard output to the I/O channel.
&                As the last argument, indicates pipeline should run in background.

A trailing & causes the program to run in the background. In this case, the process identifier is returned by the exec command. Otherwise, the exec command blocks during execution of the program, and the standard output of the program is the return value of exec. The trailing newline in the output is trimmed off, unless you specify -keepnewline as the first argument to exec. If you look closely at the I/O redirection syntax, you'll see that it is built up from a few basic building blocks. The basic idea is that | stands for pipeline, > for output, and < for input. The standard error is joined to the standard output by &. Standard error is diverted separately by using 2>. You can use your own I/O channels by using @.
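The following sketch exercises two of these building blocks and shows one way to recover an exit status after catch. It assumes a UNIX-like system with cat and sh on the search path. It also relies on the errorCode global variable, which is not covered in this section: after a child process exits with a nonzero status, its value is a list whose first element is CHILDSTATUS and whose third element is the exit code.

```tcl
# Assumes a UNIX-like system with cat and sh on the search path.
# << feeds a string to standard input; -keepnewline preserves the
# newline that exec would otherwise trim from the result.
set trimmed [exec cat << "hello\n"]
set kept    [exec -keepnewline cat << "hello\n"]
puts [list [string length $trimmed] [string length $kept]]

# A nonzero exit status makes exec raise an error. With catch, the
# error message lands in result and the status is left in errorCode.
if {[catch {exec sh -c {exit 3}} result]} {
    set status [lindex $::errorCode 2]
    puts "child exited with status $status"
}
```

On a system where these programs are available, the two lengths differ by one because only the first call trims the trailing newline.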

The auto_noexec Variable

The Tcl shell programs are set up during interactive use to attempt to execute unknown Tcl commands as programs. For example, you can get a directory listing by typing:

ls

instead of:

exec ls

This is handy if you are using the Tcl interpreter as a general shell. It can also cause unexpected behavior when you are just playing around. To turn this off, define the auto_noexec variable:

set auto_noexec anything

Limitations of exec on Windows

Windows 3.1 has an unfortunate combination of special cases that stem from console-mode programs, 16-bit programs, and 32-bit programs. In addition, pipes are really just simulated by writing output from one process to a temporary file and then having the next process read from that file. If exec or a process pipeline fails, it is because of a fundamental limitation of Windows. The good news is that Windows 95 and Windows NT cleaned up most of the problems with exec. Windows NT 4.0 is the most robust.

Tcl 8.0p2 was the last release to officially support Windows 3.1. That release includes Tcl1680.dll, which is necessary to work with the win32s subsystem. If you copy that file into the same directory as the other Tcl DLLs, you may be able to use later releases of Tcl on Windows 3.1. However, there is no guarantee this trick will continue to work.

AppleScript on Macintosh

The exec command is not provided on the Macintosh. Tcl ships with an AppleScript extension that lets you control other Macintosh applications. You can find documentation in the AppleScript.html that goes with the distribution. You must use package require to load the AppleScript command:

package require Tclapplescript
AppleScript junk
=> bad option "junk": must be compile, decompile, delete, execute, info, load, run, or store.


The file Command

The file command provides several ways to check the status of files in the file system. For example, you can find out if a file exists, what type of file it is, and other file attributes. There are facilities for manipulating files in a platform-independent manner. Table 9-2 provides a summary of the various forms of the file command. They are described in more detail later. Note that the split, join, and pathtype operations were added in Tcl 7.5. The copy, delete, mkdir, and rename operations were added in Tcl 7.6. The attributes operation was added in Tcl 8.0.

Table 9-2. The file command options.

file atime name                              Returns access time as a decimal string.
file attributes name ?option? ?value? ...    Queries or sets file attributes. (Tcl 8.0)
file copy ?-force? source destination        Copies file source to file destination. The source and destination can be directories. (Tcl 7.6)
file delete ?-force? name                    Deletes the named file. (Tcl 7.6)
file dirname name                            Returns parent directory of file name.
file executable name                         Returns 1 if name has execute permission, else 0.
file exists name                             Returns 1 if name exists, else 0.
file extension name                          Returns the part of name from the last dot (i.e., .) to the end. The dot is included in the return value.
file isdirectory name                        Returns 1 if name is a directory, else 0.
file isfile name                             Returns 1 if name is not a directory, symbolic link, or device, else 0.
file join path path...                       Joins pathname components into a new pathname. (Tcl 7.5)
file lstat name var                          Places attributes of the link name into var.
file mkdir name                              Creates directory name. (Tcl 7.6)
file mtime name                              Returns modify time of name as a decimal string.
file nativename name                         Returns the platform-native version of name. (Tcl 8.0)
file owned name                              Returns 1 if current user owns the file name, else 0.
file pathtype name                           Returns relative, absolute, or volumerelative. (Tcl 7.5)
file readable name                           Returns 1 if name has read permission, else 0.
file readlink name                           Returns the contents of the symbolic link name.
file rename ?-force? old new                 Changes the name of old to new. (Tcl 7.6)
file rootname name                           Returns all but the extension of name (i.e., up to but not including the last . in name).
file size name                               Returns the number of bytes in name.
file split name                              Splits name into its pathname components. (Tcl 7.5)
file stat name var                           Places attributes of name into array var. The elements defined for var are listed in Table 9-3.
file tail name                               Returns the last pathname component of name.
file type name                               Returns type identifier, which is one of: file, directory, characterSpecial, blockSpecial, fifo, link, or socket.
file writable name                           Returns 1 if name has write permission, else 0.


Cross-Platform File Naming

Files are named differently on UNIX, Windows, and Macintosh. UNIX separates file name components with a forward slash (/), Macintosh separates components with a colon (:), and Windows separates components with a backslash (\). In addition, the way that absolute and relative names are distinguished is different. For example, these are absolute pathnames for the Tcl script library (i.e., $tcl_library) on Macintosh, Windows, and UNIX, respectively:

Disk:System Folder:Extensions:Tool Command Language:tcl7.6
c:\Program Files\Tcl\lib\Tcl7.6
/usr/local/tcl/lib/tcl7.6

The good news is that Tcl provides operations that let you deal with file pathnames in a platform-independent manner. The file operations described in this chapter allow either native format or the UNIX naming convention. The backslash used in Windows pathnames is especially awkward because the backslash is special to Tcl. Happily, you can use forward slashes instead:

c:/Program Files/Tcl/lib/Tcl7.6

There are some ambiguous cases that can be specified only with native pathnames. On my Macintosh, Tcl and Tk are installed in a directory that has a slash in it. You can name it only with the native Macintosh name:

Disk:Applications:Tcl/Tk 4.2

Another construct to watch out for is a leading // in a file name. This is the Windows syntax for network names that reference files on other computers. You can avoid accidentally constructing a network name by using the file join command described next. Of course, you can use network names to access remote files.

If you must communicate with external programs, you may need to construct a file name in the native syntax for the current platform. You can construct these names with file join described later. You can also convert a UNIX-like name to a native name with file nativename.

Several of the file operations operate on pathnames as opposed to returning information about the file itself. You can use the dirname, extension, join, pathtype, rootname, split, and tail operations on any string; there is no requirement that the pathnames refer to an existing file.

Building up Pathnames: file join

You can get into trouble if you try to construct file names by simply joining components with a slash. If part of the name is in native format, joining things with slashes will result in incorrect pathnames on Macintosh and Windows. The same problem arises when you accept user input. The user is likely to provide file names in native format. For example, this construct will not create a valid pathname on the Macintosh because $tcl_library is in native format:

set file $tcl_library/init.tcl

Use file join to construct file names.

The platform-independent way to construct file names is with file join. The following command returns the name of the init.tcl file in native format:

set file [file join $tcl_library init.tcl]

The file join operation can join any number of pathname components. In addition, it has the feature that an absolute pathname overrides any previous components. For example (on UNIX), /b/c is an absolute pathname, so it overrides any paths that come before it in the arguments to file join:

file join a b/c d
=> a/b/c/d
file join a /b/c d
=> /b/c/d

On Macintosh, a relative pathname starts with a colon, and an absolute pathname does not. To specify an absolute path, you put a trailing colon on the first component so that it is interpreted as a volume specifier. These relative components are joined into a relative pathname:

file join a :b:c d
=> :a:b:c:d

In the next case, b:c is an absolute pathname with b: as the volume specifier. The absolute name overrides the previous relative name:

file join a b:c d
=> b:c:d

The file join operation converts UNIX-style pathnames to native format. For example, on Macintosh you get this:

file join /usr/local/lib
=> usr:local:lib
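The override rule is easy to check from a script. The following is a small sketch with hypothetical component names. It uses pwd only as a convenient source of an absolute pathname in whatever form the current platform uses.

```tcl
# Join components without hard-coding a separator. The directory and
# file names here are hypothetical.
set path [file join lib tcl init.tcl]
puts [file split $path]

# An absolute component discards everything that precedes it.
set abs [file join ignored [pwd] init.tcl]
puts [file pathtype $path]
puts [file pathtype $abs]
```

Because pwd returns an absolute pathname, the component named ignored does not appear in the second result.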

Chopping Pathnames: split, dirname, tail

The file split command divides a pathname into components. It is the inverse of file join. The split operation detects automatically if the input is in native or UNIX format. The results of file split may contain some syntax to help resolve ambiguous cases when the results are passed back to file join. For example, on Macintosh a UNIX-style pathname is split on slash separators. The Macintosh syntax for a volume specifier (Disk:) is returned on the leading component:

file split "/Disk/System Folder/Extensions"
=> Disk: {System Folder} Extensions

A common reason to split up pathnames is to divide a pathname into the directory part and the file part. This task is handled directly by the dirname and tail operations. The dirname operation returns the parent directory of a pathname, while tail returns the trailing component of the pathname:

file dirname /a/b/c
=> /a/b
file tail /a/b/c
=> c

For a pathname with a single component, the dirname option returns "." on UNIX and Windows, or ":" on Macintosh. This is the name of the current directory.

The extension and rootname options are also complementary. The extension option returns everything from the last period in the name to the end (i.e., the file suffix including the period). The rootname option returns everything up to, but not including, the last period in the pathname:

file rootname /a/b.c
=> /a/b
file extension /a/b.c
=> .c
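The following sketch applies the four operations to one hypothetical pathname that contains two periods, which makes it clear that only the last period counts. The results in the comments are what you would see on UNIX; the pathname does not need to refer to an existing file.

```tcl
# A hypothetical pathname in the UNIX style.
set path src/lib/parser.test.tcl

puts [file dirname   $path]   ;# src/lib
puts [file tail      $path]   ;# parser.test.tcl
puts [file extension $path]   ;# .tcl
puts [file rootname  $path]   ;# src/lib/parser.test

# The complementary pairs recombine into the original name.
set rejoined [file join [file dirname $path] [file tail $path]]
set reglued  [file rootname $path][file extension $path]
```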


Manipulating Files and Directories

Tcl 7.6 added file operations to copy files, delete files, rename files, and create directories. In earlier versions it was necessary to exec other programs to do these things, except on Macintosh, where cp, rm, mv, mkdir, and rmdir were built in. These commands are no longer supported on the Macintosh. Your scripts should use the file command operations described below to manipulate files in a platform-independent way.

File name patterns are not directly supported by the file operations. Instead, you can use the glob command described on page 115 to get a list of file names that match a pattern.

Copying Files
The file copy operation copies files and directories. The following example copies file1 to file2. If file2 already exists, the operation raises an error unless the -force option is specified:

file copy ?-force? file1 file2

Several files can be copied into a destination directory. The names of the source files are preserved. The -force option indicates that files under directory can be replaced:

file copy ?-force? file1 file2 ... directory

Directories can be recursively copied. The -force option indicates that files under dir2 can be replaced:

file copy ?-force? dir1 dir2
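The effect of -force can be seen with two scratch files. This is a minimal sketch: the directory and file names are hypothetical, and the current directory is assumed to be writable.

```tcl
# Work in a scratch directory under the current directory.
set dir [file join [pwd] copydemo_[pid]]
file mkdir $dir
set src [file join $dir a.txt]
set dst [file join $dir b.txt]

foreach name [list $src $dst] {
    set out [open $name w]
    puts $out "contents of [file tail $name]"
    close $out
}

# Without -force, copying onto an existing file raises an error.
set failed [catch {file copy $src $dst} msg]
puts "plain copy failed: $failed"

# With -force, the destination is replaced.
file copy -force $src $dst
set in [open $dst]
set line [gets $in]
close $in
puts $line

file delete -force $dir
```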

Creating Directories

The file mkdir operation creates one or more directories:

file mkdir dir dir ...

It is not an error if the directory already exists. Furthermore, intermediate directories are created if needed. This means that you can always make sure a directory exists with a single mkdir operation. Suppose /tmp has no subdirectories at all. The following command creates /tmp/sub1 and /tmp/sub1/sub2:

file mkdir /tmp/sub1/sub2

The -force option is not understood by file mkdir, so the following command accidentally creates a folder named -force, as well as one named oops:

file mkdir -force oops

Deleting Files
The file delete operation deletes files and directories. It is not an error if the files do not exist. A non-empty directory is not deleted unless the -force option is specified, in which case it is recursively deleted:

file delete ?-force? name name ...

To delete a file or directory named -force, you must specify a nonexistent file before the -force to prevent it from being interpreted as a flag (-force -force won't work):

file delete xyzzy -force
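The following sketch combines file mkdir and file delete to show the behavior described for intermediate directories and non-empty directories. The directory names are hypothetical, and the current directory is assumed to be writable.

```tcl
set top [file join [pwd] mkdirdemo_[pid]]

# Intermediate directories are created as needed, and repeating the
# command on an existing directory is not an error.
file mkdir [file join $top sub1 sub2]
file mkdir [file join $top sub1 sub2]
set made [file isdirectory [file join $top sub1 sub2]]

# A non-empty directory is removed only when -force is given.
set refused [catch {file delete $top} msg]
file delete -force $top
puts "refused: $refused, still exists: [file exists $top]"
```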

Renaming Files and Directories

The file rename operation changes a file's name from old to new. The -force option causes new to be replaced if it already exists.

file rename ?-force? old new

Using file rename is the best way to update an existing file. First, generate the new version of the file in a temporary file. Then, use file rename to replace the old version with the new version. This ensures that any other programs that access the file will not see the new version until it is complete.
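The update technique can be wrapped in a small procedure. The following is a sketch under stated assumptions: UpdateFile and the .tmp suffix are hypothetical names, the temporary file is placed next to the target so that the rename stays within one directory, and the current directory is assumed to be writable.

```tcl
# Replace the contents of a file by writing a temporary file
# alongside it and renaming the temporary file into place.
proc UpdateFile {path newContents} {
    set tmp $path.tmp
    set out [open $tmp w]
    puts -nonewline $out $newContents
    close $out
    # -force lets the rename replace an existing file.
    file rename -force $tmp $path
}

set path [file join [pwd] settings_[pid].txt]
UpdateFile $path "version 1\n"
UpdateFile $path "version 2\n"

set in [open $path]
set text [read $in]
close $in
file delete $path
puts -nonewline $text
```

If the script fails while writing the temporary file, the original file is left untouched.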


File Attributes
There are several file operations that return specific file attributes: atime, executable, exists, isdirectory, isfile, mtime, owned, readable, readlink, size, and type. Refer to Table 9-2 on page 102 for their function. The following command uses file mtime to compare the modify times of two files. If you have ever resorted to piping the results of ls -l into awk in order to derive this information in other shell scripts, you will appreciate this example:

Example 9-2 Comparing file modify times.

proc newer { file1 file2 } {
    if {![file exists $file2]} {
        return 1
    } else {
        # Assume file1 exists
        expr {[file mtime $file1] > [file mtime $file2]}
    }
}

The stat and lstat operations return a collection of file attributes. They take a third argument that is the name of an array variable, and they initialize that array with elements that contain the file attributes. If the file is a symbolic link, then the lstat operation returns information about the link itself and the stat operation returns information about the target of the link. The array elements are listed in Table 9-3. All the element values are decimal strings, except for type, which can have the values returned by the type option. The element names are based on the UNIX stat system call. Use the file attributes command described later to get other platform-specific attributes:

Table 9-3. Array elements defined by file stat.

atime    The last access time, in seconds.
ctime    The last change time (not the create time), in seconds.
dev      The device identifier, an integer.
gid      The group owner, an integer.
ino      The file number (i.e., inode number), an integer.
mode     The permission bits.
mtime    The last modify time, in seconds.
nlink    The number of links, or directory references, to the file.
size     The number of bytes in the file.
type     file, directory, characterSpecial, blockSpecial, fifo, link, or socket.
uid      The owner's user ID, an integer.
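A short sketch shows the array that file stat fills in. The file name and the array name st are hypothetical, and the current directory is assumed to be writable.

```tcl
# Create a small scratch file and examine it.
set path [file join [pwd] statdemo_[pid].txt]
set out [open $path w]
puts -nonewline $out "12345"
close $out

file stat $path st
puts "type: $st(type)"
puts "size: $st(size) bytes"

# The array values agree with the single-attribute operations.
set sameSize  [expr {$st(size) == [file size $path]}]
set sameMtime [expr {$st(mtime) == [file mtime $path]}]
file delete $path
```

Calling file stat once is convenient when you need several attributes of the same file.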

Example 9-3 uses the device (dev) and inode (ino) attributes of a file to determine whether two pathnames reference the same file. The attributes are UNIX specific; they are not well defined on Windows and Macintosh.

Example 9-3 Determining whether pathnames reference the same file.

proc fileeq { path1 path2 } {
    file stat $path1 stat1
    file stat $path2 stat2
    expr {$stat1(ino) == $stat2(ino) && \
        $stat1(dev) == $stat2(dev)}
}

The file attributes operation was added in Tcl 8.0 to provide access to platform-specific attributes. The attributes operation lets you set and query attributes. The interface uses option-value pairs. With no options, all the current values are returned.

file attributes book.doc
=> -creator FRAM -hidden 0 -readonly 0 -type MAKR

These Macintosh attributes are explained in Table 9-4. The four-character type codes used on Macintosh are illustrated on page 516. With a single option, only that value is returned:

file attributes book.doc -readonly
=> 0

The attributes are modified by specifying one or more option-value pairs. Setting attributes can raise an error if you do not have the right permissions:

file attributes book.doc -readonly 1 -hidden 0

Table 9-4. Platform-specific file attributes.

-permissions mode    File permission bits. mode is a number with bits defined by the chmod system call. (UNIX)
-group ID            The group owner of the file. (UNIX)
-owner ID            The owner of the file. (UNIX)
-archive bool        The archive bit, which is set by backup programs. (Windows)
-hidden bool         If set, then the file does not appear in listings. (Windows, Macintosh)
-readonly bool       If set, then you cannot write the file. (Windows, Macintosh)
-system bool         If set, then you cannot remove the file. (Windows)
-creator type        type is 4-character code of creating application. (Macintosh)
-type type           type is 4-character type code. (Macintosh)


Input/Output Command Summary

The following sections describe how to open, read, and write files. The basic model is that you open a file, read or write it, then close the file. Network sockets also use the commands described here. Socket programming is discussed in Chapter 17, and more advanced event-driven I/O is described in Chapter 16. Table 9-5 lists the basic commands associated with file I/O:

Table 9-5. Tcl commands used for file access.

open what ?access? ?permissions?       Returns channel ID for a file or pipeline.
puts ?-nonewline? ?channel? string     Writes a string.
gets channel ?varname?                 Reads a line.
read channel ?numBytes?                Reads numBytes bytes, or all data.
read -nonewline channel                Reads all bytes and discards the last \n.
tell channel                           Returns the seek offset.
seek channel offset ?origin?           Sets the seek offset. origin is one of start, current, or end.
eof channel                            Queries end-of-file status.
flush channel                          Writes buffers of a channel.
close channel                          Closes an I/O channel.


Opening Files for I/O

The open command sets up an I/O channel to either a file or a pipeline of processes. The return value of open is an identifier for the I/O channel. Store the result of open in a variable and use the variable as you used the stdout, stdin, and stderr identifiers in the examples so far. The basic syntax is:

open what ?access? ?permissions?

The what argument is either a file name or a pipeline specification similar to that used by the exec command. The access argument can take two forms, either a short character sequence that is compatible with the fopen library routine, or a list of POSIX access flags. Table 9-6 summarizes the first form, while Table 9-7 summarizes the POSIX flags. If access is not specified, it defaults to read.

Example 9-4 Opening a file for writing.

set fileId [open /tmp/foo w 0600]
puts $fileId "Hello, foo!"
close $fileId

Table 9-6. Summary of the open access arguments.

r     Opens for reading. The file must exist.
r+    Opens for reading and writing. The file must exist.
w     Opens for writing. Truncate if it exists. Create if it does not exist.
w+    Opens for reading and writing. Truncate or create.
a     Opens for writing. Data is appended to the file.
a+    Opens for reading and writing. Data is appended.

Table 9-7. Summary of POSIX flags for the access argument.

RDONLY      Opens for reading.
WRONLY      Opens for writing.
RDWR        Opens for reading and writing.
APPEND      Opens for append.
CREAT       Creates the file if it does not exist.
EXCL        If CREAT is also specified, then the file cannot already exist.
NOCTTY      Prevents terminal devices from becoming the controlling terminal.
NONBLOCK    Does not block during the open.
TRUNC       Truncates the file if it exists.

The permissions argument is a value used for the permission bits on a newly created file. UNIX uses three bits each for the owner, group, and everyone else. The bits specify read, write, and execute permission. These bits are usually specified with an octal number, which has a leading zero, so that there is one octal digit for each set of bits. The default permission bits are 0666, which grant read/write access to everybody. Example 9-4 specifies 0600 so that the file is readable and writable only by the owner. 0775 would grant read, write, and execute permissions to the owner and group, and read and execute permissions to everyone else. You can set other special properties with additional high-order bits. Consult the UNIX manual page on the chmod command for more details.

The following example illustrates how to use a list of POSIX access flags to open a file for reading and writing, creating it if needed, and not truncating it. This is something you cannot do with the simpler form of the access argument:

set fileId [open /tmp/bar {RDWR CREAT}]

Catch errors from open.

In general, you should check for errors when opening files. The following example illustrates a catch phrase used to open files. Recall that catch returns 1 if it catches an error; otherwise, it returns zero. It treats its second argument as the name of a variable. In the error case, it puts the error message into the variable. In the normal case, it puts the result of the command into the variable:

Example 9-5 A more careful use of open.

if [catch {open /tmp/data r} fileId] {
    puts stderr "Cannot open /tmp/data: $fileId"
} else {
    # Read and process the file, then...
    close $fileId
}
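The POSIX flags and the catch idiom can be combined. The following sketch uses CREAT with EXCL, as described in Table 9-7, so that open fails if the file already exists. The file name is hypothetical, and the current directory is assumed to be writable.

```tcl
# CREAT together with EXCL makes open fail if the file already exists.
set path [file join [pwd] excl_[pid].lock]
file delete $path

set first  [catch {open $path {WRONLY CREAT EXCL}} chan1]
set second [catch {open $path {WRONLY CREAT EXCL}} chan2]

# Close whichever attempts succeeded, then clean up.
if {!$first}  { close $chan1 }
if {!$second} { close $chan2 }
file delete $path
puts "first attempt failed: $first, second attempt failed: $second"
```

Only the first attempt creates the file. In the second attempt, catch returns 1 and chan2 holds the error message instead of a channel identifier.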

Opening a Process Pipeline

You can open a process pipeline by specifying the pipe character, |, as the first character of the first argument. The remainder of the pipeline specification is interpreted just as with the exec command, including input and output redirection. The second argument determines which end of the pipeline open returns. The following example runs the UNIX sort program on the password file, and it uses the split command to separate the output lines into list elements:

Example 9-6 Opening a process pipeline.

set input [open "|sort /etc/passwd" r]
set contents [split [read $input] \n]
close $input

You can open a pipeline for both read and write by specifying the r+ access mode. In this case, you need to worry about buffering. After a puts, the data may still be in a buffer in the Tcl library. Use the flush command to force the data out to the spawned processes before you try to read any output from the pipeline. You can also use the fconfigure command described on page 223 to force line buffering. Remember that read-write pipes will not work at all with Windows 3.1 because pipes are simulated with files.

Event-driven I/O is also very useful with pipes. It means you can do other processing while the pipeline executes, and simply respond when the pipe generates data. This is described in Chapter 16.

Expect
If you are trying to do sophisticated things with an external application, you will find that the Expect extension provides a much more powerful interface than a process pipeline. Expect adds Tcl commands that are used to control interactive applications. It is extremely useful for automating FTP, Telnet, and programs under test. It comes as a Tcl shell named expect, and it is also an extension that you can dynamically load into other Tcl shells. It was created by Don Libes at the National Institute of Standards and Technology (NIST). Expect is described in Exploring Expect (Libes, O'Reilly & Associates, Inc., 1995). You can find the software on the CD and on the web at: http://expect.nist.gov/


Reading and Writing

The standard I/O channels are already open for you. There is a standard input channel, a standard output channel, and a standard error output channel. These channels are identified by stdin, stdout, and stderr, respectively. Other I/O channels are returned by the open command, and by the socket command described on page 228. There may be cases when the standard I/O channels are not available. Windows has no standard error channel. Some UNIX window managers close the standard I/O channels when you start programs from window manager menus. You can also close the standard I/O channels with close.

The puts and gets Commands

The puts command writes a string and a newline to the output channel. There are a couple of details about the puts command that we have not yet used. It takes a -nonewline argument that suppresses the newline character that is normally appended to the output. This is used in the prompt example below. The second feature is that the channel identifier is optional, defaulting to stdout if not specified. Note that you must use flush to force output of a partial line. This is illustrated in Example 9-7.

Example 9-7 Prompting for input.

puts -nonewline "Enter value: "
flush stdout ;# Necessary to get partial line output
set answer [gets stdin]

The gets command reads a line of input, and it has two forms. In the previous example, with just a single argument, gets returns the line read from the specified I/O channel. It discards the trailing newline from the return value. If end of file is reached, an empty string is returned. You must use the eof command to tell the difference between a blank line and end-of-file. eof returns 1 if there is end of file. Given a second varName argument, gets stores the line into a named variable and returns the number of bytes read. It discards the trailing newline, which is not counted. A -1 is returned if the channel has reached the end of file.

Example 9-8 A read loop using gets.

while {[gets $channel line] >= 0} {
    # Process line
}
close $channel
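The difference between a blank line and end of file shows up clearly in the value returned by the two-argument form of gets. The following sketch writes a three-line scratch file with a blank line in the middle and records what gets returns for each line. The file name is hypothetical, and the current directory is assumed to be writable.

```tcl
set path [file join [pwd] lines_[pid].txt]
set out [open $path w]
puts $out "first"
puts $out ""          ;# a blank line in the middle of the file
puts $out "last"
close $out

# The count is 0 for the blank line and -1 at end of file.
set in [open $path]
set counts {}
while {[gets $in line] >= 0} {
    lappend counts $line
}
set lengths {}
foreach l $counts { lappend lengths [string length $l] }
set atEnd [list [gets $in line] [eof $in]]
close $in
file delete $path
puts "$lengths / $atEnd"
```

The loop does not stop at the blank line because gets returns 0 there, not -1.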

The read Command

The read command reads blocks of data, and this capability is often more efficient. There are two forms for read: You can specify the -nonewline argument or the numBytes argument, but not both. Without numBytes, the whole file (or what is left in the I/O channel) is read and returned. The -nonewline argument causes the trailing newline to be discarded. Given a byte count argument, read returns that amount, or less if there is not enough data in the channel. The trailing newline is not discarded in this case.

Example 9-9 A read loop using read and split.

foreach line [split [read $channel] \n] {
    # Process line
}
close $channel

For moderate-sized files, it is about 10 percent faster to loop over the lines in a file using the read loop in Example 9-9. In this case, read returns the whole file, and split chops the file into list elements, one for each line. For small files (less than 1K) it doesn't really matter. For large files (megabytes) you might induce paging with this approach.
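For large files, the numBytes form of read lets you process the data in pieces. The following is a minimal sketch with a deliberately tiny chunk size so that the behavior is easy to see. The file name and the chunk size of four are hypothetical choices, and the current directory is assumed to be writable.

```tcl
set path [file join [pwd] chunks_[pid].txt]
set out [open $path w]
puts -nonewline $out "abcdefghij"
close $out

# Read at most four characters at a time until end of file.
set in [open $path]
set chunks {}
while {![eof $in]} {
    set chunk [read $in 4]
    if {[string length $chunk] > 0} {
        lappend chunks $chunk
    }
}
close $in
file delete $path
puts $chunks
```

The last chunk is shorter than the others because read returns less than the requested amount when the channel runs out of data.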

Platform-Specific End of Line Characters

Tcl automatically detects different end of line conventions. On UNIX, text lines are ended with a newline character (\n). On Macintosh, they are terminated with a carriage return (\r). On Windows, they are terminated with a carriage return, newline sequence (\r\n). Tcl accepts any of these, and the line terminator can even change within a file. All these different conventions are converted to the UNIX style so that once read, text lines are always terminated with a newline character (\n). Both the read and gets commands do this conversion. During output, text lines are generated in the platform-native format. The automatic handling of line formats means that it is easy to convert a file to native format. You just need to read it in and write it out: puts -nonewline $out [read $in]

To suppress conversions, use the fconfigure command, which is described in more detail on page 223. Example 9-10 demonstrates a File_Copy procedure that translates files to native format. It is complicated because it handles directories:

Example 9-10 Copy a file and translate to native format.

    proc File_Copy {src dest} {
        if [file isdirectory $src] {
            file mkdir $dest
            foreach f [glob -nocomplain [file join $src *]] {
                File_Copy $f [file join $dest [file tail $f]]
            }
            return
        }
        if [file isdirectory $dest] {
            set dest [file join $dest [file tail $src]]
        }
        set in [open $src]
        set out [open $dest w]
        puts -nonewline $out [read $in]
        close $out ; close $in
    }
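For contrast, here is a minimal sketch of a copy that suppresses end of line conversion with fconfigure -translation binary. The File_CopyRaw procedure and the scratch file names are hypothetical and exist only for this illustration.

```tcl
# File_CopyRaw is a hypothetical helper: it copies bytes unchanged
# by turning off end of line translation on both channels.
proc File_CopyRaw {src dest} {
    set in [open $src]
    set out [open $dest w]
    fconfigure $in -translation binary
    fconfigure $out -translation binary
    puts -nonewline $out [read $in]
    close $out
    close $in
}

# Build a file that mixes Windows and UNIX line endings.
set src [file join [pwd] raw-src.[pid].txt]
set dst [file join [pwd] raw-dst.[pid].txt]
set out [open $src w]
fconfigure $out -translation binary
puts -nonewline $out "a\r\nb\n"
close $out

File_CopyRaw $src $dst
set sizes [list [file size $src] [file size $dst]]
file delete $src $dst

puts $sizes   ;# both sizes are 5 because no bytes were rewritten
```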

Random Access I/O

The seek and tell commands provide random access to I/O channels. Each channel has a current position called the seek offset. Each read or write operation updates the seek offset by the number of bytes transferred. The current value of the offset is returned by the tell command. The seek command sets the seek offset by an amount, which can be positive or negative, from an origin which is either start, current, or end.
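A short sketch of seek and tell on a scratch file follows. The file name is a placeholder, and binary translation is an assumption made here so that the offsets count bytes exactly.

```tcl
# Open a scratch file for reading and writing.
set path [file join [pwd] seek-demo.[pid].txt]
set f [open $path w+]
fconfigure $f -translation binary
puts -nonewline $f "0123456789"

# Move to an absolute offset, read three bytes, and check the
# position that tell reports afterwards.
seek $f 2 start
set middle [read $f 3]
set afterRead [tell $f]

# A negative offset from end positions relative to the file's end.
seek $f -2 end
set tail [read $f]

close $f
file delete $path

puts "$middle $afterRead $tail"   ;# 234 5 89
```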

Closing I/O channels

The close command is just as important as the others because it frees operating system resources associated with the I/O channel. If you forget to close a channel, it will be closed when your process exits. However, if you have a long-running program, like a Tk script, you might exhaust some operating system resources if you forget to close your I/O channels. The close command can raise an error.

If the channel was a process pipeline and any of the processes wrote to their standard error channel, then Tcl treats this as an error. The error is raised when the channel to the pipeline is finally closed. Similarly, if any of the processes in the pipeline exit with a nonzero status, close raises an error.
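Because close can raise an error, long-running scripts often guard it with catch. The sketch below uses a hypothetical helper named Channel_Close, and it demonstrates the pattern on an ordinary file so that it runs without any external program. The same guard applies to a pipeline channel.

```tcl
# Channel_Close is a hypothetical helper: it returns a two-element
# list, a success flag followed by any error message from close.
proc Channel_Close {chan} {
    if {[catch {close $chan} err]} {
        return [list 0 $err]
    }
    return [list 1 {}]
}

set path [file join [pwd] close-demo.[pid].txt]
set out [open $path w]
puts $out "data"
set first [Channel_Close $out]

# Closing the same channel again fails, and the error is captured
# instead of terminating the script.
set second [Channel_Close $out]
file delete $path

puts [lindex $first 0]
puts [lindex $second 0]
```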

Practical Programming in Tcl & Tk, Third Edition By Brent B. Welch

Chapter 9. Working with Files and Programs

The Current Directory: cd and pwd

Every process has a current directory that is used as the starting point when resolving a relative pathname. The pwd command returns the current directory, and the cd command changes the current directory. Example 9-11 uses these commands.
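As a small sketch of the save and restore pattern, the following remembers the current directory, moves to its parent, and then returns. Nothing here depends on a particular directory layout.

```tcl
# Remember where we started so that we can come back.
set start [pwd]

# cd raises an error if the directory cannot be entered, so it is
# guarded with catch.
if {[catch {cd [file dirname $start]} err]} {
    puts stderr $err
} else {
    puts "now in [pwd]"
    cd $start
}

puts "back in [pwd]"
```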

Matching File Names with glob

The glob command expands a pattern into the set of matching file names. The general form of the glob command is:

    glob ?flags? pattern ?pattern? ...

The pattern syntax is similar to the string match patterns:

    *          matches zero or more characters.
    ?          matches a single character.
    [abc]      matches a set of characters.
    {a,b,c}    matches any of a, b, or c.

All other characters must match themselves. The -nocomplain flag causes glob to return an empty list if no files match the pattern. Otherwise, glob raises an error if no files match. The -- flag must be used if the pattern begins with a -.

Unlike the glob matching in csh, the Tcl glob command matches only the names of existing files. In csh, the {a,b} construct can match nonexistent names. In addition, the results of glob are not sorted. Use the lsort command to sort the result if the order matters to you.

Example 9-11 shows the FindFile procedure, which traverses the file system hierarchy using recursion. At each iteration it saves its current directory and then attempts to change to the next subdirectory. A catch guards against bogus names. The glob command matches file names:

Example 9-11 Finding a file by name.

    proc FindFile { startDir namePat } {
        set pwd [pwd]
        if [catch {cd $startDir} err] {
            puts stderr $err
            return
        }
        foreach match [glob -nocomplain -- $namePat] {
            puts stdout [file join $startDir $match]
        }
        foreach file [glob -nocomplain *] {
            if [file isdirectory $file] {
                FindFile [file join $startDir $file] $namePat
            }
        }
        cd $pwd
    }
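The glob flags and patterns described in this section can be tried on a scratch directory. This is a minimal sketch. The directory and file names are invented for the illustration.

```tcl
# Create a scratch directory with three empty files.
set start [pwd]
set dir [file join $start glob-demo.[pid]]
file mkdir $dir
foreach name {b.txt a.txt c.log} {
    close [open [file join $dir $name] w]
}
cd $dir

# glob does not promise any order, so sort when order matters.
set txt [lsort [glob -nocomplain -- *.txt]]

# The {a,b} alternation must be braced to hide it from the Tcl parser.
set either [lsort [glob -nocomplain -- {*.{txt,log}}]]

# With -nocomplain, a pattern that matches nothing gives an empty list.
set none [glob -nocomplain -- *.missing]

cd $start
file delete -force $dir

puts $txt      ;# a.txt b.txt
puts $either   ;# a.txt b.txt c.log
```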

Expanding Tilde in File Names

The glob command also expands a leading tilde (~) in filenames. There are two cases:

    ~/       expands to the current user's home directory.
    ~user    expands to the home directory of user.

If you have a file that starts with a literal tilde, you can avoid the tilde expansion by adding a leading ./ (e.g., ./~foobar).

The exit and pid Commands

The exit command terminates your script. Note that exit causes termination of the whole process that was running the script. If you supply an integer-valued argument to exit, then that becomes the exit status of the process.

The pid command returns the process ID of the current process. This can be useful as the seed for a random number generator because it changes each time you run your script. It is also common to embed the process ID in the name of temporary files. You can also find out the process IDs associated with a process pipeline with pid:

    set pipe [open "|command"]
    set pids [pid $pipe]

There is no built-in mechanism to control processes in Tcl. On UNIX systems you can exec the kill program to terminate a process:

    exec kill $pid
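Embedding the process ID in a temporary file name can be sketched as follows. The myapp prefix and the choice of the current directory are assumptions made for the illustration.

```tcl
# Build a name that is unlikely to collide with another running
# copy of the script. "myapp" is a placeholder prefix.
set tmpName [file join [pwd] myapp.[pid].tmp]

set out [open $tmpName w]
puts $out "scratch data"
close $out
set existed [file exists $tmpName]

# Remove the file when it is no longer needed.
file delete $tmpName
puts $existed
```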

Environment Variables
Environment variables are a collection of string-valued variables associated with each process. The process's environment variables are available through the global array env. The name of the environment variable is the index (e.g., env(PATH)), and the array element contains the current value of the environment variable. If assignments are made to env, they result in changes to the corresponding environment variable. Environment variables are inherited by child processes, so programs run with the exec command inherit the environment of the Tcl script. The following example prints the values of environment variables.

Example 9-12 Printing environment variable values.

    proc printenv { args } {
        global env
        set maxl 0
        if {[llength $args] == 0} {
            set args [lsort [array names env]]
        }
        foreach x $args {
            if {[string length $x] > $maxl} {
                set maxl [string length $x]
            }
        }
        incr maxl 2
        foreach x $args {
            puts stdout [format "%*s = %s" $maxl $x $env($x)]
        }
    }
    printenv USER SHELL TERM
    =>
       USER = welch
      SHELL = /bin/csh
       TERM = tx
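The inheritance of env by child processes can be sketched by running a second copy of the interpreter. This assumes that the script runs under tclsh, so that info nameofexecutable names a program that reads a script from standard input. The variable name DEMO_GREETING is invented for the illustration.

```tcl
# Assign to env; the change is visible to programs started later.
set env(DEMO_GREETING) "hello from parent"

# Run a child interpreter and have it print the inherited value.
# The catch covers hosts where the child cannot be started.
set child [info nameofexecutable]
if {[catch {exec $child << {puts $env(DEMO_GREETING)}} reply]} {
    set reply "child could not be run: $reply"
}
puts $reply

# Unsetting the array element removes the environment variable.
unset env(DEMO_GREETING)
```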

Note: Environment variables can be initialized for Macintosh applications by editing a resource of type STR# whose name is Tcl Environment Variables. This resource is part of the tclsh and wish applications. Follow the directions on page 28 for using ResEdit. The format of the resource values is NAME=VALUE.

The registry Command

Windows uses the registry to store various system configuration information. The Windows tool to browse and edit the registry is called regedit. Tcl provides a registry command. It is a loadable package that you must load by using:

    package require registry

The registry structure has keys, value names, and typed data. The value names are stored under a key, and each value name has data associated with it. The keys are organized into a hierarchical naming system, so another way to think of the value names is as an extra level in the hierarchy. The main point is that you need to specify both a key name and a value name in order to get something out of the registry. The key names have one of the following formats:

    \\hostname\rootname\keypath
    rootname\keypath
    rootname

The rootname is one of HKEY_LOCAL_MACHINE, HKEY_PERFORMANCE_DATA, HKEY_USERS, HKEY_CLASSES_ROOT, HKEY_CURRENT_USER, HKEY_CURRENT_CONFIG, or HKEY_DYN_DATA. Tables 9-8 and 9-9 summarize the registry command and data types:

Table 9-8. The registry command.

    registry delete key ?valueName?
        Deletes the key and the named value, or it deletes all values under the key if valueName is not specified.

    registry get key valueName
        Returns the value associated with valueName under key.

    registry keys key ?pat?
        Returns the list of keys or value names under key that match pat, which is a string match pattern.

    registry set key
        Creates key.

    registry set key valueName data ?type?
        Creates valueName under key with value data of the given type. Types are listed in Table 9-9.

    registry type key valueName
        Returns the type of valueName under key.

    registry values key ?pat?
        Returns the names of the values stored under key that match pat, which is a string match pattern.

Table 9-9. The registry data types.

    binary              Arbitrary binary data.
    none                Arbitrary binary data.
    expand_sz           A string that contains references to environment variables with the %VARNAME% syntax.
    dword               A 32-bit integer.
    dword_big_endian    A 32-bit integer in the other byte order. It is represented in Tcl as a decimal string.
    link                A symbolic link.
    multi_sz            An array of strings, which are represented as a Tcl list.
    resource_list       A device driver resource list.
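A guarded sketch of registry get follows. The Registry_ProductName procedure is hypothetical, and the key and value name are assumptions about what a typical Windows host contains. On other platforms the procedure simply returns an empty string.

```tcl
# Registry_ProductName is a hypothetical helper. The key path and
# the ProductName value are assumed to exist on the host; if they
# do not, the catch turns the failure into an empty result.
proc Registry_ProductName {} {
    global tcl_platform
    if {[string compare $tcl_platform(platform) "windows"] != 0} {
        return ""
    }
    if {[catch {package require registry}]} {
        return ""
    }
    set key {HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion}
    if {[catch {registry get $key ProductName} value]} {
        return ""
    }
    return $value
}

puts [Registry_ProductName]
```

Note that both a key name and a value name are passed to registry get, as the text above requires.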

    set cmd {puts stdout "Hello, World!"}
    => puts stdout "Hello, World!"
    # sometime later...
    eval $cmd
    => Hello, World!

In this case, the value of cmd is passed to Tcl. All the standard grouping and substitution are done again on the value, which is a puts command.

However, suppose that part of the command is stored in a variable, but that variable will not be defined at the time eval is used. We can artificially create this situation like this:

    set string "Hello, World!"
    set cmd {puts stdout $string}
    => puts stdout $string
    unset string
    eval $cmd
    => can't read "string": no such variable

In this case, the command contains $string. When this is processed by eval, the interpreter looks for the current value of string, which is undefined. This example is contrived, but the same problem occurs if string is a local variable, and cmd will be evaluated later in the global scope.

A common mistake is to use double quotes to group the command. That will let $string be substituted now. However, this works only if string has a simple value, but it fails if the value of string contains spaces or other Tcl special characters:

    set cmd "puts stdout $string"
    => puts stdout Hello, World!
    eval $cmd
    => bad argument "World!": should be "nonewline"

The problem is that we have lost some important structure. The identity of $string as a single argument gets lost in the second round of parsing by eval. The solution to this problem is to construct the command using list, as shown in the following example:

Example 10-1 Using list to construct commands.

    set string "Hello, World!"
    set cmd [list puts stdout $string]
    => puts stdout {Hello, World!}
    unset string
    eval $cmd
    => Hello, World!

The trick is that list has formed a list containing three elements: puts, stdout, and the value of string. The substitution of $string occurs before list is called, and list takes care of grouping that value for us. In contrast, using double quotes is equivalent to:

    set cmd [concat puts stdout $string]

Double quotes lose list structure.

The problem here is that concat does not preserve list structure. The main lesson is that you should use list to construct commands if they contain variable values or command results that must be substituted now. If you use double quotes, the values are substituted but you lose proper command structure. If you use curly braces, then values are not substituted until later, which may not be in the right context.
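The difference in structure can be checked directly by counting list elements. This is a small sketch that reuses the values from Example 10-1.

```tcl
set string "Hello, World!"

# Double quotes paste the value in, so its words become
# separate list elements.
set quoted "puts stdout $string"

# list keeps the value as a single element.
set built [list puts stdout $string]

puts [llength $quoted]    ;# 4
puts [llength $built]     ;# 3
puts [lindex $built 2]    ;# Hello, World!
```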

Commands That Concatenate Their Arguments

The uplevel, after, and send commands concatenate their arguments into a command and execute it later in a different context. The uplevel command is described on page 130, after is described on page 218, and send is described on page 560. Whenever I discover such a command, I put it on my danger list and make sure I explicitly form a single command argument with list instead of letting the command concat items for me. Get in the habit now:

    after 100 [list doCmd $param1 $param2]
    send $interp [list doCmd $param1 $param2] ;# Safe!

The danger here is that concat and list can result in the same thing, so you can be led down the rosy garden path only to get errors later when values change. The two previous examples always work. The next two work only if param1 and param2 have values that are single list elements:

    after 100 doCmd $param1 $param2
    send $interp doCmd $param1 $param2 ;# Unsafe!

If you use other Tcl extensions that provide eval-like functionality, carefully check their documentation to see whether they contain commands that concat their arguments into a command. For example, Tcl-DP, which provides a network version of send, dp_send, also uses concat.
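The safe form of after can be exercised with values that would break the unsafe form. In this sketch the body of doCmd and the variables seen and done are invented for the illustration. vwait runs the event loop until the timer fires.

```tcl
# doCmd records its arguments and signals that it has run.
proc doCmd {a b} {
    global seen done
    set seen [list $a $b]
    set done 1
}

# Values with a space and a semicolon would break a concatenated
# command, but list keeps each one as a single argument.
set param1 "two words"
set param2 {semi;colon}
after 10 [list doCmd $param1 $param2]

# Enter the event loop until the timer callback sets done.
vwait done
puts [lindex $seen 0]
puts [lindex $seen 1]
```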

Commands That Use Callbacks

The general strategy of passing out a command or script to call later is a flexible way to assemble different parts of an application, and it is widely used by Tcl commands. Examples include commands that are called when users click on Tk buttons, commands that are called when I/O channels have data ready, or commands that are called when clients connect to network servers. It is also easy to write your own procedures or C extensions that accept scripts and call them later in response to some event. These other callback situations may not appear to have the "concat problem" because they take a single script argument. However, as soon as you use double quotes to group that argument, you have created the concat problem all over again. So, all the caveats about using list to construct these commands still apply.

Command Prefix Callbacks

There is a variation on command callbacks called a command prefix. In this case, the command is given additional arguments when it is invoked. In other words, you provide only part of the command, the command prefix, and the module that invokes the callback adds additional arguments before using eval to invoke the command.

For example, when you create a network server, you supply a procedure that is called when a client makes a connection. That procedure is called with three additional arguments that indicate the client's socket, IP address, and port number. This is described in more detail on page 227. The tricky thing is that you can define your callback procedure to take four (or more) arguments. In this case you specify some of the parameters when you define the callback, and then the socket subsystem specifies the remaining arguments when it makes the callback. The following command creates the server side of a socket:

    set virtualhost www.beedub.com
    socket -server [list Accept $virtualhost] 8080

However, you define the Accept procedure like this:

    proc Accept {myname sock ipaddr port} { ... }

The myname parameter is set when you construct the command prefix. The remaining parameters are set when the callback is invoked. The use of list in this example is not strictly necessary because "we know" that virtualhost will always be a single list element. However, using list is just a good habit when forming callbacks, so I always write the code this way.

There are many other examples of callback arguments that are really command prefixes. Some of these include the scrolling callbacks between Tk scrollbars and their widgets, the command aliases used with Safe Tcl, the sorting functions in lsort, and the completion callback used with fcopy. Example 13-6 on page 181 shows how to use eval to make callbacks from Tcl procedures.
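The command prefix idea can be sketched without opening a socket. Here a hypothetical Notify procedure stands in for the socket subsystem, and the socket name, address, and port are made-up values. Only the Accept signature comes from the text above.

```tcl
# Notify is a hypothetical stand-in for the module that owns the
# callback. It appends three arguments to the prefix. Using list
# keeps each added value a single argument when eval concatenates.
proc Notify {prefix sock ipaddr port} {
    eval $prefix [list $sock $ipaddr $port]
}

# The first parameter comes from the prefix, the rest from Notify.
proc Accept {myname sock ipaddr port} {
    return [format "%s accepted %s from %s port %s" \
        $myname $sock $ipaddr $port]
}

set virtualhost www.beedub.com
set result [Notify [list Accept $virtualhost] sock4 127.0.0.1 4321]
puts $result
```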

Constructing Procedures Dynamically

The previous examples have all focused on creating single commands by using list operations. Suppose you want to create a whole procedure dynamically. Unfortunately, this can be particularly awkward because a procedure body is not a simple list. Instead, it is a sequence of commands that are each lists, but they are separated by newlines or semicolons. In turn, some of those commands may be loops and if commands that have their own command bodies. To further compound the problem, you typically have two kinds of variables in the procedure body: some that are to be used as values when constructing the body, and some that are to be used later when executing the procedure. The result can be very messy.

The main trick to this problem is to use either format or regsub to process a template for your dynamically generated procedure. If you use format, then you can put %s into your templates where you want to insert values. You may find the positional notation of the format string (e.g., %1$s and %2$s) useful if you need to repeat a value in several places within your procedure body. The following example is a procedure that generates a new version of other procedures. The new version includes code that counts the number of times the procedure was called and measures the time it takes to run:

Example 10-2 Generating procedures dynamically with a template.

    proc TraceGen {procName} {
        rename $procName $procName-orig
        set arglist {}
        foreach arg [info args $procName-orig] {
            append arglist "\$$arg "
        }
        proc $procName [info args $procName-orig] [format {
            global _trace_count _trace_msec
            incr _trace_count(%1$s)
            incr _trace_msec(%1$s) [lindex [time {
                set result [%1$s-orig %2$s]
            } 1] 0]
            return $result
        } $procName $arglist]
    }

Suppose that we have a trivial procedure foo:

    proc foo {x y} {
        return [expr $x * $y]
    }

If you run TraceGen on it and look at the results, you see this:

    TraceGen foo
    info body foo
    =>
        global _trace_count _trace_msec
        incr _trace_count(foo)
        incr _trace_msec(foo) [lindex [time {
            set result [foo-orig $x $y]
        } 1] 0]
        return $result
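A smaller template shows the two kinds of variables more plainly. In this sketch the MakeCounter procedure and the ticket name are hypothetical. The %1$s positions are filled in when the procedure is generated, while the bracketed commands in the body run only when the generated procedure is called.

```tcl
# MakeCounter is a hypothetical generator. format inserts the name
# now; the commands in the template run later, inside the new proc.
proc MakeCounter {name} {
    proc $name {} [format {
        global %1$s_count
        if {![info exists %1$s_count]} {
            set %1$s_count 0
        }
        return [incr %1$s_count]
    } $name]
}

MakeCounter ticket
set first [ticket]
set second [ticket]
puts "$first $second"   ;# 1 2
```

Running info body ticket afterwards shows the template with ticket substituted at every %1$s position.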

Chapter 10. Quoting Issues and Eval

Exploiting the concat inside eval

The previous section warns about the danger of concatenation when forming commands. However, there are times when concatenation is done for good reason. This section illustrates cases where the concat done by eval is useful in assembling a command by concatenating multiple lists into one list. A concat is done internally by eval when it gets more than one argument:

    eval list1 list2 list3 ...

The effect of concat is to join all the lists into one list; a new level of list structure is not added. This is useful if the lists are fragments of a command. It is common to use this form of eval with the args construct in procedures. Use the args parameter to pass optional arguments through to another command. Invoke the other command with eval, and the values in $args get concatenated onto the command properly. The special args parameter is illustrated in Example 7-2 on page 82.
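The joining behavior is easy to observe with lappend. This is a minimal sketch, and the fragment values are arbitrary.

```tcl
# Three list fragments that together form one command.
set prefix {lappend result}
set first  {a b}
set second {{c d} e}

# eval concatenates them into: lappend result a b {c d} e
set result {}
eval $prefix $first $second

puts [llength $result]    ;# 4
puts [lindex $result 2]   ;# c d
```

The nested element {c d} survives as one element because concat joins the lists without adding or removing a level of structure inside them.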

Using eval in a Wrapper Procedure.

Here, we illustrate the use of eval and $args with a simple Tk example. In Tk, the button command creates a button in the user interface. The button command can take many arguments, and commonly you simply specify the text of the button and the Tcl command that is executed when the user clicks on the button:

    button .foo -text Foo -command foo

After a button is created, it is made visible by packing it into the display. The pack command can also take many arguments to control screen placement. Here, we just specify a side and let the packer take care of the rest of the details:

    pack .foo -side left

Even though there are only two Tcl commands to create a user interface button, we will write a procedure that replaces the two commands with one. Our first version might be:

    proc PackedButton {name txt cmd} {
        button $name -text $txt -command $cmd
        pack $name -side left
    }

This is not a very flexible procedure. The main problem is that it hides the full power of the Tk button command, which can really take about 20 widget configuration options, such as -background, -cursor, -relief, and more. They are listed on page 391. For example, you can easily make a red button like this:

    button .foo -text Foo -command foo -background red

A better version of PackedButton uses args to pass through extra configuration options to the button command. The args parameter is a list of all the extra arguments passed to the Tcl procedure. My first attempt to use $args looked like this, but it was not correct:

    proc PackedButton {name txt cmd args} {
        button $name -text $txt -command $cmd $args
        pack $name -side left
    }
    PackedButton .foo "Hello, World!" {exit} -background red
    => unknown option "-background red"

The problem is that $args is a list value, and button gets the whole list as a single argument. Instead, button needs to get the elements of $args as individual arguments.

Use eval with $args

In this case, you can use eval because it concatenates its arguments to form a single list before evaluation. The single list is, by definition, the same as a single Tcl command, so the button command parses correctly. Here we give eval two lists, which it joins into one command:

    eval {button $name -text $txt -command $cmd} $args

The use of the braces in this command is discussed in more detail below. We also generalize our procedure to take some options to the pack command. This argument, pack, must be a list of packing options. The final version of PackedButton is shown in Example 10-3:

Example 10-3 Using eval with $args.

    # PackedButton creates and packs a button.
    proc PackedButton {path txt cmd {pack {-side right}} args} {
        eval {button $path -text $txt -command $cmd} $args
        eval {pack $path} $pack
    }

In PackedButton, both pack and args are list-valued parameters that are used as parts of a command. The internal concat done by eval is perfect for this situation. The simplest call to PackedButton is:

    PackedButton .new "New" { New }

The quotes and curly braces are redundant in this case but are retained to convey some type information. The quotes imply a string label, and the braces imply a command. The pack argument takes on its default value, and the args variable is an empty list. The two commands executed by PackedButton are:

    button .new -text New -command New
    pack .new -side right

PackedButton creates a horizontal stack of buttons by default. The packing can be controlled with a packing specification:

    PackedButton .save "Save" { Save $file } {-side left}

The two commands executed by PackedButton are:

    button .save -text Save -command { Save $file }
    pack .save -side left

The remaining arguments, if any, are passed through to the button command. This lets the caller fine-tune some of the button attributes:

    PackedButton .quit Quit { Exit } {-side left -padx 5} \
        -background red

The two commands executed by PackedButton are:

    button .quit -text Quit -command { Exit } -background red
    pack .quit -side left -padx 5

You can see a difference between the pack and args argument in the call to PackedButton. You need to group the packing options explicitly into a single argument. The args parameter is automatically made into a list of all remaining arguments. In fact, if you group the extra button parameters, it will be a mistake:

    PackedButton .quit Quit { Exit } {-side left -padx 5} \
        {-background red}
    => unknown option "-background red"
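The same pass-through pattern can be tried without Tk. In this sketch a hypothetical SortedNames procedure wraps lsort and hands any extra options through with eval, in the same way that PackedButton hands options to button.

```tcl
# SortedNames is a hypothetical wrapper around lsort. The braces
# delay substitution of $names until eval runs, while $args is
# expanded into separate option words by the concat inside eval.
proc SortedNames {names args} {
    eval {lsort} $args {$names}
}

set plain   [SortedNames {b A c}]
set reverse [SortedNames {b A c} -decreasing]
set both    [SortedNames {b A c A} -decreasing -unique]

puts $plain     ;# A b c
puts $reverse   ;# c b A
puts $both      ;# c b A
```

Later Tcl releases (8.5 and newer) offer the {*} expansion syntax, so the body could instead be written as lsort {*}$args $names. The eval form shown here matches the Tcl version that this book covers.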

Correct Quoting with eval

What about the peculiar placement of braces in PackedButton?

    eval {button $path -text $txt -command $cmd} $args

By using braces, we control the number of times different parts of the command are seen by the Tcl evaluator. Without any braces, everything goes through two rounds of substitution. The braces prevent one of those rounds. In the above command, only $args is substituted twice. Before eval is called, the $args is replaced with its list value. Then, eval is invoked, and it concatenates its two list arguments into one list, which is now a properly formed command. The second round of substitutions done by eval replaces the txt and cmd values.

Do not use double quotes with eval.

You may be tempted to use double quotes instead of curly braces in your uses of eval. Don't give in! Using double quotes is, most likely, wrong. Suppose the first eval command is written like this:

    eval "button $path -text $txt -command $cmd $args"

Incidentally, the previous is equivalent to:

    eval button $path -text $txt -command $cmd $args

These versions happen to work with the following call because txt and cmd have one-word values with no special characters in them:

    PackedButton .quit Quit { Exit }

The button command that is ultimately evaluated is:

    button .quit -text Quit -command { Exit }

In the next call, an error is raised:

    PackedButton .save "Save As" [list Save $file]
    => unknown option "As"

This is because the button command is this:

    button .save -text Save As -command Save /a/b/c

But it should look like this instead:

    button .save -text {Save As} -command {Save /a/b/c}

The problem is that the structure of the button command is now wrong. The values of txt and cmd are substituted first, before eval is even called, and then the whole command is parsed again. The worst part is that sometimes using double quotes works, and sometimes it fails. The success of using double quotes depends on the value of the parameters. When those values contain spaces or special characters, the command gets parsed incorrectly.

Braces: the one true way to group arguments to eval.

To repeat, the safe construct is:

    eval {button $path -text $txt -command $cmd} $args

The following variations are also correct. The first uses list to do quoting automatically, and the others use backslashes or braces to prevent the extra round of substitutions:

    eval [list button $path -text $txt -command $cmd] $args
    eval button \$path -text \$txt -command \$cmd $args
    eval button {$path} -text {$txt} -command {$cmd} $args

Finally, here is one more incorrect approach that tries to quote by hand:

    eval "button {$path} -text {$txt} -command {$cmd} $args"

The problem above is that quoting is not always done with curly braces. If a value contains an unmatched curly brace, Tcl would have used backslashes to quote it, and the above command would raise an error:

    set blob "foo\{bar space"
    => foo{bar space
    eval "puts {$blob}"
    => missing close brace
    eval puts {$blob}
    => foo{bar space
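The unmatched brace case can be checked without printing anything. This sketch compares quoting by hand with quoting by list, using the blob value from the example above. The variable named copy is invented for the illustration.

```tcl
set blob "foo\{bar space"

# Quoting by hand: the brace inside the value unbalances the
# command, so eval fails and catch returns 1.
set handFailed [catch {eval "set copy {$blob}"} err]

# Quoting with list: Tcl picks backslash quoting for this value,
# and the command round-trips correctly.
set cmd [list set copy $blob]
eval $cmd

puts $handFailed
puts $copy
```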

The uplevel Command

The uplevel command is similar to eval, except that it evaluates a command in a different scope than the current procedure. It is useful for defining new control structures entirely in Tcl. The syntax for uplevel is:

    uplevel ?level? command ?list1 list2 ...?

As with upvar, the level parameter is optional and defaults to 1, which means to execute the command in the scope of the calling procedure. The other common use of level is #0, which means to evaluate the command in the global scope. You can count up farther than one (e.g., 2 or 3), or count down from the global level (e.g., #1 or #2), but these cases rarely make sense.

When you specify the command argument, you must be aware of any substitutions that might be performed by the Tcl interpreter before uplevel is called. If you are entering the command directly, protect it with curly braces so that substitutions occur in the other scope. The following affects the variable x in the caller's scope:

    uplevel {set x [expr $x + 1]}

However, the following will use the value of x in the current scope to define the value of x in the calling scope, which is probably not what was intended:

    uplevel "set x [expr $x + 1]"

If you are constructing the command dynamically, again use list. This fragment is used later in Example 10-4:

    uplevel [list foreach $args $valueList {break}]

It is common to have the command in a variable. This is the case when the command has been passed into your new control flow procedure as an argument. In this case, you should evaluate the command one level up. Put the level in explicitly to avoid cases where $cmd looks like a number!

    uplevel 1 $cmd

Another common scenario is reading commands from users as part of an application. In this case, you should evaluate the command at the global scope. Example 16-2 on page 220 illustrates this use of uplevel:

    uplevel #0 $cmd

If you are assembling a command from a few different lists, such as the args parameter, then you can use concat to form the command:

    uplevel [concat $cmd $args]

The lists in $cmd and $args are concatenated into a single list, which is a valid Tcl command. Like eval, uplevel uses concat internally if it is given extra arguments, so you can leave out the explicit use of concat. The following commands are equivalent:

    uplevel [concat $cmd $args]
    uplevel "$cmd $args"
    uplevel $cmd $args

Example 10-4 shows list assignment using the foreach trick described on page 75. List assignment is useful if a command returns several values in a list. The lassign procedure assigns the list elements to several variables. The lassign procedure hides the foreach trick, but it must use the uplevel command so that the loop variables get assigned in the correct scope. The list command is used to construct the foreach command that is executed in the caller's scope. This is necessary so that $args and $valueList get substituted before the command is evaluated in the other scope.

Example 10-4 lassign: list assignment with foreach.

    # Assign a set of variables from a list of values.
    # If there are more values than variables, they are returned.
    # If there are fewer values than variables,
    # the variables get the empty string.

    proc lassign {valueList args} {
        if {[llength $args] == 0} {
            error "wrong # args: lassign list varname ?varname..?"
        }
        if {[llength $valueList] == 0} {
            # Ensure one trip through the foreach loop
            set valueList [list {}]
        }
        uplevel 1 [list foreach $args $valueList {break}]
        return [lrange $valueList [llength $args] end]
    }

Example 10-5 illustrates a new control structure with the File_Process procedure that applies a callback to each line in a file. The call to uplevel allows the callback to be concatenated with the line to form the command. The list command is used to quote any special characters in line, so it appears as a single argument to the command.

Example 10-5 The File_Process procedure applies a command to each line of a file.

    proc File_Process {file callback} {
        set in [open $file]
        while {[gets $in line] >= 0} {
            uplevel 1 $callback [list $line]
        }
        close $in
    }

What is the difference between these two commands?

    uplevel 1 [list $callback $line]
    uplevel 1 $callback [list $line]

The first form limits callback to be the name of the command, while the second form allows callback to be a command prefix. Once again, what is the bug with this version?

    uplevel 1 $callback $line

The arbitrary value of $line is concatenated to the callback command, and it is likely to be a malformed command when executed.
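A tiny control structure shows why the body must run in the caller's scope. The Repeat and Demo procedures below are hypothetical, and the sketch does not try to handle break or continue inside the body.

```tcl
# Repeat is a hypothetical control structure: it runs body count
# times in the scope of whoever called Repeat.
proc Repeat {count body} {
    for {set i 0} {$i < $count} {incr i} {
        uplevel 1 $body
    }
}

# total is local to Demo, yet the body can change it because
# uplevel evaluates the body one level up from Repeat.
proc Demo {} {
    set total 0
    Repeat 3 { incr total 10 }
    return $total
}

puts [Demo]   ;# 30
```

The loop counter i stays private to Repeat, so the body cannot collide with it.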

The subst Command

The subst command is useful when you have a mixture of Tcl commands, Tcl variable references, and plain old data. The subst command looks through the data for square brackets, dollar signs, and backslashes, and it does substitutions on those. It leaves the rest of the data alone:

    set a "foo bar"
    subst {a=$a date=[exec date]}
    => a=foo bar date=Thu Dec 15 10:13:48 PST 1994

The subst command does not honor the quoting effect of curly braces. It does substitutions regardless of braces:

    subst {a=$a date={[exec date]}}
    => a=foo bar date={Thu Dec 15 10:15:31 PST 1994}

You can use backslashes to prevent variable and command substitution.

    subst {a=\$a date=\[exec date]}
    => a=$a date=[exec date]

You can use other backslash substitutions like \uXXXX to get Unicode characters, \n to get newlines, or \-newline to hide newlines.

The subst command takes flags that limit the substitutions it will perform. The flags are -nobackslashes, -nocommands, or -novariables. You can specify one or more of these flags before the string that needs to be substituted:

    subst -novariables {a=$a date=[exec date]}
    => a=$a date=Thu Dec 15 10:15:31 PST 1994
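The flags can be compared side by side with commands that do not depend on the clock. This is a minimal sketch, and the expr and llength commands are stand-ins chosen so that the results are fixed.

```tcl
set a "foo bar"

# No flags: both the variable and the command are substituted.
set full [subst {a=$a sum=[expr {1 + 2}]}]

# -novariables leaves $a alone but still runs the command.
set noVars [subst -novariables {a=$a sum=[expr {1 + 2}]}]

# -nocommands substitutes $a but leaves the brackets as data.
set noCmds [subst -nocommands {a=$a n=[llength {x y}]}]

puts $full     ;# a=foo bar sum=3
puts $noVars   ;# a=$a sum=3
puts $noCmds   ;# a=foo bar n=[llength {x y}]
```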

String Processing with subst

The subst command can be used with the regsub command to do efficient, two-step string processing. In the first step, regsub is used to rewrite an input string into data with embedded Tcl commands. In the second step, subst or eval replaces the Tcl commands with their result. By artfully mapping the data into Tcl commands, you can dynamically construct a Tcl script that processes the data. The processing is efficient because the Tcl parser and the regular expression processor have been highly tuned. Chapter 11 has several examples that use this technique.
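The two-step technique can be sketched with a decoder for %XX escapes. The Percent_Decode procedure is hypothetical. It adds a protective first pass, an assumption made here so that brackets, dollar signs, and backslashes in the raw data are not treated as Tcl syntax by subst.

```tcl
# Percent_Decode is a hypothetical helper that turns %XX escapes
# into characters by rewriting the data and then calling subst.
proc Percent_Decode {data} {
    # Protect characters that subst would otherwise act on.
    regsub -all {[][$\\]} $data {\\&} data
    # Step 1: rewrite each %XX into an embedded format command.
    regsub -all {%([0-9a-fA-F][0-9a-fA-F])} $data \
        {[format %c 0x\1]} data
    # Step 2: let subst run the embedded commands.
    return [subst $data]
}

puts [Percent_Decode "hello%20world%21"]     ;# hello world!
puts [Percent_Decode {cost%3A $5 [net]}]     ;# cost: $5 [net]
```

After the first regsub the data contains no unprotected Tcl syntax of its own, so the only commands that subst runs are the format calls inserted by the second regsub.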

Part II. Advanced Tcl

Chapter 11. Regular Expressions

This chapter describes regular expression pattern matching and string processing based on regular expression substitutions. These features provide the most powerful string processing facilities in Tcl. Tcl commands described are: regexp and regsub.

Regular expressions are a formal way to describe string patterns. They provide a powerful and compact way to specify patterns in your data. Even better, there is a very efficient implementation of the regular expression mechanism due to Henry Spencer. If your script does much string processing, it is worth the effort to learn about the regexp command. Your Tcl scripts will be compact and efficient. This chapter uses many examples to show you the features of regular expressions.

Regular expression substitution is a mechanism that lets you rewrite a string based on regular expression matching. The regsub command is another powerful tool, and this chapter includes several examples that do a lot of work in just a few Tcl commands. Stephen Uhler has shown me several ways to transform input data into a Tcl script with regsub and then use subst or eval to process the data. The idea takes a moment to get used to, but it provides a very efficient way to process strings.

Tcl 8.1 added a new regular expression implementation that supports Unicode and advanced regular expressions (ARE). This implementation adds more syntax and escapes that make it easier to write patterns, once you learn the new features! If you know Perl, then you are already familiar with these features. The Tcl advanced regular expressions are almost identical to the Perl 5 regular expressions. The new features include a few very minor incompatibilities with the regular expressions implemented in Tcl 8.0 and earlier, but these rarely occur in practice. The new regular expression package supports Unicode, of course, so you can write patterns to match Japanese or Hindi documents!

When to Use Regular Expressions

Regular expressions can seem overly complex at first. They introduce their own syntax and their own rules, and you may be tempted to use simpler commands like string first, string range, or string match to process your strings. However, often a single regular expression command can replace a sequence of several string commands. Any time you can replace several Tcl commands with one, you get a performance improvement. Furthermore, the regular expression matcher is implemented in optimized C code, so pattern matching is fast.

The regular expression matcher does more than test for a match. It also tells you what part of your input string matches the pattern. This is useful for picking data out of a large input string. In fact, you can capture several pieces of data in just one match by using subexpressions. The regexp Tcl command makes this easy by assigning the matching data to Tcl variables. If you find yourself using string first and string range to pick out data, remember that regexp can do it in one step instead.

The regular expression matcher is structured so that patterns are first compiled into a form that is efficient to match. If you use the same pattern frequently, then the expensive compilation phase is done only once, and all your matching uses the efficient form. These details are completely hidden by the Tcl interface. If you use a pattern twice, Tcl will nearly always be able to retrieve the compiled form of the pattern. As you can see, the regular expression matcher is optimized for lots of heavy-duty string processing.

Avoiding a Common Problem

Group your patterns with curly braces.

One of the stumbling blocks with regular expressions is that they use some of the same special characters as Tcl. Any pattern that contains brackets, dollar signs, or spaces must be quoted when used in a Tcl command. In many cases you can group the regular expression with curly braces, so Tcl pays no attention to it. However, when using Tcl 8.0 (or earlier) you may need Tcl to do backslash substitutions on part of the pattern, and then you need to worry about quoting the special characters in the regular expression.

Advanced regular expressions eliminate this problem because backslash substitution is now done by the regular expression engine. Previously, to get \n to mean the newline character (or \t for tab) you had to let Tcl do the substitution. With Tcl 8.1, \n and \t inside a regular expression mean newline and tab. In fact, there are now about 20 backslash escapes you can use in patterns. Now more than ever, remember to group your patterns with curly braces to avoid conflicts between Tcl and the regular expression engine.

The patterns in the first sections of this chapter ignore this problem. The sample expressions in Table 11-7 on page 151 are quoted for use within Tcl scripts. Most are quoted simply by putting the whole pattern in braces, but some are shown without braces for comparison.
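To make the quoting difference concrete, here is a short sketch that assumes Tcl 8.1 or later. The input string and variable names are made up for illustration. Both commands match the same text, but the braced form needs no extra backslashes.

```tcl
set input "total: 42\n"

# Braced pattern: Tcl passes it through untouched, and the regular
# expression engine interprets \n as a newline.
regexp {([0-9]+)\n} $input match num

# Quoted pattern: the brackets must be escaped so Tcl does not treat
# them as a nested command, and Tcl itself substitutes \n.
regexp "(\[0-9\]+)\n" $input match2 num2

puts "braced: $num, quoted: $num2"
```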

Regular Expression Syntax

This section describes the basics of regular expression patterns, which are found in all versions of Tcl. There are occasional references to features added by advanced regular expressions, but they are covered in more detail starting on page 138. There is enough syntax in regular expressions that there are five tables that summarize all the options. These tables appear together starting at page 145.

A regular expression is a sequence of the following items:

    A literal character.
    A matching character, character set, or character class.
    A repetition quantifier.
    An alternation clause.
    A subpattern grouped with parentheses.

Matching Characters
Most characters simply match themselves. The following pattern matches an a followed by a b:

    ab

The general wild-card character is the period, ".". It matches any single character. The following pattern matches an a followed by any character:

    a.

Remember that matches can occur anywhere within a string; a pattern does not have to match the whole string. You can change that by using anchors, which are described on page 137.

Character Sets
The matching character can be restricted to a set of characters with the [xyz] syntax. Any of the characters between the two brackets is allowed to match. For example, the following matches either Hello or hello:

    [Hh]ello

The matching set can be specified as a range over the character set with the [x-y] syntax. The following matches any digit:

    [0-9]

There is also the ability to specify the complement of a set. That is, the matching character can be anything except what is in the set. This is achieved with the [^xyz] syntax. Ranges and complements can be combined. The following matches anything except the uppercase and lowercase letters:

    [^a-zA-Z]

Using special characters in character sets.

If you want a ] in your character set, put it immediately after the initial opening bracket. You do not need to do anything special to include [ in your character set. The following matches any square bracket or curly brace:

    [][{}]

Most regular expression syntax characters are no longer special inside character sets. This means you do not need to backslash anything inside a bracketed character set except for backslash itself. The following pattern matches several of the syntax characters used in regular expressions:

    [][+*?()|\\]

Advanced regular expressions add names and backslash escapes as shorthand for common sets of characters like white space, alpha, alphanumeric, and more. These are described on page 139 and listed in Table 11-3 on page 146.
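The character set patterns above can be tried directly with regexp, which returns 1 on a match and 0 otherwise. The test strings below are invented for this sketch.

```tcl
set r1 [regexp {[Hh]ello} "Say hello"]      ;# 1: set of two letters
set r2 [regexp {[0-9]} "no digits here"]    ;# 0: no digit anywhere
set r3 [regexp {[^a-zA-Z]} "abc"]           ;# 0: only letters present
set r4 [regexp {[^a-zA-Z]} "abc1"]          ;# 1: the 1 is not a letter
set r5 [regexp {[][{}]} "a{b"]              ;# 1: matches the open brace
set r6 [regexp {[][+*?()|\\]} "a|b"]        ;# 1: matches the pipe symbol

puts "$r1 $r2 $r3 $r4 $r5 $r6"
```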

Quantifiers
Repetition is specified with *, for zero or more, +, for one or more, and ?, for zero or one. These quantifiers apply to the previous item, which is either a matching character, a character set, or a subpattern grouped with parentheses. The following matches a string that contains b followed by zero or more a's:

    ba*

You can group part of the pattern with parentheses and then apply a quantifier to that part of the pattern. The following matches a string that has one or more sequences of ab:

    (ab)+

The pattern that matches anything, even the empty string, is:

    .*

These quantifiers have a greedy matching behavior: They match as many characters as possible. Advanced regular expressions add nongreedy matching, which is described on page 140. For example, a pattern to match a single line might look like this:

    .*\n

However, as a greedy match, this will match all the lines in the input, ending with the last newline in the input string. The following pattern matches up through the first newline.

    [^\n]*\n

We will shorten this pattern even further on page 140 by using nongreedy quantifiers. There are also special newline sensitive modes you can turn on with some options described on page 143.
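A brief sketch of these quantifiers in action, assuming Tcl 8.1 or later so that \n inside braces is understood by the regular expression engine. The sample strings are illustrative.

```tcl
regexp {ba*} "xbaaay" m1        ;# m1 is baaa
regexp {(ab)+} "xababy" m2      ;# m2 is abab

set input "line one\nline two\n"

# Greedy: .* runs to the last newline, so the whole input matches.
regexp {.*\n} $input greedy

# An explicit not-a-newline set stops at the first newline.
regexp {[^\n]*\n} $input first

puts [string length $greedy]    ;# 18, the length of the whole input
puts [string length $first]     ;# 9, just "line one" plus its newline
```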

Alternation
Alternation lets you test more than one pattern at the same time. The matching engine is designed to be able to test multiple patterns in parallel, so alternation is efficient. Alternation is specified with |, the pipe symbol. Another way to match either Hello or hello is:

    hello|Hello

You can also write this pattern as:

    (h|H)ello

or as:

    [hH]ello

Anchoring a Match
By default a pattern does not have to match the whole string. There can be unmatched characters before and after the match. You can anchor the match to the beginning of the string by starting the pattern with ^, or to the end of the string by ending the pattern with $. You can force the pattern to match the whole string by using both. All strings that begin with spaces or tabs are matched with:

    ^[ \t]+

If you have many text lines in your input, you may be tempted to think of ^ as meaning "beginning of line" instead of "beginning of string." By default, the ^ and $ anchors are relative to the whole input, and embedded newlines are ignored. Advanced regular expressions support options that make the ^ and $ anchors line-oriented. They also add the \A and \Z anchors that always match the beginning and end of the string, respectively.
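The effect of anchors shows up clearly in the return value of regexp. This sketch assumes Tcl 8.1 or later for the \t escape inside braces, and the test strings are invented.

```tcl
set a1 [regexp {^[ \t]+} "  indented"]     ;# 1: leading spaces
set a2 [regexp {^[ \t]+} "not indented"]   ;# 0: the space is not at the start
set a3 [regexp {[0-9]+} "abc123"]          ;# 1: unanchored, matches inside
set a4 [regexp {^[0-9]+$} "abc123"]        ;# 0: whole string must be digits
set a5 [regexp {^[0-9]+$} "12345"]         ;# 1

puts "$a1 $a2 $a3 $a4 $a5"
```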

Backslash Quoting
Use the backslash character to turn off these special characters:

    . * ? + [ ] ( ) ^ $ | \

For example, to match the plus character, you will need:

    \+

Remember that this quoting is not necessary inside a bracketed expression (i.e., a character set definition). For example, to match either plus or question mark, either of these patterns will work:

    (\+|\?)
    [+?]

To match a single backslash, you need two. You must do this everywhere, even inside a bracketed expression. Or you can use \B, which was added as part of advanced regular expressions. Both of these match a single backslash:

    \\
    \B

Unknown backslash sequences are an error.

Versions of Tcl before 8.1 ignored unknown backslash sequences in regular expressions. For example, \= was just =, and \w was just w. Even \n was just n, which was probably frustrating to many beginners trying to get a newline into their pattern. Advanced regular expressions add backslash sequences for tab, newline, character classes, and more. This is a convenient improvement, but in rare cases it may change the semantics of a pattern. Usually these cases are where an unneeded backslash suddenly takes on meaning, or causes an error because it is unknown.
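The following sketch exercises the quoting rules above under Tcl 8.1 or later. The sample strings are illustrative, and \q is used here only as an example of a letter escape that the engine does not define.

```tcl
set b1 [regexp {\+} "1+2"]         ;# 1: backslash turns off the special +
set b2 [regexp {[+?]} "1+2"]       ;# 1: no quoting needed inside brackets
set b3 [regexp {\\} {C:\tmp}]      ;# 1: two backslashes match one
set b4 [regexp {\B} {C:\tmp}]      ;# 1: \B also matches a backslash

# An unknown letter escape raises an error, which catch reports as 1.
set b5 [catch {regexp {\q} abc} msg]

puts "$b1 $b2 $b3 $b4 $b5"
```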

Matching Precedence
If a pattern can match several parts of a string, the matcher takes the match that occurs earliest in the input string. Then, if there is more than one match from that same point because of alternation in the pattern, the matcher takes the longest possible match. The rule of thumb is: first, then longest. This rule gets changed by nongreedy quantifiers that prefer a shorter match.

Watch out for *, which means zero or more, because zero of anything is pretty easy to match. Suppose your pattern is:

    [a-z]*

This pattern will match against 123abc, but not how you expect. Instead of matching on the letters in the string, the pattern will match on the zero-length substring at the very beginning of the input string! This behavior can be seen by using the -indices option of the regexp command described on page 148. This option tells you the location of the matching string instead of the value of the matching string.
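Here is a short sketch of the zero-length match described above, with the + quantifier shown for contrast. Variable names are arbitrary.

```tcl
# The * quantifier is satisfied by zero letters at the very start.
regexp {[a-z]*} 123abc empty
regexp -indices {[a-z]*} 123abc emptyIdx

# Requiring at least one letter finds the letters.
regexp {[a-z]+} 123abc word
regexp -indices {[a-z]+} 123abc wordIdx

puts "empty=<$empty> at {$emptyIdx}"
puts "word=<$word> at {$wordIdx}"     ;# word=<abc> at {3 5}
```

For the zero-length match, the end index reported by -indices is one less than the start index.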

Capturing Subpatterns
Use parentheses to capture a subpattern. The string that matches the pattern within parentheses is remembered in a matching variable, which is a Tcl variable that gets assigned the string that matches the pattern. Using parentheses to capture subpatterns is very useful. Suppose we want to get everything between the <td> and </td> tags in some HTML. You can use this pattern:

    <td>([^<]*)</td>

The matching variable gets assigned the part of the input string that matches the pattern inside the parentheses. You can capture many subpatterns in one match, which makes it a very efficient way to pick apart your data. Matching variables are explained in more detail on page 148 in the context of the regexp command.

Sometimes you need to introduce parentheses but you do not care about the match that occurs inside them. The pattern is slightly more efficient if the matcher does not need to remember the match. Advanced regular expressions add noncapturing parentheses with this syntax:

    (?:pattern)
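As a sketch of capturing and noncapturing parentheses, the following picks table cells out of a made-up HTML fragment. The noncapturing form assumes Tcl 8.1 or later.

```tcl
set html {<tr><td>Tcl</td><td>8.2</td></tr>}

# Two capturing groups fill two matching variables in one command.
regexp {<td>([^<]*)</td><td>([^<]*)</td>} $html match name version
puts "name=$name version=$version"

# The first cell is grouped but not captured, so the only
# subpattern variable receives the second cell.
regexp {<td>(?:[^<]*)</td><td>([^<]*)</td>} $html match second
puts "second=$second"
```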

Advanced Regular Expressions

The syntax added by advanced regular expressions is mostly just shorthand notation for constructs you can make with the basic syntax already described. There are also some new features that add additional power: nongreedy quantifiers, back references, look-ahead patterns, and named character classes. If you are just starting out with regular expressions, you can ignore most of this section, except for the one about backslash sequences. Once you master the basics, or if you are already familiar with regular expressions in Tcl (or the UNIX vi editor or grep utility), then you may be interested in the new features of advanced regular expressions.

Compatibility with Patterns in Tcl 8.0

Advanced regular expressions add syntax in an upward compatible way. Old patterns continue to work with the new matcher, but advanced regular expressions will raise errors if given to old versions of Tcl. For example, the question mark is used in many of the new constructs, and it is artfully placed in locations that would not be legal in older versions of regular expressions. The added syntax is summarized in Table 11-2 on page 145.

If you have unbraced patterns from older code, they are very likely to be correct in Tcl 8.1 and later versions. For example, the following pattern picks out everything up to the next newline. The pattern is unbraced, so Tcl substitutes the newline character for each occurrence of \n. The square brackets are quoted so that Tcl does not think they delimit a nested command:

    regexp "(\[^\n\]+)\n" $input

The above command behaves identically when using advanced regular expressions, although you can now also write it like this:

    regexp {([^\n]+)\n} $input

The curly braces hide the brackets from the Tcl parser, so they do not need to be escaped with backslash. This saves us two characters and looks a bit cleaner.

Backslash Escape Sequences

The most significant change in advanced regular expression syntax is backslash substitutions. In Tcl 8.0 and earlier, a backslash was only used to turn off special characters such as: . + * ? [ ]. Otherwise it was ignored. For example, \n was simply n to the Tcl 8.0 regular expression engine. This was a source of confusion, and it meant you could not always quote patterns in braces to hide their special characters from Tcl's parser. In advanced regular expressions, \n now means the newline character to the regular expression engine, so you should never need to let Tcl do backslash processing. Again, always group your pattern with curly braces to avoid confusion.

Advanced regular expressions add a lot of new backslash sequences. They are listed in Table 11-4 on page 146. Some of the more useful ones include \s, which matches space-like characters, \w, which matches letters, digits, and the underscore, \y, which matches the beginning or end of a word, and \B, which matches a backslash.

Character Classes
Character classes are names for sets of characters. The named character class syntax is valid only inside a bracketed character set. The syntax is

    [:identifier:]

For example, alpha is the name for the set of uppercase and lowercase letters. The following two patterns are almost the same:

    [A-Za-z]
    [[:alpha:]]

The difference is that the alpha character class also includes accented characters. If you match data that contains non-ASCII characters, the named character classes are more general than trying to name the characters explicitly.

There are also backslash sequences that are shorthand for some of the named character classes. The following patterns to match digits are equivalent:

    [0-9]
    [[:digit:]]
    \d

The following patterns match space-like characters including backspace, form feed, newline, carriage return, tab, and vertical tab:

    [ \b\f\n\r\t\v]
    [[:space:]]
    \s

The named character classes and the associated backslash sequence are listed in Table 11-3 on page 146.

You can use character classes in combination with other characters or character classes inside a character set definition. The following patterns match letters, digits, and underscore:

    [[:digit:][:alpha:]_]
    [\d[:alpha:]_]
    [[:alnum:]_]
    \w

Note that \d, \s and \w can be used either inside or outside character sets. When used outside a bracketed expression, they form their own character set. There are also \D, \S, and \W, which are the complement of \d, \s, and \w. These escapes (i.e., \D for not-a-digit) cannot be used inside a bracketed character set.

There are two special character classes, [[:<:]] and [[:>:]], that match the beginning and end of a word, respectively. A word is defined as one or more characters that match \w.
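The named classes and their backslash shorthands can be compared side by side. This sketch assumes Tcl 8.1 or later, and the test strings are invented.

```tcl
set c1 [regexp {^[[:alpha:]]+$} "hello"]      ;# 1: letters only
set c2 [regexp {^[[:digit:]]+$} "2024"]       ;# 1: named class
set c3 [regexp {^\d+$} "2024"]                ;# 1: same thing with \d
set c4 [regexp {^[[:alnum:]_]+$} "my_var1"]   ;# 1: letters, digits, underscore
set c5 [regexp {^\w+$} "my var"]              ;# 0: the space is not in \w
set c6 [regexp {^\D+$} "abc"]                 ;# 1: \D outside brackets

puts "$c1 $c2 $c3 $c4 $c5 $c6"
```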

Nongreedy Quantifiers
The *, +, and ? characters are quantifiers that specify repetition. By default these match as many characters as possible, which is called greedy matching. A nongreedy match will match as few characters as possible. You can specify nongreedy matching by putting a question mark after these quantifiers. Consider the pattern to match "one or more of not-a-newline followed by a newline." The not-a-newline must be explicit with the greedy quantifier, as in:

    [^\n]+\n

Otherwise, if the pattern were just

    .+\n

then the "." could well match newlines, so the pattern would greedily consume everything until the very last newline in the input. A nongreedy match would be satisfied with the very first newline instead:

    .+?\n

By using the nongreedy quantifier we've cut the pattern from eight characters to five. Another example that is shorter with a nongreedy quantifier is the HTML example from page 138. The following pattern also matches everything between <td> and </td>:

    <td>(.*?)</td>

Even ? can be made nongreedy, ??, which means it prefers to match zero instead of one. This only makes sense inside the context of a larger pattern. Send me e-mail if you have a compelling example for it!
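The difference between greedy and nongreedy quantifiers is easiest to see on input that contains more than one possible ending point. This sketch assumes Tcl 8.1 or later. The sample data is invented.

```tcl
set html {<td>one</td><td>two</td>}

# Greedy: .* runs to the last </td> in the input.
regexp {<td>(.*)</td>} $html match greedy
puts $greedy        ;# one</td><td>two

# Nongreedy: .*? stops at the first </td>.
regexp {<td>(.*?)</td>} $html match lazy
puts $lazy          ;# one

# The same idea applied to lines of text.
set input "line one\nline two\n"
regexp {.+?\n} $input firstLine
puts [string trimright $firstLine]    ;# line one
```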

Bound Quantifiers
The {m,n} syntax is a quantifier that means match at least m and at most n of the previous matching item. There are two variations on this syntax. A simple {m} means match exactly m of the previous matching item. A {m,} means match m or more of the previous matching item. All of these can be made nongreedy by adding a ? after them.
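The three bound forms can be sketched as follows, assuming Tcl 8.1 or later. The phone-number style pattern and test strings are illustrative only.

```tcl
set q1 [regexp {^\d{3}-\d{4}$} 555-1234]   ;# 1: exactly 3, then exactly 4
set q2 [regexp {^\d{3}-\d{4}$} 55-1234]    ;# 0: only two leading digits
set q3 [regexp {^a{2,}$} aaaa]             ;# 1: two or more
set q4 [regexp {^a{2,3}$} aaaa]            ;# 0: four is more than three

# Greedy takes as many as allowed; nongreedy takes as few as allowed.
regexp {a{2,3}} aaaa most                  ;# most is aaa
regexp {a{2,3}?} aaaa least                ;# least is aa

puts "$q1 $q2 $q3 $q4 $most $least"
```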

Back References
A back reference is a feature you cannot easily get with basic regular expressions. A back reference matches the value of a subpattern captured with parentheses. If you have several sets of parentheses you can refer back to different captured expressions with \1, \2, and so on. You count by left parentheses to determine the reference.

For example, suppose you want to match a quoted string, where you can use either single or double quotes. You need to use an alternation of two patterns to match strings that are enclosed in double quotes or in single quotes:

    ("[^"]*"|'[^']*')

With a back reference, \1, the pattern becomes simpler:

    ('|").*?\1

The first set of parentheses matches the leading quote, and then the \1 refers back to that particular quote character. The nongreedy quantifier ensures that the pattern matches up to the first occurrence of the matching quote.
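The two quoted-string patterns can be compared on a string that mixes both kinds of quote. This is a sketch assuming Tcl 8.1 or later. The sample text is chosen so that the opening double quote has exactly one partner, and the apostrophe inside it is ignored because \1 must repeat the quote that opened the string.

```tcl
set text {say "it's here" now}

# Alternation of two explicit patterns.
regexp {("[^"]*"|'[^']*')} $text viaAlt

# Back reference: \1 must be the same quote character that opened.
regexp {('|").*?\1} $text viaRef quote

puts $viaAlt     ;# "it's here"
puts $viaRef     ;# "it's here"
puts $quote      ;# "
```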

Look-ahead

Look-ahead patterns are subexpressions that are matched but do not consume any of the input. They act like constraints on the rest of the pattern, and they typically occur at the end of your pattern. A positive look-ahead causes the pattern to match if it also matches. A negative look-ahead causes the pattern to match if it would not match. These constraints make more sense in the context of matching variables and in regular expression substitutions done with the regsub command. For example, the following pattern matches a filename that begins with A and ends with .txt:

    ^A.*\.txt$

The next version of the pattern adds parentheses to group the file name suffix.

    ^A.*(\.txt)$

The parentheses are not strictly necessary, but they are introduced so that we can compare the pattern to one that uses look-ahead. A version of the pattern that uses look-ahead looks like this:

    ^A.*(?=\.txt$)

The $ anchor moves inside the look-ahead. Because a look-ahead consumes no input, a $ placed after (?=\.txt) would have to match at the very point where .txt begins, and the pattern could never match. The pattern with the look-ahead constraint matches only the part of the filename before the .txt, but only if the .txt is present. In other words, the .txt is not consumed by the match. This is visible in the value of the matching variables used with the regexp command. It would also affect the substitutions done in the regsub command.

There is negative look-ahead too. The following pattern matches a filename that begins with A and does not end with .txt.

    ^A(?!.*\.txt$).*$

Here the constraint is placed right after the A, where it can examine the rest of the name. A (?!\.txt) placed just before the final $ would always succeed, because nothing follows the end of the string. Writing this pattern without negative look-ahead is awkward.
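The following sketch shows look-ahead with regexp, assuming Tcl 8.1 or later. The file names are made up. Note that the $ anchor is written inside the look-ahead here: a look-ahead consumes no input, so the end-of-string test has to be part of the constraint itself.

```tcl
# Positive look-ahead: match the name only if .txt follows,
# but leave .txt out of the match.
set p1 [regexp {^A.*(?=\.txt$)} "Alpha.txt" base]
puts "$p1 $base"                ;# 1 Alpha

set p2 [regexp {^A.*(?=\.txt$)} "Alpha.doc"]
puts $p2                        ;# 0

# Negative look-ahead: begins with A and does not end with .txt.
set n1 [regexp {^A(?!.*\.txt$).*$} "Alpha.doc" whole]
puts "$n1 $whole"               ;# 1 Alpha.doc

set n2 [regexp {^A(?!.*\.txt$).*$} "Alpha.txt"]
puts $n2                        ;# 0
```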

Character Codes
The \nn and \mmm syntax, where n and m are digits, can also mean an 8-bit character code corresponding to the octal value nn or mmm. This has priority over a back reference. However, I just wouldn't use this notation for character codes. Instead, use the Unicode escape sequence, \unnnn, which specifies a 16-bit value. The \xnn sequence also specifies an 8-bit character code. Unfortunately, the \x escape consumes all hex digits after it (not just two!) and then truncates the hexadecimal value down to 8 bits. This misfeature of \x is not considered a bug and will probably not change even in future versions of Tcl. The \Uyyyyyyyy syntax is reserved for 32-bit Unicode, but I don't expect to see that implemented anytime soon.

Collating Elements
Collating elements are characters or long names for characters that you can use inside character sets. Currently, Tcl only has some long names for various ASCII punctuation characters. Potentially, it could support names for every Unicode character, but it doesn't because the mapping tables would be huge. This section will briefly mention the syntax so that you can understand it if you see it. But its usefulness is still limited. Within a bracketed expression, the following syntax is used to specify a collating element: [.identifier.] The identifier can be a character or a long name. The supported long names can be found in the generic/regc_locale.c file in the Tcl source code distribution. A few examples are shown below: [.c.] [.#.] [.number-sign.]

Equivalence Classes
An equivalence class is all characters that sort to the same position. This is another feature that has limited usefulness in the current version of Tcl. In Tcl, characters sort by their Unicode character value, so there are no equivalence classes that contain more than one character! However, you could imagine an equivalence class for 'o' and the accented versions of the letter o. The syntax for equivalence classes within bracketed expressions is:

    [=char=]

where char is any one of the characters in the equivalence class. This syntax is valid only inside a bracketed character set.

Newline Sensitive Matching

By default, the newline character is just an ordinary character to the matching engine. You can make the newline character special with two options: lineanchor and linestop. You can set these options with flags to the regexp and regsub Tcl commands, or you can use the embedded options described later in Table 11-5 on page 147.

The lineanchor option makes the ^ and $ anchors work relative to newlines. The ^ matches immediately after a newline, and $ matches immediately before a newline. These anchors continue to match the very beginning and end of the input, too. With or without the lineanchor option, you can use \A and \Z to match the beginning and end of the string.

The linestop option prevents . (i.e., period) and character sets that begin with ^ from matching a newline character. In other words, unless you explicitly include \n in your pattern, it will not match across newlines.
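The two options can be tried with the -lineanchor, -linestop, and -line flags to regexp. This is a sketch with a made-up three-line input, assuming Tcl 8.1 or later.

```tcl
set text "first line\nsecond line\nthird line"

# Without -lineanchor, ^ only matches at the start of the whole input.
set s1 [regexp {^second} $text]                 ;# 0
set s2 [regexp -lineanchor {^second} $text]     ;# 1

# Without -linestop, . happily crosses newlines.
regexp {first.*} $text all
regexp -linestop {first.*} $text oneLine        ;# oneLine is "first line"

# -line turns on both behaviors at once.
regexp -line {^s.*$} $text middle               ;# middle is "second line"

puts "$s1 $s2"
puts $oneLine
puts $middle
```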

Embedded Options
You can start a pattern with embedded options to turn on or off case sensitivity, newline sensitivity, and expanded syntax, which is explained in the next section. You can also switch from advanced regular expressions to a literal string, or to older forms of regular expressions. The syntax is a leading: (?chars) where chars is any number of option characters. The option characters are listed in Table 11-5 on page 147.
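As a sketch of embedded options, the following uses the i, n, and q option letters from Table 11-5. It assumes Tcl 8.1 or later, and the test strings are invented.

```tcl
# (?i) turns on case insensitive matching for the rest of the pattern.
set e1 [regexp {hello} "Say HELLO"]         ;# 0
set e2 [regexp {(?i)hello} "Say HELLO"]     ;# 1

# (?n) turns on newline sensitive matching, so ^ and $ are line-oriented.
set text "first\nsecond\nthird"
set e3 [regexp {^second$} $text]            ;# 0
set e4 [regexp {(?n)^second$} $text]        ;# 1

# (?q) treats the rest of the pattern as a literal string.
set e5 [regexp {(?q)a.c} "abc"]             ;# 0: the period is literal
set e6 [regexp {(?q)a.c} "a.c"]             ;# 1

puts "$e1 $e2 $e3 $e4 $e5 $e6"
```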

Expanded Syntax
Expanded syntax lets you include comments and extra white space in your patterns. This can greatly improve the readability of complex patterns. Expanded syntax is turned on with a regexp command option or an embedded option.

Comments start with a # and run until the end of line. Extra white space and comments can occur anywhere except inside bracketed expressions (i.e., character sets) or within multicharacter syntax elements like (?=. When you are in expanded mode, you can turn off the comment character or include an explicit space by preceding them with a backslash. Example 11-1 shows a pattern to match URLs. The leading (?x) turns on expanded syntax. The whole pattern is grouped in curly braces to hide it from Tcl. This example is considered again in more detail in Example 11-3 on page 150:

Example 11-1 Expanded regular expressions allow comments.

    regexp {(?x)            # A pattern to match URLs
        ([^:]+):            # The protocol before the initial colon
        //([^:/]+)          # The server name
        (:([0-9]+))?        # The optional port number
        (/.*)               # The trailing pathname
    } $input
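To see what the expanded URL pattern captures, here is a usage sketch that adds matching variables. The URL and the variable names (protocol, server, x, port, path) are assumptions made for illustration. The variable x receives the optional group that includes the colon, which is only there so that the ? quantifier can apply to it.

```tcl
set input http://www.example.com:8080/index.html

set ok [regexp {(?x)        # A pattern to match URLs
    ([^:]+):                # The protocol before the initial colon
    //([^:/]+)              # The server name
    (:([0-9]+))?            # The optional port number
    (/.*)                   # The trailing pathname
} $input match protocol server x port path]

puts "ok=$ok"
puts "protocol=$protocol server=$server port=$port path=$path"
```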

Syntax Summary
Table 11-1 summarizes the syntax of regular expressions available in all versions of Tcl:

Table 11-1. Basic regular expression syntax.

.        Matches any character.
*        Matches zero or more instances of the previous pattern item.
+        Matches one or more instances of the previous pattern item.
?        Matches zero or one instances of the previous pattern item.
( )      Groups a subpattern. The repetition and alternation operators apply to the preceding subpattern.
|        Alternation.
[ ]      Delimit a set of characters. Ranges are specified as [x-y]. If the first character in the set is ^, then there is a match if the remaining characters in the set are not present.
^        Anchor the pattern to the beginning of the string. Only when first.
$        Anchor the pattern to the end of the string. Only when last.

Advanced regular expressions, which were introduced in Tcl 8.1, add more syntax that is summarized in Table 11-2:

Table 11-2. Additional advanced regular expression syntax.

{m}      Matches m instances of the previous pattern item.
{m}?     Matches m instances of the previous pattern item. Nongreedy.
{m,}     Matches m or more instances of the previous pattern item.
{m,}?    Matches m or more instances of the previous pattern item. Nongreedy.
{m,n}    Matches m through n instances of the previous pattern item.
{m,n}?   Matches m through n instances of the previous pattern item. Nongreedy.
*?       Matches zero or more instances of the previous pattern item. Nongreedy.
+?       Matches one or more instances of the previous pattern item. Nongreedy.
??       Matches zero or one instances of the previous pattern item. Nongreedy.
(?:re)   Groups a subpattern, re, but does not capture the result.
(?=re)   Positive look-ahead. Matches the point where re begins.
(?!re)   Negative look-ahead. Matches the point where re does not begin.
(?abc)   Embedded options, where abc is any number of option letters listed in Table 11-5.
\c       One of many backslash escapes listed in Table 11-4.
[: :]    Delimits a character class within a bracketed expression. See Table 11-3.
[. .]    Delimits a collating element within a bracketed expression.
[= =]    Delimits an equivalence class within a bracketed expression.

Table 11-3 lists the named character classes defined in advanced regular expressions and their associated backslash sequences, if any. Character class names are valid inside bracketed character sets with the [:class:] syntax.

Table 11-3. Character classes.

alnum    Upper and lower case letters and digits.
alpha    Upper and lower case letters.
blank    Space and tab.
cntrl    Control characters: \u0001 through \u001F.
digit    The digits zero through nine. Also \d.
graph    Printing characters that are not in cntrl or space.
lower    Lowercase letters.
print    The same as alnum.
punct    Punctuation characters.
space    Space, newline, carriage return, tab, vertical tab, form feed. Also \s.
upper    Uppercase letters.
xdigit   Hexadecimal digits: zero through nine, a-f, A-F.

Table 11-4 lists backslash sequences supported in Tcl 8.1.

Table 11-4. Backslash escapes in regular expressions.

\a       Alert, or "bell", character.
\A       Matches only at the beginning of the string.
\b       Backspace character, \u0008.
\B       Synonym for backslash.
\cX      Control-X.
\d       Digits. Same as [[:digit:]]
\D       Not a digit. Same as [^[:digit:]]
\e       Escape character, \u001B.
\f       Form feed, \u000C.
\m       Matches the beginning of a word.
\M       Matches the end of a word.
\n       Newline, \u000A.
\r       Carriage return, \u000D.
\s       Space. Same as [[:space:]]
\S       Not a space. Same as [^[:space:]]
\t       Horizontal tab, \u0009.
\uXXXX   A 16-bit Unicode character code.
\v       Vertical tab, \u000B.
\w       Letters, digits, and underscore. Same as [[:alnum:]_]
\W       Not a letter, digit, or underscore. Same as [^[:alnum:]_]
\xhh     An 8-bit hexadecimal character code. Consumes all hex digits after \x.
\y       Matches the beginning or end of a word.
\Y       Matches a point that is not the beginning or end of a word.
\Z       Matches the end of the string.
\0       NULL, \u0000
\x       Where x is a digit, this is a back-reference.
\xy      Where x and y are digits, either a decimal back-reference, or an 8-bit octal character code.
\xyz     Where x, y and z are digits, either a decimal back-reference or an 8-bit octal character code.

Table 11-5 lists the embedded option characters used with the (?abc) syntax.

Table 11-5. Embedded option characters used with the (?x) syntax.

b        The rest of the pattern is a basic regular expression (a la vi or grep).
c        Case sensitive matching. This is the default.
e        The rest of the pattern is an extended regular expression (a la Tcl 8.0).
i        Case insensitive matching.
m        Synonym for the n option.
n        Newline sensitive matching. Both lineanchor and linestop mode.
p        Partial newline sensitive matching. Only linestop mode.
q        The rest of the pattern is a literal string.
s        No newline sensitivity. This is the default.
t        Tight syntax; no embedded comments. This is the default.
w        Inverse partial newline-sensitive matching. Only lineanchor mode.
x        Expanded syntax with embedded white space and comments.

The regexp Command

The regexp command provides direct access to the regular expression matcher. Not only does it tell you whether a string matches a pattern, it can also extract one or more matching substrings. The return value is 1 if some part of the string matches the pattern; it is 0 otherwise. Its syntax is:

    regexp ?flags? pattern string ?match sub1 sub2...?

The flags are described in Table 11-6:

Table 11-6. Options to the regexp command.

-nocase       Lowercase characters in pattern can match either lowercase or uppercase letters in string.
-indices      The match variables each contain a pair of numbers that are indices delimiting the match within string. Otherwise, the matching string itself is copied into the match variables.
-expanded     The pattern uses the expanded syntax discussed on page 144.
-line         The same as specifying both -lineanchor and -linestop.
-lineanchor   Change the behavior of ^ and $ so they are line-oriented as discussed on page 143.
-linestop     Change matching so that . and character classes do not match newlines as discussed on page 143.
-about        Useful for debugging. It returns information about the pattern instead of trying to match it against the input.
--            Signals the end of the options. You must use this if your pattern begins with -.

The pattern argument is a regular expression as described earlier. If string matches pattern, then the results of the match are stored in the variables named in the command. These match variable arguments are optional. If present, match is set to be the part of the string that matched the pattern. The remaining variables are set to be the substrings of string that matched the corresponding subpatterns in pattern. The correspondence is based on the order of left parentheses in the pattern to avoid ambiguities that can arise from nested subpatterns.

Example 11-2 uses regexp to pick the hostname out of the DISPLAY environment variable, which has the form:

    hostname:display.screen

Example 11-2 Using regular expressions to parse a string.

    set env(DISPLAY) sage:0.1
    regexp {([^:]*):} $env(DISPLAY) match host
    => 1
    set match
    => sage:
    set host
    => sage

The pattern involves a complementary set, [^:], to match anything except a colon. It uses repetition, *, to repeat that zero or more times. It groups that part into a subexpression with parentheses. The literal colon ensures that the DISPLAY value matches the format we expect. The part of the string that matches the complete pattern is stored into the match variable. The part that matches the subpattern is stored into host. The whole pattern has been grouped with braces to quote the square brackets. Without braces it would be:

    regexp (\[^:\]*): $env(DISPLAY) match host

With advanced regular expressions the nongreedy quantifier *? can replace the complementary set:

    regexp (.*?): $env(DISPLAY) match host

This is quite a powerful statement, and it is efficient. If we only had the string command to work with, we would need to resort to the following, which takes roughly twice as long to interpret:

    set i [string first : $env(DISPLAY)]
    if {$i >= 0} {
        set host [string range $env(DISPLAY) 0 [expr {$i-1}]]
    }
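The -indices flag from Table 11-6 can be tried on the same DISPLAY value. The following is a minimal sketch, not taken from the book: the variable names display and hostname are assumptions chosen for illustration, and the value is set locally rather than read from env.

```tcl
# Hypothetical DISPLAY value, set locally so the sketch is self-contained.
set display sage:0.1

# With -indices, each match variable receives a {first last} index pair
# instead of the matched text. The -- marks the end of the flags.
regexp -indices -- {([^:]*):} $display match host
# match is now "0 4" (the span of "sage:")
# host  is now "0 3" (the span of "sage")

# The index pair can be handed to string range to recover the text.
set hostname [string range $display [lindex $host 0] [lindex $host 1]]
```

Indices are handy when you need to know where a match occurred, for example to continue scanning the rest of the string after it.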

A Pattern to Match URLs

Example 11-3 demonstrates a pattern with several subpatterns that extract the different parts of a URL. There are lots of subpatterns, and you can determine which match variable is associated with which subpattern by counting the left parentheses. The pattern will be discussed in more detail after the example:

Example 11-3 A pattern to match URLs.

    set url http://www.beedub.com:80/index.html
    regexp {([^:]+)://([^:/]+)(:([0-9]+))?(/.*)} $url \
        match protocol server x port path
    => 1
    set match
    => http://www.beedub.com:80/index.html
    set protocol
    => http
    set server
    => www.beedub.com
    set x
    => :80
    set port
    => 80
    set path
    => /index.html

Let's look at the pattern one piece at a time. The first part looks for the protocol, which is separated by a colon from the rest of the URL. The first part of the pattern is one or more characters that are not a colon, followed by a colon. This matches the http: part of the URL:

    [^:]+:

Using the nongreedy +? quantifier, you could also write that as:

    .+?:

The next part of the pattern looks for the server name, which comes after two slashes. The server name is followed either by a colon and a port number, or by a slash. The pattern uses a complementary set that specifies one or more characters that are not a colon or a slash. This matches the //www.beedub.com part of the URL:

    //[^:/]+

The port number is optional, so a subpattern is delimited with parentheses and followed by a question mark. An additional set of parentheses is added to capture the port number without the leading colon. This matches the :80 part of the URL:

    (:([0-9]+))?

The last part of the pattern is everything else, starting with a slash. This matches the /index.html part of the URL:

    /.*

Use subpatterns to parse strings.

To make this pattern really useful, we delimit several subpatterns with parentheses:

    ([^:]+)://([^:/]+)(:([0-9]+))?(/.*)

These parentheses do not change the way the pattern matches. Only the optional port number really needs the parentheses in this example. However, the regexp command gives us access to the strings that match these subpatterns. In one step regexp can test for a valid URL and divide it into the protocol part, the server, the port, and the trailing path. The parentheses around the port number include the : before the digits. We've used a dummy variable that gets the : and the port number, and another match variable that just gets the port number. By using noncapturing parentheses in advanced regular expressions, we can eliminate the unused match variable. We can also replace both complementary character sets with a nongreedy .+? match. Example 11-4 shows this variation:

Example 11-4 An advanced regular expression to match URLs.

    set url http://www.beedub.com:80/book/
    regexp {(.+?)://(.+?)(?::([0-9]+))?(/.*)} $url \
        match protocol server port path
    => 1
    set match
    => http://www.beedub.com:80/book/
    set protocol
    => http
    set server
    => www.beedub.com
    set port
    => 80
    set path
    => /book/
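The URL pattern is easier to reuse once it is wrapped in a procedure. The sketch below is one possible packaging, not code from the book: the name UrlSplit is hypothetical, the pattern is the one from Example 11-3 with ^ and $ anchors added (an assumption, so that the whole string must be a URL), and the port is left empty when the URL does not specify one.

```tcl
# UrlSplit is a hypothetical helper built on the pattern from Example 11-3.
# It returns a list: protocol server port path.
proc UrlSplit {url} {
    if {![regexp {^([^:]+)://([^:/]+)(:([0-9]+))?(/.*)$} $url \
            match protocol server x port path]} {
        error "not a URL: $url"
    }
    # If the optional port subpattern did not take part in the match,
    # regexp sets x and port to the empty string.
    return [list $protocol $server $port $path]
}

set parts [UrlSplit http://www.beedub.com:80/index.html]
# parts is: http www.beedub.com 80 /index.html

set noPort [UrlSplit http://www.beedub.com/index.html]
# The third element of noPort is the empty string.
```

Returning a list keeps the caller free to pick the parts apart with lindex or foreach.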

Sample Regular Expressions

The table in this section lists regular expressions as you would use them in Tcl commands. Most are quoted with curly braces to turn off the special meaning of square brackets and dollar signs. Other patterns are grouped with double quotes and use backslash quoting because the patterns include backslash sequences like \n and \t. In Tcl 8.0 and earlier, these must be substituted by Tcl before the regexp command is called. In these cases, the equivalent advanced regular expression is also shown.

Table 11-7. Sample regular expressions.

{^[yY]}                        Begins with y or Y, as in a Yes answer.
{^(yes|YES|Yes)$}              Exactly "yes", "Yes", or "YES".
"^\[^ \t:\]+:"                 Begins with colon-delimited field that has no spaces or tabs.
{^\S+:}                        Same as above, using \S for "not space".
"^\[ \t]*$"                    A string of all spaces or tabs.
{(?n)^\s*$}                    A blank line using newline sensitive mode.
"(\n|^)\[^\n\]*(\n|$)"         A blank line, the hard way.
{^[A-Za-z]+$}                  Only letters.
{^[[:alpha:]]+$}               Only letters, the Unicode way.
{[A-Za-z0-9_]+}                Letters, digits, and the underscore.
{\w+}                          Letters, digits, and the underscore using \w.
{[][${}\\]}                    The set of Tcl special characters: ] [ $ { } \
"\[^\n\]*\n"                   Everything up to a newline.
{.*?\n}                        Everything up to a newline using nongreedy *?
{\.}                           A period.
{[][$^?+*()|\\]}               The set of regular expression special characters: ] [ $ ^ ? + * ( ) | \
<H1>(.*?)</H1>                 An H1 HTML tag. The subpattern matches the string between the tags.
<!--.*?-->                     HTML comments.
{[0-9a-fA-F][0-9a-fA-F]}       2 hex digits.
{[[:xdigit:]]{2}}              2 hex digits, using advanced regular expressions.
{\d{1,3}}                      1 to 3 digits, using advanced regular expressions.
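A few of the sample patterns can be exercised directly with regexp. This is a small sketch; the procedure name IsYes and the test strings are assumptions made up for illustration.

```tcl
# IsYes is a hypothetical wrapper around the first pattern in the table.
proc IsYes {answer} {
    return [regexp {^[yY]} $answer]
}
set a [IsYes Yup]                          ;# begins with Y
set b [regexp {^(yes|YES|Yes)$} yes!]      ;# trailing ! defeats the $ anchor
set c [regexp {^[[:alpha:]]+$} abc1]       ;# the digit is not a letter

# Bound quantifier: at most three digits are taken from the first run.
regexp {\d{1,3}} "room 1234" digits

# The subpattern captures the text between the H1 tags.
regexp {<H1>(.*?)</H1>} "<H1>My Home</H1>" match title
```

Here a is 1, while b and c are 0. The digits variable holds the first three digits of the number, and title holds the heading text.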


The regsub Command

The regsub command does string substitution based on pattern matching. It is very useful for processing your data. It can perform simple tasks like replacing sequences of spaces and tabs with a single space. It can perform complex data transforms, too, as described in the next section. Its syntax is:

    regsub ?switches? pattern string subspec varname

The regsub command returns the number of matches and replacements, or 0 if there was no match. regsub copies string to varname, replacing occurrences of pattern with the substitution specified by subspec. If the pattern does not match, then string is copied to varname without modification. The optional switches include:

-all, which means to replace all occurrences of the pattern. Otherwise only the first occurrence is replaced.

The -nocase, -expanded, -line, -linestop, and -lineanchor switches are the same as in the regexp command. They are described on page 148.

The -- switch separates the pattern from the switches, which is necessary if your pattern begins with a -.

The replacement pattern, subspec, can contain literal characters as well as the following special sequences:

& is replaced with the string that matched the pattern.

\x, where x is a number, is replaced with the string that matched the corresponding subpattern in pattern. The correspondence is based on the order of left parentheses in the pattern specification.

The following replaces a user's home directory with a ~:

    regsub ^$env(HOME)/ $pathname ~/ newpath

The following constructs a C compile command line given a filename:

    set file tclIO.c
    regsub {([^\.]*)\.c$} $file {cc -c & -o \1.o} ccCmd

The matching pattern captures everything before the trailing .c in the file name. The & is replaced with the complete match, tclIO.c, and \1 is replaced with tclIO, which matches the pattern between the parentheses. The value assigned to ccCmd is:

    cc -c tclIO.c -o tclIO.o

We could execute that with:

    eval exec $ccCmd

The following replaces sequences of multiple space characters with a single space:

    regsub -all {\s+} $string " " string

It is perfectly safe to specify the same variable as the input value and the result. Even if there is no match on the pattern, the input string is copied into the output variable.

The regsub command can count things for us. The following command counts the newlines in some text. In this case the substitution is not important:

    set numLines [regsub -all \n $text {} ignore]
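The return value of regsub is easy to overlook, so here is a short sketch that captures it for each of the cases described above. The sample strings and the variable names count, noMatch, spaces, and lines are assumptions for illustration.

```tcl
# One match: count is 1 and ccCmd holds the rewritten string.
set file tclIO.c
set count [regsub {([^\.]*)\.c$} $file {cc -c & -o \1.o} ccCmd]

# No match: the result is 0 and the input is copied unchanged.
set noMatch [regsub {\.c$} README {.o} copy]

# -all replaces every run of white space; the same variable is
# used for input and output.
set text "one  two \t three"
set spaces [regsub -all {\s+} $text " " text]

# Counting newlines; the substituted text is thrown away.
set lines [regsub -all \n "a\nb\nc\n" {} ignore]
```

After this runs, count is 1, noMatch is 0, spaces is 2 (two runs of white space were collapsed), and lines is 3.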


Transforming Data to Program with regsub

One of the most powerful combinations of Tcl commands is regsub and subst. This section describes a few examples that use regsub to transform data into Tcl commands, and then use subst to replace those commands with a new version of the data. This technique is very efficient because it relies on two subsystems that are written in highly optimized C code: the regular expression engine and the Tcl parser. These examples are primarily written by Stephen Uhler.

URL Decoding
When a URL is transmitted over the network, it is encoded by replacing special characters with a %xx sequence, where xx is the hexadecimal code for the character. In addition, spaces are replaced with a plus (+). It would be tedious and very inefficient to scan a URL one character at a time with Tcl statements to undo this encoding. It would be more efficient to do this with a custom C program, but still very tedious. Instead, a combination of regsub and subst can efficiently decode the URL in just a few Tcl commands.

Replacing the + with spaces requires quoting the + because it is the one-or-more special character in regular expressions:

    regsub -all {\+} $url { } url

The %xx are replaced with a format command that will generate the right character:

    regsub -all {%([0-9a-fA-F][0-9a-fA-F])} $url \
        {[format %c 0x\1]} url

The %c directive to format tells it to generate the character from a character code number. We force a hexadecimal interpretation with a leading 0x. Advanced regular expressions let us write the "2 hex digits" pattern a bit more cleanly:

    regsub -all {%([[:xdigit:]]{2})} $url \
        {[format %c 0x\1]} url

The resulting string is passed to subst to get the format commands substituted:

    set url [subst $url]

For example, if the input is %7ewelch, the result of the regsub is:

    [format %c 0x7e]welch

And then subst generates:

    ~welch

Example 11-5 encapsulates this trick in the Url_Decode procedure.

Example 11-5 The Url_Decode procedure.

    proc Url_Decode {url} {
        regsub -all {\+} $url { } url
        regsub -all {%([[:xdigit:]]{2})} $url \
            {[format %c 0x\1]} url
        return [subst $url]
    }
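To see the regsub and subst stages separately, the sketch below runs the three steps of Url_Decode by hand on a made-up input. The procedure is repeated only so the block runs on its own; the variable names encoded, step1, step2, and decoded are assumptions for illustration.

```tcl
# Copy of Url_Decode from Example 11-5, repeated so this block runs alone.
proc Url_Decode {url} {
    regsub -all {\+} $url { } url
    regsub -all {%([[:xdigit:]]{2})} $url {[format %c 0x\1]} url
    return [subst $url]
}

set encoded hello+world%21

# Step 1: plus signs become spaces.
regsub -all {\+} $encoded { } step1

# Step 2: each %xx becomes a format command that is not yet evaluated.
regsub -all {%([[:xdigit:]]{2})} $step1 {[format %c 0x\1]} step2
# step2 is: hello world[format %c 0x21]

# Step 3: subst runs the embedded commands.
set decoded [subst $step2]
# decoded is: hello world!

# The procedure gives the same result in one call.
set home [Url_Decode %7ewelch]
```

The intermediate value in step2 is data that has been turned into a small program, which is the point of this section.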

CGI Argument Parsing

Example 11-6 builds upon Url_Decode to decode the inputs to a CGI program that processes data from an HTML form. Each form element is identified by a name, and the value is URL encoded. All the names and encoded values are passed to the CGI program in the following format:

    name1=value1&name2=value2&name3=value3

Example 11-6 shows Cgi_List and Cgi_Query. Cgi_Query receives the form data from the standard input or the QUERY_STRING environment variable, depending on whether the form data is transmitted with a POST or GET request. These HTTP operations are described in detail in Chapter 17. Cgi_List uses split to get back a list of names and values, and then it decodes them with Url_Decode. It returns a Tcl-friendly name, value list that you can either iterate through with a foreach command, or assign to an array with array set:

Example 11-6 The Cgi_List and Cgi_Query procedures.

    proc Cgi_List {} {
        set query [Cgi_Query]
        regsub -all {\+} $query { } query
        set result {}
        foreach {x} [split $query &=] {
            lappend result [Url_Decode $x]
        }
        return $result
    }
    proc Cgi_Query {} {
        global env
        if {![info exists env(QUERY_STRING)] ||
                [string length $env(QUERY_STRING)] == 0} {
            if {[info exists env(CONTENT_LENGTH)] &&
                    [string length $env(CONTENT_LENGTH)] != 0} {
                set query [read stdin $env(CONTENT_LENGTH)]
            } else {
                gets stdin query
            }
            set env(QUERY_STRING) $query
            set env(CONTENT_LENGTH) 0
        }
        return $env(QUERY_STRING)
    }

An HTML form can have several form elements with the same name, and this can result in more than one value for each name. If you blindly use array set to map the results of Cgi_List into an array, you will lose the repeated values. Example 11-7 shows Cgi_Parse and Cgi_Value that store the query data in a global cgi array. Cgi_Parse adds list structure whenever it finds a repeated form value. The global cgilist array keeps a record of how many times a form value is repeated. The Cgi_Value procedure returns elements of the global cgi array, or the empty string if the requested value is not present.

Example 11-7 Cgi_Parse and Cgi_Value store query data in the cgi array.

    proc Cgi_Parse {} {
        global cgi cgilist
        catch {unset cgi cgilist}
        set query [Cgi_Query]
        regsub -all {\+} $query { } query
        foreach {name value} [split $query &=] {
            set name [Url_Decode $name]
            if {[info exists cgilist($name)] &&
                    ($cgilist($name) == 1)} {
                # Add second value and create list structure
                set cgi($name) [list $cgi($name) \
                    [Url_Decode $value]]
            } elseif {[info exists cgi($name)]} {
                # Add additional list elements
                lappend cgi($name) [Url_Decode $value]
            } else {
                # Add first value without list structure
                set cgi($name) [Url_Decode $value]
                set cgilist($name) 0 ;# May need to listify
            }
            incr cgilist($name)
        }
        return [array names cgi]
    }
    proc Cgi_Value {key} {
        global cgi
        if {[info exists cgi($key)]} {
            return $cgi($key)
        } else {
            return {}
        }
    }
    proc Cgi_Length {key} {
        global cgilist
        if {[info exists cgilist($key)]} {
            return $cgilist($key)
        } else {
            return 0
        }
    }
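The loss of repeated values with array set is easy to reproduce without a web server. The sketch below works on a hard-coded query string instead of calling Cgi_Query; the query contents and the array names form and multi are assumptions for illustration. The last loop is a simpler alternative to Cgi_Parse that always stores a list, not the book's approach.

```tcl
# Copy of Url_Decode from Example 11-5, repeated so this block runs alone.
proc Url_Decode {url} {
    regsub -all {\+} $url { } url
    regsub -all {%([[:xdigit:]]{2})} $url {[format %c 0x\1]} url
    return [subst $url]
}

# Hypothetical form data with a repeated "lang" element.
set query "name=Brent+Welch&lang=tcl&lang=tk"

# Split on both separators and decode each piece, as Cgi_List does.
set pairs {}
foreach x [split $query &=] {
    lappend pairs [Url_Decode $x]
}
# pairs is: name {Brent Welch} lang tcl lang tk

# array set keeps only the last value for a repeated name.
array set form $pairs
# form(lang) is tk; the value tcl has been overwritten.

# Simplified alternative: collect every value into a list per name.
foreach {name value} $pairs {
    lappend multi($name) $value
}
# multi(lang) is: tcl tk
```

Because this variant always builds a list, a single value such as the name comes back as a one-element list, whereas Cgi_Parse adds list structure only when a second value shows up.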

Decoding HTML Entities

The next example is a decoder for HTML entities. In HTML, special characters are encoded as entities. If you want a literal < or > in your document, you encode them as the entities &lt; and &gt;, respectively, to avoid conflict with the <tag> syntax used in HTML. HTML syntax is briefly described in Chapter 3 on page 32. Characters with codes above 127 like copyright and egrave are also encoded. There are named entities, like &lt; for < and &egrave; for è. You can also use decimal-valued entities such as &#169; for ©. Finally, the trailing semicolon is optional, so &lt or &lt; can both be used to encode <.

The entity decoder is similar to Url_Decode. In this case, however, we need to be more careful with subst. The text passed to the decoder could contain special characters like a square bracket or dollar sign. With Url_Decode we can rely on those special characters being encoded as, for example, %24. Entity encoding is different (do not ask me why URLs and HTML have different encoding standards), and dollar signs and square brackets are not necessarily encoded. This requires an additional pass to quote these characters. This regsub puts a backslash in front of all the brackets, dollar signs, and backslashes.

    regsub -all {[][$\\]} $text {\\&} new

The decimal encoding (e.g., &#169;) is also more awkward than the hexadecimal encoding used in URLs. We cannot force a decimal interpretation of a number in Tcl. In particular, if the entity has a leading zero (e.g., &#010;) then Tcl interprets the value (e.g., 010) as octal. The scan command is used to do a decimal interpretation. It scans into a temporary variable, and set is used to get that value:

    regsub -all {&#([0-9][0-9]?[0-9]?);?} $new \
        {[format %c [scan \1 %d tmp; set tmp]]} new

With advanced regular expressions, this could be written as follows using bound quantifiers to specify one to three digits:

    regsub -all {&#(\d{1,3});?} $new \
        {[format %c [scan \1 %d tmp;set tmp]]} new

The named entities are converted with an array that maps from the entity names to the special character. The only detail is that unknown entity names (e.g., &foobar;) are not converted. This mapping is done inside HtmlMapEntity, which guards against invalid entities.

    regsub -all {&([a-zA-Z]+)(;?)} $new \
        {[HtmlMapEntity \1 \\\2 ]} new

If the input text contained:

    [x &lt; y]

then the regsub would transform this into:

    \[x [HtmlMapEntity lt \; ] y\]

Finally, subst will result in:

    [x < y]

Example 11-8 Html_DecodeEntity.

    proc Html_DecodeEntity {text} {
        if {![regexp & $text]} {return $text}
        regsub -all {[][$\\]} $text {\\&} new
        regsub -all {&#([0-9][0-9]?[0-9]?);?} $new \
            {[format %c [scan \1 %d tmp;set tmp]]} new
        regsub -all {&([a-zA-Z]+)(;?)} $new \
            {[HtmlMapEntity \1 \\\2 ]} new
        return [subst $new]
    }
    proc HtmlMapEntity {text {semi {}}} {
        global htmlEntityMap
        if {[info exists htmlEntityMap($text)]} {
            return $htmlEntityMap($text)
        } else {
            return $text$semi
        }
    }
    # Some of the htmlEntityMap
    array set htmlEntityMap {
        lt < gt > amp &
        aring \xe5 atilde \xe3
        copy \xa9 ecirc \xea egrave \xe8
    }
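The quoting pass and the scan trick can be tried in isolation. The following sketch handles only decimal entities; the procedure name DecodeDecimal and the sample text are assumptions made up for illustration, and the two regsub calls are the ones used in Example 11-8.

```tcl
# DecodeDecimal is a hypothetical, cut-down version of Html_DecodeEntity
# that handles only decimal entities such as &#065;.
proc DecodeDecimal {text} {
    # Quote brackets, dollar signs, and backslashes so subst leaves
    # them alone.
    regsub -all {[][$\\]} $text {\\&} new
    # Turn each entity into a format command. scan with %d reads the
    # digits as decimal even when there is a leading zero.
    regsub -all {&#([0-9][0-9]?[0-9]?);?} $new \
        {[format %c [scan \1 %d tmp; set tmp]]} new
    return [subst $new]
}

# The input contains a literal dollar sign and square brackets.
set result [DecodeDecimal {cost: $5 [&#065;&#66;]}]
# result is: cost: $5 [AB]
```

Without the first regsub, subst would try to substitute $5 as a variable and run the bracketed text as a command.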

A Simple HTML Parser

The following example is the brainchild of Stephen Uhler. It uses regsub to transform HTML into a Tcl script. When it is evaluated the script calls a procedure to handle each tag in an HTML document. This provides a general framework for processing HTML. Different callback procedures can be applied to the tags to achieve different effects. For example, the html_library-0.3 package on the CD-ROM uses Html_Parse to display HTML in a Tk text widget.

Example 11-9 Html_Parse.

    proc Html_Parse {html cmd {start {}}} {
        # Map braces and backslashes into HTML entities
        regsub -all \{ $html {\&ob;} html
        regsub -all \} $html {\&cb;} html
        regsub -all {\\} $html {\&bsl;} html

        # This pattern matches the parts of an HTML tag
        set s " \t\r\n" ;# white space
        set exp <(/?)(\[^$s>]+)\[$s]*(\[^>]*)>

        # This generates a call to cmd with HTML tag parts
        # \1 is the leading /, if any
        # \2 is the HTML tag name
        # \3 is the parameters to the tag, if any
        # The curly braces at either end group all of the text
        # after the HTML tag, which becomes the last arg to $cmd.
        set sub "\}\n$cmd {\\2} {\\1} {\\3} \{"
        regsub -all $exp $html $sub html

        # This balances the curly braces,
        # and calls $cmd with $start as a pseudo-tag
        # at the beginning and end of the script.
        eval "$cmd {$start} {} {} {$html}"
        eval "$cmd {$start} / {} {}"
    }

The main regsub pattern can be written more simply with advanced regular expressions:

    set exp {<(/?)(\S+?)\s*(.*?)>}

An example will help visualize the transformation. Given this HTML:

    <Title>My Home Page</Title>
    <Body bgcolor=white text=black>
    <H1>My Home</H1>
    This is my <b>home</b> page.

and a call to Html_Parse that looks like this:

    Html_Parse $html {Render .text} hmstart

then the generated program is this:

    Render .text {hmstart} {} {} {}
    Render .text {Title} {} {} {My Home Page}
    Render .text {Title} {/} {} {
    }
    Render .text {Body} {} {bgcolor=white text=black} {
    }
    Render .text {H1} {} {} {My Home}
    Render .text {H1} {/} {} {
    This is my }
    Render .text {b} {} {} {home}
    Render .text {b} {/} {} { page.
    }
    Render .text {hmstart} / {} {}

One overall point to make about this example is the difference between using eval and subst with the generated script. The decoders shown in Examples 11-5 and 11-8 use subst to selectively replace encoded characters while ignoring the rest of the text. In Html_Parse we must process all the text. The main trick is to replace the matching text (e.g., the HTML tag) with some Tcl code that ends in an open curly brace and starts with a close curly brace. This effectively groups all the unmatched text.

When eval is used this way you must do something with any braces and backslashes in the unmatched text. Otherwise, the resulting script does not parse correctly. In this case, these special characters are encoded as HTML entities. We can afford to do this because the cmd that is called must deal with encoded entities already. It is not possible to quote these special characters with backslashes because all this text is inside curly braces, so no backslash substitution is performed. If you try that, the backslashes will be seen by the cmd callback.

Finally, I must admit that I am always surprised that this works:

    eval "$cmd {$start} {} {} {$html}"

I always forget that $start and $html are substituted in spite of the braces. This is because double quotes are being used to group the argument, so the quoting effect of braces is turned off. Try this:

    set x hmstart
    set y "foo {$x}bar"
    => foo {hmstart}bar
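A callback for Html_Parse receives four arguments: the tag name, the slash (if any), the tag parameters, and the text that follows the tag. The sketch below plugs in a tiny callback that records the tags it sees. The name Collect and the global list tags are assumptions for illustration. Html_Parse is repeated so the block runs on its own, with the brace patterns written in braces as {\{} and {\}}, which is a spelling choice of this sketch.

```tcl
# Html_Parse as in Example 11-9, repeated so this block runs alone.
proc Html_Parse {html cmd {start {}}} {
    # Map braces and backslashes into HTML entities.
    regsub -all {\{} $html {\&ob;} html
    regsub -all {\}} $html {\&cb;} html
    regsub -all {\\} $html {\&bsl;} html
    set s " \t\r\n"
    set exp <(/?)(\[^$s>]+)\[$s]*(\[^>]*)>
    # Each tag becomes: close brace, newline, a call to $cmd, open brace.
    set sub "\}\n$cmd {\\2} {\\1} {\\3} \{"
    regsub -all $exp $html $sub html
    eval "$cmd {$start} {} {} {$html}"
    eval "$cmd {$start} / {} {}"
}

# Collect is a hypothetical callback. It records each tag name,
# prefixed with the slash for closing tags, and ignores the text.
proc Collect {tag slash param text} {
    global tags
    lappend tags $slash$tag
}

set tags {}
Html_Parse "<b>hi</b> there" Collect hmstart
# tags is now: hmstart b /b /hmstart
```

A real callback would normally do something with the text argument as well, for example insert it into a text widget as the html_library package does.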

Stripping HTML Comments

The Html_Parse procedure does not correctly handle HTML comments. The problem is that the syntax for HTML comments allows tags inside comments, so there can be > characters inside the comment. HTML comments are also used to hide Javascript inside pages, which can also contain >. We can fix this with a pass that eliminates the comments. The comment syntax is this:

    <!-- HTML comment -->

Using nongreedy quantifiers, we can strip comments with a single regsub:

    regsub -all <!--.*?--> $html {} html

Using only greedy quantifiers, it is awkward to match the closing --> without getting stuck on embedded > characters, or without matching too much and going all the way to the end of the last comment. Time for another trick:

    regsub -all -- --> $html \x81 html

This replaces all the end comment sequences with a single character that is not allowed in HTML. The -- switch is needed because the pattern begins with a -. Now you can delete the comments like this:

    regsub -all "<!--\[^\x81\]*\x81" $html {} html
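Both ways of stripping comments can be compared side by side on a small input. This is a sketch; the sample HTML and the variable names stripped, tmp, and stripped2 are assumptions for illustration.

```tcl
# Sample text with two comments; both contain > characters and the
# second one spans a newline.
set html "a<!-- x > y -->b<!-- <b>z</b>\n -->c"

# Nongreedy version: one pass.
regsub -all {<!--.*?-->} $html {} stripped

# Greedy-only version: first mark each end-of-comment with \x81,
# then delete from each <!-- up to the nearest marker.
regsub -all -- --> $html \x81 tmp
regsub -all "<!--\[^\x81\]*\x81" $tmp {} stripped2
```

Both stripped and stripped2 end up holding abc. The original html variable is left untouched here so that the two techniques start from the same input.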


Other Commands That Use Regular Expressions

Several Tcl commands use regular expressions.

lsearch takes a -regexp flag so that you can search for list items that match a regular expression. The lsearch command is described on page 64.

switch takes a -regexp flag, so you can branch based on a regular expression match instead of an exact match or a string match style match. The switch command is described on page 71.

The Tk text widget can search its contents based on a regular expression match. Searching in the text widget is described on page 463.

The Expect Tcl extension can match the output of a program with regular expressions. Expect is the subject of its own book, Exploring Expect (O'Reilly, 1995) by Don Libes.
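As a brief illustration of the -regexp flag in lsearch and switch, consider the following sketch. The file names and the procedure name Classify are assumptions made up for this example.

```tcl
# lsearch -regexp returns the index of the first matching element.
set files {main.tcl util.c README io.c}
set firstC [lsearch -regexp $files {\.c$}]

# Classify is a hypothetical helper that branches on the file suffix.
proc Classify {name} {
    switch -regexp -- $name {
        {\.tcl$}  { return script }
        {\.[ch]$} { return c-source }
        default   { return other }
    }
}
set kinds {}
foreach f $files {
    lappend kinds [Classify $f]
}
```

Here firstC is 1, the index of util.c, and kinds is the list script c-source other c-source.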


Part II. Advanced Tcl

Chapter 12. Script Libraries and Packages

Collections of Tcl commands are kept in libraries and organized into packages. Tcl automatically loads libraries as an application uses their commands. Tcl commands discussed are: package, pkg_mkIndex, auto_mkindex, unknown, and tcl_findLibrary.

Libraries group useful sets of Tcl procedures so that they can be used by multiple applications. For example, you could use any of the code examples that come with this book by creating a script library and then directing your application to check in that library for missing procedures. One way to structure a large application is to have a short main script and a library of support scripts. The advantage of this approach is that not all the Tcl code needs to be loaded to start the application. Applications start up quickly, and as new features are accessed, the code that implements them is loaded automatically.

The Tcl package facility supports version numbers and has a provide/require model of use. Typically, each file in a library provides one package with a particular version number. Packages also work with shared object libraries that implement Tcl commands in compiled code, which are described in Chapter 44. A package can be provided by a combination of script files and object files. Applications specify which packages they require and the libraries are loaded automatically. The package facility is an alternative to the auto loading scheme used in earlier versions of Tcl. You can use either mechanism, and this chapter describes them both.

If you create a package you may wish to use the namespace facility to avoid conflicts between procedures and global variables used in different packages. Namespaces are the topic of Chapter 14. Before Tcl 8.0 you had to use your own conventions to avoid conflicts. This chapter explains a simple coding convention for large Tcl programs. I use this convention in exmh, a mail user interface that has grown from about 2,000 to over 35,000 lines of Tcl code. A majority of the code has been contributed by the exmh user community. Such growth might not have been possible without coding conventions.


Locating Packages: The auto_path Variable

The package facility assumes that Tcl libraries are kept in well-known directories. The list of well-known directories is kept in the auto_path Tcl variable. This is initialized by tclsh and wish to include the Tcl script library directory, the Tk script library directory (for wish), and the parent directory of the Tcl script library directory. For example, on my Macintosh auto_path is a list of these three directories:

    Disk:System Folder:Extensions:Tool Command Language:tcl8.2
    Disk:System Folder:Extensions:Tool Command Language
    Disk:System Folder:Extensions:Tool Command Language:tk8.2

On my Windows 95 machine the auto_path lists these directories:

    c:\Program Files\Tcl\lib\Tcl8.2
    c:\Program Files\Tcl\lib
    c:\Program Files\Tcl\lib\Tk8.2

On my UNIX workstation the auto_path lists these directories:

    /usr/local/tcl/lib/tcl8.2
    /usr/local/tcl/lib
    /usr/local/tcl/lib/tk8.2

The package facility searches these directories and their subdirectories for packages. The easiest way to manage your own packages is to create a directory at the same level as the Tcl library:

    /usr/local/tcl/lib/welchbook

Packages in this location, for example, will be found automatically because the auto_path list includes /usr/local/tcl/lib. You can also add directories to the auto_path explicitly:

    lappend auto_path directory

One trick I often use is to put the directory containing the main script into the auto_path. The following command sets this up:

    lappend auto_path [file dirname [info script]]

If your code is split into bin and lib directories, then scripts in the bin directory can add the adjacent lib directory to their auto_path with this command:

    lappend auto_path \
        [file join [file dirname [info script]] ../lib]
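If a script may be sourced more than once, a plain lappend adds the same directory to auto_path repeatedly. One way to guard against that is sketched below. The procedure name AddLibDir and the directory /usr/local/tcl/lib/welchbook-demo are assumptions for illustration, not part of Tcl or of the book's library.

```tcl
# AddLibDir is a hypothetical helper that appends a directory to
# auto_path only if it is not already listed.
proc AddLibDir {dir} {
    global auto_path
    if {[lsearch -exact $auto_path $dir] < 0} {
        lappend auto_path $dir
    }
}

# Calling it twice leaves a single entry.
AddLibDir /usr/local/tcl/lib/welchbook-demo
AddLibDir /usr/local/tcl/lib/welchbook-demo

# The bin/lib layout described above, using file join for each
# path component.
AddLibDir [file join [file dirname [info script]] .. lib]
```

The directory does not have to exist for it to be listed; the package facility simply finds nothing there when it searches.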


Using Packages
Each script file in a library declares what package it implements with the package provide command:

    package provide name version

The name identifies the package, and the version has a major.minor format. The convention is that the minor version number can change and the package implementation will still be compatible. If the package changes in an incompatible way, then the major version number should change. For example, Chapter 17 defines several procedures that use the HTTP network protocol. These include Http_Open, Http_Get, and Http_Validate. The file that contains the procedures starts with this command:

    package provide Http 1.0

Case is significant in package names. In particular, the package that comes with Tcl is named http, all lowercase.

More than one file can contribute to the same package simply by specifying the same name and version. In addition, different versions of the same package can be kept in the same directory but in different files.

An application specifies the packages it needs with the package require command:

    package require ?-exact? name ?version?

If the version is left off, then the highest available version is loaded. Otherwise the highest version with the same major number is loaded. For example, if the client requires version 1.1, version 1.2 could be loaded if it exists, but versions 1.0 and 2.0 would not be loaded. You can restrict the package to a specific version with the -exact flag. If no matching version can be found, then the package require command raises an error.
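The version rules can be checked in a single script by providing a package and then requiring it in different ways. This is a minimal sketch; the package name Demo is made up for illustration, and package vsatisfies is used to apply the same comparison without loading anything.

```tcl
# "Demo" is a made-up package name used only for this sketch.
package provide Demo 1.2

# The package is already provided, and 1.2 satisfies a request
# for 1.1 because the major number is the same.
set loaded [package require Demo 1.1]

# A different major number, or -exact with another minor number,
# raises an error, which catch reports as 1.
set conflict [catch {package require Demo 2.0} msg]
set exact    [catch {package require -exact Demo 1.1} msg2]

# package vsatisfies tests whether the first version can stand in
# for the second one.
set ok12 [package vsatisfies 1.2 1.1]
set ok10 [package vsatisfies 1.0 1.1]
set ok20 [package vsatisfies 2.0 1.1]
```

Here loaded is 1.2, both catch results are 1, and the three vsatisfies results are 1, 0, and 0, which matches the 1.1 example in the text.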

Loading Packages Automatically

The package require command depends on an index to record which files implement which packages. The index must be maintained by you, your project librarian, or your system administrator when packages change. The index is computed by the pkg_mkIndex command that puts the results into the pkgIndex.tcl file in each library directory. The pkg_mkIndex command takes the name of a directory and one or more glob patterns that specify files within that directory. File name patterns are described on page 115. The syntax is:

    pkg_mkIndex ?options? directory pattern ?pattern ...?

For example:

    pkg_mkIndex /usr/local/lib/welchbook *.tcl
    pkg_mkIndex -direct /usr/local/lib/Sybtcl *.so

The pkg_mkIndex command sources or loads all the files matched by the pattern, detects what packages they provide, and computes the index. You should be aware of this behavior because it works well only for libraries. If the pkg_mkIndex command hangs or starts random applications, it is because it sourced an application file instead of a library file.

By default, the index created by pkg_mkIndex contains commands that set up the auto_index array used to automatically load commands when they are first used. This means that code does not get loaded when your script does a package require. If you want the package to be loaded right away, specify the -direct flag to pkg_mkIndex so that it creates an index file with source and load commands. The pkg_mkIndex options are summarized in Table 12-1.

Table 12-1. Options to the pkg_mkIndex command.

-direct          Generates an index with source and load commands in it. This results in packages being loaded directly as a result of package require.
-load pattern    Dynamically loads packages that match pattern into the slave interpreter used to compute the index. A common reason to need this is with the tbcload package needed to load .tbc files compiled with TclPro Compiler.
-verbose         Displays the name of each file processed and any errors that occur.
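An index file works by registering a script for each package with the package ifneeded command; package require runs that script when the package is first requested. The sketch below registers such a script by hand to show the effect. The package name Hello and the procedure Hello_Say are made up, and the script defines the procedure inline only to keep the example self-contained. In an index produced with -direct, the registered script would instead be along the lines of a source command on a file in the library directory.

```tcl
# "Hello" and Hello_Say are hypothetical names for this sketch.
# A real -direct index entry would register something like
#   [list source [file join $dir hello.tcl]]
# where dir is set by the code that sources the index file.
package ifneeded Hello 1.0 {
    proc Hello_Say {} { return "hello, world" }
    package provide Hello 1.0
}

# Nothing is defined yet; the script above is only registered.
set nBefore [llength [info commands Hello_Say]]

# package require runs the registered script and returns the version.
set version [package require Hello]
set nAfter [llength [info commands Hello_Say]]
```

Before the package require, nBefore is 0. Afterward version is 1.0 and nAfter is 1, because the registered script has defined the procedure.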

Packages Implemented in C Code

The files in a library can be either script files that define Tcl procedures or binary files in shared library format that define Tcl commands in compiled code (i.e., a Dynamic Link Library (DLL)). Chapter 44 describes how to implement Tcl commands in C. There is a C API to the package facility that you use to declare the package name for your commands. This is shown in Example 44-1 on page 610. Chapter 37 also describes the Tcl load command that is used instead of source to link in shared libraries. The pkg_mkIndex command also handles shared libraries:

    pkg_mkIndex directory *.tcl *.so *.shlib *.dll

In this example, .so, .shlib, and .dll are file suffixes for shared libraries on UNIX, Macintosh, and Windows systems, respectively. You can have packages that have some of their commands implemented in C, and some implemented as Tcl procedures. The script files and the shared library must simply declare that they implement the same package. The pkg_mkIndex procedure will detect this and set up the auto_index, so some commands are defined by sourcing scripts, and some are defined by loading shared libraries.

If your file servers support more than one machine architecture, such as Solaris and Linux systems, you probably keep the shared library files in machine-specific directories. In this case the auto_path should also list the machine-specific directory so that the shared libraries there can be loaded automatically. If your system administrator configured the Tcl installation properly, this should already be set up. If not, or you have your shared libraries in a nonstandard place, you must append the location to the auto_path variable.
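One way to append a machine-specific directory is to build its name from the tcl_platform array. The sketch below assumes a hypothetical layout with one subdirectory per operating system and machine type under /usr/local/tcl/lib; your installation may be organized differently.

```tcl
# Hypothetical layout: one subdirectory per platform under a shared
# library root, named after the OS and machine type.
set libRoot /usr/local/tcl/lib
set archDir [file join $libRoot \
    $tcl_platform(os)-$tcl_platform(machine)]

# Add the directory once so shared libraries there can be found.
if {[lsearch -exact $auto_path $archDir] < 0} {
    lappend auto_path $archDir
}

# The shared library suffix used on this platform, such as .so or .dll.
set suffix [info sharedlibextension]
```

The suffix is not needed to set up auto_path, but it is useful when you build the pattern arguments for pkg_mkIndex in a portable script.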


Summary of Package Loading

The basic structure of package loading works like this:

An application does a package require command. If the package is already loaded, the command just returns the version number of the already loaded package. If it is not loaded, the following steps occur.

The package facility checks to see if it knows about the package. If it does, then it runs the Tcl scripts registered with the package ifneeded command. These commands either load the package or set it up to be loaded automatically when its commands are first used.

If the package is unknown, the tclPkgUnknown procedure is called to find it. Actually, you can specify what procedure to call to do the lookup with the package unknown command, but the standard one is tclPkgUnknown.

The tclPkgUnknown procedure looks through the auto_path directories and their subdirectories for pkgIndex.tcl files. It sources those to build an internal database of packages and version information. The pkgIndex.tcl files contain calls to package ifneeded that specify what to do to define the package. The standard action is to call the tclPkgSetup procedure that sets up the auto_index so that the commands in the package will be automatically loaded. If you use -direct with pkg_mkIndex, the script contains source and load commands instead.

The tclPkgSetup procedure defines the auto_index array to contain the correct source or load commands to define each command in the package. Automatic loading and the auto_index array are described in more detail later.

As you can see, there are several levels of processing involved in finding packages. The system is flexible enough that you can change the way packages are located and how packages are loaded. The default scenario is complicated because it uses the delayed loading of source code that is described in the next section. Using the -direct flag to pkg_mkIndex simplifies the situation somewhat. In any case it all boils down to three key steps:

Use pkg_mkIndex to maintain your index files. Decide at this time whether or not to use direct package loading.

Put the appropriate package require and package provide commands in your code.

Ensure that your library directories, or their parent directories, are listed in the auto_path variable.
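The three steps can be tried end to end in a scratch directory. The following is a minimal sketch, not part of the book's examples: the package name greet, the procedure Greet_Hello, and the temporary directory are all made up for illustration, and it assumes that TMPDIR (or /tmp) names a writable directory.

```tcl
# Hypothetical scratch directory; assumes TMPDIR or /tmp is writable.
set tmp [expr {[info exists ::env(TMPDIR)] ? $::env(TMPDIR) : "/tmp"}]
set libdir [file join $tmp pkg_demo_[pid]]
file mkdir $libdir

# A library file declares the package it implements (hypothetical names).
set f [open [file join $libdir greet.tcl] w]
puts $f {
package provide greet 1.0
proc Greet_Hello {name} {
    return "Hello, $name"
}
}
close $f

# Step 1: build pkgIndex.tcl. -direct records plain source commands.
pkg_mkIndex -direct $libdir *.tcl

# Step 3: list the library directory in auto_path.
lappend auto_path $libdir

# Step 2, from the application's side: ask for the package, then use it.
set version [package require greet]
set greeting [Greet_Hello World]
puts "greet $version: $greeting"

# Clean up the scratch directory.
file delete -force $libdir
```

The package require call returns the version that was loaded, which is a convenient value to log when several versions of a library are installed.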


The package Command

The package command has several operations that are used primarily by the pkg_mkIndex procedure and the automatic loading facility. These operations are summarized in Table 12-2.

Table 12-2. The package command.

package forget package                       Deletes registration information for package.
package ifneeded package version ?command?   Queries or sets the command used to set up automatic loading of a package.
package names                                Returns the set of registered packages.
package provide package version              Declares that a script file defines commands for package with the given version.
package require ?-exact? package ?version?   Declares that a script uses package. The -exact flag specifies that the exact version must be loaded. Otherwise, the highest matching version is loaded.
package unknown ?command?                    Queries or sets the command used to locate packages.
package vcompare v1 v2                       Compares version v1 and v2. Returns 0 if they are equal, -1 if v1 is less than v2, or 1 if v1 is greater than v2.
package versions package                     Returns which versions of the package are registered.
package vsatisfies v1 v2                     Returns 1 if v1 is greater than or equal to v2 and still has the same major version number. Otherwise returns 0.
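The two version comparison operations are easy to confuse, so here is a short sketch of how they behave. The version numbers are arbitrary values picked for illustration.

```tcl
# vcompare compares dotted versions field by field, not as strings.
set cmp1 [package vcompare 1.2 1.10]    ;# -1: 1.2 is older than 1.10
set cmp2 [package vcompare 2.0 2.0]     ;# 0: the versions are equal
set cmp3 [package vcompare 8.1 8.0.5]   ;# 1: 8.1 is newer than 8.0.5

# vsatisfies asks whether the first version can stand in for the second:
# it must be at least as new and share the same major version number.
set ok1 [package vsatisfies 1.3 1.2]    ;# 1
set ok2 [package vsatisfies 2.0 1.2]    ;# 0: major version differs
set ok3 [package vsatisfies 1.1 1.2]    ;# 0: too old

puts "vcompare: $cmp1 $cmp2 $cmp3  vsatisfies: $ok1 $ok2 $ok3"
```

Note that a plain string comparison would order 1.10 before 1.2, which is why package vcompare is the safer choice for version checks.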


Libraries Based on the tclIndex File

You can create libraries without using the package command. The basic idea is that a directory has a library of script files, and an index of the Tcl commands defined in the library is kept in a tclIndex file. The drawback is that versions are not supported and you may need to adjust the auto_path to list your library directory. The main advantage of this approach is that this mechanism has been part of Tcl since the earliest versions. If you currently maintain a library using tclIndex files, it will still work.

You must generate the index that records what procedures are defined in the library. The auto_mkindex procedure creates the index, which is stored in a file named tclIndex that is kept in the script library directory. (Watch out for the difference in capitalization between auto_mkindex and pkg_mkIndex!) Suppose all the examples from this book are in the directory /usr/local/tcl/welchbook. You can make the examples into a script library by creating the tclIndex file:

    auto_mkindex /usr/local/tcl/welchbook *.tcl

You will need to update the tclIndex file if you add procedures or change any of their names. A conservative approach to this is shown in the next example. It is conservative because it re-creates the index if anything in the library has changed since the tclIndex file was last generated, whether or not the change added or removed a Tcl procedure.

Example 12-1 Maintaining a tclIndex file.

    proc Library_UpdateIndex { libdir } {
        set index [file join $libdir tclIndex]
        if {![file exists $index]} {
            set doit 1
        } else {
            set age [file mtime $index]
            set doit 0
            # Changes to directory may mean files were deleted
            if {[file mtime $libdir] > $age} {
                set doit 1
            } else {
                # Check each file for modification
                foreach file [glob [file join $libdir *.tcl]] {
                    if {[file mtime $file] > $age} {
                        set doit 1
                        break
                    }
                }
            }
        }
        if { $doit } {
            auto_mkindex $libdir *.tcl
        }
    }

Tcl uses the auto_path variable to record a list of directories to search for unknown commands. To continue our example, you can make the procedures in the book examples available by putting this command at the beginning of your scripts:

    lappend auto_path /usr/local/tcl/welchbook

This has no effect if you have not created the tclIndex file. If you want to be extra careful, you can call Library_UpdateIndex. This will update the index if you add new things to the library.

    lappend auto_path /usr/local/tcl/welchbook
    Library_UpdateIndex /usr/local/tcl/welchbook

This will not work if there is no tclIndex file at all because Tcl won't be able to find the implementation of Library_UpdateIndex. Once the tclIndex has been created for the first time, then this will ensure that any new procedures added to the library will be installed into tclIndex. In practice, if you want this sort of automatic update, it is wise to include something like the Library_UpdateIndex procedure directly into your application as opposed to loading it from the library it is supposed to be maintaining.
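To see the tclIndex mechanism work from start to finish, it helps to build a tiny library in a scratch directory. This is only a sketch: the directory, the file square.tcl, and the procedure Demo_Square are invented for the demonstration, and it assumes TMPDIR (or /tmp) is writable.

```tcl
# Hypothetical scratch directory; assumes TMPDIR or /tmp is writable.
set tmp [expr {[info exists ::env(TMPDIR)] ? $::env(TMPDIR) : "/tmp"}]
set libdir [file join $tmp tclindex_demo_[pid]]
file mkdir $libdir

# One library file that defines one procedure (hypothetical name).
set f [open [file join $libdir square.tcl] w]
puts $f {proc Demo_Square {x} {
    return [expr {$x * $x}]
}}
close $f

# Build the tclIndex file, then list the directory in auto_path.
auto_mkindex $libdir *.tcl
lappend auto_path $libdir

# Demo_Square is not defined yet. The first call goes through unknown,
# which finds the auto_index entry and sources square.tcl.
set before [llength [info procs Demo_Square]]
set result [Demo_Square 7]
set after [llength [info procs Demo_Square]]
puts "defined before: $before, result: $result, defined after: $after"

# Clean up the scratch directory.
file delete -force $libdir
```

The before and after counts show the point of the facility: the procedure does not exist in the interpreter until something calls it.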


The unknown Command

Automatic loading of Tcl commands is implemented by the unknown command. Whenever the Tcl interpreter encounters a command that it does not know about, it calls the unknown command with the name of the missing command. The unknown command is implemented in Tcl, so you are free to provide your own mechanism to handle unknown commands. This chapter describes the behavior of the default implementation of unknown, which can be found in the init.tcl file in the Tcl library. The location of the library is returned by the info library command.
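Because unknown is an ordinary Tcl procedure, you can wrap it rather than replace it. The sketch below logs each missing command and then hands the call to the default implementation. The names Unknown_Original and unknownLog are hypothetical, and the sketch assumes it runs in a non-interactive tclsh where the default unknown has already been defined by init.tcl.

```tcl
# Keep the default handler under a new, hypothetical name.
rename unknown Unknown_Original
set unknownLog {}

proc unknown {args} {
    global unknownLog
    # Record the name of the command that was not found.
    lappend unknownLog [lindex $args 0]
    # Delegate to the default handler in the caller's scope so that
    # auto loading and the other features keep working.
    return [uplevel 1 [linsert $args 0 Unknown_Original]]
}

# A command that no library defines still fails, but it is logged first.
set failed [catch {Demo_NoSuchCommand 1 2} message]
puts "failed=$failed logged=$unknownLog message=$message"
```

The uplevel matters here. The default implementation decides what to do partly from its position on the call stack, so the wrapper calls it from the same level at which the missing command was invoked.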

How Auto Loading Works

The unknown command uses an array named auto_index. One element of the array is defined for each procedure that can be automatically loaded. The auto_index array is initialized by the package mechanism or by tclIndex files. The value of an auto_index element is a command that defines the procedure. Typical commands are:

    source [file join $dir bind_ui.tcl]
    load [file join $dir mime.so] Mime

The $dir gets substituted with the name of the directory that contains the library file, so the result is a source or load command that defines the missing Tcl command. The substitution is done with eval, so you could initialize auto_index with any commands at all. Example 12-2 is a simplified version of the code that reads the tclIndex file.

Example 12-2 Loading a tclIndex file.

    # This is a simplified part of the auto_load_index procedure.
    # Go through auto_path from back to front.
    set i [expr [llength $auto_path]-1]
    for {} {$i >= 0} {incr i -1} {
        set dir [lindex $auto_path $i]
        if [catch {open [file join $dir tclIndex]} f] {
            # No index
            continue
        }
        # eval the file as a script. Because eval is
        # used instead of source, an extra round of
        # substitutions is performed and $dir gets expanded
        # The real code checks for errors here.
        eval [read $f]
        close $f
    }

Disabling the Library Facility: auto_noload

If you do not want the unknown procedure to try to load procedures, you can set the auto_noload variable to disable the mechanism:

    set auto_noload anything

Auto loading is quite fast. I use it regularly on applications both large and small. A large application will start faster if you only need to load the code necessary to start it up. As you access more features of your application, the code will load automatically. Even a small application benefits from auto loading because it encourages you to keep commonly used code in procedure libraries.


Interactive Conveniences
The unknown command provides a few other conveniences. These are used only when you are typing commands directly. They are disabled once execution enters a procedure or if the Tcl shell is not being used interactively. The convenience features are automatic execution of programs, command history, and command abbreviation. These options are tried, in order, if a command implementation cannot be loaded from a script library.

Auto Execute
The unknown procedure implements a second feature: automatic execution of external programs. This makes a Tcl shell behave more like other UNIX shells that are used to execute programs. The search for external programs is done using the standard PATH environment variable that is used by other shells to find programs. If you want to disable the feature altogether, set the auto_noexec variable:

    set auto_noexec anything

History
The history facility described in Chapter 13 is implemented by the unknown procedure.

Abbreviations
If you type a unique prefix of a command, unknown recognizes it and executes the matching command for you. This is done after automatic program execution is attempted and history substitutions are performed.


Tcl Shell Library Environment

Tcl searches for its script library directory when it starts up. In early versions of Tcl you had to compile in the correct location, set a Windows registry value, or set the TCL_LIBRARY environment variable to the correct location. Recent versions of Tcl use a standard searching scheme to locate the script library. The search understands the standard installation and build environments for Tcl, and it should eliminate the need to use the TCL_LIBRARY environment variable. On Windows the search for the library used to depend on registry values, but this has also been discontinued in favor of a standard search. In summary, "it should just work." However, this section explains how Tcl finds its script library so that you can troubleshoot problems.

Locating the Tcl Script Library

The default library location is defined when you configure the source distribution, which is explained on page 644. At this time an initial value for the auto_path variable is defined. (This default value appears in tcl_pkgPath, but changing this variable has no effect once Tcl has started. I just pretend tcl_pkgPath does not exist.) These values are just hints; Tcl may use other directories depending on what it finds in the file system.

When Tcl starts up, it searches for a directory that contains its init.tcl startup script. You can short-circuit the search by defining the TCL_LIBRARY environment variable. If this is defined, Tcl uses it only for its script library directory. However, you should not need to define this with normal installations of Tcl 8.0.5 or later. In my environment I'm often using several different versions of Tcl for various applications and testing purposes, so setting TCL_LIBRARY is never correct for all possibilities. If I find myself setting this environment variable, I know something is wrong with my Tcl installations!

The standard search starts with the default value that is compiled into Tcl (e.g., /usr/local/lib/tcl8.1). After that, the following directories are examined for an init.tcl file. These example values assume Tcl version 8.1 and patch level 8.1.1:

    ../lib/tcl8.1
    ../../lib/tcl8.1
    ../library
    ../../tcl8.1.1/library
    ../../../tcl8.1.1/library

The first two directories correspond to the standard installation directories, while the last three correspond to the standard build environment for Tcl or Tk. The first directory in the list that contains a valid init.tcl file becomes the Tcl script library. This directory location is saved in the tcl_library global variable, and it is also returned by the info library command.

The primary thing defined by init.tcl is the implementation of the unknown procedure. It also initializes auto_path to contain $tcl_library and the parent directory of $tcl_library. There may be additional directories added to auto_path depending on the compiled-in value of tcl_pkgPath.

tcl_findLibrary

A generalization of this search is implemented by tcl_findLibrary. This procedure is designed for use by extensions like Tk and [incr Tcl]. Of course, Tcl cannot use tcl_findLibrary itself because it is defined in init.tcl!

The tcl_findLibrary procedure searches relative to the location of the main program (e.g., tclsh or wish) and assumes a standard installation or a standard build environment. It also supports an override by an environment variable, and it takes care of sourcing an initialization script. The usage of tcl_findLibrary is:

    tcl_findLibrary base version patch script enVar varName

The base is the prefix of the script library directory name. The version is the main version number (e.g., "8.0"). The patch is the full patch level (e.g., "8.0.3"). The script is the initialization script to source from the directory. The enVar names an environment variable that can be used to override the default search path. The varName is the name of a variable to set to the name of the directory found by tcl_findLibrary. A side effect of tcl_findLibrary is to source the script from the directory. An example call is:

    tcl_findLibrary tk 8.0 8.0.3 tk.tcl TK_LIBRARY tk_library

This call first checks to see whether TK_LIBRARY is defined in the environment. If so, it uses its value. Otherwise, it searches the following directories for a file named tk.tcl. It sources the script and sets the tk_library variable to the directory containing that file. The search is relative to the value returned by info nameofexecutable:

    ../lib/tk8.0
    ../../lib/tk8.0
    ../library
    ../../tk8.0.3/library
    ../../../tk8.0.3/library

Tk also adds $tk_library to the end of auto_path, so the other script files in that directory are available to the application:

    lappend auto_path $tk_library
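When a library cannot be found, it can help to print the directories that such a search would visit. The following sketch is not the implementation of tcl_findLibrary; Library_Candidates is a hypothetical helper, and it assumes the relative paths are resolved against the directory that contains the executable.

```tcl
# Library_Candidates is a hypothetical troubleshooting helper.
# It lists the directories a standard search would try, assuming they
# are taken relative to the directory holding the main program.
proc Library_Candidates {base version patch} {
    set bin [file dirname [info nameofexecutable]]
    set result {}
    foreach rel [list \
            ../lib/$base$version \
            ../../lib/$base$version \
            ../library \
            ../../$base$patch/library \
            ../../../$base$patch/library] {
        lappend result [file join $bin $rel]
    }
    return $result
}

set candidates [Library_Candidates tk 8.0 8.0.3]
foreach dir $candidates {
    # Mark the directories that actually contain the startup script.
    set found [file exists [file join $dir tk.tcl]]
    puts "$dir (tk.tcl present: $found)"
}
```

Comparing this list with the value of tk_library or tcl_library usually shows quickly whether the installation layout or an environment variable is at fault.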


Coding Style
If you supply a package, you need to follow some simple coding conventions to make your library easier to use by other programmers. You can use the namespace facility introduced in Tcl 8.0. You can also use conventions to avoid name conflicts with other library packages and the main application. This section describes the conventions I developed before namespaces were added to Tcl.

A Module Prefix for Procedure Names

The first convention is to choose an identifying prefix for the procedures in your package. For example, the preferences package in Chapter 42 uses Pref as its prefix. All the procedures provided by the library begin with Pref. This convention is extended to distinguish between private and exported procedures. An exported procedure has an underscore after its prefix, and it is acceptable to call this procedure from the main application or other library packages. Examples include Pref_Add, Pref_Init, and Pref_Dialog. A private procedure is meant for use only by the other procedures in the same package. Its name does not have the underscore. Examples include PrefDialogItem and PrefXres. This naming convention precludes casual names like doit, setup, layout, and so on. Without using namespaces, there is no way to hide procedure names, so you must maintain the naming convention for all procedures in a package.

A Global Array for State Variables

You should use the same prefix on the global variables used by your package. You can alter the capitalization; just keep the same prefix. I capitalize procedure names and use lowercase letters for variables. By sticking with the same prefix you identify what variables belong to the package and you avoid conflict with other packages.

Collect state in a global array.

In general, I try to use a single global array for a package. The array provides a convenient place to collect a set of related variables, much as a struct is used in C. For example, the preferences package uses the pref array to hold all its state information. It is also a good idea to keep the use of the array private. It is better coding practice to provide exported procedures than to let other modules access your data structures directly. This makes it easier to change the implementation of your package without affecting its clients. If you do need to export a few key variables from your module, use the underscore convention to distinguish exported variables. If you need more than one global variable, just stick with the prefix convention to avoid conflicts.
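The conventions are easier to see in a small module. This is a sketch with made-up names: a hypothetical Counter package whose exported procedures carry an underscore after the prefix, whose private helper does not, and whose state lives in a single global array named counter.

```tcl
# Hypothetical "Counter" module written in the prefix style.
# All of its state is collected in the global array counter.

# Exported: underscore after the prefix.
proc Counter_Init {{start 0}} {
    global counter
    set counter(value) $start
    set counter(step) 1
    set counter(limit) 1000
}

# Exported: clients call this instead of touching the array.
proc Counter_Next {} {
    global counter
    set next [expr {$counter(value) + $counter(step)}]
    set counter(value) [CounterClamp $next]
    return $counter(value)
}

# Private helper: no underscore, used only inside the module.
proc CounterClamp {n} {
    global counter
    if {$n > $counter(limit)} {
        return $counter(limit)
    }
    return $n
}

Counter_Init 5
Counter_Next
set current [Counter_Next]
puts "current value: $current"
```

Because clients only ever call Counter_Init and Counter_Next, the layout of the counter array can change later without breaking them.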

The Official Tcl Style Guide

John Ousterhout has published two programming style guides, one for C programming known as "The Engineering Manual" and one for Tcl scripts known as "The Style Guide". These describe details about file structure as well as naming conventions for modules, procedures, and variables. The Tcl Style Guide conventions use Tcl namespaces to separate packages. Namespaces automatically provide a way to avoid conflict between procedure names. Namespaces also support collections of variables without having to use arrays for grouping. You can find these style guides on the CD-ROM and also in ftp://ftp.scriptics.com/pub/tcl/doc. The Engineering Manual is distributed as a compressed tar file, engManual.tar.Z, that contains sample files as well as the main document. The Style Guide is distributed as styleGuide.ps (or .pdf).
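For comparison, here is roughly how a small module looks when a namespace does the grouping instead of a prefix and a global array. This is a sketch under my own assumptions, not an excerpt from the Style Guide; the counter namespace and its procedure names are hypothetical.

```tcl
# Hypothetical module that uses a namespace instead of a name prefix.
namespace eval counter {
    # Procedures offered to clients.
    namespace export next reset
    # Namespace variables replace the elements of a global array.
    variable value 0
    variable step 1
}

proc counter::reset {{start 0}} {
    variable value
    set value $start
}

proc counter::next {} {
    variable value
    variable step
    return [incr value $step]
}

counter::reset 5
counter::next
set current [counter::next]
puts "current value: $current"
```

The variable command inside each procedure plays the role that global played in the prefix style, and the short names next and reset cannot collide with procedures of the same name in other namespaces.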


Part II. Advanced Tcl

Chapter 13. Reflection and Debugging

This chapter describes commands that give you a view into the interpreter. The history command and a simple debugger are useful during development and debugging. The info command provides a variety of information about the internal state of the Tcl interpreter. The time command measures the time it takes to execute a command. Tcl commands discussed are: clock, info, history, and time. Reflection provides feedback to a script about the internal state of the interpreter. This is useful in a variety of cases, from testing to see whether a variable exists to dumping the state of the interpreter. The info command provides lots of different information about the interpreter. The clock command is useful for formatting dates as well as parsing date and time values. It also provides high-resolution timer information for precise measurements. Interactive command history is the third topic of the chapter. The history facility can save you some typing if you spend a lot of time entering commands interactively. Debugging is the last topic. The old-fashioned approach of adding puts commands to your code is often quite useful. For tough problems, however, a real debugger is invaluable. The TclPro tools from Scriptics include a high quality debugger and static code checker. The tkinspect program is an inspector that lets you look into the state of a Tk application. It can hook up to any Tk application dynamically, so it proves quite useful.


The clock Command

The clock command has facilities for getting the current time, formatting time values, and scanning printed time strings to get an integer time value. The clock command was added in Tcl 7.5. Table 13-1 summarizes the clock command:

Table 13-1. The clock command.

clock clicks                                      A system-dependent high resolution counter.
clock format value ?-format str?                  Formats a clock value according to str.
clock scan string ?-base clock? ?-gmt boolean?    Parses date string and returns a seconds value. The clock value determines the date.
clock seconds                                     Returns the current time in seconds.

The following command prints the current time:

    clock format [clock seconds]
    => Sun Nov 24 14:57:04 1996

The clock seconds command returns the current time, in seconds since a starting epoch. The clock format command formats an integer value into a date string. It takes an optional argument that controls the format. The format string contains % keywords that are replaced with the year, month, day, date, hours, minutes, and seconds, in various formats. The default string is:

    %a %b %d %H:%M:%S %Z %Y

Tables 13-2 and 13-3 summarize the clock formatting strings:

Table 13-2. Clock formatting keywords.

%%    Inserts a %.
%a    Abbreviated weekday name (Mon, Tue, etc.).
%A    Full weekday name (Monday, Tuesday, etc.).
%b    Abbreviated month name (Jan, Feb, etc.).
%B    Full month name.
%c    Locale specific date and time (e.g., Nov 24 16:00:59 1996).
%d    Day of month (01-31).
%H    Hour in 24-hour format (00-23).
%I    Hour in 12-hour format (01-12).
%j    Day of year (001-366).
%m    Month number (01-12).
%M    Minute (00-59).
%p    AM/PM indicator.
%S    Seconds (00-59).
%U    Week of year (00-52) when Sunday starts the week.
%w    Weekday number (Sunday = 0).
%W    Week of year (01-52) when Monday starts the week.
%x    Locale specific date format (e.g., Feb 19 1997).
%X    Locale specific time format (e.g., 20:10:13).
%y    Year without century (00-99).
%Y    Year with century (e.g., 1997).
%Z    Time zone name.

Table 13-3. UNIX-specific clock formatting keywords.

%D    Date as %m/%d/%y (e.g., 02/19/97).
%e    Day of month (1-31), no leading zeros.
%h    Abbreviated month name.
%n    Inserts a newline.
%r    Time as %I:%M:%S %p (e.g., 02:39:29 PM).
%R    Time as %H:%M (e.g., 14:39).
%t    Inserts a tab.
%T    Time as %H:%M:%S (e.g., 14:34:29).
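A few of the keywords from Table 13-2 can be combined as in the following sketch. It formats the clock value 0 with -gmt true so that the results do not depend on the current time or the local time zone, and it assumes the clock epoch is January 1, 1970.

```tcl
# Clock value 0 is the start of the epoch (assumed to be 1970-01-01).
set t 0

# Year, month, day, and time in a sortable layout.
set stamp [clock format $t -format "%Y-%m-%d %H:%M:%S" -gmt true]

# Full weekday and month names.
set long [clock format $t -format "%A, %B %d" -gmt true]

# Day of the year, and a literal percent sign.
set doy [clock format $t -format "%j" -gmt true]
set pct [clock format $t -format "100%%" -gmt true]

puts $stamp   ;# 1970-01-01 00:00:00
puts $long    ;# Thursday, January 01
puts $doy     ;# 001
puts $pct     ;# 100%
```

Using a fixed clock value and -gmt true is also a handy way to write repeatable tests for your own format strings.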

The clock clicks command returns the value of the system's highest resolution clock. The units of the clicks are not defined. The main use of this command is to measure the relative time of different performance tuning trials. The following command counts the clicks per second over 10 seconds, which will vary from system to system:

Example 13-1 Calculating clicks per second.

    set t1 [clock clicks]
    after 10000 ;# See page 218
    set t2 [clock clicks]
    puts "[expr ($t2 - $t1)/10] Clicks/second"
    => 1001313 Clicks/second

The clock scan command parses a date string and returns a seconds value. The command handles a variety of date formats. If you leave off the year, the current year is assumed.

Year 2000 Compliance

Tcl implements the standard interpretation of two-digit year values, which is that 70-99 are 1970-1999, and 00-69 are 2000-2069. Versions of Tcl before 8.0 did not properly deal with two-digit years in all cases. Note, however, that Tcl is limited by your system's time epoch and the number of bits in an integer. On Windows, Macintosh, and most UNIX systems, the clock epoch is January 1, 1970. A signed 32-bit integer can count enough seconds to reach forward into January 2038, and backward to December 1901. If you try to clock scan a date outside that range, Tcl will raise an error because the seconds counter will overflow or underflow. In this case, Tcl is just reflecting limitations of the underlying system.

If you leave out a date, clock scan assumes the current date. You can also use the -base option to specify a date. The following example uses the current time as the base, which is redundant:

    clock scan "10:30:44 PM" -base [clock seconds]
    => 2931690644

The date parser allows these modifiers: year, month, fortnight (two weeks), week, day, hour, minute, second. You can put a positive or negative number in front of a modifier as a multiplier. For example:

    clock format [clock scan "10:30:44 PM 1 week"]
    => Sun Dec 01 22:30:44 1996
    clock format [clock scan "10:30:44 PM -1 week"]
    => Sun Nov 17 22:30:44 1996

You can also use tomorrow, yesterday, today, now, last, this, next, and ago, as modifiers.

    clock format [clock scan "3 years ago"]
    => Wed Nov 24 17:06:46 1993

Both clock format and clock scan take a -gmt option that uses Greenwich Mean Time. Otherwise, the local time zone is used.

    clock format [clock seconds] -gmt true
    => Sun Nov 24 09:25:29 1996
    clock format [clock seconds] -gmt false
    => Sun Nov 24 17:25:34 1996
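The -base option becomes more useful when the base is not the current time. The sketch below applies relative modifiers to fixed clock values, with -gmt true so that the arithmetic is easy to check by hand. The base values are arbitrary and chosen to fall on midnight GMT.

```tcl
# Base values chosen for illustration: 0 is the start of the epoch and
# 86400 is exactly one day later, both at midnight GMT.
set nextDay  [clock scan "1 day"  -base 0 -gmt true]
set nextWeek [clock scan "1 week" -base 0 -gmt true]
set lastHour [clock scan "1 hour ago" -base 86400 -gmt true]

puts "one day after the base:   $nextDay"    ;# 86400
puts "one week after the base:  $nextWeek"   ;# 604800
puts "one hour before the base: $lastHour"   ;# 82800
```

The same pattern works with a base obtained from file mtime or from a stored timestamp, which avoids surprises when a script runs close to midnight.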


The info Command

Table 13-4 summarizes the info command. The operations are described in more detail later.

Table 13-4. The info command.

info args procedure          A list of procedure's arguments.
info body procedure          The commands in the body of procedure.
info cmdcount                The number of commands executed so far.
info commands ?pattern?      A list of all commands, or those matching pattern. Includes built-ins and Tcl procedures.
info complete string         True if string contains a complete Tcl command.
info default proc arg var    True if arg has a default parameter value in procedure proc. The default value is stored into var.
info exists variable         True if variable is defined.
info globals ?pattern?       A list of all global variables, or those matching pattern.
info hostname                The name of the machine. This may be the empty string if networking is not initialized.
info level                   The stack level of the current procedure, or 0 for the global scope.
info level number            A list of the command and its arguments at the specified level of the stack.
info library                 The pathname of the Tcl library directory.
info loaded ?interp?         A list of the libraries loaded into the interpreter named interp, which defaults to the current one.
info locals ?pattern?        A list of all local variables, or those matching pattern.
info nameofexecutable        The file name of the program (e.g., of tclsh or wish).
info patchlevel              The release patch level for Tcl.
info procs ?pattern?         A list of all Tcl procedures, or those that match pattern.
info script                  The name of the file being processed, or the empty string.
info sharedlibextension      The file name suffix of shared libraries.
info tclversion              The version number of Tcl.
info vars ?pattern?          A list of all visible variables, or those matching pattern.

Variables
There are three categories of variables: local, global, and visible. Information about these categories is returned by the locals, globals, and vars operations, respectively. The local variables include procedure arguments as well as locally defined variables. The global variables include all variables defined at the global scope. The visible variables include locals, plus any variables made visible via global or upvar commands. A pattern can be specified to limit the returned list of variables to those that match the pattern. The pattern is interpreted according to the rules of string match, which is described on page 48:

    info globals auto*
    => auto_index auto_noexec auto_path

Namespaces, which are the topic of the next chapter, partition global variables into different scopes. You query the variables visible in a namespace with:

    info vars namespace::*

Remember that a variable may not be defined yet even though a global or upvar command has declared it visible in the current scope. Use the info exists command to test whether a variable or an array element is defined or not. An example is shown on page 90.
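The difference between the three categories shows up most clearly inside a procedure. The following sketch uses made-up names (Demo_Vars, demo_count) and assumes it is run at the global scope of a fresh interpreter.

```tcl
# Hypothetical global variable and procedure for the demonstration.
set demo_count 1

proc Demo_Vars {a {b 2}} {
    global demo_count
    set local 3
    # info locals: arguments plus locally defined variables.
    # info vars: the locals plus anything linked in by global or upvar.
    return [list [lsort [info locals]] [lsort [info vars]]]
}

foreach {locals visible} [Demo_Vars 1] break
puts "locals:  $locals"     ;# a b local
puts "visible: $visible"    ;# a b demo_count local

# A pattern narrows the list of globals.
set matches [info globals demo_c*]

# info exists also works on individual array elements.
set demo_state(ready) 1
set hasReady [info exists demo_state(ready)]
set hasOther [info exists demo_state(other)]
puts "$matches $hasReady $hasOther"   ;# demo_count 1 0
```

The results are sorted before they are returned because the order in which info reports variable names is not specified.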

Procedures
You can find out everything about a Tcl procedure with the args, body, and default operations. This is illustrated in the following Proc_Show example. The puts commands use the -nonewline flag because the newlines in the procedure body, if any, are retained:

Example 13-2 Printing a procedure definition.

    proc Proc_Show {{namepat *} {file stdout}} {
        foreach proc [info procs $namepat] {
            set space ""
            puts -nonewline $file "proc $proc {"
            foreach arg [info args $proc] {
                if [info default $proc $arg value] {
                    puts -nonewline $file "$space{$arg $value}"
                } else {
                    puts -nonewline $file $space$arg
                }
                set space " "
            }
            # No newline needed because info body may return a
            # value that starts with a newline
            puts -nonewline $file "} {"
            puts -nonewline $file [info body $proc]
            puts $file "}"
        }
    }

Example 13-3 is a more elaborate example of procedure introspection that comes from the direct.tcl file, which is part of the Tcl Web Server described in Chapter 18. This code is used to map URL requests and the associated query data directly into Tcl procedure calls. This is discussed in more detail on page 247. The Web server collects Web form data into an array called form. Example 13-3 matches up elements of the form array with procedure arguments, and it collects extra elements into an args parameter. If a form value is missing, then the default argument value or the empty string is used:

Example 13-3 Mapping form data onto procedure arguments.

    # cmd is the name of the procedure to invoke
    # form is an array containing form values
    set cmdOrig $cmd
    set params [info args $cmdOrig]
    # Match elements of the form array to parameters
    foreach arg $params {
        if {![info exists form($arg)]} {
            if {[info default $cmdOrig $arg value]} {
                lappend cmd $value
            } elseif {[string compare $arg "args"] == 0} {
                set needargs yes
            } else {
                lappend cmd {}
            }
        } else {
            lappend cmd $form($arg)
        }
    }
    # If args is a parameter, then append the form data
    # that does not match other parameters as extra parameters
    if {[info exists needargs]} {
        foreach {name value} $valuelist {
            if {[lsearch $params $name] < 0} {
                lappend cmd $name $value
            }
        }
    }
    # Eval the command
    set code [catch $cmd result]

The info commands operation returns a list of all commands, which includes both built-in commands defined in C and Tcl procedures. There is no operation that just returns the list of built-in commands. Example 13-4 finds the built-in commands by removing all the procedures from the list of commands.

Example 13-4 Finding built-in commands.

    proc Command_Info {{pattern *}} {
        # Create a table of procedures for quick lookup
        foreach p [info procs $pattern] {
            set isproc($p) 1
        }
        # Look for command not in the procedure table
        set result {}
        foreach c [info commands $pattern] {
            if {![info exists isproc($c)]} {
                lappend result $c
            }
        }
        return [lsort $result]
    }

The Call Stack

The info level operation returns information about the Tcl evaluation stack, or call stack. The global level is numbered zero. A procedure called from the global level is at level one in the call stack. A procedure it calls is at level two, and so on. The info level command returns the current level number of the stack if no level number is specified.

If a positive level number is specified (e.g., info level 3), then the command returns the procedure name and argument values at that level in the call stack. If a negative level is specified, then it is relative to the current call stack. Relative level -1 is the level of the current procedure's caller, and relative level 0 is the current procedure. The following example prints the call stack. The Call_Trace procedure avoids printing information about itself by starting at one less than the current call stack level:

Example 13-5 Getting a trace of the Tcl call stack.

    proc Call_Trace {{file stdout}} {
        puts $file "Tcl Call Trace"
        for {set x [expr [info level]-1]} {$x > 0} {incr x -1} {
            puts $file "$x: [info level $x]"
        }
    }
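Relative levels are handy when a procedure only needs to know who called it. The following sketch uses hypothetical procedure names to show relative levels 0 and -1.

```tcl
# Hypothetical procedures for the demonstration.
proc Demo_Self {args} {
    # Relative level 0 is the current procedure and its arguments.
    return [info level 0]
}

proc Demo_Caller {} {
    # Relative level -1 is the invocation of whoever called us.
    return [info level -1]
}

proc Demo_Outer {x y} {
    return [Demo_Caller]
}

set self [Demo_Self a b]
set who [Demo_Outer 1 2]
puts "self: $self"   ;# Demo_Self a b
puts "who:  $who"    ;# Demo_Outer 1 2
```

Note that Demo_Caller raises an error if it is invoked directly from the global level, because there is no procedure invocation at level zero to report.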

Command Evaluation
If you want to know how many Tcl commands are executed, use the info cmdcount command. This counts all commands, not just top-level commands. The counter is never reset, so you need to sample it before and after a test run if you want to know how many commands are executed during a test.

The info complete operation figures out whether a string is a complete Tcl command. This is useful for command interpreters that need to wait until the user has typed in a complete Tcl command before passing it to eval. Example 13-6 defines Command_Process that gets a line of input and builds up a command. When the command is complete, the command is executed at the global scope. Command_Process takes two callbacks as arguments. The inCmd is evaluated to get the line of input, and the outCmd is evaluated to display the results. Chapter 10 describes callbacks and why the curly braces are used with eval as they are in this example:

Example 13-6 A procedure to read and evaluate commands.

    proc Command_Process {inCmd outCmd} {
        global command
        append command(line) [eval $inCmd]
        if [info complete $command(line)] {
            set code [catch {uplevel #0 $command(line)} result]
            eval $outCmd {$result $code}
            set command(line) {}
        }
    }
    proc Command_Read {{in stdin}} {
        if [eof $in] {
            if {$in != "stdin"} {
                close $in
            }
            return {}
        }
        return [gets $in]
    }
    proc Command_Display {file result code} {
        puts stdout $result
    }
    while {![eof stdin]} {
        Command_Process {Command_Read stdin} \
            {Command_Display stdout}
    }
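The before-and-after sampling of info cmdcount described in this section can be sketched as follows. Count_Commands is a hypothetical helper, and the figure it returns includes a little overhead from the helper itself, so treat it as approximate. The last two lines show info complete on a complete and an incomplete command.

```tcl
# Hypothetical helper: run a script in the caller's scope and return
# how many commands the interpreter counted while it ran.
proc Count_Commands {script} {
    set before [info cmdcount]
    uplevel 1 $script
    return [expr {[info cmdcount] - $before}]
}

set n [Count_Commands {
    for {set i 0} {$i < 10} {incr i} {
        set square [expr {$i * $i}]
    }
}]
puts "about $n commands executed"

# info complete checks that braces, brackets, and quotes are balanced.
puts [info complete {set a 5}]     ;# prints 1
puts [info complete "set a \{"]    ;# prints 0
```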

Scripts and the Library

The name of the current script file is returned with the info script command. For example, if you use the source command to read commands from a file, then info script returns the name of that file if it is called during execution of the commands in that script. This is true even if the info script command is called from a procedure that is not defined in the script.

Use info script to find related files.

I often use info script to source or process files stored in the same directory as the script that is running. A few examples are shown in Example 13-7.

Example 13-7 Using info script to find related files.

    # Get the directory containing the current script.
    set dir [file dirname [info script]]
    # Source a file in the same directory
    source [file join $dir helper.tcl]
    # Add an adjacent script library directory to auto_path
    # The use of ../lib with file join is cross-platform safe.
    lappend auto_path [file join $dir ../lib]

The pathname of the Tcl library is stored in the tcl_library variable, and it is also returned by the info library command. While you could put scripts into this directory, it might be better to have a separate directory and use the script library facility described in Chapter 12. This makes it easier to deal with new releases of Tcl and to package up your code if you want other sites to use it.

Version Numbers

Each Tcl release has a version number such as 7.4 or 8.0. This number is returned by the info tclversion command. If you want your script to run on a variety of Tcl releases, you may need to test the version number and take different actions in the case of incompatibilities between releases.

The Tcl release cycle starts with one or two alpha and beta releases before the final release, and there may even be a patch release after that. The info patchlevel command returns a qualified version number, like 8.0b1 for the first beta release of 8.0. We switched from using "p" (e.g., 8.0p2) to a three-level scheme (e.g., 8.0.3) for patch releases. The patch level is zero for the final release (e.g., 8.2.0). In general, you should be prepared for feature changes during the beta cycle, but there should only be bug fixes in the patch releases. Another rule of thumb is that the Tcl script interface remains quite compatible between releases; feature additions are upward compatible.
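One way to test the version number is sketched below. It assumes the package vcompare command is available to compare version strings, which avoids treating versions as floating point numbers. Version_AtLeast and the haveNamespaces variable are hypothetical names used only for illustration.

```tcl
# Hypothetical helper: true if the running Tcl is at least the
# required version. package vcompare returns -1, 0, or 1.
proc Version_AtLeast {required} {
    return [expr {[package vcompare [info tclversion] $required] >= 0}]
}

# Namespaces were added in Tcl 8.0, so guard their use on older releases.
if {[Version_AtLeast 8.0]} {
    set haveNamespaces 1
} else {
    set haveNamespaces 0
}
puts "Tcl [info patchlevel], namespaces available: $haveNamespaces"
```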

Execution Environment
The file name of the program being executed is returned with info nameofexecutable. This is more precise than the name in the argv0 variable, which could be a relative name or a name found in a command directory on your command search path. It is still possible for info nameofexecutable to return a relative pathname if the user runs your program as ./foo, for example. The following construct always returns the absolute pathname of the current program. If info nameofexecutable returns an absolute pathname, then the value of the current directory is ignored. The pwd command is described on page 115:

    file join [pwd] [info nameofexecutable]

A few operations support dynamic loading of shared libraries, which are described in Chapter 44. The info sharedlibextension operation returns the file name suffix of dynamic link libraries. The info loaded command returns a list of libraries that have been loaded into an interpreter. Multiple interpreters are described in Chapter 19.


Cross-Platform Support
Tcl is designed so that you can write scripts that run unchanged on UNIX, Macintosh, and Windows platforms. In practice, you may need a small amount of code that is specific to a particular platform. You can find out information about the platform via the tcl_platform variable. This is an array with these elements defined:

tcl_platform(platform) is one of unix, macintosh, or windows.

tcl_platform(os) identifies the operating system. Examples include MacOS, Solaris, Linux, Win32s (Windows 3.1 with the Win32 subsystem), Windows 95, Windows NT, and SunOS.

tcl_platform(osVersion) gives the version number of the operating system.

tcl_platform(machine) identifies the hardware. Examples include ppc (Power PC), 68k (68000 family), sparc, intel, mips, and alpha.

tcl_platform(isWrapped) indicates that the application has been wrapped up into a single executable with TclPro Wrapper. This is not defined in normal circumstances.

tcl_platform(user) gives the login name of the current user.

tcl_platform(debug) indicates that Tcl was compiled with debugging symbols.

tcl_platform(thread) indicates that Tcl was compiled with thread support enabled.

On some platforms a hostname is defined. If available, it is returned with the info hostname command. This command may return an empty string. One of the most significant areas affected by cross-platform portability is the file system and the way files are named. This topic is discussed on page 103.
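Platform-specific code is usually isolated in a switch on tcl_platform(platform). The following is a minimal sketch; Prefs_FileName and the file names it returns are made up for illustration and do not reflect any convention from the book.

```tcl
# Hypothetical helper: choose a preferences file name by platform.
# The default branch covers any platform value not listed here.
proc Prefs_FileName {app} {
    global tcl_platform
    switch -- $tcl_platform(platform) {
        unix      { return ".${app}rc" }
        windows   { return "$app.ini" }
        macintosh { return "$app Preferences" }
        default   { return "$app.cfg" }
    }
}

puts "Preferences file: [Prefs_FileName demo]"
puts "Running on $tcl_platform(os) $tcl_platform(osVersion)"
```

Keeping the switch in one small procedure means the rest of the script stays platform independent.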


Tracing Variable Values

The trace command registers a command to be called whenever a variable is accessed, modified, or unset. This form of the command is:

    trace variable name ops command

The name is a Tcl variable name, which can be a simple variable, an array, or an array element. If a whole array is traced, the trace is invoked when any element is used according to ops. The ops argument is one or more of the letters r, for read traces, w, for write traces, and u, for unset traces. The command is executed when one of these events occurs. It is invoked as:

    command name1 name2 op

The name1 argument is the variable or array name. The name2 argument is the name of the array index, or null if the trace is on a simple variable. If there is an unset trace on an entire array and the array is unset, name2 is also null. The value of the variable is not passed to the procedure. The traced variable is one level up the Tcl call stack. The upvar, uplevel, or global commands need to be used to make the variable visible in the scope of command. These commands are described in more detail in Chapter 7.

A read trace is invoked before the value of the variable is returned, so if it changes the variable itself, the new value is returned. A write trace is called after the variable is modified. The unset trace is called after the variable is unset.
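The calling convention above can be seen with a small write trace that records each assignment. This is only a sketch: Trace_Log, traceLog, counter, and point are hypothetical names, and the code uses the trace variable form described in this chapter.

```tcl
# Hypothetical trace procedure: record every write in a global list.
set traceLog {}
proc Trace_Log {name1 name2 op} {
    global traceLog
    # The traced variable lives one level up the call stack.
    upvar 1 $name1 var
    if {[string length $name2] == 0} {
        # Simple variable: name2 is empty.
        lappend traceLog "$op $name1 = $var"
    } else {
        # Array element: name2 is the index.
        lappend traceLog "$op ${name1}($name2) = $var($name2)"
    }
}

set counter 0
trace variable counter w Trace_Log
set counter 1
incr counter

# A trace on a whole array fires for writes to any element.
trace variable point w Trace_Log
set point(x) 10

foreach entry $traceLog {
    puts $entry
}
```

Because the write trace runs after the modification, the logged values are the new ones.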

Read-Only Variables
Example 13-8 uses traces to implement a read-only variable. A variable is modified before the trace procedure is called, so the ReadOnly variable is needed to preserve the original value. When a variable is unset, the traces are automatically removed, so the unset trace action reestablishes the trace explicitly. Note that the upvar alias (e.g., var) cannot be used to set up the trace:

Example 13-8 Tracing variables.

    proc ReadOnlyVar {varName} {
        upvar 1 $varName var
        global ReadOnly
        set ReadOnly($varName) $var
        trace variable $varName wu ReadOnlyTrace
    }
    proc ReadOnlyTrace { varName index op } {
        global ReadOnly
        upvar 1 $varName var
        switch $op {
            w {
                set var $ReadOnly($varName)
            }
            u {
                set var $ReadOnly($varName)
                # Re-establish the trace using the true name
                trace variable $varName wu ReadOnlyTrace
            }
        }
    }

This example merely overrides the new value with the saved value. Another alternative is to raise an error with the error command. This will cause the command that modified the variable to return the error. Another common use of trace is to update a user interface widget in response to a variable change. Several of the Tk widgets have this feature built into them.

If more than one trace is set on a variable, then they are invoked in the reverse order; the most recent trace is executed first. If there is a trace on an array and on an array element, then the trace on the array is invoked first.
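The alternative of raising an error might look like the sketch below. ReadOnlyError and the limit variable are hypothetical, the code handles only simple (non-array) variables, and it restores the saved value before raising the error so the variable does not keep the rejected value.

```tcl
# Hypothetical variant of the write trace: restore the saved value,
# then make the command that modified the variable fail.
proc ReadOnlyError {varName index op} {
    global ReadOnly
    upvar 1 $varName var
    set var $ReadOnly($varName)
    error "$varName is read-only"
}

set limit 10
set ReadOnly(limit) $limit
trace variable limit w ReadOnlyError

# The set command now returns an error, which catch reports.
if {[catch {set limit 99} msg]} {
    puts "write rejected: $msg"
}
puts "limit is still $limit"
```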

Creating an Array with Traces

Example 13-9 uses an array trace to dynamically create array elements:

Example 13-9 Creating array elements with array traces.

    # make sure variable is an array
    set dynamic() {}
    trace variable dynamic r FixupDynamic
    proc FixupDynamic {name index op} {
        upvar 1 $name dynArray
        if {![info exists dynArray($index)]} {
            set dynArray($index) 0
        }
    }

Information about traces on a variable is returned with the vinfo option:

    trace vinfo dynamic
    => {r FixupDynamic}

A trace is deleted with the vdelete option, which has the same form as the variable option. The trace in the previous example can be removed with the following command:

    trace vdelete dynamic r FixupDynamic


Interactive Command History

The Tcl shell programs keep a log of the commands that you type by using a history facility. The log is controlled and accessed via the history command. The history facility uses the term event to mean an entry in its history log. The events are just commands, and they have an event ID that is their index in the log. You can also specify an event with a negative index that counts backwards from the end of the log. Event -1 is the previous event. Table 13-5 summarizes the Tcl history command. In the table, event defaults to -1.

In practice you will want to take advantage of the ability to abbreviate the history options and even the name of the history command itself. For the command, you need to type a unique prefix, and this depends on what other commands are already defined. For the options, there are unique one-letter abbreviations for all of them. For example, you could reuse the last word of the previous command with [hist w $]. This works because a $ that is not followed by alphanumerics or an open brace is treated as a literal $.

Several of the history operations update the history list. They remove the actual history command and replace it with the command that resulted from the history operation. The event and redo operations all behave in this manner. This makes perfect sense because you would rather have the actual command in the history, instead of the history command used to retrieve the command.

Table 13-5. The history command.

history                        Short for history info with no count.
history add command ?exec?     Adds the command to the history list. If exec is specified, then execute the command.
history change new ?event?     Changes the command specified by event to new in the command history.
history event ?event?          Returns the command specified by event.
history info ?count?           Returns a formatted history list of the last count commands, or of all commands.
history keep count             Limits the history to the last count commands.
history nextid                 Returns the number of the next event.
history redo ?event?           Repeats the specified command.

History Syntax
Some extra syntax is supported when running interactively to make the history facility more convenient to use. Table 13-6 shows the special history syntax supported by tclsh and wish.

Table 13-6. Special history syntax.

!!           Repeats the previous command.
!n           Repeats command number n. If n is negative it counts backward from the current command. The previous command is event -1.
!prefix      Repeats the last command that begins with prefix.
!pattern     Repeats the last command that matches pattern.
^old^new     Globally replaces old with new in the last command.

The next example shows how some of the history operations work:

Example 13-10 Interactive history usage.

    % set a 5
    5
    % set a [expr $a+7]
    12
    % history
        1 set a 5
        2 set a [expr $a+7]
        3 history
    % !2
    19
    % !!
    26
    % ^7^13
    39
    % !h
        1 set a 5
        2 set a [expr $a+7]
        3 history
        4 set a [expr $a+7]
        5 set a [expr $a+7]
        6 set a [expr $a+13]
        7 history

A Comparison to C Shell History Syntax

The history syntax shown in the previous example is simpler than the history syntax provided by the C shell. Not all of the history operations are supported with special syntax. The substitutions (using ^old^new) are performed globally on the previous command. This is different from the quick-history of the C shell. Instead, it is like the !:gs/old/new/ history command. So, for example, if the example had included ^a^b in an attempt to set b to 39, an error would have occurred because the command would have used b before it was defined:

    set b [expr $b+7]

If you want to improve the history syntax, you will need to modify the unknown command, which is where it is implemented. This command is discussed in more detail in Chapter 12. Here is the code from the unknown command that implements the extra history syntax. The main limitation in comparison with the C shell history syntax is that the ! substitutions are performed only when ! is at the beginning of the command:

Example 13-11 Implementing special history syntax.

    # Excerpts from the standard unknown command
    # uplevel is used to run the command in the right context
    if {$name == "!!"} {
        set newcmd [history event]
    } elseif {[regexp {^!(.+)$} $name dummy event]} {
        set newcmd [history event $event]
    } elseif {[regexp {^\^([^^]*)\^([^^]*)\^?$} $name x old new]} {
        set newcmd [history event -1]
        catch {regsub -all -- $old $newcmd $new newcmd}
    }
    if {[info exists newcmd]} {
        history change $newcmd 0
        return [uplevel $newcmd]
    }
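The global nature of the ^old^new substitution can be tried outside the history facility. The sketch below reuses the regular expression and the regsub call from the excerpt above; History_Substitute is a hypothetical wrapper that takes the previous command as an argument instead of reading it from the history list.

```tcl
# Hypothetical wrapper around the ^old^new logic from unknown.
# name is what the user typed; lastcmd stands in for [history event -1].
proc History_Substitute {name lastcmd} {
    if {[regexp {^\^([^^]*)\^([^^]*)\^?$} $name x old new]} {
        # -all replaces every occurrence, not just the first one.
        regsub -all -- $old $lastcmd $new lastcmd
    }
    return $lastcmd
}

puts [History_Substitute ^7^13 {set a [expr $a+7]}]
puts [History_Substitute ^a^b  {set a [expr $a+7]}]
```

The first call yields set a [expr $a+13]. The second yields set b [expr $b+7], which shows why ^a^b fails when b is not yet defined.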


Debugging
The rapid turnaround with Tcl coding means that it is often sufficient to add a few puts statements to your script to gain some insight about its behavior. This solution doesn't scale too well, however. A slight improvement is to add a Debug procedure that can have its output controlled better. You can log the information to a file, or turn it off completely. In a Tk application, it is simple to create a text widget to hold the contents of the log so that you can view it from the application. Here is a simple Debug procedure. To enable it you need to set the debug(enabled) variable, which DebugOn does for you. To have its output go to your terminal, set debug(file) to stderr.

Example 13-12 A Debug procedure.

    proc Debug { args } {
        global debug
        if {![info exists debug(enabled)]} {
            # Default is to do nothing
            return
        }
        puts $debug(file) [join $args " "]
    }
    proc DebugOn {{file {}}} {
        global debug
        set debug(enabled) 1
        if {[string length $file] == 0} {
            set debug(file) stderr
        } else {
            if [catch {open $file w} fileID] {
                puts stderr "Cannot open $file: $fileID"
                set debug(file) stderr
            } else {
                puts stderr "Debug info to $file"
                set debug(file) $fileID
            }
        }
    }
    proc DebugOff {} {
        global debug
        if {[info exists debug(enabled)]} {
            unset debug(enabled)
            flush $debug(file)
            if {$debug(file) != "stderr" &&
                $debug(file) != "stdout"} {
                close $debug(file)
                unset debug(file)
            }
        }
    }


Scriptics' TclPro
Scriptics offers a commercial development environment for Tcl called TclPro. TclPro features an extended Tcl platform and a set of development tools. The Tcl platform includes the popular [incr Tcl], Expect, and TclX extensions. These extensions and Tcl/Tk are distributed in source and binary form for Windows and a variety of UNIX platforms. There is an evaluation copy of TclPro on the CD-ROM. The TclPro distribution includes a copy of Tcl/Tk and the extensions that you can use for free. However, you will need to register at the Scriptics web site to obtain an evaluation license for the TclPro development tools. Please visit the following URL:

http://www.scriptics.com/registration/welchbook.html

The current version of TclPro contains these tools:

TclPro Debugger
TclPro Debugger provides a nice graphical user interface with all the features you expect from a traditional debugger. You can set breakpoints, single step, examine variables, and look at the call stack. It understands a subtle issue that can arise from using the update command: nested call stacks. It is possible to launch a new Tcl script as a side effect of the update command, which pushes the current state onto the execution stack. This shows up clearly in the debugger stack trace. It maintains project state, so it will remember breakpoint settings and other preference items between runs. One of the most interesting features is that it can debug remotely running applications. I use it regularly to debug Tcl code running inside the Tcl Web Server.

TclPro Checker
TclPro Checker is a static code checker. This is a real win for large program development. It examines every line of your program looking for syntax errors and dubious coding practices. It has detailed knowledge of Tcl, Tk, Expect, [incr Tcl], and TclX commands and validates your use of them. It checks that you call Tcl procedures with the correct number of arguments, and can cross-check large groups of Tcl files. It knows about changes between Tcl versions, and it can warn you about old code that needs to be updated.

TclPro Compiler
TclPro Compiler is really just a reader and writer for the byte codes that the Tcl byte-code compiler generates internally. It lets you precompile scripts and save the results, and then load the byte-code later instead of raw source. This provides a great way to hide your source code, if that is important to you. It turns out to save less time than you might think, however. By the time it reads the file from disk, decodes it, and builds the necessary Tcl data structures, it is not much faster than reading a source file and compiling it on the fly.

TclPro Wrapper
TclPro Wrapper assembles a collection of Tcl scripts, data files, and a Tcl/Tk interpreter into a single executable file. This makes distribution of your Tcl application as easy as giving out one file. The Tcl C library has been augmented with hooks in its file system access routines so that a wrapped application can look inside itself for files. The rule is that if you use a relative pathname (e.g., lib/myfile.dat), then the wrapped application will look first inside itself for the file. If the file is not found, or if the pathname is absolute (e.g., /usr/local/lib/myfile.dat), then Tcl looks on your hard disk for the file. The nice thing about TclPro Wrapper is that it handles all kinds of files, not just Tcl source files. It works by concatenating a ZIP file onto the end of a specially prepared Tcl interpreter. TclPro comes with pre-built interpreters that include Expect, [incr Tcl], and TclX, or you can build your own interpreter that contains custom C extensions.


Other Tools
The Tcl community has built many interesting and useful tools to help your Tcl development. Only two of them are mentioned below, but you can find many more at the Scriptics Tcl Resource Center: http://www.scriptics.com/resource/

The tkinspect Program

The tkinspect program is a Tk application that lets you look at the state of other Tk applications. It displays procedures, variables, and the Tk widget hierarchy. With tkinspect you can issue commands to another application in order to change variables or test out commands. This turns out to be a very useful way to debug Tk applications. It was written by Sam Shen and is available on the CD-ROM. The current FTP address for this is: ftp.neosoft.com:/pub/tcl/sorted/devel/tkinspect-5.1.6.tar.gz

The Tuba Debugger

Tuba is a debugger written purely in Tcl. It sets breakpoints by rewriting Tcl procedures to contain extra calls to the debugger. A small amount of support code is loaded into your application automatically, and the debugger application can set breakpoints, watch variables, and trace execution. It was written by John Stump and is available on the CD-ROM. The current URL for this package is: http://www.geocities.com/SiliconValley/Ridge/2549/tuba/

The bgerror Command

When a Tcl script encounters an error during background processing, such as handling file events or during the command associated with a button, it signals the error by calling the bgerror procedure. A default implementation displays a dialog and gives you an opportunity to view the Tcl call stack at the point of the error. You can supply your own version of bgerror. For example, when my exmh mail application gets an error it offers to send mail to me with a few words of explanation from the user and a copy of the stack trace. I get interesting bug reports from all over the world!

The bgerror command is called with one argument that is the error message. The global variable errorInfo contains the stack trace information. There is an example tkerror implementation in the on-line sources associated with this book.
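A custom bgerror can be quite small. The following sketch is not the exmh implementation; it simply records the message in a hypothetical global list, bgLog, and prints the stack trace from errorInfo. The after and update commands are used here only to provoke a background error from a plain tclsh.

```tcl
# Minimal sketch of a user-supplied bgerror.
set bgLog {}
proc bgerror {msg} {
    global errorInfo bgLog
    lappend bgLog $msg
    puts stderr "background error: $msg"
    puts stderr $errorInfo
}

# Schedule a script that fails, then let the event loop run it.
after 0 {error "something failed"}
update
puts "errors seen: [llength $bgLog]"
```

A real application might append the trace to a log file or offer to mail it, as described above.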

The tkerror Command

The bgerror command used to be called tkerror. When event processing shifted from Tk into Tcl with Tcl 7.5 and Tk 4.1, the name tkerror was changed to bgerror. Backwards compatibility is provided so that if tkerror is defined, then tkerror is called instead of bgerror. I have run into problems with the compatibility setup and have found it more reliable to update my applications to use bgerror instead of tkerror. If you have an application that runs under either Tk 4.0 or Tk 4.1, you can simply define both:

    proc bgerror [info args tkerror] [info body tkerror]


Performance Tuning
The time command measures the execution time of a Tcl command. It takes an optional parameter that is a repetition count:

    time {set a "Hello, World!"} 1000
    => 28 microseconds per iteration

If you need the result of the command being timed, use set to capture the result:

    puts $log "command: [time {set result [command]}]"
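When you want to compare two ways of writing the same thing, it helps to extract just the number from the result of time. Time_Micros below is a hypothetical helper that does this; the measurements depend on your machine and Tcl version, so no particular values are implied.

```tcl
# Hypothetical helper: run script count times in the caller's scope
# and return only the number of microseconds per iteration.
proc Time_Micros {script {count 1000}} {
    # time returns a string such as "28 microseconds per iteration";
    # the first word is the number.
    return [lindex [uplevel 1 [list time $script $count]] 0]
}

set braced   [Time_Micros {expr {3 * 4 + 5}}]
set unbraced [Time_Micros {expr 3 * 4 + 5}]
puts "braced: $braced  unbraced: $unbraced (microseconds per iteration)"
```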

Time stamps in a Log

Another way to gain insight into the performance of your script is to generate log records that contain time stamps. The clock seconds value is too coarse, but you can couple it with the clock clicks value to get higher resolution measurements. Use the code shown in Example 13-1 on page 175 to calibrate the clicks per second on your system. Example 13-13 writes log records that contain the current time and the number of clicks since the last record. There will be occasional glitches in the clicks value when the system counter wraps around or is reset by the system clock, but it will normally give pretty accurate results. The Log procedure adds overhead, too, so you should take several measurements in a tight loop to see how long each Log call takes:

Example 13-13 Time Stamps in log records.

    proc Log {args} {
        global log
        if [info exists log(file)] {
            set now [clock clicks]
            puts $log(file) [format "%s (%d)\t%s" \
                [clock format [clock seconds]] \
                [expr $now - $log(last)] \
                [join $args " "]]
            set log(last) $now
        }
    }
    proc Log_Open {file} {
        global log
        catch {close $log(file)}
        set log(file) [open $file w]
        set log(last) [clock clicks]
    }
    proc Log_Flush {} {
        global log
        catch {flush $log(file)}
    }
    proc Log_Close {} {
        global log
        catch {close $log(file)}
        catch {unset log(file)}
    }

A more advanced profile command is part of the Extended Tcl (TclX) package, which is described in Tcl/Tk Tools (Mark Harrison, ed., O'Reilly & Associates, Inc., 1997). The TclX profile command monitors the number of calls, the CPU time, and the elapsed time spent in different procedures.

The Tcl Compiler

The built-in Tcl compiler improves performance in the following ways:

Tcl scripts are converted into an internal byte-code format that is efficient to process. The byte codes are saved so that the cost of compiling is paid only the first time you execute a procedure or loop. After that, execution proceeds much faster. Compilation is done as needed, so unused code is never compiled. If you redefine a procedure, it is recompiled the next time it is executed.

Variables and command arguments are kept in a native format as long as possible and converted to strings only when necessary. There are several native types, including integers, floating point numbers, Tcl lists, byte codes, and arrays. There are C APIs for implementing new types. Tcl is still dynamically typed, so a variable can contain different types during its lifetime.

Expressions and control structures are compiled into special byte codes, so they are executed more efficiently. Because expr does its own round of substitutions, the compiler generates better code if you group expressions with braces. This means that expressions go through only one round of substitutions. The compiler can generate efficient code because it does not have to worry about strange code like:

    set subexpr {$x+$y}
    expr 5 * $subexpr

The previous expression is not fully defined until runtime, so it has to be parsed and executed each time it is used. If the expression is grouped with braces, then the compiler knows in advance what operations will be used and can generate byte codes to implement the expression more efficiently.

The operation of the compiler is essentially transparent to scripts, but there are some differences in lists and expressions. These are described in Chapter 51. With lists, the good news is that large lists are more efficient. The problem is that lists are parsed more aggressively, so syntax errors at the end of a list will be detected even if you access only the beginning of the list. There were also some bugs in the code generator in the widely used Tcl 8.0p2 release. Most of these were corner cases like unbraced expressions in if and while commands. Most of these bugs were fixed in the 8.0.3 patch release, and the rest were cleaned up in Tcl 8.1 with the addition of a new internal parsing package.
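The two rounds of substitution can be observed directly. In the sketch below the variable names x, y, and subexpr come from the text, while twoRounds and failed are hypothetical names used to hold the results.

```tcl
set x 2
set y 3
set subexpr {$x+$y}

# Unbraced: Tcl substitutes $subexpr first, then expr parses the
# string "5 * $x+$y" and substitutes $x and $y itself.
set twoRounds [expr 5 * $subexpr]

# Braced: expr sees $subexpr as one operand whose value is the
# string "$x+$y", which is not a number, so the command fails.
set failed [catch {expr {5 * $subexpr}} msg]

puts "unbraced result: $twoRounds"     ;# prints 13
puts "braced form failed: $failed"     ;# prints 1
```

Note that the unbraced form computes 5*2+3, not 5*(2+3), which is another reason this style of code is hard to reason about.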


Chapter 14. Namespaces

Namespaces group procedures and variables into separate name spaces. Namespaces were added in Tcl 8.0. This chapter describes the namespace and variable commands. Namespaces provide new scopes for procedures and global variables. Originally Tcl had one global scope for shared variables, local scopes within procedures, and one global namespace for procedures. The single global scope for procedures and global variables can become unmanageable as your Tcl application grows. I describe some simple naming conventions on page 171 that I have used successfully in large programs. The namespace facility is a more elegant solution that partitions the global scope for procedure names and global variables. Namespaces help structure large Tcl applications, but they add complexity. In particular, command callbacks may have to be handled specially so that they execute in the proper namespace. You choose whether or not you need the extra structure and learning curve of namespaces. If your applications are small, then you can ignore the namespace facility. If you are developing library packages that others will use, you should pick a namespace for your procedures and data so that they will not conflict with the applications in which they are used.


Using Namespaces
Namespaces add new syntax to procedure and variable names. A double colon, ::, separates the namespace name from the variable or procedure name. You use this syntax to reference procedures and variables in a different namespace. The namespace import command lets you name things in other namespaces without the extra syntax. Namespaces can be nested, so you can create a hierarchy of scopes. These concepts are explained in more detail in the rest of this chapter.

One feature not provided by namespaces is any sort of protection, or a way to enforce access controls between different namespaces. This sort of thing is awkward, if not impossible, to provide in a dynamic language like Tcl. For example, you are always free to use namespace eval to reach into any other namespace. Instead of providing strict controls, namespaces are meant to provide structure that enables large scale programming.

The package facility described in Chapter 12 was designed before namespaces. This chapter illustrates a style that ties the two facilities together, but they are not strictly related. It is possible to create a package named A that implements a namespace B, or to use a package without namespaces, or a namespace without a package. However, it makes sense to use the facilities together.

Example 14-1 repeats the random number generator from Example 7-4 on page 85 using namespaces. The standard naming style conventions for namespaces use lowercase:

Example 14-1 Random number generator using namespaces.

    package provide random 1.0
    namespace eval random {
        # Create a variable inside the namespace
        variable seed [clock seconds]
        # Make the procedures visible to namespace import
        namespace export init random range
        # Create procedures inside the namespace
        proc init { value } {
            variable seed
            set seed $value
        }
        proc random {} {
            variable seed
            set seed [expr ($seed*9301 + 49297) % 233280]
            return [expr $seed/double(233280)]
        }
        proc range { range } {
            expr int([random]*$range)
        }
    }

Example 14-1 defines three procedures and a variable inside the namespace random. From inside the namespace, you can use these procedures and variables directly. From outside the namespace, you use the :: syntax for namespace qualifiers. For example, the state variable is just seed within the namespace, but you use random::seed to refer to the variable from outside the namespace. Using the procedures looks like this:

    random::random
    => 0.3993355624142661
    random::range 10
    => 4

If you use a package a lot you can import its procedures. A namespace declares what procedures can be imported with the namespace export command. Once you import a procedure, you can use it without a qualified name:

    namespace import random::random
    random
    => 0.54342849794238679

Importing and exporting are described in more detail later.
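The same pattern of a namespace variable, an exported procedure, and an import can be reduced to a few lines. The counter namespace and its bump procedure below are hypothetical and exist only to show qualified and imported calls side by side.

```tcl
# Hypothetical namespace with one state variable and one procedure.
namespace eval counter {
    variable count 0
    namespace export bump
    proc bump {} {
        variable count
        incr count
    }
}

puts [counter::bump]              ;# qualified call, prints 1
namespace import counter::bump
puts [bump]                       ;# imported name, prints 2
puts $counter::count              ;# qualified variable, prints 2
```

Both calls update the same namespace variable, because the imported name refers to the same procedure.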


Namespace Variables
The variable command defines a variable inside a namespace. It is like the set command because it can define a value for the variable. You can declare several namespace variables with one variable command. The general form is:

    variable name ?value? ?name value? ...

If you have an array, do not assign a value in the variable command. Instead, use regular Tcl commands after you declare the variable. You can put any commands inside a namespace block:

    namespace eval foo {
        variable arr
        array set arr {name value name2 value2}
    }

A namespace variable is similar to a global variable because it is outside the scope of any procedures. Procedures use the variable command or qualified names to reference namespace variables. For example, the random procedure has a variable command that brings the namespace variable into the current scope:

    variable seed

If a procedure has a variable command that names a new variable, it is created in the namespace when it is first set.

Watch out for conflicts with global variables.

You need to be careful when you use variables inside a namespace block. If you declare them with a variable command, they are clearly namespace variables. However, if you forget to declare them, then they will either become namespace variables, or latch onto an existing global variable by the same name. Consider the following code:

    namespace eval foo {
        variable table
        for {set i 1} {$i <= 256} {incr i} {
            set table($i) [format %c $i]
        }
    }

If there is already a global variable i, then the for loop will use that variable. Otherwise, it will create the foo::i variable. I found this behavior surprising, but it does make it easier to access global variables like env without first declaring them with global inside the namespace block.
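
A minimal sketch of the defensive habit this suggests: declare every name the namespace block uses, including loop counters, so that nothing can latch onto a global. The namespace foo and the three-element table are only illustrative, and the loop is shortened from the one above.

```tcl
# A global i exists before the namespace block runs.
set i "global value"

namespace eval foo {
    # Declaring both names pins them to the namespace,
    # so the loop cannot latch onto the global i.
    variable table
    variable i
    for {set i 1} {$i <= 3} {incr i} {
        # 65, 66, 67 are the character codes for A, B, C
        set table($i) [format %c [expr {$i + 64}]]
    }
}

puts $i              ;# the global is untouched
puts $foo::i         ;# the loop counter lives in the namespace
puts $foo::table(1)  ;# first table entry
```

With the variable i declaration in place, the result no longer depends on whether a global i happens to exist.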

Qualified Names
A fully qualified name begins with ::, which is the name for the global namespace. A fully qualified name unambiguously names a procedure or a variable. The fully qualified name works anywhere. If you use a fully qualified variable name, it is not necessary to use a global command. For example, suppose namespace foo has a namespace variable x, and there is also a global variable x. The global variable x can be named with this:

    ::x

The :: syntax does not affect variable substitutions. You can get the value of the global variable x with $::x. Name the namespace variable x with this:

    ::foo::x

A partially qualified name does not have a leading ::. In this case the name is resolved from the current namespace. For example, the following also names the namespace variable x:

    foo::x

You can use qualified names with global. Once you do this, you can access the variable with its short name:

    global ::foo::x
    set x 5

Declaring variables is more efficient than using qualified names.

The Tcl byte-code compiler generates faster code when you declare namespace and global variables. Each procedure context has its own table of variables. The table can be accessed by a direct slot index, or by a hash table lookup of the variable name. The hash table lookup is slower than the direct slot access. When you use the variable or global command, then the compiler can use a direct slot access. If you use qualified names, the compiler uses the more general hash table lookup.
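
The two access styles can be compared side by side in a short sketch. The procedure names byQualifiedName and byGlobal are hypothetical, chosen only for this illustration. Note that global only creates a link when it is used inside a procedure body.

```tcl
namespace eval foo {
    variable x 1
}
set x "global x"

# Hypothetical helper: reach ::foo::x by its fully qualified name.
proc byQualifiedName {} {
    return $::foo::x
}

# Hypothetical helper: link ::foo::x to the local name x with global,
# then use the short name.
proc byGlobal {} {
    global ::foo::x
    set x 5
    return $x
}

puts [byQualifiedName]   ;# value of ::foo::x before the change
puts [byGlobal]          ;# sets ::foo::x through the short name
puts $::foo::x           ;# the namespace variable was modified
puts $::x                ;# the global x is a different variable
```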

Command Lookup
A command is looked up first in the current namespace. If it is not found there, then it is looked up in the global namespace. This means that you can use all the built-in Tcl commands inside a namespace with no special effort.

You can play games by redefining commands within a namespace. For example, a namespace could define a procedure named set. To get the built-in set you could use ::set, while set referred to the set defined inside the namespace. Obviously you need to be quite careful when you do this.

You can use qualified names when defining procedures. This eliminates the need to put the proc commands inside a namespace block. However, you still need to use namespace eval to create the namespace before you can create procedures inside it. Example 14-2 repeats the random number generator using qualified names. random::init does not need a variable command because it uses a qualified name for seed:

Example 14-2 Random number generator using qualified names.

    namespace eval random {
        # Create a variable inside the namespace
        variable seed [clock seconds]
    }
    # Create procedures inside the namespace
    proc random::init { seed } {
        set ::random::seed $seed
    }
    proc random::random {} {
        variable seed
        set seed [expr {($seed*9301 + 49297) % 233280}]
        return [expr {$seed/double(233280)}]
    }
    proc random::range { range } {
        expr {int([random]*$range)}
    }
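
The lookup order can be seen in a small sketch that shadows a harmless built-in instead of set. The namespace demo and its procedures are hypothetical names used only for this illustration.

```tcl
namespace eval demo {
    # A namespace-local command that shadows the built-in llength.
    proc llength {list} {
        return "demo::llength saw [::llength $list] elements"
    }
    proc test {} {
        # Unqualified name: demo::llength is found first.
        set a [llength {a b c}]
        # Fully qualified name: always the built-in command.
        set b [::llength {a b c}]
        return [list $a $b]
    }
}

puts [demo::test]
```

Outside the demo namespace, llength still refers to the built-in command, because the shadowing procedure is only found when the lookup starts in demo.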

Nested Namespaces
Namespaces can be nested inside other namespaces. Example 14-3 shows three namespaces that have their own specific variable x. The fully qualified names for these variables are ::foo::x, ::bar::x, and ::bar::foo::x.

Example 14-3 Nested namespaces.

    namespace eval foo {
        variable x 1 ;# ::foo::x
    }
    namespace eval bar {
        variable x 2 ;# ::bar::x
        namespace eval foo {
            variable x 3 ;# ::bar::foo::x
        }
        puts $foo::x ;# prints 3
    }
    puts $foo::x ;# prints 1

Partially qualified names can refer to two different objects.

In Example 14-3 the partially qualified name foo::x can reference one of two variables depending on the current namespace. From the global scope the name foo::x refers to the namespace variable x inside ::foo. From the ::bar namespace, foo::x refers to the variable x inside ::bar::foo.

If you want to unambiguously name a variable in the current namespace, the simplest approach is to bring the variable into scope with the variable command:

    variable x
    set x something

If you need to give out the name of the variable, then you have two choices. The most general solution is to use the namespace current command to create a fully qualified name:

    trace variable [namespace current]::x r \
        [namespace current]::traceproc

However, it is simpler to just explicitly write out the namespace as in:

    trace variable ::myname::x r ::myname::traceproc

The drawback of this approach is that it litters your code with references to ::myname::, which might be subject to change during program development.
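
The namespace current approach can be sketched as a complete script. As an assumption beyond the text above, this sketch uses the trace add variable spelling that arrived in later Tcl releases (8.4 and up) in place of trace variable ... r; the counter variable reads is a hypothetical addition that makes the effect of the trace observable.

```tcl
namespace eval myname {
    variable x 0
    variable reads 0

    # Trace callbacks receive the variable name, an array index
    # (empty for scalars), and the operation.
    proc traceproc {name1 name2 op} {
        variable reads
        incr reads
    }

    # Both names are built from namespace current, so nothing here
    # needs to change if the namespace is renamed.
    trace add variable [namespace current]::x read \
        [namespace current]::traceproc
}

set v $myname::x       ;# reading x from global scope fires the trace
puts $myname::reads
```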

Importing and Exporting Procedures

Commands can be imported from namespaces to make it easier to name them. An imported command can be used without its namespace qualifier. Each namespace specifies exported procedures that can be the target of an import. Variables cannot be imported. Note that importing is only a convenience; you can always use qualified names to access any procedure. As a matter of style, I avoid importing names, so I know what package a command belongs to when I'm reading code.

The namespace export command goes inside the namespace block, and it specifies what procedures a namespace exports. The specification is a list of string match patterns that are compared against the set of commands defined in a namespace. The export list can be defined before the procedures being exported. You can do more than one namespace export to add more procedures, or patterns, to the export list for a namespace. Use the -clear flag if you need to reset the export list.

    namespace export ?-clear? ?pat? ?pat? ...

Only exported names appear in package indexes.

When you create the pkgIndex.tcl package index file with pkg_mkIndex, which is described in Chapter 12, you should be aware that only exported names appear in the index. Because of this, I often resort to exporting everything. I never plan to import the names, but I do rely on automatic code loading based on the index files. This exports everything:

    namespace export *

The namespace import command makes commands in another namespace visible in the current namespace. An import can cause conflicts with commands in the current namespace. The namespace import command raises an error if there is a conflict. You can override this with the -force option. The general form of the command is:

    namespace import ?-force? namespace::pat ?namespace::pat?...

The pat is a string match type pattern that is matched against exported commands defined in namespace. You cannot use patterns to match namespace. The namespace can be a fully or partially qualified name of a namespace.

If you are lazy, you can import all procedures from a namespace:

    namespace import random::*

The drawback of this approach is that random exports an init procedure, which might conflict with another module you import in the same way. It is safer to import just the procedures you plan on using:

    namespace import random::random random::range

A namespace import takes a snapshot.

If the set of procedures in a namespace changes, or if its export list changes, then this has no effect on any imports that have already occurred from that namespace.
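
The snapshot behavior can be checked with a small sketch. The namespace counter and its procedures bump and zero are hypothetical names used only for this illustration; they are not part of the random package.

```tcl
namespace eval counter {
    namespace export bump          ;# only bump is exported at first
    variable n 0
    proc bump {} { variable n; return [incr n] }
    proc zero {} { variable n; set n 0 }
}

# Import everything that is exported right now.
namespace import counter::*
puts [bump]                            ;# bump is usable unqualified
puts [llength [info commands zero]]    ;# 0: zero was not exported

# Extending the export list later does not touch the earlier import.
namespace eval counter { namespace export zero }
puts [llength [info commands zero]]    ;# still 0

# A new import picks up the new export.
namespace import counter::zero
zero
puts [bump]
```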

Callbacks and Namespaces

Commands like after, bind, and button take arguments that are Tcl scripts that are evaluated later. These callback commands execute later in the global scope by default. If you want a callback to be evaluated in a particular namespace, you can construct the callback with namespace code. This command does not execute the callback. Instead, it generates a Tcl command that will execute in the current namespace scope when it is evaluated later. For example, suppose ::current is the current namespace. The namespace code command determines the current scope and adds that to the namespace inscope command it generates:

    set callback [namespace code {set x 1}]
    => namespace inscope ::current {set x 1}
    # sometime later ...
    eval $callback

When you evaluate $callback later, it executes in the ::current namespace because of the namespace inscope command. In particular, if there is a namespace variable ::current::x, then that variable is modified. An alternative to using namespace code is to name the variable with a qualified name:

    set callback {set ::current::x 1}

The drawback of this approach is that it makes it tedious to move the code to a different namespace.

If you need substitutions to occur on the command when you define it, use list to construct it. Using list is discussed in more detail on pages 123 and 389. Example 14-4 wraps up the list and the namespace inscope into the code procedure, which is handy because you almost always want to use list when constructing callbacks. The uplevel in code ensures that the correct namespace is captured; you can use code anywhere:

Example 14-4 The code procedure to wrap callbacks.

    proc code {args} {
        set namespace [uplevel {namespace current}]
        return [list namespace inscope $namespace $args]
    }
    namespace eval foo {
        variable y "y value" x {}
        set callback [code set x $y]
        => namespace inscope ::foo {set x {y value}}
    }

The example defines a callback that will set ::foo::x to y value. If you want to set x to the value that y has at the time of the callback, then you do not want to do any substitutions. In that case, the original namespace code is what you want:

    set callback [namespace code {set x $y}]
    => namespace inscope ::foo {set x $y}

If the callback has additional arguments added by the caller, namespace inscope correctly adds them. For example, the scrollbar protocol described on page 431 adds parameters to the callback that controls a scrollbar.
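
The handling of caller-supplied arguments can be sketched without Tk. Here a plain eval stands in for the widget that would add parameters, and the procedure record and the variable log are hypothetical names used only for this illustration.

```tcl
namespace eval foo {
    variable log {}

    # Hypothetical handler; the caller appends the final argument.
    proc record {tag value} {
        variable log
        lappend log "$tag=$value"
    }

    # Capture the current namespace along with the partial command.
    variable callback [namespace code {record scroll}]
}

# Later, from the global scope, the caller adds its own argument.
# namespace inscope appends it to the script as a list element.
eval $foo::callback 0.25
puts $foo::log
```

The callback runs in ::foo even though eval is invoked at the global scope, so record is found without a qualified name.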

Introspection
The info commands operation returns all the commands that are currently visible. It is described in more detail on page 179. You can limit the information returned with a string match pattern. You can also include a namespace specifier in the pattern to see what is visible in a namespace. Remember that global commands and imported commands are visible, so info commands returns more than just what is defined by the namespace. Example 14-5 uses namespace origin, which returns the original name of imported commands, to sort out the commands that are really defined in a namespace:

Example 14-5 Listing commands defined by a namespace.

    proc Namespace_List {{namespace {}}} {
        if {[string length $namespace] == 0} {
            # Determine the namespace of our caller
            set namespace [uplevel {namespace current}]
        }
        set result {}
        foreach cmd [info commands ${namespace}::*] {
            if {[namespace origin $cmd] == $cmd} {
                lappend result $cmd
            }
        }
        return [lsort $result]
    }
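
The distinction that namespace origin draws can be seen in isolation with two small namespaces. The names src, dst, hello, and own are hypothetical and exist only for this sketch.

```tcl
namespace eval src {
    namespace export hello
    proc hello {} { return hi }
}
namespace eval dst {
    namespace import ::src::hello
    proc own {} { return mine }
}

# Both the imported command and the locally defined one are listed.
puts [lsort [info commands ::dst::*]]

# namespace origin tells them apart.
puts [namespace origin ::dst::hello]   ;# where the import came from
puts [namespace origin ::dst::own]     ;# a command defined in dst
```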

The namespace Command

Table 14-1 summarizes the namespace operations:

Table 14-1. The namespace command.

namespace current
    Returns the current namespace.

namespace children ?name? ?pat?
    Returns names of nested namespaces. name defaults to current namespace. pat is a string match pattern that limits what is returned.

namespace code script
    Generates a namespace inscope command that will eval script in the current namespace.

namespace delete name ?name? ...
    Deletes the variables and commands from the specified namespaces.

namespace eval name cmd ?args? ...
    Concatenates args, if present, onto cmd and evaluates it in name namespace.

namespace export ?-clear? ?pat? ?pat? ...
    Adds patterns to the export list for current namespace. Returns export list if no patterns.

namespace forget pat ?pat? ...
    Undoes the import of names matching patterns.

namespace import ?-force? pat ?pat? ...
    Adds the names matching the patterns to the current namespace.

namespace inscope name cmd ?args? ...
    Appends args, if present, onto cmd as list elements and evaluates it in name namespace.

namespace origin cmd
    Returns the original name of cmd.

namespace parent ?name?
    Returns the parent namespace of name, or of the current namespace.

namespace qualifiers name
    Returns the part of name up to the last :: in it.

namespace which ?flag? name
    Returns the fully qualified version of name. The flag is one of -command or -variable.

namespace tail name
    Returns the last component of name.
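
A few of the name-handling operations from the table can be exercised in a short sketch. The nested namespaces a and b and the empty procedure p are hypothetical and serve only to give the commands something to report on.

```tcl
namespace eval a {
    namespace eval b {
        proc p {} {}
    }
}

puts [namespace qualifiers ::a::b::p]   ;# everything before the last ::
puts [namespace tail ::a::b::p]         ;# the last component
puts [namespace parent ::a::b]          ;# the enclosing namespace
puts [namespace children ::a]           ;# namespaces nested in ::a
puts [namespace which -command puts]    ;# fully qualified command name
```

namespace qualifiers and namespace tail only split the string they are given, so they also work on names that do not exist yet.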

Converting Existing Packages to use Namespaces

Suppose you have an existing set of Tcl procedures that you want to wrap in a namespace. Obviously, you start by surrounding your existing code in a namespace eval block. However, you need to consider three things: global variables, exported procedures, and callbacks.

Global variables remain global until you change your code to use variable instead of global. Some variables may make sense to leave at the global scope. Remember that the variables that Tcl defines are global, including env, tcl_platform, and the others listed in Table 2-2 on page 30. If you use the upvar #0 trick described on page 86, you can adapt this to namespaces by doing this instead:

    upvar #0 [namespace current]::$instance state

Exporting procedures makes it more convenient for users of your package. It is not strictly necessary because they can always use qualified names to reference your procedures. An export list is a good hint about which procedures are expected to be used by other packages. Remember that the export list determines what procedures are visible in the index created by pkg_mkIndex.

Callbacks execute at the global scope. If you use variable traces and variables associated with Tk widgets, these are also treated as global variables. If you want a callback to invoke a namespace procedure, or if you give out the name of a namespace variable, then you must construct fully qualified variable and procedure names. You can hardwire the current namespace:

    button .foo -command ::myname::callback \
        -textvariable ::myname::textvar

or you can use namespace current:

    button .foo -command [namespace current]::callback \
        -textvariable [namespace current]::textvar
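
The adapted upvar #0 idiom can be sketched as follows. The namespace widget, its procedures create and bump, and the count element are hypothetical; the point is only that each instance gets its own state array stored inside the namespace instead of at the global scope.

```tcl
namespace eval widget {
    # Hypothetical constructor: one state array per instance,
    # stored as a namespace variable named after the instance.
    proc create {instance} {
        upvar #0 [namespace current]::$instance state
        set state(count) 0
    }
    proc bump {instance} {
        upvar #0 [namespace current]::$instance state
        incr state(count)
    }
}

widget::create w1
widget::bump w1
widget::bump w1
puts $widget::w1(count)
puts [info exists ::w1]   ;# 0: nothing leaked into the global scope
```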

[incr Tcl] Object System

The Tcl namespace facility does not provide classes and inheritance. It just provides new scopes and a way to hide procedures and variables inside a scope. There are Tcl C APIs that support hooks in variable name and command lookup for object systems so that they can implement classes and inheritance. By exploiting these interfaces, various object systems can be added to Tcl as shared libraries.

The Tcl namespace facility was proposed by Michael McLennan based on his experiences with [incr Tcl], which is the most widely used object-oriented extension for Tcl. [incr Tcl] provides classes, inheritance, and protected variables and commands. If you are familiar with C++, [incr Tcl] should feel similar. This book does not give a complete treatment of [incr Tcl]. Tcl/Tk Tools (Mark Harrison, O'Reilly & Associates, Inc., 1997) is an excellent source of information. You can find a version of [incr Tcl] on the CD-ROM. The [incr Tcl] home page is:

    http://www.tcltk.com/itcl/

Notes
The final section of this chapter touches on a variety of features of the namespace facility.

Names for Widgets, Images, and Interpreters

There are a number of Tcl extensions that are not affected by the namespaces described in this chapter, which apply only to commands and variable names. For example, when you create a Tk widget, a Tcl command is also created that corresponds to the Tk widget. This command is always created in the global command namespace even when you create the Tk widget from inside a namespace eval block. Other examples include Tcl interpreters, which are described in Chapter 19, and Tk images, which are described in Chapter 38.

The variable command at the global scope

It turns out that you can use variable like the global command if your procedures are not inside a namespace. This is consistent because it means "this variable belongs to the current namespace," which might be the global namespace.
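
A minimal sketch of this equivalence, using a hypothetical procedure bump and a global variable counter:

```tcl
set counter 10

# bump is defined in the global namespace, so variable links
# the local name counter to ::counter, just as global would.
proc bump {} {
    variable counter
    incr counter
}

bump
puts $counter
```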

Auto Loading and auto_import

The following sequence of commands can be used to import commands from the foo package:

    package require foo
    namespace import foo::*

However, because of the default behavior of packages, there may not be anything that matches foo::* after the package require. Instead, there are entries in the auto_index array that will be used to load those procedures when you first use them. The auto loading mechanism is described in Chapter 12. To account for this, Tcl calls out to a hook procedure called auto_import. The default implementation of this procedure searches auto_index and forcibly loads any pending procedures that match the import pattern. Packages like [incr Tcl] exploit this hook to implement more elaborate schemes. The auto_import hook was first introduced in Tcl 8.0.3.

Namespaces and uplevel

Namespaces affect the Tcl call frames just like procedures do. If you walk the call stack with info level, the namespace frames are visible. This means that you can get access to all variables with uplevel and upvar. Level #0 is still the absolute global scope, outside any namespace or procedure. Try out Call_Trace from Example 13-5 on page 180 on your code that uses namespaces to see the effect.
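
The extra call frame can be observed directly, as a rough sketch. The procedures depth and callerNamespace are hypothetical helpers, and the sketch assumes the script is run at the top level of the interpreter.

```tcl
# Hypothetical helpers: report the current stack level, and the
# namespace of whoever called us.
proc depth {} { info level }
proc callerNamespace {} { uplevel 1 {namespace current} }

set atGlobal [depth]
namespace eval foo {
    # The namespace eval frame sits between the global scope
    # and the procedure call, so depth is one level deeper here.
    variable inside [depth]
    variable caller [callerNamespace]
}

puts "$atGlobal $foo::inside $foo::caller"
```

The uplevel 1 in callerNamespace lands in the namespace eval frame, which is why it reports ::foo and not the global namespace.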

Naming Quirks
When you name a namespace, you are allowed to have extra colons at the end. You can also have two or more colons as the separator between namespace name components. These rules make it easier to assemble names by adding to the value returned from namespace current. These all name the same namespace:

    ::foo::bar
    ::foo::bar::
    ::foo:::::::bar

The name of the global namespace can be either :: or the empty string. This follows from the treatment of :: in namespace names.

When you name a variable or command, a trailing :: is significant. In the following command a variable inside the ::foo::bar namespace is modified. The variable has an empty string for its name!

    set ::foo::bar:: 3
    namespace eval ::foo::bar { set {} }
    => 3

If you want to embed a reference to a variable just before two colons, use a backslash to turn off the variable name parsing before the colons:

    set x xval
    set y $x\::foo
    => xval::foo
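
These quirks can be tried out in a short sketch. The ${x} form in the last line is a standard Tcl alternative to the backslash, offered here as an addition to the technique described above.

```tcl
namespace eval ::foo::bar {}

# Extra colons in a namespace name are ignored.
puts [namespace eval ::foo::bar:: {namespace current}]
puts [namespace eval ::foo:::::::bar {namespace current}]

# A trailing :: on a variable name refers to the variable named {}.
set ::foo::bar:: 3
puts [namespace eval ::foo::bar { set {} }]

# Two ways to stop variable name parsing before the colons.
set x xval
puts $x\::foo
puts ${x}::foo
```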

Miscellaneous
You can remove names you have imported:

    namespace forget random::init

You can rename imported procedures to modify their names:

    rename range Range

You can even move a procedure into another namespace with rename:

    rename random::init myspace::init
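
A small self-contained sketch of forgetting an import and moving a procedure. The namespaces lib and myspace and the procedure greet are hypothetical. Note that the target namespace must already exist before rename can move a procedure into it.

```tcl
namespace eval lib {
    namespace export greet
    proc greet {} { return hello }
}
namespace eval myspace {}

namespace import lib::greet
puts [greet]                            ;# the imported name works

namespace forget lib::greet
puts [llength [info commands greet]]    ;# 0: the import is gone

# Move the procedure itself into another namespace.
rename lib::greet myspace::greet
puts [myspace::greet]
puts [llength [info commands ::lib::greet]]   ;# 0: it no longer lives in lib
```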

Part II. Advanced Tcl

Chapter 15. Internationalization

This chapter describes features that support text processing for different character sets such as ASCII and Japanese. Tcl can read and write data in various character set encodings, but it processes data in a standard character set called Unicode. Tcl has a message catalog that lets you generate different versions of an application for different languages. Tcl commands described are: encoding and msgcat.

Different languages use different alphabets, or character sets. An encoding is a standard way to represent a character set. Tcl hides most of the issues associated with encodings and character sets, but you need to be aware of them when you write applications that are used in different countries. You can also write an application using a message catalog so that the strings you display to users can be in the language of their choice. Using a message catalog is more work, but Tcl makes it as easy as possible.

Most of the hard work in dealing with character set encodings is done "under the covers" by the Tcl C library. The Tcl C library underwent substantial changes to support international character sets. Instead of using 8-bit bytes to store characters, Tcl uses a 16-bit character set called Unicode, which is large enough to encode the alphabets of all languages. There is also plenty of room left over to represent special characters.

In spite of all the changes to support Unicode, there are few changes visible to the Tcl script writer. Scripts written for Tcl 8.0 and earlier continue to work fine with Tcl 8.1 and later versions. You only need to modify scripts if you want to take advantage of the features added to support internationalization.

This chapter begins with a discussion of what a character set is and why different encodings are used to represent them. It concludes with a discussion of message catalogs.
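
As a rough illustration of the difference between characters and their encoded bytes, the following sketch converts one string into two encodings with the encoding command. The sample word is an arbitrary choice; iso8859-1 and utf-8 are encoding names that ship with Tcl.

```tcl
# Four characters; the last one is e with an acute accent.
set s "caf\u00e9"
puts [string length $s]

# In ISO Latin-1 every character of this string fits in one byte.
set latin [encoding convertto iso8859-1 $s]
puts [string length $latin]

# In UTF-8 the accented character takes two bytes.
set utf [encoding convertto utf-8 $s]
puts [string length $utf]

# Converting back recovers the original characters.
puts [string equal $s [encoding convertfrom utf-8 $utf]]
```

The script itself only ever handles characters; the byte counts differ because each encoding represents the same characters differently.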

Character Sets and Encodings

If you are from the United States, you've probably never thought twice about character sets. Most computers use the ASCII encoding, which has 128 characters. That is enough for the 26 letters in the English alphabet, upper case and lower case, plus numbers, various punctuation characters, and control characters like tab and newline. ASCII fits easily in 8-bit characters, which can represent 256 different values. European alphabets include accented characters. The ISO Latin-1 encoding is a superset of ASCII that encodes 256 characters. It shares the ASCII encoding in values 0 through 127 and uses the "high half" of the encoding space to represent accented characters as well as special characters. There are several ISO Latin encodings to handle different alphabets, and these share the trick of encoding ASCII in the lower half and other characters in the high half. You might see these encodings referred